stuck flows

Added by tve about 3 years ago

I’m trying to do some development on the Angular front end of HouseMon and I’m having problems with stuck flows. When I reload the browser (for example after modifying a CoffeeScript file in dev mode), the websocket connection gets closed and things in flow back up. Somehow closed connections don’t result in properly removed gadgets. Sometimes the problem starts with:

> E0524 15:43:44.062478 28597 flow.go:103] ***** PANIC: runtime error: send on closed channel
> E0524 15:43:44.062557 28597 flow.go:113] /home/src/golang/src/ flow.(*Gadget).sendTo()
> /home/src/golang/src/ flow.(*Gadget).sendTo()
> E0524 15:43:44.062608 28597 flow.go:113] /home/src/golang/src/ flow.(*wire).Send()
> /home/src/golang/src/ flow.(*wire).Send()
> E0524 15:43:44.062648 28597 flow.go:113] /home/src/golang/src/ network.(*streamRpcResults).Run()

sometimes with:

> E0524 15:44:03.753161 28597 gadget.go:175] send timed out y {/reading/RFg212i14 map[ms:1400971433587 val:map[temp:11] loc:desk typ:OwTemp id:RFg212i14]}

This kind of problem isn’t easy to track down as far as I can tell. Also, it would be really nice to have a more robust architecture where a failure on the display-client end doesn’t affect data collection.

Replies (4)

RE: stuck flows - Added by jcw about 3 years ago

Agreed - I’ve run into this several times as well. Cleanup isn’t quite right, and that comes up on each lost websocket connection.
Tricky to debug indeed (else it’d have been fixed right away). I’ll need to write more extensive tests to get to the bottom of this.

RE: stuck flows - Added by tve about 3 years ago

The problem is in #L98. This should be properly wired in, as in Dispatcher, rather than just borrowing the enclosing circuit’s output wire…

But the bigger problem is that there’s no way to terminate a gadget from its output. In the RPC case, once an Attach command has been started there’s no way to tell it to stop; it just keeps outputting changes until it blocks or fails on the send. This has a domino effect because puts and subscriptions are locked together in publishChange in database.go. This seems like a really bad idea: a single slow consumer connected to a subscription dictates the rate at which data can be inserted into the database. IMHO, there has to be some decoupling, and slow consumers need to receive a disconnect if they fall too far behind.
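
To make the decoupling idea concrete, here is a minimal sketch of one way it could look. It is not the code in database.go: Broker, Change, Subscribe and Publish are hypothetical names invented for this illustration. Each subscriber gets its own bounded queue, the writer never blocks, and a subscriber whose queue is full is disconnected.

```go
package main

import (
	"fmt"
	"sync"
)

// Change is a hypothetical stand-in for a database change notification.
type Change struct {
	Key string
	Val int
}

// Broker (hypothetical) fans changes out to subscribers without ever
// blocking the writer.
type Broker struct {
	mu   sync.Mutex
	subs map[int]chan Change
	next int
}

func NewBroker() *Broker {
	return &Broker{subs: make(map[int]chan Change)}
}

// Subscribe registers a consumer with a bounded queue of the given size.
func (b *Broker) Subscribe(size int) (int, <-chan Change) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.next
	b.next++
	ch := make(chan Change, size)
	b.subs[id] = ch
	return id, ch
}

// Publish delivers c to every subscriber that has room in its queue.
// A subscriber whose queue is full is disconnected: its channel is closed
// and it is removed. The ids of dropped subscribers are returned.
func (b *Broker) Publish(c Change) []int {
	b.mu.Lock()
	defer b.mu.Unlock()
	var dropped []int
	for id, ch := range b.subs {
		select {
		case ch <- c: // queued without blocking
		default: // queue full: this consumer has fallen too far behind
			close(ch)
			delete(b.subs, id)
			dropped = append(dropped, id)
		}
	}
	return dropped
}

func main() {
	b := NewBroker()
	_, fast := b.Subscribe(4)
	_, slow := b.Subscribe(1) // this consumer never drains its queue

	for i := 0; i < 3; i++ {
		if dropped := b.Publish(Change{Key: "/reading/demo", Val: i}); len(dropped) > 0 {
			fmt.Println("disconnected subscribers:", dropped)
		}
	}
	fmt.Println("fast consumer has", len(fast), "changes queued")
	for c := range slow { // drains what was queued, then sees the close
		fmt.Println("slow consumer got", c.Val, "before being disconnected")
	}
}
```

Only Publish ever sends on or closes a subscriber channel, and it does both under the same lock, so this arrangement cannot hit the "send on closed channel" panic. The queue size is the knob that decides how far behind a consumer may fall before it is dropped.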

RE: stuck flows - Added by tve about 3 years ago

After looking at the problem for a while I see two options. One is to add an Abort() method to Circuitry and call that when there is a fatal error outputting stuff. The Abort() method would then go through all gadgets and close all input channels. But there are gadgets, like the DataSub one, that do not sit in a for m := range g.In { ... } loop, so these gadgets would continue spinning.
The alternative I see is to return an error from Send() on an output. Instead of blindly writing g.Out.Send(), gadgets would then be responsible for doing something like if g.Out.Send() != nil { return }. I think I’m going to go down this route and see what happens…
A third alternative is to panic in Send(), but that’s a Go anti-pattern as far as I know.
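
A rough sketch of the second option (Send() returning an error) might look like the following. It assumes nothing about the real flow package: Output, Message, ErrClosed and the Counter gadget are hypothetical names made up for this example. The consumer signals that it has gone away through a separate done channel, so the data channel is never closed and Send() cannot panic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrClosed is returned by Send once the consumer has gone away.
var ErrClosed = errors.New("output closed")

// Message and Output are hypothetical types for this sketch,
// not the real flow package API.
type Message interface{}

type Output struct {
	ch   chan Message
	done chan struct{}
	once sync.Once
}

func NewOutput(size int) *Output {
	return &Output{ch: make(chan Message, size), done: make(chan struct{})}
}

// Close marks the output as dead. The data channel itself is never
// closed, so a concurrent Send cannot panic.
func (o *Output) Close() { o.once.Do(func() { close(o.done) }) }

// Send delivers m, or returns ErrClosed if the output has been closed.
func (o *Output) Send(m Message) error {
	select {
	case <-o.done: // already closed: fail fast even if there is buffer room
		return ErrClosed
	default:
	}
	select {
	case o.ch <- m:
		return nil
	case <-o.done: // closed while we were blocked on the send
		return ErrClosed
	}
}

// Counter is a hypothetical gadget with no input loop, similar in spirit
// to a subscription gadget: it emits values until its output fails.
type Counter struct {
	Out  *Output
	Sent int
}

func (g *Counter) Run() {
	for i := 0; ; i++ {
		if err := g.Out.Send(i); err != nil {
			return // terminated from the output side
		}
		g.Sent++
	}
}

func main() {
	out := NewOutput(0)
	g := &Counter{Out: out}
	finished := make(chan struct{})
	go func() { g.Run(); close(finished) }()

	fmt.Println(<-out.ch, <-out.ch) // the consumer reads two messages
	out.Close()                     // the consumer goes away
	<-finished                      // the gadget notices and returns
	fmt.Println("gadget stopped after", g.Sent, "successful sends")
}
```

The cost of this design is that every gadget has to check the result of Send(), but in exchange a gadget that never reads from an input channel still has a way to find out that nobody is listening.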

RE: stuck flows - Added by tve about 3 years ago

I created an issue for this: