Project

General

Profile

multi-message messages

Added by tve almost 3 years ago

Something I’m seeing in the code that, to be brutally honest, I dislike a lot is gadgets sending “compound” messages as multiple messages. For example, the database responds with two messages. The RF12demo gadget sends message pairs, etc. That goes smack against all the flow stuff, is inefficient (multiple goroutine context switches unless the channels have buffering) and leads to awful receiver code. Why are you not using more complex types? It seems to me that an extension of the Tag type would be very helpful and would clean things up considerably. Something like:

type Map map[string] interface{}
func (*Map) GetString(tag string) string {
  return map[tag].(string);
}

This is basically the equivalent of passing JSON around, but instead it’s just a map with string keys. As an example, the database Get operation could expect an input with
{ operation: "get", key: "fooey" } and outputs { operation: "get", key: "fooey", value: "baz" }


Replies (22)

RE: multi-message messages - Added by jcw almost 3 years ago

It may not be totally avoidable: the TimeStamp gadget inserts timestamps before each message. So there is an ordering in the stream which already matters, and since decoders shouldn’t care about whether the stream has timestamps or not, they are set up to act only on what they care about, and just pass the rest through. If I had added a “timestamped message” as type, then decoders would have been less flexible.

In concatenative languages (Forth, Joy, Factor), there’s a similar issue: do you put stuff on the stack as is, or do you “flip” things into vectors or other structures. I think in that context, what you see is a frequent transition between different forms, to flatten cq group stuff as it passes through the chain. Same for vector languages, APL, J, etc: shape often changes all the time.

I suspect that we’ll see something similar here: some generic flattening/grouping gadgets to reshape the message stream as it flows through the system. In FBP, there are begin/end markers (I don’t recall the exact term), with some processes (gadgets) having stacks internally to collect cq unwrap state. I think they call it bracketing. Bit like a token stream with {/}’s in C, or open/close elements in HTML. Sometimes you need context, sometimes you’re scanning for stuff and want to ignore the structure.

I’m not opposed to map[string]interface{} (or even Tag{"...",[]interface{}}, which is really a tagged vector), but I’m trying to keep stuff as flat as possible for the moment, just to see how far it goes. My suggestion would be to try things, and perhaps create some gadgets which go from one form to another, just to find out which structures help simplify gadgets and which don’t.

Maybe the trick is to look at the false hits, i.e. if there were no structure at all, which gadgets would get in trouble? There is clearly a need to detect either shape or datatype, since otherwise every message could only be treated in the same way.

RE: multi-message messages - Added by jcw almost 3 years ago

> As an example, the database Get operation could expect an input with { operation: “get”, key: “fooey” } and outputs { operation: “get”, key: “fooey”, value: “baz” }

Why not try it? Just create a gadget which takes that type of input and place it in front of the current LevelDB gadget, feeding it what it currently wants. Then you can evaluate whether the creation/use of such structures simplifies the code w.r.t. senders and receivers.

RE: multi-message messages - Added by jcw almost 3 years ago

Note to self: one way to look at gadgets is as functions or “operators” (making the association with concatenative languages even clearer). Composition is of course simply a pipeline.

RE: multi-message messages - Added by tve almost 3 years ago

> It may not be totally avoidable: the TimeStamp gadget inserts timestamps before each message.

Mhh, if you declared that all messages are of type map[string]interface{} that could also have some very nice applications. The timestamp could be added and downstream gadgets don’t need to know to ignore it. And you could easily piggyback context information, for example, when handing the database a get request I could add arbitrary context info into the map that would be carried along in the output so the receiver doesn’t just get the database result, it also gets the context. Or I could add a “provenance” element to messages, feed them through a whole circuit and then associate the outputs with the original provenance.

RE: multi-message messages - Added by jcw almost 3 years ago

Ok, so what you’re saying is: make all messages rich (and indeed as you say: JSON-object-like). What I’ve been doing so far is: make the flow a linear stream and single out what you need, using order (and probably even counts/distances) to identify certain pieces. With the Tag as one exception, to support an infinite number of “out-of-band” message escapes.

There are no doubt trade-offs in performance (# of messages and channel buffering), but it’s not as clear cut as it may seem: a “tag” can remain in force until changed, and imply nesting, whereas with rich messages, that tag may have to be include in each one.

Also reminds me of lexing vs parsing, and SAX vs XSLT.

I think it’s worth exploring both avenues. This comes down to the granularity of it all, something that also comes back when deciding how to carve up the database key/value store.

Maybe we’ll find out that both directions have good uses, each with their own set of idioms and utility gadgets? They can definitely coexist, IMO.

RE: multi-message messages - Added by tve almost 3 years ago

I hear what you write. Here’s another example: your JeeBoot gadget is tied to a single RF group and to a specific message format understood by the RF12demo sketch. I have a different message format because it’s “pre-digested” by my RF-UDP gateway and in order to feed it to the JeeBoot gadget I basically have to emulate the demo sketch and I have to wrap an RF group mux/demux around it. If you had a “richer” message format with separate values for message type, source node, dest node, group id, and payload most likely your JeeBoot gadget would be more reusable.

RE: multi-message messages - Added by tve almost 3 years ago

I’m probably just not seeing the beauty of this. I’m trying to write a UDP gadget that mimicks the serial+rf12demo gadgets. I just spent 10 minutes trying to understand what the RF12demo gadget does. Probably an indication of me being dense ;-). But basically I have to reverse engineer the code, reverse engineer the state it’s building up and/or producing. Looking down the pipeline, the Readings gadget is another example where there is a pretty complex state machine at work that has certain assumptions about what comes in and in which order. Tracing which part of the state gets cleared when and which doesn’t and why takes a bit of work (location gets cleared when receiving , other gets cleared when sending a reading, asof, , and never get cleared). Looking at the RF12demo gadget on might conclude that starts a “sequence” but that’s the not really the case since Reading will accumulate “other” messages from before and glom them into the reading. Ah, at least from Reading’s perspective the tag ends a sequence, it seems. The logic of all this still escapes me, I will have to read more code to discover that…
It strikes me that the dispatcher would also be much simpler with “rich” messages: there would be none of this waiting for outputs to drain using a marker.
Oh well, back to trying to make the UDP gadget produce message sequences that make it look like RF12demo…

RE: multi-message messages - Added by jcw almost 3 years ago

> I hear what you write.

Same here :)

I understand that aggregation helps by grouping everything as data structures in instead of items.

Will need to ponder on this. Structures will probably lead to another set of trade-offs, but I’m not committed to any single choice at this stage.

RE: multi-message messages - Added by jcw almost 3 years ago

I’m very ambivalent about this. I can see the benefit of structured messages (one flat level, at least), but it really trickles down to everything if every gadget needs to deal with map[string]interface{} all the time. The input of ReadFileText is a string, with maps we need to introduce a key name, the input of CalcCrc16 is a []byte slice, again: which data item should it operate on? The input of Timer is a duration (as parsed string), we need to wrap it in a map just to get that value in. Etc.

For CalcCrc16, we could decide that it always operates on the “bytes” entry of the map. Or even have it look for a “payload” entry, and use its string value as the entry name to use, i.e. indirection. But that sure complicates things. What if I want to CRC different parts of a message? (bit contrived as example, but I’m sure there are other more useful cases).

One advantage of maps is that it simplifies the data model a bit: currently, gadgets recognise input based on datatype/structure, or tags, or both. When everything is a map, there can be conventions that the “bytes” entry is always a []byte, for example. It’s a bit like defining static structs, except that any item may be present or absent. If present, the type would have to match to whatever it is documented as (but that could be interface{}).

The current flow.Tag is sort of a single-entry map. It’s fairly important in the current design, in that it adds an out-of-band escape facility for messages. It could have been flattened to type Tag string, followed by whatever is tagged, but that would have complicated the gadgets, I think.

One thing with maps is mutability. If gadgets on a pipeline start adding and modifying entries in these maps, then that’s probably ok. Once there is a fanout though, and the messages go through multiple gadgets, there will be an issue of how to prevent nasty side-effects. One solution would be to add a map-copying-gadget, which creates a new shallow copy of the map.

So where do we take this? I’m really not convinced yet that having everything as maps makes life simpler.
Yes - in some of the circuits, there are probably some fairly well-defined data structures flowing through these “wires”.

One solution I can think of, would be to use some sort of assembly/disassembly mechanism. So in addition to this (using CRC as example):

for m := range g.In {
    if bytes, ok := m.([]byte); ok {
        m = ... bytes ...
    }
    g.Out.Send(m)
}

We could perhaps implement something like:

m := ???
for m.Assemble(g.In) {
    if bytes, ok := m.Get("bytes").([]byte); ok {
        result := ... bytes ...
        m.Set("bytes", result)
    }
    m.Send(g.Out)
}

With “m” using an “Assembler” of some sort. That would do a bunch of channel reads, collecting state internally, and return the combined state as an object, presumably a map[string]interface{}, but it could be anything, really. One question is where to obtain this assembler from - a separate registry?

IOW, can we come up with a mechanism which easily lets us deal with numerous kinds of message structures?

Flow has a “Transformer” gadget, which hasn’t really been fleshed out much yet. The idea being that any function can act as gadget too, taking inputs and producing outputs. It doesn’t have control over flow, it works as 1:1 message processor.

Maybe there’s a more sophisticated way to define transformer-like gadgets which only deal with message stream shape and act as “impedance matchers” between other gadgets? If we get this sort of mechanism right, we could even flip the design to use maps most of the time, yet support “flatter” streams when needed. I just don’t have enough insight yet to decide what approach will turn out to be the most convenient (and with sufficient performance, of course).

One of the design ideas planned for later, is to introduce typed channels. So that a gadget would define a “<-chan string” for an input which really has to be a string, and thus avoid having to do the conversion in all such gadgets. Flow would then need to transparently insert whatever converter / transformer / assembler is needed to connect some output to such a typed input. It would also be of use in the diagram editor, to know which wires can be connected, and which ones are impossible.

RE: multi-message messages - Added by jcw almost 3 years ago

As I ponder on all this, I’m less and less inclined to make any sweeping changes to the basic flow.Message type sent across channels.

The current interface{} type is very very flexible in combination with flow.Tags. It’s also a superset of using maps.

Performance is no issue with HouseMon (the Odroid has been running for almost two weeks, with 15 minutes total CPU usage after a million messages over MQTT). Even with Tosqa, I can now generate step pulses as Flow messages at the rate of nearly 1000 per second without serious problems (which is a bit crazy, but I wanted to see how far the “real-time” aspects of JeeBus could be taken).

But what we could do, is consider using maps across the RF12demo > decoder> readings -> database pipeline.
It might simplify things. IOW, no changes at the Flow level, just stream mods which affect a bunch of gadgets in JB and HM.

The power of all this dataflow stuff still amazes me. This is truly generic plumbing. We just need to harness it well :)

RE: multi-message messages - Added by tve almost 3 years ago

You mentioned 1:1 wires vs. N:M. One of the things to consider is that the fan-in part is more tricky with the message sequences than with message structs. For example, you can’t easily feed multiple rf pipelines into a Readings gadget because if the messages from multiple upstreams get interleaved you end up with garbage. And this type of bug can be very difficult to spot and reproduce. Instead you have to build smart stream merger gadgets that know about the message boundaries and hold up one stream when the other is mid-sequence.

WRT CalcCrc16 being simple and becoming more complicated if every message was a struct, I’m not sure what to say. Optimizing for these super-simple cases doesn’t seem like the smartest thing. I actually don’t quite see the value of making CalcCrc16 a gadget. It seems to me that collapsing the pipeline that reads Intel hex files into a single gadget (or two with the raw file reader being separate) would make the code simpler and easier to understand. Is a CalcCrc16 gadget really that much more reusable than a CalcCrc16 function?

What I find to be the most difficult part is to understand what the inputs and outputs of a gadget are. Without looking at all other gadgets that I see used in front of and after a given gadget I typically can’t figure it out. There are so many hidden assumptions. I’ve been wondering whether one could specify a validation for the input stream to help the matter and make things more robust. An input stream could probably be specified using a regular expression, but it’s not obvious that a RE is actually readable in the end. Like, can you write down what the Readings gadget expects as input and what it produces as output? Assuming you haven’t looked at it for a few days, try to do it by only looking at the Readings code itself.

RE: multi-message messages - Added by jcw almost 3 years ago

Which is why I said:

> But what we could do, is consider using maps across the RF12demo > decoder> readings > database pipeline.
I agree that the Readings gadget is cleaning up the mess created before it :)
But I don’t want to force map[string]interface{} upon every part of a system based on Flow.
The current design is more general, and includes the ability to use maps wherever we prefer to do so.
Yeah, CalcCrc16 was pushing it. But it’s almost as much work to write either way. I probably would aggregate at a higher level now, but even when I wrote a little G-code parsing setup for Tosqa, I ended up splitting the code in 4 gadgets. It gives me a lot more options to mess with things at different stages, or to replace parts with more advanced versions later.
As for being hard to understand: aren’t you being a bit too harsh at this stage? This is development code, still greatly in flux as I look for the right balance in the design of it all
while at the same time having an already-working setup…

RE: multi-message messages - Added by lightbulb almost 3 years ago

Well,

FWIW, I’m generally a supporter of the Unix philosophy of one tool for one job, instead of enhance and bend for another task (making tool more complicated), build a new tool (out of existing tools).
I like the FBP concept, as most definitely makes you think this way.
In this regard CalcCrc16 is following that mantra! It does mean that it ‘could’ be used somewhere else (but so could a package offering same as func impl)

As for the hard to understand… well, I struggle also, but I’m starting to get a feel, and once you have a small ‘circuit’ working, its easy to integrate, although the last sub.Out pub.In issue was hard to fathom.
I like the json approach actually, its just the cause/effect is not so straight fwd. I also think we need a ‘message trace’ tool, but this means a modification to flow, and would probably slow down a production system too much.

I think using maps for the decoder output to readings maybe good, as this is the only way at the moment that I can see a nice clean way to create multiple (aliased) decoder outputs from a single source node.
Right now, I am recycling a fake message back in my circuit as if it came from a real nodes packet (group0, node N) and putting in back in to a ‘pipeline’ style component that feeds rf12toDatabase. I’m sure there is a better way, and I’ll discover it shortly, but any pointers would help?

RE: multi-message messages - Added by jcw almost 3 years ago

One more note on CalcCrc16: it’s not really about the call, but what happens between different calls and how they are coupled together. In the case of IntelHexToBin, there is a []byte output, which needs to be passed along as well as fed to the CRC calculation. This can either be done with a rich object such as a map, storing the bytes next to the CRC, or as I did it, by passing things sequentially and injecting an extra tagged attribute. Functionally, hex and crc’s are fairly unrelated operations - so yes, I still think the split into two gadgets makes sense.

The ability to write silly, small, and dumb gadgets, combined with optional rich typing and very good in-process performance, really feels right to me. I’d actually love to explore using this same mechanism for a vectorised relational algebra engine - but not this year :)

Let’s not dwell on this. Let’s figure out how to make things simpler for HouseMon by having a map come out of the rf12demo gadget, for example.

I’m re-reading Paul Morrison’s FBP book - very useful, now that I’ve gone through an implementation myself. It’s oddly archaic in some sense, in that it deals with text as if it was still a stack of punch cards, but when you ignore such representation details, many of the topics discussed here appear to come up there as well. He’s mentioning “Descriptors”, which seems to be similar to keys in a map (or attributes in a JSON object).

RE: multi-message messages - Added by tve almost 3 years ago

I just spent a long time debugging the dispatcher to end in an impasse. Here’s what I’m trying to do: I’m trying to dispatch on the first byte of the message payload which I’m using as a message type. So instead of dispatching on the node type using the NodeMap I’m replacing the NodeMap by a MessageMap. Seems like a simple swap of one lookup module for another with everything else the same. Doesn’t work. The NodeMap maps node_id->decoder and the node_id is the first message that travels down the wire. The MessageMap maps the first byte of a message to a decoder but the message contents is basically the last message that travels down the wire. The result is that the dispatcher makes a mess. Dunno how to fix all this without rewriting a slew of gadgets… Seems like this could be handled more easily if messages were structured.

RE: multi-message messages - Added by ohweh almost 3 years ago

I’m facing the same issues. Right now, all my sensors are grouped by function, e.g. all RoomNodes use 11 as Node ID, all Infrared Nodes use 12, EC3K nodes use 22, PCA 301 adapters use 24, etc. pp. Basic idea is to save address space as much as possible while still keeping and sharing them within the same RF12 group. So to distinguish between the nodes (within each type), each one uses it’s own sub address (1-3 bytes depending on the node type), and this “sub” address is part of the payload itself. I.e. similar to JCWs shared Radio Blip nodes. I’ve got it working with the current code HM base, but not without hardcoding a lot at places that shouldn’t be touched and will definitely hinder me from pulling further updates.

There’s a second thread from light bulb going into the same direction. Not regarding “sub addressing”, but global addressing instead (i.e. same groups used on different bands, there’s a strong demand for location awareness). I even want to extend that: Imagine a wider area that needs to be covered by RF12, you might want (or even need) to use more than one receiver. Or have relayers in place, i.e. same information is coming in from different receivers and/or protocols. As soon as you need to reply back, it’s not enough to know the sensor type from a node map, instead you need to know the path the incoming packet went along.

IMHO: The addressing scheme needs to be extended to be future save

  • receiver name needs to be carried along
  • subaddress as part of the scheme (including strings… just imagine you’d like to send the name of a tv channel along as “sub address”, or an IP address, or an “agent” name)
  • strong separation of location and device name
  • allow to specify one or more “device alias” for a given “device name”. Using one (or more) “device alias(es)” pointing to the same device name within the web frontend, but still keeping the original device name for DB storage. This way it’ll be possible to show up the same device using different names (i.e. once as light switch, and once as current measuring device), but keep it unique within the database during lifetime. Even after renaming it.

/ow

P.S.: Let me add a specific example: the OokRelay Gadget. In my environment it is receiving data from a barometer plug in the living room as well as KSX 300 on the roof. Without modification, the barometer plug and the KSX both deliver “temperature” as sensor data. And both of them use the same location, which is “OokRelay”. And therefore temperature of both sensors overide themselves. Unless modifying the gadgets to call w.Out.Send(flow.Tag{“”,“xxxxxx”}) with their correct and hardcoded locations. Working, but not that nice… And now imagine you’re receiving data from or feeding data to multiple FS20 devices? you can’t properly assign them a location (e.g. light switch dark room :-)

RE: multi-message messages - Added by lightbulb almost 3 years ago

@ohweh,

Yes - I feel your pain….

FWIW, I have a working system here on my test rig - you have probably read my other posts about this issue (location awareness etc).

I have resolved the multi-message/single node issue (MMSN) (where a node send multiple packets that must be treated as if they came from separate ‘entities’) , and the location awareness issue (multiple networks addressable by separate locations) by mostly revamping the ‘dispatch’ concept, and extending/enhancing the nodeMap concept. Give me a few more days trialing and I’ll put up a separate repo, but this is starting to get a little hectic as I feel like i’m already diverging from HM-Core, rather like I did with HM0.7…..I think the MMSN could be made into a pull on jcw’s master rep, or at least a branch, but the location stuff has taken me far away from HM-Core.

—lightbulb

RE: multi-message messages - Added by ohweh almost 3 years ago

@lightbulb: sure, I’ve read all your recent posts and am also aware of your fork on github. The latter helped me a lot to start going, I’m already using some of your code for my own purposes as they’re good examples to start with. And I’m curious what’ll be next :)

@jcw: Don’t get me wrong, IMHO the overall design of HM0.9 looks very promising and I have some believe that good design decisions have been made. Even in this very early development stage (with its inevitable bugs, flaws and lack of documentation), it’s already possible to get basic things working, that’s great. But I’ve also got the impression, that we’re facing some very basic routing/addressing issues regarding the message envelope. The sooner these issues get addressed, the better, as less code has to be changed afterwards.

Cheers
ow

RE: multi-message messages - Added by tve almost 3 years ago

Quick status update. I ended up writing my own gadgets based on a PacketMap type, which is a map[string]interface{}, and which provides convenience functions for accessing various common types. For example, to access the group field of a PacketMap message as an int I write message.Int("group") and that auto-converts the field to an int if it’s a float and produces an error if auto-conversion doesn’t make sense. I’ve also written my own dispatcher which dispatches based on one field of a PacketMap, so it takes the field name as an input. use that to first dispatch based on the sketch type and then based on the first byte in the message, which I use as a “sketch module” discriminator. This way I can write decoders for sketches or for modules (that I include in many sketches).

I didn’t extend the existing dispatcher or other types although I could have. I wasn’t sure anyone else is interested in my version. Happy to produce a pull request if desired. The PacketMap can be found at https://github.com/tve/flow/blob/master/packet\_map.go, including the dispatcher, and my gadgets at https://github.com/tve/widuino/tree/master/gadgets

Overall, while I find the concepts of flow elegant and nice, I find debugging hell. When an error occurs the stack backtrace is of virtually no use and when the right messages don’t arrive at the far end I have to add tons of logging to see where things gets mis-recognized or otherwise not processed the way I thought.

RE: multi-message messages - Added by lightbulb almost 3 years ago

@tve,

What debugger are you using?

I have used both GDB and CGDB, my IDE (IntelliJ) has inbuilt support for Go debugging, that whilst basic (gdb based) is still better than printf etc.

If I need more control/detail, I use CGDB.

make sure you are using a ‘recent’ gdb build, especially if on darwin.

cgdb works quite well on RPI/BBB(Arm) although mostly they are targets for me and not dev hosts.

—lightbulb

RE: multi-message messages - Added by jcw almost 3 years ago

Yes, there’s a painful compromise between passing all data around as a long sequence of dumb/generic types or passing around compound structures. With the former, you end up keeping track of the current context much of the time (and there may be performance issues). With the latter, you may need to constantly massage the data to “fit” a specific gadget, although proper design might help avoid that most of the time.

FWIW, I added a “flow.Map” type a while back to support what seems to be the equivalent of @tve’s PacketMap. No utility functions yet, but we can probably unify and merge the two designs - i.e. I could add the interface functions to Flow.

There are several other issues which need to be addressed IMO, some of them might be quite complex:

  • dynamic creation and insertion of gadgets in a circuit with the Dispatcher is messy and needs a better mechanism
  • related: there should be a way for gadgets to introspect (and hopefully also alter) their circuit’s structure
  • need to find a clean way to expose access to shared resources, such as the database
  • better debugging, perhaps an easy way to watch what’s going on in any wire, in the proper time order
  • more convenient circuit choice/setup/format, as illustrated by @lightbulb’s “cirget” tool

Note: I expect day-to-day usage to change a lot once there is a working visual circuit editor, but that’s still some way off (and partly a chicken-and-egg problem, since that editor is built using flow as well).

I’ll add some issues on the issue tracker to make sure these action items don’t sink to the bottom, as these forum posts inevitably do over time.

RE: multi-message messages - Added by tve almost 3 years ago

@lightbulb, thanks for reminding me of gdb, I’ll have to try that again and see whether it helps at all. I mostly wanted to see what happens to a packet as it moves through the circuit and since that’s not sequential execution I’m not sure that gdb helps much.

@jcw, I didn’t remember the Map type in jeebus or I would have tried to use it. However, the PacketMapDispatcher has to be in flow and Map is in jeebus, so I would have run into a problem for sure.

The thing that pushed me over the edge to rewrite things as Map packets is that with message sequences it’s not just the content that matters but also the ordering. You don’t just need to know that some gadget requires a ‘node’ and a ‘SketchType’ message as input and produces a ‘Reading’ message as output, you also have to know in which order they have to appear. If I remember correctly I got stuck on the dispatcher where I wanted to write a gadget that, for certain nodes, dispatches on the content of the packet. Well, the node message comes first, but the content comes later, and the tag has to come first. So I would have had to buffer up a whole message sequence and then re-emit once I got the raw packet content. That seemed very unappealing to me.

    (1-22/22)