Comments (26)
@limx0 Can you clarify what you mean by "perfect fashion"?
Edit: I looked at the zip def again.
Putting the naming issues aside, would other people find something useful where one stream will be guaranteed to be emitted while combining it with another stream?
from streamz.
I have some experiments where that is doable, especially where you know that the number of elements will be the same.
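# cumulative average
a = Stream()
b = scan(add, a)  # running sum of everything emitted by a
c = scan(lambda x, y: x + 1, a, start=0)  # running count of items seen
d = zip(b, c)  # pairs of (sum, count)
e = map(divide, d)  # running mean
for data in data_source:
    a.emit(data)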
Edit: bug in scan lambda needs two args
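For anyone who wants to check the arithmetic without a stream library, here is a minimal pure-Python sketch of the same cumulative-average idea: a running sum zipped with a running count. It uses itertools.accumulate in place of the scan nodes above. The helper name cumulative_average and the sample data are illustrative assumptions, not part of streamz.

```python
from itertools import accumulate
from operator import add


def cumulative_average(values):
    """Yield the running mean of `values`, one result per input item."""
    values = list(values)
    running_sum = accumulate(values, add)                  # like scan(add, a)
    running_count = accumulate((1 for _ in values), add)   # items seen so far
    # zip is safe here: both iterators yield exactly one item per input.
    for total, count in zip(running_sum, running_count):
        yield total / count


averages = list(cumulative_average([2, 4, 6, 8]))
print(averages)  # [2.0, 3.0, 4.0, 5.0]
```

The zip only works here because both branches are guaranteed to produce the same number of elements, which is the condition mentioned above.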
from streamz.
Also keen to have a zip_longest as you mentioned above. I can see that this could definitely be useful in some cases.
-A-B-C-D-E---------------------------------------------F------
+
-----------------1-----------------------2-3-4-5-----------------
=
-----------------A1-B1-C1-D1-E1--------------------F5-----
What about zip_product as a name?
Edit: Never mind, I was thinking it did more of a product-like combination. I'm not 100% sure what your stream above ^^ is doing.
Edit2: Yep, zip_latest seems appropriate.
from streamz.
zip_product works for me. I have a version of this already in the heterogeneous sibling library: https://github.com/xpdAcq/SHED/pull/34/files#diff-65d9b3aec4fc1da045d510c0bf1badf7R979
(I can change the name, but it seems to do the job. Relevant tests: https://github.com/xpdAcq/SHED/pull/34/files#diff-dfb8639c0c54113885f051bbf83937ccR640)
from streamz.
As a corollary, should we have an equivalent to zip_longest?
from streamz.
good question. I think we may want to first go back and review existing streams. Here's CombineLatest from ReactiveX:
The CombineLatest operator behaves in a similar way to Zip, but while Zip emits items only when each of the zipped source Observables have emitted a previously unzipped item, CombineLatest emits an item whenever any of the source Observables emits an item (so long as each of the source Observables has emitted at least one item). When any of the source Observables emits an item, CombineLatest combines the most recently emitted items from each of the other source Observables, using a function you provide, and emits the return value from that function.
We've modified it by adding emit_on. I'd say that we should stick to what they use and only emit once. Basically, our result with emit_on=a would be like that figure but with only "2A", "3D", "4D", and "5D" as a result. It would make things simpler in that we don't have to worry about buffering (and consequently backfilling).
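To make the difference concrete, here is a minimal pure-Python sketch of CombineLatest semantics with an optional emit_on filter. It is a toy model, not the streamz implementation: the class CombineLatest, its update method, and the use of source names (rather than stream objects) for emit_on are assumptions made for illustration.

```python
class CombineLatest:
    """Toy combine_latest: keeps only the latest value per source (lossy)."""

    _MISSING = object()

    def __init__(self, sources, emit_on=None):
        self.latest = {name: self._MISSING for name in sources}
        # By default any source triggers an emission, as in ReactiveX.
        self.emit_on = set(emit_on) if emit_on is not None else set(sources)
        self.out = []

    def update(self, source, value):
        self.latest[source] = value
        ready = all(v is not self._MISSING for v in self.latest.values())
        # Emit only once every source has a value and this source may trigger.
        if ready and source in self.emit_on:
            self.out.append(tuple(self.latest.values()))


events = [("a", 1), ("a", 2), ("b", "x"), ("a", 3), ("b", "y")]

any_source = CombineLatest(["a", "b"])
only_a = CombineLatest(["a", "b"], emit_on=["a"])
for source, value in events:
    any_source.update(source, value)
    only_a.update(source, value)

print(any_source.out)  # [(2, 'x'), (3, 'x'), (3, 'y')]
print(only_a.out)      # [(3, 'x')]
```

Neither variant buffers anything, so the value 1 from a never appears in the output, and with emit_on restricted to a the updates from b alone produce no output at all.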
Consequently, should we start making figures for the methods here? I can go ahead and begin. We just need to agree on the medium. I generally use Inkscape.
from streamz.
can you clarify about zip_longest? let's use their figure for zip here as an example. how would the result of zip_longest differ from their suggested output?
from streamz.
I am a very visual art challenged person (I can appreciate it, just not create it) so I'd say go ahead with whichever medium works for you, thank you!
from streamz.
hehe same, but i guess i can get something started, later this weekend hopefully, as a PR. If we end up using it, we should remember to squash commits with image changes to save space...
from streamz.
for zip_longest we'd emit a 5D, assuming that we decided on the top one being the longest.
from streamz.
Maybe a better view:
Streams coming into a node can be "lossy" or "lossless". For most operator nodes (map, filter, accumulate) this is not a problem: we only have one stream, and that makes it automatically lossless. However, for multi-stream nodes (zip, combine_latest), which emit a tuple, this becomes a problem.
- combine_latest is lossy. We don't buffer any of the data coming in, so if we push 5 pieces of data into a and none into b, we will not see the first 4, assuming that the next item is from b (all 5 if emit_on=a). If we pre-load combine_latest with data we will lose it.
- zip is also lossy. If the streams emit different numbers of items, we may never see the unmatched ones.
We may want the ability to specify lossless streams coming into the multi-stream nodes.
For example, a combine_latest with the a stream set to lossless will buffer the a stream items and then, when a b comes along, emit a series of ab items which combine the new b with all the buffered a.
So maybe zip_longest is congruent to a lossless combine_latest?
from streamz.
Something like
-A-B-C-D-E---------------------------------------------F------
+
-----------------1-----------------------2-3-4-5-----------------
=
-----------------A1-B1-C1-D1-E1--------------------F5-----
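The diagram above can be reproduced with a small pure-Python sketch in which one input is lossless (every item is emitted exactly once) and the other only supplies its most recent value. This is a toy model under stated assumptions, not the SHED or streamz implementation: the class name ZipLatest and the methods push_lossless and push_latest are hypothetical.

```python
class ZipLatest:
    """Toy node: each lossless item is emitted once, paired with the most
    recent `latest` item; lossless items wait until a first one arrives."""

    _MISSING = object()

    def __init__(self):
        self.buffer = []             # lossless items still waiting for a partner
        self.latest = self._MISSING
        self.out = []

    def push_lossless(self, item):
        if self.latest is self._MISSING:
            self.buffer.append(item)  # nothing to pair with yet
        else:
            self.out.append((item, self.latest))

    def push_latest(self, item):
        self.latest = item
        # Flush anything that arrived before the first `latest` value.
        for waiting in self.buffer:
            self.out.append((waiting, item))
        self.buffer.clear()


node = ZipLatest()
for letter in "ABCDE":
    node.push_lossless(letter)
for number in (1, 2, 3, 4, 5):
    node.push_latest(number)
node.push_lossless("F")

labels = ["%s%s" % pair for pair in node.out]
print(labels)  # ['A1', 'B1', 'C1', 'D1', 'E1', 'F5']
```

The numbers 2, 3, and 4 never appear in the output, so only the letter stream is lossless here; the buffer is needed only until the first number arrives.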
from streamz.
Here is an implemented example (in the event model)
xpdAcq/shed-streaming#34
from streamz.
@ordirules you might look at tikz to make these examples, we may be able to make them with streams (which would mean that if/when we update the code the examples will also be updated) and then insert them into the docs.
from streamz.
thanks, I used to use Asymptote, which I really like, but it's probably out of date :-D. I'll take a look at this later. LaTeX is always a plus!
from streamz.
Note that combine_latest without an emit_on is lossless, since every piece of data that comes through will come out at some point.
from streamz.
@CJ-Wright doesn't the name combine_latest imply a lossy function? I read that to be "you will always have the most recent data from each source", without making guarantees that you will see every piece of data from every source. My thought is that we should use a different function name if you want to have a lossless version.
from streamz.
That's fair. I'm rather bad at naming things, any suggestions?
Edit: Although I didn't think that zip also implied a lossy nature.
from streamz.
I agree on that - I think zip implies a lossless function that waits for all inputs before emitting a result (as we would expect with the builtin zip).
Edit: (General rumblings unrelated to topic)
TBH, in practice I'm not really sure what sort of real-time system is likely to emit values in such a perfect fashion? I would never use zip personally because I would always be worried about one of my data sources breaking and throwing out the entire stream.
Something like
-a-b-c--
-1-exception-3--
is going to yield a1, b3, which (to me) is unexpected and could throw off the sense of order in your system
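A minimal pure-Python sketch of that failure mode, assuming a zip node that simply buffers each input and pairs items in arrival order. The Zip class and its push method are hypothetical stand-ins, not the streamz zip node, and None marks the item that the exception swallowed.

```python
from collections import deque


class Zip:
    """Toy zip: buffer each input and emit a pair once both have an item."""

    def __init__(self):
        self.left = deque()
        self.right = deque()
        self.out = []

    def push(self, side, item):
        (self.left if side == "left" else self.right).append(item)
        while self.left and self.right:
            self.out.append((self.left.popleft(), self.right.popleft()))


node = Zip()
letters = ["a", "b", "c"]
numbers = [1, None, 3]  # None stands in for the item lost to the exception
for letter, number in zip(letters, numbers):
    node.push("left", letter)
    if number is not None:
        node.push("right", number)

print(node.out)         # [('a', 1), ('b', 3)]
print(list(node.left))  # ['c']
```

After a single dropped item every later pair is shifted by one, and the last letter is left waiting in the buffer indefinitely.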
from streamz.
By perfect fashion I mean having multiple sources of data that emit elements at (almost) exactly the same time and without fail, in a way that your zip function continues to emit as intended.
This is more of a practical issue than a library/theory note. What I mean is that zip works fantastically well in a simple test case such as
a = Stream()
b = Stream()
c = a.zip(b)
a.emit(1)
b.emit('a')
a.emit(2)
b.emit('b')
but in the wild, where your a and b streams are external data sources that are not perfectly reliable or do not emit data at exactly the same time, taking zip at face value may lead to unintended consequences and trip some people up.
Edit: This is more of a caution for using external sources; if you have control over all of your inputs it's probably fine.
from streamz.
Yep, I'm sure they would. My general thought is to leave combine_latest and zip exactly as they are. In my opinion they work as expected and I wouldn't like to see them changed.
from streamz.
Maybe I should make a hybrid zip_latest node.
from streamz.
@mrocklin @ordirules Thoughts on a name?
from streamz.
Just catching up. Actually, I think maybe if it's just for the use case you mentioned earlier with the diagram, it could be combine_latest(wait_first=True) or something. So:
s = Stream()
s2 = Stream()
s3 = s.combine_latest(s2, wait_first=True)
s3.map(print)
would give:
s1 : A--B------------C---------D-
s2 : -------1-----2---------3--4---
s3 : ------A1B1----C2-------D4-
as opposed to:
s1 : A--B------------C---------D-
s2 : -------1-----2---------3--4---
s3 (no wait) : ------B1--------C2-------D4-
Which is just as you mentioned: combine_latest, but with some buffering.
I may have missed something. If so, could you provide a use case and schematic?
from streamz.
I don't have too much preference one way or another (between creating a separate node and putting a flag on combine_latest). Just let me know what the consensus is and I'll go with that (unless BDFL wants to weigh in one side or the other).
A name consensus would be nice too!
(side note: I wish that GitHub supported voting for this kind of thing)
from streamz.