Parallel streams with buffers (streamz, open issue)

john-jam commented on September 26, 2024
Parallel streams with buffers

from streamz.

Comments (3)

martindurant commented on September 26, 2024

Streamz is not a parallelism framework, but it can be concurrent ("async") for tasks that spend most of their time waiting. If your functions were async and you used asyncio.wait, they would have a shorter total run time. However, whether you can actually get parallelism depends on exactly what calls you make, and mixing CPU with IO is always tricky. Often a separate worker thread would end up running CPU loads (but Python's GIL means you still might not get parallelism).
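
To illustrate the point about waiting-bound tasks, here is a minimal sketch using plain asyncio, outside of streamz. The `download_file` coroutine and its delay are hypothetical stand-ins for a real IO call; the only thing the sketch shows is that three waits overlap instead of adding up.

```python
import asyncio
import time


async def download_file(name: str, delay: float) -> str:
    # Hypothetical stand-in for an IO-bound call: asyncio.sleep yields
    # control to the event loop, as a real network wait would.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Schedule all downloads first, then wait for the whole set.
    tasks = [asyncio.create_task(download_file(f"file{i}", 0.2)) for i in range(3)]
    done, _pending = await asyncio.wait(tasks)
    elapsed = time.perf_counter() - start
    # asyncio.wait returns sets, so sort for a stable order.
    return sorted(t.result() for t in done), elapsed


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Run one after another, the three calls would take about 0.6 s; run concurrently they should finish in roughly the time of a single wait. This helps only while the tasks are waiting, not while they are computing.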

Dask may well be the parallelism engine for you, and it has various cluster topologies you can set up. From streamz's point of view, dask is a handy way to hand off mini-batches of events; but it could also be long-running tasks like downloads, in theory. In fact, if download/process is all you are doing, you can just use dask without streamz (the delayed or client.submit patterns).
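
A rough sketch of the delayed pattern mentioned above, assuming dask is installed. The `download` and `process` functions and the URLs are hypothetical placeholders, and the local threaded scheduler stands in for a real cluster.

```python
from dask import compute, delayed


def download(url: str) -> bytes:
    # Hypothetical stand-in for a real download.
    return url.encode()


def process(data: bytes) -> int:
    # Hypothetical stand-in for the processing step.
    return len(data)


urls = ["https://example.com/a", "https://example.com/bb"]

# Build one small download -> process graph per URL; nothing runs yet.
tasks = [delayed(process)(delayed(download)(u)) for u in urls]

# Execute all graphs; independent chains may run at the same time.
sizes = compute(*tasks, scheduler="threads")
print(sizes)
```

With a distributed cluster, the same chain could presumably be expressed with client.submit, passing the future from the download step as the argument to the processing step.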

Note that no one is developing streamz these days, but I believe it can do what you want, if you have the interest to dig in.

john-jam commented on September 26, 2024

@martindurant Thanks for your prompt and useful answer!

When you indicate to use async methods, does streamz support this? When I try, it indicates that my download_file method was never awaited. Or maybe you were indicating to use async methods and asyncio.wait outside of streamz?

Anyway, I guess Dask can do what I want as you mentioned but I need some sets of tasks (e.g. download + process) to be executed on the same dask worker, since each worker can be a different machine (different fs).

I didn't catch the last commit date, but streamz still looks useful! Thanks

martindurant commented on September 26, 2024

> When you indicate to use async methods, does streamz support this?

Yes, there should be some examples of this.

> Dask can do what I want as you mentioned but I need some sets of tasks (e.g. download + process) to be executed on the same dask worker

There are various ways to do this kind of thing, but having shared storage is a useful thing for a cluster. I'm not immediately sure how you would phrase "download X can happen anywhere, but processing must happen where its associated download happened".
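
One possible phrasing, offered as an assumption rather than an established recipe: fold the download and its processing into a single task function, so that whichever worker picks up the task necessarily runs both steps against its own local filesystem. The sketch below uses the standard library's ThreadPoolExecutor as a stand-in for a cluster; `download_and_process` and the file names are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_and_process(name: str) -> int:
    # Both steps live in one task, so they always share a worker and
    # therefore the same local disk.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / name
        path.write_bytes(name.encode() * 3)  # stand-in for the download
        return len(path.read_bytes())        # stand-in for the processing


names = ["a.bin", "bb.bin"]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Each submit schedules one fused download+process unit.
    futures = [pool.submit(download_and_process, n) for n in names]
    sizes = [f.result() for f in futures]
print(sizes)
```

Dask's client.submit takes a function and its arguments in the same way, so the fused function could be submitted to a cluster unchanged. The trade-off is that the two steps can no longer be scheduled or retried separately.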
