
Comments (5)

bartvm avatar bartvm commented on June 18, 2024

I think this is a great idea, and something I've been thinking about as well. The main thing I ran into: what kind of interprocess communication to use that is fast enough not to become a bottleneck itself, while remaining user-friendly. I haven't worked with a lot of them, but I guess the options are:

  • Sockets
  • (Named) pipes
  • Shared memory

Then there's a long list of libraries that we could use to manage any of these. It seems silly to program all of this from scratch; it must be a problem that has been solved a thousand times.

So my first suggestion, which I think is worth looking into, is limiting the server to transferring NumPy arrays and using ZeroMQ (PyZMQ) to transfer the data. A few reasons:

  • Although sockets are theoretically not as fast as shared memory, they're a lot more user-friendly
  • Using sockets could potentially allow (eventually) for multiple clients, eventually allowing us to e.g. pass data to multiple clients on a cluster
  • ZeroMQ is an excellent library
  • It seems to be a popular solution (e.g. python-matlab-bridge and zmqnumpy) and it seems straightforward to get started with.
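For reference, the essence of shipping a NumPy array over a socket is small: a metadata frame (dtype and shape) plus the raw buffer. A minimal sketch of that scheme (the names here are my own, not taken from zmqnumpy):

```python
import json
import numpy as np

def pack_array(arr):
    """Serialize a NumPy array as (metadata, raw bytes), suitable for
    sending as two ZeroMQ frames."""
    header = json.dumps({"dtype": str(arr.dtype), "shape": arr.shape}).encode()
    return header, np.ascontiguousarray(arr).tobytes()

def unpack_array(header, payload):
    """Rebuild the array from the metadata frame and the raw buffer."""
    meta = json.loads(header.decode())
    return np.frombuffer(payload, dtype=meta["dtype"]).reshape(meta["shape"])

# Round-trip a toy batch to check the scheme is lossless.
batch = np.arange(12, dtype="float32").reshape(3, 4)
header, payload = pack_array(batch)
restored = unpack_array(header, payload)
assert np.array_equal(batch, restored)
```

With PyZMQ the two frames could then go out in one call via `socket.send_multipart([header, payload])`, which is essentially what the libraries above do.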

I imagine then that we would have a server module which basically takes as configuration an entirely configured data stream. It reads batches from this data stream into a pool as you said. Each query made to the server results in either (a) a batch of data being returned, or (b) a StopIteration message (signaling the end of an epoch). On the receiving side, the model simply takes as argument a ServerDataStream which is in charge of replicating the behaviour of an iterator based on the data it receives from the server. Data could optionally be compressed with e.g. bloscpack before transferring.

The main thing I'm unclear about: does the server need to use multiprocessing/multi-threading, with one thread responding to requests while the other fills the pool? I guess that otherwise we might be too slow to respond to requests from the client. So then we would have a server and a client. The client just requests batches; these requests are obviously blocking. The server has two threads: one responds to requests and returns batches from the pool, blocking if the pool is empty. The other thread adds batches to the pool until it is full (at which point it blocks and waits until more data is needed).
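The two-thread design above maps directly onto a bounded queue, which gives both blocking behaviours for free. A minimal in-process sketch (all names hypothetical; a real server would answer requests over the socket rather than a function call):

```python
import queue
import threading

POOL_SIZE = 4
pool = queue.Queue(maxsize=POOL_SIZE)  # bounded: producer blocks when full
END_OF_EPOCH = object()                # stands in for the StopIteration message

def fill_pool(data_stream):
    """Producer thread: pushes batches, blocking while the pool is full."""
    for batch in data_stream:
        pool.put(batch)        # blocks until a slot frees up
    pool.put(END_OF_EPOCH)     # signal the end of the epoch

def handle_request():
    """Called once per client request: a batch, or the end-of-epoch marker.
    Blocks if the pool is empty, exactly as described above."""
    return pool.get()

# Toy "data stream" of three batches.
producer = threading.Thread(target=fill_pool, args=(iter([b"b0", b"b1", b"b2"]),))
producer.start()
replies = [handle_request() for _ in range(4)]
producer.join()
assert replies[:3] == [b"b0", b"b1", b"b2"] and replies[3] is END_OF_EPOCH
```

The nice property is that neither thread needs explicit synchronization beyond the queue itself: back-pressure (full pool) and starvation (empty pool) are both handled by the blocking `put`/`get`.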

from fuel.

bartvm avatar bartvm commented on June 18, 2024

I think we should consider documenting this implementation a bit and adding it to Fuel: https://github.com/lukemetz/cuboid/blob/master/cuboid/datasets.py

I believe there are a few shortcomings (which I mention in mila-iqia/blocks#339 (comment)), but it also has a few advantages, namely no need for a separate process, and nice and simple code. For cases where you're computation-bound this could give a quick speedup, while we could work on a more optimized separate server based on what I mentioned so far for the really heavy cases.
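A minimal sketch of that in-process pattern (hypothetical names, not cuboid's actual API): a daemon thread prefetches batches from the underlying iterator into a bounded buffer, so the model can compute on one batch while the next is being prepared, with no separate process needed.

```python
import queue
import threading

class BackgroundIterator:
    """Wrap an iterator so a daemon thread prefetches items into a
    bounded buffer, overlapping data preparation with computation."""
    _SENTINEL = object()

    def __init__(self, iterator, buffer_size=2):
        self._buffer = queue.Queue(maxsize=buffer_size)
        self._thread = threading.Thread(
            target=self._fill, args=(iterator,), daemon=True)
        self._thread.start()

    def _fill(self, iterator):
        for item in iterator:
            self._buffer.put(item)   # blocks while the buffer is full
        self._buffer.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._buffer.get()    # blocks while the buffer is empty
        if item is self._SENTINEL:
            raise StopIteration
        return item

# Consuming it looks identical to consuming the original iterator.
batches = list(BackgroundIterator(iter(range(5))))
assert batches == [0, 1, 2, 3, 4]
```

This only helps when the producer releases the GIL (e.g. during disk I/O or NumPy calls), which is presumably the computation-bound case mentioned above.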


dwf avatar dwf commented on June 18, 2024


bartvm avatar bartvm commented on June 18, 2024

Closed via #56.


udibr avatar udibr commented on June 18, 2024

the merged files are not on master...

