
Comments (5)

bartvm avatar bartvm commented on June 18, 2024

I think this is a great idea, and something I've been thinking about as well. The main thing I ran into: what kind of interprocess communication to use that is fast enough not to become a bottleneck itself, while remaining user-friendly. I haven't worked with a lot of them, but I guess the options are:

  • Sockets
  • (Named) pipes
  • Shared memory

Then there's a long list of libraries that we could use to manage any of these. It seems silly to program all of this from scratch; it must be a problem that has been solved a thousand times.

So my first suggestion, which I think is worth looking into, is limiting the server to transferring NumPy arrays and using ZeroMQ (PyZMQ) to transfer the data. A few reasons:

  • Although sockets are theoretically not as fast as shared memory, they're a lot more user-friendly
  • Using sockets could potentially allow (eventually) for multiple clients, eventually allowing us to e.g. pass data to multiple clients on a cluster
  • ZeroMQ is an excellent library
  • It seems to be a popular solution (e.g. python-matlab-bridge and zmqnumpy) and it seems straightforward to get started with.
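For reference, the essence of shipping a NumPy array over a socket is small: a metadata frame (dtype and shape) plus the raw buffer. A minimal sketch of that scheme (the names here are my own, not taken from zmqnumpy):

```python
import json
import numpy as np

def pack_array(arr):
    """Serialize a NumPy array as (metadata, raw bytes), suitable for
    sending as two ZeroMQ frames."""
    header = json.dumps({"dtype": str(arr.dtype), "shape": arr.shape}).encode()
    return header, np.ascontiguousarray(arr).tobytes()

def unpack_array(header, payload):
    """Rebuild the array from the metadata frame and the raw buffer."""
    meta = json.loads(header.decode())
    return np.frombuffer(payload, dtype=meta["dtype"]).reshape(meta["shape"])

# Round-trip a toy batch to check the scheme is lossless.
batch = np.arange(12, dtype="float32").reshape(3, 4)
header, payload = pack_array(batch)
restored = unpack_array(header, payload)
assert np.array_equal(batch, restored)
```

With PyZMQ the two frames could then go out in one call via `socket.send_multipart([header, payload])`, which is essentially what the libraries above do.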

I imagine then that we would have a server module which basically takes as configuration an entirely configured data stream. It reads batches from this data stream into a pool as you said. Each query made to the server results in either (a) a batch of data being returned, or (b) a StopIteration message (signaling the end of an epoch). On the receiving side, the model simply takes as argument a ServerDataStream which is in charge of replicating the behaviour of an iterator based on the data it receives from the server. Data could optionally be compressed with e.g. bloscpack before transferring.

The main thing I'm unclear about: does the server need to use multiprocessing/multi-threading, with one thread responding to requests while the other fills the pool? I guess that otherwise we might be too slow to respond to requests from the client. So then we would have a server and a client. The client just requests batches; these requests are obviously blocking. The server has two threads: one responds to requests and returns batches from the pool, blocking if the pool is empty. The other thread adds batches to the pool until it is full (at which point it blocks and waits until more data is needed).
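The two-thread design above maps directly onto a bounded queue, which gives both blocking behaviours for free. A minimal in-process sketch (all names hypothetical; a real server would answer requests over the socket rather than a function call):

```python
import queue
import threading

POOL_SIZE = 4
pool = queue.Queue(maxsize=POOL_SIZE)  # bounded: producer blocks when full
END_OF_EPOCH = object()                # stands in for the StopIteration message

def fill_pool(data_stream):
    """Producer thread: pushes batches, blocking while the pool is full."""
    for batch in data_stream:
        pool.put(batch)        # blocks until a slot frees up
    pool.put(END_OF_EPOCH)     # signal the end of the epoch

def handle_request():
    """Called once per client request: a batch, or the end-of-epoch marker.
    Blocks if the pool is empty, exactly as described above."""
    return pool.get()

# Toy "data stream" of three batches.
producer = threading.Thread(target=fill_pool, args=(iter([b"b0", b"b1", b"b2"]),))
producer.start()
replies = [handle_request() for _ in range(4)]
producer.join()
assert replies[:3] == [b"b0", b"b1", b"b2"] and replies[3] is END_OF_EPOCH
```

The nice property is that neither thread needs explicit synchronization beyond the queue itself: back-pressure (full pool) and starvation (empty pool) are both handled by the blocking `put`/`get`.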

from fuel.

bartvm avatar bartvm commented on June 18, 2024

I think we should consider documenting this implementation a bit and adding it to Fuel: https://github.com/lukemetz/cuboid/blob/master/cuboid/datasets.py

I believe there are a few shortcomings (which I mention in mila-iqia/blocks#339 (comment)), but it also has a few advantages, namely no need for a separate process, and nice and simple code. For cases where you're computation-bound this could give a quick speedup, while we could work on a more optimized separate server based on what I mentioned so far for the really heavy cases.
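A minimal sketch of that in-process pattern (hypothetical names, not cuboid's actual API): a daemon thread prefetches batches from the underlying iterator into a bounded buffer, so the model can compute on one batch while the next is being prepared, with no separate process needed.

```python
import queue
import threading

class BackgroundIterator:
    """Wrap an iterator so a daemon thread prefetches items into a
    bounded buffer, overlapping data preparation with computation."""
    _SENTINEL = object()

    def __init__(self, iterator, buffer_size=2):
        self._buffer = queue.Queue(maxsize=buffer_size)
        self._thread = threading.Thread(
            target=self._fill, args=(iterator,), daemon=True)
        self._thread.start()

    def _fill(self, iterator):
        for item in iterator:
            self._buffer.put(item)   # blocks while the buffer is full
        self._buffer.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._buffer.get()    # blocks while the buffer is empty
        if item is self._SENTINEL:
            raise StopIteration
        return item

# Consuming it looks identical to consuming the original iterator.
batches = list(BackgroundIterator(iter(range(5))))
assert batches == [0, 1, 2, 3, 4]
```

This only helps when the producer releases the GIL (e.g. during disk I/O or NumPy calls), which is presumably the computation-bound case mentioned above.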


dwf avatar dwf commented on June 18, 2024


bartvm avatar bartvm commented on June 18, 2024

Closed via #56.


udibr avatar udibr commented on June 18, 2024

the merged files are not on master...

