Comments (5)
I think this is a great idea, and something I've been thinking about as well. The main thing I ran into: what kind of interprocess communication is fast enough not to become a bottleneck while still being user-friendly? I haven't worked with many of them, but I guess the options are:
- Sockets
- (Named) pipes
- Shared memory
Then there's a long list of libraries that we could use to manage any of these. It seems silly to program all of this from scratch; it must be a problem that has been solved a thousand times.
So my first suggestion that I think is worth looking into is limiting the server to transferring NumPy arrays, and using ZeroMQ (PyZMQ) for transferring the data. A few reasons:
- Although sockets are theoretically not as fast as shared memory, they're a lot more user-friendly
- Using sockets could eventually allow for multiple clients, letting us e.g. pass data to multiple clients on a cluster
- ZeroMQ is an excellent library
- It seems to be a popular solution (e.g. python-matlab-bridge and zmqnumpy) and it seems straightforward to get started with.
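To make the NumPy-over-ZeroMQ idea concrete, here is a minimal sketch of how an array could be framed as a two-part message: a small JSON header with dtype and shape, followed by the raw bytes. The names `send_array` and `recv_array` are hypothetical, not part of Fuel or PyZMQ. The only socket methods assumed are `send_multipart` and `recv_multipart`, which PyZMQ sockets provide.

```python
import json

import numpy as np


def send_array(sock, arr):
    """Send a NumPy array as two frames: a JSON header, then the raw bytes."""
    arr = np.asarray(arr)
    # dtype.str includes the byte order (e.g. '<f4'), so the receiver can
    # rebuild the array without guessing.
    header = {"dtype": arr.dtype.str, "shape": list(arr.shape)}
    # tobytes() copies the data out in C order, even for non-contiguous input.
    sock.send_multipart([json.dumps(header).encode("utf-8"), arr.tobytes()])


def recv_array(sock):
    """Receive a message produced by send_array and rebuild the array."""
    header_frame, payload = sock.recv_multipart()
    header = json.loads(header_frame.decode("utf-8"))
    # frombuffer does not copy, so the result is a read-only view of payload.
    flat = np.frombuffer(payload, dtype=np.dtype(header["dtype"]))
    return flat.reshape(header["shape"])
```

With PyZMQ, `sock` would be an ordinary ZeroMQ socket on each side, for example a REP socket on the server and a REQ socket on the client. This version copies the data once on send; it does not handle structured or object dtypes.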
I imagine then that we would have a server module which basically takes as configuration a fully configured data stream. It reads batches from this data stream into a pool, as you said. Each query made to the server results in either (a) a batch of data being returned or (b) a `StopIteration` message (signaling the end of an epoch). On the receiving side, the model simply takes as argument a `ServerDataStream`, which is in charge of replicating the behaviour of an iterator based on the data it receives from the server. Data could optionally be compressed with e.g. bloscpack before transferring.
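A rough sketch of the receiving side, to show how a batch-or-stop reply maps onto Python's iterator protocol. Everything here is hypothetical: `RemoteBatchIterator` is not the proposed `ServerDataStream`, the transport is abstracted as a `request` callable (in practice it would be one blocking round trip over a socket), and the `("batch", data)` / `("stop", None)` reply format is just an assumption for illustration.

```python
class RemoteBatchIterator:
    """Iterator that pulls batches from a server through a request callable.

    `request` is a hypothetical zero-argument callable standing in for one
    blocking round trip to the server. It must return either
    ("batch", data) or ("stop", None).
    """

    def __init__(self, request):
        self._request = request

    def __iter__(self):
        return self

    def __next__(self):
        kind, data = self._request()
        if kind == "stop":
            # The server signalled the end of the epoch.
            raise StopIteration
        return data


def make_fake_server(batches):
    """In-process stand-in for the server: replays batches, then replies stop."""
    pending = iter(batches)

    def request():
        try:
            return ("batch", next(pending))
        except StopIteration:
            return ("stop", None)

    return request
```

The training loop can then treat the remote stream like any other iterable, e.g. `for batch in RemoteBatchIterator(make_fake_server([[1, 2], [3, 4]])): ...`.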
The main thing I'm unclear about: does the server need to use multiprocessing/multi-threading, with one thread responding to client requests while the other fills the pool? I guess that otherwise we might be too slow to respond to requests from the client. So then we would have a server and a client. The client just requests batches; these requests are obviously blocking. The server has two threads: one responds to requests and returns batches from the pool, blocking if the pool is empty. The other adds batches to the pool until it is full (at which point it blocks and waits until more data is needed).
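The two-thread arrangement can be sketched with only the standard library, using a bounded `queue.Queue` as the pool. This is an illustration under assumptions, not a proposed implementation: `fill_pool` and `run_epoch` are hypothetical names, the responder runs in the main thread, and each loop iteration stands in for answering one client request.

```python
import queue
import threading

_END_OF_EPOCH = object()  # sentinel placed in the pool after the last batch


def fill_pool(batches, pool):
    """Producer thread: push batches into the pool, blocking while it is full."""
    for batch in batches:
        pool.put(batch)  # blocks once the pool holds maxsize items
    pool.put(_END_OF_EPOCH)


def run_epoch(batches, pool_size=2):
    """Serve one epoch and return the replies a client would have received."""
    pool = queue.Queue(maxsize=pool_size)
    producer = threading.Thread(
        target=fill_pool, args=(batches, pool), daemon=True
    )
    producer.start()

    replies = []
    while True:
        item = pool.get()  # blocks while the pool is empty
        if item is _END_OF_EPOCH:
            replies.append(("stop", None))
            break
        replies.append(("batch", item))
    producer.join()
    return replies
```

Because the queue is bounded, the producer never runs more than `pool_size` batches ahead of the client, which keeps memory use predictable.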
from fuel.
I think we should consider documenting this implementation a bit and adding it to Fuel: https://github.com/lukemetz/cuboid/blob/master/cuboid/datasets.py
I believe there are a few shortcomings (which I mention in mila-iqia/blocks#339 (comment)), but it also has a few advantages, namely no need for a separate process, and nice and simple code. For cases where you're computation-bound this could give a quick speedup, while we could work on a more optimized separate server based on what I mentioned so far for the really heavy cases.
Closed via #56.
the merged files are not on master...