nicholas-leonard / dp
A deep learning library for streamlining research and development using the Torch7 distribution.
License: Other
Make a test file for dataset-related unit tests.
DataTensors are too slow. We need to rethink them.
I would like them to be used as modules:
A model requests a view and a tensor type.
A DataTensor has backward(view, (gradInput|type)) and forward(view, [input|type]) methods.
The view is a string specifying the view of the Space : 'bf', 'bhwc', 'chwb', 'bhf', etc.
When a gradInput/input tensor is provided, it will be stored in the cache with key (view, type).
When backward/forward methods are called without gradInput/input tensors, a tensor of the requested view and type is provided.
Internally, a tensor must be efficiently converted to different views.
The root view is the one provided when backward/forward is called with a tensor. It is automatically made contiguous (with parallel or the like, this can lead to a speedup). Then it is stored at self.input/gradInput, and in the tensor_cache with the correct view and type.
When a backward/forward is called without a tensor (with a type), we look for the view and type in the tensor_cache. If found, it is returned. Else, we look for the (view, type) pair in the module_cache. If found, we forward/backward through it. Else, we need to build a module that will transform the root view to the requested view.
After each doneBatch, the tensor_cache must be emptied.
Like a module, the DataTensor is constructed once as the output of a Model or DataSet.
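The cache lookup order described above (tensor cache, then module cache, then build a converter) can be sketched as follows. This is a minimal Python illustration, not the dp API; the names `tensor_cache` and `module_cache` follow the notes, while `_build_converter` is a hypothetical stand-in for the view-conversion module.

```python
# Sketch of the DataTensor forward() cache lookup described above.
# The converter built in _build_converter is stubbed out for illustration.

class DataTensor:
    def __init__(self):
        self.tensor_cache = {}   # (view, type) -> tensor
        self.module_cache = {}   # (view, type) -> conversion "module"

    def set_root(self, view, type_, tensor):
        # The root view is the one provided with an actual tensor.
        self.root = (view, type_)
        self.tensor_cache[(view, type_)] = tensor

    def forward(self, view, type_):
        key = (view, type_)
        # 1. Look for the requested (view, type) in the tensor cache.
        if key in self.tensor_cache:
            return self.tensor_cache[key]
        # 2. Else look for a conversion module in the module cache.
        if key not in self.module_cache:
            # 3. Else build a module transforming the root view to the request.
            self.module_cache[key] = self._build_converter(self.root, key)
        out = self.module_cache[key](self.tensor_cache[self.root])
        self.tensor_cache[key] = out
        return out

    def done_batch(self):
        # After each doneBatch the tensor cache is emptied;
        # conversion modules are kept across batches.
        self.tensor_cache.clear()

    def _build_converter(self, root_key, key):
        # Hypothetical stand-in for a real view-conversion module.
        return lambda t: t
```

The module cache surviving `done_batch` is what amortizes the cost of building converters across batches.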
Build a Model hosting the following Module assembly: nicholas-leonard/cunnx#8.
Should be able to load hierarchy from torch dump.
In the model graph, states are the arrows (edges) between model nodes.
Which one?
Make a Convolution class that inherits Model. It should have the same feel as Neural.
I believe the easiest solution is to introduce new attributes:
Gater learns -> Experts learn -> Gater learns -> Experts learn ->...
What are the advantages/disadvantages of this approach vs. just implementing Modules?
What could we eliminate:
What could be augmented:
We need to start considering distributing the models over many machines using :
For this to work, a model would need to be able to handle multiple forward, backward and update requests on different states in any order. This means that models would need to maintain different states using the memento design pattern. To be order invariant, we would store mementos in a map instead of a stack.
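The order-invariant memento store could look like the following Python sketch. Storing mementos in a map keyed by an assumed batch identifier (rather than a stack) is what lets backward/update requests arrive in any order; the computation itself is a placeholder.

```python
# Sketch of order-invariant state handling via the memento design pattern.
# Mementos live in a map keyed by batch id, not a stack, so forward,
# backward and update requests for different batches may interleave freely.

class StatefulModel:
    def __init__(self):
        self.mementos = {}  # batch_id -> saved state

    def forward(self, batch_id, inp):
        activation = [x * 2 for x in inp]      # placeholder computation
        self.mementos[batch_id] = activation   # save state for this batch
        return activation

    def backward(self, batch_id, grad_out):
        # Restore the memento for this batch, regardless of request order.
        activation = self.mementos.pop(batch_id)
        return [g * 2 for g, _ in zip(grad_out, activation)]
```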
Each process has an async TCP server to handle incoming requests. Each request is associated with a batch, which is a coroutine. The appropriate batch is thus resumed by the TCP server when the request is received, and it is resumed with the received state. A batch is yielded by proxies of remote models upon transmitting a state; it yields any data that is shared among coroutines, like models. As it is a coroutine, when a batch is resumed it returns from the previous yield and continues execution from there.
A proxy allows a local model to stand in for a remote model. The proxy is initialized with access to the async singleton, which allows it to spawn its remote if not available and to transmit/receive states to/from that remote. When it transmits, it does so using an async TCP client.
The main process is initialized by propagating N batches using an async TCP client. We will require an AsyncPropagator which creates a coroutine for each batch propagation. This AsyncPropagator is called by the TCP server when it receives requests from the outside, in order to resume a batch. Each BatchPropagator coroutine needs a batch to process. When it returns, the AsyncPropagator samples a new batch and creates a new coroutine. So the experiment synchronizes every epoch.
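The batch-as-coroutine mechanics can be sketched with Python generators. All names here are illustrative, not the dp API, and the "remote reply" is simulated synchronously: the coroutine yields when a state is transmitted, and is resumed with the received state.

```python
# Sketch: each batch propagation is a coroutine that suspends (yields)
# when a state is sent to a remote model, and resumes when a reply arrives.

def batch_propagator(batch):
    # "Transmit" the batch state and suspend until the server resumes us
    # with the received state (here, a simulated remote reply).
    reply = yield ('send', batch)
    # Execution continues here after resumption.
    return ('done', batch, reply)

def async_propagator(batches):
    results = []
    for batch in batches:            # the server would do this per request
        coro = batch_propagator(batch)
        request = next(coro)         # run until the first yield (transmission)
        assert request[0] == 'send'
        try:
            coro.send('reply-for-%s' % batch)  # resume with received state
        except StopIteration as e:
            results.append(e.value)  # coroutine returned: batch is finished
    return results
```

When a coroutine finishes, a real AsyncPropagator would sample a new batch and create a new coroutine, as described above.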
Mediator will need to be adapted to allow models to communicate.
This problem is very complex, so I would like to use this section to reduce the problem and its resolution to some more basic principles which we could follow and apply:
Has axes = {'b','s','f'}: output of nn.LookupTable, input of nn.Temporal*.
I am having some issues trying to align this with the datasets in the datasource. For example, what if we need multiple validation sets? Then the solution might be a composite dataset. (Why is the solution always a composite?)
See DataTensor. Abstract away SQL into macro-tensor ops. Allows managing lots of days of data processing.
CompositeTensor?
Such that we don't have to constrain the norm for every batch.
http://data.neuflow.org/data/billionwords.tar.gz gives a 404.
For mixing softmax trees.
Make a unit test file for CUDA related functionality.
configurations:
So we need to allow the user to specify a gate.
BillionWords can use just a subset of the BW training set for training.
Uses sentence clusters.
A hierarchy of parameterized softmaxes.
One big memory allocation.
Similar to SparseOutLinear, but in blocks.
The softmax is performed along an axis of variable-length outputs.
Input is a table of two tensors: one is the inputs, the other is the target indices.
We narrow each softmax down to the target indices.
Return a column vector of targets.
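The narrowed softmax described above can be sketched in numpy (block widths, argument names and the padding layout are illustrative assumptions, not the SoftmaxTree layout):

```python
import numpy as np

# Sketch: for each example, softmax is computed only over the narrowed,
# variable-length block of outputs for its node, and the target's
# probability is returned as a column vector.

def narrowed_softmax(scores, widths, targets):
    """scores: (n, max_width) padded activations; widths: per-example
    block width; targets: per-example target index within the block."""
    out = np.empty((len(scores), 1))
    for i, (row, width, t) in enumerate(zip(scores, widths, targets)):
        narrowed = row[:width]                 # narrow to this node's block
        e = np.exp(narrowed - narrowed.max())  # numerically stable softmax
        out[i, 0] = e[t] / e.sum()
    return out
```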
Assumptions:
CUDA memory:
We need a forward kernel:
Forward:
Optional:
Used in conjunction with SoftmaxTree.
Combined with async, we could do away with PostgreSQL.
Hey, I created a cheatsheet for Torch here:
https://github.com/torch/torch7/wiki/Cheatsheet
When dp is ready, please feel free to include it at the appropriate place on that page.
It is really hard to abstract away the differences between CUDA and non-CUDA tensors. Ideally, the user need only say experiment:cuda() and/or datasource:cuda(), and we use CUDA as much as possible.
Obstacles:
nn modules are implemented in cunn.
torch.Tensor methods and functions are implemented in cutorch.
If I have a Concat container which takes as input a single DataTensor and broadcasts it to its modules, some might require a CUDA tensor, others a float tensor. How can I solve this problem efficiently and easily at the same time?
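One possible attack on the mixed-type Concat problem is to let each branch declare its expected tensor type and convert the shared input on the way in, caching each conversion so the copy happens at most once per propagation. A hedged Python sketch (the type tags and the container itself are hypothetical, not dp or nn classes):

```python
# Sketch: a Concat-like container broadcasting one input to branches that
# expect different tensor types ('float' vs 'cuda' here are just tags).
# Each conversion is cached so a given type is produced at most once.

class TypedConcat:
    def __init__(self, branches):
        # branches: list of (expected_type, branch_function) pairs
        self.branches = branches

    def forward(self, inp, inp_type):
        cache = {inp_type: inp}  # at most one conversion per requested type
        outputs = []
        for expected, fn in self.branches:
            if expected not in cache:
                cache[expected] = self._convert(inp, expected)
            outputs.append(fn(cache[expected]))
        return outputs

    def _convert(self, inp, expected):
        # Stand-in for the real float<->CUDA copy (e.g. :cuda() / :float()).
        return list(inp)
```

The per-propagation cache keeps the design easy (branches stay oblivious to each other) while avoiding redundant host/device copies.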
feature(), image(), etc.
BaseTensors are to be read-only. nn.Module and dp.Batch already reuse the torch.Tensors from batch to batch. Can BaseTensor views reuse work done by previous ones?