nicholas-leonard / dp
A deep learning library for streamlining research and development using the Torch7 distribution.
License: Other
Make a test file for dataset-related unit tests.
DataTensors are too slow. We need to rethink them.
I would like them to be used as modules:
A model requests a view and a tensor type.
A DataTensor has backward(view, (gradInput|type)) and forward(view, [input|type]) methods.
The view is a string specifying the view of the Space : 'bf', 'bhwc', 'chwb', 'bhf', etc.
When a gradInput/input tensor is provided, it will be stored in the cache with key (view, type).
When backward/forward methods are called without gradInput/input tensors, a tensor of the requested view and type is provided.
Internally, a tensor must be efficiently converted to different views.
The root view is the one provided when backward/forward is called with a tensor. It is automatically made contiguous (with parallel or the like, this can lead to a speedup). Then it is stored at self.input/gradInput, and in the tensor_cache with the correct view and type.
When a backward/forward is called without a tensor (with a type), we look for the view and type in the tensor_cache. If found, it is returned. Else, we look for the (view, type) pair in the module_cache. If found, we forward/backward through it. Else, we need to build a module that will transform the root view to the requested view.
After each doneBatch, the tensor_cache must be emptied.
Like a module, the DataTensor is constructed once as the output of a Model or DataSet.
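The cache lookup order described above (tensor cache, then module cache, then build a converter) can be sketched as follows. This is a minimal Python illustration, not the dp API; the names `tensor_cache` and `module_cache` follow the notes, while `_build_converter` is a hypothetical stand-in for the view-conversion module.

```python
# Sketch of the DataTensor forward() cache lookup described above.
# The converter built in _build_converter is stubbed out for illustration.

class DataTensor:
    def __init__(self):
        self.tensor_cache = {}   # (view, type) -> tensor
        self.module_cache = {}   # (view, type) -> conversion "module"

    def set_root(self, view, type_, tensor):
        # The root view is the one provided with an actual tensor.
        self.root = (view, type_)
        self.tensor_cache[(view, type_)] = tensor

    def forward(self, view, type_):
        key = (view, type_)
        # 1. Look for the requested (view, type) in the tensor cache.
        if key in self.tensor_cache:
            return self.tensor_cache[key]
        # 2. Else look for a conversion module in the module cache.
        if key not in self.module_cache:
            # 3. Else build a module transforming the root view to the request.
            self.module_cache[key] = self._build_converter(self.root, key)
        out = self.module_cache[key](self.tensor_cache[self.root])
        self.tensor_cache[key] = out
        return out

    def done_batch(self):
        # After each doneBatch the tensor cache is emptied;
        # conversion modules are kept across batches.
        self.tensor_cache.clear()

    def _build_converter(self, root_key, key):
        # Hypothetical stand-in for a real view-conversion module.
        return lambda t: t
```

The module cache surviving `done_batch` is what amortizes the cost of building converters across batches.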
Build a Model hosting the following Module assembly: nicholas-leonard/cunnx#8.
Should be able to load hierarchy from torch dump.
In the model graph, states are the arrows (edges) between model nodes.
Which one?
Make a Convolution class that inherits Model. It should have the same feel as Neural.
I believe the easiest solution is to introduce new attributes:
Gater learns -> Experts learn -> Gater learns -> Experts learn ->...
What are the advantages/disadvantages of this approach vs. just implementing Modules?
What could we eliminate:
What could be augmented:
We need to start considering distributing the models over many machines using :
For this to work, a model would need to be able to handle multiple forward, backward and update requests on different states in any order. This means that models would need to maintain different states using the memento design pattern. To be order invariant, we would store mementos in a map instead of a stack.
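The order-invariant memento store could look like the following Python sketch. Storing mementos in a map keyed by an assumed batch identifier (rather than a stack) is what lets backward/update requests arrive in any order; the computation itself is a placeholder.

```python
# Sketch of order-invariant state handling via the memento design pattern.
# Mementos live in a map keyed by batch id, not a stack, so forward,
# backward and update requests for different batches may interleave freely.

class StatefulModel:
    def __init__(self):
        self.mementos = {}  # batch_id -> saved state

    def forward(self, batch_id, inp):
        activation = [x * 2 for x in inp]      # placeholder computation
        self.mementos[batch_id] = activation   # save state for this batch
        return activation

    def backward(self, batch_id, grad_out):
        # Restore the memento for this batch, regardless of request order.
        activation = self.mementos.pop(batch_id)
        return [g * 2 for g, _ in zip(grad_out, activation)]
```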
Each process has an async TCP server to handle incoming requests. Each request is associated with a batch, which is a coroutine. The appropriate batch is thus resumed by the TCP server when the request is received, and it is resumed with the received state. A batch is yielded by proxies of remote models upon transmitting a state; it yields any data that is shared among coroutines, like models. As it is a coroutine, when a batch is resumed it returns from the previous yield and continues execution from there.
A proxy allows a local model to stand in for a remote model. The proxy is initialized with access to the async singleton, which allows it to spawn its remote if not available and to transmit/receive states to/from that remote. When it transmits, it does so using an async TCP client.
The main process is initialized by propagating N batches using an async TCP client. We will require an AsyncPropagator which creates a coroutine for each batch propagation. This AsyncPropagator is called by the TCP server when it receives requests from the outside, in order to resume a batch. Each BatchPropagator coroutine needs a batch to process. When it returns, the AsyncPropagator samples a new batch and creates a new coroutine. So the experiment synchronizes every epoch.
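The batch-as-coroutine mechanics can be sketched with Python generators. All names here are illustrative, not the dp API, and the "remote reply" is simulated synchronously: the coroutine yields when a state is transmitted, and is resumed with the received state.

```python
# Sketch: each batch propagation is a coroutine that suspends (yields)
# when a state is sent to a remote model, and resumes when a reply arrives.

def batch_propagator(batch):
    # "Transmit" the batch state and suspend until the server resumes us
    # with the received state (here, a simulated remote reply).
    reply = yield ('send', batch)
    # Execution continues here after resumption.
    return ('done', batch, reply)

def async_propagator(batches):
    results = []
    for batch in batches:            # the server would do this per request
        coro = batch_propagator(batch)
        request = next(coro)         # run until the first yield (transmission)
        assert request[0] == 'send'
        try:
            coro.send('reply-for-%s' % batch)  # resume with received state
        except StopIteration as e:
            results.append(e.value)  # coroutine returned: batch is finished
    return results
```

When a coroutine finishes, a real AsyncPropagator would sample a new batch and create a new coroutine, as described above.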
Mediator will need to be adapted to allow models to communicate.
This problem is very complex, so I would like to use this section to reduce the problem and its resolution to some more basic principles which we could follow and apply:
Has axes = {'b','s','f'}: output of nn.LookupTable, input of nn.Temporal*.
I am having some issues trying to align this with the datasets in the datasource. For example, what if we need multiple validation sets? Then the solution might be a composite dataset. (Why is the solution always a composite?)
See DataTensor. Abstract away SQL into macro-tensor ops. Allows managing lots of days of data processing.
CompositeTensor?
Such that we don't have to constrain the norm for every batch.
http://data.neuflow.org/data/billionwords.tar.gz gives a 404.
For mixing softmax trees.
Make a unit test file for CUDA related functionality.
configurations:
So we need to allow the user to specify a gate.
BillionWords can use just a subset of the BW training set for training.
Uses sentence clusters.
A hierarchy of parameterized softmaxes.
One big memory allocation.
Similar to SparseOutLinear, but in blocks.
The softmax is performed along an axis of variable-length outputs.
Input is a table of two tensors: one is the inputs, the other is the target indices.
We narrow each softmax down to the target indices.
Return a column vector of targets.
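The narrowed softmax described above can be sketched in numpy (block widths, argument names and the padding layout are illustrative assumptions, not the SoftmaxTree layout):

```python
import numpy as np

# Sketch: for each example, softmax is computed only over the narrowed,
# variable-length block of outputs for its node, and the target's
# probability is returned as a column vector.

def narrowed_softmax(scores, widths, targets):
    """scores: (n, max_width) padded activations; widths: per-example
    block width; targets: per-example target index within the block."""
    out = np.empty((len(scores), 1))
    for i, (row, width, t) in enumerate(zip(scores, widths, targets)):
        narrowed = row[:width]                 # narrow to this node's block
        e = np.exp(narrowed - narrowed.max())  # numerically stable softmax
        out[i, 0] = e[t] / e.sum()
    return out
```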
Assumptions:
CUDA memory:
We need a forward kernel:
Forward:
Optional:
Used in conjunction with SoftmaxTree.
Combined with async, we could do away with PostgreSQL.
Hey, I created a cheatsheet for Torch here:
https://github.com/torch/torch7/wiki/Cheatsheet
When dp is ready, please feel free to include it at the appropriate place on that page.
It is really hard to abstract away the differences between CUDA and non-CUDA tensors. Ideally, the user need only say experiment:cuda() and/or datasource:cuda(), and we use CUDA as much as possible.
Obstacles:
nn modules are implemented in cunn.
torch.Tensor methods and functions are implemented in cutorch.
If I have a Concat container which takes as input a single DataTensor and broadcasts it to its modules, some might require a CUDA tensor, others a float tensor. How can I solve this problem efficiently and easily at the same time?
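One possible attack on the mixed-type Concat problem is to let each branch declare its expected tensor type and convert the shared input on the way in, caching each conversion so the copy happens at most once per propagation. A hedged Python sketch (the type tags and the container itself are hypothetical, not dp or nn classes):

```python
# Sketch: a Concat-like container broadcasting one input to branches that
# expect different tensor types ('float' vs 'cuda' here are just tags).
# Each conversion is cached so a given type is produced at most once.

class TypedConcat:
    def __init__(self, branches):
        # branches: list of (expected_type, branch_function) pairs
        self.branches = branches

    def forward(self, inp, inp_type):
        cache = {inp_type: inp}  # at most one conversion per requested type
        outputs = []
        for expected, fn in self.branches:
            if expected not in cache:
                cache[expected] = self._convert(inp, expected)
            outputs.append(fn(cache[expected]))
        return outputs

    def _convert(self, inp, expected):
        # Stand-in for the real float<->CUDA copy (e.g. :cuda() / :float()).
        return list(inp)
```

The per-propagation cache keeps the design easy (branches stay oblivious to each other) while avoiding redundant host/device copies.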
feature(), image(), etc.
BaseTensors are to be read-only. nn.Module and dp.Batch already reuse the torch.Tensors from batch to batch. Can BaseTensor views reuse work done by previous ones?