nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

License: Other

Languages: Lua 99.39%, Python 0.23%, CMake 0.38%

dp's Introduction

dp Package Reference Manual

Join the chat at https://gitter.im/nicholas-leonard/dp

dp is a deep learning library designed for streamlining research and development using the Torch7 distribution. It emphasizes flexibility through the elegant use of object-oriented design patterns.

Documentation

This package includes lots of documentation and tutorials, which you will find hosted on readthedocs. If you prefer, you can consult the docs on GitHub.

dp's People

Contributors

auser, crowsonkb, d0ugal, dineshj1, diz-vara, dwiel, eddiepierce, erosennin, eulerreich, eywalker, gitter-badger, hycis, jashmenn, jnhwkim, ketranm, khellan, mfolnovic, nicholas-leonard, rcoppolo, rookie32, seragentp, soumith, temerick, timothywangdev


dp's Issues

dp.SequenceTensor

It has axes = {'b','s','f'}. It is the output of nn.LookupTable and the input of nn.Temporal* modules.
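
For reference, a minimal illustration (using plain nn modules and arbitrary sizes) of the batch x sequence x feature layout such a tensor would wrap:

  -- sketch only: the 'b','s','f' layout produced by nn.LookupTable
  require 'nn'

  local batchSize, seqLen, vocabSize, embedSize = 8, 20, 1000, 50
  local indices = torch.LongTensor(batchSize, seqLen):random(1, vocabSize)

  local lookup = nn.LookupTable(vocabSize, embedSize)
  local seq = lookup:forward(indices)        -- batchSize x seqLen x embedSize : 'b','s','f'

  local conv = nn.TemporalConvolution(embedSize, 30, 3)
  local out = conv:forward(seq)              -- batchSize x (seqLen - 3 + 1) x 30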

DataTensor optimizations

DataTensors are too slow. We need to rethink them.

I would like them to be used as modules:

  • starts with contiguous()
  • When a type-casting is required, use nn.Copy()
  • When a reshape is required, use nn.Reshape()
  • When a transpose is required, use nn.Transpose()

A model requests a view and a tensor type.

A DataTensor has backward(view, [gradInput|type]) and forward(view, [input|type]) methods.
The view is a string specifying the view of the Space: 'bf', 'bhwc', 'chwb', 'bhf', etc.

When a gradInput/input tensor is provided, it is stored in the cache with key (view, type).
When the backward/forward methods are called without a gradInput/input tensor, a tensor of the requested view and type is returned.

Internally, the tensor must be efficiently converted to different views.
The root view is the one provided when backward/forward is called with a tensor. It is automatically made contiguous (with parallel or the like, this can lead to a speedup). It is then stored at self.input/gradInput, and in the tensor_cache with the correct view and type.
When backward/forward is called without a tensor (only with a type), we look for the (view, type) pair in the tensor_cache. If found, it is returned. Otherwise, we look for the (view, type) pair in the module_cache. If found, we forward/backward through it. Otherwise, we build a module that will transform the root view into the requested view.

After each doneBatch, the tensor_cache must be emptied.

Like a module, the DataTensor is constructed once as the output of a Model or DataSet.
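
A runnable toy sketch of the proposed lookup order; apart from the tensor_cache/module_cache idea described above, every name here (forwardView, root_view, root_tensor, the converter builder) is hypothetical:

  -- illustrative sketch of the proposed lookup order, not the dp API
  require 'nn'

  local function forwardView(dt, view, tensorType, buildConverter)
     local key = view .. '/' .. tensorType
     local tensor = dt.tensor_cache[key]
     if tensor then return tensor end          -- hit: already converted this batch
     local module = dt.module_cache[key]
     if not module then
        -- build a converter module once (e.g. nn.Transpose/nn.Reshape/nn.Copy)
        module = buildConverter(dt.root_view, view, tensorType)
        dt.module_cache[key] = module
     end
     tensor = module:forward(dt.root_tensor)
     dt.tensor_cache[key] = tensor             -- to be emptied on doneBatch
     return tensor
  end

  -- toy usage: serve the 'bf' root tensor as a FloatTensor
  local dt = {root_view = 'bf', root_tensor = torch.randn(4, 10),
              tensor_cache = {}, module_cache = {}}
  local out = forwardView(dt, 'bf', 'torch.FloatTensor', function(from, to, ttype)
     return nn.Copy('torch.DoubleTensor', ttype)
  end)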

State

In the model graph, states are the quivers (edges) between model nodes.

Tensor types

It is really hard to abstract away the differences between CUDA and non-CUDA tensors. Ideally, the user need only call experiment:cuda() and/or datasource:cuda(), and we use CUDA as much as possible.

Obstacles:

  • Not all nn.modules are implemented in cunn
  • Not all torch.Tensor methods and functions are implemented in cutorch.

If I have a Concat container that takes a single DataTensor as input and broadcasts it to its modules, some of those modules might require a CUDA tensor, others a float tensor. How can I solve this problem efficiently and easily at the same time?

  1. Currently, we leave it to the _forward and _backward methods to specify the tensor views (feature(), image(), etc.).
    • We could just add a parameter to these view methods that specifies the type of the returned tensor.
    • We could automate this with an activation() method, which could use self._tensor_type as the default.
    • Subclasses that have different input views could hard-code something else.
  2. We could also force the tensors returned by BaseTensors to be read-only.
    • They are only used for activations and their gradients, which are written once.
    • Unless of course we decide to reuse them from batch to batch...
    • But this would be a bad idea and would lead to little or no speedup, since nn.Module and dp.Batch already reuse their torch.Tensors from batch to batch.
    • So, this way, we can cache different tensor types so that different modules can request different BaseTensor views and reuse the work done by previous ones?
    • Yup.
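
For illustration, here is the per-module typing problem solved by hand with plain nn: nn.Copy casts inside one branch of a Concat, and the same pattern would apply with 'torch.CudaTensor' once cunn is loaded. This is not the proposed dp mechanism, just the baseline it would automate:

  -- sketch only: per-branch type casting inside a Concat using nn.Copy
  require 'nn'

  local concat = nn.Concat(2)                 -- join outputs along the feature dim

  -- branch 1: default DoubleTensor all the way through
  concat:add(nn.Linear(10, 5))

  -- branch 2: cast to float, compute in float, cast back to double
  local floatBranch = nn.Sequential()
  floatBranch:add(nn.Copy('torch.DoubleTensor', 'torch.FloatTensor'))
  floatBranch:add(nn.Linear(10, 5):float())
  floatBranch:add(nn.Copy('torch.FloatTensor', 'torch.DoubleTensor'))
  concat:add(floatBranch)

  local output = concat:forward(torch.randn(4, 10))   -- 4 x 10 DoubleTensor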

Model vs Module

What are the advantages/disadvantages of this approach vs. just implementing Modules?

  • state : istate, ostate, cstate, gstate provide more powerful propagations.
  • visitors : maxnorm, momentum, etc
  • cuda : discrepancies between GPU/CPU modules are handled internally.
  • like Apply in Theano, where state is like Variable.

What could we eliminate:

  • predecessor, successor (models should only communicate via state).

What could be augmented:

  • states : do we want to abstract these in a torch.class instance or keep them as tables?
    • could be similar to pylearn2.space but with dp.Batch and other such states added.
    • a kind of mediator between connected models and datasource, etc.

Parallel + Async = distributed models

We need to start considering distributing the models over many machines.

For this to work, a model would need to be able to handle multiple forward, backward and update requests on different states in any order. This means that models would need to maintain different states using the memento design pattern. To be order invariant, we would store mementos in a map instead of a stack.
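
A hypothetical sketch of the memento-in-a-map idea (none of these names come from dp); note that a real Model would also have to save and restore the wrapped module's internal buffers per session:

  -- hypothetical sketch: mementos keyed by session id instead of stacked
  require 'nn'

  local MementoModel = {}
  MementoModel.__index = MementoModel

  function MementoModel.new(module)
     return setmetatable({module = module, mementos = {}}, MementoModel)
  end

  function MementoModel:forward(sessionId, input)
     local output = self.module:forward(input):clone()
     -- keyed by session id, so backwards can arrive in any order
     self.mementos[sessionId] = {input = input, output = output}
     return output
  end

  function MementoModel:backward(sessionId, gradOutput)
     local m = assert(self.mementos[sessionId], 'unknown session id')
     self.mementos[sessionId] = nil
     return self.module:backward(m.input, gradOutput)
  end

  -- usage: forwards for two sessions, then backwards in the opposite order
  local model = MementoModel.new(nn.Linear(10, 5))
  model:forward('a', torch.randn(2, 10))
  model:forward('b', torch.randn(2, 10))
  model:backward('b', torch.randn(2, 5))
  model:backward('a', torch.randn(2, 5))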

BatchPropagator coroutines

Each process has an async tcp server to handle incoming requests. Each request is associated with a batch, which is a coroutine. The appropriate batch is resumed by the tcp server when a request is received; it is resumed with the received state. A batch is yielded by the proxies of remote models upon transmitting a state; it yields any data that is shared among coroutines (models, etc.). Since it is a coroutine, when a batch is resumed it returns from the previous yield and continues execution from there.
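
A bare-bones, pure-Lua sketch of that yield/resume cycle, with the networking replaced by stand-in functions:

  -- sketch of the coroutine protocol; send() stands in for the async tcp
  -- client and onReply() for the tcp server's request handler
  local batches = {}                           -- batchId -> coroutine (kept by the server)

  local function send(batchId, state)
     print(batchId, 'transmitting', state)     -- would be an async tcp call
  end

  local function makeBatch(batchId, state)
     return coroutine.create(function()
        send(batchId, state)
        local reply = coroutine.yield()        -- give control back to the server
        print(batchId, 'resumed with', reply)  -- execution continues here once resumed
     end)
  end

  local function onReply(batchId, reply)       -- what the tcp server does
     coroutine.resume(batches[batchId], reply)
  end

  batches[1] = makeBatch(1, 'output state')
  coroutine.resume(batches[1])                 -- runs up to the yield
  onReply(1, 'gradient state')                 -- resumes the batch coroutine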

A proxy allows a local model to stand in for a remote model. The proxy would be initialized with access to the async singleton, which would allow it to spawn its remote if it is not available, and to transmit/receive states to and from its remote. When it transmits, it does so using an async tcp client.

The main process is initialized by propagating N batches using an async tcp client. We will require an AsyncPropagator which creates and calls a coroutine for each batch propagation. This AsyncPropagator is called by the tcp server when receiving requests from the outside, in order to resume a batch. Each BatchPropagator coroutine needs a batch to process. When it returns, the AsyncPropagator samples a new batch and creates a new coroutine. So the experiment synchronizes every epoch.

Mediator will need to be adapted to allow models to communicate.

Principles for distributing coroutines with RPC

This problem is very complex, so I would just like to use this section to reduce the problem and its resolution to some basic principles that we can follow and apply (a minimal sketch of the first few principles follows the list):

  1. A remote object has local data that it uses to process all commands.
  2. Messages are transmitted as serializable commands that call remote objects.
  3. A remote object is mapped to an id in a global object map so that it can be retrieved.
  4. The command is executed with this global object map as an argument.
  5. Each process listens and reacts to messages by unserializing and executing them on the object map.
  6. The propagator initializes the system with a fixed number of commands, each in its own session, and then listens for returned commands.
  7. The master limits the number of concurrent sessions in the system.
  8. The propagator only initiates a new command upon receiving a reply from one of those it initiated.
  9. The master can also react to cmd-line messages for commands like kill, etc.
  10. The slave processes start off with only the address of the master.
  11. The slave processes query the master for the location of new remote objects.
  12. The master is configured locally to know the location of all remote objects.
  13. The master maintains different sessions which are a sequence of potentially parallel commands.
  14. Some remote objects will need to receive multiple commands from different sources before reducing these in order to submit the next command.
  15. Each session has its own states and mementos which are distributed throughout the system.
  16. Different sessions may share the same objects, but never at the same time.
  17. Sessions should avoid interfering with each other's progress.
  18. Each process-session pair can be represented as a coroutine.
  19. Sessions may spawn new sessions.
  20. A coroutine yields to the server after it sends its command to a remote object.
  21. A coroutine is resumed by the server when the sent command sends back its results, which are encapsulated as a command to resume execution of the calling client.
  22. A coroutine executes a command function with access to the session and global object maps.
  23. When a coroutine yields, the calling server coroutine goes back to sleep.
  24. A message always replies with another message containing a command to resume the yielded process.
  25. If coroutine A calls coroutine B which calls coroutine C which returns, then C will call coroutine B, which will call coroutine A, which will return control to the original propagator coroutine.
  26. A composite command is a command that calls multiple commands before yielding.
  27. A composite command encapsulates sub commands in their own coroutines.
  28. A composite command spawns its own tcp server.
  29. A spawned tcp server may be cached for later use in a session object.
  30. A session object may be cached for later use in a global object.
  31. A proxy should send all its messages to different processes before yielding.
  32. Some proxies are actually composite proxies associated with different remote objects.
  33. A composite proxy must accumulate results before returning a command.
  34. A composite proxy will have its own tcp server for listening to replies.
  35. A composite proxy will encapsulate each component proxy in a coroutine and have the tcp server resume them using commands.
  36. When all component proxy coroutines return, the composite proxy will return.
  37. A mediator can be used to broadcast a command on a channel.
  38. A remote object can subscribe a command callback to a mediator channel.
  39. A command callback is a function that generates a command from its arguments.
  40. A remote object subscribes to a channel by sending a subscription command to the mediator.
  41. A mediator runs its own async tcp server.
  42. A mediator server reacts to commands requesting the broadcasting of data to a channel.
  43. A channel is a composite proxy used to send data to different remote objects.
  44. A command sent through a mediator channel cannot be communicated across sessions.
  45. A database runs its own async tcp server.
  46. A database uses a local storage mechanism like files, sqlite or hdf5.
  47. A database centralizes datastorage to facilitate backups.
  48. The master runs as a tcp server in its own process.
  49. The propagator sends a command to the master to initiate new sessions.
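
The sketch mentioned above covers principles 1 to 5: a serialized command is executed against the receiving process's object map. torch.serialize/torch.deserialize are standard Torch7 utilities; everything else is made up for illustration:

  -- hypothetical sketch of principles 1-5
  require 'torch'

  local objectMap = {}             -- principle 3: objectId -> local object

  local function executeCommand(cmd)
     local object = assert(objectMap[cmd.objectId], 'unknown object id')
     return object[cmd.method](object, unpack(cmd.args))   -- principle 4
  end

  -- principle 5: a process reacts to a message by deserializing and executing it
  local function onMessage(serialized)
     return executeCommand(torch.deserialize(serialized))
  end

  -- toy usage: register an object, then "send" it a command
  objectMap['model:1'] = {scale = function(self, x) return 2 * x end}
  local msg = torch.serialize({objectId = 'model:1', method = 'scale', args = {21}})
  print(onMessage(msg))            -- 42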

cunn.SoftmaxTree

A hierarchy of parameterized softmaxes.
One big memory allocation.
Similar to SparseOutLinear, but in blocks.
The softmax is performed on variable-length outputs.
The input is a table of two tensors: one is the inputs, the other is the target indices.
We narrow each softmax down to the target indices.
Returns a column vector of targets.

Assumptions :

  • Children and parents are identified by a contiguous sequence of keys (makes indexing easy)

Cuda Memory :

  • weight matrix
    • size : nChildNode x inputSize
    • each parent node has an nChildren x inputSize block
  • bias vector
    • size : nChildNode
  • neuron vector
    • size : batchSize x nChildNode x (act, grad, ...)
  • index of children :
    • pointer to parent
    • used to obtain path to root
  • index of parent :
    • pointer to array of children indices
    • childStart (pointer to weight matrix or bias vector + childStart)
    • nChildren

We need a forward kernel:

  • inputs :
    • indices of target children (one per example)
    • all of the above Cuda Memory objects
  • outputs :
    • matrix of node paths (NULL terminated)
    • vector of activations (batchSize x 1)
  • spread :
    • one block per batchFrame
    • one thread per inputNeuron

Forward (see the sketch after these steps) :

  1. bottom-up: get the next parent
  2. matrix-vector multiplication : y = b + x A -> store y
  3. s = softmax(y) -> store s
  4. a = narrow(s) -> store a
  5. accumulate the product of a
  6. repeat from 1 until the root is reached
  7. return the product
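
That sketch, as a CPU reference of the steps above; the index-table layout (parentId, childIndex, childStart, nChildren) is illustrative only, not the actual cunn data structures:

  -- CPU sketch of the hierarchical softmax forward pass described above
  require 'nn'

  -- childInfo[nodeId]    = {parentId = p, childIndex = k}  (k in 1..nChildren of p)
  -- parentInfo[parentId] = {childStart = row offset into weight/bias, nChildren = n}
  local function treeForward(input, targetId, weight, bias, childInfo, parentInfo, rootId)
     local softmax = nn.SoftMax()
     local logLikelihood = 0
     local nodeId = targetId
     while nodeId ~= rootId do                          -- 1. bottom-up to the root
        local child = childInfo[nodeId]
        local parent = parentInfo[child.parentId]
        local W = weight:narrow(1, parent.childStart, parent.nChildren)
        local b = bias:narrow(1, parent.childStart, parent.nChildren)
        local y = torch.mv(W, input) + b                -- 2. y = b + x A
        local s = softmax:forward(y)                    -- 3. softmax over the siblings
        local a = s[child.childIndex]                   -- 4. narrow to the target child
        logLikelihood = logLikelihood + math.log(a)     -- 5. accumulate the product
        nodeId = child.parentId                         -- 6. repeat until the root
     end
     return math.exp(logLikelihood)                     -- 7. return the product
  end

  -- toy tree: root (id 0) with two children (ids 1 and 2)
  local childInfo = {[1] = {parentId = 0, childIndex = 1}, [2] = {parentId = 0, childIndex = 2}}
  local parentInfo = {[0] = {childStart = 1, nChildren = 2}}
  print(treeForward(torch.randn(10), 2, torch.randn(2, 10), torch.randn(2), childInfo, parentInfo, 0))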

Optional :

  • Parent and child indices could stay on the CPU
  • Could first be implemented in dp

new BlockSparse configurations

configurations:

  1. NReLU + SoftMax : it will bound the scale of the gater
  2. SoftMax + Balance : continuous alternative to NReLU
  3. SoftMax : might just work.

So we need to allow the user to specify a gater.
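
A sketch of what a user-specified gater could look like for configuration 3, using only standard nn modules (NReLU and Balance are not standard modules, so they are omitted):

  -- sketch: the simplest configuration, a plain SoftMax gater
  require 'nn'

  local inputSize, nBlocks = 100, 8
  local gater = nn.Sequential()
  gater:add(nn.Linear(inputSize, nBlocks))
  gater:add(nn.SoftMax())                     -- gate values are bounded and sum to 1

  local gates = gater:forward(torch.randn(4, inputSize))   -- 4 x nBlocks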

Convolution2D Model

Make a Convolution class that inherits Model. It should have the same feel as Neural.
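
A rough, hypothetical sketch of the shape such a class might take; the actual dp.Model contract (state fields, _forward signature) should be copied from Neural rather than from this sketch:

  -- hypothetical sketch only; field and method names may differ from dp
  require 'dp'

  local Convolution2D, parent = torch.class("dp.Convolution2D", "dp.Model")

  function Convolution2D:__init(config)
     self._module = nn.Sequential()
     self._module:add(nn.SpatialConvolution(
        config.input_size, config.output_size,
        config.kernel_size, config.kernel_size))
     self._module:add(config.transfer or nn.Tanh())
     parent.__init(self, config)
  end

  function Convolution2D:_forward(cstate)
     -- assumes the input state holds activations in an image (b,c,h,w) view
     self.ostate.act = self._module:forward(self.istate.act)
  end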

Cuda tests.

Make a unit test file for CUDA-related functionality.
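
A skeleton of what such a test file could look like with torch.Tester; the module under test here is just a placeholder:

  -- skeleton for a CUDA test file; nn.Linear stands in for dp's CUDA code
  require 'cunn'

  local tester = torch.Tester()
  local cudatest = {}

  function cudatest.linearForward()
     local input = torch.randn(8, 10)
     local linear = nn.Linear(10, 5)
     local cpuOutput = linear:forward(input)
     local gpuOutput = linear:clone():cuda():forward(input:cuda())
     tester:assertTensorEq(cpuOutput:float(), gpuOutput:float(), 1e-4,
        'CPU and CUDA forward outputs should agree')
  end

  tester:add(cudatest)
  tester:run()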

SQLTensor

See DataTensor. Abstract away SQL into macro tensor ops. Allows managing many days' worth of data processing.

BillionWords subset

BillionWords could use just a subset of the BW training set for training.

Uses sentence clusters.
