nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

License: Other

Languages: Lua 99.39%, Python 0.23%, CMake 0.38%

dp's Introduction

dp Package Reference Manual

Join the chat at https://gitter.im/nicholas-leonard/dp

dp is a deep learning library designed for streamlining research and development using the Torch7 distribution. It emphasizes flexibility through the elegant use of object-oriented design patterns.

Documentation

This package includes lots of documentation and tutorials, which you will find hosted on readthedocs. If you prefer, you can consult the docs on GitHub.

dp's People

Contributors

auser, crowsonkb, d0ugal, dineshj1, diz-vara, dwiel, eddiepierce, erosennin, eulerreich, eywalker, gitter-badger, hycis, jashmenn, jnhwkim, ketranm, khellan, mfolnovic, nicholas-leonard, rcoppolo, rookie32, seragentp, soumith, temerick, timothywangdev


dp's Issues

dp.SequenceTensor

It has axes = {'b','s','f'}. It is the output of nn.LookupTable and the input of nn.Temporal* modules.
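
For reference, a minimal illustration (using plain nn modules and arbitrary sizes) of the batch x sequence x feature layout such a tensor would wrap:

  -- sketch only: the 'b','s','f' layout produced by nn.LookupTable
  require 'nn'

  local batchSize, seqLen, vocabSize, embedSize = 8, 20, 1000, 50
  local indices = torch.LongTensor(batchSize, seqLen):random(1, vocabSize)

  local lookup = nn.LookupTable(vocabSize, embedSize)
  local seq = lookup:forward(indices)        -- batchSize x seqLen x embedSize : 'b','s','f'

  local conv = nn.TemporalConvolution(embedSize, 30, 3)
  local out = conv:forward(seq)              -- batchSize x (seqLen - 3 + 1) x 30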

DataTensor optimizations

DataTensors are too slow. We need to rethink them.

I would like them to be used as modules:

  • starts with contiguous()
  • When a type-casting is required, use nn.Copy()
  • When a reshape is required, use nn.Reshape()
  • When a transpose is required, use nn.Transpose()

A model requests a view and a tensor type.

A DataTensor has backward(view, [gradInput|type]) and forward(view, [input|type]) methods.
The view is a string specifying the view of the Space: 'bf', 'bhwc', 'chwb', 'bhf', etc.

When a gradInput/input tensor is provided, it is stored in the cache with key (view, type).
When the backward/forward methods are called without a gradInput/input tensor, a tensor of the requested view and type is returned.

Internally, the tensor must be efficiently converted to different views.
The root view is the one provided when backward/forward is called with a tensor. It is automatically made contiguous (with parallel or the like, this can lead to a speedup). It is then stored at self.input/gradInput, and in the tensor_cache with the correct view and type.
When backward/forward is called without a tensor (only with a type), we look for the (view, type) pair in the tensor_cache. If found, it is returned. Otherwise, we look for the (view, type) pair in the module_cache. If found, we forward/backward through it. Otherwise, we build a module that will transform the root view into the requested view.

After each doneBatch, the tensor_cache must be emptied.

Like a module, the DataTensor is constructed once as the output of a Model or DataSet.
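
A runnable toy sketch of the proposed lookup order; apart from the tensor_cache/module_cache idea described above, every name here (forwardView, root_view, root_tensor, the converter builder) is hypothetical:

  -- illustrative sketch of the proposed lookup order, not the dp API
  require 'nn'

  local function forwardView(dt, view, tensorType, buildConverter)
     local key = view .. '/' .. tensorType
     local tensor = dt.tensor_cache[key]
     if tensor then return tensor end          -- hit: already converted this batch
     local module = dt.module_cache[key]
     if not module then
        -- build a converter module once (e.g. nn.Transpose/nn.Reshape/nn.Copy)
        module = buildConverter(dt.root_view, view, tensorType)
        dt.module_cache[key] = module
     end
     tensor = module:forward(dt.root_tensor)
     dt.tensor_cache[key] = tensor             -- to be emptied on doneBatch
     return tensor
  end

  -- toy usage: serve the 'bf' root tensor as a FloatTensor
  local dt = {root_view = 'bf', root_tensor = torch.randn(4, 10),
              tensor_cache = {}, module_cache = {}}
  local out = forwardView(dt, 'bf', 'torch.FloatTensor', function(from, to, ttype)
     return nn.Copy('torch.DoubleTensor', ttype)
  end)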

State

In the model graph, states are the quivers (edges) between model nodes.

Tensor types

It is really hard to abstract away the differences between CUDA and non-CUDA tensors. Ideally, the user need only call experiment:cuda() and/or datasource:cuda(), and we use CUDA as much as possible.

Obstacles:

  • Not all nn.modules are implemented in cunn
  • Not all torch.Tensor methods and functions are implemented in cutorch.

If I have a Concat container that takes a single DataTensor as input and broadcasts it to its modules, some of those modules might require a CUDA tensor, others a float tensor. How can I solve this problem efficiently and easily at the same time?

  1. Currently, we leave it to the _forward and _backward methods to specify the tensor views (feature(), image(), etc.).
    • We could just add a parameter to these view methods that specifies the type of the returned tensor.
    • We could automate this with an activation() method, which could use self._tensor_type as the default.
    • Subclasses that have different input views could hard-code something else.
  2. We could also force the tensors returned by BaseTensors to be read-only.
    • They are only used for activations and their gradients, which are written once.
    • Unless of course we decide to reuse them from batch to batch...
    • But this would be a bad idea and would lead to little or no speedup, since nn.Module and dp.Batch already reuse their torch.Tensors from batch to batch.
    • So, this way, we can cache different tensor types so that different modules can request different BaseTensor views and reuse the work done by previous ones?
    • Yup.
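
For illustration, here is the per-module typing problem solved by hand with plain nn: nn.Copy casts inside one branch of a Concat, and the same pattern would apply with 'torch.CudaTensor' once cunn is loaded. This is not the proposed dp mechanism, just the baseline it would automate:

  -- sketch only: per-branch type casting inside a Concat using nn.Copy
  require 'nn'

  local concat = nn.Concat(2)                 -- join outputs along the feature dim

  -- branch 1: default DoubleTensor all the way through
  concat:add(nn.Linear(10, 5))

  -- branch 2: cast to float, compute in float, cast back to double
  local floatBranch = nn.Sequential()
  floatBranch:add(nn.Copy('torch.DoubleTensor', 'torch.FloatTensor'))
  floatBranch:add(nn.Linear(10, 5):float())
  floatBranch:add(nn.Copy('torch.FloatTensor', 'torch.DoubleTensor'))
  concat:add(floatBranch)

  local output = concat:forward(torch.randn(4, 10))   -- 4 x 10 DoubleTensor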

Model vs Module

What are the advantages/disadvantages of this approach vs. just implementing Modules?

  • state : istate, ostate, cstate, gstate provide more powerful propagations.
  • visitors : maxnorm, momentum, etc
  • cuda : discrepancies between GPU/CPU modules are handled internally.
  • like Apply in Theano, where state is like Variable.

What could we eliminate:

  • predecessor, successor (models should only communicate via state).

What could be augmented:

  • states : do we want to abstract these in a torch.class instance or keep them as tables?
    • could be similar to pylearn2.space but with dp.Batch and other such states added.
    • a kind of mediator between connected models and datasource, etc.

Parallel + Async = distributed models

We need to start considering distributing the models over many machines.

For this to work, a model would need to be able to handle multiple forward, backward and update requests on different states in any order. This means that models would need to maintain different states using the memento design pattern. To be order invariant, we would store mementos in a map instead of a stack.
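
A hypothetical sketch of the memento-in-a-map idea (none of these names come from dp); note that a real Model would also have to save and restore the wrapped module's internal buffers per session:

  -- hypothetical sketch: mementos keyed by session id instead of stacked
  require 'nn'

  local MementoModel = {}
  MementoModel.__index = MementoModel

  function MementoModel.new(module)
     return setmetatable({module = module, mementos = {}}, MementoModel)
  end

  function MementoModel:forward(sessionId, input)
     local output = self.module:forward(input):clone()
     -- keyed by session id, so backwards can arrive in any order
     self.mementos[sessionId] = {input = input, output = output}
     return output
  end

  function MementoModel:backward(sessionId, gradOutput)
     local m = assert(self.mementos[sessionId], 'unknown session id')
     self.mementos[sessionId] = nil
     return self.module:backward(m.input, gradOutput)
  end

  -- usage: forwards for two sessions, then backwards in the opposite order
  local model = MementoModel.new(nn.Linear(10, 5))
  model:forward('a', torch.randn(2, 10))
  model:forward('b', torch.randn(2, 10))
  model:backward('b', torch.randn(2, 5))
  model:backward('a', torch.randn(2, 5))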

BatchPropagator coroutines

Each process has an async tcp server to handle incoming requests. Each request is associated with a batch, which is a coroutine. The appropriate batch is resumed by the tcp server when a request is received; it is resumed with the received state. A batch is yielded by the proxies of remote models upon transmitting a state; it yields any data that is shared among coroutines (models, etc.). Since it is a coroutine, when a batch is resumed it returns from the previous yield and continues execution from there.
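
A bare-bones, pure-Lua sketch of that yield/resume cycle, with the networking replaced by stand-in functions:

  -- sketch of the coroutine protocol; send() stands in for the async tcp
  -- client and onReply() for the tcp server's request handler
  local batches = {}                           -- batchId -> coroutine (kept by the server)

  local function send(batchId, state)
     print(batchId, 'transmitting', state)     -- would be an async tcp call
  end

  local function makeBatch(batchId, state)
     return coroutine.create(function()
        send(batchId, state)
        local reply = coroutine.yield()        -- give control back to the server
        print(batchId, 'resumed with', reply)  -- execution continues here once resumed
     end)
  end

  local function onReply(batchId, reply)       -- what the tcp server does
     coroutine.resume(batches[batchId], reply)
  end

  batches[1] = makeBatch(1, 'output state')
  coroutine.resume(batches[1])                 -- runs up to the yield
  onReply(1, 'gradient state')                 -- resumes the batch coroutine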

A proxy allows a local model to stand in for a remote model. The proxy would be initialized with access to the async singleton, which would allow it to spawn its remote if it is not available, and to transmit/receive states to and from its remote. When it transmits, it does so using an async tcp client.

The main process is initialized by propagating N batches using an async tcp client. We will require an AsyncPropagator which creates and calls a coroutine for each batch propagation. This AsyncPropagator is called by the tcp server when receiving requests from the outside, in order to resume a batch. Each BatchPropagator coroutine needs a batch to process. When it returns, the AsyncPropagator samples a new batch and creates a new coroutine. So the experiment synchronizes every epoch.

Mediator will need to be adapted to allow models to communicate.

Principles for distributing coroutines with RPC

This problem is very complex, so I would just like to use this section to reduce the problem and its resolution to some basic principles that we can follow and apply (a minimal sketch of the first few principles follows the list):

  1. A remote object has local data that it uses to process all commands.
  2. Messages are transmitted as serializable commands that call remote objects.
  3. A remote object is mapped to an id in a global object map so that it can be retrieved.
  4. The command is executed with this global object map as an argument.
  5. Each process listens and reacts to messages by unserializing and executing them on the object map.
  6. The propagator initializes the system with a fixed number of commands, each in its own session, and then listens for returned commands.
  7. The master limits the number of concurrent sessions in the system.
  8. The propagator only initiates a new command upon receiving a reply from one of those it initiated.
  9. The master can also react to cmd-line messages for commands like kill, etc.
  10. The slave processes start off with only the address of the master.
  11. The slave processes query the master for the location of new remote objects.
  12. The master is configured locally to know the location of all remote objects.
  13. The master maintains different sessions which are a sequence of potentially parallel commands.
  14. Some remote objects will need to receive multiple commands from different sources before reducing these in order to submit the next command.
  15. Each session has its own states and mementos which are distributed throughout the system.
  16. Different sessions may share the same objects, but never at the same time.
  17. Sessions should avoid interfering with each other's progress.
  18. Each process-session pair can be represented as a coroutine.
  19. Sessions may spawn new sessions.
  20. A coroutine yields to the server after it sends its command to a remote object.
  21. A coroutine is resumed by the server when the sent command sends back its results, which are encapsulated as a command to resume execution of the calling client.
  22. A coroutine executes a command function with access to the session and global object maps.
  23. When a coroutine yields, the calling server coroutine goes back to sleep.
  24. A message always replies with another message containing a command to resume the yielded process.
  25. If coroutine A calls coroutine B which calls coroutine C which returns, then C will call coroutine B, which will call coroutine A, which will return control to the original propagator coroutine.
  26. A composite command is a command that calls multiple commands before yielding.
  27. A composite command encapsulates sub commands in their own coroutines.
  28. A composite command spawns its own tcp server.
  29. A spawned tcp server may be cached for later use in a session object.
  30. A session object may be cached for later use in a global object.
  31. A proxy should send all its messages to different processes before yielding.
  32. Some proxies are actually composite proxies associated with different remote objects.
  33. A composite proxy must accumulate results before returning a command.
  34. A composite proxy will have its own tcp server for listening to replies.
  35. A composite proxy will encapsulate each component proxy in a coroutine and have the tcp server resume them using commands.
  36. When all component proxy coroutines return, the composite proxy will return.
  37. A mediator can be used to broadcast a command on a channel.
  38. A remote object can subscribe a command callback to a mediator channel.
  39. A command callback is a function that generates a command from its arguments.
  40. A remote object subscribes to a channel by sending a subscription command to the mediator.
  41. A mediator runs its own async tcp server.
  42. A mediator server reacts to commands requesting the broadcasting of data to a channel.
  43. A channel is a composite proxy used to send data to different remote objects.
  44. A command sent through a mediator channel cannot be communicated across sessions.
  45. A database runs its own async tcp server.
  46. A database uses a local storage mechanism like files, sqlite or hdf5.
  47. A database centralizes datastorage to facilitate backups.
  48. The master runs as a tcp server in its own process.
  49. The propagator sends a command to the master to initiate new sessions.
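
The sketch mentioned above covers principles 1 to 5: a serialized command is executed against the receiving process's object map. torch.serialize/torch.deserialize are standard Torch7 utilities; everything else is made up for illustration:

  -- hypothetical sketch of principles 1-5
  require 'torch'

  local objectMap = {}             -- principle 3: objectId -> local object

  local function executeCommand(cmd)
     local object = assert(objectMap[cmd.objectId], 'unknown object id')
     return object[cmd.method](object, unpack(cmd.args))   -- principle 4
  end

  -- principle 5: a process reacts to a message by deserializing and executing it
  local function onMessage(serialized)
     return executeCommand(torch.deserialize(serialized))
  end

  -- toy usage: register an object, then "send" it a command
  objectMap['model:1'] = {scale = function(self, x) return 2 * x end}
  local msg = torch.serialize({objectId = 'model:1', method = 'scale', args = {21}})
  print(onMessage(msg))            -- 42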

cunn.SoftmaxTree

A hierarchy of parameterized softmaxes.
One big memory allocation.
Similar to SparseOutLinear, but in blocks.
The softmax is performed on variable-length outputs.
The input is a table of two tensors: one is the inputs, the other is the target indices.
We narrow each softmax down to the target indices.
Returns a column vector of targets.

Assumptions :

  • Children and parents are identified by a contiguous sequence of keys (makes indexing easy)

Cuda Memory :

  • weight matrix
    • size : nChildNode x inputSize
    • each parent node has an nChildren x inputSize block
  • bias vector
    • size : nChildNode
  • neuron vector
    • size : batchSize x nChildNode x (act, grad, ...)
  • index of children :
    • pointer to parent
    • used to obtain path to root
  • index of parent :
    • pointer to array of children indices
    • childStart (pointer to weight matrix or bias vector + childStart)
    • nChildren

We need a forward kernel:

  • inputs :
    • indices of target children (one per example)
    • all of the above Cuda Memory objects
  • outputs :
    • matrix of node paths (NULL terminated)
    • vector of activations (batchSize x 1)
  • spread :
    • one block per batchFrame
    • one thread per inputNeuron

Forward (see the sketch after these steps) :

  1. bottom-up: get the next parent
  2. matrix-vector multiplication : y = b + x A -> store y
  3. s = softmax(y) -> store s
  4. a = narrow(s) -> store a
  5. accumulate the product of a
  6. repeat from 1 until the root is reached
  7. return the product
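
That sketch, as a CPU reference of the steps above; the index-table layout (parentId, childIndex, childStart, nChildren) is illustrative only, not the actual cunn data structures:

  -- CPU sketch of the hierarchical softmax forward pass described above
  require 'nn'

  -- childInfo[nodeId]    = {parentId = p, childIndex = k}  (k in 1..nChildren of p)
  -- parentInfo[parentId] = {childStart = row offset into weight/bias, nChildren = n}
  local function treeForward(input, targetId, weight, bias, childInfo, parentInfo, rootId)
     local softmax = nn.SoftMax()
     local logLikelihood = 0
     local nodeId = targetId
     while nodeId ~= rootId do                          -- 1. bottom-up to the root
        local child = childInfo[nodeId]
        local parent = parentInfo[child.parentId]
        local W = weight:narrow(1, parent.childStart, parent.nChildren)
        local b = bias:narrow(1, parent.childStart, parent.nChildren)
        local y = torch.mv(W, input) + b                -- 2. y = b + x A
        local s = softmax:forward(y)                    -- 3. softmax over the siblings
        local a = s[child.childIndex]                   -- 4. narrow to the target child
        logLikelihood = logLikelihood + math.log(a)     -- 5. accumulate the product
        nodeId = child.parentId                         -- 6. repeat until the root
     end
     return math.exp(logLikelihood)                     -- 7. return the product
  end

  -- toy tree: root (id 0) with two children (ids 1 and 2)
  local childInfo = {[1] = {parentId = 0, childIndex = 1}, [2] = {parentId = 0, childIndex = 2}}
  local parentInfo = {[0] = {childStart = 1, nChildren = 2}}
  print(treeForward(torch.randn(10), 2, torch.randn(2, 10), torch.randn(2), childInfo, parentInfo, 0))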

Optional :

  • Parent and child indices could stay on the CPU
  • Could first be implemented in dp

new BlockSparse configurations

configurations:

  1. NReLU + SoftMax : it will bound the scale of the gater
  2. SoftMax + Balance : continuous alternative to NReLU
  3. SoftMax : might just work.

So we need to allow the user to specify a gater.
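
A sketch of what a user-specified gater could look like for configuration 3, using only standard nn modules (NReLU and Balance are not standard modules, so they are omitted):

  -- sketch: the simplest configuration, a plain SoftMax gater
  require 'nn'

  local inputSize, nBlocks = 100, 8
  local gater = nn.Sequential()
  gater:add(nn.Linear(inputSize, nBlocks))
  gater:add(nn.SoftMax())                     -- gate values are bounded and sum to 1

  local gates = gater:forward(torch.randn(4, inputSize))   -- 4 x nBlocks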

Convolution2D Model

Make a Convolution class that inherits Model. It should have the same feel as Neural.
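
A rough, hypothetical sketch of the shape such a class might take; the actual dp.Model contract (state fields, _forward signature) should be copied from Neural rather than from this sketch:

  -- hypothetical sketch only; field and method names may differ from dp
  require 'dp'

  local Convolution2D, parent = torch.class("dp.Convolution2D", "dp.Model")

  function Convolution2D:__init(config)
     self._module = nn.Sequential()
     self._module:add(nn.SpatialConvolution(
        config.input_size, config.output_size,
        config.kernel_size, config.kernel_size))
     self._module:add(config.transfer or nn.Tanh())
     parent.__init(self, config)
  end

  function Convolution2D:_forward(cstate)
     -- assumes the input state holds activations in an image (b,c,h,w) view
     self.ostate.act = self._module:forward(self.istate.act)
  end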

Cuda tests.

Make a unit test file for CUDA-related functionality.
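
A skeleton of what such a test file could look like with torch.Tester; the module under test here is just a placeholder:

  -- skeleton for a CUDA test file; nn.Linear stands in for dp's CUDA code
  require 'cunn'

  local tester = torch.Tester()
  local cudatest = {}

  function cudatest.linearForward()
     local input = torch.randn(8, 10)
     local linear = nn.Linear(10, 5)
     local cpuOutput = linear:forward(input)
     local gpuOutput = linear:clone():cuda():forward(input:cuda())
     tester:assertTensorEq(cpuOutput:float(), gpuOutput:float(), 1e-4,
        'CPU and CUDA forward outputs should agree')
  end

  tester:add(cudatest)
  tester:run()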

SQLTensor

See DataTensor. Abstract away SQL into macro tensor ops. Allows managing many days' worth of data processing.

BillionWords subset

BillionWords could use just a subset of the BW training set for training.

Uses sentence clusters.
