davidmascharka / mynn

Pure Python/NumPy neural network library extending MyGrad

Home Page: https://pypi.org/project/mynn/

License: MIT License

mynn's Introduction

MyNN

A pure-Python neural network library based on the amazing MyGrad.

MyNN was created as an extension to MyGrad for rapid prototyping of neural networks. It aims to provide minimal dependencies, a clean code base with excellent documentation, and a useful learning tool.

Installation Instructions

If you already have MyGrad installed, clone MyNN, navigate to the resulting directory, and run

python setup.py develop

If you don't have MyGrad installed, then you can run

git clone https://github.com/rsokl/MyGrad.git
cd MyGrad
python setup.py develop

Then clone and install this repository.
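
For example, something like the following should work (the clone URL here is inferred from the repository's GitHub location shown above):

git clone https://github.com/davidmascharka/mynn.git
cd mynn
python setup.py develop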

Quickstart

Please see the example notebooks for a gentle introduction.

mynn's People

Contributors

davidmascharka, rsokl


mynn's Issues

Unicode variable names

In implementing some of the optimizers, I'm wondering what everyone's thoughts are on using non-ASCII Unicode variable names. For example, the Adadelta implementation currently sits as:

self.g[idx] = self.rho * self.g[idx] + (1 - self.rho) * grad**2
dx = -np.sqrt(self.dx[idx] + self.eps) / np.sqrt(self.g[idx] + self.eps) * grad
self.dx[idx] = self.rho * self.dx[idx] + (1 - self.rho) * dx**2

However, this could easily be written this way to better match the paper:

self.g[idx] = self.ρ * self.g[idx] + (1 - self.ρ) * grad**2
Δx = -np.sqrt(self.Δx[idx] + self.ɛ) / np.sqrt(self.g[idx] + self.ɛ) * grad
self.Δx[idx] = self.ρ * self.Δx[idx] + (1 - self.ρ) * Δx**2

My initial thought: using Unicode names can add some clarity. For non-user-facing code, this might be a good thing. We probably ought not to have Unicode names in function signatures, though, since some users may have difficulty typing Unicode characters.

Update examples

  • liveplot has been renamed to noggin; update references accordingly
  • everything should migrate when the mygrad merge happens
  • run black on the notebooks for consistent code styling

Network Layers

These are the computational layers used to create a model. They are essentially thin wrappers around MyGrad Tensors that hold any necessary parameters, and they may take a parameter initializer that creates those parameters according to the provided initialization scheme.

Each layer should have a parameters list and define a forward and backward pass. The parameters list can be passed to an Optimizer or accessed directly. Layers may also define saving and loading functionality; a rough sketch of this contract appears after the list below.

The Layers

  • ConvNd
  • BatchNormNd
  • Dense layer
  • Any activations that need a dedicated layer (adaptive layers like PReLU)
  • Recurrent layers (plain RNN, LSTM, GRU; possibly others)
  • Dropout

Note the lack of dropout, pooling, and reshaping layers, which can simply be performed using NumPy operations in a forward pass.
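
To make the contract above concrete, here is a minimal sketch of what a dense layer might look like; the class name, initializer signature, and attribute names are illustrative rather than the final MyNN API:

import numpy as np
import mygrad as mg

class Dense:
    def __init__(self, d_in, d_out, weight_initializer=None):
        # parameters are MyGrad Tensors so that gradients can flow into them
        init = weight_initializer or (lambda *shape: np.random.randn(*shape) * 0.01)
        self.weight = mg.Tensor(init(d_in, d_out))
        self.bias = mg.Tensor(np.zeros((d_out,)))

    @property
    def parameters(self):
        # exposed so an Optimizer can iterate over the layer's parameters
        return (self.weight, self.bias)

    def __call__(self, x):
        # forward pass; MyGrad records the computation graph, so the backward
        # pass comes from calling .backward() on the eventual loss
        return mg.matmul(x, self.weight) + self.bias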

Unit Tests

Monolithic issue for unit testing things. We'll construct a list here; comment to add things to it.

  • Mutation: ensure that operations do not modify their input
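
For instance, a mutation check could look something like this sketch, with mg.log standing in for whatever operation is under test:

import numpy as np
import mygrad as mg

def test_op_does_not_mutate_input():
    # `mg.log` is just a stand-in for the operation under test
    x = np.random.rand(3, 4) + 1.0
    x_copy = x.copy()
    mg.log(x)
    assert np.array_equal(x, x_copy), "the operation mutated its input"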

Add support for exporting models

The current solution is to iterate through the model parameters and save all the weights with np.save, then load them with np.load into a new model. We should support actually serializing a model.
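
As a sketch of that stopgap (assuming a model exposes a parameters list of MyGrad Tensors, whose underlying arrays live in .data):

import numpy as np

def save_model(model, filename):
    # store each parameter's underlying array, keyed by its position
    # (`filename` should end in .npz so save and load refer to the same file)
    arrays = {str(i): p.data for i, p in enumerate(model.parameters)}
    np.savez(filename, **arrays)

def load_model(model, filename):
    # copy the saved values back into a freshly constructed model of the same shape
    with np.load(filename) as archive:
        for i, p in enumerate(model.parameters):
            p.data[...] = archive[str(i)]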

Examples

There should be examples of using the library on real tasks.

  • Spiral dataset
  • MNIST
  • RCNN

Feel free to propose additional examples as well. These are all in-progress.

Initializers

An initializer should create a MyGrad Tensor according to some initialization scheme. Something like:

my_tensor = initializers.normal(10, 10) # 10x10 Tensor drawn from a normal distribution
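
A minimal sketch of such an initializer, assuming roughly this signature (not necessarily the final API):

import numpy as np
import mygrad as mg

def normal(*shape, mean=0.0, std=1.0):
    # draw the initial values with NumPy, then wrap them in a MyGrad Tensor
    # so that gradients can be backpropagated into the parameter
    return mg.Tensor(np.random.normal(mean, std, shape))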

The Initializers

GRU Layer

This is just a wrapper around the MyGrad GRU call, but it's a nice piece to have for doing all the bookkeeping of the variables.

Add CI

So we know things work. CI will be very helpful in merging with MyGrad and as we add tests.

Optimizers

An optimizer should take an iterable of parameters at creation and perform some optimization over those parameters based on a loss. It can also be used to null the gradient of each of those parameters. Something like the following:

# create a model in `model` and a loss function in `loss_func`
optim = optimizers.sgd(model.parameters, learning_rate=0.01, momentum=0.99)

for batch, targets in training_set:
    optim.null_gradients()
    outputs = model.forward(batch)
    loss = loss_func(outputs, targets)
    loss.backward()  # backprop into `model.parameters`
    optim.step()     # update each parameter using its gradient
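
For reference, a bare-bones SGD optimizer along these lines might look like the following sketch; momentum is omitted, and it assumes each parameter is a MyGrad Tensor exposing .data, .grad, and null_gradients():

class SGD:
    def __init__(self, parameters, learning_rate=0.01):
        self.parameters = list(parameters)
        self.learning_rate = learning_rate

    def null_gradients(self):
        # clear each parameter's gradient before the next forward/backward pass
        for p in self.parameters:
            p.null_gradients()

    def step(self):
        # apply a plain gradient-descent update to each parameter's underlying array
        for p in self.parameters:
            if p.grad is not None:
                p.data -= self.learning_rate * p.grad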

The Optimizers

Documentation webpage

One should exist. @rsokl I'm not sure how much interplay there will be between the MyGrad docs page and the MyNN docs page. Something we can discuss.

Activation Functions

This can certainly be open to debate, especially regarding which of these (if any) belong in MyGrad instead of here. Common activation functions should be easily accessible so that people don't need to write their own little wrappers. These may include:

  • Minimum (MyGrad)
  • Maximum (MyGrad)
  • ReLU (which is just maximum(x, 0) but a convenient wrapper; see the sketch after this list)
  • Hard tanh
  • tanh (it's in mygrad)
  • ReLU6 (not really common and I hate it, so I'm not implementing it. I just needed to share my feelings.)
  • ELU
  • SELU
  • Leaky ReLU
  • PReLU
  • GLU
  • Sigmoid
  • Soft sign
  • Softmax
  • Log softmax
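
As a concrete example of the kind of thin wrapper meant here, ReLU could be written directly in terms of MyGrad's maximum (a sketch; the final signature may differ):

import mygrad as mg

def relu(x):
    # clamp negative values to zero; MyGrad's `maximum` supplies the backward pass
    return mg.maximum(x, 0)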

Loss Functions

A loss function should take model outputs and target values and return the loss. Something like:

# assuming model outputs in `outputs` and targets in `targets`
loss_func = losses.L1()
loss = loss_func(outputs, targets)
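
For illustration, an L1 loss written purely in terms of MyGrad primitives might look like this sketch (the actual implementation may differ):

import mygrad as mg

class L1:
    def __call__(self, outputs, targets):
        # mean absolute error; `absolute` and `mean` are MyGrad ops,
        # so the backward pass is handled automatically
        return mg.mean(mg.absolute(outputs - targets))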

The Losses

  • L1
  • MSE
  • Negative log-likelihood
  • KL divergence
  • Cross-entropy
  • Balanced cross-entropy (weighting factor alpha on each class)
  • Focal loss [softmax and not]
  • Smooth L1 (Huber)

Discussion

This can host comments and discussion for now.

Some additional things that would be nice to have but need some thought:

  • Saving/Loading models
  • A Model class
  • A Trainer that handles data loading, training, etc.
  • Learning Rate schedulers (possibly subsumed under Trainer)

Performance vs Clarity

We should discuss the merits of a performance/clarity tradeoff in the library. We can take advantage of some of the math functions in MyGrad to write some neural network utilities incredibly clearly and concisely, at the cost of some performance.

For example, compare L1 before and L1 after. It's immediately obvious just from looking at the current L1 implementation what it's doing; all you need is to understand how the primitives backprop. If you know that (or trust that they do), then you don't need to see the details.

Another benefit of using the MyGrad primitives where applicable is that we stay in step if MyGrad ops get rewritten; we don't need to update two implementations whenever anything changes.

However, since you're relying on the MyGrad primitives to perform the backward pass on their own, you lose out on some of the performance benefits of writing these operations by hand. Some operations incur about a 2x slowdown versus writing the forward and backward passes explicitly, which is not ideal; for example, I believe log softmax was about twice as slow.

Currently, my strategy is to write every operation we can using MyGrad primitives. This helps improve development speed. My initial idea is that once we are more fully featured, we can analyze where the slowdowns are coming from and focus on optimizing those. There's little point in optimizing an op if it isn't used much or if its overhead relative to other ops is minuscule.

Thoughts on this tradeoff?
