
nninit's People

Contributors

anibali, kaixhin, sniklaus

nninit's Issues

calcFan in nninit.orthogonal

Is there a reason why nninit.orthogonal does not use calcFan and instead calculates the fanIn / fanOut without taking the underlying module into consideration? Thanks!

inconsistencies with nninit.orthogonal

I am experiencing inconsistencies with the orthogonal initialization. In the example below, both modules have the same number of weights but the latter is significantly faster to initialize.

th> nn.SpatialConvolution(100, 100, 3, 3):init('weight', nninit.orthogonal)
nn.SpatialConvolution(100 -> 100, 3x3)
                                                                                  [7.6399s]
th> nn.SpatialConvolution(100, 100, 3, 3).weight:nElement()
90000   
                                                                                  [0.0006s]
----

th> nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1):init('weight', nninit.orthogonal)
nn.SpatialConvolution(900 -> 100, 1x1)
                                                                                  [0.0605s]
th> nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1).weight:nElement()
90000   
                                                                                  [0.0006s]

Is this desired behavior or a bug? The cause is that nninit.orthogonal uses fanIn and fanOut to determine the size of the matrix that ought to be orthogonalized, which does not seem to be the right way of doing it.

-- sizes is the weight tensor's size, e.g. nOut x nIn x kH x kW for SpatialConvolution
local fanIn = sizes[2]
local fanOut = sizes[1]
for d = 3, #sizes do
    -- the kernel dimensions are folded into both fanIn and fanOut
    fanIn = fanIn * sizes[d]
    fanOut = fanOut * sizes[d]
end

----

nn.SpatialConvolution(100, 100, 3, 3)
fanIn: 900
fanOut: 900

----

nn.SpatialConvolution(100 * 3 * 3, 100, 1, 1)
fanIn: 900
fanOut: 100
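
One way the two layers above could be made to behave identically is to always orthogonalise the flattened nOut x (nIn * kH * kW) weight matrix. A hedged sketch of that idea (illustrative only, not the current nninit code):

-- Draw a random Gaussian matrix, orthonormalise it with a thin QR
-- decomposition, and copy the result into the weight. Assumes
-- nIn * kH * kW >= nOut so the rows can be made orthonormal.
local function orthogonalFlat(weight)
  local nOut = weight:size(1)
  local nIn = weight:nElement() / nOut  -- nIn * kH * kW for convolutions
  local q = torch.qr(torch.randn(nIn, nOut))  -- q is nIn x nOut with orthonormal columns
  weight:copy(q:t())  -- copy is element-wise, so no explicit reshape is needed
end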

Thank you for this very handy library.

Why isn't "convolution-aware initialization" redundant?

Plancherel's theorem implies that orthogonality in the spatial domain is equivalent to orthogonality in the frequency domain. From my understanding, CAI doesn't do anything special in the frequency domain aside from simply initializing the filters of each kernel such that they form an orthonormal set. If my understanding is correct, then vanilla orthogonal initialization should accomplish the same thing, making CAI redundant.
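
For reference, the identity being invoked is Plancherel's (Parseval's) theorem for the length-$N$ discrete Fourier transform $\mathcal{F}$, in the unnormalised convention:

$$\langle f, g \rangle = \frac{1}{N}\,\langle \mathcal{F}f, \mathcal{F}g \rangle,$$

so a set of filters that is orthonormal in the spatial domain is, up to the fixed factor $1/N$, also orthogonal in the frequency domain, and vice versa.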

See this Gist for a simple demo illustrating my point.

Better API

Currently nninit is a bit clunky (in an effort to avoid side-effects). I would like to modify nn.Module to have something like an init method, with an API along the lines of:

nn.Linear(4096, 1000):init('weight', 'xavier', 'normal'):init('weight', 'sparse', 0.3):init('bias', 'constant', 0)

I think returning the module and therefore being able to chain calls makes it a lot more elegant. The current way of passing parameters is also open for discussion. Any thoughts @soumith / @skaae?
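
A minimal sketch of what such a chainable method could look like, assuming each initialiser is exposed as a function that fills a parameter tensor in place (the dispatch-by-name and the initialiser signature are illustrative assumptions, not the final nninit API):

require 'nn'
local nninit = require 'nninit'

function nn.Module:init(paramName, initName, ...)
  -- Look up an initialiser by name and apply it to the named parameter
  -- tensor; returning self allows calls to be chained.
  local initialiser = nninit[initName]
  assert(initialiser, 'unknown initialiser: ' .. tostring(initName))
  initialiser(self[paramName], ...)  -- assumed signature: initialiser(tensor, ...)
  return self
end

-- Usage, mirroring the proposal above:
-- nn.Linear(4096, 1000):init('weight', 'xavier', 'normal'):init('bias', 'constant', 0)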

Specification for `eye`

The eye function is wrong for convolutional layers. In the 2D case every filter can abide by the specification for torch.eye, and the same can be extended to 3D along the diagonal. In 1D perhaps the closest is a vector of 1s? This solution would be the most consistent with torch.eye, which is good.

Alternatively, considering that these are convolutions, the identity would be the delta function (i.e. a 1 as close to the middle of a tensor as possible). Asking @bshillingford to clarify what he thinks makes more sense.
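
A rough sketch of the delta-function interpretation, assuming a 4D nOut x nIn x kH x kW convolution weight (illustrative only, not the current nninit.eye behaviour):

-- Put a single 1 as close to the spatial centre of each filter as possible,
-- pairing output plane i with input plane i, so a layer with matching
-- input/output planes approximates the identity under convolution.
local function deltaEye(weight)
  weight:zero()
  local nOut, nIn = weight:size(1), weight:size(2)
  local cH, cW = math.ceil(weight:size(3) / 2), math.ceil(weight:size(4) / 2)
  for i = 1, math.min(nOut, nIn) do
    weight[i][i][cH][cW] = 1
  end
  return weight
end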

nngraph

How would you use it for nngraph layers?
Edit 2: It seems to be a problem with cudnn layers?

Edit: This seems like an awesome addition to Torch :)

LSTM Support

Although this library works fine with nngraph, it would be good to also support rnn - specifically the LSTM module. Given the new API introduced with #2, how can the elements of the cell be initialised individually? Any feedback @nicholas-leonard?

A notable reason to support this would be to implement the large forget gate bias introduced in:

Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.
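
As a hedged sketch of that trick, assuming the four gates are produced by a single nn.Linear whose output is the concatenation [input gate, forget gate, cell candidate, output gate] (the gate ordering and module layout are assumptions for illustration, not the rnn API):

-- Fill only the forget-gate slice of the gate layer's bias with a large
-- positive value so the cell initially remembers by default.
local function setForgetGateBias(gateLinear, hiddenSize, value)
  value = value or 1
  -- bias has length 4 * hiddenSize; the forget gate is assumed to be the second block
  gateLinear.bias:narrow(1, hiddenSize + 1, hiddenSize):fill(value)
  return gateLinear
end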

The idea of nninit is to allow experimentation with initialisations and to free maintainers from having to implement "best practices" themselves.
