beronx86 / theano-hf
This project forked from boulanni/theano-hf
General purpose Hessian-free optimization in Theano
License: Other
I wrapped my Hessian-free code in a generic class, usable as a black box to train your models if you can provide the cost function as a Theano expression. It includes all the details in Martens (ICML 2010) and Martens & Sutskever (ICML 2011) that are crucial to make it work:

- Tikhonov damping with the Levenberg-Marquardt heuristic,
- Gauss-Newton matrix products (you specify a Theano expression `s` to split your computational graph in two),
- Proper handling of batches and mini-batches (an example SequenceDataset class is provided for variable-length input),
- Conjugate gradient (CG) with information sharing, backtracking, preconditioning and termination conditions,
- Structural damping for RNNs.

It relies heavily on Theano's Rop (the R-operator).

In practice, I could make it work without hassle for a feed-forward network, an RNN with different objectives, NADE (Larochelle) and a more complex model (RNN-NADE) that ties two scans together, so it seems reasonably flexible.

Only the gradients and Gauss-Newton matrix products (95% of the computation) are in Theano; CG and the training logic are in Python. It runs on GPU, but for the models I tried, it was a bit slower.

Hessian-free is slow: you need CG batch sizes of 1000+ (don't skimp on this), but you can get substantially better results than SGD from it with almost zero tweaking. There is an option to save and recover a training checkpoint and to do early stopping.

I included an RNN example that can memorize an input for 100 time steps (example_RNN). Launch it on 4 cores, come back in 8 hours, and you should have at least one nice solution with 0 error on the validation set. In comparison, SGD can solve this problem about 0.0% of the time.

It is available here: https://github.com/boulanni/theano-hf

If you use this software for academic research, please cite the following paper:

[1] N. Boulanger-Lewandowski, Y. Bengio and P. Vincent, "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription", Proc. ICML 29, 2012.

Author: Nicolas Boulanger-Lewandowski
University of Montreal, 2012
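
For readers new to the method, the core inner step described above (CG on a Tikhonov-damped Gauss-Newton system, using only matrix-vector products) can be sketched in a few lines. This is a rough, standard-library-only illustration and not the code in this package: the names `damped_cg` and `gauss_newton_matvec`, the toy Jacobian `J`, and the values of `grad` and `lam` are all made up for the example. In the package itself the products would come from Theano (via Rop) rather than an explicit matrix, and the sketch leaves out preconditioning, backtracking, information sharing and the Levenberg-Marquardt update of the damping parameter.

```python
# Minimal sketch (hypothetical names, not the theano-hf API):
# linear CG on the damped system (G + lam*I) x = -grad,
# where G is only ever accessed through matrix-vector products.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def damped_cg(matvec, grad, lam, max_iters=50, tol=1e-10):
    """Approximately solve (G + lam*I) x = -grad; matvec(v) must return G v."""
    x = [0.0] * len(grad)
    r = [-g for g in grad]        # residual b - A x, with x = 0 and b = -grad
    p = list(r)                   # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iters):
        if rs_old < tol:          # squared residual norm is small enough
            break
        Gp = matvec(p)
        Ap = [gp + lam * pi for gp, pi in zip(Gp, p)]   # Tikhonov damping term
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Toy problem: G = J^T J for an assumed 2x2 Jacobian J.
J = [[2.0, 0.0],
     [1.0, 1.0]]

def gauss_newton_matvec(v):
    Jv = [dot(row, v) for row in J]                      # J v
    return [sum(J[i][j] * Jv[i] for i in range(len(J)))  # J^T (J v)
            for j in range(len(v))]

grad = [1.0, -2.0]   # assumed gradient of the cost
lam = 0.1            # assumed damping strength
step = damped_cg(gauss_newton_matvec, grad, lam)
```

The point of writing `matvec` as a function is that the matrix `G` never has to be formed: any routine that returns the product `G v` for a given `v` can be plugged in, which is what makes the approach practical for models with many parameters.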