GithubHelp home page GithubHelp logo

liuq901 / currennt Goto Github PK

View Code? Open in Web Editor NEW
1.0 2.0 1.0 49.05 MB

Open source neural network toolkit.

License: GNU General Public License v3.0

CMake 0.18% C++ 54.67% Cuda 33.46% C 11.51% Python 0.17%

currennt's Introduction

CURRENNT is written in C++/CUDA and runs on Windows and Linux. It requires a
CUDA-capable graphics card with a compute capability of 1.3 or higher (i.e. at
least a GeForce 210, GT 220/40, FX380 LP, 1800M, 370/380M or NVS 2/3100M)

+=============================================================================+
| Structure                                                                   |
+=============================================================================+
1. Building on Linux
2. Building on Windows
3. Command line options
    3.1 Common options
    3.2 Feed forward options
    3.3 Training options
    3.4 Autosave options
    3.5 Data file options
    3.6 Weight initialization options
4. Network configuration
5. NetCDF data files
    5.1 General structure
    5.2 Regression tasks
    5.3 Classification tasks
6. Tests
7. Examples

+=============================================================================+
| 1. Building on Linux                                                        |
+=============================================================================+
Building on Linux requires the following:
* CUDA Toolkit 5.0
* GCC 4.6 or higher
* NetCDF scientific data library
* Boost library 1.48 or higher

To build CURRENNT execute the following commands within the directory
containing this README file:
#> mkdir build && cd build
#> cmake ..
#> make
(Ensure that CMakeLists.txt is located in the parent directory of "build").

This produces the executable file 'currennt'.

You might get many warnings like 'Cannot tell what pointer points to, assuming
global memory space'. These are totally OK and there seems to be no way to
suppress them.


+=============================================================================+
| 2. Building on Windows                                                      |
+=============================================================================+
Building on Windows requires the following:
* Visual Studio 2010
* CUDA Toolkit 5.0
* Boost library 1.52.0 or higher

The NetCDF sources and required DLLs are already there, so you do not have to
download them yourself.

To build CURRENNT do the following:
1) Open 'currennt.sln' in Visual Studio 2010
2) Ensure that you installed the Boost library in 'C:/boost_1_52_0'. 
   NOTE: If you build Boost from source, make sure you build the x64 version
   (default is x86). This is done via
     bjam msvc architecture=x86 address-model=64 stage
   on the command line from the Boost source directory.
   If you want to use another version or another installation path, you need 
   to modify  the project files for the 'currennt_lib' and 'currennt' projects. 
   You can do this by changing the include and library search paths under the
   project settings or you can edit the project files directly which is much
   simpler. To do this, open the files 'currennt_lib/currennt_lib.vcxproj'
   and 'currennt/currennt.vcxproj' and replace every occurence of
   'C:/boost_1_52_0' with the path (may be a relative path) to your Boost
   installation.
3) Ensure that the solution configuration is set to 'Release' on 'x64' (in the
   toolbar right below the main menu)
4) Compile the whole solution via 'Build -> Build Solution' (F7)

You might get many warnings like 'Cannot tell what pointer points to, assuming
global memory space'. These are totally OK and there seems to be no way to
suppress them.

The build process creates currennt.exe in the Release directory.
If you have not installed the NetCDF library and its dependencies in your
global system path, you have to copy them from the CURRENNT source directory 
into the Release directory.


+=============================================================================+
| 3. Command line options                                                     |
+=============================================================================+
CURRENNT offers many command line options. You can get a short description of
them by running the program with the '--help' switch. A more detailed
description of the options can be found in the following subsections.

+-----------------------------------------------------------------------------+
| 3.1 Common options                                                          |
+-----------------------------------------------------------------------------+
--help
  Shows a brief description of every available option.
  
--options_file <file.cfg>
  This option reads the command line options from <file.cfg> in which every
  option is set in a separate line in the form of 'option = value'. This is a
  positional option, i.e. you can omit the name of the option and just run the
  trainer as 'currennt mynet.cfg'. Go into the directory 'examples' for an
  example of this feature. Options set on the command line have priority, i.e.
  if you set 'max_epochs = 5' in <file.cfg> and set 'max_epochs = 10' on the
  command line, the final value of this option will be 10.
  
--network <file.jsn>
  Sets the file which defines the structure of the neural network. If the file
  contains weights, these weights are used instead of initializing them using a
  random distribution. The default value of this option is 'network.jsn'. The
  structure of this file is explained in section 4.
  
--cuda <true/false>
  Enables CUDA. If you explicitly set this option to 'false' then the CPU will
  be used for all computations. This only makes sense for debugging or for
  training very small networks (<< 100,000 weights). CUDA is enabled by
  default.
  
--parallel_sequences <value>
  In order to speed up computations, the trainer processes a set of multiple
  training sequences, which we call a fraction, in parallel. I.e. if you set
  this value to 50 then the trainer will split the whole data set in fractions
  of 50 sequences each and calculates all those sequences concurrently. This
  is the ultimate option to boost the training speed but you should be careful
  because choosing a value != 1 means that you do not do true online learning
  any more if that's important for your task. If you can afford it, choose a
  value as high as possible (if you go too high the program tells you that
  there is not enough GPU memory available). This can speedup the training over
  20x compared to true online learning! Section 3.3 contains more information
  about this option.
  
--random_seed <value>
  Sets the seed for the random number generators. This option allows you to
  re-produce training results on the same machine. Useful for debugging but
  usually you should leave it at its default value of 0 which causes the
  program to create the seed itself. 

+-----------------------------------------------------------------------------+
| 3.2 Feed forward options                                                    |
+-----------------------------------------------------------------------------+
In 'feed forward mode', the trainer does not train a network but uses an
already trained network (specified by the --network option) and computes only
the forward pass to obtain the network outputs for certain input sequences.

--ff_input_file <file.nc>
  Sets the name of the data file (in NetCDF format, see section 5) that
  contains the network input sequences for the feed forward mode. Data files
  used in feed forward mode must have the same structure as training data files
  (see section 5), but the vectors of outputs or target classes may be missing.
  If you set static noise (via '--input_noise_sigma', see below) != 0, then the
  noise is also applied to the input sequences of this file. If you don't want
  that, ensure that static noise is set to 0.
  
--ff_output_file <file.csv>
  Sets the name of the output file that is produced in feed forward mode. The
  file format is CSV in which the first column contains the sequence tag from
  the input data file and the other columns contain the activations of the
  output neurons in temporally ascending order. I.e. if your network has an
  output layer with 2 neurons in it and the input data file contains two
  sequences with tags A and B whereas A has a length of 2 and B has a length of
  3 timesteps, then the resulting output file could look like
    A;0.1;0.2;-0.05;0.9723
    B;0.5;0.5;0.111;-0.22;0.82;0.3
  The default output file name is 'ff_output.csv'.

+-----------------------------------------------------------------------------+
| 3.3 Training options                                                        |
+-----------------------------------------------------------------------------+
--train <true/false>
  Enables training mode. If you want to train a network then you have have to
  set this option to 'true'. The default value is 'false' and hence the program
  is in feed forward mode by default.

--hybrid_online_batch <true/false>
  Enables hybrid online/batch training mode. If you set this value to 'true'
  then the trainer updates the network weights after every processed fraction
  (which consists of PS sequences, with PS being the number of parallel
  sequences set via the --parallel_sequences option). If you set this value to
  'false' which is the default value, then you're doing batch learning. If you
  want true online learning (weight updates after every sequence) then you have
  to set this value to 'true' and set '--parallel_sequences' to 1.

--shuffle_fractions <true/false>
  Enables shuffling of fractions during network training. Since only the
  fractions are shuffled, they always consist of the same sequences. The
  advantage of fraction shuffling over sequence shuffling (see below) is
  that it can be considerably faster if the lengths of your input sequences
  differ greatly. If you have roughly equally long sequences, prefer sequence
  shuffling. The default value is 'false'.

--shuffle_sequences <true/false>
  Enables shuffling of sequences within and across fractions. As opposed to
  fraction shuffling (see above), this mode truely randomizes the order of all
  sequences in each training epoch. If the lengths of your input sequences
  differ greatly, you can use fraction shuffling instead which should speed up
  the training at the cost of less randomized sequences.

--max_epochs <value>
  Sets the maximum number of training epochs. If the network does not converge
  after <value> epochs, the training is stopped, no matter what. The default
  value of this option is infinity.

--max_epochs_no_best <value>
  Causes the program to stop training if no new best error on the validation
  set could be achieved within the last <value> epochs. Be careful if you also
  use the '--validate_every' option (see below). If you choose to only
  calculate the validation error every 5 epochs, then the trainer might miss
  new best errors in epochs in which this error is not evaluated. The default
  value of this option is 20.

--validate_every <value>
  Causes the trainer to evaluate the validation error every <value> epochs.
  Choosing a value other than 1 speeds up training times but you might miss new
  best errors. The default value is 1.

--test_every <value>
  Causes the trainer to evaluate the error on the test set every <value>
  epochs. The default value is 1.

--optimizer steepest_descent
  Sets the type of optimizer to use. The default (and currently only) optimizer
  is a steepest descent optimizer with momentum ('steepest_descent'). Its
  parameters can be set via the options '--learning_rate' and '--momentum' (see
  below).

--learning_rate <value>
  Sets the learning rate for the steepest descent optimizer. The default value
  is 1e-5.

--momentum <value>
  Sets the momentum for the steepest descent optimizer. The default value is
  0.9.

--l2_norm <value>
  Sets the l2 norm for the steepest descent optimizer. The default value is 0.

--save_network <file.jsn>
  Sets the file name of the network file that is created when training is
  finished. The file will contain the network structure (the same as the one
  loaded from the file provided by '--network') and the trained weights. You
  can use this file to re-train the network by simply using it as value for
  the '--network' option. The trainer will then use these weights as initial
  weights instead of initializing them randomly.

+-----------------------------------------------------------------------------+
| 3.4 Autosave options                                                        |
+-----------------------------------------------------------------------------+
CURRENNT offers options to save to current training status after every
training epoch. Autosave is disabled by default but if you're training on a
large amount of data, you should really switch it on. If you don't and your
computer crashes during training, you have to start all over again.

The produced autosave files are in JSON format and can be used as network file
for the '--network' option. This means you can first train the network a few
epochs with configuration A and then continue training with a completely 
different configuration B by restarting the program with the autosave file.

WARNING: If an autosave file already exists, it will be overwritten!

--autosave <true/false>
  Enables autosave after every training epoch. The resulting files are named
  '<prefix>epoch0123.autosave' with <prefix> being the prefix configured via
  '--autosave_prefix'. Default is 'false'.
  
--autosave_prefix <value>
  Sets the filename prefix for autosave files. If you want your autosave files
  to be put in the directory 'mydir' and have filenames like
  'mynn-epoch012.autosave', then you should set <value> to 'mydir/mynn-'.
  The default prefix is empty, so the resulting files are put in the current
  working directory with names like 'epoch012.autosave'
  
--continue <file.autosave>
  Continues training from the provided autosave file. If you continue from an
  autosave file, all options set at the command line or in the options file are
  ignored as they are stored in the autosave file itself. If you really want to
  change these options, you can edit the autosave file since it is an ordinary
  JSON file. WARNING: If you do this (a) you have to be careful not to break
  anything (special characters, delimiters, ...) and (b) you lose the
  information about which options were used to obtain the autosave file in the
  first place. To continue training, you only need the autosave file and the
  training/validation/test data files.

+-----------------------------------------------------------------------------+
| 3.5 Data file options                                                       |
+-----------------------------------------------------------------------------+
--train_file <file.nc>
  Sets the NetCDF file which contains the sequences used as the training set
  during network training. The required structure of the data file is described
  in section 5. Does not have a default value and must be set in training mode.

--val_file <file.nc>
  Sets the NetCDF file which contains the sequences used as the validation set
  during network training. The required structure of the data file is described
  in section 5. Does not have a default value. If you don't provide a
  validation file, then training is done without it.

--test_file <file.nc>
  Sets the NetCDF file which contains the sequences used as the test set
  during network training. The required structure of the data file is described
  in section 5. Does not have a default value. If you don't provide a
  test file, then training is done without it.

--train_fraction <value>
  Sets the fraction of the training set that will be used. I.e. if you set this
  value to 0.5, then only the first half of the sequences from the file set via
  '--train_file' will be used.

--val_fraction <value>
  Sets the fraction of the validation set that will be used. I.e. if you set
  this value to 0.5, then only the first half of the sequences from the file
  set via '--val_file' will be used.

--test_fraction <value>
  Sets the fraction of the test set that will be used. I.e. if you set this
  value to 0.5, then only the first half of the sequences from the file set via
  '--test_file' will be used.

--input_noise_sigma <value>
  Sets the standard deviation of the static noise that is applied to the input
  sequences of the training and feed forward input set. Static noise is not
  applied to validation and test sets. The default value is 0.0 (no static
  noise)

+-----------------------------------------------------------------------------+
| 3.6 Weight initialization options                                           |
+-----------------------------------------------------------------------------+
The network weights are initialized randomly if the network file set via the
'--network' option does not contain any. They are initialized using either a
uniform or a normal distribution.

--weights_dist <uniform/normal>
  Sets the distribution to use for initializing the weights. Default is
  'uniform'.
  
--weights_uniform_min <value>
  Sets the minimum value for weights when using the uniform distribution.
  Default is -0.1.

--weights_uniform_max <value>
  Sets the maximum value for weights when using the uniform distribution.
  Default is 0.1.
  
--weights_normal_sigma <value>
  Sets the standard deviation for the normal distribution used to initialize
  the weights. Default is 0.1.

--weights_normal_mean <value>
  Sets the standard deviation for the normal distribution used to initialize
  the weights. Default is 0.
  
 
+=============================================================================+
| 4. Network configuration                                                    |
+=============================================================================+
The structure of a neural network is defined in a JSON file and passed to the
CURRENNT executable via the '--network' option. The structure of such files
is described in this chapter.

As an example, we will create a neural network for multiclass classification
tasks. Our training data consists of input sequences with patterns of 39 values
and target sequences with integers denoting 1 out of 51 target classes for each
timestep. Hence, we need a neural network with 39 input neurons and 51 output
neurons. Each of the output neurons shall represent the probability of one
timestep belonging to the corresponding target class. Hence, the sum over all
output neuron activations shall equal 1.

Our network shall contain a single hidden LSTM layer that is trained in both
positive and negative time directions (-> bidirectional LSTM). The hidden layer
shall contain 100 biased LSTM units. The final network file has the following
content:

{
    "layers": [
        {
            "size": 39,
            "name": "input_layer",
            "type": "input"
        },
		{
            "size": 100,
            "name": "hidden_layer",
            "bias": 1.0,
            "type": "blstm"
        },
        {
            "size": 51,
            "name": "output_layer",
            "bias": 1.0,
            "type": "softmax"
        },
        {
            "size": 51,
            "name": "postoutput_layer",
            "type": "multiclass_classification"
        }
    ]
}

It is very important to use the right braces and commas in the right places. If
the syntax is not 100% correct, CURRENNT will not accept the file. It is also
crucial to define the layers in the right order. The first layer must always be
the input layer (with type 'input') and the last layer must always be a post
output layer.

The so-called post output layer is always the very last layer in the network.
It does not have any trainable weights and its purpose is to evaluate the 
objective function during the forward pass and induce the error into the output
layer during the backward pass.

In this example, our network consists of an input layer with 39 neurons, a
hidden bidirectional LSTM layer, a feed forward output layer with 51 neurons
which use the softmax activation function and the post output layer suitable
for our multiclass classification task.

You can define an arbitrary number of hidden layers with different types and
sizes. See the directory 'examples' for more example network files.

Some important points:
* Input and post output layers do not require a bias value.
* You must provide a bias value for hidden and output layers. If you do not
  want to have bias connections, set the bias to 0.
* Layer names must be unique
* The post output layer must have the same size as the output layer

Available hidden and output layer types:
* feedforward_tanh     = Feed forward layer with tanh as activation function
* feedforward_logistic = Feed forward layer with a logistic activation function
* feedforward_identity = Feed forward layer with f(x)=x activation function
* softmax              = Feed forward layer with softmax activation function
* lstm                 = Unidir. LSTM layer with forget gates and peepholes
* blstm                = Bidir.  LSTM layer with forget gates and peepholes

Available post output layers:
* sse                                   = Sum of Squared Error objective function
* rmse                                  = Root Mean Squared Error objective function
* binary_classification                 = To be used with a ff logistic output layer
* multiclass_classification             = To be used with a softmax output layer
* connectionist_temporal_classification = To be used with a softmax output layer


+=============================================================================+
| 5. NetCDF data files                                                        |
+=============================================================================+
The data files used for CURRENNT are NetCDF (*.nc) files. An example is
provided in the directory 'examples' and its structure can be investigated by
running 'ncdump nc_file.nc'.

The following subsections describe the general structure of these files and the
required extensions for regression and classification tasks respectively.

+-----------------------------------------------------------------------------+
| 5.1 General structure                                                       |
+-----------------------------------------------------------------------------+
The data files need to contain the following dimensions:
* numSeqs         = Number of sequences
* numTimesteps    = Total number of timesteps
* inputPattSize   = Size of each input pattern (= number of input neurons)
* maxSeqTagLength = Maximum length of a sequence tag

The required variables are:
* char seqTags(numSeqs, maxSeqTagLength)    = Tag (name) for each sequence
* int seqLengths(numSeqs)                   = Length of each sequence
* float inputs(numTimesteps, inputPattSize) = Input Patterns

+-----------------------------------------------------------------------------+
| 5.2 Regression tasks                                                        |
+-----------------------------------------------------------------------------+
Additional dimensions:
* targetPattSize = Size of each output pattern (= number of output neurons)

Additional variables:
* float targetPatterns(numTimesteps, targetPattSize) = Target patterns

+-----------------------------------------------------------------------------+
| 5.3 Classification tasks                                                    |
+-----------------------------------------------------------------------------+
Additional dimensions:
* numLabels = Number of target classes

Additional variables:
* int targetClasses(numTimesteps) = Target classes (one for each timestep)


+=============================================================================+
| 6. Tests                                                                    |
+=============================================================================+
The directory 'tests' contains the scripts which check if the library produces
the correct output for a predefined input. Currently there is only one test
'test1' which traines a small deep RNN for one epoch and compares the trained
weights with reference weights obtained with RNNLIB. To run the test, execute
the 'run.py' Python script from within the directory 'test1'. The result will
be printed on the screen. Make sure that you have built CURRENNT as described
above before you run a test.


+=============================================================================+
| 7. Examples                                                                 |
+=============================================================================+
The directory 'examples' contains example configurations that show how to use
CURRENNT for classification and regression. 

The examples use an options file 'config.cfg' that contain the configuration
for CURRENNT and a JSON file 'network.jsn' to specify the network topologies.
Make sure that you have built CURRENNT as described above before you run the
examples.

The following examples are provided:

+-----------------------------------------------------------------------------+
| 7.1 Multi-class classification / Speech recognition                         |
+-----------------------------------------------------------------------------+
The directory 'speech_recognition_chime' contains
a noisy small vocabulary speech recognition task with 51 words from the
2nd CHiME Speech Separation and Recognition Challenge. 

The two *.nc files contain the training and validation sequences as 39
Mel-frequency cepstral coefficient (MFCC) features per 10ms time step. Only the
data for speaker 1 is provided in order to keep the download size reasonable. 

The two directories 'no_subsampling' and 'subsampling' contain different
network topologies (with and without activation subsampling layers) that are
trained by executing the shell script 'run.sh' in each of these folders. 
On Windows, the batch file 'run.bat' does the same.

For a detailed description of the task and learning procedure, refer to

Martin Wöllmer, Felix Weninger, Jürgen Geiger, Björn Schuller, Gerhard Rigoll:
"Noise Robust ASR in Reverberated Multisource Environments Applying Convolutive
NMF and Long Short-Term Memory", Computer Speech and Language, Special Issue on
Speech Separation and Recognition in Multisource Environments, Elsevier, 17
pages, 2013.

+-----------------------------------------------------------------------------+
| 7.2 Regression / Autoencoding                                               |
+-----------------------------------------------------------------------------+
The directory 'speech_autoencoding_chime' contains an example for a regression
task, mapping noisy speech MFCC features to clean speech MFCC features, similar
to autoencoding.  The input data is the same as for the speech recognition
task, just the targets are different.  You can run the example via 'run.sh' 
or 'run.bat' as in the above.

For a detailed description of the task and learning procedure, refer to

Felix Weninger, Jürgen Geiger, Martin Wöllmer, Björn Schuller, Gerhard Rigoll:
"The Munich Feature Enhancement Approach to the 2013 CHiME Challenge Using
BLSTM Recurrent Neural Networks", Proc. 2nd CHiME Speech Separation and
Recognition Challenge held in conjunction with ICASSP 2013, IEEE, Vancouver,
Canada, 01.06.2013.

currennt's People

Contributors

liuq901 avatar

Watchers

James Cloos avatar  avatar

Forkers

lijian8

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.