flomlo / ntm_keras

An implementation of the Neural Turing Machine as a keras recurrent layer.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

ntm_keras's Introduction

Changelog 0.2:

  • API CHANGE: Controller models must now have linear activation. The activation of the NTM layer is selected by the new parameter "activation" (default: "linear"). Everything that interacts with the memory now uses very precise, hand-selected activations which assume that no prior non-linearity was applied. This requirement on the controller will probably be final.
  • There is now support for multiple read/write heads! Use the parameters read_heads and write_heads at initialisation (both default to 1).
  • The code around controller output splitting and activation was completely rewritten and cleaned of a lot of copy-paste code.
  • Unfortunately we lost backend neutrality: as tf.slice is used extensively, we would either have to obtain K.slice or do a case distinction over the backend. Use the old version if you need a backend other than TensorFlow, and please write me a message.
  • As fewer activations have to be computed, it is now a tiny little bit faster (~1%).
  • Stateful models do not work anymore. Actually they never worked; the testing routine was just broken. Will be repaired as soon as possible.

The Neural Turing Machine

Introduction

This code tries to implement the Neural Turing Machine, as described in https://arxiv.org/abs/1410.5401, as a backend-neutral recurrent Keras layer.

A standard experiment, the copy task, is provided as well.

At the end there is a TODO list. Help would be appreciated!

NOTE:

  • There is a nicely formatted paper describing the rough idea of the NTM and the implementation difficulties, and discussing the copy experiment. It is available in this repository as The_NTM_-_Introduction_And_Implementation.pdf.
  • You may want to change the LOGDIR_BASE in testing_utils.py to something that works for you or just set a symbolic link.

User guide

For a quick start on the copy task, type

python main.py -v ntm

while in a Python environment which has tensorflow, keras and numpy. Having tensorflow-gpu is recommended, as everything is about 20x faster. In my case this experiment takes about 100 minutes on an NVIDIA GTX 1050 Ti. The -v is optional and prints much more detailed information about the achieved accuracy, also after every training epoch. Logging data is written to LOGDIR_BASE, which is ./logs/ by default. View it with tensorboard:

tensorboard --logdir ./logs

If you were lucky and did not have a terrible run (that can happen, unfortunately), you now have a machine capable of copying a given sequence! I wonder if we could have achieved that any other way ...

These results are especially interesting compared to an LSTM model: Run

python main.py lstm

This builds a model of 3 LSTM layers and goes through the same testing procedure as above, which for me resulted in a training time of approximately 1 hour (same GPU) and roughly 100%, 100%, 94%, 50%, 50% accuracy at the respective test lengths. This shows that the NTM has advantages over LSTM in some cases, especially considering that the LSTM model has about 807,200 trainable parameters while the NTM has a mere 3100!
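
For orientation, a plausible baseline of that kind is sketched below; the layer sizes and output layer are guesses for illustration only, not the exact model built by main.py:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

input_dim, output_dim = 8, 8     # placeholders, not the repository's values

lstm_model = Sequential()
lstm_model.add(LSTM(256, return_sequences=True, input_shape=(None, input_dim)))
lstm_model.add(LSTM(256, return_sequences=True))
lstm_model.add(LSTM(256, return_sequences=True))
lstm_model.add(TimeDistributed(Dense(output_dim, activation='sigmoid')))   # per-timestep binary output
lstm_model.compile(loss='binary_crossentropy', optimizer='adam',
                   metrics=['binary_accuracy'], sample_weight_mode="temporal")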

Have fun playing around, maybe with other controllers? dense, double_dense and lstm are built in.

API

From the outside, this implementation looks like a regular recurrent layer in Keras. However, it has a number of non-obvious parameters:

Hyperparameters

  • n_width: This is the width of the memory matrix, i.e. the number of memory slots (it corresponds to n_slots in the code examples below). Increasing it increases computational complexity in O(n^2). The controller shape does not depend on it, making weight transfer possible.

  • m_depth: This is the depth of the memory matrix. Increasing it increases the number of trainable weights in O(m^2). It also changes the controller shape.

  • controller_model: This parameter allows you to place a Keras model of appropriate shape as the controller. The appropriate shape can be calculated via controller_input_output_shape. If None is given, a single dense layer will be used.

  • read_heads: The number of read heads this NTM should have. Has quadratic influence on the number of trainable weights. Default: 1

  • write_heads: The number of write heads this NTM should have. Has quadratic influence on the number of trainable weights, and even small numbers have a big impact. Default: 1. (A short instantiation sketch follows this list.)
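
A hypothetical instantiation with two read heads and one write head might look like this (all dimensions are placeholders; the other parameters mirror the Usage example below):

from ntm import NeuralTuringMachine as NTM

ntm_multi = NTM(8,                       # output_dim (placeholder)
                n_slots=50, m_depth=20, shift_range=3,
                read_heads=2, write_heads=1,
                controller_model=None,   # fall back to a single dense layer
                return_sequences=True,
                input_shape=(None, 8),   # input_dim placeholder
                batch_size=100)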

Usage

More or less minimal code example:

from keras.models import Sequential
from keras.optimizers import Adam
from ntm import NeuralTuringMachine as NTM

# Example values; adjust to your task.
input_dim = 8
output_dim = 8
learning_rate = 5e-4
clipnorm = 10

model = Sequential()
model.name = "NTM"

ntm = NTM(output_dim, n_slots=50, m_depth=20, shift_range=3,
          controller_model=None,     # None: a single dense layer is used as controller
          return_sequences=True,
          input_shape=(None, input_dim),
          batch_size=100)
model.add(ntm)

adam = Adam(lr=learning_rate, clipnorm=clipnorm)
model.compile(loss='binary_crossentropy', optimizer=adam,
              metrics=['binary_accuracy'], sample_weight_mode="temporal")
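
For completeness, a toy training call matching the shapes above (the random data is purely illustrative; the repository generates real copy-task data in its test code). Note the 2D sample_weight array implied by sample_weight_mode="temporal":

import numpy as np

batch_size, timesteps = 100, 10          # batch_size must match the value given to the NTM layer
x = np.random.randint(0, 2, size=(batch_size, timesteps, input_dim)).astype('float32')
y = np.random.randint(0, 2, size=(batch_size, timesteps, output_dim)).astype('float32')
sample_weight = np.ones((batch_size, timesteps))   # one weight per timestep ("temporal" mode)

model.fit(x, y, batch_size=batch_size, epochs=1, sample_weight=sample_weight)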

What if we instead want a more complex controller? Design it yourself, e.g. a double LSTM:

from keras.models import Sequential
from keras.layers import LSTM

# batch_size as above; controller_input_dim and controller_output_dim are
# computed via controller_input_output_shape (see below).
controller = Sequential()
controller.name = "lstm_double"          # any descriptive string
controller.add(LSTM(units=150,
                    stateful=True,
                    return_sequences=True,   # needed so the second LSTM receives 3D input
                    implementation=2,        # best for GPU; other settings might not work
                    batch_input_shape=(batch_size, None, controller_input_dim)))
controller.add(LSTM(units=controller_output_dim,
                    activation='linear',
                    stateful=True,
                    implementation=2))       # best for GPU; other settings might not work

controller.compile(loss='binary_crossentropy', optimizer=adam,
                   metrics=['binary_accuracy'], sample_weight_mode="temporal")

And now use the same code as above, only with controller_model=controller.

Note that we used linear as the activation of the last layer! This is of critical importance. The activation of the NTM layer itself can be set via the parameter activation (default: linear).

Note that the correct controller_input_dim and controller_output_dim can be calculated via controller_input_output_shape:

from ntm import controller_input_output_shape
controller_input_dim, controller_output_dim = controller_input_output_shape(
            input_dim, output_dim, m_depth, n_slots, shift_range, read_heads, write_heads)

Also note that every stateful controller must carry around its own state, as was done here with

stateful=True

TODO:

  • Arbitrary number of read and write heads
  • Support for masking, and maybe dropout; one has to reason about it theoretically first.
  • Support for get and set config to better enable model saving
  • A bit of code cleaning: especially the controller output splitting is ugly as hell.
  • Support for arbitrary activation functions would be nice; currently they are restricted to sigmoid.
  • Make it backend neutral again! Some testing might be nice, too.
  • Maybe add the other experiments of the original paper?
  • Mooaaar speeeed. Check whether there are blatant performance optimizations possible.

ntm_keras's People

Contributors

flomlo, keldlundgaard


ntm_keras's Issues

return_sequences=False

Hi,
I'm trying to use your NTM implementation to do sequence classification. Footnote 1 of your PDF says that return_sequences=False is broken. But is this still true? By setting that parameter, I am getting tensors of the right shape, i.e. [batch_size, output_size] with no time dimension.

Performance is not great yet, but I want to make sure that's due to finicky hyper-parameters and not something still being broken with return_sequences.

Thanks!

I cannot load saved model

I implemented the NeuralTuringMachine layer in my model and trained it. It works fine, but when I tried to load the saved model, I got "Unknown layer NeuralTuringMachine". I have tried different ways to load the model, including model_from_json and load_model, all with the same problem.
Then I used the parameter custom_objects:
model=model_from_json(open(model_destination + ".arch.json").read(), custom_objects={"NeuralTuringMachine":NeuralTuringMachine})
But got an error:
/home/ymeng/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in from_config(cls, config)
   1250         A layer instance.
   1251         """
-> 1252         return cls(**config)
   1253
   1254     def count_params(self):

TypeError: __init__() takes at least 2 arguments (2 given)

However, I can use load_weights if I build an untrained model first (basically creating a new model each time).

I suppose it is just a configuration problem and can be fixed. However, I wonder if you have ever tried loading a saved model? The examples only train models but never load them.
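
A minimal sketch of that workaround (the weights filename suffix is a placeholder; output_dim, input_dim and model_destination are whatever was used at training time):

from keras.models import Sequential
from ntm import NeuralTuringMachine as NTM

# Recreate the exact same (untrained) architecture in code ...
model = Sequential()
model.add(NTM(output_dim, n_slots=50, m_depth=20, shift_range=3,
              controller_model=None, return_sequences=True,
              input_shape=(None, input_dim), batch_size=100))

# ... then restore only the trained weights.
model.load_weights(model_destination + ".weights.h5")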

Memory tends to be the same

Hi, thanks for your code. I use it for a sequence prediction task, and I find there is a problem: the vectors in different memory slots tend to be the same. Do you know how to fix this?

_run_controller about lstm

I'm confused about _run_controller: if I use the LSTM as controller, the dimension of controller_input should be 3, so controller_input = controller_input[:, None, :].

The LSTM call function expects the shape [batch_size, time_length, input_dim], but here we give it the shape [batch_size, 1, input_dim]. Can it still learn the error through time?

Do we need controller.compile?

Hi Florian,

Thanks for your great code! I have a question though:
Do we really need this line of code (line 81):

controller.compile(loss='binary_crossentropy', optimizer=sgd, metrics = ['binary_accuracy'], sample_weight_mode="temporal")

from https://github.com/flomlo/ntm_keras/blob/master/main.py ?

I don't really think we need this. If we did, it would really confuse me. We already specify (line 43):

model.compile(loss='binary_crossentropy', optimizer=sgd, metrics = ['binary_accuracy'], sample_weight_mode="temporal")

from https://github.com/flomlo/ntm_keras/blob/master/model_ntm.py

The controller is just a part of the NTM, and we should specify the "compile" part only for the NTM model. Am I correct?

I hope I made my question clear. Again, thanks a lot for your great work!

Best,

Cuong

question about statefulness

Apologies - it seems like the NTM just needs to be trained on more epochs than some of the other models I was using. This is no longer an issue.

I did have another question about the "stateful=True" for the controller though -- is this necessary? And could you explain your comment that "Stateful models do not work anymore"? Thank you!

Dimensionality error of NTM

Your code runs fine. However, when I changed parameters like input_dim and output_dim and used the 'LSTM' controller, I got an error.
For example, if input_dim=600, output_dim=13, n_slots=100, m_depth=256, batch_size=100, shift_range=3, read_heads=1, write_heads=1
model = model_ntm.gen_model(input_dim=input_dim, output_dim=output_dim, batch_size=batch_size, controller_model=controller, read_heads=read_heads, write_heads=write_heads, activation="sigmoid")

leads to:
in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, debug_python_shape_fn, require_shape_fn)
    674       missing_shape_fn = True
    675     else:
--> 676       raise ValueError(err.message)
    677
    678   if missing_shape_fn:

ValueError: Dimensions must be equal, but are 620 and 856 for 'neural_turing_machine_9/lstm_7/MatMul' (op: 'MatMul') with input shapes: [100,620], [856,4196].

Confused about the entire dimension thing...

What should the input and output dimensions be for the test dataset? For example, I have a test dataset of dimensions (94, 1, 1). What should the output dimensions be? I am getting the error: Error when checking input: expected neural_turing_machine_22_input to have shape (None, 94) but got array with shape (1, 1)
