snowkylin / ntm
TensorFlow implementation of Neural Turing Machines (NTM), with its application on one-shot learning (MANN)
License: GNU Lesser General Public License v3.0
Hi there,
I'm trying to integrate a memory network into an A3C agent. For reference, I closely followed this implementation of A3C: https://github.com/awjuliani/DeepRL-Agents/blob/master/A3C-Doom.ipynb
My aim is to replace the LSTM layer with a MANN module. This might be a far-fetched question, but do you have any advice on refactoring your MANN implementation for this purpose?
I tried to train this network to identify 13 classes with 5 RGB images for each class.
One image is like this.
I modified the network to work with RGB input, but even after 100000 iterations I cannot see any kind of convergence.
Do you think this network is incapable of remembering the information in images like the one above? In the character data set the information is much less complex than in this type of image.
In this line you calculate accuracies over a batch for the first 10 elements of a list that has 50 entries in total. I think you are trying to measure how well the network uses its memory, that is, whether it predicts correctly once it has seen the same class several times.
But I would like a clear definition. Can you please describe what is measured here?
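One possible reading of the "1st ... 10th" accuracy columns printed during training is "accuracy on the k-th time a class appears within an episode". The sketch below illustrates that definition only; `instance_accuracy` is a hypothetical helper, not the repository's function, and the labels and predictions are made-up toy values.

```python
from collections import defaultdict

def instance_accuracy(labels, predictions, max_instance=10):
    """Accuracy on the k-th appearance of a class within one episode.

    labels / predictions: class ids for one sequence, in presentation order.
    Entry k-1 of the result is the accuracy over all k-th appearances.
    """
    seen = defaultdict(int)        # how often each class has appeared so far
    correct = [0] * max_instance
    total = [0] * max_instance
    for y, y_hat in zip(labels, predictions):
        seen[y] += 1
        k = seen[y]
        if k <= max_instance:
            total[k - 1] += 1
            correct[k - 1] += int(y == y_hat)
    return [c / t if t else 0.0 for c, t in zip(correct, total)]

# Toy episode: the first appearance of class 0 is misclassified,
# every later appearance is classified correctly.
labels = [0, 1, 0, 2, 1, 0]
predictions = [3, 1, 0, 2, 1, 0]
acc = instance_accuracy(labels, predictions, max_instance=3)
```

Under this definition a memory-based model would be expected to score near chance on the 1st instance and improve on later ones, which is why only the first few instances are usually reported.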
As I understand the code, it keeps one model for each sequence length and updates the weights of each model separately. Why don't we update the weights as in a dynamic sequence-to-sequence model?
Hello @snowkylin
I have a question regarding the meaning of "instance".
Does an instance mean a variant shape of a letter? For example, when we talk about training with the 10th instance, does that mean we use 10 variants of each letter to train the network?
Have I understood this correctly?
Thanks in advance.
Alaa
When I run 'python3 copy_task.py --rnn_num_layers 3 --rnn_size 64 --max_seq_length 10 --memory_size 20 --memory_vector_dim 8 --vector_dim 4', I get an error about argument conversion.
So I added "type=int" or "type=float" to the add_argument() calls in copy_task.py.
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', default="train")
    parser.add_argument('--restore_training', default=False)
    parser.add_argument('--test_seq_length', type=int, default=20)
    parser.add_argument('--model', default="NTM")
    parser.add_argument('--rnn_size', type=int, default=128)
    parser.add_argument('--rnn_num_layers', type=int, default=3)
    parser.add_argument('--max_seq_length', type=int, default=15)
    parser.add_argument('--memory_size', type=int, default=128)
    parser.add_argument('--memory_vector_dim', type=int, default=20)
    parser.add_argument('--batch_size', type=int, default=10)
    parser.add_argument('--vector_dim', type=int, default=8)
    parser.add_argument('--shift_range', type=int, default=1)
    parser.add_argument('--num_epoches', type=int, default=1000000)
    parser.add_argument('--learning_rate', type=float, default=1e-4)
    parser.add_argument('--save_dir', default='./save/copy_task')
    parser.add_argument('--tensorboard_dir', default='./summary/copy_task')
    args = parser.parse_args()
    if args.mode == 'train':
        train(args)
    elif args.mode == 'test':
        test(args)
I want to reproduce the result figure on page 11 of your slides. What arguments should I run copy_task.py with? Are the parameters above right?
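The conversion error described in this issue comes from how argparse treats values passed on the command line: without `type=`, they stay strings even when the default is an integer. A minimal reproduction, with the option names `--untyped` and `--typed` chosen only for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# The default is an int, but a value given on the command line stays a str.
parser.add_argument('--untyped', default=128)
# With type=int the command-line value is converted before it is stored.
parser.add_argument('--typed', type=int, default=128)

# Passing an explicit list avoids reading sys.argv in this example.
args = parser.parse_args(['--untyped', '64', '--typed', '64'])
untyped_kind = type(args.untyped).__name__   # 'str'
typed_kind = type(args.typed).__name__       # 'int'
```

The same applies to flags such as `--restore_training` with `default=False` and no `type`: passing `--restore_training False` stores the string `'False'`, which is truthy in Python.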
Hello,
I have trained your model on omniglot (train and validation combined) and am achieving similar results to what you're presenting. I am interested in running a single image through the network for prediction, but am having trouble wrapping my head around how to do it.
The network takes in samples, each sample contains a number of sequences. The default setting is batch size of 16 with seq_length of 50, so 50 * 16 images. After the model is trained and saved, the size of the memory block remains 50 sequences long.
How do I perform inference on a single image with a trained model containing a memory module of 50 sequences long? I am also aware that the labels are created arbitrarily once a batch is collected, how should I work with this for inference on a single image?
Thank you so much,
Paul
I get an error when I run 'python one_shot_learning.py'. I think it occurs because there is no data folder.
Could you explain how to get the Omniglot dataset and set up the 'data' folder?
Thank you in advance.
$ python3 one_shot_learning.py
2018-01-22 17:12:28.831022: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-22 17:12:28.972256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-22 17:12:28.972609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.90GiB freeMemory: 11.76GiB
2018-01-22 17:12:28.972622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
Namespace(augment=True, batch_size=16, debug=False, image_height=20, image_width=20, label_type='one_hot', learning_rate=0.001, memory_size=128, memory_vector_dim=40, mode='train', model='MANN', n_classes=5, n_test_classes=423, n_train_classes=1200, num_epoches=100000, output_dim=5, read_head_num=4, restore_training=False, rnn_num_layers=1, rnn_size=200, save_dir='./save/one_shot_learning', seq_length=50, shift_range=1, tensorboard_dir='./summary/one_shot_learning', test_batch_num=100, write_head_num=1)
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th batch loss
Traceback (most recent call last):
File "one_shot_learning.py", line 156, in <module>
main()
File "one_shot_learning.py", line 38, in main
train(args)
File "one_shot_learning.py", line 71, in train
label_type=args.label_type)
File "/mnt/data1/softgear/snowkylin_ntm_mann/ntm/utils.py", line 66, in fetch_batch
classes = [np.random.choice(range(len(data)), replace=False, size=n_classes) for _ in range(batch_size)]
File "/mnt/data1/softgear/snowkylin_ntm_mann/ntm/utils.py", line 66, in <listcomp>
classes = [np.random.choice(range(len(data)), replace=False, size=n_classes) for _ in range(batch_size)]
File "mtrand.pyx", line 1121, in mtrand.RandomState.choice
ValueError: a must be non-empty
$
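The `ValueError: a must be non-empty` in the traceback above is raised when `np.random.choice` is asked to sample from an empty range, which is consistent with the data list being empty. A minimal sketch of an earlier, more readable check follows; `list_class_dirs` is a hypothetical helper, and the "one sub-folder per class" layout is an assumption, not something this page confirms about the repository.

```python
import os

def list_class_dirs(data_dir):
    """Return the sub-directories of data_dir, or raise a readable error."""
    if not os.path.isdir(data_dir):
        raise FileNotFoundError("data folder not found: " + data_dir)
    classes = sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )
    if not classes:
        # Fail here instead of later, inside the random sampling call.
        raise ValueError("no class folders inside: " + data_dir)
    return classes
```

Calling such a check before building batches turns the sampling error into a message that names the missing folder.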
I understand what you did here: it is like erasing the least-used section of the memory before writing something new. But I couldn't find out why we do it.
Also, can you please explain why the number of least-used locations found equals the number of read heads?
When I run 'python3 copy_task.py', I get errors after batch 5000, as shown in the log below.
Should I create the ./save/copy_task/NTM directory myself?
batches 4900, loss 0.00113704
[[ 1. 1. 1. 1. 1. 1. 1. 0.]
[ 0. 0. 1. 1. 1. 0. 0. 0.]
[ 0. 1. 1. 0. 0. 1. 1. 0.]
[ 1. 0. 1. 1. 1. 0. 1. 1.]
[ 1. 0. 1. 0. 1. 0. 0. 0.]
[ 0. 1. 1. 0. 1. 0. 0. 1.]]
[[ 9.99992490e-01 9.99413490e-01 9.99982953e-01 9.99965310e-01
9.99992251e-01 9.97943461e-01 9.98543024e-01 1.08241523e-03]
[ 2.57530161e-07 2.93660793e-04 9.99461949e-01 9.99995112e-01
9.99989748e-01 1.24331400e-05 8.15727219e-07 3.13950721e-07]
[ 4.58355225e-06 9.99971747e-01 9.99837399e-01 4.42370627e-04
7.24922749e-04 9.99874592e-01 9.99990821e-01 8.03661169e-06]
[ 9.99992490e-01 3.12785669e-05 9.99974847e-01 9.99755561e-01
9.99990463e-01 2.00146256e-04 9.99897242e-01 9.99996305e-01]
[ 9.99901414e-01 1.38089956e-06 9.99881983e-01 2.78331245e-05
9.99562562e-01 2.01372593e-03 7.52124761e-05 1.45035756e-06]
[ 4.35841102e-05 9.99971151e-01 9.99959111e-01 2.36362739e-05
9.99997735e-01 2.10983402e-04 1.00746372e-04 9.99401808e-01]]
batches 5000, loss 0.00103774
2018-01-18 19:36:31.519871: W tensorflow/core/framework/op_kernel.cc:1192] Not found: ./save/copy_task/NTM; No such file or directory
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./save/copy_task/NTM; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, controller/basic_rnn_cell/bias/_7507, controller/basic_rnn_cell/kernel/_7509, init/init_M/_7511, init/init_r_0/_7513, init/init_state/_7515, o2o/o2o_b/_7517, o2o/o2o_w/_7519, o2p/o2p_b/_7521, o2p/o2p_w/_7523, optimizer/controller/basic_rnn_cell/bias/RMSProp/_7525, optimizer/controller/basic_rnn_cell/bias/RMSProp_1/_7527, optimizer/controller/basic_rnn_cell/kernel/RMSProp/_7529, optimizer/controller/basic_rnn_cell/kernel/RMSProp_1/_7531, optimizer/init/init_M/RMSProp/_7533, optimizer/init/init_M/RMSProp_1/_7535, optimizer/init/init_r_0/RMSProp/_7537, optimizer/init/init_r_0/RMSProp_1/_7539, optimizer/init/init_state/RMSProp/_7541, optimizer/init/init_state/RMSProp_1/_7543, optimizer/o2o/o2o_b/RMSProp/_7545, optimizer/o2o/o2o_b/RMSProp_1/_7547, optimizer/o2o/o2o_w/RMSProp/_7549, optimizer/o2o/o2o_w/RMSProp_1/_7551, optimizer/o2p/o2p_b/RMSProp/_7553, optimizer/o2p/o2p_b/RMSProp_1/_7555, optimizer/o2p/o2p_w/RMSProp/_7557, optimizer/o2p/o2p_w/RMSProp_1/_7559)]]During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1573, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: ./save/copy_task/NTM; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, controller/basic_rnn_cell/bias/_7507, controller/basic_rnn_cell/kernel/_7509, init/init_M/_7511, init/init_r_0/_7513, init/init_state/_7515, o2o/o2o_b/_7517, o2o/o2o_w/_7519, o2p/o2p_b/_7521, o2p/o2p_w/_7523, optimizer/controller/basic_rnn_cell/bias/RMSProp/_7525, optimizer/controller/basic_rnn_cell/bias/RMSProp_1/_7527, optimizer/controller/basic_rnn_cell/kernel/RMSProp/_7529, optimizer/controller/basic_rnn_cell/kernel/RMSProp_1/_7531, optimizer/init/init_M/RMSProp/_7533, optimizer/init/init_M/RMSProp_1/_7535, optimizer/init/init_r_0/RMSProp/_7537, optimizer/init/init_r_0/RMSProp_1/_7539, optimizer/init/init_state/RMSProp/_7541, optimizer/init/init_state/RMSProp_1/_7543, optimizer/o2o/o2o_b/RMSProp/_7545, optimizer/o2o/o2o_b/RMSProp_1/_7547, optimizer/o2o/o2o_w/RMSProp/_7549, optimizer/o2o/o2o_w/RMSProp_1/_7551, optimizer/o2p/o2p_b/RMSProp/_7553, optimizer/o2p/o2p_b/RMSProp_1/_7555, optimizer/o2p/o2p_w/RMSProp/_7557, optimizer/o2p/o2p_w/RMSProp_1/_7559)]]Caused by op 'save/SaveV2', defined at:
File "copy_task.py", line 101, in <module>
main()
File "copy_task.py", line 29, in main
train(args)
File "copy_task.py", line 45, in train
saver = tf.train.Saver(tf.global_variables())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1218, in __init__
self.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1227, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1263, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 748, in _build_internal
save_tensor = self._AddSaveOps(filename_tensor, saveables)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 296, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 239, in save_op
tensors)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1163, in save_v2
shape_and_slices=shape_and_slices, tensors=tensors, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): ./save/copy_task/NTM; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, controller/basic_rnn_cell/bias/_7507, controller/basic_rnn_cell/kernel/_7509, init/init_M/_7511, init/init_r_0/_7513, init/init_state/_7515, o2o/o2o_b/_7517, o2o/o2o_w/_7519, o2p/o2p_b/_7521, o2p/o2p_w/_7523, optimizer/controller/basic_rnn_cell/bias/RMSProp/_7525, optimizer/controller/basic_rnn_cell/bias/RMSProp_1/_7527, optimizer/controller/basic_rnn_cell/kernel/RMSProp/_7529, optimizer/controller/basic_rnn_cell/kernel/RMSProp_1/_7531, optimizer/init/init_M/RMSProp/_7533, optimizer/init/init_M/RMSProp_1/_7535, optimizer/init/init_r_0/RMSProp/_7537, optimizer/init/init_r_0/RMSProp_1/_7539, optimizer/init/init_state/RMSProp/_7541, optimizer/init/init_state/RMSProp_1/_7543, optimizer/o2o/o2o_b/RMSProp/_7545, optimizer/o2o/o2o_b/RMSProp_1/_7547, optimizer/o2o/o2o_w/RMSProp/_7549, optimizer/o2o/o2o_w/RMSProp_1/_7551, optimizer/o2p/o2p_b/RMSProp/_7553, optimizer/o2p/o2p_b/RMSProp_1/_7555, optimizer/o2p/o2p_w/RMSProp/_7557, optimizer/o2p/o2p_w/RMSProp_1/_7559)]]During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "copy_task.py", line 101, in <module>
main()
File "copy_task.py", line 29, in main
train(args)
File "copy_task.py", line 77, in train
saver.save(sess, args.save_dir + '/' + args.model + '/model.tfmodel', global_step=b)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1594, in save
raise exc
ValueError: Parent directory of ./save/copy_task/NTM/model.tfmodel doesn't exist, can't save.
softgear@lobe6:~/work/snowkylin_ntm_mann/ntm$
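The final `ValueError` in the log above says that the parent directory of the checkpoint path does not exist. One way to avoid it is to create the directory before saving. The sketch below uses only the standard library; `checkpoint_prefix` is a hypothetical helper name, and the path pieces mirror the `args.save_dir + '/' + args.model + '/model.tfmodel'` expression shown in the traceback.

```python
import os

def checkpoint_prefix(save_dir, model_name, filename="model.tfmodel"):
    """Create save_dir/model_name if needed and return the checkpoint prefix."""
    target_dir = os.path.join(save_dir, model_name)
    # exist_ok=True makes this safe to call on every save.
    os.makedirs(target_dir, exist_ok=True)
    return os.path.join(target_dir, filename)
```

The returned string could then be passed as the path argument of the `saver.save(...)` call at line 77 of copy_task.py. Creating ./save/copy_task/NTM by hand before training has the same effect.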
In Differentiable Neural Computers, a pre-output is first created from the input and the previous read vector; then, before the cell emits the real output, the current read vector is added as well. Here, as in the paper, the current read vector is not added. Am I right?
In this line I don't understand the k_stragedy. Can you please explain it? I understand that you use one part of the output to calculate the similarity and another for the alpha parameter. What is the remaining one?
In copy task v2, the TensorFlow 2 implementation, where do you compile, train, and save the model? Could you please let me know? After running task v2, how can I test the model?
Dear author,
I am confused by the following code snippet. Could you explain it?
Thanks.
num_parameters_per_head = self.memory_vector_dim + 1 + 1 + (self.shift_range * 2 + 1) + 1
num_heads = self.read_head_num + self.write_head_num
total_parameter_num = num_parameters_per_head * num_heads + self.memory_vector_dim * 2 * self.write_head_num
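One way to read the three lines above is against the addressing parameters of the NTM paper: each head gets a key vector, a key strength, an interpolation gate, a shift weighting and a sharpening factor, and each write head additionally gets an erase vector and an add vector. The mapping of the individual `+ 1` terms to specific parameters is an assumption, as are the head counts used in the example; `memory_vector_dim=20` and `shift_range=1` are the copy_task.py defaults quoted elsewhere on this page.

```python
def ntm_parameter_count(memory_vector_dim, shift_range, read_head_num, write_head_num):
    """Number of controller outputs needed to drive all heads."""
    per_head = (
        memory_vector_dim          # key vector k
        + 1                        # key strength beta
        + 1                        # interpolation gate g
        + (shift_range * 2 + 1)    # shift weighting s, one entry per allowed shift
        + 1                        # sharpening factor gamma
    )
    num_heads = read_head_num + write_head_num
    # Each write head also needs an erase vector and an add vector.
    erase_and_add = memory_vector_dim * 2 * write_head_num
    return per_head * num_heads + erase_and_add

# Assumed configuration: one read head and one write head.
total = ntm_parameter_count(memory_vector_dim=20, shift_range=1,
                            read_head_num=1, write_head_num=1)
```

With these values each head needs 26 numbers, two heads need 52, and the write head adds 40 more, giving 92 controller outputs in total.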
During training you plot something using the w_list in this line. What do you plot here?
Is it a plot of the read weight list against the write weight list?
Hi,
This is great work. I have read your slides and they are very clear. I have also read the paper, but I have a question: what is k_t (the key vector), and how is it obtained? The paper doesn't explain this clearly.
Best regards,
Albert
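In the NTM paper the key vector is emitted by the controller at every step, one per head; implementations typically take it as a slice of a linear projection of the controller output. It is then compared with every memory row by cosine similarity, scaled by the key strength and normalised with a softmax. The sketch below shows that content-addressing step in plain Python; the function names and the toy memory are assumptions made for illustration, and the key is simply given instead of being produced by a controller.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def content_weights(memory, key, beta):
    """Softmax over beta * cosine(key, memory row)."""
    scores = [beta * cosine_similarity(key, row) for row in memory]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy memory with three slots of dimension 2.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
key = [1.0, 0.0]                           # stands in for the controller's k_t
w = content_weights(memory, key, beta=5.0)
```

The slot that matches the key exactly receives the largest weight, the partially matching slot comes second, and the orthogonal slot receives the least.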
Hi, thanks for your code. I am using it for a sequence prediction task and have found a problem: the vectors in different memory slots tend to become the same. Do you know how to fix this?
Also, for MANN, I see that w_write(t) = w_read(t-1) + w_lt(t). Why use the read weight from timestep t-1 rather than from timestep t? I think w_read(t) is more closely related to w_write(t). Is there a reason for this?
Thanks
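As a reference point for this question, the write rule of the MANN paper (Santoro et al., 2016) is restated below in the paper's notation. It is not a description of this repository's code, and it includes a learned gate that the shorthand above leaves out.

```latex
\begin{align}
  w^{w}_{t} &= \sigma(\alpha)\, w^{r}_{t-1} + \bigl(1 - \sigma(\alpha)\bigr)\, w^{lu}_{t-1} \\
  M_{t}(i)  &= M_{t-1}(i) + w^{w}_{t}(i)\, k_{t} \\
  w^{u}_{t} &= \gamma\, w^{u}_{t-1} + w^{r}_{t} + w^{w}_{t}
\end{align}
```

Both terms on the right-hand side of the first equation come from step t-1. One plausible reading is that, in the paper's ordering, memory is written first and read afterwards, so the read weights of step t are not yet available when the write weights of step t are formed.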
In this line you use the self.num_head parameter.
As I understand the paper, at this stage we need to set the position with the smallest value in the usage vector to 1 and all other positions to 0.
Here the top_k operation returns the indices of the usage vector in descending order of their values (since k = memory size). From those, we need to find the position with the smallest value and set it to 1 and the others to 0.
But why do we need the num_head parameter?
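In the MANN paper the least-used weights mark the n slots with the smallest usage, where n is the number of reads to memory, so there is one least-used slot per read head and not a single slot overall. That may be why the head count appears here; whether the repository follows the paper exactly is an assumption. A small plain-Python sketch of the selection, with `least_used_weights` as a hypothetical helper:

```python
def least_used_weights(usage, n):
    """Return 1.0 for the n slots with the smallest usage, 0.0 elsewhere."""
    # Indices sorted by ascending usage; ties keep the lower index first.
    order = sorted(range(len(usage)), key=lambda i: usage[i])
    chosen = set(order[:n])
    return [1.0 if i in chosen else 0.0 for i in range(len(usage))]

usage = [0.9, 0.1, 0.5, 0.05, 0.7]
w_lu = least_used_weights(usage, n=2)   # two read heads assumed
```

With two heads, the two slots with usage 0.05 and 0.1 are marked; with n=1 only the single least-used slot would be, which matches the reading given in the question.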
I assume you used this data set, but it has only 964 classes,
whereas you use 1200 training classes and 423 testing classes.
Also, my training loss stays at around 80 for about 2000 epochs.
Lines 106 to 116 in 7db4068
It looks like there is a recurrence on all elements of the state dictionary. Wouldn't you need to apply tf.stop_gradient() to each of these items (except the controller output) to be consistent with the paper? At the very least there should be no recurrence on the memory M. Do you agree?
I get an error when I run copy_task.py --mode test.
First, I trained the NTM with this command:
python3 copy_task.py --rnn_num_layers 3 --rnn_size 64 --max_seq_length 10 --memory_size 20 --memory_vector_dim 8 --vector_dim 4 --num_epoches 10000
Next, I ran copy_task.py in test mode, but I got the following error: AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'
$ python3 copy_task.py --mode test
2018-01-19 18:51:24.433505: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-19 18:51:24.581123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-19 18:51:24.581508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.90GiB freeMemory: 11.76GiB
2018-01-19 18:51:24.581522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "copy_task.py", line 101, in <module>
main()
File "copy_task.py", line 31, in main
test(args)
File "copy_task.py", line 85, in test
saver.restore(sess, ckpt.model_checkpoint_path)
AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'
$
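The `AttributeError` above means that the `ckpt` object was `None`, which in TensorFlow 1.x is what the checkpoint-state lookup returns when it finds no `checkpoint` index file in the directory it is given. A minimal standard-library sketch for checking this before restoring follows; `has_checkpoint` is a hypothetical helper, and the directory layout `save_dir/model` is an assumption taken from the save path shown in another issue on this page.

```python
import os

def has_checkpoint(model_dir):
    """True if model_dir contains the 'checkpoint' index file TensorFlow writes."""
    return os.path.isfile(os.path.join(model_dir, "checkpoint"))
```

If this returns False for ./save/copy_task/NTM, training never saved a model there, for example because the directory was missing. Note also that test mode would likely need the same size arguments (`--rnn_size`, `--memory_size`, and so on) that were used for training, since the restored variables must match the shapes of the graph that is built.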
From this part onward, you initialize the memory matrix, the previous read vector, and the weight list as variables. That means these variables can also be updated by the optimizer. Isn't that a problem, given that these quantities should be dynamic?