aaron-xichen / cnn-lstm-ctc

63.0 63.0 28.0 23.51 MB

An implementation of LSTM and CTC to recognize simple English sentence images

Python 71.47% Shell 2.00% Jupyter Notebook 26.53%

cnn-lstm-ctc's Issues

Training process

When I train the model from scratch, the loss becomes NaN in the first epoch.
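One frequent cause of an infinite or NaN CTC loss, offered here as a possibility rather than the diagnosed cause in this issue, is a training sample whose label cannot be aligned to the network output. CTC needs at least one output frame per label symbol, plus an extra blank frame between every pair of identical consecutive symbols. The helpers below are hypothetical (not part of this repository) and check that condition; `n_frames` is assumed to be the number of time steps the network emits after any CNN downsampling, which you would have to compute from the model yourself.

```python
def min_ctc_frames(label):
    """Minimum number of output frames CTC needs to emit `label`.

    Each symbol needs one frame, and each pair of identical consecutive
    symbols (e.g. the "ll" in "hello") needs one extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def find_infeasible(samples):
    """Return the indices of (n_frames, label) pairs that CTC cannot align."""
    return [i for i, (n_frames, label) in enumerate(samples)
            if n_frames < min_ctc_frames(label)]
```

For example, `find_infeasible([(4, [1, 2, 2, 3]), (5, [1, 2, 2, 3])])` returns `[0]`: the label needs 4 symbol frames plus 1 blank for the repeated `2`. Samples flagged this way would have to be dropped or given more time steps (e.g. wider images or less downsampling).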

Can you share the '99.pkl' file with us?

Thanks for your time. Could you share the trained model file '99.pkl' with us, or give me a link to download it? Then I can build a dataset to test it without having to train it again.
Thank you very much.

Would you implement this with Keras?

Hello, Aaron

I want to run some OCR experiments and found your code; I think it could be a good starting point. Could you share some data and image samples?
BTW: would you consider implementing this with Keras (https://github.com/fchollet/keras)?

Best Regards!

loss: nan, iter:1/455(1, 1.076s)

Hello, why is the CTC loss NaN from the very beginning when I run your program? I would appreciate your guidance, thank you very much!
Here is the output:
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN not available)
C:\Anaconda\lib\site-packages\theano\tensor\signal\downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
loaded 29143 samples from D:\xugang\OCR\cnn-lstm-ctc-master\dataset\english_sentence\train_img_list.txt
loaded 2914 samples from D:\xugang\OCR\cnn-lstm-ctc-master\dataset\english_sentence\val_img_list.txt
building symbolic tensors(0.0799999237061)
setting parameters(0.0799999237061)
('n_classes: ', 95)
('multi-step: ', set([79625, 68250, 45500]))
building the model(0.0799999237061)
computing updates and function(0.240000009537)
using normal sgd and learning_rate:0.00999999977648
('bw_lstm_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_U', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('bw_lstm_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('bw_lstm_U', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('hidden_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('hidden_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
building training function(1.78999996185)
building validating function(29.6099998951)
begin to train(32.8609998226)
.epoch 1/200 begin(32.861)
[prefetch]height: 28, x_max_step:141.0, y_max_width:50
D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\layers\utee.py:137: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
x = np.zeros((batch_size, 1, height, x_max_len)). astype(config.floatX)
D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\layers\utee.py:138: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
x_mask = np.zeros((batch_size, x_max_len)).astype(config.floatX)
..loss: nan, iter:1/455(1, 1.076s)
..detect nan
..loss: nan, iter:1/455(1.076)
Traceback (most recent call last):
File "D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\train.py", line 150, in <module>
sys.exit()
SystemExit
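The two `VisibleDeprecationWarning`s at `utee.py` lines 137 and 138 come from passing `x_max_len` to `np.zeros` as a non-integer; the log prints `x_max_step:141.0`, which suggests the width is computed as a float. NumPy warned that this would become an error, and recent NumPy versions do reject float dimensions. A minimal sketch of batch padding with explicit integer shapes follows, assuming grayscale images stored as `(height, width)` arrays of equal height; `pad_batch` is a hypothetical name, not the repository's function.

```python
import numpy as np


def pad_batch(images, dtype="float32"):
    """Pad variable-width (height, width) images into one batch.

    Returns x with shape (batch, 1, height, max_width) and x_mask with shape
    (batch, max_width): 1.0 on real image columns, 0.0 on padding.
    """
    batch_size = len(images)
    height = images[0].shape[0]
    # np.zeros needs integer dimensions; int() makes that explicit even if
    # the width was produced by float arithmetic (the log shows 141.0).
    x_max_len = int(max(img.shape[1] for img in images))
    x = np.zeros((batch_size, 1, height, x_max_len), dtype=dtype)
    x_mask = np.zeros((batch_size, x_max_len), dtype=dtype)
    for i, img in enumerate(images):
        width = img.shape[1]
        x[i, 0, :, :width] = img   # copy the image, leave the rest as zeros
        x_mask[i, :width] = 1.0    # mark the valid time steps
    return x, x_mask
```

The mask lets the recurrent and CTC layers ignore padded columns, so images of different widths can share one batch.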

Getting IndexError

I am using this model on similar data, except that our images contain Sanskrit words. I created the train, val, and test files in the same format as the ones used in this model (i.e. the image name followed by the ordinals of its characters).
In our case, however, the number of characters (n_classes) is 118 instead of 95 in the original, and y_max_len is 200 instead of 50.
When I train the model, I get the following error:

loaded 25996 samples from ./dataset/train_img_list.txt
loaded 756 samples from ./dataset/val_img_list.txt
building symbolic tensors(0.84720993042)
('#Train samples: ', 25996)
('#Val samples: ', 756)
('#Train Iterations: ', 406)
('#Val Iterations: ', 11)
setting parameters(0.848186016083)
('n_classes: ', 118)
('multi-step: ', set([40600, 71050, 60900]))
building the model(0.848335027695)
Subtensor{int64}.0
Shape.0
computing updates and function(1.2518889904)
using normal sgd and learning_rate:0.00999999977648
('bw_lstm_b', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('fw_lstm_W', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('fw_lstm_U', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('fw_lstm_b', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('bw_lstm_W', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('bw_lstm_U', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('hidden_b', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
('hidden_W', <class 'theano.tensor.sharedvar.TensorSharedVariable'>)
building training function(2.72188806534)
building validating function(25.7086689472)
begin to train(27.9824080467)
.epoch 1/200 begin(27.982)
[prefetch]height: 150, x_max_step:900.0, y_max_width:200
Traceback (most recent call last):
File "train/train.py", line 148, in <module>
loss = train()
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
self.fn() if output_subset is None else
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 872, in rval
r = p(n, [x[0] for x in i], o)
File "/usr/local/lib/python2.7/dist-packages/theano/tensor/subtensor.py", line 2173, in perform
out[0] = inputs[0].__getitem__(inputs[1:])
IndexError: index 121 is out of bounds for axis 2 with size 119
Apply node that caused the error: AdvancedSubtensor(Reshape{3}.0, SliceConstant{None, None, None}, InplaceDimShuffle{0,x}.0, <TensorType(int32, matrix)>)
Toposort index: 463
Inputs types: [TensorType(float32, 3D), <theano.tensor.type_other.SliceType object at 0x7f6a4d6d9510>, TensorType(int64, col), TensorType(int32, matrix)]
Inputs shapes: [(900, 64, 119), 'No shapes', (64, 1), (64, 200)]
Inputs strides: [(30464, 476, 4), 'No strides', (8, 8), (800, 4)]
Inputs values: ['not shown', slice(None, None, None), 'not shown', 'not shown']
Outputs clients: [[Reshape{2}(AdvancedSubtensor.0, MakeVector{dtype='int64'}.0), Shape_i{2}(AdvancedSubtensor.0), Shape_i{1}(AdvancedSubtensor.0), Shape_i{0}(AdvancedSubtensor.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "train/train.py", line 84, in <module>
mid_layer_type = BLSTMLayer, forget=False)
File "/misc/me/rohits/aaron-cnn-lstm-ctc/layers/net.py", line 40, in __init__
blank = options['blank'], log_space = True)
File "/misc/me/rohits/aaron-cnn-lstm-ctc/layers/ctc_layer.py", line 25, in __init__
self.log_ctc(labels_len_const = labels_len_const)
File "/misc/me/rohits/aaron-cnn-lstm-ctc/layers/ctc_layer.py", line 94, in log_ctc
x1 = self.x[:, T.arange(n_samples)[:, None], self.y]
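The traceback points at `log_ctc` in `ctc_layer.py`, where the network output `self.x` (shape `(900, 64, 119)`: time steps × batch × outputs) is indexed by the labels `self.y` (shape `(64, 200)`). With `n_classes` = 118 the output layer has 119 entries, presumably the 118 classes plus the blank, so valid label ids are 0 to 118; a raw character ordinal such as 121 falls outside that range. One way to avoid this, sketched below with hypothetical helper names, is to map every symbol in the label files to a contiguous id and set `n_classes` to the vocabulary size. Placing the blank at index `n_classes` is an assumption of this sketch; check how `options['blank']` is set in the repository.

```python
def build_vocab(label_seqs):
    """Map each distinct symbol in the training labels to a contiguous id 0..K-1."""
    symbols = sorted({s for seq in label_seqs for s in seq})
    return {s: i for i, s in enumerate(symbols)}


def encode(label_seqs, vocab):
    """Replace every symbol (e.g. a Unicode code point) with its contiguous id."""
    return [[vocab[s] for s in seq] for seq in label_seqs]


def check_label_range(encoded, n_outputs, blank):
    """Raise ValueError if a label id is outside the output layer or equals the blank."""
    for i, seq in enumerate(encoded):
        for s in seq:
            if not 0 <= s < n_outputs:
                raise ValueError("sample {}: label {} outside [0, {})".format(i, s, n_outputs))
            if s == blank:
                raise ValueError("sample {}: label {} equals the blank index".format(i, s))


# Assumed layout: classes 0..K-1, blank at index K, output layer of size K + 1.
labels = [[2325, 2366], [2327]]   # raw code points read from the label files
vocab = build_vocab(labels)       # {2325: 0, 2327: 1, 2366: 2}
encoded = encode(labels, vocab)   # [[0, 2], [1]]
n_classes = len(vocab)
check_label_range(encoded, n_outputs=n_classes + 1, blank=n_classes)
```

The same vocabulary has to be saved and reused for the val and test files, and for decoding predictions back to characters.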
