watsonyanghx / cnn_lstm_ctc_tensorflow


CNN+LSTM+CTC based OCR implemented using tensorflow.

License: MIT License

Python 100.00%

cnn ctc lstm ocr tensorflow


cnn_lstm_ctc_tensorflow's Issues

About the data

Hi,
Could you provide your data (or part of it) for those who want to try the code? A link would be enough. Thanks.

Some problems about this code

In fact, the "max_stepsize" in this code shouldn't be 64. It should be 12, which is the original "image_width" (180) after four halvings: 180/2/2/2/2, rounded up, gives 12. Remember that the core idea of CRNN+CTC is to split the image vertically into many slices, predict the class of each slice, and finally use CTC to decode the predicted sequence into the corresponding result. For example, "aaa_bb_c_" and "a__b_ccc" both correspond to the same label "abc". You can also read the paper for more details.

When I ran the wrong code on the author's dataset I got 98% accuracy, but I got a bad result on the VGGWord dataset. I finally got a good result after changing the code.

So why does this code work in your situation? I am very curious about this. Thank you.
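
The collapsing rule described above can be sketched in plain Python. This is only an illustration, assuming `_` stands for the CTC blank as in the example; `ctc_collapse` is a hypothetical helper, not a function from this repository.

```python
from itertools import groupby

BLANK = "_"  # assumption: "_" marks the CTC blank, as in the example above


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-slice prediction path into a label.

    Step 1: merge runs of identical symbols.
    Step 2: drop the blank symbols.
    """
    merged = [symbol for symbol, _group in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


if __name__ == "__main__":
    for path in ("aaa_bb_c_", "a__b_ccc"):
        print(path, "->", ctc_collapse(path))
```

Because repeats are merged before blanks are removed, a doubled character in the label can only be produced when a blank separates its two occurrences in the path.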

Problem with frozen pb

I trained the model with a custom dataset and got the checkpoint files. I froze the model using this script

import tensorflow as tf
def freeze_graph(model_dir, output_node_names, frozen_graph_name):
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We build the full file name of the frozen graph
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/" + frozen_graph_name + ".pb"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph in the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # We restore the weights
        saver.restore(sess, input_checkpoint)
        gd = sess.graph.as_graph_def()
        # We use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            gd,  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def

freeze_graph('./checkpoint', 'SparseToDense', 'ocr')  # ".pb" is appended inside freeze_graph

But when I'm loading the graph from the protobuf file, I'm getting this error:

ValueError: Input 0 of node import/cnn/unit-4/bn4/BatchNorm/AssignMovingAvg/cnn/unit-4/bn4/BatchNorm/moving_mean/AssignAdd was passed float from import/cnn/unit-4/bn4/BatchNorm/cnn/unit-4/bn4/BatchNorm/moving_mean/local_step:0 incompatible with expected float_ref.

I know this is a little off topic but any help is appreciated.
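
An error of this kind usually means that batch-norm update ops (`AssignAdd`, `AssignSub`, `RefSwitch`), which expect variable references, are still in the graph after the variables were turned into constants. Building the graph in inference mode, so that the update ops are never created, is usually the cleaner fix. A workaround that has circulated among TensorFlow 1.x users is to rewrite those nodes before calling `convert_variables_to_constants`. The sketch below shows that rewrite against the `op`, `input` and `attr` fields of graph nodes. `patch_batchnorm_nodes` is a hypothetical name, the demo uses stand-in objects instead of a real `GraphDef`, and nothing here has been checked against this repository's graph.

```python
from types import SimpleNamespace


def patch_batchnorm_nodes(graph_def):
    """Rewrite reference-typed batch-norm ops into value-typed ops, in place."""
    for node in graph_def.node:
        if node.op == "RefSwitch":
            node.op = "Switch"
            # Point moving_mean / moving_variance inputs at their read ops.
            for i in range(len(node.input)):
                if "moving_" in node.input[i]:
                    node.input[i] = node.input[i] + "/read"
        elif node.op in ("AssignSub", "AssignAdd"):
            node.op = "Sub" if node.op == "AssignSub" else "Add"
            # Sub/Add have no use_locking attribute.
            if "use_locking" in node.attr:
                del node.attr["use_locking"]
    return graph_def


if __name__ == "__main__":
    # Stand-in node for illustration only; a real call would pass
    # sess.graph.as_graph_def() from the freezing script.
    fake = SimpleNamespace(node=[
        SimpleNamespace(op="AssignAdd", input=["a", "b"], attr={"use_locking": True}),
    ])
    patch_batchnorm_nodes(fake)
    print(fake.node[0].op, fake.node[0].attr)
```

In the script above, the call would go between `gd = sess.graph.as_graph_def()` and `convert_variables_to_constants`.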

Feature extraction using CNN

Hi,
I would like to extract the feature sequence of a text-line image using the CNN.
How can I do this?
Thank you in advance for your help.

Does it support TensorFlow 2.0 on Windows 10?

How to run inference one image at a time?

Sorry... the batch size is used in the LSTM structure, so at inference time I have to send a whole batch of data: one real sample and the rest zeros.
So how can I run inference on one image at a time?
Thanks so much!

natural scene

Can this identify text in a natural scene, such as lettering on a billboard?

It seems many people use this approach. Is there a related paper you could point me to? I would like to study this area.

Thanks

What is the format of labels.txt?

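Test set accuracy problem

Hello, with my own data the accuracy on the test set is quite good, but when I checked the errors they are all missed detections of overlapping (repeated) characters. Here are some of the test set results: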
38387_1077.jpg 107
38388_100,005.jpg 100,05
38389_1077.jpg 107
38393_100,005.jpg 10,05
38394_1077.jpg 107
38640_131,005.jpg 131,05
38797_61,051.jpg 61,0651
39128_44,438.jpg 4,438
39545_157,333.jpg 157,33
4876_314,554.jpg 314,54
4878_268,866.jpg 268,86
5223_111,055.jpg 111,05
5276_211,904.jpg 21,904
546_32,772.jpg 32,72
571_128,883.jpg 128,83
664_148,733.jpg 148,73
672_144,150.jpg 14,150
7218_102,332.jpg 102,32
7221_100,267.jpg 10,2657
7654_77,132.jpg 7,132
7731_215,223.jpg 215,23
7787_111,702.jpg 11,702
7791_104,773.jpg 104,73
Is there a good solution for this problem? What should I adjust? Thanks.
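
One thing worth checking when repeated characters are dropped is how much slack the sequence length leaves. CTC needs a blank between two identical neighbouring characters, so a label needs at least its own length plus the number of adjacent repeats in time steps. The sketch below only illustrates that arithmetic. `min_ctc_steps` is a hypothetical helper, and the 12 time steps are an assumption taken from the `feature_w: 12` value reported for 180-pixel-wide images in other issues.

```python
def min_ctc_steps(label):
    """Smallest number of time steps from which CTC can emit `label`."""
    # Each pair of identical neighbours needs one blank between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


if __name__ == "__main__":
    time_steps = 12  # assumption: feature_w reported for 180-pixel-wide input
    for label in ("1077", "100,005", "314,554"):
        needed = min_ctc_steps(label)
        print(label, "needs at least", needed, "of", time_steps, "steps")
```

When the required count is close to the number of available steps, a wider feature map (less downsampling along the width) is one option to try.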

how to solve this problem?

Change the image width and height

Hello, I changed the image height and width from (60, 180) to (80, 500), and then I got an error:

InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [40,288], In[1]: [176,512]
[[Node: lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/concat, lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul/Enter)]]
[[Node: Mean/_37 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_950_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Is there anything else I should change to fix this error?
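
The shapes in the error are consistent with a hard-coded feature size. Assuming four stride-2 downsampling steps that round up (which matches the `feature_h: 4, feature_w: 12` reported for 60x180 input in another issue) and `num_hidden=128`, the sketch below reproduces both numbers from the message: 48 + 128 = 176 for the original size and 160 + 128 = 288 for 80x500. The helper names are hypothetical.

```python
import math


def downsample(size, steps=4):
    """Apply `steps` stride-2 reductions, rounding up each time (assumed)."""
    for _ in range(steps):
        size = math.ceil(size / 2)
    return size


def lstm_kernel_rows(height, width, num_hidden=128):
    """Rows of the LSTM kernel: input feature size plus hidden size."""
    feature_h = downsample(height)
    feature_w = downsample(width)
    return feature_h * feature_w + num_hidden


if __name__ == "__main__":
    print("60x180:", lstm_kernel_rows(60, 180))   # In[1] has 176 rows
    print("80x500:", lstm_kernel_rows(80, 500))   # In[0] has 288 columns
```

If this reading is right, the value 48 in `x.set_shape([shp[0], filters[3], 48])` (quoted in a later issue) would need to become 160 for 80x500 input, or be computed from the tensor shape instead of being fixed.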

Training does not begin:

Hi Guys

I have prepared a small dataset just to try out the network and see how it works. It seems able to load the dataset and prints "begin training", but after that it just stops and does nothing. Here is what I see on screen:
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ --val_dir=./imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 --mode=train

feature_h: 4, feature_w: 12
lstm input shape: [128, 12, 256]
loading train data
('size: ', 11)
loading validation data
size: 6

2018-05-29 11:47:19.300427: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-29 11:47:19.954690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-29 11:47:19.955398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.50GiB
2018-05-29 11:47:19.955416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-29 11:47:20.485722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-29 11:47:20.485760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-29 11:47:20.485768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-29 11:47:20.485968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3237 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
=============================begin training=============================
As you can see, training does not begin and I don't get any errors either.

How to inference in test images?

Hi, Dear All, thanks a lot for this great project!
I have trained the model with 32x128 OCR images successfully. I have a question, how do we test the new test images with the model? Using sliding window? I mean generally speaking, the images detected from the previous text detection branch are variable lengths, how do we input these images into the model to get the prediction? I thought about the sliding window, but could you please provide some advice or reference papers on this? Thanks.

about ctc cost nan and soaring avg_train_cost

I got a CTC cost NaN error after 30 epochs when training Chinese sentence OCR.
I can delay the error with a smaller learning rate and a bigger learning-rate decay.
But how can I prevent the CTC cost from becoming NaN?

Does this OCR work with Arabic text?

Hi I wanted to know whether this OCR works with joint Arabic text yet?

Can you provide the checkpoint of your trained model ?

I would appreciate it if you could provide the ckpt of your trained model.
Thanks. ^_^

Labels in infer mode

Hi @watsonyanghx, I found your code and I think I'm going to use it for license plate OCR, but I want to ask first:
In infer mode, do the images I want to test need to have labels?

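About the batch normalization layer

If you want the updates to moving_mean and moving_variance to be saved, it seems the following code is needed:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_op = optimizer.minimize(loss)
But I could not find anything like this in the code. Did the author forget to write it?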

Helper.py error

Hey guys

Recently I have been trying to work with this network, but when I prepare the data using helper.py I encounter some errors. I have not modified this file except for the image and label paths. Here is the error I get after running the script. I would appreciate it if anyone could help me with this:
Traceback (most recent call last):
  File "helper.py", line 116, in <module>
    image_path_list = load_img_path(images_path)
  File "helper.py", line 68, in load_img_path
    tmp.sort(key=lambda x: int(x.split('.')[0]))
  File "helper.py", line 68, in <lambda>
    tmp.sort(key=lambda x: int(x.split('.')[0]))
ValueError: invalid literal for int() with base 10: 'labels'
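
The traceback suggests that a non-image file named `labels` (probably `labels.txt`) sits in the image folder and reaches the integer sort key. Below is a minimal sketch of a more tolerant sort, assuming image names have the form `<number>.<ext>`. `sort_numeric_names` is a hypothetical helper, not the function in helper.py.

```python
import os


def sort_numeric_names(names):
    """Keep only names whose stem is an integer and sort them numerically."""
    numeric = [name for name in names if os.path.splitext(name)[0].isdigit()]
    return sorted(numeric, key=lambda name: int(os.path.splitext(name)[0]))


if __name__ == "__main__":
    listing = ["10.png", "labels.txt", "2.png", "0.png"]
    print(sort_numeric_names(listing))
```

Moving labels.txt out of the image directory avoids the problem without touching the code.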

IndexError: list index out of range

Hi, Thanks for sharing the great work!

I downloaded the data based on the suggestion of this link.

Then I tried running the training script, but encountered below error,

    train_feeder = utils.DataIterator(data_dir=train_dir)
  File "/home/levin/workspace/snrprj/CNN_LSTM_CTC_Tensorflow/utils.py", line 73, in __init__
    code = image_name.split('/')[-1].split('_')[1].split('.')[0]
IndexError: list index out of range

It looks to me that the script expects to get label for each image from its filename. So to get the code run properly and train the model, we will have to first rename the image files based on the labels.txt file, is this correct?
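
The failing line takes the label from the part of the file name between the `_` and the extension, so a name such as `0.png` has nothing at index 1. Below is a small sketch of that parsing with an explicit check. `label_from_filename` is a hypothetical helper, and the `<index>_<label>.<ext>` naming is inferred from the traceback and from file names quoted in other issues.

```python
import os


def label_from_filename(path):
    """Return the label encoded in '<index>_<label>.<ext>', or None."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    _index, separator, label = stem.partition("_")
    if not separator or not label:
        return None  # the name carries no label part
    return label


if __name__ == "__main__":
    for path in ("./imgs/train/38388_100,005.jpg", "./imgs/0.png"):
        print(path, "->", label_from_filename(path))
```

Files for which this returns `None` would need to be renamed from labels.txt first, which is what helper.py appears to be for.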

ValueError: need more than 2 values to unpack

I trained with 60x180 images and num_classes = 12, and then got the following:

loading train data, please wait---------------------
('get image: ', 15000)
loading validation data, please wait---------------------
('get image: ', 4999)
2017-11-06 13:16:08.547013: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-06 13:16:08.641450: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-06 13:16:08.641675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1031] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.29GiB
2017-11-06 13:16:08.641691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
=============================begin training=============================
No handlers could be found for logger "Traing for OCR using CNN+LSTM+CTC"
('batch', 99, ': time', 0.1522228717803955)
('batch', 199, ': time', 0.1701350212097168)
('batch', 299, ': time', 0.14639997482299805)
('batch', 99, ': time', 0.1512739658355713)
Traceback (most recent call last):
File "main.py", line 215, in

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 207, in main

File "main.py", line 111, in train

ValueError: need more than 2 values to unpack

the num of num_classes

Hi, thank you for your code. I am confused by num_classes.
+-* + () + 10 digit + blank + space
num_classes = 3 + 2 + 10 + 1 + 1
I understand that the CTC loss needs a special ctc_blank class, but there is no space in the labels, so why are two 1s added?
I noticed that in the training phase the code runs the part below to generate labels:
charset = '0123456789+-*()'
encode_maps = {}
decode_maps = {}
for i, char in enumerate(charset, 1):
    encode_maps[char] = i
    decode_maps[i] = char

SPACE_INDEX = 0
SPACE_TOKEN = ''
encode_maps[SPACE_TOKEN] = SPACE_INDEX
decode_maps[SPACE_INDEX] = SPACE_TOKEN
I mean there is no space in your labels, so if encode_maps[SPACE_TOKEN] = SPACE_INDEX is removed, would num_classes no longer need the extra 1?
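
The count in the question can be checked directly. The sketch below rebuilds the mapping from the issue and adds one class for the CTC blank. Placing the blank at the last index follows the documented convention of `tf.nn.ctc_loss` in TensorFlow 1.x. Reading index 0 as a padding or space slot is an assumption about how the repository uses it.

```python
charset = "0123456789+-*()"

# Characters occupy indices 1..15, exactly as in the snippet above.
encode_maps = {char: i for i, char in enumerate(charset, 1)}
decode_maps = {i: char for char, i in encode_maps.items()}

# Index 0 is taken by the empty "space" token.
SPACE_INDEX = 0
SPACE_TOKEN = ""
encode_maps[SPACE_TOKEN] = SPACE_INDEX
decode_maps[SPACE_INDEX] = SPACE_TOKEN

# One extra class for the CTC blank, which sits at the last index.
num_classes = len(encode_maps) + 1
blank_index = num_classes - 1

if __name__ == "__main__":
    print("num_classes:", num_classes, "blank index:", blank_index)
```

Because the characters are numbered from 1, the largest label index is 15. Dropping the space entry alone would therefore not allow a smaller `num_classes` unless the characters were also renumbered from 0.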

Feed Images with Variable length

Hey guys, I am trying to train the model on images of variable width. The dataset used here has fixed-size images, so the model runs without problems, but other datasets such as IAM cause errors because their images are not a fixed size. One technique that has been mentioned is zero padding. My question is: am I supposed to zero-pad the images themselves before feeding them to the network, and are there other ways to handle variable sizes?

Thanks
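
One common way to batch images of different widths is to pad them on the right to the widest image in the batch and to keep each image's true sequence length for the CTC loss. The sketch below uses plain nested lists to stay dependency-free. `pad_batch` is a hypothetical helper, and the downsampling factor of 16 is an assumption based on the four halvings discussed in other issues.

```python
import math


def pad_batch(images, downsample=16, pad_value=0):
    """Right-pad equal-height images to a common width.

    images: list of 2-D lists (rows of pixel values).
    Returns the padded images and the per-image sequence lengths,
    i.e. the number of feature columns that come from real pixels.
    """
    max_width = max(len(image[0]) for image in images)
    padded = [
        [row + [pad_value] * (max_width - len(row)) for row in image]
        for image in images
    ]
    seq_lens = [math.ceil(len(image[0]) / downsample) for image in images]
    return padded, seq_lens


if __name__ == "__main__":
    narrow = [[1] * 20 for _ in range(2)]   # 2 x 20 image
    wide = [[2] * 33 for _ in range(2)]     # 2 x 33 image
    batch, seq_lens = pad_batch([narrow, wide])
    print([len(image[0]) for image in batch], seq_lens)
```

Resizing every image to a fixed height while keeping the aspect ratio, then padding as above, is the usual companion step.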

Cost keeps decreasing while accuracy is always ZERO

Hello, everyone:
I ran this script on the author's dataset without problems, but I run into the trouble described in the title when I train the model on my own dataset.

Some pictures from my dataset:

These pictures are 30x500, with 25 characters in each. I used about 260k of them for training and 65k for validation.
The words in the pictures are randomly selected from drug information text, like this:

import random

with open('thistxt', 'r', encoding='utf-8') as f:
    # read the file and split it into a list of stripped lines
    all_lines = [line.strip() for line in f.read().split('\n')]
# join the lines into one string
data_str = ''.join(all_lines)
# pick a random start index and slice out one word
word_length = 25  # 25 characters per picture, as stated above
a_rand_num = random.randint(0, len(data_str) - word_length)
rand_word = data_str[a_rand_num:a_rand_num + word_length]
There are 196 unique characters in this text, so num_classes in my model is 196. Is my dataset not large enough, or is something else wrong? I'd appreciate it if anyone can help. Replies in Chinese are fine too.

How does the inference work?

I started training the model and stopped it manually with a keyboard interrupt to test inference, but when I run the command I get no errors and nothing happens.

can this model deal with dynamic length images?

hi,
the image height and width are fixed in this model.
How can I change it to deal with dynamic-length images?
THX!

raise _exceptions.DuplicateFlagError.from_flag

Hi, I tried to train using this command:

main.py --train_dir=../imgs/train/ --val_dir=../imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 -mode=train

But got this error:

Traceback (most recent call last):
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\main.py", line 14, in <module>
    import cnn_lstm_otc_ocr
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\cnn_lstm_otc_ocr.py", line 6, in <module>
    import utils
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\utils.py", line 43, in <module>
    tf.app.flags.DEFINE_string('log_dir', './log', 'the logging dir')
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\platform\flags.py", line 58, in wrapper
    return original_function(*args, **kwargs)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 241, in DEFINE_string
    DEFINE(parser, name, default, help, flag_values, serializer, **args)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 82, in DEFINE
    flag_values, module_name)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 104, in DEFINE_flag
    fv[flag.name] = flag
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_flagvalues.py", line 430, in __setitem__
    raise _exceptions.DuplicateFlagError.from_flag(name, self)
absl.flags._exceptions.DuplicateFlagError: The flag 'log_dir' is defined twice. First from absl.logging, Second from utils.  Description from first occurrence: directory to write logfiles into

How can it be fixed?

can this algorithm deal with dynamic length characters?

I ran this code successfully on both the training set and the validation set. Then I changed one of the validation images to add 2 characters: previously it was '7+0 * 9', and I changed it to '7+0 * 9+7'. But it was recognized as '7+(0 * 9)'. The font style of the added '+7' is the same as in the rest of the image (I copied it from there), so it is not a font style issue. I attached the image I made. Please take a look. Can you tell me why?

After saving a checkpoint partway through the first training run, how do I resume training the second time? Thanks.

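A small question

Where does the 48 in x.set_shape([shp[0], filters[3], 48]) come from? I switched to shp = x.get_shape().as_list() and then replaced it with shp[1] or shp[2], but both raise an error. Why? Does anyone know?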

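The file name 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

I read your CNN_LSTM_CTC_Tensorflow source code online and also downloaded the dataset. I would like to reproduce your results and have a few questions. Thanks!
1. I downloaded the source from https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow and also downloaded and extracted the dataset.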
D:\Tensorflow\CNN_LSTM_CTC_Tensorflow-master\imgs
Directory structure after extraction:
imgs\labels.txt
imgs\image_contest_level_1\

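2. After running helper.py, four files were generated in the imgs directory as expected: X_train.txt, X_val.txt, y_train.txt and y_val.txt.

X_train.txt: file names for training
X_val.txt: file names for testing

y_train.txt: answers for training
y_val.txt: answers for testing

But the following calls fail: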
cp_file(X_train, y_train, './imgs/train/')
cp_file(X_val, y_val, './imgs/val/')
"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py
['./imgs/image_contest_level_1/0.png' './imgs/image_contest_level_1/1.png'
'./imgs/image_contest_level_1/2.png' './imgs/image_contest_level_1/3.png'
'./imgs/image_contest_level_1/4.png' './imgs/image_contest_level_1/5.png'
'./imgs/image_contest_level_1/6.png' './imgs/image_contest_level_1/7.png'
'./imgs/image_contest_level_1/8.png' './imgs/image_contest_level_1/9.png']
Traceback (most recent call last):
File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 129, in
cp_file(X_train, y_train, './imgs/train/')
File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 102, in cp_file
shutil.copyfile(file_path, dest_filename)
File "D:\Program Files\Python365\lib\shutil.py", line 121, in copyfile
with open(dst, 'wb') as fdst:
OSError: [Errno 22] Invalid argument: './imgs/train/73091_(8+9)*4.png'

Process finished with exit code 1
The file name 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?
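
Windows does not allow `*` (nor `< > : " / \ | ?`) in file names, which is why the copy fails there. One possible workaround, sketched below, is to percent-encode the label when building the file name and to decode it again when reading labels back. This is an assumption about how helper.py and utils.py could be adapted, not what the repository does, and both function names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_label(label):
    """Percent-encode characters that are unsafe in file names.

    '(', ')' and '+' are kept readable; '*' and '/' are encoded.
    """
    return quote(label, safe="()+")


def decode_label(encoded):
    """Inverse of encode_label."""
    return unquote(encoded)


if __name__ == "__main__":
    label = "(8+9)*4"
    file_name = "73091_" + encode_label(label) + ".png"
    print(file_name)
    print(decode_label(encode_label(label)) == label)
```

The label parser would then need to call `decode_label` on the part of the name after the underscore.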

3. Running cnn_lstm_otc_ocr.py reports an error:

"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py
D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py:42: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (FLAGS.cnn_count <= count_, "FLAGS.cnn_count should be <= {}!".format(count_))
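
The warning points at a real bug: `assert (condition, message)` asserts a non-empty tuple, which is always true, so the check never fires. Below is a minimal sketch of the corrected form, using plain arguments as stand-ins for the repository's `FLAGS` values. `check_cnn_count` is a hypothetical wrapper.

```python
def check_cnn_count(cnn_count, count_):
    """Raise AssertionError when cnn_count exceeds count_."""
    # No parentheses around the pair: the condition comes first,
    # then the message after the comma.
    assert cnn_count <= count_, "FLAGS.cnn_count should be <= {}!".format(count_)


if __name__ == "__main__":
    check_cnn_count(4, 4)  # passes silently
    try:
        check_cnn_count(5, 4)
    except AssertionError as error:
        print("caught:", error)
```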

4. Running main.py:

"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/main.py

feature_h: 4, feature_w: 12
lstm input shape: [40, 12, 256]
loading train data
size: 0
loading validation data
size: 0

2018-07-12 11:02:14.624545: I c:\users\user\source\repos\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-07-12 11:02:14.844809: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.63GiB
2018-07-12 11:02:14.845239: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-12 11:02:16.119318: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-12 11:02:16.119683: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-07-12 11:02:16.119937: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-07-12 11:02:16.137500: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6410 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
=============================begin training=============================

Process finished with exit code 0

Running utils.py:
"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/utils.py

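Process finished with exit code 0

Thank you for your guidance. Do you have WeChat or QQ so that I can contact you to ask questions and learn? Thanks.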

why seq_len equals batch_size?

According to
cnn_lstm_otc_ocr.py:
self.seq_len = tf.fill([x.get_shape().as_list()[0]], feature_w)

that is, seq_len equals batch_size.
But why?
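
`tf.fill(dims, value)` builds a tensor of shape `dims` filled with `value`, so the line creates a vector with one entry per image in the batch, each entry equal to `feature_w`. The batch size is the length of `seq_len`, not its value. Below is a plain-Python equivalent, with the batch size and width taken from logs quoted in other issues. `fill` is a stand-in for the one-dimensional case, not the TensorFlow function.

```python
def fill(length, value):
    """1-D stand-in for tf.fill([length], value)."""
    return [value] * length


batch_size = 128  # from --batch_size=128 in the quoted training command
feature_w = 12    # from "feature_w: 12" in the quoted log

# One sequence length per image; every image has feature_w time steps.
seq_len = fill(batch_size, feature_w)

if __name__ == "__main__":
    print("entries:", len(seq_len), "value of each entry:", seq_len[0])
```

CTC ops take the sequence length per batch element, which is why the vector has `batch_size` entries.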

sharing the model

Hello,

Can you share the model with which you got 99% accuracy, for example by uploading it to Google Drive or Box?

Thanks!

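This Meizu-Baidu deep learning competition data seems to be from the preliminary round. Did anyone take part in the final? Does anyone have the final-round dataset? Thanks

This Meizu-Baidu deep learning competition data seems to be from the preliminary round. Did the author take part in the final? Did anyone else? Does anyone have the final-round dataset? Thanks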

Has anyone tried training with Chinese? Can I use Chinese characters to train it?

Mode: Infer does not work

I have trained the 100K images with 80:20 training to validation ratio. My model has completed 9 checkpoints. My test set consists of 40 images taken from the same validation set, just for testing the code. The test set is labeled 1 to 40. But when I pass this command :

python ./main.py --infer_dir=./imgs/infer/
--checkpoint_dir=./checkpoint/
--num_gpus=0
--mode=infer
the following error is produced:

2018-01-31 15:25:43.350523: W tensorflow/core/framework/op_kernel.cc:1198] Failed precondition: sequence_length(0) <= 12
Traceback (most recent call last):
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_1, _arg_lstm/Fill_0_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./main.py", line 185, in
tf.app.run()
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "./main.py", line 180, in main
infer(FLAGS.infer_dir, FLAGS.mode)
File "./main.py", line 155, in infer
dense_decoded_code = sess.run(model.dense_decoded, feed)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_1, _arg_lstm/Fill_0_1)]]

Caused by op 'CTCBeamSearchDecoder', defined at:
File "./main.py", line 185, in
tf.app.run()
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "./main.py", line 180, in main
infer(FLAGS.infer_dir, FLAGS.mode)
File "./main.py", line 115, in infer
model.build_graph()
File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 24, in build_graph
self._build_train_op()
File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 158, in _build_train_op
merge_repeated=False)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py", line 273, in ctc_beam_search_decoder
merge_repeated=merge_repeated))
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 77, in _ctc_beam_search_decoder
top_paths=top_paths, merge_repeated=merge_repeated, name=name)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1625, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_1, _arg_lstm/Fill_0_1)]]

How can I deal with this error and run the program correctly?

For further details, check out issue #8

How long do we need to train your model?

On your training dataset, how long do I need to train on a 1080 GPU?
