xiaomaxiao / keras_ocr



OCR text detection and recognition implemented in Keras

License: Other

Jupyter Notebook 94.82% Python 5.18%

keras_ocr's Introduction

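Overview

OCR text detection and recognition implemented in Keras, with a TensorFlow backend.

Environment: Windows 10, NVIDIA Titan X.

Recognition

Dataset link: https://pan.baidu.com/s/1jJWfDmm password: vh8p (3M+ samples of Chinese characters, English letters, and digits; the corpus is imbalanced)
CRNN: VGG + BLSTM + BLSTM + CTC
densenet-ocr: DenseNet + CTC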

Network	GPU time	Accuracy	Model size
CRNN	60 ms	0.972	
DenseNet + CTC	8 ms	0.982	18.9 MB

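Detection

Link: https://pan.baidu.com/s/1oEWTrx20G41iNaJYF-xa6w extraction code: szj7 (ICDAR 2013 plus a small amount of Chinese data)
CTPN:

Although most of the dataset is English, the model also performs well at locating Chinese text.
If you have an annotated Chinese dataset and are willing to share it, please open an issue.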

References

[1] https://github.com/eragonruan/text-detection-ctpn

[2] https://github.com/senlinuc/caffe_ocr

keras_ocr's People

Contributors

Stargazers

Watchers

Forkers

apple1987 fendaq zouxiaoyuonly dafeix zmxheart zfxu fitrialif simmoncn torusknot38 coinlq hardsoft2023 lxj0276 qwzhong1988 courao rkshuai wyw636 xgmiao tobechao juventi baicol zzzzzzrc lvpython yuanzhenjie leftstone2015 desont xyt2008 szad670401 gehongpeng zhdy008 zhishan332 pustar shubhampachori12110095 william-stocks anazou mukever cronaldo1997 szdree leidaguo lovaster matrixplayer xikunlun001 oujunke delonzhou verydemo daoyijushi jeffery-zhougang wangyu190810 gegetang liuxin949490 alan000 tonyxia2016 bison31205 tianzhongsong adiffm iwanggp wkhunter xinghalo josephuan nwf5d machinelp hukongtao frankfqchen iwii0425 ctolib happog 2020zyc springtty wjl198435 mrkamizhou phrmgb sadknight0001 hexiaosong aurora11111 wjj0317 thorpham zjz5250 phychaos tiravata chenbocqu yishuihanhan luanrp cyneck frankgoahead roughsoft liumihan dreadlord1984 wang422003 machine4life vivelvlv dencechen gothicfox yejiachen huxiao64 hbulaoma alwc styjb qqgeogor kai2020-hello magiccodess wjianxz

keras_ocr's Issues

Could you make model & dataset available on Google Drive or Dropbox?

Thank you for sharing your implementation.
Could you please make the model and dataset available on Google Drive or Dropbox?
This would help people outside China get the files more easily.

Nice! But could you give an example of how to test a single picture?

Nice! But could you give us an example of how to load the model and test a single picture? I'm not familiar with Keras. Thanks!
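A minimal sketch of the last step of testing one picture, not the repo's own test code: after loading the weights and calling `predict` on a preprocessed image, the recognition model yields one probability vector per time step, and best-path (greedy) CTC decoding turns that into text. It assumes an output of shape (time_steps, num_classes) for a single image and that the blank is the last class, which is the convention of Keras's `K.ctc_batch_cost`; `ctc_greedy_decode` is a hypothetical helper name.

```python
import numpy as np

def ctc_greedy_decode(probs, charset, blank=None):
    """Best-path CTC decoding of one sequence of per-frame class probabilities.

    probs:   array of shape (time_steps, num_classes).
    charset: string whose i-th character is class i (non-blank classes).
    blank:   index of the CTC blank; defaults to the last class (assumption,
             matching the Keras ctc_batch_cost convention).
    """
    if blank is None:
        blank = probs.shape[1] - 1
    best = probs.argmax(axis=1)  # most likely class at every time step
    out = []
    prev = None
    for k in best:
        # Keep a frame only if it is not blank and differs from the previous frame;
        # a blank between two equal symbols lets both survive.
        if k != blank and k != prev:
            out.append(charset[k])
        prev = k
    return ''.join(out)
```

With a loaded prediction model this would be called roughly as `ctc_greedy_decode(model.predict(x)[0], char)`, where `model` and `x` (a batch holding one preprocessed image) are assumed names, and the image preprocessing must match what was used in training.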

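Poor results when testing on a new dataset; could you explain why? Thanks

Hi, I tested the crnn part of your code.

Code used: https://github.com/xiaomaxiao/keras_ocr/blob/master/densent_ocr/densenet-ocr-test.ipynb
Model: "weights-densent-32-0.9846.hdf5" from your Baidu Cloud share, but the accuracy I measured was only about 70%.

Is this because my dataset differs from the one you trained on? If I generate a dataset from my own corpus with SynthText and retrain, would the results be better?

Thank you very much!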

Could you tell me whether the jsonpath passed to json.load() is the train.txt file from your dataset or some other file?

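labels = np.ones([batchsize, maxlabellength]) * 10000: why are the labels built like this?

Why initialize them this way? Also, in
labels[i, :len(str)] = [int(i) - 1 for i in str], shouldn't the comprehension use a different variable, since i is already used before it?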
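A reading of that code, not confirmed by the author: `K.ctc_batch_cost` only reads the first `label_length` entries of each row, so the padding value is never treated as a class; 10000 simply sits outside the valid index range. On the second point, in Python 3 a list comprehension has its own scope, so `[int(i) - 1 for i in str]` does not overwrite the outer `i`; in Python 2 it would, and the subscript `labels[i, ...]`, which is evaluated after the right-hand side, would then use the wrong value. The sketch below (hypothetical `build_label_batch`, assuming each label is a list of 1-based index strings) sidesteps the question by using distinct names.

```python
import numpy as np

def build_label_batch(batch_labels, maxlabellength, pad_value=10000):
    """Pack 1-based label indices into a padded (batch, maxlabellength) matrix.

    Assumption: each entry of batch_labels is a list of index strings such as
    ['12', '7']; subtracting 1 turns them into 0-based class indices.
    Entries past each label's length keep pad_value, which CTC never reads.
    """
    labels = np.full((len(batch_labels), maxlabellength), pad_value, dtype=np.float32)
    label_length = np.zeros((len(batch_labels), 1), dtype=np.int64)
    for row, tokens in enumerate(batch_labels):
        # 'row' and 'tok' are separate names, so nothing is shadowed in any Python version.
        indices = [int(tok) - 1 for tok in tokens]
        labels[row, :len(indices)] = indices
        label_length[row] = len(indices)
    return labels, label_length
```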

Would you mind sharing the weights files of ctpnlstm?

Would you mind sharing the weights files of ctpn?
Since my computer has CPU only, training large datasets is really very hard.
I mean this file: E:\deeplearn\ctpn2018\model\weights-ctpnlstm-19.hdf5
I'll appreciate it very much if you could lend me a hand.

What are the output shapes of the lambda_1 and blstm layers in CTPN?

In the CTPN model, I've replaced the bidirectional GRU unit with an LSTM.

x2 = Bidirectional(LSTM(128,return_sequences=True), name='blstm')(x1)

But the output shape of blstm really confuses me.
Take an image as input; let's say its shape is [512,512,3]. Then the output shape of rpn_conv1 is [None,32,32,512] and the output shape of lambda_2 is [None,32,32,256]. So what are the shapes of lambda_1 and blstm?

rpn_conv1 (Conv2D)              (None, 32, 32, 512)
-----------------------------------------------------
lambda_1 (Lambda)               (None, None, 512)
-----------------------------------------------------
blstm (Bidirectional)           (None, None, 256)
-----------------------------------------------------
lambda_2 (Lambda)               (None, 32, 32, 256)

Below is the source code of reshape and reshape2. My guess is that, since the batch size is 1, rpn_conv1.output.shape is [1,32,32,512]; after the reshape function, lambda_1.output.shape is [32,32,512]. Then blstm.output.shape is [32,32,256] and lambda_2.output.shape is [1,32,32,256].

def reshape(x):
    import tensorflow as tf 
    b = tf.shape(x)
    x = tf.reshape(x,[b[0]*b[1],b[2],b[3]])
    return x


def reshape2(x):
    import tensorflow as tf 
    x1,x2 = x
    b = tf.shape(x2)
    x = tf.reshape(x1,[b[0],b[1],b[2],256])
    return x

However, when I tried to replace the two Lambda layers with Keras's built-in Reshape layers, I got this error:

ValueError: total size of new array must be unchanged

which suggests that the total size after lambda_1 is different from 1x32x32x512, and thus reshape.output.shape is not [32,32,512]. But this conflicts with what I understand from the source code of the reshape function. Could you please tell me the actual output shapes of the lambda_1 and blstm layers? Thanks a lot.
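Under the batch-size-1 assumption in the question, the shapes can be checked with plain NumPy, mirroring what `reshape` and `reshape2` do with `tf.reshape`. Keras prints `(None, None, 512)` for lambda_1 because the target shape is computed at run time from `tf.shape`. A likely cause of the `ValueError`, offered as an assumption: `keras.layers.Reshape` takes a `target_shape` that excludes the batch axis, so it can only rearrange the 32x32x512 values inside each sample and cannot fold the row axis into the batch axis the way `tf.reshape` can. The function names below are hypothetical.

```python
import numpy as np

def fold_rows_into_batch(x):
    """Mimic reshape(): (batch, H, W, C) -> (batch*H, W, C), one sequence per feature-map row."""
    b, h, w, c = x.shape
    return x.reshape(b * h, w, c)

def unfold_rows(seq_out, like):
    """Mimic reshape2(): (batch*H, W, F) -> (batch, H, W, F), taking batch/H/W from `like`."""
    b, h, w = like.shape[:3]
    return seq_out.reshape(b, h, w, seq_out.shape[-1])

conv = np.zeros((1, 32, 32, 512), dtype=np.float32)  # rpn_conv1 output for one 512x512 image
seqs = fold_rows_into_batch(conv)
print(seqs.shape)  # (32, 32, 512): 32 sequences, each 32 steps of 512 features

# Stand-in for the BLSTM output: 128 units per direction, concatenated -> 256 features.
blstm_out = np.zeros(seqs.shape[:2] + (256,), dtype=np.float32)
print(unfold_rows(blstm_out, conv).shape)  # (1, 32, 32, 256)
```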

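Training dataset

Hi, was the training data generated with SynthText? Could you provide the training set? Thanks.

Training accuracy reaches 96%, but predictions are garbled

Using the dataset you provided, training accuracy reached 96%, but every prediction is garbled. Why does this happen? Thanks!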

DistutilsPlatformError: Unable to find vcvarsall.bat

"DistutilsPlatformError: Unable to find vcvarsall.bat"
What is this all about?
I'm using Windows 10
in a Conda environment.

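Is the training data included in the Baidu Netdisk share?

Hi, I read your CRNN code and found that the training data is read from a JSON file, but I couldn't find it in the Baidu Cloud link you provided. Also, there seems to be only training code and no test code. How did you get your accuracy? Was it measured on the training set?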

maxlabellength = 20 in densenet-ocr.ipynb?

The training images all contain 10 characters. I'm not very familiar with CTC; why is maxlabellength set to 20 here?
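A hedged explanation: `maxlabellength` is only the width of the padded label matrix, so it must be at least the longest label, and larger values are harmless padding (only the first `label_length` entries are read). The constraint CTC itself imposes is different: the model must produce enough output time steps to spell the label, counting one extra blank step between equal adjacent characters. The helper names and the `time_steps` value below are illustrative assumptions.

```python
def min_ctc_steps(label):
    """Smallest number of output time steps CTC needs to emit `label`.

    Each symbol needs one step, and each pair of equal adjacent symbols needs
    an extra blank step between them, otherwise decoding would merge them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def label_fits(label, maxlabellength, time_steps):
    """True if `label` fits the padded label matrix and the model's output length."""
    return len(label) <= maxlabellength and min_ctc_steps(label) <= time_steps

# Illustrative numbers only: a 10-character label, maxlabellength = 20,
# and an assumed 35 output time steps.
print(label_fits("1234567890", 20, 35))  # True
```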

Does your char_std_5990 have a length of 5994 when read in?

import io

char = ''
with io.open('/home/zhangtao/work/caffe_ocr_for_linux/data/char_std_5990.txt', encoding='utf-8') as f:
    for ch in f.readlines():
        ch = ch.strip('\r\n')
        char = char + ch
char = char[1:] + '卍'
print('nclass:', len(char))
It prints 5994. Is the "blank" on the first line being read in as the four values u'l', u'a', u'n', u'k'?
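The arithmetic matches the question: if the first line is the word `blank`, `char[1:]` removes only the letter `b`, so `lank` (four characters) stays in the set. A minimal sketch of a loader that drops the whole first line instead, assuming the file starts with a literal `blank` line followed by one character per line; `load_charset` is a hypothetical name. Appending `卍` keeps the snippet's convention of reserving the last index, which appears intended as the CTC blank that Keras's `ctc_batch_cost` places at `num_classes - 1`.

```python
BLANK_SYMBOL = '卍'  # same placeholder the snippet above appends as the last class

def load_charset(lines, blank_symbol=BLANK_SYMBOL):
    """Build the character string from the lines of char_std_5990.txt.

    Assumes the first line is the literal word 'blank' and every other line
    holds one character. The whole 'blank' line is dropped, instead of
    slicing off only its first letter as char[1:] does.
    """
    chars = [line.strip('\r\n') for line in lines]
    if chars:
        chars[0] = chars[0].lstrip('\ufeff')  # drop a UTF-8 byte-order mark if present
    if chars and chars[0] == 'blank':
        chars = chars[1:]
    return ''.join(chars) + blank_symbol
```

Usage: `with io.open(path, encoding='utf-8') as f: char = load_charset(f)`, with `path` pointing at your copy of char_std_5990.txt.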

x[i] = np.expand_dims(img, axis=2): what is this line for?

Earlier there is x = np.zeros((batchsize, imagesize[0], imagesize[1], 1), dtype=np.float),
and later x[i] = np.expand_dims(img, axis=2). What is this meant to do?
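The reason, as far as the shapes show: the batch buffer has a trailing channel axis of size 1 because the model takes single-channel (grayscale) input, while a grayscale image loaded as a NumPy array is just (height, width). `np.expand_dims(img, axis=2)` adds that channel axis so the slice assignment lines up; a plain `x[i] = img` would fail to broadcast (32, 280) into (32, 280, 1). The sizes below are assumed for illustration.

```python
import numpy as np

batchsize, imagesize = 2, (32, 280)  # illustrative sizes
# Batch buffer: (batch, height, width, channels) with one grayscale channel.
# np.float was removed in NumPy 1.24, so np.float32 is used here.
x = np.zeros((batchsize, imagesize[0], imagesize[1], 1), dtype=np.float32)

img = np.ones(imagesize, dtype=np.float32)  # one grayscale image: (height, width)
x[0] = np.expand_dims(img, axis=2)           # (32, 280) -> (32, 280, 1), the shape of x[0]
print(img.shape, np.expand_dims(img, axis=2).shape, x.shape)
# (32, 280) (32, 280, 1) (2, 32, 280, 1)
```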

Is the reshape result for the BLSTM a typo?

import tensorflow as tf

def reshape2(x):
    x1, x2 = x
    b = tf.shape(x2)
    x = tf.reshape(x1, [b[0], b[1], b[2], 256])
    return x
I misread it at first. The code is fine.

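Does anyone else see 100% disk usage but low GPU utilization during training?

As the title says.

Is there a paper for the DenseNet + CTC approach?

About the models

Hi, could you share these trained model files?

Error when training densent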

When I run the following:
res = model.fit_generator(cc1,
                          steps_per_epoch=3279601 // batch_size,
                          epochs=100,
                          validation_data=cc2,
                          validation_steps=364400 // batch_size,
                          callbacks=[earlystop, checkpoint, tensorboard],
                          verbose=1)
I get the following error:
Epoch 1/100

StopIteration Traceback (most recent call last)
in <module>()
4 validation_data =cc2 ,
5 validation_steps = 364400// batch_size,
----> 6 callbacks =[earlystop,checkpoint,tensorboard]
7 )

2 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
179 batch_index = 0
180 while steps_done < steps_per_epoch:
--> 181 generator_output = next(output_generator)
182
183 if not hasattr(generator_output, '__len__'):

StopIteration:
How can I fix this?
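A common cause of this error, offered as a likely explanation rather than a confirmed diagnosis: `fit_generator` calls `next()` on the training generator `steps_per_epoch` times in every epoch, so a generator that stops after one pass over the data, or that yields fewer than `3279601 // batch_size` batches (for example because some image files are missing), raises `StopIteration`. Keras expects such generators to loop forever. A minimal sketch of that pattern follows; `batch_generator` and its arguments are hypothetical, and a real generator would yield whatever inputs and targets the model expects rather than a bare array.

```python
import numpy as np

def batch_generator(samples, batch_size, seed=0):
    """Yield batches forever, reshuffling the sample order after each pass.

    The incomplete last batch of each pass is skipped so every batch has the
    same shape; Keras, not the generator, decides when an epoch ends.
    """
    rng = np.random.default_rng(seed)
    order = np.arange(len(samples))
    while True:  # never raise StopIteration
        rng.shuffle(order)
        for start in range(0, len(order) - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield np.stack([samples[k] for k in idx])
```

Because the generator never ends, `steps_per_epoch` alone defines an epoch; it should not exceed the number of full batches per pass if each sample is meant to be seen about once per epoch.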

Problem when testing

File "test.py", line 179, in <module>
bbox = utils.bbox_transfor_inv(anchor,regr)
File "/data/zhangshihao/ctpn_keras/utils.py", line 175, in bbox_transfor_inv
Cyx = Vcx * ha + Cya
ValueError: operands could not be broadcast together with shapes (52920,) (51460,)

Must the test image's h and w be divisible by 16? Or must they match the h and w of the training images?
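The two sizes in the error are consistent with a rounding mismatch in the feature-map size: with a stride of 16 (a 512-pixel input gives a 32-cell map) and, as assumed here, 10 anchors per cell, 52920 = 10 x 63 x 84 and 51460 = 10 x 62 x 83, which is what rounding h/16 and w/16 up versus down gives for, for example, a 1000 x 1340 image. If that is the cause, anchors and network output disagree only when h or w is not a multiple of 16, so padding the test image is one way out, and it does not need to match the training size. The helpers below are hypothetical sketches, not functions from this repo.

```python
import math
import numpy as np

STRIDE = 16            # feature-map stride (512-pixel input -> 32 cells)
ANCHORS_PER_CELL = 10  # usual CTPN setting; assumed for this repo

def anchor_count(h, w, rounding):
    """Anchors for an h x w image if the feature-map size is rounded with `rounding`."""
    return ANCHORS_PER_CELL * rounding(h / STRIDE) * rounding(w / STRIDE)

def pad_to_multiple(img, multiple=STRIDE):
    """Zero-pad an (H, W, C) image at the bottom and right so H and W divide by `multiple`."""
    h, w = img.shape[:2]
    return np.pad(img, ((0, (-h) % multiple), (0, (-w) % multiple), (0, 0)), mode='constant')

print(anchor_count(1000, 1340, math.ceil), anchor_count(1000, 1340, math.floor))  # 52920 51460
padded = pad_to_multiple(np.zeros((1000, 1340, 3), dtype=np.uint8))
print(padded.shape)  # (1008, 1344, 3)
```

Padding at the bottom and right leaves box coordinates unchanged, so predicted boxes still refer to the original image (clip them to its size); resizing instead would require scaling the boxes back.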

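Training dataset

Hi, I'm a beginner. After extracting the dataset images and starting training, I get an error saying the images cannot be found. I suspect a path problem: where should the dataset be placed, or where do I change the image path setting?

About the character set

Hi, when I use char_std_5990.txt from the Baidu Netdisk share, the leading "blank" is also added to the character set, so nclass = 5995 instead of 5990. How can I fix this? Thanks. Also, 5988 is not the last valid character.

CTPN training

Overfitting in the CTPN model

Why don't you use a validation set and early stopping? I think the CTPN model may be overfitting.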

ValueError: could not broadcast input array from shape (41,299,1) into shape (32,280,1)

---> 23 x[i] = np.expand_dims(img,axis=2)
24
25 print('-------------')

ValueError: could not broadcast input array from shape (41,299,1) into shape (32,280,1)
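The batch buffer is allocated for 32x280 images, so an image of another size (here 41x299) cannot be assigned into it; each image needs to be resized, or filtered out, before the assignment. A minimal sketch using nearest-neighbour resizing in NumPy so it has no extra dependencies; `resize_nearest` is a hypothetical helper, and in practice a smoother resize (for example `PIL.Image.resize` or `cv2.resize`) is usually preferable for text images.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale array to (out_h, out_w)."""
    in_h, in_w = img.shape
    rows = (np.arange(out_h) * in_h / out_h).astype(int)  # source row for each output row
    cols = (np.arange(out_w) * in_w / out_w).astype(int)  # source column for each output column
    return img[rows[:, None], cols[None, :]]

imagesize = (32, 280)  # the size the batch buffer was allocated for
x = np.zeros((1, imagesize[0], imagesize[1], 1), dtype=np.float32)

img = np.random.rand(41, 299).astype(np.float32)  # wrong-sized image, as in the error above
x[0] = np.expand_dims(resize_nearest(img, *imagesize), axis=2)  # now (32, 280, 1)
```

Stretching to a fixed size distorts the aspect ratio; an alternative is to resize to height 32 and then pad or filter by width, as long as training and testing use the same rule.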

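Testing is too slow

I trained a model on my own data, but testing is very slow: a little over ten thousand images took almost a whole day. I don't know where the problem is. Any help would be appreciated!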

About CTPN predict

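Hi, I'm trying to reproduce your code. Regarding CTPN prediction: after training I got a parameter file of about 66 MB. Then I used your prediction code, with your GRU replaced by CuDNNLSTM, and the system reported that I don't have enough GPU memory: it needs 6.2 GB, and my 1060 Ti has 6 GB. Is this normal?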

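densenet+ctc recognition model gives different results after converting .h5 to .pb

How do you all convert .h5 to .pb?

Could you share your trained 18.9 MB model file?

@xiaomaxiao Hi!
I saw that your README.md lists "densent+ctc 8ms 0.982 18.9MB".
May I ask whether the 18.9 MB model file is E:\deeplearn\OCR\Sample\model\weights-densent-09.hdf5?
If so, could you share it?
Many thanks!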

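Recognition of English sentences doesn't seem very good

I'm not sure whether it's due to the recognition mechanism, but when recognizing English sentences, or any sentence containing spaces, the spaces are filtered out by the model. For example, "The fox jump over the dog." is recognized as "Thefoxjumpoverthedog".
Is it because the CNN finds no features to recognize in the blank regions? How should this be handled?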
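A likely explanation, offered as an assumption about this setup: a CTC model can only emit characters that exist in its character set and in its training labels. If the space character is not in the charset (or spaces were stripped from the training labels), the gaps between words can only be emitted as the CTC blank, which decoding removes, so the words run together. A minimal check follows, with `encode_label` as a hypothetical helper that fails loudly instead of silently dropping characters; supporting spaces means adding one to the charset and retraining with labels that keep them.

```python
def encode_label(text, charset):
    """Map `text` to class indices, raising on characters missing from `charset`."""
    index = {ch: i for i, ch in enumerate(charset)}
    missing = sorted(set(text) - set(index))
    if missing:
        raise ValueError(f"characters not in charset: {missing!r}")
    return [index[ch] for ch in text]
```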

About pre-processing of image datasets

Bro, can you explain the proper structure and procedure for pre-processing the image dataset for the model? You also tried DenseNet, CTPN, and RCNN as the gen model... I didn't get a complete picture of the model structure; can you please explain it in English? I'm also new to deep learning...
