
keras_ocr's Introduction

Introduction

OCR text detection and recognition implemented in Keras, with a TensorFlow backend.

  • Environment: Windows 10, Titan X

Recognition

  • Dataset link: https://pan.baidu.com/s/1jJWfDmm password: vh8p (3M+ samples of Chinese, English, and digits; the corpus is unbalanced)

  • crnn: vgg + blstm + blstm + ctc

  • densenet-ocr: densenet + ctc

Network        GPU time   Accuracy   Model size
crnn           60 ms      0.972      -
densenet+ctc   8 ms       0.982      18.9 MB

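Detection

  1. Although most of the training data is English, the model also performs well on Chinese text detection.
  2. If you have a dataset with Chinese annotations that you are willing to share, please open an issue.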

demo

References

[1]https://github.com/eragonruan/text-detection-ctpn

[2]https://github.com/senlinuc/caffe_ocr

keras_ocr's People

Contributors

xiaomaxiao


keras_ocr's Issues

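Results on a new dataset are not very good; I would like to ask why. Thanks

Hello, I tested the crnn part of your project.

The code I used is https://github.com/xiaomaxiao/keras_ocr/blob/master/densent_ocr/densenet-ocr-test.ipynb
The model is "weights-densent-32-0.9846.hdf5" from your Baidu cloud drive, but the measured accuracy is only about 70%.

Is this because my dataset differs from the one you trained on? If I generate a dataset from my own corpus with SynthText and retrain, would the results improve?

Thank you very much!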

Would you mind sharing the weights file for ctpnlstm?

Would you mind sharing the weights file for ctpn?
My computer has a CPU only, so training on large datasets is very hard.
I mean this file: E:\deeplearn\ctpn2018\model\weights-ctpnlstm-19.hdf5
I would appreciate it very much if you could lend me a hand.

What are the output shapes of the lambda_1 and blstm layers in ctpn?

In the ctpn model, I've replaced the bidirectional unit GRU with LSTM.

x2 = Bidirectional(LSTM(128,return_sequences=True), name='blstm')(x1)

But the shape of blstm really confuses me.
Take an image as input, say with shape [512,512,3]. Then the output shape of rpn_conv1 is [None,32,32,512] and the output shape of lambda_2 is [None,32,32,256]. So what are the shapes of lambda_1 and blstm?

rpn_conv1 (Conv2D)              (None, 32, 32, 512)
-----------------------------------------------------
lambda_1 (Lambda)               (None, None, 512)
-----------------------------------------------------
blstm (Bidirectional)           (None, None, 256)
-----------------------------------------------------
lambda_2 (Lambda)               (None, 32, 32, 256)

Below is the source code of reshape and reshape2. My guess is that, since the batch size is 1, rpn_conv1.output.shape is [1,32,32,512], and after the reshape function lambda_1.output.shape is [32,32,512]. Then blstm.output.shape is [32,32,256] and lambda_2.output.shape is [1,32,32,256].

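def reshape(x):
    import tensorflow as tf
    # Merge batch and height: (b, h, w, c) -> (b*h, w, c), so each feature-map row becomes one sequence.
    b = tf.shape(x)
    x = tf.reshape(x, [b[0] * b[1], b[2], b[3]])
    return x


def reshape2(x):
    import tensorflow as tf
    # x is a pair [x1, x2]; x1 is reshaped to (b, h, w, 256) using the dynamic shape of x2.
    x1, x2 = x
    b = tf.shape(x2)
    x = tf.reshape(x1, [b[0], b[1], b[2], 256])
    return x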

However, when I try to replace the two lambda layers with Keras built-in Reshape layers, I get this error:

ValueError: total size of new array must be unchanged

This suggests that the total size after lambda_1 is different from 1x32x32x512, so reshape.output.shape is not [32,32,512]. But that conflicts with what I read in the source code of the reshape function. Could you please tell me the actual output shapes of the lambda_1 and blstm layers? Thanks a lot.
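The shape arithmetic in this question can be checked without TensorFlow. The sketch below is a plain-Python illustration, not code from the repository: `lambda_1_shape` and `keras_reshape_ok` are hypothetical helper names, and the target shape `(32, 512)` is an assumption, since the issue does not show which target shape was passed to the built-in layer. It relies on two facts: a Keras `Reshape` layer's target shape excludes the batch axis, and `Bidirectional(LSTM(128, return_sequences=True))` concatenates both directions into 256 features by default.

```python
from math import prod


def lambda_1_shape(shape):
    """Shape after tf.reshape(x, [b*h, w, c]) for an input of shape (b, h, w, c)."""
    b, h, w, c = shape
    return (b * h, w, c)


def keras_reshape_ok(input_shape, target_shape):
    """Keras Reshape keeps the batch axis, so only per-sample sizes are compared."""
    return prod(input_shape[1:]) == prod(target_shape)


rpn_conv1 = (1, 32, 32, 512)             # batch of one 512x512 image, stride 16
lambda_1 = lambda_1_shape(rpn_conv1)     # each of the 32 rows becomes a sequence
blstm = lambda_1[:2] + (2 * 128,)        # forward and backward outputs concatenated

print(lambda_1, blstm)
print(keras_reshape_ok(rpn_conv1, (32, 512)))      # per-sample sizes differ
print(keras_reshape_ok(rpn_conv1, (32, 32, 512)))  # sizes match, but the tensor stays 4-D
```

Under these assumptions the guess in the question holds for batch size 1. The built-in layer cannot fold the height axis into the batch axis, which would explain the size error without contradicting the reshape function.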

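Training dataset

Hello, was the training data generated with SynthText? Could you share the training set? Thanks

Is the training data included in the Baidu cloud drive?

Hello, I read your CRNN code and found that the training data is read from a JSON file, but I could not find that file at the Baidu cloud link you provided. I also noticed that there seems to be only training code and no test code, so how did you obtain your accuracy figure? Was it measured on the training set?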

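Does your char_std_5990 give a len of 5994 when read?

import io

char = ''
with io.open('/home/zhangtao/work/caffe_ocr_for_linux/data/char_std_5990.txt', encoding='utf-8') as f:
    for ch in f.readlines():
        ch = ch.strip('\r\n')
        char = char + ch
# Drop the first character of the joined string and append '卍' as one extra symbol.
char = char[1:] + '卍'
print('nclass:', len(char))

The result is 5994. Is the "blank" on the first line being read as the four values u'l', u'a', u'n', u'k'?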
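The count of 5994 is consistent with a file whose first line is the literal word "blank" followed by 5989 characters: joining every line into one string turns "blank" into five characters, `char[1:]` removes only the "b", and the appended '卍' adds one more (5 + 5989 - 1 + 1 = 5994). A minimal sketch of one way around this is to keep the lines in a list and drop the placeholder line as a whole. `load_charset` is a hypothetical helper, the file contents below are a tiny stand-in, and treating the extra class as the CTC blank at the last index is an assumption based on the '卍' appended above.

```python
import io
import os
import tempfile


def load_charset(path):
    """Read one symbol per line; a first line reading 'blank' is a placeholder."""
    with io.open(path, encoding="utf-8") as f:
        symbols = [line.strip("\r\n") for line in f]
    # Drop the placeholder line as a whole instead of slicing one character off a string.
    if symbols and symbols[0] == "blank":
        symbols = symbols[1:]
    return symbols


# Tiny stand-in file (hypothetical content) laid out like the file described above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("blank\n中\n国\na\n")
    demo_path = tmp.name

chars = load_charset(demo_path)
os.remove(demo_path)
nclass = len(chars) + 1  # assumption: one extra class for the CTC blank

# For comparison: the string-concatenation approach keeps 'l', 'a', 'n', 'k'.
joined = ("blank" + "".join(chars))[1:] + "卍"
print(nclass, len(joined))
```

With the stand-in file this gives 4 classes, against 8 for the concatenated string.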

About the models

Hello, could you share the trained model files?

densenet training error

When I run the following statement:
res = model.fit_generator(cc1,
                          steps_per_epoch=3279601 // batch_size,
                          epochs=100,
                          validation_data=cc2,
                          validation_steps=364400 // batch_size,
                          callbacks=[earlystop, checkpoint, tensorboard],
                          verbose=1)
I get the following error:
Epoch 1/100

StopIteration Traceback (most recent call last)
in ()
4 validation_data =cc2 ,
5 validation_steps = 364400// batch_size,
----> 6 callbacks =[earlystop,checkpoint,tensorboard]
7 )

2 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
179 batch_index = 0
180 while steps_done < steps_per_epoch:
--> 181 generator_output = next(output_generator)
182
183 if not hasattr(generator_output, 'len'):

StopIteration:
How can I solve this?
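The traceback stops at `next(output_generator)`, which means the generator ended before `steps_per_epoch` batches were produced. Keras expects a generator passed to `fit_generator` to loop over its data indefinitely. The definition of `cc1` is not shown in this issue, so the sketch below only illustrates the expected pattern; `endless_batches` is a hypothetical name, not a function from the repository. Another commonly reported cause is an exception raised inside the generator itself (for example a missing image file), which can surface as the same StopIteration.

```python
def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around at the end of the data."""
    n = len(samples)
    start = 0
    while True:  # never return, so next() always has a batch to hand back
        batch = [samples[(start + i) % n] for i in range(batch_size)]
        start = (start + batch_size) % n
        yield batch


gen = endless_batches(list(range(5)), batch_size=2)
steps = [next(gen) for _ in range(4)]  # more batches than one pass over 5 samples
print(steps)
```

Calling `next(gen)` directly a few times, as here, is also a quick way to see the real exception when the generator fails for another reason.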

Testing problem

File "test.py", line 179, in
bbox = utils.bbox_transfor_inv(anchor,regr)
File "/data/zhangshihao/ctpn_keras/utils.py", line 175, in bbox_transfor_inv
Cyx = Vcx * ha + Cya
ValueError: operands could not be broadcast together with shapes (52920,) (51460,)

Must the height and width of a test image be divisible by 16? Or must they match the height and width of the training images?
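The two sizes in the error fit a rounding mismatch: 52920 = 63 x 84 x 10 and 51460 = 62 x 83 x 10. That is what one would see if one side computed the grid as ceil(h/16) x ceil(w/16) and the other as floor(h/16) x floor(w/16), with 10 anchors per cell. The sketch below shows only this arithmetic. The stride of 16 and the 10 anchors per cell follow the CTPN paper and are assumptions about this code, the 1000 x 1333 image size is hypothetical, and whether `utils.py` actually produces the mismatch this way is not confirmed by the thread.

```python
import math

STRIDE = 16            # assumed total stride of the backbone
ANCHORS_PER_CELL = 10  # assumed, as in the CTPN paper


def anchor_counts(h, w):
    """Anchor totals when the feature-map size is floored versus ceiled."""
    floor_cells = (h // STRIDE) * (w // STRIDE)
    ceil_cells = math.ceil(h / STRIDE) * math.ceil(w / STRIDE)
    return floor_cells * ANCHORS_PER_CELL, ceil_cells * ANCHORS_PER_CELL


def pad_to_stride(h, w):
    """Smallest size >= (h, w) whose sides are both multiples of the stride."""
    return math.ceil(h / STRIDE) * STRIDE, math.ceil(w / STRIDE) * STRIDE


mismatch = anchor_counts(1000, 1333)        # hypothetical image size
padded = pad_to_stride(1000, 1333)
matched = anchor_counts(*padded)
print(mismatch, padded, matched)
```

Under these assumptions, resizing or padding test images so that both sides are multiples of 16 makes the two counts agree, and the size would not need to match the training images.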

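Training dataset

Hello, I am a beginner. After unpacking the dataset images, training fails with an error saying the images cannot be found. I think it may be a path problem. Where should I put the dataset, or where can I change the image path configuration?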

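Character set question

Hello, when I use char_std_5990.txt from the Baidu cloud drive, the "blank" at the start is also read into the character set, so nclass=5995 instead of 5990. How can I fix this? Thanks. Also, 5988 is not the last valid character.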

Overfitting in the CTPN model

Why don't you use a validation set and early stopping? I think the CTPN model may be overfitting.

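Testing is too slow

I trained a model on my own data, but testing is very slow: a little over ten thousand images took almost a whole day. I don't know where the problem is and would appreciate any help.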

About CTPN predict

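Hello, I am trying to reproduce your code. Regarding CTPN predict: after training I got a model of about 66M. When I use your prediction code with your GRU replaced by CuDNNLSTM, the system reports that GPU memory is insufficient: it needs 6.2 GB, and my card is a 1060ti with 6 GB. Is this normal?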

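English sentences do not seem to be recognized well

I don't know whether it is due to the recognition mechanism, but when recognizing English sentences, or any sentence containing spaces, the spaces are filtered out by the model. For example, "The fox jump over the dog." is recognized as "Thefoxjumpoverthedog".
Is it because the CNN finds no features to recognize in the blank areas? How should this be handled?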
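One thing worth checking first: a CTC model can only emit symbols that are in its character set and appeared in its training labels. If the character set has no space, a space can never be predicted, whatever the CNN sees. Whether the released character set contains a space is not stated in this thread. The sketch below uses a toy alphabet and a hand-built frame sequence, and all helper names are hypothetical.

```python
def encode(text, alphabet):
    """Turn text into class indices; characters outside the alphabet cannot be labeled."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in text if ch in index]


def to_frames(labels, blank):
    """Fake a per-frame best path: each label twice, then one blank frame."""
    frames = []
    for k in labels:
        frames += [k, k, blank]
    return frames


def greedy_decode(frames, alphabet):
    """Collapse repeated frames, then drop the blank class (assumed to be the last index)."""
    blank = len(alphabet)
    out, prev = [], None
    for k in frames:
        if k != prev and k != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


text = "The fox"
no_space = "Tefhox"      # toy alphabet without a space
with_space = "Tefhox "   # same alphabet plus a space

without = greedy_decode(to_frames(encode(text, no_space), len(no_space)), no_space)
with_sp = greedy_decode(to_frames(encode(text, with_space), len(with_space)), with_space)
print(repr(without), repr(with_sp))
```

If the space is missing, adding it to the character set and retraining on labels that contain spaces is the usual remedy. The CTC blank is a separate class and does not stand for a visible space.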

About pre-processing of image datasets

Could you explain how the image dataset should be structured and pre-processed for the model? I see you tried densenet and ctpn, and also RCNN as the gen model, but I did not get a complete picture of the model structure. Could you please explain it in English? I am also new to deep learning.

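Suggestion: rewrite the training and test data as HDF5

I am training on a single 1080ti and GPU utilization is only about 6%.
Because the images are so small, disk I/O is inefficient and becomes the bottleneck.

I suggest converting the image files to HDF5: no decompression is needed, and loading and training are faster.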

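Character table question

How do you represent blank? With char5990.txt, the "blank" on the first line is read as 5 separate characters. How do you handle this? I am worried that changing it to a space would affect the CTC step later, since CTC needs a blank to separate characters.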

side refinement

It seems that ctpn_train has no side-refinement part?

densenet + blstm

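Hello, your training code defines an empty function, def lstm. Is this part implemented? Is your code adapted from https://github.com/senlinuc/caffe_ocr ? I see that project has densenet-res-blstm and densenet-sum-blstm-full-res-blstm structures. Thanks, I would like to try the effect of adding an RNN.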

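densenet+CTC reaches 99% training accuracy but predicts the wrong text

Hello, I trained densenet+CTC on my own dataset of about 60,000 text-line images. After about 20 epochs the training accuracy is close to 99%, but at test time the model predicts only a single character or nothing at all, even on samples from the training set. Why might this happen?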

JSON and txt file request

Hi! I am trying to run the CRNN code on a personal dataset, but I am getting some errors. Could you please share the JSON and txt files used by your code?
