deep_ocr's Introduction

deep ocr

See README_en.md for English installation documentation.

Tested only on Ubuntu. Installation requires virtualenv; adjust the install paths as needed:

git clone https://github.com/JinpengLI/deep_ocr.git ~/deep_ocr
virtualenv ~/deep_ocr_env
source ~/deep_ocr_env/bin/activate
pip install -r ~/deep_ocr/requirements.txt
cd ~/deep_ocr && python setup.py install

Test

source ~/deep_ocr_env/bin/activate && cd ~/deep_ocr && ./bin/deep_ocr_reco data/holiday_notification.jpg -v -d

Legacy notes

Parts of this section still work. It is kept for now and will be removed later.

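Many developers probably use tesseract for Chinese recognition, but the results are remarkably poor. For example, on the test image data/test_data.png: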

$ tesseract -l chi_sim data/test_data.png out_test_data
看到恨多公司在招腭大改癫和机器字习胸人 v 我有3个建议 (T) 忧T ' 2个上t较靠遭
胸人就譬了 v不是越多越好 (2) 这T '2个人要能给大蒙上踝'倩邂知L目 (3) 不要招
不宣代四胸人:虹大改癫和机器字习胸v不裹目宣 (或者宣过) 大量代四v基本上就
只会忽悠了

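Text recognition is actually not that hard nowadays, especially with deep learning. Below is the output of this project's reco_chars.py script, which is based on Caffe. Much better, isn't it? The code is also far shorter than tesseract's.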

$ python reco_chars.py
看很多公苘在招聘天数据和机器学习人我有个建议找个较靠谱
的人就够了不是越多越好这个人要给大家上课传递知识不要招
不写代码的人做天数据机器学习的不亲写或者写过天且代码基本上就
只会忽悠了

You can train on your own fonts with Caffe. The single-character recognition in this system is based on the following paper:

Deep Convolutional Network for Handwritten Chinese Character Recognition

http://yuhao.im/files/Zhang_CNNChar.pdf

Installing via Docker

Install Docker first. The following steps were tested on Ubuntu 14.04.

https://www.docker.com/

Download deep_ocr_workspace.zip (https://pan.baidu.com/s/1nvz2wrB https://pan.baidu.com/s/1qYPKH3Y)

The md5sum values of the two files, for verifying that the downloads succeeded:

$ md5sum deep_ocr_workspace.zip
ffeda7ea6604e7b8835c05a33fa0459e  deep_ocr_workspace.zip
$ md5sum deep_ocr_workspace.z01
ea66796c2bbdb2bec9b7ee28eb44012d  deep_ocr_workspace.z01
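
The same check can be scripted. The following is a minimal sketch, assuming both archive parts sit in one directory; the helper names md5_of_file and verify_downloads are made up for this example and are not part of deep_ocr. The expected digests are copied from the md5sum output above.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected digests, copied from the md5sum output in this README.
EXPECTED_MD5 = {
    "deep_ocr_workspace.zip": "ffeda7ea6604e7b8835c05a33fa0459e",
    "deep_ocr_workspace.z01": "ea66796c2bbdb2bec9b7ee28eb44012d",
}


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == expected
    return results
```

Calling verify_downloads("~/Downloads") style paths requires expanding "~" first (for example with os.path.expanduser); a False entry means the file is missing or its digest differs.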

Extract to a local disk, for example to ~/deep_ocr_workspace:

cat deep_ocr_workspace.z* > unsplit_deep_ocr_workspace.zip
unzip unsplit_deep_ocr_workspace.zip -d ~/

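This zip contains all the data files deep_ocr needs (it is hosted on Baidu Cloud because it is too large). All data is extracted to ~/deep_ocr_workspace; you can also put the data you want to process in this folder.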

CPU-based

docker pull jinpengli/deep_ocr_cpu_docker:latest

Start the Docker container

docker run -ti --volume=${HOME}/deep_ocr_workspace:/workspace jinpengli/deep_ocr_cpu_docker:latest /bin/bash
cd /opt/deep_ocr
git pull origin master

The --volume option mounts the directory into the container, which is how the recognition results above can be retrieved.
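
To make the role of the mounted directory concrete, here is a small sketch of writing a result into it from inside the container. The function save_result and the file name reco_result.txt are hypothetical and do not exist in deep_ocr; only the /workspace mount point comes from the docker run command above.

```python
import io
import os


def save_result(text, workspace="/workspace", name="reco_result.txt"):
    """Write recognized text under the mounted workspace and return the path.

    On Python 2, pass a unicode string (u"...") because io.open writes text.
    """
    path = os.path.join(workspace, name)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

With the volume mapping shown above, a file written to /workspace/reco_result.txt inside the container should appear on the host as ~/deep_ocr_workspace/reco_result.txt.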

python /opt/deep_ocr/reco_chars.py

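You can then continue with your own development. Good luck!

ID card recognition

Not very stable yet; some semantic models need to be added. Stay tuned.

Image to recognize

Run the command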

export WORKSPACE=/workspace
deep_ocr_id_card_reco --img $DEEP_OCR_ROOT/data/id_card_img.jpg \
    --debug_path /tmp/debug \
    --cls_sim ${WORKSPACE}/data/chongdata_caffe_cn_sim_digits_64_64 \
    --cls_ua ${WORKSPACE}/data/chongdata_train_ualpha_digits_64_64

Recognition result:

...
ocr res:
============================================================
name
韦小宝
============================================================
address
北京市东城区累山前街4号
紫禁城敬事房
============================================================
month
12
============================================================
minzu
汉
============================================================
year
1654
============================================================
sex
男
============================================================
id
1X21441114X221243X
============================================================
day
20
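
If the printed result needs to be consumed by another program, the block above can be turned into a dictionary. This is only a sketch that assumes the layout shown here (a line of "=" characters, then the field name, then one or more value lines); parse_ocr_result is a name invented for this example, not a deep_ocr function.

```python
def parse_ocr_result(text):
    """Parse the '====' separated dump into {field name: value}.

    Multi-line values (such as the address) are joined with newlines.
    Lines before the first separator are ignored.
    """
    fields = {}
    key = None
    expect_key = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A separator announces that the next line is a field name.
            expect_key = True
            continue
        if expect_key:
            key = line
            fields[key] = []
            expect_key = False
        elif key is not None:
            fields[key].append(line)
    return {name: "\n".join(values) for name, values in fields.items()}
```

Applied to the output above, the "address" entry would hold both address lines and "id" would hold the recognized number string.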

deep_ocr's People

Contributors

jinpengli


deep_ocr's Issues

codes for detecting characters in non-text images

Many thanks for the models and code.

I noticed that your API at http://chongdata.com/ocr recognizes images containing text very well. The open-source code here gives worse results, with some spurious letters in the output.

I assume the online API does some text detection. Would you consider sharing that code here?

Thanks again for sharing the code.

RuntimeError: Could not open file

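I downloaded deep_ocr_workspace.zip and reco_chars.py. Running the script produces the error below, and the archive also fails to extract on Windows. I suspect the archive itself is broken.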

root@orange-VirtualBox:~/caffe/python# python reco_chars.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1223 17:24:19.496032 3764 _caffe.cpp:122] DEPRECATION WARNING - deprecated use of Python interface
W1223 17:24:19.496183 3764 _caffe.cpp:123] Use this instead (with the named "weights" parameter):
W1223 17:24:19.496206 3764 _caffe.cpp:125] Net('/workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototxt', 1, weights='/workspace/data/chongdata_caffe_cn_sim_digits_64_64/lenet_iter_50000.caffemodel')
Traceback (most recent call last):
File "test.py", line 294, in <module>
caffe_cls = CaffeCls(model_def, model_weights, y_tag_json_path)
File "test.py", line 20, in __init__
caffe.TEST)
RuntimeError: Could not open file /workspace/data/chongdata_caffe_cn_sim_digits_64_64/lenet_iter_50000.caffemodel
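
Before suspecting the archive, it can help to confirm that the files Caffe is asked to open actually exist and are non-empty. The following is a rough sketch, not part of deep_ocr: missing_model_files is a hypothetical helper, and the directory and file names are simply the ones that appear in the log above.

```python
import os

# Paths taken from the warning and error messages in this issue.
MODEL_DIR = "/workspace/data/chongdata_caffe_cn_sim_digits_64_64"
MODEL_FILES = (
    "deploy_lenet_train_test.prototxt",
    "lenet_iter_50000.caffemodel",
)


def missing_model_files(model_dir=MODEL_DIR, names=MODEL_FILES):
    """Return the names that are absent, unreadable, or empty in model_dir."""
    problems = []
    for name in names:
        path = os.path.join(model_dir, name)
        usable = (
            os.path.isfile(path)
            and os.access(path, os.R_OK)
            and os.path.getsize(path) > 0
        )
        if not usable:
            problems.append(name)
    return problems
```

An empty list only shows that the files are present and readable; it does not prove that they are intact, so comparing the archive checksums is still worthwhile.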

Failed to parse NetParameter file

I get the following error:

python3 reco_chars.py
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 6:15: Message type "caffe.LayerParameter" has no field named "input_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0103 17:15:15.282599 7488 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

What is wrong with this file?

best regards!

Cannot run in the Docker environment

The following error occurs:
libdc1394 error: Failed to initialize libdc1394
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0507 12:27:13.683022 47 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0507 12:27:13.683099 47 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0507 12:27:13.683104 47 _caffe.cpp:142] Net('/workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototxt', 1, weights='/workspace/data/chongdata_caffe_cn_sim_digits_64_64/lenet_iter_50000.caffemodel')
Traceback (most recent call last):
File "/opt/deep_ocr/reco_chars.py", line 294, in <module>
caffe_cls = CaffeCls(model_def, model_weights, y_tag_json_path)
File "/opt/deep_ocr/reco_chars.py", line 20, in __init__
caffe.TEST)
RuntimeError: Could not open file /workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototxt

Any pointers would be appreciated.

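How to receive the recognition results

"The volume is mounted into the container, which is how the recognition results above can be retrieved."

I do not understand this command and do not know how to retrieve the recognition results in Docker from the host.

How do I get the output?

I installed on CentOS and ran ./bin/deep_ocr_reco data/holiday_notification.jpg -v
but only got the following output: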

! image to reco: data/holiday_notification.jpg
! no-normalization
estimate skew angle and rotate
estimate_thresholds lo 0.964706 and hi 1.000000
data/holiday_notification.jpg lo-hi (0.96 1.00) angle  0.0 ! no-normalization
scale= 5.744562646538029
computing column separators
considering at most 3 whitespace column separators
computing lines
propagating labels
spreading labels

Chinese Character segment in ID-card

Hi @JinpengLI,
Thank you for your great work.
I am very interested in your character segmentation for ID cards.
Is deep_ocr_id_card_segmentation your code for segmentation?
I tested it on new ID cards, and it seems to perform rather poorly on them.
Do you have a newer version of the code?
Thanks so much.
Looking forward to your reply.

Could not open file /workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototxt

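I carefully checked every dependency, the model file paths, the extraction step and so on, and still get this error. So is the uploaded archive broken?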
Traceback (most recent call last):
File "reco_chars.py", line 294, in <module>
caffe_cls = CaffeCls(model_def, model_weights, y_tag_json_path)
File "reco_chars.py", line 20, in __init__
caffe.TEST)
RuntimeError: Could not open file /workspace/data/chongdata_caffe_cn_sim_digits_64_64/deploy_lenet_train_test.prototx

best regards!

Get_nothing_about_another_image

Thanks for sharing this! I ran your code successfully with your test image.
I then replaced the test image with data/id_card_img.jpg, which is a joke ID card, but got no output.

Then I used the script ./bin/deep_ocr_id_card_segmentation to get a gray_texts.jpg and ran
reco_chars.py on gray_texts.jpg. The result is not reasonable at all. I think the reason is that these characters are not in the training data. Is that right?
Maybe I need to fine-tune the model on a larger character dataset. Could you tell me how to fine-tune it?

Thanks for your help and nice work!

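Installed via Docker, recognition problem

Hello, I set up the environment following the steps. Running python /opt/deep_ocr/reco_chars.py produces the error shown in this screenshot:
[screenshot: 1]

I then ran the recognition command. For the example image, only the number was recognized and no text was shown, as in this screenshot:
[screenshot: 2017-11-18_16-11-50]
Please advise, thanks!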

Error running the test program

Running the test program fails with:
! image to reco: data/holiday_notification.jpg
Traceback (most recent call last):
File "./bin/deep_ocr_reco", line 137, in <module>
show_img(raw, title="raw image")
File "./bin/deep_ocr_reco", line 27, in show_img
plt.gray()
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 3829, in gray
set_cmap("gray")
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2269, in set_cmap
im = gci()
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 335, in gci
return gcf()._gci()
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 601, in gcf
return figure()
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 548, in figure
**kwargs)
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
return cls.new_figure_manager_given_figure(num, fig)
File "/mnt/OCR/deep_ocr_env/local/lib/python2.7/site-packages/matplotlib/backends/_backend_tk.py", line 1044, in new_figure_manager_given_figure
window = Tk.Tk(className="matplotlib")
File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1767, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
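
The traceback ends in Tk failing because no X display is available. A common general matplotlib workaround (not something this project documents) is to select the non-interactive Agg backend when DISPLAY is unset. The sketch below only decides which backend to request; pick_matplotlib_backend is a made-up helper name, and whether deep_ocr_reco remains useful without on-screen windows is an open assumption.

```python
import os


def pick_matplotlib_backend(environ=None):
    """Return "Agg" when no X display is configured, otherwise None.

    None means: keep matplotlib's default (interactive) backend.
    """
    environ = os.environ if environ is None else environ
    return None if environ.get("DISPLAY") else "Agg"


# Intended use, to be placed before any import of matplotlib.pyplot:
#
#     backend = pick_matplotlib_backend()
#     if backend:
#         import matplotlib
#         matplotlib.use(backend)
```

With Agg selected, figures can still be saved to files but no window is shown. This does not cover the case where DISPLAY is set but the X server cannot be reached.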

Environment:
Ubuntu 14.04
Python 2.7

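Found a bug!

A Python bug:
㧟 䏝 㤘 䥽 䁖 䦃 㸆
These characters cannot be drawn with PIL. Try it if you don't believe me.
I don't normally use Python; where should this kind of bug be reported?
A simple test script: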

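from PIL import Image, ImageDraw, ImageFont

# STXIHEI.TTF (STXihei) must be findable by PIL, e.g. in the working directory.
font = ImageFont.truetype("STXIHEI.TTF", 300)
img = Image.new("L", (300, 300), "black")
draw = ImageDraw.Draw(img)
ch = u'㸆'
# Draw the glyph in white (255); the negative y offset is from the original report.
draw.text((0, -75), ch, 255, font=font)
img.show()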

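Give it a try, haha.
Locate STXIHEI.TTF; it ships with the system. It is the STXihei font.

Question

Hello, after setting up the virtual environment following the steps, I ran: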
haiyun@dell-Precision-Tower-5810:~$ source ~/deep_ocr_env/bin/activate && cd ~/deep_ocr && ./bin/deep_ocr_reco data/holiday_notification.jpg -v -d
! image to reco: data/holiday_notification.jpg
Traceback (most recent call last):
File "./bin/deep_ocr_reco", line 137, in <module>
show_img(raw, title="raw image")
File "./bin/deep_ocr_reco", line 27, in show_img
plt.gray()
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/pyplot.py", line 3932, in gray
set_cmap("gray")
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2372, in set_cmap
im = gci()
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/pyplot.py", line 335, in gci
return gcf()._gci()
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/pyplot.py", line 601, in gcf
return figure()
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/pyplot.py", line 548, in figure
**kwargs)
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
return cls.new_figure_manager_given_figure(num, fig)
File "/home/haiyun/deep_ocr_env/lib/python2.7/site-packages/matplotlib/backends/_backend_tk.py", line 1044, in new_figure_manager_given_figure
window = Tk.Tk(className="matplotlib")
File "/home/haiyun/install/python-2.7.11/lib/python2.7/lib-tk/Tkinter.py", line 1814, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display ":0"
I searched online for "_tkinter.TclError: couldn't connect to display ":0"" and tried to fix it, without success. Any guidance would be appreciated. Thanks!

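Nov 2: it did not work for me

Running deep_ocr_reco fails with: ImportError: No module named ocrolib
The remaining executables fail with: ImportError: No module named cv2
I checked requirements.txt and cv2 is indeed not listed, so I am not sure how this is supposed to work.

Hello, can deep_ocr run on Windows?

Can deep_ocr run on Windows? Can it be called from Python, or only run from the command line?
Also: I have never used Docker before, and the stable Windows release downloaded from the Docker website would not install. Downloading the two files from Baidu Cloud is very slow. Is there any way around this?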

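Error when running deep_ocr_make_caffe_dataset

Hello,
When running the deep_ocr_make_caffe_dataset command from lesson4 on chongdata, the images folder is created but no images are generated. The error is:
File "/opt/deep_ocr/bin/deep_ocr_make_caffe_dataset", line 83, in <module>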
lang_chars = lang_chars_gen.do()
File "build/bdist.linux-x86_64/egg/deep_ocr/lang_aux.py", line 27, in do
ImportError: No module named langs.lower_eng
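langs has already been added to the Python modules. What could be causing this?

Blog code

Hello! The code for your blog is very elegant and concise. Would it be possible to fork it? I could not find it on GitHub.

Question about the training set

First of all, thank you for the open-source project.
My question: I saw the training set mentioned in earlier issues and downloaded the data from Baidu Netdisk. I understand that the training set can be generated from font files. Does the training set really contain only one image per class? If not, how is the additional training data generated automatically?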

binary image has result but rgb image not

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 597502990
I0110 09:56:18.573720 37 net.cpp:761] Ignoring source layer mnist
I0110 09:56:18.760609 37 net.cpp:761] Ignoring source layer loss
[image]
The following image does produce a result, but it is not as good as the raw OCR:
[image]

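Question about converting text to images

I asked this before, but your answer was not clear.
I do not see how, once you have the ID card font, you iterate over all the characters to produce images.
Could you explain in more detail?
How do you go from text to images?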

Ubuntu 16.04 Docker: error running reco_chars.py

1. Download deep_ocr_workspace.zip
2. docker pull jinpengli/deep_ocr_cpu_docker:latest
3. docker run -ti --volume=${HOME}/deep_ocr_workspace:/workspace jinpengli/deep_ocr_cpu_docker:latest /bin/bash
4. python /opt/deep_ocr/reco_chars.py
This produces the error:
root@dd66a9208e12:/workspace# python /opt/deep_ocr/reco_chars.py
libdc1394 error: Failed to initialize libdc1394
WARNING: Logging before InitGoogleLogging() is written to STDERR

... (some ordinary log lines omitted) ...

Traceback (most recent call last):
File "/opt/deep_ocr/reco_chars.py", line 364, in <module>
output_tag_to_max_proba = caffe_cls.predict_cv2_imgs(np_char_imgs)
File "/opt/deep_ocr/reco_chars.py", line 66, in predict_cv2_imgs
self._predict_cv2_imgs_sub(cv2_imgs, i, pos_end)
File "/opt/deep_ocr/reco_chars.py", line 53, in _predict_cv2_imgs_sub
item = (self.y_tag_json[str(index)],
KeyError: '5613'
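
The traceback shows the lookup self.y_tag_json[str(index)] failing, which means the classifier produced a class index that has no entry in the label map. One plausible cause, offered here only as a guess, is a model file and a label file that do not belong together. The sketch below shows a defensive lookup and a way to list the offending indices; both function names are hypothetical and not part of reco_chars.py.

```python
def label_for_index(y_tag_json, index, default=u"?"):
    """Look up the label for a predicted class index without raising KeyError."""
    return y_tag_json.get(str(index), default)


def unknown_indices(y_tag_json, indices):
    """Return the sorted predicted indices that have no entry in the label map."""
    return sorted({int(i) for i in indices if str(i) not in y_tag_json})
```

A placeholder label hides the crash but not the underlying mismatch, so the list returned by unknown_indices is the more useful diagnostic.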

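Recognition accuracy is very low

  • After a lot of effort I finally got it to build. Recognition of the provided images is quite good, but text photographed from a book is not recognized at all. Text typed into a text editor and then captured as a screenshot is recognized, but with many errors: accuracy is probably below 50%.
  • ID card recognition behaves the same way. The sample image is recognized, but clear ID card images downloaded from the web are recognized poorly with many errors, and my own photos of ID cards are almost unrecognizable.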
