
references about awesome-ocr (57 comments, open)

wanghaisheng commented on June 2, 2024

references

Comments (57)

tmbdev commented on June 2, 2024

There is a new set of DL-based OCR tools at https://github.com/NVlabs/ocropus3

OCR systems have traditionally relied on huge data collection efforts and supervised training. I think we can only make significant progress in the long run if we change over to self-supervised training. In the pre-DL world, we had some approaches to that (with the original OCRopus), but carrying that over into the DL world will still require significant effort.

wanghaisheng commented on June 2, 2024

ocropus

https://github.com/jbest/typeface-corpus
https://github.com/sudeepraja/BLSTM-for-supervised-sequence-recognition.git
https://github.com/sbuss/ocropus
https://github.com/tmbdev?tab=repositories
ocropus/ocropy#54

Is there support for non-latin languages like Chinese, Japanese or Thai?

Here is a report about training clstm to recognize Japanese:

tmbdev/clstm#49

After 50 days of training on my MacBook Pro (130000 iterations over 3877 character classes), clstm has achieved a 3.6% error rate so far. The network has 800 hidden nodes, and the training data is 58.2 MB.

Ocropus fork with sane defaults
https://github.com/mittagessen/kraken

kraken is a fork of ocropus intended to rectify a number of issues while preserving (mostly) functional equivalence. Its main goals are:

    Explicit input/output handling ✓
    Clean public API
    Word and character bounding boxes in hOCR ✓
    Tests
    Removal of runtime dependency on gcc ✓
    clstm compatibility ✓

Ticked-off goals have been realized, while some others still require further work. Pull requests and code contributions are always welcome.

Recognition Models for Kraken and CLSTM https://github.com/mittagessen/kraken-models

kraken-models

This repository contains recognition models for kraken, both legacy pyrnn (converted to pronn) and clstm ones. To have one or more models added, open a pull request or send an email to [email protected].

https://github.com/kendemu/char-rnn-chinese
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on code of https://github.com/karpathy/char-rnn. Support Chinese and other things.

OpenPhilology https://github.com/OpenPhilology

  • nidaba: an expandable and scalable OCR pipeline
  • tei-ocr: TEI customization for OCR generated layout and content information
  • migne-text-reuse: investigating text reuse in the Patrologia
  • ancientgreekocr-ocr-evaluation-tools (forked from ryanfb/ancientgreekocr-ocr-evaluation-tools): 'ocr-evaluation-tools' from http://ancientgreekocr.org/, tools to test OCR accuracy
  • Iris: the OCR pipeline to succeed Rigaudon
  • phaidra: eLearning for historical languages
  • Planning: for now we're just using the wiki to discuss future steps
  • canonical: OPP work iterating PerseusDL
  • hocrinfoaggregator (forked from fbaumgardt/hocrinfoaggregator): HocrInfoAggregator
  • OpenGreekAndLatin (forked from GreekOCR/OpenGreekAndLatin): based on Rigaudon, hOCRInfoAggregator and CoPhi Proofreader
  • rigaudon (forked from brobertson/rigaudon): Polytonic Greek OCR engine derived from Gamera and based on the work of Dalitz and Brandt
  • cophiproofreader: proof-reading system for OCR applied to Greek and Latin texts

https://github.com/jknollmeyer/whiteboard (a Node.js wrapper for ocropus)
https://github.com/naptha/ocracy

https://github.com/Totkichi/SciOCR
https://github.com/manhcuogntin4
https://github.com/manhcuogntin4/OCR
https://github.com/manhcuogntin4/CLSTM
Transliteration related data files and/or models. https://github.com/googlei18n/transliteration

https://github.com/naptha/tesseract.js

wanghaisheng commented on June 2, 2024

Latest arXiv papers (2015-2016) titled "OCR":

OCR of historical printings with an application to building diachronic corpora: a case study using the RIDGES herbal corpus https://arxiv.org/pdf/1608.02153.pdf
OCR accuracy improvement on document images through a novel pre-processing approach https://arxiv.org/abs/1509.03456
Automatic quality evaluation and (semi-) automatic improvement of OCR models for historical printings https://arxiv.org/pdf/1606.05157.pdf
OCR Error Correction Using Character Correction and Feature-Based Word Classification https://arxiv.org/pdf/1604.06225.pdf

  • Towards a robust OCR system for Indic scripts
    P Krishnan, N Sankaran, AK Singh et al., Document Analysis Systems (DAS), 2014, ieeexplore.ieee.org
    Abstract: The current Optical Character Recognition (OCR) systems for Indic scripts are not robust enough for recognizing arbitrary collections of printed documents. Reasons for this limitation include the lack of resources (e.g. not enough examples with natural variations, ...
    Cited by 6.

  • OCR of historical printings of Latin texts: problems, prospects, progress
    U Springmann, D Najock, H Morgenroth et al., Proceedings of the First ..., 2014, dl.acm.org
    Abstract: This paper deals with the application of OCR methods to historical printings of Latin texts. Whereas the problem of recognizing historical printings of modern languages has been the subject of the IMPACT program, Latin has not yet been given any serious ...
    Cited by 4.

  • Dynamic Cortex Memory: Enhancing Recurrent Neural Networks for Gradient-Based Sequence Learning
    S Otte, M Liwicki, A Zell, Artificial Neural Networks and Machine Learning, 2014, Springer
    Abstract: In this paper a novel recurrent neural network (RNN) model for gradient-based sequence learning is introduced. The presented dynamic cortex memory (DCM) is an extension of the well-known long short term memory (LSTM) model. The main innovation ...
    Cited by 3.

  • A sequence learning approach for multiple script identification
    A Ul-Hasan, MZ Afzal, F Shafait et al., ICDAR 2015, ieeexplore.ieee.org
    Abstract: In this paper, we present a novel methodology for multiple script identification using Long Short-Term Memory (LSTM) networks' sequence-learning capabilities. Our method is able to identify multiple scripts at the text-line level, where two or more scripts are present in the ...
    Cited by 3.

  • Generic Text Recognition using Long Short-Term Memory Networks
    A Ul-Hasan, 2016, kluedo.ub.uni-kl.de
    Abstract: The task of printed Optical Character Recognition (OCR) is considered a "solved" issue by many Pattern Recognition (PR) researchers. The notion, however partially true, does not represent the whole picture. Although it is true that state-of-the-art OCR systems ...

wanghaisheng commented on June 2, 2024

A resource list for object detection with deep learning, covering RNN, MultiBox, SPP-Net, DeepID-Net, Fast R-CNN, DeepBox, MR-CNN, Faster R-CNN, YOLO, DenseBox, SSD, Inside-Outside Net, and G-CNN:
http://handong1587.github.io/deep_learning/2015/10/09/object-detection.html

wanghaisheng commented on June 2, 2024

A former colleague of mine once wrote a program for a dating site that simulated women chatting with male users, scored by the number of chat replies; it was hard-coded, yet almost perfectly passed this "dating Turing test". (Quoted reply: "Don't let your customers find out they're chatting with a computer; the consequences would be unthinkable.")
@breezedeus
A collection of resources on RNNs and LSTMs; anyone interested in training an RNN on dating-site dialogue data?
http://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html

wanghaisheng commented on June 2, 2024

Recognition in natural scenes

https://github.com/tongpi/basicOCR

Menus

"Applying OCR Technology for Receipt Recognition" by Ivan Ozhiganov, PDF: http://t.cn/Rqqsban http://t.cn/RqqFY4X

CAPTCHAs

https://zhuanlan.zhihu.com/p/21344595?f3fb8ead20=357481ecd0939762f4f9dcc75015e93a
http://www.jianshu.com/p/4fadf629895b?utm_campaign=hugo&utm_medium=reader_share&utm_content=note&utm_source=weibo
End-to-end OCR: CAPTCHA recognition

License plate numbers

Extends to all kinds of numbers; license plate recognition

Printed text

Résumés

https://github.com/Halfish/cvOCR

Invoices and receipts

https://github.com/xuwenxue000/PJ_DARKNET
https://github.com/xuwenxue000/PJ_PREDICT_IMG
https://github.com/lxj0276/OCRServer/tree/master/ocr_server
https://github.com/moonChenHaohui/PictureCut

wanghaisheng commented on June 2, 2024

Preprocessing

Binarization

https://github.com/zp-j/binarizewolfjolion
Document image binarization for Project 3A @Mines_Nancy http://zp-j.github.io/blog/2013/10/04/document-binarization/
https://github.com/jon1van/SPIE-DRR-2014
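
Alongside the Wolf-Jolion implementation above, here is a minimal binarization sketch using plain OpenCV. It is illustrative only (the input filename is a placeholder) and shows the global Otsu and locally adaptive variants commonly tried first on document images.

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# global Otsu threshold: one threshold for the whole page
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# locally adaptive threshold: more robust to uneven illumination
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)

cv2.imwrite("otsu.png", otsu)
cv2.imwrite("adaptive.png", adaptive)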

Text localization with OpenCV

http://stackoverflow.com/questions/23506105/extracting-text-opencv
Here is an alternative approach that I used to detect the text blocks:

1. Converted the image to grayscale.
2. Applied a threshold (simple binary threshold, with a handpicked value of 150).
3. Applied dilation to thicken lines in the image, leading to more compact objects and fewer white-space fragments. Used a high number of iterations, so the dilation is very heavy (13 iterations, also handpicked for optimal results).
4. Identified contours of objects in the resulting image using OpenCV's findContours function.
5. Drew a bounding box (rectangle) circumscribing each contoured object; each of them frames a block of text.
6. Optionally discarded areas that are unlikely to be the object you are searching for (e.g. text blocks) given their size, as the algorithm above can also find intersecting or nested objects (like the entire top area for the first card), some of which could be uninteresting for your purposes.

Below is the code, written in Python with OpenCV; it should be easy to port to C++.

import cv2

image = cv2.imread("card.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)  # threshold
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations=13)  # dilate
contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)  # get contours

# for each contour found, draw a rectangle around it on the original image
for contour in contours:
    # get rectangle bounding contour
    [x, y, w, h] = cv2.boundingRect(contour)

    # discard areas that are too large
    if h > 300 and w > 300:
        continue

    # discard areas that are too small
    if h < 40 or w < 40:
        continue

    # draw rectangle around contour on original image
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 255), 2)

# write original image with added contours to disk
cv2.imwrite("contoured.jpg", image)

The original image is the first image in your post.

After preprocessing (grayscale, threshold and dilate, i.e. after step 3) the image looked like this:

[figure: dilated image]

Below is the resulting image ("contoured.jpg" in the last line); the final bounding boxes for the objects in the image look like this:

[figure: image with bounding boxes]

You can see the text block on the left is detected as a separate block, delimited from its surroundings.

Using the same script with the same parameters (except for the thresholding type, which was changed for the second image as described below), here are the results for the other two cards:

[figures: results for cards 2 and 3]

Tuning the parameters

The parameters (threshold value, dilation parameters) were optimized for this image and this task (finding text blocks) and can be adjusted, if needed, for other cards images or other types of objects to be found.

For thresholding (step 2), I used a black threshold. For images where the text is lighter than the background, such as the second image in your post, a white threshold should be used, so replace the thresholding type with cv2.THRESH_BINARY. For the second image I also used a slightly higher threshold value (180). Varying the threshold value and the number of dilation iterations will result in different degrees of sensitivity in delimiting objects in the image.

Finding other object types:

For example, decreasing the dilation to 5 iterations in the first image gives a finer delimitation of objects, roughly finding all words in the image (rather than text blocks):

[figure: word-level detection result]

Knowing the rough size of a word, here I discarded areas that were too small (below 20 pixels width or height) or too large (above 100 pixels width or height), ignoring objects that are unlikely to be words, to get the results in the above image.

https://github.com/danvk/oldnyc/blob/master/ocr/tess/crop_morphology.py

wanghaisheng commented on June 2, 2024

'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. https://github.com/ryanfb/ancientgreekocr-ocr-evaluation-tools

wanghaisheng commented on June 2, 2024

Reference for the design of its API, library, and template-editing tool:
https://github.com/ushelp/EasyOCR

wanghaisheng commented on June 2, 2024

Output formats for OCR results
http://openphilology.github.io/nidaba/tei.html

hOCR visualization
https://github.com/mlichtenberg/hocrimagemapper
https://github.com/dinosauria123/gcv2hocr (contains hOCR examples)
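
For readers unfamiliar with hOCR: it is plain (X)HTML where each element carries its bounding box in the title attribute. A minimal sketch of extracting word boxes with the standard library follows (it assumes Tesseract-style ocrx_word spans; the filename is a placeholder):

import re
from html.parser import HTMLParser

class HocrWords(HTMLParser):
    """Print the bounding box of every ocrx_word element in an hOCR file."""
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "ocrx_word" in (a.get("class") or ""):
            m = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", a.get("title") or "")
            if m:
                print("word box:", tuple(int(v) for v in m.groups()))

with open("page.hocr", encoding="utf-8") as f:
    HocrWords().feed(f.read())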

wanghaisheng commented on June 2, 2024

https://github.com/kba/awesome-ocr Links to awesome OCR projects

wanghaisheng commented on June 2, 2024

https://market.aliyun.com/products/57124001/cmapi011523.html?spm=5176.730005-56956004.0.0.CaW56n#sku=yuncode552300007
Hanvon Cloud text recognition (汉王云-文本识别)

wanghaisheng commented on June 2, 2024

WebAppFind OCR demo - applies Ocrad.js or GOCR.js to a PDF file opened via right-click from the desktop (the Firefox add-on is currently Windows only; ports welcome!)

wanghaisheng commented on June 2, 2024

Result evaluation
https://github.com/Early-Modern-OCR/page-evaluator/tree/master/src/main/java/edu/illinois/i3/emop/apps/pageevaluator

https://github.com/Shreeshrii/imagessan
Images and Ground Truth text files in Sanskrit for evaluating Tesseract OCR (3.04) for Sanskrit language (Devanagari script)

https://github.com/Shreeshrii/ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
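
At their core, such tools compare OCR output to ground truth with an edit distance. A toy character-error-rate sketch (not the tools' actual code):

def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ground_truth, ocr_output):
    # character error rate: edits needed, normalized by ground-truth length
    return edit_distance(ground_truth, ocr_output) / max(1, len(ground_truth))

print(cer("OCR accuracy", "0CR accuraci"))  # 2 edits / 12 chars ~= 0.167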

wanghaisheng commented on June 2, 2024

Caption generation from images using deep neural net http://t-satoshi.blogspot.com/2015/12/image-caption-generation-by-cnn-and-lstm.html
Tensorflow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Code for paper "Image Caption Generation with Text-Conditional Semantic Attention"
Image caption generation to diagnose chest x-rays using dataset of images and reports
Papers

wanghaisheng commented on June 2, 2024

Optical Character Recognition of old and noisy print sources.

https://github.com/digiah/oldOCR

https://github.com/jflesch/pyocr
https://github.com/jflesch/libpillowfight#stroke-width-transformation
This library includes an algorithm (stroke width transform) for extracting text from natural-scene images; it looks quite impressive.

wanghaisheng commented on June 2, 2024

Cuneiform https://github.com/PauloMigAlmeida/cuneiform

wanghaisheng commented on June 2, 2024

https://github.com/potterhsu/SVHNClassifier-PyTorch
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

wanghaisheng commented on June 2, 2024

Handwritten characters
https://github.com/tianrolin/HCCR-ResNet

Two networks are trained here: a deep convolutional network similar to the MNIST one (HCCR3755_cnn_solver.prototxt) and a deep residual network (HCCR3755_res20_solver.prototxt).

After 10000 iterations each on the 3755-class character recognition task, the former reaches 91.19% accuracy while the latter reaches 97.23%.

https://github.com/nicklhy/ResNet_caffe2mxnet

wanghaisheng commented on June 2, 2024

https://github.com/mateogianolio/ocr
A Node.js CAPTCHA recognizer; the examples are mainly English letters and digits.

wanghaisheng commented on June 2, 2024

https://github.com/Kidel/In-Codice-Ratio-OCR-with-CNN
logo

In Codice Ratio (ICR) is a project curated by Roma Tre University in collaboration with the Vatican Secret Archives. The project's purpose is digitizing the contents of documents and ancient texts from the Archive.

The problem we faced in this repository was just a part of ICR, basically its core. We had to classify handwritten characters in Carolingian minuscule starting from an image of each character. The input is an ensemble of possible cuts of the word to be read, and our system has to decide whether a cut is correct and, if it is, which character it contains.

wanghaisheng commented on June 2, 2024

https://github.com/ruiwen905/MLTensorFlow

Uses Google's open-source artificial intelligence API.

Develops OCR and supervised-learning applications using TensorFlow, Scikit and Graphviz.
Uses deep learning to train classifiers to recognise and predict from images and data instead of relying on conditional rules.

wanghaisheng commented on June 2, 2024

https://github.com/Shreeshrii/tess4tutorial
Tesseract OCR 4.0.0-alpha LSTM Training data for Sanskrit Transliteration

https://github.com/Shreeshrii/tess4eval_deva
Tesseract OCR 4.0.0-alpha LSTM Engine evaluation for Devanagari Alphabet and Old Orthography https://shreeshrii.github.io/tess4eva…

See this one for installation and testing:
https://github.com/Shreeshrii/tess4eval
Tesseract OCR 4.0.0alpha evaluation for Hindi and Sanskrit https://shreeshrii.github.io/tess4eval/

wanghaisheng commented on June 2, 2024

OCR evaluation brought to you by University of Alicante

https://github.com/impactcentre/ocrevalUAtion/wiki

Glyph Miner, a system for extracting glyphs from early typeset prints

https://github.com/benedikt-budig/glyph-miner

📎 Using scanners and OCR to grep paper documents the easy way (Linux/Windows) https://openpaper.work/

https://github.com/jflesch/paperwork

Tesseract 4 training process
tesseract-ocr/tesseract#819

wanghaisheng commented on June 2, 2024

A brand logo recognition system using deep convolutional neural networks.

https://github.com/satojkovic/DeepLogo

wanghaisheng commented on June 2, 2024

[Building a modern OCR pipeline with computer vision and deep learning] "Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning" (Dropbox Tech Blog) by Brad Neuberg
https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

wanghaisheng commented on June 2, 2024

We ended up using a classic computer vision approach named Maximally Stable Extremal Regions (MSERs), using OpenCV's implementation. The MSER algorithm finds connected regions at different thresholds, or levels, of the image. Essentially, they detect blobs in images, and are thus particularly good for text.
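
A minimal sketch of what that looks like with OpenCV's MSER implementation (illustrative only; "sign.png" is a placeholder filename):

import cv2

img = cv2.imread("sign.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()                     # default deltas/thresholds
regions, bboxes = mser.detectRegions(gray)   # regions stable across levels

# draw a box around each detected region; text characters tend to be stable blobs
for (x, y, w, h) in bboxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imwrite("mser_regions.png", img)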

wanghaisheng commented on June 2, 2024

Demo at http://www.onlineocr.net/

http://www.ocrwebservice.com/api/restguide

This site provides OCR APIs (both SOAP and REST) with Chinese support. Many input and output formats are available, but it does not appear to support outputting coordinate data yet; it would be great if it did.

wanghaisheng commented on June 2, 2024

https://github.com/fierceX/cnn_ocr_mnist
Recognizing combined handwritten digits with a convolutional neural network.

wanghaisheng commented on June 2, 2024

https://github.com/psoder3/OCRPractice

An attempt at Optical Character Recognition without being tainted by knowledge of existing implementations

wanghaisheng commented on June 2, 2024

https://github.com/danielquinn/paperless
Scan, index, and archive all of your paper documents

wanghaisheng commented on June 2, 2024

Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more.
https://github.com/ICIJ/node-tika

wanghaisheng commented on June 2, 2024

https://github.com/Muhimbi/PDF-Converter-Services-Online

OCR scripts for digitized NYC city directories

https://github.com/nypl-spacetime/ocr-scripts

Optical character recognition ANN for AI class
Python based Open Source framework for document processing, content analysis and data enrichment pipelines http://www.opensemanticsearch.org/etl

https://github.com/StateFromJakeFarm/OCRANN

https://github.com/LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and Natural Language Processing

https://github.com/CatWang/OCR-Picture-Generators/

This is a simple project to generate simple cropped images containing characters, either Chinese or English. Backgrounds are also supported, and simulated medical bills are included.
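
A minimal sketch of this kind of character-image generation using Pillow (not the repository's actual code; the font path is an assumption and must point to a font covering the chosen characters):

import random
from PIL import Image, ImageDraw, ImageFont

def render_sample(text, font_path="simhei.ttf", size=32):
    # render black text on a white strip sized to the character count
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size * len(text) + 8, size + 8), color=255)
    ImageDraw.Draw(img).text((4, 4), text, fill=0, font=font)
    return img

chars = "发票金额合计"
for i in range(10):
    text = "".join(random.choice(chars) for _ in range(4))
    render_sample(text).save(f"sample_{i}_{text}.png")  # filename doubles as label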

https://github.com/CatWang/Synthesize_text_generation_Python

A fairly complex Python project for generating realistic scene text. The original project could only generate English; after modification it can also generate Chinese. I also added code for cropping the text regions from the images and saving the corresponding labels.

https://github.com/Gr1f0n6x/OCR_NN

Python, Keras, OpenCV
https://github.com/Gr1f0n6x/OnlineOCRMVC

wanghaisheng commented on June 2, 2024

CAPTCHAs
https://github.com/jimmikaelkael/pwntcha-testsuite
https://github.com/iveney/pwntcha
http://caca.zoy.org/wiki/PWNtcha
The recognition part may no longer be usable, but the preprocessing is definitely still valuable.

https://blog.bmonkeys.net/2014/build-pwntcha-on-ubuntu-14-04

wanghaisheng commented on June 2, 2024

Preprocessing algorithms

In my tests, the Sobel filter works very well on complex CAPTCHAs and on signs in street-view images.
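
A quick sketch of that kind of Sobel preprocessing with OpenCV (the input filename is a placeholder):

import cv2

gray = cv2.imread("captcha.png", cv2.IMREAD_GRAYSCALE)

# horizontal and vertical gradients, blended into one edge map
gx = cv2.Sobel(gray, cv2.CV_16S, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_16S, 0, 1, ksize=3)
edges = cv2.addWeighted(cv2.convertScaleAbs(gx), 0.5,
                        cv2.convertScaleAbs(gy), 0.5, 0)

cv2.imwrite("sobel.png", edges)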

https://github.com/danvk/oldnyc
https://github.com/zmr/namsel/tree/master
https://github.com/Visslo-PCH/Training
https://github.com/Wangsujeon/Etc.sc/tree/495ae8d8043db55edd9cfc468063e471faf502a6/Project1/Project1

https://github.com/AlexOuyang/OCR/
https://github.com/teichgraf/MuLaPeGASim
https://github.com/comrat/ocr-toolkit

=====

https://github.com/shrutikapoyrekar/Licence-Plate-Detector-Recognition
https://github.com/ankitsingh/ANPR/wiki/Algorithm
https://github.com/Wangsujeon/Etc.sc/blob/495ae8d8043db55edd9cfc468063e471faf502a6/Project1/Project1/bookline.cpp
https://github.com/Visslo-PCH/Training/blob/cbbc6adca836217c2c0fa1c2be0b435a5ab2bd18/gaussian_bluring/gaussian_bluring.cpp
https://github.com/whatthefua/latexocr

https://github.com/g4gaj/eazyBill/tree/0e25f85ca4ef2a401c30a2394eb4351b31e98228

Approaches using OpenCV
https://github.com/liwangjing/opencv-in-python/tree/master

https://github.com/Cid1986/BlindSightPrototype2/blob/71bbcff15c6c8da0aff830534e3ec0d5d3f3e893/src/detectors/TextDetector.java
https://github.com/mabotech/mabo.io/blob/7f646db9d5ee3cd0b137866bf8eaf295890f134c/py/vision/test1/ocr4.py

https://github.com/zmr/namsel/tree/master

https://github.com/dingtiansong/infoEx/blob/cb858fef0fefd3f5a4397d79c0fffc57b49f0d2a/picReg/pyopenCV/houghlines3.jpg
https://github.com/srihareendra/PYTHON_imageprocessing

wanghaisheng commented on June 2, 2024
import numpy as np
import pytesseract
import cv2
import scipy.fftpack

import io
import os

#from google.cloud import vision

# Pillow provides PIL.Image, used by the pytesseract step below
import PIL.Image

'''
    IMAGE PREPROCESSING
'''
# reading the image
First_Image = cv2.imread('02.old.bmp')

#converting image to greyscale 
grey_image = cv2.cvtColor(First_Image, cv2.COLOR_BGR2GRAY)

#applying Gaussian blur (Otsu's thresholding is applied later)
blur_image = cv2.GaussianBlur(grey_image, (5,5), 0)

#preparation and application of sobel edge
ddepth = cv2.CV_16S
kw = dict(ksize=3, scale=1, delta=0, borderType=cv2.BORDER_DEFAULT)

# Gradient-X.
grad_x = cv2.Sobel(blur_image, ddepth, 1, 0, **kw)

# Gradient-Y.
grad_y = cv2.Sobel(blur_image, ddepth, 0, 1, **kw)

# Converting back to uint8.
abs_grad_x = cv2.convertScaleAbs(grad_x)
abs_grad_y = cv2.convertScaleAbs(grad_y)

sobel = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
sobel_no_blend = cv2.add(abs_grad_x, abs_grad_y)


#finding image gradients for edge detection
edge_image = cv2.Canny(blur_image, 250, 100)

#using otsu's algorithm to perform binarization
retVal,thresh_image = cv2.threshold(edge_image, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

'''
    PLATE LOCALIZATION
'''

#connected component analysis on the thresh_image
thresh_image_copy = thresh_image.copy()
contours, hierarchy = cv2.findContours(thresh_image_copy, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

#looping through contours to find possible plates
long_plates = []
short_plates = []
full_set_plates = []
for contour in contours:
    [x, y, width, height] = cv2.boundingRect(contour)
    
    #filtering contours for possible plates
    if (height >  30 and width > 50) and height<120 and width < 300:
        #filtering for short and long plates by aspect ratio
        aspect_ratio = width/height
        if aspect_ratio >= 1.5 and aspect_ratio <= 3:
            possible_candidate = grey_image[y:y+height, x:x+width]
            short_plates.append(possible_candidate)
            cv2.rectangle(First_Image, (x,y), (x+width, y+height), (0, 255, 0), 2) #drawing rectangle around possible_candidate
        elif aspect_ratio >= 3.5 and aspect_ratio <=4.5:
            possible_candidate = grey_image[y:y+height, x:x+width]
            long_plates.append(possible_candidate)
            cv2.rectangle(First_Image, (x,y), (x+width, y+height), (0, 255, 0), 2) #drawing rectangle around possible_candidate

full_set_plates += long_plates
full_set_plates += short_plates
                    
'''
    CANDIDATE ANALYSIS AND PLATE EXTRACTION
'''

# Candidate analysis on the full_set_plates
strong_plates = []
fuzzy_plates = []
for candidate in full_set_plates:
    blurr = cv2.GaussianBlur(candidate, (5,5),0) 
    candidate_edge = cv2.Canny(blurr, 250, 100)
    cand_h, cand_w = candidate.shape
    plate_candidate_copy = candidate_edge.copy()

    ## perform connected component analysis on plate_candidate
    contours2, hierarchy2 = cv2.findContours(plate_candidate_copy, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    chars_count = 0
    for contour in contours2:
        [x, y, w, h] = cv2.boundingRect(contour)

        ## aspect ratio analysis to check for possible characters
        character_aspect_ratio = w/h
        high_index = 1.2
        low_index = 3.5
        if (h>(0.4*cand_h) and h < cand_h): #  h>(cand_h/low_index) and h<(cand_h/high_index) and width < (cand_w/3):
            #if character_aspect_ratio > 0.4 and character_aspect_ratio < 1.5:
            chars_count += 1
            
    if chars_count >= 5:
        strong_plates.append(candidate)
    elif chars_count >= 2 and chars_count <= 4:
        fuzzy_plates.append(candidate)

print("Strong_plates: {}".format(len(strong_plates)))
print("Fuzzy_plates: {}".format(len(fuzzy_plates)))


for i in range(0, len(strong_plates)):
    cv2.imshow(str(i), strong_plates[i])

# Strong and Fuzzy plate analysis to get the best candidate
for i in range(0, len(strong_plates)):
    plate = strong_plates[i]
    #plate_threshold_image = cv2.adaptiveThreshold(plate, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,2)
    p_h, p_w = plate.shape
    
    #resizing the plate if the height and width are below a certain size
    if p_h < 74 or p_w < 285: 
        plate_to_save = cv2.resize(plate, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC) 
    else:
        plate_to_save = plate
    
    cv2.imwrite('saves/extractedplate' + str(i) + '.jpg', plate_to_save)

'''
    PLATE SEGMENTATION
'''

#### imclearborder definition

def imclearborder(imgBW, radius):

    # Given a black and white image, first find all of its contours
    imgBWcopy = imgBW.copy()
    contours,hierarchy = cv2.findContours(imgBWcopy.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # Get dimensions of image
    imgRows = imgBW.shape[0]
    imgCols = imgBW.shape[1]    

    contourList = [] # ID list of contours that touch the border

    # For each contour...
    for idx in np.arange(len(contours)):
        # Get the i'th contour
        cnt = contours[idx]

        # Look at each point in the contour
        for pt in cnt:
            rowCnt = pt[0][1]
            colCnt = pt[0][0]

            # If this is within the radius of the border
            # this contour goes bye bye!
            check1 = (rowCnt >= 0 and rowCnt < radius) or (rowCnt >= imgRows-1-radius and rowCnt < imgRows)
            check2 = (colCnt >= 0 and colCnt < radius) or (colCnt >= imgCols-1-radius and colCnt < imgCols)

            if check1 or check2:
                contourList.append(idx)
                break

    for idx in contourList:
        cv2.drawContours(imgBWcopy, contours, idx, (0,0,0), -1)

    return imgBWcopy

#### bwareaopen definition
def bwareaopen(imgBW, areaPixels):
    # Given a black and white image, first find all of its contours
    imgBWcopy = imgBW.copy()
    contours,hierarchy = cv2.findContours(imgBWcopy.copy(), cv2.RETR_LIST, 
        cv2.CHAIN_APPROX_SIMPLE)

    # For each contour, determine its total occupying area
    for idx in np.arange(len(contours)):
        area = cv2.contourArea(contours[idx])
        if (area >= 0 and area <= areaPixels):
            cv2.drawContours(imgBWcopy, contours, idx, (0,0,0), -1)

    return imgBWcopy
    
#### Main segmentation program

# Read in image
img = cv2.imread('02.old.bmp', 0)

# Number of rows and columns
rows = img.shape[0]
cols = img.shape[1]

# Remove some columns from the beginning and end
#img = img[:, 59:cols-20]

# Number of rows and columns
rows = img.shape[0]
cols = img.shape[1]

# Convert image to 0 to 1, then do log(1 + I)
imgLog = np.log1p(np.array(img, dtype="float") / 255)

# Create Gaussian mask of sigma = 10
M = 2*rows + 1
N = 2*cols + 1
sigma = 10
(X,Y) = np.meshgrid(np.linspace(0,N-1,N), np.linspace(0,M-1,M))
centerX = np.ceil(N/2)
centerY = np.ceil(M/2)
gaussianNumerator = (X - centerX)**2 + (Y - centerY)**2

# Low pass and high pass filters
Hlow = np.exp(-gaussianNumerator / (2*sigma*sigma))
Hhigh = 1 - Hlow

# Move origin of filters so that it's at the top left corner to
# match with the input image
HlowShift = scipy.fftpack.ifftshift(Hlow.copy())
HhighShift = scipy.fftpack.ifftshift(Hhigh.copy())

# Filter the image and crop
If = scipy.fftpack.fft2(imgLog.copy(), (M,N))
Ioutlow = np.real(scipy.fftpack.ifft2(If.copy() * HlowShift, (M,N)))
Iouthigh = np.real(scipy.fftpack.ifft2(If.copy() * HhighShift, (M,N)))

# Set scaling factors and add
gamma1 = 0.5
gamma2 = 2.0
Iout = gamma1*Ioutlow[0:rows,0:cols] + gamma2*Iouthigh[0:rows,0:cols]

# Anti-log then rescale to [0,1]
Ihmf = np.expm1(Iout)
Ihmf = (Ihmf - np.min(Ihmf)) / (np.max(Ihmf) - np.min(Ihmf))
Ihmf2 = np.array(255*Ihmf, dtype="uint8")

# Threshold the image - Anything below intensity 65 gets set to white
Ithresh = Ihmf2 < 65
Ithresh = 255*Ithresh.astype("uint8")

# Clear off the border.  Choose a border radius of 5 pixels
Iclear = imclearborder(Ithresh, 5)

# Eliminate regions that have areas below 120 pixels
Iopen = bwareaopen(Iclear, 120)

'''
    CHARACTER RECOGNITION
'''
# using the tesseract OCR
cv2.imwrite('saves/chars.jpeg', Iopen)
img_with_chars = PIL.Image.open('saves/chars.jpeg')
text = pytesseract.image_to_string(img_with_chars)
print('Number Plate: {}'.format(text))

'''
# using the Google Cloud Machine Learning Engine for OCR
vision_client = vision.Client('anpr-166523')

with io.open('saves/chars.jpeg', 'rb') as image_file:
    content = image_file.read()

image = vision_client.image(content=content)

texts = image.detect_text()
print("USING THE ML ENGINE")
print('Plate:')

for text in texts:
    print('\n"{}"'.format(text.description))
'''

'''
    IMAGES DISPLAY
'''
#displaying various forms of images
cv2.imshow('grey Image', grey_image)
cv2.imshow('blur Image', blur_image)
cv2.imshow('sobel', sobel_no_blend)
cv2.imshow('thresh_image', thresh_image)
cv2.imshow('original', First_Image)
# Show all plate candidate series
#cv2.imshow('Original Image', img)
#cv2.imshow('Homomorphic Filtered Result', Ihmf2)
#cv2.imshow('Thresholded Result', Ithresh)
cv2.imshow('Opened Result', Iopen)


cv2.waitKey(0)
cv2.destroyAllWindows()

wanghaisheng commented on June 2, 2024

Passports and ID cards
Extraction of machine-readable zone information from passports, visas and id-cards via OCR
https://github.com/konstantint/PassportEye/tree/master
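
A minimal usage sketch, hedged against API changes (read_mrz is PassportEye's documented entry point, but check the repository's README for the current interface):

from passporteye import read_mrz

# locate and OCR the machine-readable zone of a passport photo/scan
mrz = read_mrz("passport.jpg")
if mrz is not None:
    data = mrz.to_dict()  # fields such as names, number, nationality, dates
    print(data.get("names"), data.get("number"))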

wanghaisheng commented on June 2, 2024

https://mp.weixin.qq.com/s?__biz=MzI1NTE4NTUwOQ==&mid=2650326555&idx=1&sn=ffb945f27814bb450b8de2d87087227d
Annual progress in video action recognition
https://pan.baidu.com/s/1pLx2Sxd#list/path=%2F&parentPath=%2FVALSE

wanghaisheng commented on June 2, 2024

https://github.com/JarveeLee/SynthText_Chinese_version This should be able to simulate and generate training data for film OCR.

《Synthetic Data for Text Localisation in Natural Images》A Gupta, A Vedaldi, A Zisserman [University of Oxford] (CVPR 2016) O
https://github.com/ankush-me/SynthText

wanghaisheng commented on June 2, 2024


from PIL import ImageFilter


def ocr_question_extract(im):
    # git@github.com:madmaze/pytesseract.git
    global pytesseract
    try:
        import pytesseract
    except ImportError:
        print("[ERROR] pytesseract not installed")
        return
    im = im.crop((127, 3, 260, 22))
    im = pre_ocr_processing(im)
    # im.show()
    return pytesseract.image_to_string(im, lang='chi_sim').strip()


def pre_ocr_processing(im):
    im = im.convert("RGB")
    width, height = im.size

    # a heavily blurred copy approximates the background illumination
    white = im.filter(ImageFilter.BLUR).filter(ImageFilter.MaxFilter(23))
    grey = im.convert('L')
    impix = im.load()
    whitepix = white.load()
    greypix = grey.load()

    # subtract the background estimate channel-wise to flatten illumination
    for y in range(height):
        for x in range(width):
            greypix[x, y] = min(255, max(255 + impix[x, y][0] - whitepix[x, y][0],
                                         255 + impix[x, y][1] - whitepix[x, y][1],
                                         255 + impix[x, y][2] - whitepix[x, y][2]))

    new_im = grey.copy()
    binarize(new_im, 150)
    return new_im


def binarize(im, thresh=120):
    # simple in-place global threshold on a greyscale image
    assert 0 < thresh < 255
    assert im.mode == 'L'
    w, h = im.size
    for y in range(0, h):
        for x in range(0, w):
            if im.getpixel((x, y)) < thresh:
                im.putpixel((x, y), 0)
            else:
                im.putpixel((x, y), 255)

wanghaisheng commented on June 2, 2024

pipeline
https://github.com/harshit158/OCR-pipeline

wanghaisheng commented on June 2, 2024

printed scientific document
https://github.com/chungkwong/MathOCR/tree/e335392f4bdb98e69a507287686dc8b0abdc275e

wanghaisheng commented on June 2, 2024

Setting Up a Simple OCR Server
https://realpython.com/blog/python/setting-up-a-simple-ocr-server/
https://github.com/ybur-yug/python_ocr_tutorial
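
In the spirit of that tutorial, a toy OCR endpoint (a hedged sketch, not the tutorial's actual code; route name and port are illustrative) can be as small as:

import io

import pytesseract
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

@app.route("/ocr", methods=["POST"])
def ocr():
    # expects raw image bytes in the request body
    img = Image.open(io.BytesIO(request.get_data()))
    return jsonify({"text": pytesseract.image_to_string(img)})

if __name__ == "__main__":
    app.run(port=5000)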

wanghaisheng commented on June 2, 2024

https://github.com/PedroBarcha/Context-Spelling-Correction

Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phrase for the suggestion. The software was originally developed for correcting OCR output.

wanghaisheng commented on June 2, 2024

Screen capture

http://www.sikulix.com/

SikuliX automates anything you see on the screen of your desktop computer running Windows, Mac or some Linux/Unix. It uses image recognition powered by OpenCV to identify and control GUI components. This is handy in cases when there is no easy access to a GUI's internals or the source code of the application or web page you want to act on.

wanghaisheng commented on June 2, 2024

Image Processing Worms Assignment Report
To start, I read in both image channels as grayscale and normalized them to the 0-255 range so they were visible (the images were given to us in the 0-1 range and therefore appeared black). I then added the two normalized images together with equal weighting. I also read in the 'w2' channel as 'unchanged' for later use, as well as the ground truth image for each respective input.

[Figure: both channels normalized and then added together]

This image is a good start, but among its flaws are that the worm on the far left is a similar shade to the background and that the background is not all one colour. As a result, I decided to apply various threshold-based segmentations.
As the background is much lighter than the worms, I could extract the worms from the background.
Noise in an image can cause small errors during thresholding, so it is best to first apply some form of image filtering to remove this noise.
I started both of these segmentation techniques with a Gaussian blur, which helped to remove Gaussian noise from the image, as well as a median blur, which helped to remove salt-and-pepper noise.

I then implemented different segmentation methods depending on which image-thresholding method I was using.

i) Simple binary thresholding

I found a threshold value of 54 gave me the best results.

I then inverted the image so I could effectively use morphological transformations, i.e. all worms are now white on a black background.
For these original thresholding methods, I found a kernel size of 3 gave good results.
I applied morphological opening to remove the noise in the rest of the image and morphological closing to remove the small holes caused by noise inside the worm objects.

[Figure: results of morphological transforms]

ii) Adaptive mean thresholding

I repeated this process, but instead of binary thresholding I used adaptive thresholding.

[Figure: result before morphological transforms]

Adaptive thresholding gives a much better output, as the algorithm calculates the threshold for small regions of the image. This gave much better results in terms of the quality of the worms but also left a border. I found a block size of 33 and a constant of 10 gave the best results regarding worm quality.

[Figure: result after morphological transforms]

I then compared both methods (i) and (ii) to the ground truth data; by visual inspection they are a good start.

[Figure: ground truth image]

Comparison was done by taking the difference between the two images.

[Figures: segmentation method 1 comparison; segmentation method 2 comparison]

To get a better image to compare to the ground truth data, I read in the 'w2' band image only and started with a power-law transform, using a gamma of 1.3 to brighten the image.

[Figure: power-law transform]

I then obtained a clear white frame with well-defined edges and an image with well-defined worms, before adding them together.
I did this by converting the image to 8-bit, then using a median blur and an adaptive histogram to improve the contrast of the image, and then applying Otsu's binarization thresholding.
I then used adaptive thresholding, a median blur to remove noise, and bilateral filtering to further remove noise without ruining the edges.

[Figure: interim image of segmentation and background separation]

I then inverted the image before applying various morphological transforms, this time with a kernel size of 2.

[Figures: final segmented image; ground truth]

The segmentation is very good in comparison to the ground truth.

I then found the contours of the image. I looped through these contours and, if a contour's area was greater than 250 and less than 10000, plotted a minimum-area bounding rectangle. I then drew the contours on the image.
I calculated a rough length of each worm/contour as the contour perimeter divided by 2. If this length was greater than the diagonal of the bounding rectangle, I classed the worm as dead, otherwise alive, and labelled it accordingly (see the sketch below).
I counted the number of contours to get the number of worms.
I rendered each worm individually onto a black background and wrote it to an individual file.

[Figure: straight worms classified as dead and curled worms as alive, showing the system works]

The system counts 11 worms, which is a good level of accuracy.

[Figures: worm written to file; individual worm ground truth]

This is clear evidence of the system performing the specific task of separating individual worms.
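
A minimal sketch of the dead/alive classification step described above, assuming a binary image with white worms on a black background (thresholds follow the report; the filename is a placeholder):

import cv2
import numpy as np

segmented = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(segmented, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

worms = 0
for cnt in contours:
    area = cv2.contourArea(cnt)
    if not (250 < area < 10000):       # discard blobs too small/large to be worms
        continue
    worms += 1
    (w, h) = cv2.minAreaRect(cnt)[1]       # minimum-area bounding rectangle size
    diagonal = np.hypot(w, h)
    length = cv2.arcLength(cnt, True) / 2  # rough worm length: perimeter / 2
    # per the report: length greater than the box diagonal => dead, else alive
    label = "dead" if length > diagonal else "alive"
    print(label, round(length, 1))

print("worm count:", worms)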

Watershed Algorithm

Sources

  • http://stackoverflow.com/questions/11294859/how-to-define-the-markers-for-watershed-in-opencv
  • https://stackoverflow.com/questions/41555031/identifying-curved-and-straight-objects-in-opencv
  • http://blog.christianperone.com/2014/06/simple-and-effective-coin-segmentation-using-python-and-opencv/
  • Lecture demos
  • http://opencvpython.blogspot.co.uk/2012/05/skeletonization-using-opencv-python.html
  • http://stackoverflow.com/questions/34834523/an-alternative-way-to-skeletonize-in-opencv-python
  • http://stackoverflow.com/questions/15135676/problems-during-skeletonization-image-for-extracting-contours

wanghaisheng commented on June 2, 2024

https://github.com/Transkribus?page=1
A platform to collaborate, share and benefit from cutting edge research in Handwritten Text Recognition

wanghaisheng commented on June 2, 2024
  1. There is no dataset/competition like ImageNet for OCR.

  2. Most people/conferences/universities are going after natural images and "computer vision" problems. OCR is its own animal and while it shares some concepts with computer vision it's not the same thing.

  3. A lot of IP, knowledge and talent is locked up in a handful of very old companies that have been doing this for a long time. ABBYY is to OCR what Google + Facebook are to deep learning (maybe more).

  4. OCR is kind of a niche, a lot of knowledge is not available to many people outside of a few insiders (ABBYY/Nuance, universities, research labs, OCR conferences). I'm sure Google uses it a lot internally (e.g. Google Street View numbers etc.).

  5. The incumbents don't just do OCR. They do preprocessing (computer vision/image processing) + OCR + NLP.

  6. Hard to find data. ABBYY Finereader supports 190 languages. Collecting this data is no easy task.

I'm probably missing other reasons as well, but this is just off the top of my head.

That being said, I'm sure that there's going to be a lot of progress in the OCR + deep learning space soon.

wanghaisheng commented on June 2, 2024
ocrcustomserver 5 months ago [-] I wouldn't say that full page OCR is trivial. Using an open-source solution (99% based on Tesseract) is going to get you ok-ish results if your input is relatively clean (no complex layout, scanned documents from a flatbed scanner, standard fonts) and you don't care about speed. If you care about recognition accuracy then Tesseract isn't going to cut it (at least not without some serious effort). Replying to points 1 and 3: for smaller players and/or complex tasks you can always implement your own custom parser. I'm doing work as a contractor in this space.

staticautomatic 5 months ago [-] I agree with you that Tesseract isn't great out of the box, but if you aren't doing huge volumes, there are plenty of cloud options available. Respectfully, I disagree about this being a parsing issue. The whole reason so-called "zonal OCR" exists is because of the challenges of reliably inferring the structure of a document at parsing time. Yes, there are some kinds of documents where parsing logic alone will suffice, but for more complex tasks you need what ABBYY and Nuance are selling.

ocrcustomserver 5 months ago [-] Just to be sure that we're talking about the same thing, by "custom parser" I meant implementing your own barebones "zonal OCR" functionality with just the features that are needed for the specific problem. I think it boils down to the needs of each individual application. Some cases have a lot of templates and need the "automatic fuzzy matching" functionality and the extra bells and whistles. But smaller players often deal with just a handful of relatively simple templates where FlexiCapture would just be overkill (not to mention a couple of other problems that I cover at the end of this post). This is of course not an easy task, because you need someone who can design and implement an end-to-end system that possibly involves image processing, "zonal OCR", an OCR engine, and reliable text extraction from images/PDFs (extracting text from PDFs is tricky). It's way easier for a non-developer to think about what rulesets/logic to apply without having to think about the image processing/OCR bits. I think that is one of the main selling points of FlexiCapture: it abstracts the OCR bits so that the system designer can think about the problem itself, design a spec and think about the logic. Do you need deskewing of documents? Click a button and you get deskewing. Which brings me to the second point: the products sold by ABBYY/Nuance are meant to be used by integrators (no programming needed other than the occasional VB.NET script), not image processing specialists/developers. In my (biased) opinion, it makes more sense for some businesses to go the custom route instead of investing in FlexiCapture. There is also FlexiCapture Engine, which is meant for developers. This has the same problems as the other offerings by ABBYY (I don't know about Nuance but I suspect it's the same): expensive; vendor lock-in; ridiculous extra costs for things like "cloud/VM license", exporting to PDF, etc.; limits on how many pages you can process per year or in total (complex licensing schemes); ABBYY really wants to sell you their own cluster/cloud management services, which are all proprietary; limited flexibility in implementing distributed services; costs that add up fast; and you have to be trained in their own stack. Can you provide an example where you think that a custom solution would not work? I'm curious.

staticautomatic 5 months ago [-] First of all, why don't you shoot me an email at [email protected] and we can talk further. In a nutshell, it would have been way more expensive and difficult for us to roll our own than even the high cost of a FlexiCapture license. But here's a reasonably complete explanation of the build vs buy analysis we did.

1. FlexiCapture makes pre-processing incredibly painless and training-free. Beyond the usual binarization stuff, we extensively use the built-in auto-rotation and cleanup (skew, noise, speckle, etc.).
2. Templating is really the big win for FlexiCapture. I have not seen anything else with a template GUI that comes close to being as usable, robust, or simple. That's really important to us because we build a LOT of templates. I have a really hard time imagining having to code them.
3. FlexiCapture's template engine is super strong for the kinds of documents we work with, which is mainly complex repeating groups with nested structures. It's also really good at handling both photos of documents (e.g. mobile) and scans. One thing it also offers that I haven't seen elsewhere in a turnkey product or existing platform is the ability to define zones in purely relative terms without absolute positions. I don't know about Nuance but I've not seen any other template GUI that will allow you to spec something like "look for either this word or a two-line string containing these words in the upper left quadrant of the document."
4. There's a dearth of zonal OCR frameworks. Outside of ABBYY's and Nuance's SDKs, the only one I'm even aware of is OpenKM, and I don't write Java. The FlexiCapture Engine SDK is a terrible beast. The documentation is horrible, it's Windows only, and it's all COM objects.

wanghaisheng commented on June 2, 2024

ABBYY has dominated the field for many years (decades really) and still outperforms every solution out there. OmniPage by Nuance is probably the second best.
Preprocessing the images (OCR pipeline) is very important for OCR. For generic scanned PDF documents Finereader does a pretty good job.

There is a lot going on inside an OCR engine: layout analysis, dewarping, binarization, deskewing, despeckling (and more), and then there's the OCR itself. With Tesseract you have to do a lot of this yourself; you have to provide it with a clean image. The commercial packages do that for you automatically. ABBYY and other solutions also use NLP to augment and check the OCR results from a semantic-analysis perspective.

Also, there is no "one size fits all" OCR. It is highly specific to the nature of the application. Consider the following use cases:

  • scanned PDF document
  • scanned document with a non-standard font (e.g. Fraktur script in a historic book)
  • photo of a scanned document acquired with a mobile phone's camera
  • passport OCR (MRZ)
  • credit card OCR
  • text appearing in a natural image (e.g. store sign)

These are all "OCR projects" but they require very different approaches. You cannot just throw any input image at an OCR engine and expect it to work. It often requires a mix of computer vision/image processing, machine learning and an OCR engine.

There is a growing number of papers using deep learning being submitted to ICDAR (the premier OCR conference) and the other OCR conferences. One of the problems is the lack of a universal dataset/competition like ImageNet. The SmartDoc competition (documents captured from smartphones) was cancelled this year due to an insufficient number of participants.

If anyone is doing work with OCR + deep learning, I'd love to discuss!

wanghaisheng commented on June 2, 2024

Thanks for your excellent work @tmbdev.
I noticed your ocropy2 long ago, along with your frequent NVIDIA lab sub-projects on OCR; even before that, I guessed (and discussed with my GitHub friends) that you would release a new one.

wanghaisheng commented on June 2, 2024

https://github.com/dhlab-epfl/dhSegment
dhSegment is a generic approach for historical document processing. It relies on a convolutional neural network to do the heavy lifting of predicting pixel-wise characteristics; simple image processing operations are then provided to extract the components of interest (boxes, polygons, lines, masks, ...).

It includes the following features:

You only need to provide a list of images with annotated masks, which anyone can produce with image-editing software (GIMP, Photoshop). You only need to draw the elements you care about!

It allows classifying each pixel across multiple classes, and even multiple labels per pixel.

On-the-fly data augmentation and efficient batching.

It leverages a state-of-the-art pre-trained network (ResNet-50) to lower the need for training data and improve generalization.

Training can be monitored very easily in TensorBoard.

A set of image processing operations is already implemented, so the post-processing step only takes a couple of lines.

wanghaisheng commented on June 2, 2024

https://github.com/AstarLight/CPS-OCR-Engine

Invoice and receipt recognition by a student at Sun Yat-sen University.

wanghaisheng commented on June 2, 2024

Autonomous feedback-based preprocessing using classification likelihoods
http://fse.studenttheses.ub.rug.nl/12113/1/AI_BA_2014_JOOSTBAPTIST.pdf

In pattern recognition, and optical character recognition (OCR) specifically, input images are presented to the classification system, they are preprocessed, features are extracted, and then the image is classified. This is a sequential process in which decisions made during early preprocessing affect the outcome of the classification and potentially decrease performance. Hand-tuning the preprocessing parameters is often undesirable, as this can be a complex task with many parameters to optimize. Moreover, it is often desirable to minimize the amount of human intelligence that ends up in an autonomous system, if it can be expected that new variants of the data would require new human knowledge-based labor. A different approach to preprocessing in OCR is proposed, in which preprocessing is performed autonomously and depends on the computed likelihood of classification outcomes. This paper shows that by using this approach, color, scale and rotation invariance can be achieved, as well as high accuracy and precision. The performance is solid and reaches a plateau even when noise in the data is not fully accounted for.

wanghaisheng commented on June 2, 2024

https://github.com/DriesSmit/GeneralOCR

This software finds text and structure in images.
