Comments (57)
There is a new set of DL-based OCR tools at https://github.com/NVlabs/ocropus3
OCR systems have traditionally relied on huge data collection efforts and supervised training. I think we can only make significant progress in the long run if we change over to self-supervised training. In the pre-DL world, we had some approaches to that (with the original OCRopus), but carrying that over into the DL world will still require significant effort.
from awesome-ocr.
ocropus
https://github.com/jbest/typeface-corpus
https://github.com/sudeepraja/BLSTM-for-supervised-sequence-recognition.git
https://github.com/sbuss/ocropus
https://github.com/tmbdev?tab=repositories
ocropus/ocropy#54
Is there support for non-latin languages like Chinese, Japanese or Thai?
Here is a report about training clstm to recognize Japanese:
After 50 days of training on my MacBook Pro (130,000 iterations over 3,877 character classes), clstm has reached a 3.6% error rate so far. The network has 800 hidden nodes, and the trained model is 58.2 MB.
Ocropus fork with sane defaults
https://github.com/mittagessen/kraken
kraken is a fork of ocropus intended to rectify a number of issues while preserving (mostly) functional equivalence. Its main goals are:
Explicit input/output handling ✓
Clean public API
Word and character bounding boxes in hOCR ✓
Tests
Removal of runtime dependency on gcc ✓
clstm compatibility ✓
Ticked-off goals have been realized, while the others still require further work. Pull requests and code contributions are always welcome.
Recognition Models for Kraken and CLSTM https://github.com/mittagessen/kraken-models
kraken-models
This repository contains recognition models for kraken, both legacy pyrnn (converted to pronn) and clstm ones. To have one or more models added, open a pull request or send an email to [email protected].
https://github.com/kendemu/char-rnn-chinese
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code of https://github.com/karpathy/char-rnn. Adds support for Chinese, among other things.
OpenPhilology https://github.com/OpenPhilology
nidaba
An expandable and scalable OCR pipeline
tei-ocr
TEI customization for OCR generated layout and content information
migne-text-reuse
Investigating text reuse in the Patrologia.
ancientgreekocr-ocr-evaluation-tools
forked from ryanfb/ancientgreekocr-ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
Iris
The OCR pipeline to succeed Rigaudon
phaidra
eLearning for historical languages.
Planning
For now we're just using the wiki to discuss future steps.
canonical
OPP work iterating PerseusDL
hocrinfoaggregator
forked from fbaumgardt/hocrinfoaggregator
HocrInfoAggregator
OpenGreekAndLatin
forked from GreekOCR/OpenGreekAndLatin
Based on Rigaudon, hOCRInfoAggregator and CoPhi Proofreader
rigaudon
forked from brobertson/rigaudon
Polytonic Greek OCR engine derived from Gamera and based on the work of Dalitz and Brandt
cophiproofreader
Proof-reading system for OCR applied to Greek and Latin texts
https://github.com/jknollmeyer/whiteboard (a Node.js wrapper for ocropus)
https://github.com/naptha/ocracy
https://github.com/Totkichi/SciOCR
https://github.com/manhcuogntin4
https://github.com/manhcuogntin4/OCR
https://github.com/manhcuogntin4/CLSTM
Transliteration related data files and/or models. https://github.com/googlei18n/transliteration
https://github.com/naptha/tesseract.js
from awesome-ocr.
Latest arXiv papers from 2015-2016 with "OCR" in the title:
OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus https://arxiv.org/pdf/1608.02153.pdf
OCR accuracy improvement on document images through a novel pre-processing approach
https://arxiv.org/abs/1509.03456
Automatic quality evaluation and (semi-) automatic improvement of OCR models for historical printings https://arxiv.org/pdf/1606.05157.pdf
OCR Error Correction Using Character Correction and Feature-Based Word Classification https://arxiv.org/pdf/1604.06225.pdf
Towards a robust ocr system for indic scripts
P Krishnan, N Sankaran, AK Singh… - … Systems (DAS), 2014 …, 2014 - ieeexplore.ieee.org
Abstract—The current Optical Character Recognition (OCR) systems for Indic scripts are not
robust enough for recognizing arbitrary collection of printed documents. Reasons for this
limitation includes the lack of resources (eg not enough examples with natural variations, ...
OCR of historical printings of Latin texts: problems, prospects, progress
U Springmann, D Najock, H Morgenroth… - Proceedings of the First …, 2014 - dl.acm.org
Abstract This paper deals with the application of OCR methods to historical printings of Latin
texts. Whereas the problem of recognizing historical printings of modern languages has
been the subject of the IMPACT program, Latin has not yet been given any serious ...
Dynamic Cortex Memory: Enhancing Recurrent Neural Networks for Gradient-Based Sequence Learning
S Otte, M Liwicki, A Zell - Artificial Neural Networks and Machine Learning– …, 2014 - Springer
Abstract In this paper a novel recurrent neural network (RNN) model for gradient-based
sequence learning is introduced. The presented dynamic cortex memory (DCM) is an
extension of the well-known long short term memory (LSTM) model. The main innovation ...
A sequence learning approach for multiple script identification
A Ul-Hasan, MZ Afzal, F Shafait… - … (ICDAR), 2015 13th …, 2015 - ieeexplore.ieee.org
Abstract-In this paper, we present a novel methodology for multiple script identification using
Long Short-Term Memory (LSTM) networks' sequence-learning capabilities. Our method is able
to identify multiple scripts at text-line level, where two or more scripts are present in the ...
Generic Text Recognition using Long Short-Term Memory Networks
A Ul-Hasan - 2016 - kluedo.ub.uni-kl.de
Abstract The task of printed Optical Character Recognition (OCR) is considered a “solved”
issue by many Pattern Recognition (PR) researchers. The notion, however, partially true,
does not represent the whole picture. Although, it is true that state-of-the-art OCR systems ...
from awesome-ocr.
A resource list for object detection with deep learning (link below), including RNN, MultiBox, SPP-Net, DeepID-Net, Fast R-CNN, DeepBox, MR-CNN, Faster R-CNN, YOLO, DenseBox, SSD, Inside-Outside Net, and G-CNN.
http://handong1587.github.io/deep_learning/2015/10/09/object-detection.html
from awesome-ocr.
A colleague of mine once had a job writing programs for a hookup site that simulated women chatting with male users, using the number of chat exchanges as the evaluation metric; it was hard-coded, yet almost perfectly passed the hookup-site version of the Turing test. //@編程菜菜: //@UB_吴斌: Don't let your customers find out they're chatting with a computer; the consequences would be unthinkable.
@breezedeus
Various resources on RNNs and LSTMs (link below). Anyone interested in using Jiayuan dating-site dialogue data to train an RNN for conversation?
http://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html
from awesome-ocr.
Recognition in natural scenes
https://github.com/tongpi/basicOCR
Menus
"Applying OCR Technology for Receipt Recognition" by Ivan Ozhiganov (PDF): http://t.cn/Rqqsban http://t.cn/RqqFY4X
CAPTCHAs
https://zhuanlan.zhihu.com/p/21344595?f3fb8ead20=357481ecd0939762f4f9dcc75015e93a
http://www.jianshu.com/p/4fadf629895b?utm_campaign=hugo&utm_medium=reader_share&utm_content=note&utm_source=weibo
End-to-end OCR: CAPTCHA recognition
License plates
License plate recognition, extendable to various other kinds of numbers
Printed text
Résumés
https://github.com/Halfish/cvOCR
Invoices and receipts
https://github.com/xuwenxue000/PJ_DARKNET
https://github.com/xuwenxue000/PJ_PREDICT_IMG
https://github.com/lxj0276/OCRServer/tree/master/ocr_server
https://github.com/moonChenHaohui/PictureCut
from awesome-ocr.
Preprocessing
Binarization
https://github.com/zp-j/binarizewolfjolion
Document image binarization for Project 3A @Mines_Nancy http://zp-j.github.io/blog/2013/10/04/document-binarization/
https://github.com/jon1van/SPIE-DRR-2014
Text localization with OpenCV
http://stackoverflow.com/questions/23506105/extracting-text-opencv
Here is an alternative approach that I used to detect the text blocks:
Converted the image to grayscale
Applied threshold (simple binary threshold, with a handpicked value of 150 as the threshold value)
Applied dilation to thicken lines in image, leading to more compact objects and less white space fragments. Used a high value for number of iterations, so dilation is very heavy (13 iterations, also handpicked for optimal results).
Identified contours of objects in resulted image using opencv findContours function.
Drew a bounding box (rectangle) circumscribing each contoured object - each of them frames a block of text.
Optionally discarded areas that are unlikely to be the object you are searching for (e.g. text blocks) given their size, as the algorithm above can also find intersecting or nested objects (like the entire top area for the first card) some of which could be uninteresting for your purposes.
Below is the code, written in Python with OpenCV (cv2); it should be easy to port to C++.
import cv2

image = cv2.imread("card.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)  # threshold
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations=13)  # dilate
contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)  # get contours

# for each contour found, draw a rectangle around it on the original image
for contour in contours:
    # get rectangle bounding contour
    [x, y, w, h] = cv2.boundingRect(contour)

    # discard areas that are too large
    if h > 300 and w > 300:
        continue

    # discard areas that are too small
    if h < 40 or w < 40:
        continue

    # draw rectangle around contour on original image
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 255), 2)

# write original image with added contours to disk
cv2.imwrite("contoured.jpg", image)
The original image is the first image in your post.
After preprocessing (grayscale, threshold and dilate - so after step 3) the image looked like this:
Dilated image
Below is the resulting image ("contoured.jpg" in the last line); the final bounding boxes for the objects in the image look like this:
You can see the text block on the left is detected as a separate block, delimited from its surroundings.
Using the same script with the same parameters (except for thresholding type that was changed for the second image like described below), here are the results for the other 2 cards:
Tuning the parameters
The parameters (threshold value, dilation parameters) were optimized for this image and this task (finding text blocks) and can be adjusted, if needed, for other cards images or other types of objects to be found.
For thresholding (step 2), I used a black threshold. For images where the text is lighter than the background, such as the second image in your post, a white threshold should be used, so replace the thresholding type with cv2.THRESH_BINARY. For the second image I also used a slightly higher threshold value (180). Varying the threshold value and the number of dilation iterations will give different degrees of sensitivity in delimiting objects in the image.
Finding other object types:
For example, decreasing the dilation to 5 iterations in the first image gives a finer delimitation of objects, roughly finding all words in the image (rather than text blocks):
Knowing the rough size of a word, here I discarded areas that were too small (below 20 pixels width or height) or too large (above 100 pixels width or height) to ignore objects that are unlikely to be words, to get the results in the above image.
https://github.com/danvk/oldnyc/blob/master/ocr/tess/crop_morphology.py
from awesome-ocr.
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. https://github.com/ryanfb/ancientgreekocr-ocr-evaluation-tools
from awesome-ocr.
For reference: its API, library, and template-editor design
https://github.com/ushelp/EasyOCR
from awesome-ocr.
Output formats for OCR results
http://openphilology.github.io/nidaba/tei.html
hOCR visualization
https://github.com/mlichtenberg/hocrimagemapper
https://github.com/dinosauria123/gcv2hocr (includes hOCR examples)
from awesome-ocr.
https://github.com/kba/awesome-ocr: links to awesome OCR projects
from awesome-ocr.
WebAppFind OCR demo - Applies Ocras.js or GOCR.js to a PDF file opened via right-click from the desktop (the Firefox add-on is currently Windows only; ports welcome!)
from awesome-ocr.
https://github.com/Shreeshrii/imagessan
Images and Ground Truth text files in Sanskrit for evaluating Tesseract OCR (3.04) for Sanskrit language (Devanagari script)
https://github.com/Shreeshrii/ocr-evaluation-tools
'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.
from awesome-ocr.
Caption generation from images using deep neural net http://t-satoshi.blogspot.com/2015/12/image-caption-generation-by-cnn-and-lstm.html
Tensorflow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Code for paper "Image Caption Generation with Text-Conditional Semantic Attention"
Image caption generation to diagnose chest x-rays using dataset of images and reports
Papers
from awesome-ocr.
Optical Character Recognition of old and noisy print sources.
https://github.com/digiah/oldOCR
https://github.com/jflesch/pyocr
https://github.com/jflesch/libpillowfight#stroke-width-transformation
This library contains an algorithm for extracting text from natural-scene images; it looks quite impressive.
from awesome-ocr.
Cuneiform https://github.com/PauloMigAlmeida/cuneiform
from awesome-ocr.
https://github.com/potterhsu/SVHNClassifier-PyTorch
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
from awesome-ocr.
Handwritten characters
https://github.com/tianrolin/HCCR-ResNet
Two networks are trained here: a deep convolutional network similar to the MNIST one (HCCR3755_cnn_solver.prototxt) and a deep residual network (HCCR3755_res20_solver.prototxt).
On the 3,755-class character recognition task, after 10,000 iterations each, the former reaches 91.19% accuracy while the latter reaches 97.23%.
https://github.com/nicklhy/ResNet_caffe2mxnet
from awesome-ocr.
https://github.com/mateogianolio/ocr
A Node.js CAPTCHA recognizer; the examples mainly cover English letters and digits.
from awesome-ocr.
https://github.com/Kidel/In-Codice-Ratio-OCR-with-CNN
In Codice Ratio (ICR) is a project curated by Roma Tre University in collaboration with the Vatican Secret Archives. The project aims to digitize the contents of documents and ancient texts from the Archive.
The problem addressed in this repository was just one part of ICR, essentially its core: classifying handwritten characters in Carolingian minuscule from an image of the character. The input is an ensemble of possible cuts of the word to be read, and the system must decide whether a cut is correct and, if so, which character it contains.
from awesome-ocr.
https://github.com/ruiwen905/MLTensorFlow
Use of Google's Open Source Artificial Intelligence API
Develop OCR and Supervised Learning applications using TensorFlow, Scikit and Graphviz
Make use of deep learning to train classifiers to learn to recognise and predict from images and data instead of using conditional rules.
from awesome-ocr.
https://github.com/Shreeshrii/tess4tutorial
Tesseract OCR 4.0.0-alpha LSTM Training data for Sanskrit Transliteration
https://github.com/Shreeshrii/tess4eval_deva
Tesseract OCR 4.0.0-alpha LSTM Engine evaluation for Devanagari Alphabet and Old Orthography https://shreeshrii.github.io/tess4eva…
See this for installation and testing:
https://github.com/Shreeshrii/tess4eval
Tesseract OCR 4.0.0alpha evaluation for Hindi and Sanskrit https://shreeshrii.github.io/tess4eval/
from awesome-ocr.
OCR evaluation brought to you by University of Alicante
https://github.com/impactcentre/ocrevalUAtion/wiki
Glyph Miner, a system for extracting glyphs from early typeset prints
https://github.com/benedikt-budig/glyph-miner
https://github.com/jflesch/paperwork
Tesseract 4 training process
tesseract-ocr/tesseract#819
from awesome-ocr.
A brand logo recognition system using deep convolutional neural networks.
https://github.com/satojkovic/DeepLogo
from awesome-ocr.
[Building a modern OCR workflow with computer vision and deep learning] "Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning | Dropbox Tech Blog" by Brad Neuberg
https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/
from awesome-ocr.
We ended up using a classic computer vision approach named Maximally Stable Extremal Regions (MSERs), using OpenCV's implementation. The MSER algorithm finds connected regions at different thresholds, or levels, of the image. Essentially, they detect blobs in images, and are thus particularly good for text.
from awesome-ocr.
Demo site: http://www.onlineocr.net/
http://www.ocrwebservice.com/api/restguide
This site provides OCR APIs (SOAP and REST) and supports Chinese.
It accepts and outputs many formats; I have not yet seen support for outputting coordinate data, which would be nice to have.
from awesome-ocr.
https://github.com/fierceX/cnn_ocr_mnist
Recognizing composed handwritten digits with a convolutional neural network
from awesome-ocr.
https://github.com/psoder3/OCRPractice
An attempt at Optical Character Recognition without being tainted by knowledge of existing implementations
from awesome-ocr.
https://github.com/danielquinn/paperless
Scan, index, and archive all of your paper documents
from awesome-ocr.
Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more.
https://github.com/ICIJ/node-tika
from awesome-ocr.
https://github.com/Muhimbi/PDF-Converter-Services-Online
OCR scripts for digitized NYC city directories
https://github.com/nypl-spacetime/ocr-scripts
Optical character recognition ANN for AI class
Python based Open Source framework for document processing, content analysis and data enrichment pipelines http://www.opensemanticsearch.org/etl
https://github.com/StateFromJakeFarm/OCRANN
https://github.com/LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and Natural Language Processing
https://github.com/CatWang/OCR-Picture-Generators/
This is a simple project to generate simple cropped images with characters. You can generate Chinese or English characters; backgrounds are also supported. Simulated medical bills are included as well.
https://github.com/CatWang/Synthesize_text_generation_Python
A more complex Python project for generating realistic scene text. The original project only generates English; after modification it can generate Chinese. I also added code for cropping the text in the images and saving the corresponding labels.
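As a rough illustration of what these generators do at their simplest, here is a sketch that renders a label string onto a plain background with PIL; the function name, sizes, and default font are placeholders, and real generators add fonts, noise, perspective, and background photos:

```python
from PIL import Image, ImageDraw

# Minimal sketch of synthetic OCR training data: render a label string onto a
# plain background and keep the (image, label) pair.
def make_sample(text, size=(120, 32), bg=255, fg=0):
    img = Image.new("L", size, color=bg)        # grayscale canvas
    draw = ImageDraw.Draw(img)
    draw.text((4, 8), text, fill=fg)            # default bitmap font; swap in a TTF for real use
    return img, text

img, label = make_sample("3755")
```

The label travels with the image, so the pair can be fed directly to a recognizer's training loop.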
https://github.com/Gr1f0n6x/OCR_NN
Python, Keras, OpenCV
https://github.com/Gr1f0n6x/OnlineOCRMVC
from awesome-ocr.
CAPTCHAs
https://github.com/jimmikaelkael/pwntcha-testsuite
https://github.com/iveney/pwntcha
http://caca.zoy.org/wiki/PWNtcha
The recognition part probably no longer works, but the preprocessing is definitely still valuable.
https://blog.bmonkeys.net/2014/build-pwntcha-on-ubuntu-14-04
from awesome-ocr.
Preprocessing algorithms
In my tests, the Sobel filter works remarkably well on complex CAPTCHAs and street-view icons.
https://github.com/danvk/oldnyc
https://github.com/zmr/namsel/tree/master
https://github.com/Visslo-PCH/Training
https://github.com/Wangsujeon/Etc.sc/tree/495ae8d8043db55edd9cfc468063e471faf502a6/Project1/Project1
https://github.com/AlexOuyang/OCR/
https://github.com/teichgraf/MuLaPeGASim
https://github.com/comrat/ocr-toolkit
=====
https://github.com/shrutikapoyrekar/Licence-Plate-Detector-Recognition
https://github.com/ankitsingh/ANPR/wiki/Algorithm
https://github.com/Wangsujeon/Etc.sc/blob/495ae8d8043db55edd9cfc468063e471faf502a6/Project1/Project1/bookline.cpp
https://github.com/Visslo-PCH/Training/blob/cbbc6adca836217c2c0fa1c2be0b435a5ab2bd18/gaussian_bluring/gaussian_bluring.cpp
https://github.com/whatthefua/latexocr
https://github.com/g4gaj/eazyBill/tree/0e25f85ca4ef2a401c30a2394eb4351b31e98228
Approaches using OpenCV
https://github.com/liwangjing/opencv-in-python/tree/master
https://github.com/Cid1986/BlindSightPrototype2/blob/71bbcff15c6c8da0aff830534e3ec0d5d3f3e893/src/detectors/TextDetector.java
https://github.com/mabotech/mabo.io/blob/7f646db9d5ee3cd0b137866bf8eaf295890f134c/py/vision/test1/ocr4.py
https://github.com/zmr/namsel/tree/master
https://github.com/dingtiansong/infoEx/blob/cb858fef0fefd3f5a4397d79c0fffc57b49f0d2a/picReg/pyopenCV/houghlines3.jpg
https://github.com/srihareendra/PYTHON_imageprocessing
from awesome-ocr.
import numpy as np
import pytesseract
import cv2
import scipy.fftpack
import io
import os
#from google.cloud import vision
try:
import Image
except ImportError:
import PIL.Image
'''
IMAGE PREPROCESSING
'''
# reading the image
First_Image = cv2.imread('02.old.bmp')
#converting image to greyscale
grey_image = cv2.cvtColor(First_Image, cv2.COLOR_BGR2GRAY)
#applying otsu's thresholding method after Gaussian blur
blur_image = cv2.GaussianBlur(grey_image, (5,5), 0)
#preparation and application of sobel edge
ddepth = cv2.CV_16S
kw = dict(ksize=3, scale=1, delta=0, borderType=cv2.BORDER_DEFAULT)
# Gradient-X.
grad_x = cv2.Sobel(blur_image, ddepth, 1, 0, **kw)
# Gradient-Y.
grad_y = cv2.Sobel(blur_image, ddepth, 0, 1, **kw)
# Converting back to uint8.
abs_grad_x = cv2.convertScaleAbs(grad_x)
abs_grad_y = cv2.convertScaleAbs(grad_y)
sobel = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
sobel_no_blend = cv2.add(abs_grad_x, abs_grad_y)
#finding image gradients for edge detection
edge_image = cv2.Canny(blur_image, 250, 100)
#using otsu's algorithm to perform binarization
retVal, thresh_image = cv2.threshold(edge_image, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
'''
PLATE LOCALIZATION
'''
#connected component analysis on the thresh_image
thresh_image_copy = thresh_image.copy()
contours, hierarchy = cv2.findContours(thresh_image_copy, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
#looping through contours to find possible plates
long_plates = []
short_plates = []
full_set_plates = []
for contour in contours:
    [x, y, width, height] = cv2.boundingRect(contour)
    #filtering contours for possible plates
    if (height > 30 and width > 50) and height < 120 and width < 300:
        #filtering for short and long plates with aspect_ratio
        aspect_ratio = width / height
        if aspect_ratio >= 1.5 and aspect_ratio <= 3:
            possible_candidate = grey_image[y:y+height, x:x+width]
            short_plates.append(possible_candidate)
            cv2.rectangle(First_Image, (x, y), (x+width, y+height), (0, 255, 0), 2) #drawing rectangle around possible_candidate
        elif aspect_ratio >= 3.5 and aspect_ratio <= 4.5:
            possible_candidate = grey_image[y:y+height, x:x+width]
            long_plates.append(possible_candidate)
            cv2.rectangle(First_Image, (x, y), (x+width, y+height), (0, 255, 0), 2) #drawing rectangle around possible_candidate
full_set_plates += long_plates
full_set_plates += short_plates
'''
CANDIDATE ANALYSIS AND PLATE EXTRACTION
'''
# Candidate analysis on the full_set_plates
strong_plates = []
fuzzy_plates = []
for candidate in full_set_plates:
    blurr = cv2.GaussianBlur(candidate, (5, 5), 0)
    candidate_edge = cv2.Canny(blurr, 250, 100)
    cand_h, cand_w = candidate.shape
    plate_candidate_copy = candidate_edge.copy()
    ## perform connected component analysis on plate_candidate
    contours2, hierarchy2 = cv2.findContours(plate_candidate_copy, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    chars_count = 0
    for contour in contours2:
        [x, y, w, h] = cv2.boundingRect(contour)
        ## the aspect ratio analysis to check for possible characters
        character_aspect_ratio = w / h
        high_index = 1.2
        low_index = 3.5
        if (h > (0.4 * cand_h) and h < cand_h): # h>(cand_h/low_index) and h<(cand_h/high_index) and width < (cand_w/3):
            #if character_aspect_ratio > 0.4 and character_aspect_ratio < 1.5:
            chars_count += 1
    if chars_count >= 5:
        strong_plates.append(candidate)
    elif chars_count >= 2 and chars_count <= 4:
        fuzzy_plates.append(candidate)
print("Strong_plates: {}".format(len(strong_plates)))
print("Fuzzy_plates: {}".format(len(fuzzy_plates)))
for i in range(0, len(strong_plates)):
    cv2.imshow(str(i), strong_plates[i])
# Strong and Fuzzy plate analysis to get the best candidate
for i in range(0, len(strong_plates)):
    plate = strong_plates[i]
    #plate_threshold_image = cv2.adaptiveThreshold(plate, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    p_h, p_w = plate.shape
    #resizing the plate if the height and width are below a certain size
    if p_h < 74 or p_w < 285:
        plate_to_save = cv2.resize(plate, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    else:
        plate_to_save = plate
    cv2.imwrite('saves/extractedplate' + str(i) + '.jpg', plate_to_save)
'''
PLATE SEGMENTATION
'''
#### imclearborder definition
def imclearborder(imgBW, radius):
    # Given a black and white image, first find all of its contours
    imgBWcopy = imgBW.copy()
    contours, hierarchy = cv2.findContours(imgBWcopy.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # Get dimensions of image
    imgRows = imgBW.shape[0]
    imgCols = imgBW.shape[1]
    contourList = [] # ID list of contours that touch the border
    # For each contour...
    for idx in np.arange(len(contours)):
        # Get the i'th contour
        cnt = contours[idx]
        # Look at each point in the contour
        for pt in cnt:
            rowCnt = pt[0][1]
            colCnt = pt[0][0]
            # If this is within the radius of the border,
            # this contour goes bye bye!
            check1 = (rowCnt >= 0 and rowCnt < radius) or (rowCnt >= imgRows-1-radius and rowCnt < imgRows)
            check2 = (colCnt >= 0 and colCnt < radius) or (colCnt >= imgCols-1-radius and colCnt < imgCols)
            if check1 or check2:
                contourList.append(idx)
                break
    for idx in contourList:
        cv2.drawContours(imgBWcopy, contours, idx, (0, 0, 0), -1)
    return imgBWcopy
#### bwareaopen definition
def bwareaopen(imgBW, areaPixels):
    # Given a black and white image, first find all of its contours
    imgBWcopy = imgBW.copy()
    contours, hierarchy = cv2.findContours(imgBWcopy.copy(), cv2.RETR_LIST,
                                           cv2.CHAIN_APPROX_SIMPLE)
    # For each contour, determine its total occupying area
    for idx in np.arange(len(contours)):
        area = cv2.contourArea(contours[idx])
        if (area >= 0 and area <= areaPixels):
            cv2.drawContours(imgBWcopy, contours, idx, (0, 0, 0), -1)
    return imgBWcopy
#### Main segmentation program
# Read in image
img = cv2.imread('02.old.bmp', 0)
# Number of rows and columns
rows = img.shape[0]
cols = img.shape[1]
# Remove some columns from the beginning and end
#img = img[:, 59:cols-20]
# Number of rows and columns
rows = img.shape[0]
cols = img.shape[1]
# Convert image to 0 to 1, then do log(1 + I)
imgLog = np.log1p(np.array(img, dtype="float") / 255)
# Create Gaussian mask of sigma = 10
M = 2*rows + 1
N = 2*cols + 1
sigma = 10
(X,Y) = np.meshgrid(np.linspace(0,N-1,N), np.linspace(0,M-1,M))
centerX = np.ceil(N/2)
centerY = np.ceil(M/2)
gaussianNumerator = (X - centerX)**2 + (Y - centerY)**2
# Low pass and high pass filters
Hlow = np.exp(-gaussianNumerator / (2*sigma*sigma))
Hhigh = 1 - Hlow
# Move origin of filters so that it's at the top left corner to
# match with the input image
HlowShift = scipy.fftpack.ifftshift(Hlow.copy())
HhighShift = scipy.fftpack.ifftshift(Hhigh.copy())
# Filter the image and crop
If = scipy.fftpack.fft2(imgLog.copy(), (M, N))
Ioutlow = np.real(scipy.fftpack.ifft2(If.copy() * HlowShift, (M, N)))
Iouthigh = np.real(scipy.fftpack.ifft2(If.copy() * HhighShift, (M, N)))
# Set scaling factors and add
gamma1 = 0.5
gamma2 = 2.0
Iout = gamma1*Ioutlow[0:rows,0:cols] + gamma2*Iouthigh[0:rows,0:cols]
# Anti-log then rescale to [0,1]
Ihmf = np.expm1(Iout)
Ihmf = (Ihmf - np.min(Ihmf)) / (np.max(Ihmf) - np.min(Ihmf))
Ihmf2 = np.array(255*Ihmf, dtype="uint8")
# Threshold the image - Anything below intensity 65 gets set to white
Ithresh = Ihmf2 < 65
Ithresh = 255*Ithresh.astype("uint8")
# Clear off the border. Choose a border radius of 5 pixels
Iclear = imclearborder(Ithresh, 5)
# Eliminate regions that have areas below 120 pixels
Iopen = bwareaopen(Iclear, 120)
'''
CHARACTER RECOGNITION
'''
# using the tesseract OCR
cv2.imwrite('saves/chars.jpeg', Iopen)
img_with_chars = PIL.Image.open('saves/chars.jpeg')
text = pytesseract.image_to_string(img_with_chars)
print('Number Plate: {}'.format(text))
'''
# using the Google Cloud Machine Learning Engine for OCR
vision_client = vision.Client('anpr-166523')
with io.open('saves/chars.jpeg', 'rb') as image_file:
content = image_file.read()
image = vision_client.image(content=content)
texts = image.detect_text()
print("USING THE ML ENGINE")
print('Plate:')
for text in texts:
print('\n"{}"'.format(text.description))
'''
'''
IMAGES DISPLAY
'''
#displaying various forms of images
cv2.imshow('grey Image', grey_image)
cv2.imshow('blur Image', blur_image)
cv2.imshow('sobel', sobel_no_blend)
cv2.imshow('thresh_image', thresh_image)
cv2.imshow('original', First_Image)
# Show all plate candidate series
#cv2.imshow('Original Image', img)
#cv2.imshow('Homomorphic Filtered Result', Ihmf2)
#cv2.imshow('Thresholded Result', Ithresh)
cv2.imshow('Opened Result', Iopen)
cv2.waitKey(0)
cv2.destroyAllWindows()
from awesome-ocr.
Passports and ID cards
Extraction of machine-readable zone information from passports, visas and id-cards via OCR
https://github.com/konstantint/PassportEye/tree/master
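The machine-readable zone that such tools parse is defined by the ICAO Doc 9303 standard; a small sketch of its check-digit arithmetic (weights cycle 7, 3, 1, with '<' counting as 0 and letters A-Z mapping to 10-35):

```python
# ICAO 9303 check-digit computation, the arithmetic behind MRZ validation.
def mrz_check_digit(field):
    def value(ch):
        if ch.isdigit():
            return int(ch)
        if ch == "<":
            return 0
        return ord(ch) - ord("A") + 10   # A=10 ... Z=35
    weights = (7, 3, 1)
    total = sum(value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10

# Example from the ICAO 9303 specimen: document number "L898902C3" has check digit 6.
print(mrz_check_digit("L898902C3"))  # → 6
```

Each MRZ field (document number, birth date, expiry date) carries such a digit, so corrupted OCR output can be detected before the fields are trusted.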
from awesome-ocr.
https://mp.weixin.qq.com/s?__biz=MzI1NTE4NTUwOQ==&mid=2650326555&idx=1&sn=ffb945f27814bb450b8de2d87087227d
Annual progress in video action recognition
https://pan.baidu.com/s/1pLx2Sxd#list/path=%2F&parentPath=%2FVALSE
from awesome-ocr.
https://github.com/JarveeLee/SynthText_Chinese_version (this should be able to generate simulated training data for film OCR)
"Synthetic Data for Text Localisation in Natural Images", A Gupta, A Vedaldi, A Zisserman [University of Oxford] (CVPR 2016)
https://github.com/ankush-me/SynthText
from awesome-ocr.
from PIL import ImageFilter

def ocr_question_extract(im):
    # [email protected]:madmaze/pytesseract.git
    global pytesseract
    try:
        import pytesseract
    except ImportError:
        print("[ERROR] pytesseract not installed")
        return
    im = im.crop((127, 3, 260, 22))
    im = pre_ocr_processing(im)
    # im.show()
    return pytesseract.image_to_string(im, lang='chi_sim').strip()

def pre_ocr_processing(im):
    im = im.convert("RGB")
    width, height = im.size
    white = im.filter(ImageFilter.BLUR).filter(ImageFilter.MaxFilter(23))
    grey = im.convert('L')
    impix = im.load()
    whitepix = white.load()
    greypix = grey.load()
    for y in range(height):
        for x in range(width):
            greypix[x, y] = min(255, max(255 + impix[x, y][0] - whitepix[x, y][0],
                                         255 + impix[x, y][1] - whitepix[x, y][1],
                                         255 + impix[x, y][2] - whitepix[x, y][2]))
    new_im = grey.copy()
    binarize(new_im, 150)
    return new_im

def binarize(im, thresh=120):
    assert 0 < thresh < 255
    assert im.mode == 'L'
    w, h = im.size
    for y in range(0, h):
        for x in range(0, w):
            if im.getpixel((x, y)) < thresh:
                im.putpixel((x, y), 0)
            else:
                im.putpixel((x, y), 255)
from awesome-ocr.
pipeline
https://github.com/harshit158/OCR-pipeline
from awesome-ocr.
Printed scientific documents
https://github.com/chungkwong/MathOCR/tree/e335392f4bdb98e69a507287686dc8b0abdc275e
from awesome-ocr.
Setting Up a Simple OCR Server
https://realpython.com/blog/python/setting-up-a-simple-ocr-server/
https://github.com/ybur-yug/python_ocr_tutorial
from awesome-ocr.
https://github.com/PedroBarcha/Context-Spelling-Correction
Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phrase for the suggestion. The software was originally developed for correcting OCR output.
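The substitution loop it describes can be sketched as follows; `did_you_mean` is a hypothetical stand-in for the real Yandex HTTP query, backed here by a plain dictionary:

```python
# Sketch of the approach above: split text into phrases, ask a search engine
# for a "did you mean" suggestion, and substitute it when one is offered.
def did_you_mean(phrase, suggestions):
    return suggestions.get(phrase)   # None means the engine offered no correction

def correct_text(text, suggestions, phrase_len=3):
    words = text.split()
    out = []
    for i in range(0, len(words), phrase_len):
        phrase = " ".join(words[i:i + phrase_len])
        out.append(did_you_mean(phrase, suggestions) or phrase)
    return " ".join(out)

fixed = correct_text("tbe quick brown f0x jumps over",
                     {"tbe quick brown": "the quick brown",
                      "f0x jumps over": "fox jumps over"})
print(fixed)  # → the quick brown fox jumps over
```

Phrasing the queries (rather than sending single words) gives the engine context, which is what makes this useful for OCR confusions like "f0x".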
from awesome-ocr.
Screen capture
SikuliX automates anything you see on the screen of your desktop computer running Windows, Mac or some Linux/Unix. It uses image recognition powered by OpenCV to identify and control GUI components. This is handy in cases when there is no easy access to a GUI's internals or the source code of the application or web page you want to act on.
from awesome-ocr.
Image Processing Worms Assignment Report
To start, I read in both image channels as grayscale and normalized each to the 0-255 range: the images were supplied in the range 0-1 and therefore appeared black. I then added the two normalized images together with equal weighting. I also read in the 'w2' channel as 'unchanged' for later use, along with the ground-truth image corresponding to each image read in.
Image shows both channels normalized and
then added together.
This image is a good start, but it has flaws: the worm on the far left is a similar shade to the background, and the background is not one uniform colour. As a result, I decided to apply several threshold-based segmentations.
As the background is much lighter than the worms, I could extract the worms from it.
Noise in an image can cause small thresholding errors, so it is best to apply some image filtering first to remove this noise.
I started both of these segmentation techniques with a Gaussian blur, which helped to remove Gaussian noise from the image, and a median blur, which helped to remove salt-and-pepper noise.
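The report applies these filters with OpenCV (cv2.GaussianBlur, cv2.medianBlur). As a self-contained illustration of why a median filter removes salt-and-pepper noise, here is a minimal numpy sketch; the function name and test image are invented for this example.

```python
import numpy as np

def median_blur(img, k=3):
    """Replace each pixel with the median of its k x k neighbourhood.
    A single outlier pixel never becomes the median, so salt-and-pepper
    noise is removed while flat regions are untouched."""
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# A single salt pixel on a flat background is removed entirely.
noisy = np.full((7, 7), 100, dtype=np.uint8)
noisy[3, 3] = 255
clean = median_blur(noisy)
```

A Gaussian blur averages the outlier into its neighbours instead of discarding it, which is why the median filter is the better tool for impulse noise.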
I then implemented different segmentation methods depending on which thresholding method I was using:
i) Simple Binary thresholding
I found that a threshold value of 54 gave the best results.
I then inverted the image so I could use morphological transformations effectively, i.e. all worms white on a black background.
For these initial thresholding methods, I found that a kernel size of 3 gave good results.
I applied morphological opening to remove noise in the rest of the image, and morphological closing to fill the small noise-induced holes inside the worm objects.
Results of morphological transforms
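The report performs these steps with OpenCV (cv2.threshold, cv2.morphologyEx). To make the mechanics concrete, here is a naive numpy sketch of binary thresholding, inversion, and 3x3 opening/closing; the threshold (54) and kernel size (3) come from the report, while the helper names and the test image are invented.

```python
import numpy as np

def _slide(img, k, reduce_fn, pad_value):
    """Apply reduce_fn over every k x k window (naive, for clarity)."""
    p = k // 2
    padded = np.pad(img, p, constant_values=pad_value)
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = reduce_fn(padded[y:y + k, x:x + k])
    return out

def erode(img, k=3):
    return _slide(img, k, np.min, 255)

def dilate(img, k=3):
    return _slide(img, k, np.max, 0)

def opening(img, k=3):   # erosion then dilation: removes small specks
    return dilate(erode(img, k), k)

def closing(img, k=3):   # dilation then erosion: fills small holes
    return erode(dilate(img, k), k)

# Threshold at 54 and invert, so worms are white on black.
grey = np.full((12, 12), 200, dtype=np.uint8)
grey[3:8, 3:8] = 20          # a dark "worm" blob
grey[10, 10] = 10            # a single dark noise pixel
binary = np.where(grey < 54, 255, 0).astype(np.uint8)
cleaned = closing(opening(binary))
```

After opening-then-closing, the isolated noise pixel is gone while the worm-sized blob survives with its original extent.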
ii) Adaptive Mean Thresholding
I repeated this process but used adaptive thresholding in place of binary thresholding.
The image on the right is before the morphological transforms.
Adaptive thresholding gives a much better output, as the algorithm calculates a threshold for each small region of the image. This produced much better results in terms of worm quality but also left a border. I found that a block size of 33 and a constant of 10 gave the best results regarding the quality of the worms.
Post-morphological transforms
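Adaptive mean thresholding (what cv2.adaptiveThreshold does with ADAPTIVE_THRESH_MEAN_C) can be sketched directly in numpy: each pixel is compared against the mean of its local block minus a constant. The block size (33) and constant (10) are the report's values; the function name and the small test image below are invented, and a tiny block is used so the example stays readable.

```python
import numpy as np

def adaptive_mean_threshold(img, block=33, C=10):
    """Threshold each pixel against the mean of its block x block
    neighbourhood minus a constant C.  block must be odd.  Because the
    threshold adapts locally, uneven illumination is handled well."""
    p = block // 2
    padded = np.pad(img.astype(float), p, mode='edge')
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + block, x:x + block].mean()
            out[y, x] = 255 if img[y, x] > local_mean - C else 0
    return out

# A dark pixel is caught even on a background that brightens left to
# right, because each pixel is judged against its own neighbourhood.
img = np.tile(np.linspace(100, 220, 9).astype(np.uint8), (9, 1))
img[4, 4] = 30
result = adaptive_mean_threshold(img, block=3, C=10)
```

A single global threshold would have to split 100 from 220 and 30 at once; the local mean sidesteps that, which is exactly why the report found adaptive thresholding superior on its unevenly lit background.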
I then compared both methods (i) and (ii) to the ground truth data.
By visual inspection they are a good start.
Ground Truth Image
Comparison via taking the difference between the 2 images.
Segmentation method 1 comparison Segmentation method 2 comparison
To get a better image to compare to the ground-truth data, I read in only the 'w2' band image and started with a power-law transform, using a gamma of 1.3 to brighten the image.
Power Law Transform
I then produced a clear white frame with well-defined edges and an image with well-defined worms, before adding the two together.
I did this by converting the image to 8-bit before using a median blur and an adaptive histogram to improve the contrast of the image and then doing Otsu’s binarization thresholding.
I then used adaptive thresholding, a median blur to remove noise and bilateral filtering to further remove noise without ruining the edges.
Interim Image of segmentation and background separation.
I then inverted the image before applying various morphological transforms, this time with a kernel size of 2.
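Two of the steps above, the power-law (gamma) transform and Otsu's binarization, are compact enough to sketch in numpy. This is an illustrative implementation, not the report's OpenCV code; note that under the convention s = 255·(r/255)^γ, a gamma below 1 brightens (conventions vary, so the report's γ = 1.3 may follow the inverse convention).

```python
import numpy as np

def gamma_transform(img, gamma):
    """Power-law transform s = 255 * (r/255) ** gamma.
    With this convention gamma < 1 brightens, gamma > 1 darkens."""
    return (255.0 * (img / 255.0) ** gamma).astype(np.uint8)

def otsu_threshold(img):
    """Return the threshold that maximizes the between-class variance
    of the image histogram (Otsu's method)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    global_sum = (np.arange(256) * hist).sum()
    best_t, best_var = 0, -1.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        m0 = sum0 / w0                                  # mean of class 0
        m1 = (global_sum - sum0) / (total - w0)         # mean of class 1
        var = (w0 / total) * (1 - w0 / total) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# On a bimodal image the Otsu threshold lands between the two modes,
# which is why it works so well for light background vs. dark worms.
img = np.concatenate([np.full(100, 50), np.full(100, 200)]).astype(np.uint8)
t = otsu_threshold(img)
mid = np.full((2, 2), 128, dtype=np.uint8)
bright = gamma_transform(mid, 0.5)
```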
Final Segmented Image Ground Truth
Very good segmentation in comparison to ground truth.
I then found the contours of the image. I looped through them and, for any contour with an area greater than 250 and less than 10000, plotted a minimum-area bounding rectangle. I then drew the contours on the image.
I estimated a rough length for each worm/contour as (contour perimeter)/2. If this length was greater than the diagonal of the bounding rectangle, I classed the worm as dead, otherwise alive, and labelled it accordingly.
I counted the number of contours to get the number of worms.
I printed each worm individually onto a black background and wrote to individual files.
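The dead/alive rule described above can be isolated from the OpenCV contour machinery. The sketch below implements exactly the stated rule (area gate of 250-10000, length ≈ perimeter/2 compared to the bounding-rectangle diagonal); the function name, return values, and the example numbers are all invented for illustration.

```python
import math

def classify_worm(area, perimeter, rect_w, rect_h,
                  min_area=250, max_area=10000):
    """Apply the report's rule of thumb: a worm's length is roughly
    half its contour perimeter; if that exceeds the diagonal of its
    minimum-area bounding rectangle, class it 'dead', else 'alive'.
    Contours outside the area gate are ignored (returns None)."""
    if not (min_area < area < max_area):
        return None                      # too small or too large to be a worm
    length = perimeter / 2.0
    diagonal = math.hypot(rect_w, rect_h)
    return 'dead' if length > diagonal else 'alive'

# A long thin contour vs. a compact curled one (invented numbers):
straight = classify_worm(400, 220, 100, 8)   # length 110 > diag ~100.3
curled = classify_worm(400, 80, 30, 30)      # length 40 < diag ~42.4
```

In the full pipeline, area and perimeter would come from cv2.contourArea and cv2.arcLength, and the rectangle from cv2.minAreaRect.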
This clearly shows the system succeeding: straight worms are classified as dead and curly worms as alive.
It counts 11 worms, which is a good level of accuracy.
Worm Written to file Individual Worm Ground Truth
Very clear evidence of the system performing the specific task of separating individual worms.
Watershed Algorithm
Sources
- http://stackoverflow.com/questions/11294859/how-to-define-the-markers-for-watershed-in-opencv
- https://stackoverflow.com/questions/41555031/identifying-curved-and-straight-objects-in-opencv
- http://blog.christianperone.com/2014/06/simple-and-effective-coin-segmentation-using-python-and-opencv/
- Lecture demos
- http://opencvpython.blogspot.co.uk/2012/05/skeletonization-using-opencv-python.html
- http://stackoverflow.com/questions/34834523/an-alternative-way-to-skeletonize-in-opencv-python
- http://stackoverflow.com/questions/15135676/problems-during-skeletonization-image-for-extracting-contours
from awesome-ocr.
https://github.com/Transkribus?page=1
A platform to collaborate, share and benefit from cutting edge research in Handwritten Text Recognition
from awesome-ocr.
- There is no dataset/competition like ImageNet for OCR.
- Most people/conferences/universities are going after natural images and "computer vision" problems. OCR is its own animal, and while it shares some concepts with computer vision it's not the same thing.
- A lot of IP, knowledge and talent is locked up in a handful of very old companies that have been doing this for a long time. ABBYY is to OCR what Google + Facebook are to deep learning (maybe more so).
- OCR is kind of a niche; a lot of knowledge is not available to people outside of a few insiders (ABBYY/Nuance, universities, research labs, OCR conferences). I'm sure Google uses it a lot internally (e.g. Google Street View numbers etc.).
- The incumbents don't just do OCR. They do preprocessing (computer vision/image processing) + OCR + NLP.
- Hard to find data. ABBYY FineReader supports 190 languages. Collecting this data is no easy task.
I'm probably missing other reasons as well, but this is just off the top of my head.
That being said, I'm sure that there's going to be a lot of progress in the OCR + deep learning space soon.
from awesome-ocr.
ocrcustomserver 5 months ago [-]
I wouldn't say that full page OCR is trivial. Using an open-source solution (99% based on Tesseract) is going to get you OK-ish results if your input is relatively clean (no complex layout, documents scanned on a flatbed scanner, standard fonts) and you don't care about speed. If you care about recognition accuracy then Tesseract isn't going to cut it (at least not without some serious effort).
Replying to points 1 and 3: for smaller players and/or complex tasks you can always implement your own custom parser. I'm doing work as a contractor in this space.
staticautomatic 5 months ago [-]
I agree with you that Tesseract isn't great out of the box, but if you aren't doing huge volumes, there are plenty of cloud options available.
Respectfully, I disagree about this being a parsing issue. The whole reason so-called "zonal OCR" exists is because of the challenge of reliably inferring the structure of a document at parsing time. Yes, there are some kinds of documents where parsing logic alone will suffice, but for more complex tasks you need what ABBYY and Nuance are selling.
ocrcustomserver 5 months ago [-]
Just to be sure that we're talking about the same thing: by "custom parser" I meant implementing your own barebones "zonal OCR" functionality with just the features that are needed for the specific problem. I think it boils down to the needs of each individual application.
Some cases have a lot of templates and need the "automatic fuzzy matching" functionality and the extra bells and whistles. But smaller players often deal with just a handful of relatively simple templates where FlexiCapture would just be overkill (not to mention a couple of other problems that I cover at the end of this post). This is of course not an easy task, because you need someone who can design and implement an end-to-end system that possibly involves image processing, "zonal OCR", an OCR engine, and reliable text extraction from images/PDFs (extracting text from PDFs is tricky).
It's way easier for a non-developer to think about what rulesets/logic to apply without having to think about the image processing/OCR bits. I think that is one of the main selling points of FlexiCapture: it abstracts the OCR bits so that the system designer can think about the problem itself, design a spec and work out the logic. Do you need deskewing of documents? Click a button and you get deskewing.
Which brings me to the second point. The products sold by ABBYY/Nuance are meant to be used by integrators (no programming needed other than the occasional VB.net script), not image processing specialists/developers. In my (biased) opinion, it makes more sense for some businesses to go the custom route instead of investing in FlexiCapture.
There is also FlexiCapture Engine, which is meant for developers. This has the same problems as the other offerings by ABBYY (I don't know about Nuance but I suspect it's the same):
- expensive
- vendor lock-in
- ridiculous extra costs for things like "cloud/VM license", exporting to PDF, etc.
- limits on how many pages you can process per year or in total (complex licensing schemes)
- ABBYY really wants to sell you their own cluster/cloud management services, which are all proprietary
- limited flexibility in implementing distributed services, costs that add up fast, and you have to be trained in their own stack
Can you provide an example where you think that a custom solution would not work? I'm curious.
staticautomatic 5 months ago [-]
First of all, why don't you shoot me an email at [email protected] and we can talk further. In a nutshell, it would have been way more expensive and difficult for us to roll our own than even the high cost of a FlexiCapture license. But here's a reasonably complete explanation of the build-vs-buy analysis we did.
1. FlexiCapture makes pre-processing incredibly painless and training-free. Beyond the usual binarization stuff, we extensively use the built-in auto-rotation and cleanup (skew, noise, speckle, etc.).
2. Templating is really the big win for FlexiCapture. I have not seen anything else with a template GUI that comes close to being as usable, robust, or simple. That's really important to us because we build a LOT of templates, and I have a really hard time imagining having to code them.
3. FlexiCapture's template engine is super strong for the kinds of documents we work with, which are mainly complex repeating groups with nested structures. It's also really good at handling both photos of documents (e.g. mobile) and scans. One thing it offers that I haven't seen elsewhere in a turnkey product or existing platform is the ability to define zones in purely relative terms without absolute positions. I don't know about Nuance, but I've not seen any other template GUI that will let you spec something like "look for either this word or a two-line string containing these words in the upper left quadrant of the document."
4. There's a dearth of zonal OCR frameworks. Outside of ABBYY's and Nuance's SDKs, the only one I'm even aware of is OpenKM, and I don't write Java. The FlexiCapture Engine SDK is a terrible beast: the documentation is horrible, it's Windows-only, and it's all COM objects.
from awesome-ocr.
ABBYY has dominated the field for many years (decades really) and still outperforms every solution out there. OmniPage by Nuance is probably the second best.
Preprocessing the images (OCR pipeline) is very important for OCR. For generic scanned PDF documents Finereader does a pretty good job.
There is a lot going on inside an OCR engine: layout analysis, dewarping, binarization, deskewing, despeckling (and more), and then the OCR itself. With Tesseract you have to do a lot of this yourself; you have to provide it with a clean image. The commercial packages do it for you automatically. ABBYY and other solutions also use NLP to augment and check the OCR results from a semantic-analysis perspective.
Also, there is no "one size fits all" OCR. It is highly specific to the nature of the application. Consider the following use cases:
- scanned PDF document
- scanned document with a non-standard font (e.g. Fraktur script in a historic book)
- photo of scanned document acquired with a mobile phone's camera
- passport OCR (MRZ)
- credit card OCR
- text appearing in natural image (e.g. store sign)
These are all "OCR projects" but they require very different approaches. You cannot just throw any input image at an OCR engine and expect it to work. It often requires a mix of computer vision/image processing, machine learning and OCR engine.
There is a growing number of papers using deep learning that get submitted to ICDAR (the premier OCR conference) and the other OCR conferences. One of the problems is the lack of a universal dataset/competition like ImageNet. The SmartDoc competition (documents captured from smartphones) was cancelled this year due to an insufficient number of participants.
If anyone is doing work with OCR + deep learning, I'd love to discuss!
from awesome-ocr.
Thanks for your excellent work @tmbdev.
I noticed your ocropy2 a while ago, along with your frequent NVIDIA lab sub-projects on OCR; even before that, my GitHub friends and I guessed that you would release a new one.
from awesome-ocr.
https://github.com/dhlab-epfl/dhSegment
It is a generic approach for Historical Document Processing. It relies on a Convolutional Neural Network to do the heavy lifting of predicting pixelwise characteristics. Then simple image processing operations are provided to extract the components of interest (boxes, polygons, lines, masks, …)
It does include the following features:
You only need to provide a list of images with annotated masks, which everybody can do with an image editing software (Gimp, Photoshop). You only need to draw the elements you care about!
Allows each pixel to be classified across multiple classes, or even assigned multiple labels.
On-the-fly data augmentation and efficient batching.
Leverages a state-of-the-art pre-trained network (Resnet50) to lower the need for training data and improve generalization.
Monitor training on Tensorboard very easily.
A set of image processing operations is already implemented, so the post-processing step takes only a couple of lines.
from awesome-ocr.
https://github.com/AstarLight/CPS-OCR-Engine
Recognition of invoices and other receipts, written by a student at Sun Yat-sen University.
from awesome-ocr.
Autonomous feedback-based preprocessing using classification likelihoods
http://fse.studenttheses.ub.rug.nl/12113/1/AI_BA_2014_JOOSTBAPTIST.pdf
In pattern recognition and optical character recognition (OCR) specifically, input images are presented to the classification system, they are preprocessed, features are extracted and then the image is classified. This is a sequential process in which decisions made during early preprocessing affect the outcome of the classification and potentially decrease performance. Hand-tuning the preprocessing parameters is often undesirable, as this can be a complex task with many parameters to optimize. Moreover, it is often desirable to minimize the amount of human intelligence that ends up in an autonomous system, if it can be expected that new variants of the data would require new human knowledge-based labor. A different approach to preprocessing in OCR is proposed, in which preprocessing is performed autonomously and depends on computed likelihoods of classification outcomes. This paper shows that by using this approach, color, scale and rotation invariance can be achieved, as well as high accuracy and precision. The performance is solid and reaches a plateau even when noise in the data is not fully accounted for.
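The feedback idea in this abstract (let the classifier's likelihoods drive the choice of preprocessing parameters) can be sketched in a few lines. This is not the paper's algorithm, just a minimal illustration of the feedback loop; all function and parameter names are invented, and the caller supplies the actual preprocessing and classification functions.

```python
def autotune_preprocessing(image, param_grid, preprocess, classify):
    """Pick preprocessing parameters by classifier feedback: try each
    candidate setting and keep the one whose preprocessed output the
    classifier is most confident about (highest class likelihood)."""
    def confidence(params):
        likelihoods = classify(preprocess(image, params))
        return max(likelihoods)
    return max(param_grid, key=confidence)

# Toy demonstration: the "classifier" is most confident when the input
# has been scaled to a canonical value of 10, so scale=2 is selected.
preprocess = lambda img, scale: img * scale
classify = lambda v: [1.0 / (1.0 + abs(v - 10)), 0.1]
best = autotune_preprocessing(5, [1, 2, 3], preprocess, classify)
```

The same loop generalizes to real images: param_grid would enumerate, say, binarization thresholds or rotation angles, and classify would return per-class likelihoods from the OCR classifier.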
from awesome-ocr.
https://github.com/DriesSmit/GeneralOCR
This software finds text and structure in images.
from awesome-ocr.
Related Issues (20)
- OCR basics
- EAST: An Efficient and Accurate Scene Text Detector
- Robust, Simple Page Segmentation using Hybrid Convolutional MDLSTM Networks
- PixelLink: Detecting Scene Text via Instance Segmentation
- Table-to-Text: Describing Table Region with Natural Language
- label tools
- How to modify the Connectionist Temporal Classification (CTC) layer of the network to also give us a confidence score?
- Confidence Prediction for Lexicon-Free OCR
- Industrial manufacturing: workplace for automated control of vibration-output circular trays
- Tesseract for R
- Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
- [Rosetta: large-scale image text detection and recognition system] "Rosetta: Large scale system for text detection and recognition in images" [Facebook] (2018)
- Radical analysis network for zero-shot learning in printed Chinese character recognition
- DenseRAN for Offline Handwritten Chinese Character Recognition
- Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
- In the Marmot dataset the table bounding boxes do not match the original images
- dhSegment: A generic deep-learning approach for document segmentation
- End-of-2018 barbecue meet-up plan
- Hope to add PaddleOCR and AgentOCR