Comments (3)
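Background
Application scenarios for photo OCR
- Street-view recognition for navigation systems
- Assistive technology for the blind, such as reading street signs
- Real-time recognition and translation on mobile phones
- Search and retrieval over the vast amount of video and images on the web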
Photo Optical Character Recognition (photo OCR), which aims to read scene text in natural images, is an essential step for a wide variety of computer vision tasks, and has enjoyed significant success in several commercial applications. These include street-sign reading for automatic navigation systems, assistive technologies for the blind (such as product-label reading), real-time text recognition and translation on mobile phones, and search/indexing the vast corpus of image and video on the web.
from awesome-ocr.
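Methods
Traditional methods
- A fixed-size lexicon of words and characters of known length, combined with hand-engineered image features
- 1. Region-based binarization, HOG feature extraction, and Markov models over binary and connected-component features
- 2. CNNs integrated on top of hand-engineered features; none of these approaches can recognize words that are absent from the lexicon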
The field of photo OCR has been primarily focused on constrained scenarios with hand-engineered image features. (Here, constrained means that there is a fixed lexicon or dictionary and words have known length during inference.) Specifically, examples of constrained text recognition methods include region-based binarization or grouping [5, 24, 33], pictorial structures with HOG features [47, 46], integer programming with SIFT descriptor [41], Conditional Random Fields (CRFs) with HOG features [32, 31, 39], Markov models with binary and connected component features [49]. Some early attempts [26, 53, 10] try to learn local mid-level representation on top of the handcrafted features, and some methods in [48, 19, 16] incorporate deep convolutional neural networks (CNNs) [25, 13] for a better image feature extraction. These methods work very well when candidate ground-truth word strings are known at testing stage, but do not generalize to words that are not present in the list of a lexicon at all.
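To make the limitation of the constrained setting concrete, here is a minimal sketch of lexicon-constrained decoding. It is not taken from any of the cited methods: the use of edit distance as the matching criterion, the function names, and the tiny lexicon are all assumptions chosen for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def constrained_decode(raw_prediction: str, lexicon: List[str]) -> str:
    """Snap a noisy character prediction to the closest lexicon entry."""
    return min(lexicon, key=lambda word: edit_distance(raw_prediction, word))


# Hypothetical lexicon and predictions, for illustration only.
lexicon = ["street", "market", "hotel"]
print(constrained_decode("str3et", lexicon))  # a lexicon word with one misread character
print(constrained_decode("bakery", lexicon))  # a word that is not in the lexicon
```

The first call can recover the intended word because it is in the lexicon. The second call is forced onto some lexicon entry even though the true word is absent, which is the failure mode the excerpt describes.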
- Uses two CNNs, one to model the character sequence and one for an N-gram language model, then combines the two with a CRF graphical model
A recent advance in the state-of-the-art that moves beyond this constrained setting was presented by Jaderberg et al. in [17]. The authors report results in the unconstrained setting by constructing two sets of CNNs – one for modeling character sequences and one for N-gram language statistics – followed by a CRF graphical model to combine their activations. This method achieved great success and set a new standard in photo OCR field. However, despite these successes, the system in [17] does have some drawbacks. For instance, the use of two different CNNs incurs a relatively large memory and computation cost. Furthermore, the manually defined N-gram CNN model has a large number of output nodes (10k output units for N = 4), which increases the training complexity – requiring an incremental training procedure and heuristic gradient rescaling based on N-gram frequencies.
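The following sketch gives a feel for why a manually defined N-gram output layer gets large. It enumerates the N-grams of a word up to a maximum length and counts how many N-grams are possible over an alphabet. The 36-character alphabet (letters plus digits) and both function names are assumptions for illustration; the excerpt itself only states the 10k figure for N = 4.

```python
def ngrams_up_to(word: str, max_n: int) -> set:
    """All distinct substrings of `word` with length 1..max_n."""
    return {
        word[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    }


def count_possible_ngrams(alphabet_size: int, max_n: int) -> int:
    """Upper bound on distinct N-grams of length 1..max_n over an alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_n + 1))


# N-grams that a single word activates (hypothetical example word).
print(sorted(ngrams_up_to("spires", 3)))

# Size of the full N-gram space for an assumed 36-character alphabet.
print(count_possible_ngrams(36, 4))
```

Under that assumption the full space for N = 4 has about 1.7 million entries, so any fixed output layer has to be a hand-picked subset of N-grams, and their very uneven frequencies are what the frequency-based gradient rescaling has to compensate for.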
- The new method proposed in this paper
Inspired by [17], we continue to focus our efforts on the unconstrained scene text recognition task, and we develop a recursive recurrent neural networks with attention modeling (R2AM) system that directly performs image to sequence (word strings) learning, delivering improvements over their work. The three main contributions of the work presented in this paper are:
(1) Recursive CNNs with weight-sharing, for more effective image feature extraction than a “vanilla” CNN under the same parametric capacity.
(2) Recurrent neural networks (RNNs) atop of extracted image features from the aforementioned recursive CNNs, to perform implicit learning of character-level language model. RNNs can automatically learn the sequential dynamics of characters that are naturally present in word strings from the training data without the need of manually defining N-grams from a dictionary.
(3) A sequential attention-based modeling mechanism that performs “soft” deterministic image feature selection as the character sequence is being read, and that can be trained end-to-end within the standard backpropagation.
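As a rough illustration of contribution (3), the sketch below shows "soft" deterministic feature selection: instead of sampling one image feature, it takes a softmax-weighted average of all of them, which keeps the operation differentiable and therefore trainable with backpropagation. The dot-product scoring function, the function names, and the toy vectors are assumptions; the excerpt does not specify how the paper scores features.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    features: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted average of the feature vectors.

    Scores are plain dot products between each feature and the query
    (an assumed scoring function, chosen for simplicity).
    """
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Two toy feature vectors; the query plays the role of the decoder state.
features = [[1.0, 0.0], [0.0, 1.0]]
weights, context = soft_attention(features, query=[10.0, 0.0])
print(weights, context)
```

In a sequence decoder the query would change at every character step, so each output character gets its own weighting over the image features.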
We pursue extensive experimental validation on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k. We also provide a detailed ablation study by examining the effectiveness of each of the proposed components. Our proposed network architecture achieves the new state-of-the-art results and significantly outperforms the previous best reported results for unconstrained text recognition [17]; i.e. we observe an absolute accuracy improvement of 9% on Street View Text and 8.2% on ICDAR 2013.
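The idea behind contribution (1), recursive convolution with weight sharing, can be sketched in one dimension. The same recurrent kernel is reused at every step, so the number of parameters stays fixed while the effective depth and receptive field grow. This is a simplified sketch under assumptions: the 1-D setting, the update rule (feed-forward response plus recurrent response, then ReLU), and all names are illustrative and not taken from the paper.

```python
from typing import List


def conv1d_same(signal: List[float], kernel: List[float]) -> List[float]:
    """1-D correlation with zero padding; expects an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def recursive_conv(
    signal: List[float],
    ff_kernel: List[float],
    rec_kernel: List[float],
    steps: int,
) -> List[float]:
    """Apply the feed-forward kernel once, then reuse rec_kernel `steps` times."""
    feed_forward = conv1d_same(signal, ff_kernel)
    hidden = relu(feed_forward)
    for _ in range(steps):
        recurrent = conv1d_same(hidden, rec_kernel)  # same weights every step
        hidden = relu([a + b for a, b in zip(feed_forward, recurrent)])
    return hidden


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
for steps in range(3):
    out = recursive_conv(impulse, [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], steps)
    print(steps, sum(1 for v in out if v > 0))  # width of the response
```

With these toy kernels the response to a single impulse widens by two positions per recursion step, while the layer still has only the six weights of the two kernels.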
from awesome-ocr.
Hi, do you have the code of this paper? Thank you very much.
from awesome-ocr.