milely / ocr_paper

Papers in the field of OCR

OCR_paper

Papers in the field of OCR (continually updated)

Text Detection

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection(ADNet)(TMM)
CBNet: A Plug-and-Play Network for Segmentation-based Scene Text Detection(CBNet)
Zoom Text Detector
UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection(ICME2022)
Vision-Language Pre-Training for Boosting Scene Text Detectors(SSL for text detection, CVPR2022)
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection(Transformer-based)
Kernel Proposal Network for Arbitrary Shape Text Detection(KPN)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion(DBNet++)
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation (code)

Text Recognition

Masked and Permuted Implicit Context Learning for Scene Text Recognition(MM23)
Transferring General Multimodal Pretrained Models to Text Recognition(Pretrained Model)
Multi-Granularity Prediction for Scene Text Recognition(ECCV2022)
Levenshtein OCR(ECCV2022)
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition(irregular text)
Scene Text Recognition with Permuted Autoregressive Sequence Models(ABI-based)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition(SSL)
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining(SSL for encoder and decoder)
Multimodal Semi-Supervised Learning for Text Recognition(Multi-modal SSL)
SVTR: Scene Text Recognition with a Single Visual Model(Visual, IJCAI2022)
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation(Semi-supervised)
Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition(SSL)
Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement(SSL pretraining of the encoder for text recognition)
Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching(training protocol search)
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition(textual reasoning by GCN)
Visual-Semantic Transformer for Scene Text Recognition(Multi-modal recognition)
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features(Multi-modal recognition)
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance(Transformer based recognizer)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition(position enhance)
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation(KD)
Towards the Unseen: Iterative Text Recognition by Distilling from Errors(Feedback)
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition(Multi-Stage Decoder)

End-to-End Text Recognition (Text Spotting)

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting(CVPR2023)
Text Spotting Transformers(Transformer detect control points)
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text(PANNet for Text Spotting)
DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting(single point text spotting)
SPTS: Single-Point Text Spotting(single point text spotting)

Document layout analysis

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks(DOC2GRAPH)

Font Generation && Style Transfer

Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator

OCR Post-Processing (spell check)

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Paragraph Recognition

LexiconNet: An End-to-End Handwritten Paragraph Text Recognition System

Mathematical Expression Recognition

CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Table Related

Revisiting Table Detection Datasets for Visually Rich Documents
