

OCR_paper

Papers in the field of OCR(Continually updated)

Text Detection

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection(ADNet)(TMM)
CBNet: A Plug-and-Play Network for Segmentation-based Scene Text Detection(CBNet)
Zoom Text Detector
UNITS: UNSUPERVISED INTERMEDIATE TRAINING STAGE FOR SCENE TEXT DETECTION(ICME2022)
Vision-Language Pre-Training for Boosting Scene Text Detectors(SSL for text detection, CVPR2022)
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection(Transformer-based)
Kernel Proposal Network for Arbitrary Shape Text Detection(KPN)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion(DBNet++)
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation(code)

Text Recognition

Masked and Permuted Implicit Context Learning for Scene Text Recognition(MM23)
Transferring General Multimodal Pretrained Models to Text Recognition(Pretrained Model)
Multi-Granularity Prediction for Scene Text Recognition(ECCV2022)
Levenshtein OCR(ECCV2022)
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition(irregular text)
Scene Text Recognition with Permuted Autoregressive Sequence Models(ABI-based)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition(SSL)
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining(SSL for encoder and decoder)
Multimodal Semi-Supervised Learning for Text Recognition(Multi-modal SSL)
SVTR: Scene Text Recognition with a Single Visual Model(visual-only, IJCAI2022)
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation(Semi-supervised)
Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition(SSL)
Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement(SSL-pretrained encoder for text recognition)
Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching(training protocol search)
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition(textual reasoning via GCN)
Visual-Semantic Transformer for Scene Text Recognition(Multi-modal recognition)
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features(Multi-modal recognition)
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance(Transformer based recognizer)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition(positional enhancement)
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation(KD)
Towards the Unseen: Iterative Text Recognition by Distilling from Errors(Feedback)
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition(Multi-Stage Decoder)

End-to-End Text Recognition(Text Spotting)

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting(CVPR2023)
Text Spotting Transformers(Transformer detects control points)
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text(PANNet for Text Spotting)
DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting(single point text spotting)
SPTS: Single-Point Text Spotting(single point text spotting)

Document layout analysis

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks(DOC2GRAPH)

Font Generation & Style Transfer

Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator

OCR Post-Processing(Spell Check)

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Paragraph Recognition

LexiconNet: An End-to-End Handwritten Paragraph Text Recognition System

Mathematical Expression Recognition

CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Table Related

Revisiting Table Detection Datasets for Visually Rich Documents


Contributors

milely
