manuthvann216 / ocr

This project forked from yui-mhcp/ocr

OCR with Scene-Text Detection project

License: GNU Affero General Public License v3.0

Python 79.12% Makefile 0.08% Jupyter Notebook 20.78% Dockerfile 0.02%

ocr's Introduction

😋 Optical Character Recognition

NEW : this repository is new and experimental. Do not hesitate to open an issue if you have a question, a bug to report, or a suggestion to improve the project ! 😋

Check the CHANGELOG file for an overview of the latest modifications ! 😋

Project structure

├── custom_architectures
│   ├── crnn_arch.py        : defines the CRNN main architecture for OCR (with CTC decoding)
│   ├── unet_arch.py        : defines variants of the UNet architectures used in the EAST detector
│   └── yolo_arch.py        : defines the YOLOv2 architecture
├── custom_layers
├── custom_train_objects
├── datasets
├── hparams
├── loggers
├── models
│   ├── detection           : used to detect texts in images (with the EAST detector)
│   ├── ocr
│   │   ├── base_ocr.py     : abstract class for OCR models
│   │   └── crnn.py         : main CRNN class (OCR)
├── pretrained_models
│   └── yolo_backend        : directory where to save the yolo_backend weights
├── unitest
├── utils
├── example_crnn.ipynb
└── pcr.ipynb

Check the main project for more information about the modules that are not expanded above, the overall structure, and the main classes.

Check the detection project for more information about the detection module and the EAST Scene-Text Detection model.
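The structure above mentions that the CRNN is used with CTC decoding. As a rough illustration of what greedy CTC decoding does (take the best class at each time step, collapse repeated classes, drop the blank), here is a minimal NumPy sketch. The function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 for the blank are assumptions made for this example; they are not taken from `crnn_arch.py`.

```python
import numpy as np

def ctc_greedy_decode(logits, vocab, blank_index=0):
    """Greedy CTC decoding for a single sequence (illustrative helper).

    logits : array of shape (time_steps, vocab_size)
    vocab  : sequence such that vocab[i] is the symbol of class i
             (the entry at `blank_index` is never emitted)
    """
    # 1. Keep the most likely class at each time step
    best_path = np.argmax(logits, axis=-1)

    decoded = []
    previous = None
    for idx in best_path:
        # 2. Collapse repeated classes, 3. skip the blank class
        if idx != previous and idx != blank_index:
            decoded.append(vocab[idx])
        previous = idx
    return "".join(decoded)

# Toy example: class 0 is the blank (an assumption), then 'a' and 'b'
vocab = ["", "a", "b"]
path = [1, 1, 0, 1, 2, 2]          # a a <blank> a b b
logits = np.eye(len(vocab))[path]  # one-hot scores, shape (6, 3)
print(ctc_greedy_decode(logits, vocab))  # aab
```

The blank between the two runs of `a` is what allows the same character to appear twice in a row in the output.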

Available features

  • Detection (module models.detection) :

Feature | Function / class | Description
OCR     | ocr              | Performs OCR on the given image(s)

You can check the ocr notebook for a concrete demonstration.

Available models

Model architectures

Available architectures :

Model weights

Classes | Dataset | Architecture | Trainer | Weights

Models must be unzipped in the pretrained_models/ directory !

The pretrained CRNN models come from the EasyOCR library. Weights are automatically downloaded given the language or the model's name, and converted to TensorFlow ! The easyocr package is therefore not required, but PyTorch is required to load the weights (for the conversion).

The pretrained version of EAST can be downloaded from this project. It should be placed at pretrained_models/pretrained_weights/east_vgg16.pth (torch is required to transfer the weights : pip install torch).
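To give an idea of what transferring weights from PyTorch to TensorFlow involves, the sketch below shows the axis re-ordering needed for the two most common kernels. It relies only on the documented layouts: a PyTorch `Conv2d` weight is stored as (out_channels, in_channels, kernel_h, kernel_w) while a Keras `Conv2D` kernel is (kernel_h, kernel_w, in_channels, out_channels), and a PyTorch `Linear` weight is (out_features, in_features) while a Keras `Dense` kernel is (in_features, out_features). The helper names are hypothetical, and this is not the conversion code used in this repository.

```python
import numpy as np

def conv2d_kernel_to_tf(weight):
    """(out, in, kh, kw) PyTorch layout -> (kh, kw, in, out) Keras layout."""
    return np.transpose(weight, (2, 3, 1, 0))

def linear_kernel_to_tf(weight):
    """(out, in) PyTorch layout -> (in, out) Keras layout."""
    return np.transpose(weight)

# Stand-ins for arrays that would come from a PyTorch state dict,
# e.g. `tensor.detach().cpu().numpy()` (this step is where torch is needed)
torch_conv = np.random.rand(64, 3, 5, 7).astype(np.float32)
torch_linear = np.random.rand(10, 256).astype(np.float32)

tf_conv = conv2d_kernel_to_tf(torch_conv)
tf_linear = linear_kernel_to_tf(torch_linear)

print(tf_conv.shape)    # (5, 7, 3, 64)
print(tf_linear.shape)  # (256, 10)
```

Recurrent layers and normalization layers need additional care (gate ordering, separate bias vectors), which this sketch does not cover.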

Installation and usage

  1. Clone this repository : git clone https://github.com/yui-mhcp/ocr.git
  2. Go to the root of this repository : cd ocr
  3. Install requirements : pip install -r requirements.txt
  4. Open the detection notebook and follow the instructions !

Important Note : some heavy requirements (e.g. torch and transformers) are left out of the requirements file to avoid installing them unnecessarily, as they are used only in very specific functions. An ImportError may therefore occur when calling those functions, such as TextEncoder.from_transformers_pretrained(...).
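A common way to handle such optional dependencies is to import them lazily, inside the function that needs them, and to raise an ImportError that explains what to install. The helper below is a minimal sketch of that pattern using only the standard library; `require_optional` is a hypothetical name and is not part of this project.

```python
import importlib

def require_optional(module_name, purpose):
    """Import an optional module, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Note: the pip package name may differ from the module name
        raise ImportError(
            "The optional module '{}' is required for {}. "
            "Try `pip install {}`.".format(module_name, purpose, module_name)
        ) from e

# Demonstrated with a standard-library module so that the example always runs
json_module = require_optional("json", "this demonstration")
print(json_module.dumps({"ok": True}))  # {"ok": true}
```

With this pattern, a function that loads PyTorch weights would call `require_optional("torch", "loading pretrained weights")` at its start, so users without torch only see an error when they actually reach that code path.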

TO-DO list :

  • Make the TO-DO list
  • Convert the CRNN architecture / weights from the easyocr library to tensorflow
  • Convert the CRNN + attention architecture from this repo to tensorflow
  • Add examples to initialize pretrained models (both EAST and CRNN)
  • Add an example to perform OCR on image (with text detection)
  • Add an example to perform OCR on camera
  • Allow combining texts into lines / paragraphs (as EAST detects individual words)
  • Take into account the text rotation in the combination procedure
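As an illustration of the line-combination item above, here is one simple way word-level boxes could be grouped into lines: sort the boxes by vertical center, start a new line whenever a box is too far from the current line's center, then order each line from left to right. This is only a sketch under stated assumptions: boxes are axis-aligned `(x, y, w, h, text)` tuples with `(x, y)` the top-left corner, the function name and `tolerance` parameter are hypothetical, and text rotation is not handled. It does not describe how this project implements the feature.

```python
def group_words_into_lines(boxes, tolerance=0.5):
    """Group word boxes (x, y, w, h, text) into text lines.

    A box joins the current line if its vertical center is within
    `tolerance * mean line height` of the line's mean vertical center.
    """
    lines = []
    # Process boxes from top to bottom (by vertical center)
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        center = box[1] + box[3] / 2
        if lines:
            current = lines[-1]
            line_center = sum(b[1] + b[3] / 2 for b in current) / len(current)
            line_height = sum(b[3] for b in current) / len(current)
            if abs(center - line_center) <= tolerance * line_height:
                current.append(box)
                continue
        lines.append([box])
    # Within each line, read the words from left to right
    return [
        " ".join(b[4] for b in sorted(line, key=lambda b: b[0]))
        for line in lines
    ]

words = [
    (60, 12, 40, 20, "world"),
    (10, 10, 40, 20, "Hello"),
    (10, 50, 30, 20, "foo"),
    (50, 48, 30, 20, "bar"),
]
print(group_words_into_lines(words))  # ['Hello world', 'foo bar']
```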

Contacts and licence

You can contact me at [email protected] or on discord at yui#0732

The objective of these projects is to facilitate the development and deployment of useful applications using Deep Learning for solving real-world problems and helping people. For this purpose, all the code is under the Affero GPL (AGPL) v3 licence.

All my projects are "free software", meaning that you can use, modify, deploy and distribute them on a free basis, in compliance with the Licence. They are not in the public domain and are copyrighted. There are some conditions on their distribution, but the purpose of these conditions is to make sure that everyone is able to use and share any modified version of these projects.

Furthermore, if you want to use any project in a closed-source project, or in a commercial project, you will need to obtain another Licence. Please contact me for more information.

For my protection, it is important to note that all projects are available on an "As Is" basis, without any warranties or conditions of any kind, either explicit or implied. However, do not hesitate to report issues on the project's repository or make a Pull Request to solve them 😄

If you use this project in your work, please add this citation to give it more visibility ! 😋

@misc{yui-mhcp,
    author  = {yui},
    title   = {A Deep Learning projects centralization},
    year    = {2021},
    publisher   = {GitHub},
    howpublished    = {\url{https://github.com/yui-mhcp}}
}

Notes and references

The code for the CRNN architecture is heavily inspired by the easyocr repo :

Papers and tutorials :

Datasets :

  • COCO Text dataset : an extension of COCO for text detection
