tamlthari / vietnamese_handwritten_text_recognition_cinnamon_ocr

This is a solution to the Cinnamon AI Challenge (https://drive.google.com/drive/folders/1Qa2YA6w6V5MaNV-qxqhsHHoYFRK5JB39) using convolutional layers, attention, and a bidirectional LSTM, achieving CER 0.081, WER 0.188, and SER 0.89.

Cinnamon AI Challenge - Handwritten text recognition

This implementation closely follows Do Hai Minh's GitHub repository and adapts the model by @pbcquoc, the winner of the challenge. Our version of Quoc's model does not perform as well as Minh's because we did not implement the beam search that Quoc used.
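For context on what skipping beam search means, the simplest alternative is greedy (best-path) decoding. The sketch below is illustrative only and is not taken from this repository: it assumes the network emits one probability distribution per time step in the CTC style, and that the blank symbol is the last class. The function name `ctc_greedy_decode` and the toy alphabet are hypothetical.

```python
from typing import Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    probs: one row per time step, each with len(alphabet) + 1 scores.
    Assumption: the blank symbol is the last class (index len(alphabet)).
    """
    blank = len(alphabet)
    # Pick the most likely class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]

    decoded = []
    previous = None
    for index in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Toy example: classes are "a" (0), "b" (1), and blank (2).
toy_probs = [
    [0.8, 0.1, 0.1],    # a
    [0.7, 0.1, 0.2],    # a (repeat, collapsed)
    [0.05, 0.05, 0.9],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],    # a
    [0.2, 0.7, 0.1],    # b
]
print(ctc_greedy_decode(toy_probs, "ab"))
```

Greedy decoding keeps only the single most likely class per step, whereas beam search tracks several candidate transcriptions at once, which is one plausible reason for the accuracy gap described above.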

The data can be downloaded from the Cinnamon AI Challenge.

Please unzip the data into this folder structure to run the code:

|--data/
|----raw/
|------0825_DataSamples_1/
|------0916_DataSamples_2/
|------1015_Private_Test/
|--src/
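Before running the preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper `missing_dirs` is hypothetical, and it assumes it is given the project root (the folder containing data/ and src/).

```python
from pathlib import Path

# Directories listed in the folder structure above, relative to the project root.
EXPECTED_DIRS = [
    "data/raw/0825_DataSamples_1",
    "data/raw/0916_DataSamples_2",
    "data/raw/1015_Private_Test",
    "src",
]


def missing_dirs(project_root):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Folder structure looks complete.")
```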

Data preprocessing

Change into src/ and run the following to transform the data:

python transform.py --path ../data/raw/0916_DataSamples_2 --type train --transform
python transform.py --path ../data/raw/1015_Private_Test --type test --transform

Two new folders train/ and test/ and two json files containing the labels will be created in data/. The folders train/ and test/ contain the preprocessed images. You can also run

python transform.py --path ../data/raw/0825_DataSamples_1 --type val --transform

to create a val/ set with 15 samples.

Showing examples

python transform.py --type [train or test or val] --sample

This will open an OpenCV window showing the preprocessed images (50 samples) one by one. The labels of the images will be printed in the terminal window.

Model training

Three models can be trained: Minh's model (achieving WER 0.188 and SER 0.89), Quoc's model, or a combined model (convolutional layers from Minh's model with Quoc's attention layers).

Minh's model (convolutional layers and a bidirectional LSTM) follows the Convolutional Recurrent Neural Network of Puigcerver et al.

Reference:
    Joan Puigcerver.
    Are multidimensional recurrent layers really necessary for handwritten text recognition?
    In: Document Analysis and Recognition (ICDAR), 2017 14th
    IAPR International Conference on, vol. 1, pp. 67–72. IEEE (2017)

    Carlos Mocholí Calvo and Enrique Vidal Ruiz.
    Development and experimentation of a deep learning system for convolutional and recurrent neural networks.
    Escola Tècnica Superior d’Enginyeria Informàtica, Universitat Politècnica de València, 2018

To train Minh's model:

python train.py --train

or for combined model:

python train.py --trainattn

or for Quoc's model (base model VGG16, attention layer and bidirectional LSTM):

python train.py --trainquoc

Each time training or testing finishes, the statistics are saved to train_stats.txt or evaluate_stats.txt, respectively.

To test Minh's model or the combined model, run:

python train.py --test --path [path to the test images]

or, to test Quoc's model (which needs its own flag because its base layer has a different input shape):

python train.py --testquoc --path [path to the test images]

Example: python3 train.py --test --path ../data/test. The predicted texts and the ground-truth texts will then be stored in predictions_test.txt.
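To make the reported CER, WER, and SER concrete, here is a minimal sketch of one common way to compute them from pairs of ground-truth and predicted strings. It is not the repository's evaluation code: the function names are hypothetical, the rates are aggregated over the whole set (an assumption; the challenge may average per sample), and the layout of predictions_test.txt is not assumed, so the inputs are plain Python lists.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def error_rates(references, hypotheses):
    """Character, word, and sentence error rates aggregated over all pairs."""
    char_errors = char_total = word_errors = word_total = sentence_errors = 0
    for ref, hyp in zip(references, hypotheses):
        char_errors += edit_distance(ref, hyp)
        char_total += len(ref)
        word_errors += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        sentence_errors += int(ref != hyp)  # any mismatch counts as a sentence error
    return {
        "CER": char_errors / char_total,
        "WER": word_errors / word_total,
        "SER": sentence_errors / len(references),
    }


# Toy data: one wrong character in the first sentence, second sentence exact.
rates = error_rates(["abc def", "ghi"], ["abc dxf", "ghi"])
print(rates)
```

Under these definitions SER is much stricter than CER, since a single wrong character makes the whole line count as an error, which is consistent with a low CER appearing alongside a high SER.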
