
sushant097 / handwritten-line-text-recognition-using-deep-learning-with-tensorflow


Uses a Convolutional Recurrent Neural Network to recognize handwritten line-text images without pre-segmentation into words or characters. The network is trained with the CTC loss function.

License: Apache License 2.0

Python 81.67% CSS 3.57% JavaScript 1.16% HTML 13.60%
cnn blstm tensorflow deep-neural-networks handwritten-text-recognition iam-dataset crnn-tensorflow python3 deep-learning recurrent-neural-network

handwritten-line-text-recognition-using-deep-learning-with-tensorflow's Introduction

Handwritten Line Text Recognition using Deep Learning with Tensorflow


Description

Uses a Convolutional Recurrent Neural Network (CRNN) to recognize handwritten line-text images without pre-segmentation into words or characters. The network is trained with the CTC loss function. For more detail, read this Medium post.

Why Deep Learning?

Deep learning extracts features by itself with a deep neural network and performs the classification itself. Compared to traditional algorithms, its performance increases with the amount of data.

Basic Intuition on How it Works.

  • First, a Convolutional Recurrent Neural Network extracts the important features from the handwritten line-text image.
  • The output before the CNN's FC layer (512x1x100) is passed to the BLSTM, which handles sequence dependencies and time-sequence operations.
  • Then the CTC loss (Alex Graves) is used to train the RNN. It eliminates the alignment problem in handwriting, since every writer aligns characters differently. We supply only what is written in the image (the ground-truth text) and the BLSTM output; the loss is then simply -log of the probability of "gtText", so training minimizes the negative log-likelihood of the ground-truth text.
  • CTC considers all possible paths that produce the given labels. The loss for an (X, Y) pair is: (figure: Ctc_Loss)
  • Finally, CTC decoding is used to decode the output during prediction.
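
To make the loss concrete: for an input X and a label Y, CTC sums the probability of every frame-level path that collapses to Y (merge repeated symbols, then drop blanks) and takes the negative log of that sum. The sketch below does this by brute-force enumeration on a tiny made-up example. The function names, the toy probabilities, and the choice of index 0 for the blank are assumptions for illustration only; real training would rely on a dynamic-programming implementation such as the one that ships with TensorFlow.

```python
import itertools
import math

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def collapse(path):
    """Merge repeated symbols, then drop blanks (the CTC collapsing rule)."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def ctc_loss_bruteforce(probs, target):
    """Return -log p(target | probs) by enumerating every possible path.

    probs[t][k] is the probability of symbol k at time step t.
    Only feasible for tiny inputs: the number of paths is K**T.
    """
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if collapse(path) == tuple(target):
            total += math.prod(probs[t][s] for t, s in enumerate(path))
    return -math.log(total)


# Toy example (hypothetical numbers): 2 time steps, symbols {0: blank, 1: 'a'}
toy_probs = [[0.4, 0.6], [0.4, 0.6]]
loss = ctc_loss_bruteforce(toy_probs, [1])
```

In the toy example three of the four paths collapse to the single label 'a' (a-a, a-blank, blank-a), so the summed probability is 0.36 + 0.24 + 0.24 = 0.84.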

Detailed Project Workflow

Architecture of Model

  • The project consists of three steps:
    1. Multi-scale feature extraction --> Convolutional Neural Network (7 layers)
    2. Sequence labeling (BLSTM-CTC) --> Recurrent Neural Network (2 layers of LSTM) with CTC
    3. Transcription --> Decoding the output of the RNN (CTC decode)
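
The transcription step can be illustrated with best-path (greedy) decoding: pick the most likely class at each time step, merge repeats, and drop blanks. This is a minimal sketch and not the repository's decoder; the character set, the blank position (assumed to be the last class), and the function name are all assumptions.

```python
def ctc_greedy_decode(scores, charset, blank=None):
    """Best-path CTC decoding for a single sequence.

    scores: list of per-time-step score lists, each of length len(charset) + 1.
    blank: index of the blank class; assumed to be the last index if None.
    """
    if blank is None:
        blank = len(scores[0]) - 1
    # Most likely class at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars = []
    prev = blank
    for k in best:
        # Emit a character only when it differs from the previous step
        # and is not the blank.
        if k != prev and k != blank:
            chars.append(charset[k])
        prev = k
    return "".join(chars)


charset = "ab"  # hypothetical two-character alphabet
scores = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, merged with the previous step)
    [0.1, 0.1, 0.8],  # blank
    [0.6, 0.3, 0.1],  # a (a new 'a', because a blank came in between)
    [0.1, 0.8, 0.1],  # b
]
decoded = ctc_greedy_decode(scores, charset)
```

Greedy decoding is the simplest option; beam search or word beam search can give better text at a higher cost.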

Requirements

  1. TensorFlow 1.8.0 (you can upgrade to TensorFlow v2 with this link)
  2. Flask
  3. NumPy
  4. OpenCV 3
  5. Spell checker: autocorrect >= 0.3.0 (pip install autocorrect)

Dataset Used

  • Download the IAM dataset from here.
  • Only the line images and lines.txt (ASCII) are needed.
  • Place the downloaded files inside the data directory.

The trained model is available for download from this link. The available model has CER = 8.32% and was trained on the IAM dataset plus some additionally created data. The final model has a CER of 3.42% and is not publicly available.
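
For reference, character error rate is commonly computed as the Levenshtein edit distance between the recognized text and the ground-truth text, divided by the ground-truth length. The sketch below shows that computation. Whether the figures quoted above were computed in exactly this way is an assumption, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(recognized, ground_truth):
    """Edit distance normalized by ground-truth length, as a percentage."""
    return 100.0 * edit_distance(recognized, ground_truth) / len(ground_truth)


# Hypothetical example: one wrong character out of four.
cer = character_error_rate("leat", "leaf")
```

The same function gives a word error rate when it is called on lists of words instead of strings. Because insertions count as errors, either rate can exceed 100%.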

To Train the model from scratch

$ python main.py --train

To validate the model

$ python main.py --validate

To run prediction

$ python main.py

Run in Web with Flask

$ python upload.py
Validation character error rate of saved model: 8.654728%
Python: 3.6.4 
Tensorflow: 1.8.0
Init with stored values from ../model/snapshot-24
Without Correction clothed leaf by leaf with the dioappoistmest
With Correction clothed leaf by leaf with the dioappoistmest

Prediction output on IAM test data (figure: PredictionOutput)

Prediction output on self test data (figure: PredictionOutput)

See the project Devnagari Handwritten Word Recognition with Deep Learning for more insights.

Further Improvement

  • Use MDLSTM to recognize a whole paragraph at once; see "Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention".
  • Line segmentation can be added for full-paragraph text recognition. For line segmentation you can use the A* path-planning algorithm or a CNN model to separate a paragraph into lines.
  • Better image preprocessing, such as reducing background noise, to handle real-world images more accurately.
  • A better decoding approach to improve accuracy. Some CTC decoders can be found here.

Citation

If you use any part of this project in your work, please cite:

@techreport{Handwritten-Line-text-recognition-using-deep-learning-2019,
  title={Handwritten Line Text Recognition},
  author={Gautam Sushant},
  institution={Tribhuvan University},
  year={2019}
}

Feel free to improve this project with a pull request.

This work comes from my last-semester project in computer engineering at Tribhuvan University. In July of 2019,

handwritten-line-text-recognition-using-deep-learning-with-tensorflow's People

Contributors

arielbatkilin, sushant097

handwritten-line-text-recognition-using-deep-learning-with-tensorflow's Issues

Loss too high

Hello, Sir. Thanks for your post. I'm a newbie to this domain and I read your post on Medium. I cloned the repo into Google Colab, downloaded your data from Kaggle, and uploaded it to the data folder as described in the README. I also redefined the image path in Data_Loader.py at line 75:
fileName = filePath + 'self_lines' + fileNameSplit[0] + '.png'
After training finished, the results were really bad: WER 400% and accuracy 0.00000%.
Could you help me figure out which part I did wrong, Sir? Thank you in advance.

Unreasonably low results, when using IAM dataset

I'm also getting incredibly poor scores, character error rate of ~45%, word error rate of ~650%
Using the IAM lines dataset, all according to instructions.
Not using word beam search decoding or anything similar, just python main.py --train
Exact same results using either CPU or GPU (the only difference of course being that the GPU way is several times faster). The CTC loss scores and the error rate stop improving after the 7th or so epoch.
Tried increasing the file input size, but got considerably worse results.
Attached an image of an example word error rate while training.

screenshot_1

My specs are:
16 GB RAM, Ryzen 3700X CPU, NVIDIA RTX2070 SUPER GPU. Using Docker and TensorFlow 1.13.

Originally posted by @mcmalzahar in #4 (comment)

Running error

Traceback (most recent call last):
  File "main.py", line 193, in <module>
    main()
  File "main.py", line 164, in main
    Model.imgSize, Model.maxTextLen, load_aug=True)
  File "C:\Users\hp\Desktop\Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow-master\src\DataLoader.py", line 79, in __init__
    gtText_list = lineSplit[9].split('|')
IndexError: list index out of range

throwing errors

Hi Sushant,

Great work by you!! Kudos, sir.

I am facing the following issues while running this model.

In the DataLoader.py file, when reading the data from the ground-truth text file:

# GT text are columns starting at 10
gtText_list = lineSplit[9].split('|')
gtText = self.truncateLabel(' '.join(gtText_list), maxTextLen)

this throws an "index out of range" error, which I corrected with
gtText_list = lineSplit[8].split('|')

Also, in the main.py file:
totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', totalEpoch)

is also throwing an error. I worked around it by commenting out the totalEpoch line and passing epoch to the print statement:

#totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', epoch)

Also, autocorrect in spellchecker.py is reported as deprecated, so I changed it to pyspellchecker v.4.0.

I am able to run the model, but when training from scratch it shows a very high validation CER of around 43.
Let me know if the change of spell checker and the other changes I made could lead to this. Also let me know if some other approach has to be taken to train this model on the IAM line-based dataset.

Input new image

How can I use the model to classify a new image? In which class should I add the file path, and in which file?

How to edit "maxTextLen"

Hi sushant097, Thank you very much for this article. Although the article is long, could you explain how to edit "maxTextLen"? I tried sentences of various lengths and got an error.
Thanks!
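
Some background that may help here: with CTC, a label can only be aligned if it fits into the number of output time steps, and two identical neighbouring characters need a blank between them. A helper such as the truncateLabel call quoted in another issue on this page typically cuts labels that cannot fit into maxTextLen. The sketch below shows the idea; it is an assumption about how such a helper works, not a copy of the repository's code, and the function name is illustrative.

```python
def truncate_label(text, max_text_len):
    """Cut text so that its CTC alignment fits into max_text_len time steps.

    Each character costs one step; a character equal to its predecessor
    costs two, because CTC needs a blank between repeated characters.
    """
    cost = 0
    for i, ch in enumerate(text):
        cost += 2 if i > 0 and ch == text[i - 1] else 1
        if cost > max_text_len:
            return text[:i]
    return text


short = truncate_label("hello", 4)   # the double 'l' does not fit
whole = truncate_label("hello", 6)   # 4 distinct steps + 2 for the repeated 'l'
```

Raising maxTextLen alone may not be enough: the label also has to fit within the number of time steps the network produces for the chosen image width.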

Does the "word" have to be exactly an English word?

The tested dataset (IAM Handwriting) contains words that exist in English. How good is this model at recognizing random English letters and characters? For example: jecinapa!]@%. Thank you!

AttributeError: module 'tensorflow' has no attribute 'placeholder'

Hi,
I installed TensorFlow 2.x because version 1.8.0 is not available in their repository.
When I run this example I get the following error:
AttributeError: module 'tensorflow' has no attribute 'placeholder'

Could you please update the code to support the latest TensorFlow version?

Thank you

Issue with the color pictures detection

Brother, how about also adding Tesseract functionality to the project? It would make it more helpful. I couldn't do it myself as I am a total beginner at ML and AI.
Also, could I please get your email ID?

Address accuracy

Can you please tell me what is address accuracy and how is it calculated? I'm not able to understand as it is not a standard term

Pretrained model in CNN layer

Hi Sushant
Great work. Appreciate.

I have a few questions.

  1. Have you ever tried adding a pre-trained model such as VGG, Inception or ResNet at the CNN layer? If so, how were the results? What would you suggest about adding one?

  2. I have read about MD-LSTM, and there are results showing an improvement over BiLSTMs. Do you have any experience or experiments with it?

  3. By the way, have you ever attempted to run the code under GPU or TPU?

thanks
Raj

Can't classify new images?

I'm trying the model on new images but it doesn't work for any of them. Only on your samples mate. I ran the flask as well but same result. When I upload any image besides the samples it fails and gives traceback error.
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 2091, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "C:\Users\elmm\Desktop\Semester 3 UM\WIA2001-Database\Group Project\Database\Machine Learning Features\Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow-master\src\upload.py", line 44, in upload
render_template("Error.html", message="Files uploaded are not supported...")
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\templating.py", line 148, in render_template
ctx.app.jinja_env.get_or_select_template(template_name_or_list),
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\jinja2\environment.py", line 1068, in get_or_select_template
return self.get_template(template_name_or_list, parent, globals)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\jinja2\environment.py", line 997, in get_template
return self._load_template(name, globals)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\jinja2\environment.py", line 958, in _load_template
template = self.loader.load(self, name, self.make_globals(globals))
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\jinja2\loaders.py", line 125, in load
source, filename, uptodate = self.get_source(environment, name)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\templating.py", line 59, in get_source
return self._get_source_fast(environment, template)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\Lib\site-packages\flask\templating.py", line 95, in _get_source_fast
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: Error.html

Training process

Hello, Sushant!

For the past few days I have been trying to reproduce the results of the repository.
For that I followed the guide described in README.md but the outcome was different.

Steps:

  1. Clone the repo in a new directory
  2. Download IAM database from official site
  3. Copy lines.txt file and lines directory to the data directory (13 353 records).
  4. In the file DataLoader.py change the following line:
    gtText_list = lineSplit[9].split('|')
    to this:
    gtText_list = lineSplit[8].split('|')
    This is required because the 8-th element (not 9-th) contains ground truth labels. For example:
    a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from
  5. Run the following command from src_tensorflow2 directory:
    python main.py --train
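
A defensive way to read a lines.txt record is sketched here, under the assumption that every data line has eight space-separated metadata fields followed by the transcription (as in the example record in step 4) and that comment lines start with '#'. The function name is hypothetical. Splitting at most eight times keeps the transcription in one piece, and checking the field count avoids the IndexError instead of hard-coding index 8 or 9 blindly.

```python
def parse_iam_line(record):
    """Return (line_id, ground_truth_text), or None for comments and short records."""
    record = record.strip()
    if not record or record.startswith("#"):
        return None
    fields = record.split(" ", 8)  # assumed layout: 8 metadata fields, then the text
    if len(fields) < 9:
        return None  # malformed record: skip it instead of raising IndexError
    line_id, gt_raw = fields[0], fields[8]
    # Words are separated by '|' in the file; turn the separators into spaces.
    return line_id, gt_raw.replace("|", " ")


example = "a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from"
parsed = parse_iam_line(example)
```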

Environment:

Python: 3.7.9
Tensorflow: 2.7.0

Expected behaviour:

CER is expected to descend slowly approximately to the value specified in README.md: 8.32%.

Actual behaviour:

First try:
CER after epoch 1: 28.1%
CER after epoch 2: 21.0%
But from 3rd to at least 12th epoch CER is between 45% and 52%. And it is not going to go down.

Second try.
After 8th epoch:
Train loss: 62.25793147463152
Val loss: 64.84262824781013
Character error rate: 45.535652%

After 21st epoch:
Train loss: 56.68565004330704
Val loss: 66.37841461644028
Character error rate: 44.809107%

Could you describe the correct way to train the model?

Update 2022-06-09
It seems that the problem is reproduced only in src_tensorflow2 directory.
The code in src_tensorflow1 directory (using TF 1.15.5) after third epoch gives CER 19% and loss still going down.

Update 2022-06-10
The code in src_tensorflow1 directory (using TF 1.15.5) doesn't give stable results either.
I ran the training from scratch 3 more times, and each time CER stopped decreasing after some epoch.

No Model in Model Folder

Hey,

I was looking at your code and noticed there is no saved model in the model folder. Is there something I am missing? Also when I am trying to train the model myself, I run into the error -

Traceback (most recent call last):
  File "main.py", line 193, in <module>
    main()
  File "main.py", line 164, in main
    Model.imgSize, Model.maxTextLen, load_aug=True)
  File "/Users/anagh/Downloads/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow-master/src/DataLoader.py", line 79, in __init__
    gtText_list = lineSplit[9].split('|')
IndexError: list index out of range

Please note - I have downloaded the dataset lines.tgz and also have put lines.txt in data folder.

Output wrong

I get this for the test image. Is this supposed to happen?

W/O correction
Clothed leat by leaf with the disappoisthet

W correction
Clothes left by leaf with the disappoisthet

Can you please check this out? Is it supposed to happen?

Recognition error: using generate_random_images(

Hi Sushant,

I really appreciate your work and need your help.

Under main.py, the solution works perfectly with function - load_different_image()

However, I am trying to create random images using the function below, but the predicted outcome is always wrong. Please help me find what is wrong.

def generate_random_images():
    imgs = []
    for i in range(1, Model.batchSize):
        imgs.append(np.random.random((Model.imgSize[0], Model.imgSize[1])))
    return imgs


def infer(model, fnImg):
    """ Recognize text in image provided by file path """
    img = preprocessor(cv2.imread(fnImg, cv2.IMREAD_GRAYSCALE), imgSize=Model.imgSize)
    imgs = generate_random_images()
    imgs = [img] + imgs
    batch = Batch(None, imgs)
    recognized = model.inferBatch(batch)  # recognize text
    print("Without Correction", recognized[0])
    print("With Correction", correct_sentence(recognized[0]))
    return recognized[0]

Also, I have tried duplicating the input image 10 times rather than using functions like load_different_image() and generate_random_images(), but this way it doesn't recognize anything. Sharing the example below.

def infer(model, fnImg):
    """ Recognize text in image provided by file path """
    img = preprocessor(cv2.imread(fnImg, cv2.IMREAD_GRAYSCALE), imgSize=Model.imgSize)
    batch = Batch(None, [img]*Model.batchSize) #duplicating input image 10 times
    recognized = model.inferBatch(batch)  # recognize text

    print("Without Correction", recognized[0])
    print("With Correction", correct_sentence(recognized[0]))
    return recognized[0]

Please Help!!!

Keras implementation and model structure

Hello,

I am trying to implement this with Keras.
In your code (Model.py) you have specified the layer filters, but I am not sure about the RNN and CTC layers.

please have a look at this

model

AttributeError: module 'tensorflow' has no attribute 'placeholder'

I ran tf_upgrade_v2 --intree Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow/ --outtree Handwritten-Line-Text-Recognition/ --reportfile changesreport.txt but I'm still getting the error. I installed tf-nightly but it is still the same. Can you please guide me?

2021-11-21 04:19:05.588993: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-11-21 04:19:05.602350: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
TensorFlow 2.0 Upgrade Script

Converted 0 files
Detected 0 issues that require attention

Make sure to read the detailed log 'changesreport.txt'

This is what I get.

About training from scratch

Hello sir,
I want to learn how to build handwritten text recognition using deep learning. Could you kindly suggest which course to take to fully understand how to proceed with the code?

Thank you....

its about implementation

Could you please explain how to implement it on Ubuntu 18.04? Can you guide me through the steps needed to implement this project?
Eagerly waiting for your answer.

beamsearch

Hi great work by you.
--beamsearch not included.

-- validate error

When I validate the data using python main.py --validate, the console gives me this error:

image
