him4318 / transformer-ocr

138.0 5.0 27.0 223 KB

Handwritten text recognition using transformers.

License: The Unlicense

Python 25.61% Jupyter Notebook 74.39%

handwritten-text-recognition transformer google-colab ocr-recognition deep-learning bentham pytorch detr iam python

transformer-ocr's Introduction

Hi 👋, I'm Himanshu

A Data Scientist from India

transformer-ocr's Issues

Low performance when training your model on RWTH split on IAM

Hi,

I recently trained and tested your model on the RWTH split of IAM (http://www.openslr.org/56/).
I got the following results: 55.9% CER, 77.22% WER, 99.89% SER.
These error rates are very high for any benchmark. Is this expected?
I used the tutorial code from your README.

Thank you,

Google Colab link

Hi,
Is there an updated link to the Google Colab notebook? I am unable to access it through the link in the README.

Thanks!

Need help

I need to ask a question. Would you mind sharing your email with me? Mine is: [email protected]. If you agree, please respond to my email, since I cannot send a private message here. I'm not a bot or anything weird. I just need some help implementing an OCR task. Thank you very much.

TypeError: iteration over a 0-d tensor

When training the model, this error occurred in the first iteration:

Epoch: 01 learning rate[0.0001]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-6360cfd97324> in <module>()
     15     start_time = time.time()
     16 
---> 17     train_loss,outputs = train(model,  criterion, optimizer, scheduler, train_loader)
     18     valid_loss = evaluate(model, criterion, val_loader)
     19 

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in __iter__(self)
    605         # See gh-54457
    606         if self.dim() == 0:
--> 607             raise TypeError('iteration over a 0-d tensor')
    608         if torch._C._get_tracing_state():
    609             warnings.warn('Iterating over a tensor might cause the trace to be incorrect. '

TypeError: iteration over a 0-d tensor

Notebook loading error

Hi, the notebook URL cannot be opened. Could you add the notebook file to the git project? Thanks.

Saint Gall and Washington links don't work

When I open the links for Saint Gall and Washington, I get "URL not found".

LR and loss

Your results are better than mine: I tried a similar approach and it worked very badly.
I would love to learn from yours.

What LR did you use?
What loss vs epoch did you get?

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

This happens here:

import torch
import torchvision.transforms as T

device = torch.device("cuda")
transform = T.Compose([
    T.ToTensor(),
])
tokenizer = Tokenizer(charset_base)

train_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'train',transform), batch_size=batch_size, shuffle=False, num_workers=2)
val_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'valid',transform), batch_size=batch_size, shuffle=False, num_workers=2)

The error I get is:
self.dataset[self.split]['gt'] = [x.decode() for x in self.dataset[self.split]['gt']]

AttributeError: 'numpy.ndarray' object has no attribute 'decode' #18
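One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the entries in `gt` are array-like wrappers (for example, a label stored as a one-element array) instead of plain `bytes`, so `.decode()` is not available on them. A minimal sketch of a tolerant conversion follows; `to_text` is a hypothetical helper and is not part of the repository.

```python
def to_text(item, encoding="utf-8"):
    """Return `item` as str, accepting bytes, str, or an array-like wrapper.

    Hypothetical helper: it only illustrates one way to avoid calling
    .decode() on an object that is not bytes.
    """
    # NumPy arrays and NumPy scalars expose .tolist(); unwrap them first.
    if hasattr(item, "tolist"):
        item = item.tolist()
    # A label stored as a one-element sequence is unwrapped recursively.
    if isinstance(item, (list, tuple)):
        if len(item) != 1:
            raise ValueError(f"expected a single label, got {len(item)} items")
        return to_text(item[0], encoding)
    if isinstance(item, bytes):
        return item.decode(encoding)
    if isinstance(item, str):
        return item
    raise TypeError(f"cannot convert {type(item).__name__} to text")
```

With such a helper, the failing line could be written as `[to_text(x) for x in self.dataset[self.split]['gt']]`. If the labels turn out to have a different layout in the file, the helper would need to be adapted.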

Different dataset

Is it possible to train this model to recognize non-Latin handwritten text, such as Arabic?

How to replace the training process of the CNN (resnet101) with a pretrained one?

Issue with the .hdf5 file

Hi,

I've been testing your code with my own dataset and everything seems fine until run_epochs. For some reason, when I try to make the image compatible with the resnet, it says TypeError: byte indices must be integers or slices, not tuple.

I don't know if it's related to h5py or to my images themselves. The images, from what I can read, are stored as values like b'./data/lines/rm_149_077_021.png' (the location where I keep the line images of my dataset).

Thanks
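The message quoted above is what Python raises when a `bytes` object is indexed with a tuple, which is how image arrays are usually sliced. That would be consistent with the dataset holding file paths instead of pixel data, although this is an inference from the post and not a confirmed cause. A small sketch that reproduces the message and flags such entries; `is_unloaded_path` is a hypothetical helper.

```python
def is_unloaded_path(item):
    """Return True if `item` looks like a file path rather than pixel data."""
    return isinstance(item, (bytes, str))


# The value reported in the post: a path stored as bytes, not an image array.
sample = b"./data/lines/rm_149_077_021.png"

try:
    # Array-style 2-D indexing on bytes fails in the same way as in the post.
    sample[0, 0]
    message = ""
except TypeError as err:
    message = str(err)

print(message)
print(is_unloaded_path(sample))
```

If the check returns True for entries in the image dataset, the images would need to be read from disk and stored as arrays when the .hdf5 file is built.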

Rimes Dataset Access

Hi @him4318, thanks for the repo. I have sent an NDA to the Rimes a2ai website but have not gotten access to the data. If you have it on a Google Drive, is there any way I can get access to it?

Question regarding error metrics/dataset creation

I had a few questions/clarifications regarding the hdf5 dataset that was linked on the notebook:

I ran the notebook for training from scratch using the existing hdf5 and obtained a CER of ~0.09 using just a single model (and not an ensemble).
When creating the hdf5 from scratch and running the training procedure my CER is similar to the best/second best models (~0.16-0.18).

So, as far as I can see the main difference would be in the dataset generation/preprocessing steps or the tokenizer:
a. In the notebook there's a comment that the pretrained models used a vocab size of 100 as opposed to 99 (95 characters + SOS/EOS/PAD/UNK tokens). Is there an additional token used here?
b. Was the generation procedure of the hdf5 that was linked/on the google drive a little different?

Thank you!

The link to the 'Rimes' dataset can't be opened. Can you share it in some other way? Thanks!

Dataset Links

The Rimes and Bentham dataset links can't be opened! Please share some alternate links.

Invalid argument: Not enough time for target transition sequence

Hi, I tried to run the models designed by arthurflor23 on a different dataset, but I get this error. My image dimensions are (137, 518) and the max_len of the text is 137. Any idea how I can solve this issue?

https://github.com/arthurflor23/handwritten-text-recognition

Invalid argument: Not enough time for target transition sequence (required: 112, available: 35). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
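For context, CTC can only align a label to the network output if there is at least one output time step per target character, plus one extra step for every pair of adjacent identical characters (a blank has to separate them). The sketch below applies that general constraint to find labels that are too long for a given number of time steps. The function names are hypothetical, and how many time steps a model emits for a given image width depends on its architecture, which is not shown in the post.

```python
def ctc_required_steps(label):
    """Minimum number of output time steps CTC needs to emit `label`."""
    # Each adjacent pair of identical symbols needs a blank in between.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits_ctc(label, available_steps):
    """Return True if `label` can be aligned within `available_steps`."""
    return ctc_required_steps(label) <= available_steps


labels = ["hello", "a longer line of handwritten text"]
available = 35  # value reported in the error message above
too_long = [text for text in labels if not fits_ctc(text, available)]
print(too_long)
```

Samples flagged this way could be dropped or split, or the model could be changed so that it produces more time steps per image, for instance by using wider inputs or less horizontal downsampling.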

TypeError: normalize() argument 2 must be str, not numpy.ndarray

Hi! I'm getting this error
TypeError: normalize() argument 2 must be str, not numpy.ndarray

for the line of code:

    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

Please help
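`unicodedata.normalize` accepts only `str` as its second argument, so the error indicates that `text` reached this line as an array rather than a string. A minimal sketch of a wrapper that coerces the value first, assuming the label is a single string stored as bytes or wrapped in an array-like object; `to_ascii` is a hypothetical name.

```python
import unicodedata


def to_ascii(text, encoding="utf-8"):
    """Normalize `text` to plain ASCII, coercing bytes or array-likes to str."""
    # NumPy arrays and scalars expose .tolist(); unwrap to a Python object.
    if hasattr(text, "tolist"):
        text = text.tolist()
    if isinstance(text, bytes):
        text = text.decode(encoding)
    if not isinstance(text, str):
        raise TypeError(f"expected str or bytes, got {type(text).__name__}")
    # Same normalization as the failing line: decompose accented characters,
    # then drop anything that is not ASCII.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
```

If `text` is actually an array holding several labels, the function should be applied to each element instead of to the array as a whole.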

RuntimeError: shape '[-1, 100]' is invalid for input of size 201168

Hi. I ran the Transformer-ocr.ipynb file on Google Colab without making any edits, and it yields an error at the start of training. The modules in the Colab notebook aren't the same as those in the repo code.
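A quick arithmetic check, offered as a diagnostic hint and not as a confirmed cause: the reported tensor size is not a multiple of 100, but it is a multiple of 99, the vocabulary size mentioned in another issue on this page (95 characters plus SOS/EOS/PAD/UNK). That would be consistent with the model producing 99 classes while the reshape expects 100. `divides_evenly` is a hypothetical helper.

```python
def divides_evenly(numel, vocab_size):
    """Return True if a tensor with `numel` elements can be viewed as (-1, vocab_size)."""
    return numel % vocab_size == 0


numel = 201168  # input size reported in the RuntimeError
for vocab_size in (99, 100):
    rows, remainder = divmod(numel, vocab_size)
    print(vocab_size, divides_evenly(numel, vocab_size), rows, remainder)
```

If this is the cause, the vocabulary size passed to the model and the one used in the reshape would need to match.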

Using my own dataset to run prediction

Hi,

Do I need to convert my own images to .hdf5 format before I can make a prediction on them?
What structure should my data have?