him4318 / transformer-ocr

138.0 5.0 27.0 223 KB

Handwritten text recognition using transformers.

License: The Unlicense

Python 25.61% Jupyter Notebook 74.39%
handwritten-text-recognition transformer google-colab ocr-recognition deep-learning bentham pytorch detr iam python

transformer-ocr's Introduction

Hi 👋, I'm Himanshu

A Data Scientist from India

Connect with me:

him4318

Languages and Tools:

aws docker elasticsearch flask gcp git kafka linux mongodb mysql opencv pandas postgresql postman python pytorch scikit_learn seaborn tensorflow

transformer-ocr's People

Contributors

him4318 avatar

transformer-ocr's Issues

Google Colab link

Hi,
Is there an updated link to the Google Colab notebook? I am unable to access it through the link in the README.

Thanks!

Need help

I need to ask a question. Would you mind sharing your email with me? Mine is: [email protected]. If you agree, please respond to my email, since I cannot send a private message here. I'm not a bot or anything weird; I just need some help implementing an OCR task. Thank you so very much.

TypeError: iteration over a 0-d tensor

When training the model, this error occurred in the first iteration:

Epoch: 01 learning rate[0.0001]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-6360cfd97324> in <module>()
     15     start_time = time.time()
     16 
---> 17     train_loss,outputs = train(model,  criterion, optimizer, scheduler, train_loader)
     18     valid_loss = evaluate(model, criterion, val_loader)
     19 

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in __iter__(self)
    605         # See gh-54457
    606         if self.dim() == 0:
--> 607             raise TypeError('iteration over a 0-d tensor')
    608         if torch._C._get_tracing_state():
    609             warnings.warn('Iterating over a tensor might cause the trace to be incorrect. '

TypeError: iteration over a 0-d tensor

Notebook loading error

Hi, the notebook URL cannot be opened. Could you add the notebook file to the Git project? Thanks.

LR and loss

You did better than I did: I tried a similar approach and it worked very badly.
I would love to learn from yours.

What LR did you use?
What loss vs epoch did you get?

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

This happens here:

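import torch
import torchvision.transforms as T

# Tokenizer, DataGenerator, charset_base, source_path, max_text_length and
# batch_size are assumed to be defined earlier (not shown in this snippet).
device = torch.device("cuda")
transform = T.Compose([T.ToTensor()])  # convert each image to a tensor
tokenizer = Tokenizer(charset_base)

# Training and validation loaders over the 'train' and 'valid' splits.
train_loader = torch.utils.data.DataLoader(
    DataGenerator(source_path, charset_base, max_text_length, 'train', transform),
    batch_size=batch_size, shuffle=False, num_workers=2)
val_loader = torch.utils.data.DataLoader(
    DataGenerator(source_path, charset_base, max_text_length, 'valid', transform),
    batch_size=batch_size, shuffle=False, num_workers=2)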

The error I get is:
self.dataset[self.split]['gt'] = [x.decode() for x in self.dataset[self.split]['gt']]

AttributeError: 'numpy.ndarray' object has no attribute 'decode'
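
The failing line calls .decode() on every ground-truth entry, which only works when each entry is a bytes object. The error suggests that at least one entry arrives as a numpy.ndarray instead; why that happens is not stated in the report. A minimal sketch of a more tolerant conversion follows. The helper name to_text is hypothetical (it is not part of the repository), and joining the elements of a wrapped entry is an assumption about how the labels are stored.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, whether it arrives as bytes, str, or an array-like wrapper."""
    # numpy arrays and numpy scalars expose tolist(); unwrap them first.
    # This is duck-typed, so numpy itself is not imported here.
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        # Assumption: a wrapped entry holds pieces of a single label.
        return "".join(to_text(item, encoding) for item in value)
    raise TypeError(f"cannot convert {type(value).__name__} to text")


# Mixed input of the kind that could come out of an HDF5 'gt' dataset.
labels = [to_text(x) for x in [b"hello", "world", [b"foo", b"bar"]]]
```

In the failing line, x.decode() would become to_text(x). If the entries turn out to be something else entirely (for example numeric arrays), the helper raises a TypeError that names the offending type, which may help locate the problem in the dataset file.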

Different dataset

Sir, is it possible to train this model to recognize non-Latin handwritten text, such as Arabic?

Issue with the .hdf5 file

Hi,

I've been testing your code with my own dataset and everything seems fine up to run_epochs. For some reason, when I try to make the image compatible with the ResNet, it raises TypeError: byte indices must be integers or slices, not tuple.

I don't know if it's related directly to h5py or to my images themselves. The images, from what I can read, are stored like b'./data/lines/rm_149_077_021.png' (that's where I keep the line images of my dataset).

Thanks
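
That TypeError is what Python raises when a bytes object is indexed with a tuple, the way an image array would be indexed (for example img[0, 0]). Together with the stored value quoted above, this suggests the image entries in the .hdf5 file hold file paths rather than pixel data, although the report does not confirm it. The sketch below reproduces the message and adds a heuristic check; looks_like_path and its extension list are hypothetical, not part of the repository.

```python
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")


def looks_like_path(entry) -> bool:
    """Heuristic: True if a stored 'image' entry is a file path, not pixel data."""
    if isinstance(entry, bytes):
        try:
            entry = entry.decode("utf-8")
        except UnicodeDecodeError:
            return False  # raw binary data, not a readable path
    if not isinstance(entry, str):
        return False
    return entry.lower().endswith(IMAGE_SUFFIXES)


entry = b'./data/lines/rm_149_077_021.png'

# Indexing a bytes object like a 2-D array reproduces the reported error.
try:
    entry[0, 0]
    message = ""
except TypeError as exc:
    message = str(exc)
```

If the check returns True for entries read back from the file, the images would need to be loaded from those paths and written as arrays when the dataset is built, rather than storing the paths themselves.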

Rimes Dataset Access

Hi @him4318, thanks for the repo. I have sent an NDA to the Rimes A2iA website but have not gotten access. If you have the data on Google Drive, is there any way I can get access to it?

Question regarding error metrics/dataset creation

I had a few questions/clarifications regarding the hdf5 dataset that was linked in the notebook:

  1. I ran the notebook for training from scratch using the existing hdf5 and obtained a CER of ~0.09 using just a single model (and not an ensemble).
  2. When creating the hdf5 from scratch and running the training procedure my CER is similar to the best/second best models (~0.16-0.18).

So, as far as I can see the main difference would be in the dataset generation/preprocessing steps or the tokenizer:
a. In the notebook there's a comment that the pretrained models used a vocab size of 100 as opposed to 99 (95 characters + SOS/EOS/PAD/UNK tokens). Is there an additional token used here?
b. Was the generation procedure for the hdf5 that was linked on Google Drive a little different?

Thank you!
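
For readers comparing the CER figures above, here is a minimal sketch of one common definition: total character-level edit distance divided by total reference length. Whether the notebook computes CER exactly this way (corpus-level rather than averaged per line) is an assumption, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level character error rate: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


score = cer(["hello", "world"], ["hello", "word"])  # 1 edit over 10 characters
```

Because CER is normalized by reference length, differences in how the ground-truth text is cleaned or tokenized when the hdf5 is generated can shift the score even when the model is unchanged.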

Dataset Links

The Rimes and Bentham dataset links can't be opened! Please share some alternate links.

Invalid argument: Not enough time for target transition sequence

Hi, I tried to run the models designed by arthurflor23 on a different dataset, but it gives me this error. My image dimensions are (137, 518) and the max_len of the text is 137. Any idea how I can solve this issue?

https://github.com/arthurflor23/handwritten-text-recognition

Invalid argument: Not enough time for target transition sequence (required: 112, available: 35)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
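
The message compares the number of time steps CTC needs for a label ("required") with the number the network actually outputs for that sample ("available"). Under the standard CTC formulation, a label needs one step per character plus one blank between each pair of identical neighbouring characters. The sketch below computes that lower bound; the function names are illustrative. How a 518-pixel-wide image ends up as 35 time steps depends on the model's downsampling, which the report does not describe.

```python
def min_ctc_steps(label: str) -> int:
    """Minimum number of input time steps CTC needs to emit label."""
    # One step per character, plus a blank between identical neighbours.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits_ctc(label: str, time_steps: int) -> bool:
    """True if label can be aligned to time_steps output frames."""
    return min_ctc_steps(label) <= time_steps


def too_long(labels, time_steps):
    """Return the labels that cannot be aligned within time_steps frames."""
    return [label for label in labels if not fits_ctc(label, time_steps)]


needed = min_ctc_steps("hello")  # 5 characters + 1 repeat ("ll") = 6
```

With 35 available steps, any label that needs more than 35 triggers the error. Typical remedies are to shorten or filter such labels (too_long lists them), or to give the model more output steps per image, for example with wider inputs or less downsampling.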
