him4318 / transformer-ocr
Handwritten text recognition using transformers.
License: The Unlicense
Hi,
I recently trained and tested your model on the RWTH split of IAM http://www.openslr.org/56/
I got the following performance: 55.9% CER 77.22% WER 99.89% SER
These scores are poor for any benchmark. Is this expected?
I used the tutorial code from your README.
Thank you,
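For anyone comparing numbers, here is a minimal sketch of how CER, WER and SER are commonly computed: edit distance divided by reference length, and the fraction of lines that are not an exact match. This assumes corpus-level aggregation; the repo's own evaluation code may average per line instead, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return (CER, WER, SER) aggregated over all reference/hypothesis pairs."""
    char_errs = char_total = word_errs = word_total = seq_errs = 0
    for ref, hyp in zip(references, hypotheses):
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        seq_errs += int(ref != hyp)
    return (char_errs / max(char_total, 1),
            word_errs / max(word_total, 1),
            seq_errs / max(len(references), 1))
```

Running this on the saved predictions would show whether the reported figures come from the model or from a mismatch in how the metric is computed.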
Hi,
Is there an updated link to the Google Colab notebook? I am unable to access it through the link in the README.
Thanks!
I need to ask a question. Would you mind sharing your email with me? Mine is: [email protected]. If you agree, please respond to my email, since I cannot send a private message here. I'm not a bot or anything weird. I just need some help implementing an OCR task. Thank you very much.
When training the model, this error occurred in the first iteration:
Epoch: 01 learning rate[0.0001]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-6360cfd97324> in <module>()
15 start_time = time.time()
16
---> 17 train_loss,outputs = train(model, criterion, optimizer, scheduler, train_loader)
18 valid_loss = evaluate(model, criterion, val_loader)
19
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in __iter__(self)
605 # See gh-54457
606 if self.dim() == 0:
--> 607 raise TypeError('iteration over a 0-d tensor')
608 if torch._C._get_tracing_state():
609 warnings.warn('Iterating over a tensor might cause the trace to be incorrect. '
TypeError: iteration over a 0-d tensor
When I open the links for Saint Gall and Washington, I get "URL not found".
You did better than I did. I tried a similar approach and it worked very badly.
I would love to learn from yours.
What LR did you use?
What loss vs epoch did you get?
This happens here:
import torch
import torchvision.transforms as T
device = torch.device("cuda")
transform = T.Compose([T.ToTensor()])
# Tokenizer, DataGenerator, charset_base, source_path, max_text_length and batch_size come from the repo's notebook.
tokenizer = Tokenizer(charset_base)
train_loader = torch.utils.data.DataLoader(
    DataGenerator(source_path, charset_base, max_text_length, 'train', transform),
    batch_size=batch_size, shuffle=False, num_workers=2)
val_loader = torch.utils.data.DataLoader(
    DataGenerator(source_path, charset_base, max_text_length, 'valid', transform),
    batch_size=batch_size, shuffle=False, num_workers=2)
The error I get is:
self.dataset[self.split]['gt'] = [x.decode() for x in self.dataset[self.split]['gt']]
AttributeError: 'numpy.ndarray' object has no attribute 'decode' #18
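The traceback suggests the 'gt' entries are being read back as NumPy arrays rather than bytes, which may depend on how the HDF5 file was written. Below is a minimal sketch of a more tolerant decode, assuming each entry is bytes, str, or an array-like wrapping a single one of those. `to_text` is a hypothetical helper, not part of the repo.

```python
def to_text(entry, encoding="utf-8"):
    """Coerce one ground-truth entry to str (hypothetical helper)."""
    # NumPy arrays and scalars expose tolist(), which yields plain Python objects.
    if hasattr(entry, "tolist"):
        entry = entry.tolist()
    # Unwrap single-element containers such as [b"some text"].
    while isinstance(entry, (list, tuple)) and len(entry) == 1:
        entry = entry[0]
    if isinstance(entry, bytes):
        return entry.decode(encoding)
    if isinstance(entry, str):
        return entry
    raise TypeError(f"unsupported ground-truth entry type: {type(entry).__name__}")
```

The failing line could then become `[to_text(x) for x in self.dataset[self.split]['gt']]`. If the TypeError is raised, the entries have a different layout than assumed here and the HDF5 generation step is worth inspecting.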
Is it possible to train this model to recognize non-Latin handwritten text, such as Arabic?
Hi,
I've been testing your code with my own dataset, and everything seems fine until run_epochs. For some reason, when I try to make the image compatible with the ResNet, I get: TypeError: byte indices must be integers or slices, not tuple.
I don't know whether it is related to h5py or to my images themselves. From what I can read, the images are stored as b'./data/lines/rm_149_077_021.png' (the location where I keep the line images of my dataset).
Thanks
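That message is what Python raises when a bytes object is indexed with a tuple, as array-style indexing would do. It would be consistent with the HDF5 file holding file paths instead of pixel arrays, though that is a guess from the message alone. Below is a small sketch that reproduces the failure and adds a check; `resolve_entry` is a hypothetical helper, not from the repo.

```python
def resolve_entry(entry):
    """Classify a stored dataset entry (hypothetical helper).

    Returns ("path", str) when the entry is a file path stored as bytes/str,
    and ("array", entry) for anything else (assumed to be pixel data).
    """
    if isinstance(entry, bytes):
        return "path", entry.decode("utf-8")
    if isinstance(entry, str):
        return "path", entry
    return "array", entry


# Reproduce the reported error: array-style indexing on a bytes path.
stored = b'./data/lines/rm_149_077_021.png'
try:
    stored[:, 0]
    message = None
except TypeError as exc:
    message = str(exc)

kind, value = resolve_entry(stored)
```

If `kind` comes back as "path", the image would need to be loaded from disk and converted to an array before the dataset is written or before the transform is applied.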
Hi @him4318, thanks for the repo. I have sent an NDA to the Rimes a2ai website but have not gotten access. If you have the data on a Google Drive, is there any way I can get access to it?
I had a few questions/clarifications regarding the hdf5 dataset that was linked on the notebook:
So, as far as I can see the main difference would be in the dataset generation/preprocessing steps or the tokenizer:
a. In the notebook there's a comment that the pretrained models used a vocab size of 100 as opposed to 99 (95 characters + SOS/EOS/PAD/UNK tokens). Is there an additional token used here?
b. Was the generation procedure for the hdf5 file linked on the Google Drive a little different?
Thank you!
The Rimes and Bentham dataset links can't be opened. Please share some alternate links.
Hi, I tried to run the models designed by arthurflor23
on a different dataset, but I get the error below. My image dimensions are (137, 518) and the max_len of the text is 137. Any idea how I can solve this issue?
https://github.com/arthurflor23/handwritten-text-recognition
Invalid argument: Not enough time for target transition sequence (required: 112, available: 35)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
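This message comes from the CTC loss. The number of time steps the network emits (35 here) has to be at least the label length plus one extra step for every pair of identical adjacent characters, since CTC needs a blank between repeats (112 required here). Below is a minimal sketch of that check, assuming the standard CTC alignment rule; the function names are hypothetical.

```python
def ctc_min_steps(label):
    """Minimum number of time steps CTC needs to emit `label`."""
    # Each pair of identical neighbours needs a blank step between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def ctc_feasible(label, time_steps):
    """True if a CTC alignment of `label` fits in `time_steps` frames."""
    return time_steps >= ctc_min_steps(label)
```

With only 35 output steps, labels over 35 characters cannot be aligned. Likely remedies are reducing the width downsampling in the network, using wider input images, or dropping/truncating long labels. The flag mentioned in the message only silences the error for those samples.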
Hi! I'm getting this error
TypeError: normalize() argument 2 must be str, not numpy.ndarray
for the line of code:
text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
Please help
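`unicodedata.normalize` only accepts a str as its second argument, so the value has to be converted before the call. Below is a sketch, assuming the text arrives as str, bytes, or a NumPy scalar or single-element array holding one of those; `normalize_ascii` is a hypothetical helper.

```python
import unicodedata


def normalize_ascii(text):
    """Coerce `text` to str, then strip accents and non-ASCII characters."""
    # NumPy scalars and single-element arrays expose item(), which returns
    # the underlying Python object (it raises for multi-element arrays).
    if hasattr(text, "item"):
        text = text.item()
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    if not isinstance(text, str):
        raise TypeError(f"expected str or bytes, got {type(text).__name__}")
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
```

If the array holds many strings, the helper would be applied per element rather than to the array as a whole.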
Hi,
Do I need to convert my own images to .hdf5 format before I can make a prediction on them?
What structure should my data have?