ku21fan / str-fewer-labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)

License: MIT License

Python 11.36% Jupyter Notebook 88.64%

cvpr2021 deep-learning ocr real-datasets scene-text-recognition self-supervised-learning semi-supervised-learning synthetic-data text-recognition

str-fewer-labels's Introduction

Regarding the deep-text-recognition-benchmark repository, after moving from NAVER Clova to the University of Tokyo in 2020, I stopped managing it. Fortunately, most issues are already answered in GitHub issues. If you cannot find the answer, ask me via email. However, I cannot answer the questions related to the Clova OCR service.

str-fewer-labels's Issues

Train CRNN with semi-supervised method Pseudo Label (PL): missing "saved_model"

The step "Train CRNN with semi-supervised methods Pseudo Label (PL)" is incomplete as it does not include the required argument "saved_model".

failed to download data_CVPR2021.zip

I tried to download the data from the link https://www.dropbox.com/sh/1s6r4slurc5ei2n/AACg6TqoDfGdKe8t40Em1fgxa?dl=0&preview=data_CVPR2021.zip on different computers connected to different networks, but every attempt failed. Is there a problem with the data as a whole?
At your convenience, could you please send me the data (excluding the synthetic data) by email? My email: [email protected]

size mismatch for module.prediction.weight/bias

Hi,

I completed the pre-training and training steps with no errors and got pretty good accuracy, but in the test step I get an error saying the tensor sizes differ (216 vs. 218). You can see the full error below.
I would appreciate any help.
Thanks in advance.

test error:

train accuracy:

question about the experimental results

Dear author, when I use the pretrained model of CRNN, the accuracy of each evaluation dataset cannot reach the result reported in the paper. I wonder what is wrong with that? Thank you.

Unlabeled lmdb dataset

I want to create my own dataset in Persian.
I used create_lmdb_dataset.py for the labeled data and it worked.
But for unlabeled data, where we have no labels, what must gtfile.txt contain?
Or should I use different code?

Thanks in advance.
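
One possible workaround, not confirmed by the repository author, is to give every unlabeled image a placeholder label so the ground-truth file keeps the same shape as the labeled one. The sketch below assumes that create_lmdb_dataset.py reads one tab-separated `image_path<TAB>label` line per image; check this against the script before relying on it. The names `write_placeholder_gt` and `PLACEHOLDER` are hypothetical and do not come from the repository.

```python
import os
import tempfile

# Hypothetical dummy label; it only fills the label column and is never
# meant to be used as ground truth.
PLACEHOLDER = "unlabeled"
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def write_placeholder_gt(image_dir, gt_path, placeholder=PLACEHOLDER):
    """Write one 'file_name<TAB>placeholder' line per image; return the count."""
    names = sorted(
        n for n in os.listdir(image_dir) if n.lower().endswith(IMAGE_EXTS)
    )
    with open(gt_path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(f"{name}\t{placeholder}\n")
    return len(names)


# Demo on a throwaway folder containing empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    gt_file = os.path.join(tmp, "gtfile.txt")
    n_written = write_placeholder_gt(tmp, gt_file)
    with open(gt_file, encoding="utf-8") as f:
        gt_lines = f.read().splitlines()

print(n_written, gt_lines)
```

If the data loader filters samples by character set or label length, the placeholder has to use characters and a length that pass those filters; that behavior is also an assumption to verify.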

Unable to open data files

Hi!

I'm having trouble opening the data files in this repository. When I try to open them using Microsoft Access, I receive an error message that says "unrecognized database format" (see image attached). How can I solve the problem? Is it possible to download the data in any other format?

Thank you for your help :)

About experiment with pseudo label

Thank you for your gorgeous research. I have some questions about your experiment (focus on Semi-Supervised Learning):

When you use pseudo labels to retrain your model, do you select all of them or only choose data having high confidence scores?
Was the model you used for the pseudo-labeling step fixed?
You tried using unlabeled real-world data for semi-supervised learning on synthetic data, and it failed. Can you give me some details about this experiment (settings, hyperparameters, protocol)?
Finally, I will be very grateful if you give me some of your comments about the failed experiment mentioned in the question above.

From the bottom of my heart, I'd be grateful if I could get a response from you. It will help my project a lot.
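
To make the first question concrete, the two options it contrasts can be sketched as follows. This is only an illustration of "keep every pseudo label" versus "keep only high-confidence ones"; it is not the authors' procedure, and the function name, the sample predictions, and the 0.8 threshold are all hypothetical.

```python
def select_pseudo_labels(predictions, threshold=None):
    """Return (image_id, text) pairs chosen as pseudo labels.

    predictions: list of (image_id, predicted_text, confidence) tuples.
    threshold:   None keeps every prediction; a float keeps only those
                 whose confidence is at least that value.
    """
    if threshold is None:
        return [(image_id, text) for image_id, text, _ in predictions]
    return [
        (image_id, text)
        for image_id, text, conf in predictions
        if conf >= threshold
    ]


# Made-up model outputs on unlabeled images.
preds = [
    ("img_001", "coffee", 0.97),
    ("img_002", "exlt", 0.41),
    ("img_003", "open", 0.88),
]

all_labels = select_pseudo_labels(preds)                # option 1: use everything
confident = select_pseudo_labels(preds, threshold=0.8)  # option 2: filter by score

print(len(all_labels), confident)
```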

About image input size

Hi, thanks for your great code and dataset!

My question is about the image input size in training. I see the default image input size is 32x100.
I am new to OCR: is this a standard input size for OCR models, or did you choose it for other reasons?

I ask because this input size seems small for some complex datasets such as COCO-Text.

pretrained-weights of RotNet

It is quite impressive research.
Could you share the pretrained weights of RotNet?
Thanks for your help!

instance discrimination task

You mention "we use an instance discrimination task as a pretext task" in your paper. I'm confused about which task should be used. As I understand it, you feed text images to the encoder and train MoCo with only the ResNet, then freeze the ResNet and train with the BiLSTM and attention. Besides that, I'm confused by the idea of training the full TRBA with MoCo, and I don't know what the next step is in the two-stage MoCo training method. Feel free to correct me if I have misunderstood.

Network architecture

Hi,

Any idea how to plot or draw the network architecture?
I used self-supervised learning with TRBA and RotNet.
Thanks.

Result in evaluation

I am training models in Vietnamese. I modified the character set to match Vietnamese and trained with the following command:
!CUDA_VISIBLE_DEVICES=0 python train.py \
                --select_data / \
                --model_name TRBA \
                --exp_name CRNN_aug \
                --Aug Blur5-Crop99 \
                --train_data train \
                --valid_data val \
                --character 'aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789' \
                --batch_ratio 1

After 2000 iterations the model is evaluated, and I wonder why it predicts [UNK], as shown below. Is my training correct?

ablation experiment question

Thanks for your excellent work. You have run a lot of benchmark and ablation experiments, which is very helpful. I have a question: did you train the semi- and self-supervised models for the same number of steps (200k)? I notice that training MoCo or Pseudo Label for more steps may give better results, so I was wondering whether you compared the results at 200k and 300k steps.

Predict after training on custom dataset

Hi, I trained on a custom Vietnamese dataset with this command:

!CUDA_VISIBLE_DEVICES=0 python train.py \
                --select_data / \
                --model_name TRBA \
                --exp_name CRNN_aug \
                --Aug Blur5-Crop99  \
                --train_data train \
                --valid_data val \
                --character 'aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789' \
                --batch_ratio 1

After 50k iterations, I ran prediction with best_score.pth using this command:

!python demo.py \
        --model_name TRBA \
        --image_folder /content/drive/MyDrive/UIT_CHALLENGE_2022/result/crop_img \
        --saved_model /content/drive/MyDrive/UIT_CHALLENGE_2022/src/STR-Fewer-Labels/best_score_new_reg.pth

The error:

# of tokens and characters: 99

model input parameters 32 100 20 3 512 256 99 25 TPS ResNet BiLSTM Attn
loading pretrained model from /content/drive/MyDrive/UIT_CHALLENGE_2022/src/STR-Fewer-Labels/best_score_new_reg.pth
Traceback (most recent call last):
File "demo.py", line 194, in <module>
demo(opt)
File "demo.py", line 47, in demo
model.load_state_dict(torch.load(opt.saved_model, map_location=device))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.Prediction.generator.weight: copying a param with shape torch.Size([108, 256]) from checkpoint, the shape in current model is torch.Size([99, 256]).
size mismatch for module.Prediction.generator.bias: copying a param with shape torch.Size([108]) from checkpoint, the shape in current model is torch.Size([99]).
size mismatch for module.Prediction.char_embeddings.weight: copying a param with shape torch.Size([108, 256]) from checkpoint, the shape in current model is torch.Size([99, 256]).

Please help, thanks!
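
The numbers in the traceback can be checked with a little arithmetic. The sketch below assumes that the prediction layer has one output per character plus a fixed number of special tokens; that rule is an assumption, not something confirmed here. It counts the characters passed to train.py and compares the result with the two class counts in the log (108 in the checkpoint, 99 in the demo model).

```python
import unicodedata

# The --character string from the training command, normalized to NFC so
# that each accented Vietnamese letter counts as a single code point.
TRAIN_CHARACTER = unicodedata.normalize(
    "NFC",
    "aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợ"
    "pqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789",
)

CHECKPOINT_CLASSES = 108  # shape stored in the checkpoint, per the traceback
DEMO_CLASSES = 99         # shape of the model built by demo.py, per the log

n_chars = len(TRAIN_CHARACTER)
# Under the stated assumption, the remainder is the special-token count.
n_special = CHECKPOINT_CLASSES - n_chars
# Character-set size that would yield the 99 classes seen in the demo run.
demo_chars = DEMO_CLASSES - n_special

print(n_chars, n_special, demo_chars)  # 103 5 94
```

Under that assumption, the checkpoint matches the 103-character Vietnamese set, while the demo run built its model from a different, 94-character set. The demo command shown above does not pass `--character`, which would be consistent with this reading. Whether demo.py accepts the same `--character` argument as train.py is something to verify in the script.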

Train from checkpoint

Hi,
I train the model on Colab, and after a while the session gets disconnected. I can train again after a few hours, but then I have to start over from the author's checkpoint. How can I continue training from the checkpoint (weights) I saved last time?

Thanks!

Providing a processed dataset

It is quite impressive research.
Do you plan to provide a processed dataset used for training and validation such as Real-L and Real-U?
