
ku21fan / str-fewer-labels


Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)

License: MIT License

Python 11.36% Jupyter Notebook 88.64%
cvpr2021 deep-learning ocr real-datasets scene-text-recognition self-supervised-learning semi-supervised-learning synthetic-data text-recognition

str-fewer-labels's Introduction

Regarding the deep-text-recognition-benchmark repository: after moving from NAVER Clova to the University of Tokyo in 2020, I stopped maintaining it. Fortunately, most questions have already been answered in its GitHub issues. If you cannot find an answer there, ask me via email. However, I cannot answer questions related to the Clova OCR service.

str-fewer-labels's People

Contributors

ku21fan


str-fewer-labels's Issues

size mismatch for module.prediction.weight/bias

Hi,

I completed the pre-training and training steps without errors and got fairly good accuracy. However, at the test step I get an error saying that the tensor sizes do not match (216 vs. 218). You can see the full error below.
I would appreciate any help.
Thanks in advance.

Test error:

[screenshot: 4_test_3]
[screenshot: 4_test_2]

Training accuracy:

[screenshot: 4_train_23]
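A size mismatch when loading weights means the model built at test time does not have the same parameter shapes as the model that was saved. As a debugging aid, the sketch below (a hypothetical helper, not part of this repository) compares two name-to-shape mappings and lists the parameters whose shapes differ. With PyTorch, such a mapping can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. The example shapes are illustrative only; the hidden size 256 and which side holds 216 or 218 are assumptions.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter that
    exists in both mappings but with different shapes, sorted by name."""
    mismatches = []
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        if checkpoint_shapes[name] != model_shapes[name]:
            mismatches.append((name, checkpoint_shapes[name], model_shapes[name]))
    return mismatches


# Illustrative (hypothetical) shapes for the prediction layer.
ckpt = {"module.prediction.weight": (218, 256), "module.prediction.bias": (218,)}
model = {"module.prediction.weight": (216, 256), "module.prediction.bias": (216,)}

for name, saved, current in find_shape_mismatches(ckpt, model):
    print(f"{name}: checkpoint {saved} vs model {current}")
```

Parameters that exist on only one side are deliberately ignored here; PyTorch reports those separately as missing or unexpected keys.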

question about the experimental results

Dear author, when I evaluate the pretrained CRNN model, its accuracy on each evaluation dataset falls short of the results reported in the paper. What could be causing this? Thank you.

Unlabeled lmdb dataset

Hi

I want to create my own dataset in Persian.
I used create_lmdb_dataset.py for the labeled data and it worked.
But for unlabeled data, which has no labels at all, what should gtfile.txt contain?
Or should I use a different script?

Thanks in advance.

Unable to open data files

Hi!

I'm having trouble opening the data files in this repository. When I try to open them using Microsoft Access, I receive an error message that says "unrecognized database format" (see image attached). How can I solve the problem? Is it possible to download the data in any other format?

Thank you for your help :)

[screenshot: CaptureMaria]
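Scene text recognition data in this line of work is usually stored as LMDB databases (a `data.mdb` and `lock.mdb` per folder). LMDB shares the `.mdb` extension with Microsoft Access but is an unrelated format, which would explain the "unrecognized database format" error. The sketch below is not code from this repository: it assumes the key layout written by `create_lmdb_dataset.py` in deep-text-recognition-benchmark (`num-samples`, `image-%09d`, `label-%09d`, with indices starting at 1) and the third-party `lmdb` Python package. The helper names are hypothetical.

```python
def sample_keys(index: int) -> tuple:
    """Return the (image_key, label_key) pair for a 1-based sample index,
    assuming the 'image-%09d' / 'label-%09d' layout of create_lmdb_dataset.py."""
    if index < 1:
        raise ValueError("LMDB sample indices are assumed to start at 1")
    return (b"image-%09d" % index, b"label-%09d" % index)


def read_labels(lmdb_dir: str, limit: int = 5) -> list:
    """Return up to `limit` labels from an LMDB folder.
    Requires the third-party `lmdb` package (pip install lmdb)."""
    import lmdb  # imported lazily so sample_keys works without the package

    env = lmdb.open(lmdb_dir, readonly=True, lock=False)
    try:
        with env.begin(write=False) as txn:
            n_samples = int(txn.get(b"num-samples"))
            labels = []
            for i in range(1, min(n_samples, limit) + 1):
                _, label_key = sample_keys(i)
                # Assumes every sample has a label key; unlabeled sets may not.
                labels.append(txn.get(label_key).decode("utf-8"))
            return labels
    finally:
        env.close()
```

The values stored under the image keys are encoded image bytes, which an image library such as Pillow can decode.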

About experiment with pseudo label

Thank you for your excellent research. I have some questions about your experiments (focusing on semi-supervised learning):

  1. When you retrain your model with pseudo labels, do you use all of them or only the samples with high confidence scores?
  2. Was the model used to generate the pseudo labels kept fixed?
  3. You tried semi-supervised learning on synthetic data with unlabeled real data, and it failed. Can you share some details about this experiment (setting, hyperparameters, protocol)?
  4. Finally, I would be very grateful for any comments on why the experiment in question 3 failed.

I would really appreciate a response from you; it would help my project a lot.
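Question 1 contrasts keeping every pseudo label with keeping only confident ones. For reference, the sketch below shows the second option in generic form; it is not necessarily what the paper did. It scores a predicted string by the product of its per-character maximum probabilities (the confidence that deep-text-recognition-benchmark's `demo.py` prints, computed there with `cumprod`). The function names and the 0.9 threshold are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def sequence_confidence(char_probs: Sequence[float]) -> float:
    """Confidence of a predicted string: product of the max softmax
    probability at each decoded character position.
    An empty prediction gets 0.0 so it is always filtered out."""
    return math.prod(char_probs) if char_probs else 0.0


def filter_pseudo_labels(
    predictions: Sequence[Tuple[str, str, Sequence[float]]],
    threshold: float = 0.9,  # arbitrary example value, not from the paper
) -> List[Tuple[str, str]]:
    """Keep (image_id, text) pairs whose sequence confidence reaches `threshold`."""
    kept = []
    for image_id, text, char_probs in predictions:
        if sequence_confidence(char_probs) >= threshold:
            kept.append((image_id, text))
    return kept


preds = [
    ("img1", "hello", [0.99, 0.98, 0.99, 0.97, 0.99]),  # product ~ 0.92
    ("img2", "wor1d", [0.9, 0.6, 0.95, 0.5, 0.9]),      # product ~ 0.23
]
print(filter_pseudo_labels(preds))
```

Because the score is a product, longer words tend to get lower confidence, so a fixed threshold favors short words; that trade-off is worth keeping in mind when choosing it.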

About image input size

Hi, thanks for your great code and dataset!

My question is about the image input size used in training. I see that the default input size is 32x100.
I am new to OCR. Is this a standard input size for OCR models, or did you choose it for other reasons?

I ask because this input size seems small for some complex datasets such as COCO-Text.
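For context, 32x100 is height x width (`imgH` x `imgW`). In deep-text-recognition-benchmark, images are either stretched to that size or, with the `--PAD` option, resized to the target height with their aspect ratio kept, capped at the target width, and padded. Assuming this repository keeps that option, the sketch below reproduces the keep-ratio width rule as a standalone function; the function name is hypothetical.

```python
import math
from typing import Tuple


def keep_ratio_size(h: int, w: int, img_h: int = 32, img_w: int = 100) -> Tuple[int, int]:
    """Target (height, width) when resizing an h x w image to height img_h
    while keeping its aspect ratio; the width is capped at img_w and the
    remaining columns are filled with padding."""
    ratio = w / float(h)
    resized_w = min(math.ceil(img_h * ratio), img_w)
    return img_h, resized_w


print(keep_ratio_size(64, 128))  # a 2:1 crop keeps its shape
print(keep_ratio_size(32, 400))  # a very wide crop is squeezed to the cap
```

The cap is why long text lines get compressed at 32x100; raising the target width (deep-text-recognition-benchmark exposes `--imgH` and `--imgW`) trades memory for less squeezing.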

instance discrimination task

You mention in your paper that "we use an instance discrimination task as a pretext task." I am really confused about which task should be used. As I understand it, you feed text images to the encoder, train MoCo with only the ResNet, then freeze the ResNet and train the BiLSTM and attention modules. Besides that, I am confused by the idea of training the full TRBA with MoCo, and I do not understand the next step of the two-stage MoCo training method. Feel free to correct me if I have misunderstood.
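For readers unfamiliar with the term: in MoCo-style instance discrimination, every image is treated as its own class. Two augmented views of the same image form a positive pair, and features of other images (MoCo keeps them in a queue of keys) act as negatives. The training signal is the InfoNCE loss, a softmax cross-entropy over scaled similarities. The pure-Python sketch below computes it for a single query; it is a generic illustration rather than this repository's code, and the temperature 0.07 is a commonly used value assumed here.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query: Sequence[float], positive_key: Sequence[float],
             negative_keys: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """InfoNCE loss for one query: cross-entropy with the positive key as the
    target among [positive] + negatives, using cosine similarity / temperature."""
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# A query that matches its positive gets a lower loss than one that matches a negative.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
print(info_nce([0.0, 1.0], [1.0, 0.0], [[0.0, 1.0]]))
```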

Network architecture

Hi,

Any idea how to plot or draw the network architecture?
I have used self-supervised learning with TRBA and RotNet.
Thanks.

Result in evaluation

I am training models for Vietnamese. I modified the character set to match Vietnamese and trained with the following command:
!CUDA_VISIBLE_DEVICES=0 python train.py \
                --select_data / \
                --model_name TRBA \
                --exp_name CRNN_aug \
                --Aug Blur5-Crop99 \
                --train_data train \
                --valid_data val \
                --character 'aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789' \
                --batch_ratio 1

After 2,000 iterations the model runs evaluation, and I wonder why it predicts [UNK], as shown below. Is my training correct?

[screenshot of the evaluation output]
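One thing worth ruling out when a model predicts [UNK] is whether the training labels use the same Unicode form as the `--character` string. Vietnamese text is often stored decomposed (NFD), where a letter such as "ầ" becomes "a" plus two combining marks, so those code points never match the precomposed letters in the character list. Uppercase letters are another possible source of unknown characters. The stdlib-only sketch below (hypothetical helper name) lists label characters missing from a character set, optionally after normalizing both to a given form; it is a diagnostic idea, not a confirmed explanation of this issue.

```python
import unicodedata
from typing import Iterable, Set


def unknown_characters(labels: Iterable[str], charset: str, normalize: str = "") -> Set[str]:
    """Return characters that occur in `labels` but not in `charset`.
    If `normalize` names a Unicode form such as "NFC", both the labels and
    the charset are normalized to that form before comparing."""
    if normalize:
        charset = unicodedata.normalize(normalize, charset)
    allowed = set(charset)
    missing = set()
    for label in labels:
        if normalize:
            label = unicodedata.normalize(normalize, label)
        missing.update(ch for ch in label if ch not in allowed)
    return missing


charset = "a\u1ea7b"             # "a", precomposed "ầ", "b"
label = "ba\u0302\u0300"         # "bầ" stored in decomposed (NFD) form
print(unknown_characters([label], charset))                   # the two combining marks
print(unknown_characters([label], charset, normalize="NFC"))  # empty set
```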

ablation experiment question

Thanks for your excellent work; you ran many benchmark and ablation experiments, which is very helpful. I have a question: did you train the semi-supervised and self-supervised models for the same number of steps (200k)? I noticed that training MoCo or pseudo-labeling for more steps may give better results, so I was wondering whether you compared the results at 200k and 300k steps.

Predict after training on custom dataset

Hi, I trained on a custom Vietnamese dataset with this command:

!CUDA_VISIBLE_DEVICES=0 python train.py \
                --select_data / \
                --model_name TRBA \
                --exp_name CRNN_aug \
                --Aug Blur5-Crop99  \
                --train_data train \
                --valid_data val \
                --character 'aàảãáạăằẳẵắặâầẩẫấậbcdđeèẻẽéẹêềểễếệfghiìỉĩíịjklmnoòỏõóọôồổỗốộơờởỡớợpqrstuùủũúụưừửữứựvwxyỳỷỹýỵz0123456789' \
                --batch_ratio 1 

After 50k iterations, I ran prediction with best_score.pth using this command:

!python demo.py \
        --model_name TRBA \
        --image_folder /content/drive/MyDrive/UIT_CHALLENGE_2022/result/crop_img \
        --saved_model /content/drive/MyDrive/UIT_CHALLENGE_2022/src/STR-Fewer-Labels/best_score_new_reg.pth

The error:

# of tokens and characters: 99

model input parameters 32 100 20 3 512 256 99 25 TPS ResNet BiLSTM Attn
loading pretrained model from /content/drive/MyDrive/UIT_CHALLENGE_2022/src/STR-Fewer-Labels/best_score_new_reg.pth
Traceback (most recent call last):
  File "demo.py", line 194, in <module>
    demo(opt)
  File "demo.py", line 47, in demo
    model.load_state_dict(torch.load(opt.saved_model, map_location=device))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DataParallel:
    size mismatch for module.Prediction.generator.weight: copying a param with shape torch.Size([108, 256]) from checkpoint, the shape in current model is torch.Size([99, 256]).
    size mismatch for module.Prediction.generator.bias: copying a param with shape torch.Size([108]) from checkpoint, the shape in current model is torch.Size([99]).
    size mismatch for module.Prediction.char_embeddings.weight: copying a param with shape torch.Size([108, 256]) from checkpoint, the shape in current model is torch.Size([99, 256]).

Please help, thanks!
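The log above shows the demo building prediction layers with 99 outputs while the checkpoint has 108, and the `demo.py` command, unlike the training command, passes no `--character` option. In deep-text-recognition-benchmark, the attention label converter prepends the special tokens `[GO]` and `[s]` to the character list, and the prediction layer gets one output per entry, so any difference in the character set between training and inference changes these shapes. The sketch below illustrates that arithmetic with hypothetical helper names; the default token list is an assumption, and this repository may add further special tokens (its log counts "tokens and characters" together).

```python
from typing import List, Sequence

# Assumed special tokens (deep-text-recognition-benchmark's attention converter);
# this repository may use more.
DEFAULT_SPECIAL_TOKENS = ("[GO]", "[s]")


def attn_vocabulary(character: str,
                    special_tokens: Sequence[str] = DEFAULT_SPECIAL_TOKENS) -> List[str]:
    """Output vocabulary of an attention decoder: special tokens first,
    then each code point of the --character string."""
    return list(special_tokens) + list(character)


def num_class(character: str,
              special_tokens: Sequence[str] = DEFAULT_SPECIAL_TOKENS) -> int:
    """Output size of the prediction (generator) layer for this vocabulary."""
    return len(attn_vocabulary(character, special_tokens))


print(num_class("0123456789"))  # 10 digits + 2 special tokens
```

Under these assumptions, passing `demo.py` the same `--character` string used for `train.py` should make the shapes agree, provided the other model options also match.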

Train from checkpoint

Hi,
I train the model on Colab, and after a while the session disconnects. I can train again a few hours later, but then I have to start over from the author's checkpoint. How can I continue training from the checkpoint (weights) saved during my last run?

Thanks!

Providing a processed dataset

This is impressive research.
Do you plan to release the processed datasets used for training and validation, such as Real-L and Real-U?
