
mjq11302010044 / tatt


A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022)

License: MIT License

Python 99.92% Shell 0.08%

tatt's Introduction

Jianqi Ma

tatt's Issues

Where is RPE used?

I've read the supplement describing the details of the recurrent positional encoding (RPE).

However, I cannot find the code in which RPE is implemented and used.

Would the authors kindly point out where the implementation of RPE is in the released codebase?

IndexError: too many indices for tensor of dimension 4

Thanks for sharing! When I test with vis = true, the following error comes up: "too many indices for tensor of dimension 4". I checked the size of the input, and it is (16, 1, 32, 100). What should I do next?

A request for help with the training process, log file, and code arguments

Thanks for your work. I thoroughly enjoyed reading the paper, but I am confused about the training process. I notice that you provide two training commands, and the second seems to fine-tune the result of the first. Is this two-stage procedure necessary for the whole training? Could you give me some specifics about this?
In addition, after training I only found weight files with the '.th' suffix under the folder ckpt/TATT or ckpt/TATT_ft; there seems to be no log file. Where can I find it?
Regarding the parser.add_argument() calls in your code: I noticed that the help parameter is left blank for several arguments. Could you please provide a brief description for each of them? This would help new users understand the purpose of each argument and how to use it correctly.
I understand that you may be busy, but if you could spare a moment to update the help messages, it would be much appreciated.
I'm just a beginner, so please forgive my ignorance. Again, thank you so much!
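
To illustrate what this request is asking for, here is a minimal sketch of parser.add_argument() calls with the help text filled in. The argument names and defaults (--batch_size, --resume, --test) are hypothetical and are not taken from the TATT code; only the argparse calls themselves are standard.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments for illustration only; not the TATT argument list.
    parser = argparse.ArgumentParser(description="Example training script")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="number of images per training batch")
    parser.add_argument("--resume", type=str, default="",
                        help="path to a checkpoint to resume from (empty: start fresh)")
    parser.add_argument("--test", action="store_true",
                        help="run evaluation only instead of training")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--batch_size", "32", "--test"])
print(args.batch_size, args.resume, args.test)
```

With help strings in place, running such a script with -h prints one descriptive line per argument.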

Pretrained model

The paper was accepted at CVPR 2022 a long time ago, and there is still no pretrained model for TextZoom.
In closed issue #1 the authors said they would release a pretrained model in a later version, but no checkpoint has been uploaded for measuring performance.

images_sr = model(images_hr)

At line 1734 in interfaces/super_resolution.py,

images_sr = model(images_hr)

It seems that HR images are fed into the model. Is this the intended behavior?

Location of the code for your text prior architecture

Could you point out where the text prior architecture is located in your code?
I would really like to know how your architecture uses the output of CRNN as the text prior.
It is hard to find.

Another question: does your code not use the text prior during testing?

How to set up training on other data sets?

Thanks for sharing! If I want to train the TATT model proposed in this paper on non-LMDB datasets (such as datasets stored as ordinary image files), where should I modify the code?
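
As a rough illustration of the kind of change this question is about, the sketch below indexes paired HR/LR image files and their labels from a plain folder. The class name FolderPairDataset, the folder layout, and the labels.txt format are all assumptions made for this example; how the TATT data loader would need to be adapted is not described here.

```python
import os
import tempfile


class FolderPairDataset:
    """Index-based dataset over paired HR/LR image files plus text labels.

    Assumed (hypothetical) layout:
        root/HR/<name>.png, root/LR/<name>.png
        root/labels.txt with one "<name> <label>" pair per line
    """

    def __init__(self, root: str):
        self.samples = []
        with open(os.path.join(root, "labels.txt"), encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                name, label = line.split(maxsplit=1)
                hr = os.path.join(root, "HR", name + ".png")
                lr = os.path.join(root, "LR", name + ".png")
                self.samples.append((hr, lr, label))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        # Returns file paths; decoding and resizing the images is left to the caller.
        return self.samples[index]


# Small self-check using empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for sub in ("HR", "LR"):
        os.makedirs(os.path.join(root, sub))
        open(os.path.join(root, sub, "0001.png"), "wb").close()
    with open(os.path.join(root, "labels.txt"), "w", encoding="utf-8") as f:
        f.write("0001 hello\n")
    dataset = FolderPairDataset(root)
    hr_path, lr_path, label = dataset[0]
    print(len(dataset), label)
```

A class with __len__ and __getitem__ like this follows the usual map-style dataset interface, so image decoding and tensor conversion could be added inside __getitem__.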

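Question about the accuracy metric in Table 5

For the accuracy metric in Table 5, the code only counts lowercase letters and digits. Are uppercase letters and punctuation not considered? Is this the usual evaluation convention for TextZoom?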
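
The convention the question describes can be sketched as follows: both prediction and label are lowercased, every character other than a lowercase letter or digit is dropped, and a word counts as correct only on an exact match. The function names normalize and word_accuracy are hypothetical; this is an illustration of the described behavior, not the repository's evaluation code.

```python
import string

# Assumed alphabet: lowercase letters and digits only.
ALLOWED = set(string.ascii_lowercase + string.digits)


def normalize(text: str) -> str:
    """Lowercase the text and drop every character outside ALLOWED."""
    return "".join(ch for ch in text.lower() if ch in ALLOWED)


def word_accuracy(predictions, labels) -> float:
    """Fraction of pairs whose normalized strings match exactly."""
    if not labels:
        return 0.0
    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return hits / len(labels)


preds = ["Hello!", "w0rld", "CVPR-2022"]
truth = ["hello", "world", "cvpr2022"]
accuracy = word_accuracy(preds, truth)
print(accuracy)
```

Under this normalization, case and punctuation differences such as "Hello!" versus "hello" do not count as errors, while a wrong character such as "w0rld" versus "world" does.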

Are the OCR evaluation model (ASTER/CRNN) and TATT trained end-to-end?

Are the OCR evaluation model (ASTER/CRNN) and TATT trained end-to-end? Or is the SR model run first, with its output then fed into the OCR model?
For example, in the code below:
    def __getitem__(self, index):
        ...
        ...
        label_str = str_filt(word, self.voc_type)
        return img_HR, img_lr, img_HRy, img_lry, label_str

Does "label_str" participate in the training of the whole model?

"No such file or directory: 'ckpt/TATT/model_best_acc_0.pth'" during training: how can I solve it?

Thanks for your work. I ran into a problem during training. How can I solve it?
First, I got this error:

No such file or directory: 'ckpt/TATT/

Then I created a directory named 'TATT' in 'ckpt', but I got another error:

No such file or directory: 'ckpt/TATT/model_best_acc_0.pth'

I'm working on my graduation project and need to reproduce your results, so this is important to me. Thanks for your work, and I look forward to your reply!
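
As a general pattern for the two errors above, a script can create the checkpoint directory before writing to it and check whether a checkpoint file exists before trying to load it. The helper names prepare_checkpoint_dir and find_checkpoint are hypothetical, and whether the TATT training code expects this file to exist already is not stated in the issue; the sketch only shows the standard-library calls involved.

```python
import os
import tempfile


def prepare_checkpoint_dir(ckpt_dir: str) -> str:
    """Create the checkpoint directory (and parents) if it is missing."""
    os.makedirs(ckpt_dir, exist_ok=True)
    return ckpt_dir


def find_checkpoint(ckpt_dir: str, filename: str):
    """Return the checkpoint path if the file exists, otherwise None."""
    path = os.path.join(ckpt_dir, filename)
    return path if os.path.isfile(path) else None


# Demonstrated in a temporary directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as root:
    ckpt_dir = prepare_checkpoint_dir(os.path.join(root, "ckpt", "TATT"))
    dir_was_created = os.path.isdir(ckpt_dir)
    resume_path = find_checkpoint(ckpt_dir, "model_best_acc_0.pth")
    if resume_path is None:
        print("no checkpoint found; training would start from scratch")
```

Guarding the load this way turns a missing checkpoint into an explicit decision (start fresh or stop with a clear message) instead of a file-not-found error.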
