mjq11302010044 / tatt
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022)
License: MIT License
How can I create my own LMDB datasets, like TextZoom, to train the model? How should the dataset files be used? If anyone knows, please help.
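For readers with the same question, here is a minimal sketch of packing HR/LR image pairs and labels into an LMDB database. The key layout (`num-samples`, `image_hr-%09d`, `image_lr-%09d`, `label-%09d`, with 1-based indices) is an assumption based on the TextZoom convention and should be checked against the keys read in dataset/dataset.py. The helper names `build_entries` and `write_lmdb` are hypothetical, and `lmdb` is a third-party package.

```python
def build_entries(samples):
    """Map (hr_bytes, lr_bytes, label) triples to LMDB key/value pairs.

    The key names are assumed, not confirmed by the repository.
    Image values are the encoded file bytes (e.g. the contents of a PNG).
    """
    entries = {}
    count = 0
    for count, (hr_bytes, lr_bytes, label) in enumerate(samples, start=1):
        entries[b"image_hr-%09d" % count] = hr_bytes
        entries[b"image_lr-%09d" % count] = lr_bytes
        entries[b"label-%09d" % count] = label.encode("utf-8")
    # Total number of samples, stored as an ASCII integer.
    entries[b"num-samples"] = str(count).encode("utf-8")
    return entries


def write_lmdb(out_dir, entries, map_size=1 << 30):
    """Write the entries to an LMDB environment at out_dir."""
    import lmdb  # third-party: pip install lmdb

    env = lmdb.open(str(out_dir), map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in entries.items():
            txn.put(key, value)
    env.close()
```

In practice each triple would come from reading an HR file and its LR counterpart in binary mode (`open(path, "rb").read()`); `map_size` must be large enough to hold the whole dataset.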
I've read the supplement describing the details of the recurrent positional encoding (RPE).
However, I cannot seem to find the code where RPE is implemented and used.
Would the authors kindly point out where the implementation of RPE is in the released codebase?
I have noticed that during training there is a printed message, "save display images", but I can't find where the display images are saved. Please help me, thank you very much!
@mjq11302010044 'RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED' occurred after several training epochs. I do not know why this happened.
Would it be possible to make a pre-trained model available?
In which directory should I put the TextZoom dataset?
Thanks for sharing! When I test with vis = true in the code, this error comes up: too many indices for tensor of dimension 4. I checked the size of the input; it is (16, 1, 32, 100). What should I do next?
Thanks for your work. I thoroughly enjoyed reading the paper, but I am confused about the training process. I notice that you provide two training commands, and the second seems to fine-tune the result of the first. Is that a necessary procedure for the whole training? Could you give me some specifics about this?
In addition, after training, I only found the weight files with the '.th' suffix under the folder ckpt/TATT or ckpt/TATT_ft; there seems to be no log file. Where can I find it?
About the parser.add_argument() calls in your code: I noticed that the help parameter is left blank for several arguments in the parser. Could you please provide a brief description for these arguments? This would be greatly helpful for users who are new to your program, as it would give them a better understanding of the purpose of each argument and how to use it correctly.
I understand that you may be busy, but if you could spare a moment to update the help messages, it would be much appreciated.
I'm just a beginner, so forgive my ignorance. Again, thank you so much!
The paper was accepted to CVPR 2022 a long time ago, and there is still no pretrained model for TextZoom.
The authors mentioned this problem in closed issue #1 and said they would release a pretrained model in a later version, but they still have not uploaded a checkpoint for measuring the model's performance.
At line 1734 in interfaces/super_resolution.py:
    images_sr = model(images_hr)
It seems that HR images are fed into the model. Is this the intended behavior?
Could you please point out where the text prior architecture is located in your code?
I really want to know how your architecture uses the output of CRNN as the text prior.
It is hard to find.
Another question: it seems that your code doesn't use the text prior in testing. Is that right?
Thanks for sharing! If I want to use the TATT model proposed in this paper to train on non-LMDB dataset files (such as datasets stored in ordinary image formats), where should I modify the code?
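Regarding the accuracy metric in Table 5: the code only counts lowercase letters and digits. Are uppercase letters and punctuation not considered? Is this the usual way of computing accuracy on TextZoom?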
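To make the convention being asked about concrete, here is a minimal sketch of a case-insensitive, alphanumeric-only word accuracy. It illustrates the behavior described in the question and is not the repository's evaluation code; the names `normalize` and `word_accuracy` are hypothetical.

```python
import string

# Characters that count toward the comparison: digits and lowercase letters.
ALNUM = string.digits + string.ascii_lowercase


def normalize(text):
    """Lowercase the text and drop everything except digits and a-z."""
    return "".join(ch for ch in text.lower() if ch in ALNUM)


def word_accuracy(predictions, labels):
    """Fraction of predictions that match their label after normalization."""
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return correct / len(labels)
```

Under this scheme "Hello!" and "hello" count as a match, while "w0rld" and "world" do not, so uppercase letters and punctuation never affect the score.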
At lines 1918-1921 in dataset/dataset.py:
    if len(word) > 4:
        word = [ch for ch in word]
        word[2] = "e"
        word = "".join(word)
the letter "e" seems to be written into the GT labels, overwriting the third character. What is the intention behind this process?
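For reference, a small standalone reproduction of what the quoted lines do to a label. The function name `overwrite_third_char` is hypothetical; the body mirrors the snippet quoted in the issue.

```python
def overwrite_third_char(word):
    """Replace the third character with "e" for words longer than four characters."""
    if len(word) > 4:
        chars = list(word)
        chars[2] = "e"
        word = "".join(chars)
    return word


print(overwrite_third_char("hello"))   # heelo
print(overwrite_third_char("street"))  # steeet
print(overwrite_third_char("text"))    # text (length 4, left unchanged)
```

The label length never changes, so the character is substituted rather than inserted, and words of four characters or fewer pass through untouched.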
Are the OCR evaluation models (ASTER/CRNN) and TATT trained end-to-end? Or is the SR model first used to produce results, which are then fed into the OCR model?
For example, in the code below:
    def __getitem__(self, index):
        ...
        ...
        label_str = str_filt(word, self.voc_type)
        return img_HR, img_lr, img_HRy, img_lry, label_str
Does "label_str" participate in the training of the whole model?
Thanks for your work. I ran into a problem during training. How can I solve it?
First, I got this error:
No such file or directory: 'ckpt/TATT/
Then I created a directory named 'TATT' in 'ckpt', but I got another error:
No such file or directory: 'ckpt/TATT/model_best_acc_0.pth'
I'm working on my graduation project and need to reproduce your results, so this is important to me. Thanks for your work, and I look forward to your reply!
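Both errors point at paths that do not exist yet. The sketch below is a generic guard, not the repository's fix: it creates the checkpoint directory if needed and reports whether a checkpoint file is actually present, so a caller can decide to train from scratch instead of loading. The helper name `prepare_checkpoint` is hypothetical, and which option in the training script triggers loading of 'model_best_acc_0.pth' is not stated in the issue.

```python
import tempfile
from pathlib import Path


def prepare_checkpoint(ckpt_dir, filename):
    """Create ckpt_dir if missing; return the checkpoint path, or None if absent."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / filename
    return path if path.is_file() else None


# Demo in a throwaway directory: the folder is created, no checkpoint exists yet.
with tempfile.TemporaryDirectory() as root:
    found = prepare_checkpoint(Path(root) / "ckpt" / "TATT", "model_best_acc_0.pth")
    print(found)  # None
```

A `None` result means there is nothing to resume from, which is expected on a first training run.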
Great job! Thank you for sharing the code.
Do you have any plans for Chinese text enhancement?