Comments (3)
I checked the provided 4M dataset JSON files and found that only 4441 tags exist, fewer than the 4585 tags claimed in the paper. Are the other 144 tags perhaps in the 12M dataset? Will the missing labels hurt training on the 4M dataset? As mentioned above, I relabeled the JSON file, and the model does not perform well.
I would be truly grateful if the author could offer any insights or responses to the questions I've presented.
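For anyone who wants to reproduce the tag count, here is a minimal sketch of how the distinct tags could be tallied. It assumes each annotation file is a JSON list of records and that every record stores a flat list of integer tag ids under a key; the key name `union_label_id` and the helper names are assumptions for illustration, so adjust them to the actual file layout.

```python
import json


def collect_tag_ids(records, key="union_label_id"):
    """Return the set of distinct tag ids found under `key` in the records.

    `key` is an assumed field name; records without it are skipped.
    """
    seen = set()
    for record in records:
        seen.update(record.get(key, []))
    return seen


def missing_tag_ids(json_paths, num_tags=4585, key="union_label_id"):
    """List the tag ids in range(num_tags) that never occur in the given files."""
    seen = set()
    for path in json_paths:
        with open(path, encoding="utf-8") as f:
            seen |= collect_tag_ids(json.load(f), key)
    return sorted(set(range(num_tags)) - seen)


if __name__ == "__main__":
    # Tiny in-memory example with made-up records, not real dataset entries.
    demo = [{"union_label_id": [0, 2]}, {"union_label_id": [2, 3]}]
    print(len(collect_tag_ids(demo)))
```

Running `missing_tag_ids` over the 4M files would show which ids are absent, which makes it easier to check whether they only appear in the larger dataset.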
from recognize-anything.
I also have two more questions. In Appendix A of the paper, the batch size is 720. However, batch_size in the "pretrain" config file is set to 52. According to the paper, the model is trained on 8 A100 GPUs, so shouldn't the batch size be 52 × 8 = 416? Also, the loss is computed as the sum (loss_tag + loss_align + loss_diss). However, I observe that the magnitudes of the three losses are in the hundreds, the tens, and below one, respectively. Will simply adding losses of such different magnitudes, without any balancing weights, affect training? I look forward to your reply.
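To make the two questions concrete, here is a small sketch of the arithmetic involved. The loss values are made-up numbers chosen only to match the magnitudes described above, and the weights and helper names are hypothetical; nothing here is taken from the repository's training code.

```python
def global_batch_size(per_gpu_batch, num_gpus, grad_accum_steps=1):
    """Effective batch size for data-parallel training."""
    return per_gpu_batch * num_gpus * grad_accum_steps


def weighted_total(losses, weights=None):
    """Sum the loss terms, optionally scaling each by a per-term weight."""
    if weights is None:
        weights = {name: 1.0 for name in losses}  # plain sum
    return sum(weights[name] * value for name, value in losses.items())


# Illustrative magnitudes only: hundreds, tens, and below one.
losses = {"loss_tag": 300.0, "loss_align": 30.0, "loss_diss": 0.5}

print(global_batch_size(52, 8))   # 52 per GPU on 8 GPUs
print(weighted_total(losses))     # unweighted sum, dominated by loss_tag

# Hypothetical weights that bring the terms to a similar scale.
weights = {"loss_tag": 1.0, "loss_align": 10.0, "loss_diss": 100.0}
print(weighted_total(losses, weights))
```

With unit weights, the largest term makes up most of the summed value. Whether that also skews the gradients depends on how each term scales with the parameters, so any weights would need to be tuned empirically.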
Did you load the ImageNet-pretrained image encoder weights? This was fixed in a subsequent PR.