Comments (7)
Hi, sorry for the delayed reply; I have been busy.
Yes, Tag-Text-Generation is unnecessary if you only have an image-tag dataset.
By the way, I recently found an omission in the released code: if you train from scratch, don't forget to load the pretrained Swin backbone.
if stage == 'train_from_scratch':
    # Download the pretrained Swin checkpoint from https://github.com/microsoft/Swin-Transformer
    state_dict = torch.load(vision_config['ckpt'], map_location="cpu")['model']
    for k in list(state_dict.keys()):
        if 'relative_position_bias_table' in k:
            # Interpolate the relative position bias table to the target window size
            dst_num_pos = (2 * vision_config['window_size'] - 1) ** 2
            state_dict[k] = interpolate_relative_pos_embed(state_dict[k], dst_num_pos, param_name=k)
        elif ('relative_position_index' in k) or ('attn_mask' in k):
            # These buffers are recomputed when the model is built, so drop them
            del state_dict[k]
    print("### Load Vision Backbone: ", vit)
    msg = self.visual_encoder.load_state_dict(state_dict, strict=False)
    print("missing_keys: ", msg.missing_keys)
    print("unexpected_keys: ", msg.unexpected_keys)
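For reference, resizing a relative-position bias table is typically done by reshaping its (2w-1)^2 rows into a 2-D grid, interpolating bicubically, and flattening back. A minimal sketch of that idea (the repository's interpolate_relative_pos_embed may differ in detail; the function name and shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias(table: torch.Tensor, dst_num_pos: int) -> torch.Tensor:
    """Resize a relative-position bias table of shape (src_num_pos, num_heads)
    to (dst_num_pos, num_heads) via bicubic interpolation on the 2-D grid."""
    src_num_pos, num_heads = table.shape
    src_size = int(src_num_pos ** 0.5)   # grid side = 2 * src_window_size - 1
    dst_size = int(dst_num_pos ** 0.5)   # grid side = 2 * dst_window_size - 1
    # (src_num_pos, heads) -> (1, heads, src_size, src_size)
    grid = table.t().reshape(1, num_heads, src_size, src_size)
    grid = F.interpolate(grid, size=(dst_size, dst_size),
                         mode='bicubic', align_corners=False)
    # back to (dst_num_pos, heads)
    return grid.reshape(num_heads, dst_num_pos).t()

# Example: a table for window size 7 resized to window size 12.
table = torch.randn((2 * 7 - 1) ** 2, 4)          # (169, 4)
resized = resize_rel_pos_bias(table, (2 * 12 - 1) ** 2)
```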
from recognize-anything.
Thanks for the reply.
@xinyu1205
With batch_size=32 on 8×A100, loss_tag is very large:
```
Train Epoch: [0] [ 50/14844] eta: 5:44:45 lr: 0.000017 loss_tag: 437.2372 loss_dis: 0.4514 time: 1.1722 data: 0.1298 max mem: 24075
Train Epoch: [0] [ 100/14844] eta: 5:16:42 lr: 0.000034 loss_tag: 492.6901 loss_dis: 0.4581 time: 1.1805 data: 0.1312 max mem: 24075
Train Epoch: [0] [ 150/14844] eta: 5:06:15 lr: 0.000050 loss_tag: 458.0364 loss_dis: 0.4138 time: 1.1658 data: 0.1228 max mem: 24075
Train Epoch: [0] [ 200/14844] eta: 4:59:15 lr: 0.000067 loss_tag: 398.1259 loss_dis: 0.3213 time: 1.1435 data: 0.1240 max mem: 24075
Train Epoch: [0] [ 250/14844] eta: 4:53:46 lr: 0.000083 loss_tag: 440.8609 loss_dis: 0.3175 time: 1.1279 data: 0.1260 max mem: 24075
Train Epoch: [0] [ 300/14844] eta: 4:49:10 lr: 0.000100 loss_tag: 388.8815 loss_dis: 0.2827 time: 1.1128 data: 0.1257 max mem: 24075
Train Epoch: [0] [ 350/14844] eta: 4:45:09 lr: 0.000100 loss_tag: 360.2780 loss_dis: 0.2842 time: 1.0958 data: 0.1268 max mem: 24075
Train Epoch: [0] [ 400/14844] eta: 4:41:33 lr: 0.000100 loss_tag: 429.6051 loss_dis: 0.2782 time: 1.0945 data: 0.1249 max mem: 24075
Train Epoch: [0] [ 450/14844] eta: 4:38:28 lr: 0.000100 loss_tag: 360.5910 loss_dis: 0.2774 time: 1.0925 data: 0.1237 max mem: 24075
Train Epoch: [0] [ 500/14844] eta: 4:36:15 lr: 0.000100 loss_tag: 388.7999 loss_dis: 0.2806 time: 1.1121 data: 0.1259 max mem: 24075
Train Epoch: [0] [ 550/14844] eta: 4:34:38 lr: 0.000100 loss_tag: 336.6460 loss_dis: 0.2823 time: 1.1294 data: 0.1218 max mem: 24075
Train Epoch: [0] [ 600/14844] eta: 4:33:22 lr: 0.000100 loss_tag: 335.7622 loss_dis: 0.2787 time: 1.1403 data: 0.1255 max mem: 24075
Train Epoch: [0] [ 650/14844] eta: 4:32:21 lr: 0.000100 loss_tag: 497.8206 loss_dis: 0.2815 time: 1.1540 data: 0.1272 max mem: 24075
Train Epoch: [0] [ 700/14844] eta: 4:31:29 lr: 0.000100 loss_tag: 1359.2052 loss_dis: 0.4506 time: 1.1637 data: 0.1278 max mem: 24075
Train Epoch: [0] [ 750/14844] eta: 4:30:50 lr: 0.000100 loss_tag: 496.2915 loss_dis: 0.4714 time: 1.1740 data: 0.1257 max mem: 24075
Train Epoch: [0] [ 800/14844] eta: 4:30:14 lr: 0.000100 loss_tag: 468.4954 loss_dis: 0.4744 time: 1.1810 data: 0.1311 max mem: 24075
Train Epoch: [0] [ 850/14844] eta: 4:29:37 lr: 0.000100 loss_tag: 435.1667 loss_dis: 0.4665 time: 1.1759 data: 0.1290 max mem: 24075
Train Epoch: [0] [ 900/14844] eta: 4:28:58 lr: 0.000100 loss_tag: 460.7809 loss_dis: 0.4758 time: 1.1790 data: 0.1286 max mem: 24075
```
This is normal, since the tagging loss uses a sum() reduction:
https://github.com/xinyu1205/recognize-anything/blob/main/ram/models/utils.py#L326
The difference in loss scales is balanced with detach():
https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121
Moreover, since RAM converges quickly, you can evaluate performance after training for fewer epochs.
https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121
@xinyu1205
I only have loss_tag and loss_dis. Do I need to balance these two losses?
If you want to highlight the open-set tagging ability, I think balancing them is better.
@xinyu1205
Could you please provide the training log (loss_t2t, loss_tag and loss_dis)?