Comments (7)
Hi, sorry for the delayed reply; I have been busy.
Yes, Tag-Text-Generation is unnecessary if you only have an image-tag dataset.
By the way, I recently found an omission in the released code: if you train from scratch, don't forget to load the pretrained Swin backbone.
if stage == 'train_from_scratch':
    # Download the pretrained Swin checkpoint from https://github.com/microsoft/Swin-Transformer
    state_dict = torch.load(vision_config['ckpt'], map_location="cpu")['model']
    for k in list(state_dict.keys()):
        if 'relative_position_bias_table' in k:
            # Interpolate the relative position bias table to the target window size
            dst_num_pos = (2 * vision_config['window_size'] - 1) ** 2
            state_dict[k] = interpolate_relative_pos_embed(state_dict[k], dst_num_pos, param_name=k)
        elif ('relative_position_index' in k) or ('attn_mask' in k):
            # These buffers are recomputed when the model is built, so drop them
            del state_dict[k]
    print("### Load Vision Backbone: ", vit)
    msg = self.visual_encoder.load_state_dict(state_dict, strict=False)
    print("missing_keys: ", msg.missing_keys)
    print("unexpected_keys: ", msg.unexpected_keys)
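For reference, resizing a relative-position bias table is typically done by reshaping its (2w-1)^2 rows into a 2-D grid, interpolating bicubically, and flattening back. A minimal sketch of that idea (the repository's interpolate_relative_pos_embed may differ in detail; the function name and shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias(table: torch.Tensor, dst_num_pos: int) -> torch.Tensor:
    """Resize a relative-position bias table of shape (src_num_pos, num_heads)
    to (dst_num_pos, num_heads) via bicubic interpolation on the 2-D grid."""
    src_num_pos, num_heads = table.shape
    src_size = int(src_num_pos ** 0.5)   # grid side = 2 * src_window_size - 1
    dst_size = int(dst_num_pos ** 0.5)   # grid side = 2 * dst_window_size - 1
    # (src_num_pos, heads) -> (1, heads, src_size, src_size)
    grid = table.t().reshape(1, num_heads, src_size, src_size)
    grid = F.interpolate(grid, size=(dst_size, dst_size),
                         mode='bicubic', align_corners=False)
    # back to (dst_num_pos, heads)
    return grid.reshape(num_heads, dst_num_pos).t()

# Example: a table for window size 7 resized to window size 12.
table = torch.randn((2 * 7 - 1) ** 2, 4)          # (169, 4)
resized = resize_rel_pos_bias(table, (2 * 12 - 1) ** 2)
```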
from recognize-anything.
Thanks for the reply.
@xinyu1205
With batch_size=32 on 8×A100, loss_tag is very large:
```
Train Epoch: [0] [ 50/14844] eta: 5:44:45 lr: 0.000017 loss_tag: 437.2372 loss_dis: 0.4514 time: 1.1722 data: 0.1298 max mem: 24075
Train Epoch: [0] [ 100/14844] eta: 5:16:42 lr: 0.000034 loss_tag: 492.6901 loss_dis: 0.4581 time: 1.1805 data: 0.1312 max mem: 24075
Train Epoch: [0] [ 150/14844] eta: 5:06:15 lr: 0.000050 loss_tag: 458.0364 loss_dis: 0.4138 time: 1.1658 data: 0.1228 max mem: 24075
Train Epoch: [0] [ 200/14844] eta: 4:59:15 lr: 0.000067 loss_tag: 398.1259 loss_dis: 0.3213 time: 1.1435 data: 0.1240 max mem: 24075
Train Epoch: [0] [ 250/14844] eta: 4:53:46 lr: 0.000083 loss_tag: 440.8609 loss_dis: 0.3175 time: 1.1279 data: 0.1260 max mem: 24075
Train Epoch: [0] [ 300/14844] eta: 4:49:10 lr: 0.000100 loss_tag: 388.8815 loss_dis: 0.2827 time: 1.1128 data: 0.1257 max mem: 24075
Train Epoch: [0] [ 350/14844] eta: 4:45:09 lr: 0.000100 loss_tag: 360.2780 loss_dis: 0.2842 time: 1.0958 data: 0.1268 max mem: 24075
Train Epoch: [0] [ 400/14844] eta: 4:41:33 lr: 0.000100 loss_tag: 429.6051 loss_dis: 0.2782 time: 1.0945 data: 0.1249 max mem: 24075
Train Epoch: [0] [ 450/14844] eta: 4:38:28 lr: 0.000100 loss_tag: 360.5910 loss_dis: 0.2774 time: 1.0925 data: 0.1237 max mem: 24075
Train Epoch: [0] [ 500/14844] eta: 4:36:15 lr: 0.000100 loss_tag: 388.7999 loss_dis: 0.2806 time: 1.1121 data: 0.1259 max mem: 24075
Train Epoch: [0] [ 550/14844] eta: 4:34:38 lr: 0.000100 loss_tag: 336.6460 loss_dis: 0.2823 time: 1.1294 data: 0.1218 max mem: 24075
Train Epoch: [0] [ 600/14844] eta: 4:33:22 lr: 0.000100 loss_tag: 335.7622 loss_dis: 0.2787 time: 1.1403 data: 0.1255 max mem: 24075
Train Epoch: [0] [ 650/14844] eta: 4:32:21 lr: 0.000100 loss_tag: 497.8206 loss_dis: 0.2815 time: 1.1540 data: 0.1272 max mem: 24075
Train Epoch: [0] [ 700/14844] eta: 4:31:29 lr: 0.000100 loss_tag: 1359.2052 loss_dis: 0.4506 time: 1.1637 data: 0.1278 max mem: 24075
Train Epoch: [0] [ 750/14844] eta: 4:30:50 lr: 0.000100 loss_tag: 496.2915 loss_dis: 0.4714 time: 1.1740 data: 0.1257 max mem: 24075
Train Epoch: [0] [ 800/14844] eta: 4:30:14 lr: 0.000100 loss_tag: 468.4954 loss_dis: 0.4744 time: 1.1810 data: 0.1311 max mem: 24075
Train Epoch: [0] [ 850/14844] eta: 4:29:37 lr: 0.000100 loss_tag: 435.1667 loss_dis: 0.4665 time: 1.1759 data: 0.1290 max mem: 24075
Train Epoch: [0] [ 900/14844] eta: 4:28:58 lr: 0.000100 loss_tag: 460.7809 loss_dis: 0.4758 time: 1.1790 data: 0.1286 max mem: 24075
```
This is normal, since the tagging loss uses a sum() reduction:
https://github.com/xinyu1205/recognize-anything/blob/main/ram/models/utils.py#L326
The difference in loss scales is balanced with detach():
https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121
Moreover, since RAM converges quickly, you can evaluate performance after training for fewer epochs.
https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121
@xinyu1205
I only have loss_tag and loss_dis. Do I need to balance these two losses?
If you want to highlight the open-set tagging ability, I think balancing them is better.
@xinyu1205
Could you please provide the training log (loss_t2t, loss_tag and loss_dis)?