
Comments (7)

xinyu1205 commented on September 26, 2024

Hi, sorry for the delayed reply due to my busy schedule.
Yes, Tag-Text-Generation is unnecessary if you only have an image-tag dataset.
By the way, I recently found an omission in the released code: if you train from scratch, don't forget to load the pretrained Swin backbone.

            if stage == 'train_from_scratch':
                # download from https://github.com/microsoft/Swin-Transformer
                state_dict = torch.load(vision_config['ckpt'], map_location="cpu")['model']

                for k in list(state_dict.keys()):
                    if 'relative_position_bias_table' in k:
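                        # resize the relative position bias table to match this model's window size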
                        dst_num_pos = (2 * vision_config['window_size'] - 1) ** 2
                        state_dict[k] = interpolate_relative_pos_embed(state_dict[k], dst_num_pos, param_name=k)
                    elif ('relative_position_index' in k) or ('attn_mask' in k):
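                        # these buffers are recomputed by the model at load time, so drop them from the checkpoint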
                        del state_dict[k]
                print("### Load Vision BackboneL", vit)
                msg = self.visual_encoder.load_state_dict(state_dict, strict = False)
                print("missing_keys: ", msg.missing_keys)
                print("unexpected_keys: ", msg.unexpected_keys)


felixfuu commented on September 26, 2024

Thanks for the reply.


felixfuu commented on September 26, 2024

@xinyu1205
batch_size=32 on 8×A100 GPUs, and loss_tag is very large:

```
Train Epoch: [0] [ 50/14844] eta: 5:44:45 lr: 0.000017 loss_tag: 437.2372 loss_dis: 0.4514 time: 1.1722 data: 0.1298 max mem: 24075

Train Epoch: [0] [ 100/14844] eta: 5:16:42 lr: 0.000034 loss_tag: 492.6901 loss_dis: 0.4581 time: 1.1805 data: 0.1312 max mem: 24075

Train Epoch: [0] [ 150/14844] eta: 5:06:15 lr: 0.000050 loss_tag: 458.0364 loss_dis: 0.4138 time: 1.1658 data: 0.1228 max mem: 24075

Train Epoch: [0] [ 200/14844] eta: 4:59:15 lr: 0.000067 loss_tag: 398.1259 loss_dis: 0.3213 time: 1.1435 data: 0.1240 max mem: 24075

Train Epoch: [0] [ 250/14844] eta: 4:53:46 lr: 0.000083 loss_tag: 440.8609 loss_dis: 0.3175 time: 1.1279 data: 0.1260 max mem: 24075

Train Epoch: [0] [ 300/14844] eta: 4:49:10 lr: 0.000100 loss_tag: 388.8815 loss_dis: 0.2827 time: 1.1128 data: 0.1257 max mem: 24075

Train Epoch: [0] [ 350/14844] eta: 4:45:09 lr: 0.000100 loss_tag: 360.2780 loss_dis: 0.2842 time: 1.0958 data: 0.1268 max mem: 24075

Train Epoch: [0] [ 400/14844] eta: 4:41:33 lr: 0.000100 loss_tag: 429.6051 loss_dis: 0.2782 time: 1.0945 data: 0.1249 max mem: 24075

Train Epoch: [0] [ 450/14844] eta: 4:38:28 lr: 0.000100 loss_tag: 360.5910 loss_dis: 0.2774 time: 1.0925 data: 0.1237 max mem: 24075

Train Epoch: [0] [ 500/14844] eta: 4:36:15 lr: 0.000100 loss_tag: 388.7999 loss_dis: 0.2806 time: 1.1121 data: 0.1259 max mem: 24075

Train Epoch: [0] [ 550/14844] eta: 4:34:38 lr: 0.000100 loss_tag: 336.6460 loss_dis: 0.2823 time: 1.1294 data: 0.1218 max mem: 24075

Train Epoch: [0] [ 600/14844] eta: 4:33:22 lr: 0.000100 loss_tag: 335.7622 loss_dis: 0.2787 time: 1.1403 data: 0.1255 max mem: 24075

Train Epoch: [0] [ 650/14844] eta: 4:32:21 lr: 0.000100 loss_tag: 497.8206 loss_dis: 0.2815 time: 1.1540 data: 0.1272 max mem: 24075

Train Epoch: [0] [ 700/14844] eta: 4:31:29 lr: 0.000100 loss_tag: 1359.2052 loss_dis: 0.4506 time: 1.1637 data: 0.1278 max mem: 24075

Train Epoch: [0] [ 750/14844] eta: 4:30:50 lr: 0.000100 loss_tag: 496.2915 loss_dis: 0.4714 time: 1.1740 data: 0.1257 max mem: 24075

Train Epoch: [0] [ 800/14844] eta: 4:30:14 lr: 0.000100 loss_tag: 468.4954 loss_dis: 0.4744 time: 1.1810 data: 0.1311 max mem: 24075

Train Epoch: [0] [ 850/14844] eta: 4:29:37 lr: 0.000100 loss_tag: 435.1667 loss_dis: 0.4665 time: 1.1759 data: 0.1290 max mem: 24075

Train Epoch: [0] [ 900/14844] eta: 4:28:58 lr: 0.000100 loss_tag: 460.7809 loss_dis: 0.4758 time: 1.1790 data: 0.1286 max mem: 24075
```


xinyu1205 commented on September 26, 2024

That's expected, since the reduction of the tagging loss is sum():
https://github.com/xinyu1205/recognize-anything/blob/main/ram/models/utils.py#L326
The different loss scales are balanced with detach():
https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121
Moreover, since RAM converges quickly, you can evaluate performance after training for fewer epochs.
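
For intuition, here is a minimal self-contained sketch of both points, with hypothetical tensor sizes and a hypothetical weighting formula (the actual asymmetric loss in ram/models/utils.py and the balancing line in pretrain.py may differ in detail): a sum-reduced multi-label loss over thousands of tags naturally lands in the hundreds, and a weight computed from detached values rescales one loss toward the other's magnitude without adding gradient terms.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: a batch of 32 images and ~4,500 candidate tags.
logits = torch.randn(32, 4500, requires_grad=True)
targets = (torch.rand(32, 4500) < 0.01).float()   # sparse multi-label targets

# With reduction='sum', the tagging loss adds up every (image, tag) term,
# so values in the hundreds are normal even when each term is tiny.
loss_tag = F.binary_cross_entropy_with_logits(logits, targets, reduction='sum') / logits.size(0)

# Detach-based balancing (a sketch of the idea, not the exact line from pretrain.py):
# the weight is built from detached values, so it only rescales loss_dis toward
# loss_tag's magnitude and contributes no extra gradient paths of its own.
loss_dis = torch.tensor(0.45, requires_grad=True)  # stand-in for the distillation loss
weight = (loss_tag / loss_dis.clamp_min(1e-8)).detach()
loss = loss_tag + weight * loss_dis
loss.backward()
```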


felixfuu commented on September 26, 2024

https://github.com/xinyu1205/recognize-anything/blob/main/pretrain.py#L121

@xinyu1205
I only have loss_tag and loss_dis. Do I need to balance these two losses?


xinyu1205 commented on September 26, 2024

If you want to highlight the open-set tagging ability, I think balancing them is better.


felixfuu commented on September 26, 2024

@xinyu1205
Could you please provide the training log (loss_t2t, loss_tag, and loss_dis)?

