lihong2303 / agm

[ICCV2023] The repo for "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".

License: MIT License

Python 100.00%

agm's Introduction

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

This is the official PyTorch implementation of AGM, proposed in "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".

Paper Title: Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Authors: Hong Li*, Xingyu Li*, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

Accepted by: ICCV 2023

[arXiv] [ICCV Proceedings]

Dataset

1. AV-MNIST

This dataset can be downloaded from here.

2. CREMA-D

This dataset can be downloaded from here. For data preprocessing, refer to here.

3. UR-Funny

The raw dataset can be downloaded from here. Alternatively, the processed data can be obtained from here.

4. AVE

This dataset can be downloaded from here.

5. CMU-MOSEI

This dataset can be downloaded from here.

Training

Environment config

  1. Python: 3.9.13
  2. CUDA Version: 11.3
  3. PyTorch: 1.12.1
  4. Torchvision: 0.13.1
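
To sanity-check a local environment against the versions above, here is a quick Python snippet (the expected values in the comments come from the list above; the check itself is just a convenience, not part of the repository):

import torch
import torchvision

print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should print True for GPU training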

Train

To train the model, run the following command:

python main.py --data_root '' --device cuda:0 --methods Normal --modality Multimodal --fusion_type late_fusion --random_seed 999 --expt_dir checkpoint --expt_name test --batch_size 64 --EPOCHS 100 --learning_rate 0.0001 --dataset AV-MNIST --alpha 2.5 --SHAPE_contribution False
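
For reference, below is a minimal argparse sketch that mirrors the flags in the command above. The flag names are taken from the command; the types and defaults are assumptions, and this is not the repository's actual main.py. Note in particular that a boolean flag passed as the string "False" needs an explicit string-to-bool conversion, since argparse's type=bool would treat any non-empty string as True.

import argparse

def str2bool(s):
    # argparse pitfall: bool("False") is True, so convert the string explicitly.
    return s.lower() in ("true", "1", "yes")

def build_parser():
    # Hypothetical parser mirroring the documented flags; defaults are guesses.
    p = argparse.ArgumentParser(description="AGM training (illustrative sketch)")
    p.add_argument("--data_root", type=str, default="")
    p.add_argument("--device", type=str, default="cuda:0")
    p.add_argument("--methods", type=str, default="Normal")
    p.add_argument("--modality", type=str, default="Multimodal")
    p.add_argument("--fusion_type", type=str, default="late_fusion")
    p.add_argument("--random_seed", type=int, default=999)
    p.add_argument("--expt_dir", type=str, default="checkpoint")
    p.add_argument("--expt_name", type=str, default="test")
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--EPOCHS", type=int, default=100)
    p.add_argument("--learning_rate", type=float, default=1e-4)
    p.add_argument("--dataset", type=str, default="AV-MNIST")
    p.add_argument("--alpha", type=float, default=2.5)
    p.add_argument("--SHAPE_contribution", type=str2bool, default=False)
    return p

if __name__ == "__main__":
    print(build_parser().parse_args())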

Citation

@inproceedings{li2023boosting,
  title={Boosting Multi-modal Model Performance with Adaptive Gradient Modulation},
  author={Li, Hong and Li, Xingyu and Hu, Pengbo and Lei, Yinuo and Li, Chunxiao and Zhou, Yi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22214--22224},
  year={2023}
}


agm's Issues

Grad Clipping

Hi,

I noticed that you have a custom mechanism for enabling gradient clipping, based on the gradients of the weights of the perpendicular layer. Could you motivate this decision?

Furthermore, did you also use it in the OGM experiments? According to my experiments it seems to be a key point in AGM, and I am curious what its contribution is; let me know if you have a clear view on this.

Thanks!
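
For readers following this thread, here is a generic sketch of what gating gradient clipping on a reference layer's gradient norm could look like. It is only an illustration of the idea under discussion, not the repository's implementation; the parameter name, threshold, and max norm are assumptions.

import torch

def maybe_clip(model, ref_param_name="classifier.weight", threshold=1.0, max_norm=1.0):
    # Illustrative only: clip all gradients when the reference parameter's
    # gradient norm exceeds a threshold (names and values are assumptions).
    ref_param = dict(model.named_parameters())[ref_param_name]
    if ref_param.grad is not None and ref_param.grad.norm() > threshold:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

This would be called after loss.backward() and before optimizer.step().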

Parameter settings of CREMA-D

May I ask what the parameter settings for CREMA-D are to reproduce the results in the paper? Additionally, what is the unimodal setting that lets the visual modality reach 75.93? Looking forward to your reply.

Parameter settings of URFunny

May I ask what the alpha setting for URFunny is to reproduce the results in the paper? Looking forward to your reply.

Question regarding the config files

Hi,

This is not an issue, just a question. I wanted to understand whether the configs to reproduce the results are provided; specifically, for CREMA-D, are alpha=1.0 and end_epoch=20 (for modulation) correct, and what does grad_norm_clip=0 mean?

Regarding grad_norm_clip, I didn't find any usage of it in the files; do you still clip all your gradients despite that?

Thanks once again!

AVE dataset

Could you also release the preprocessing steps for the AVE dataset? Since you use it in h5 format although it ships as mp4 files, I assume there is a whole preprocessing pipeline.

CREMA-D: Why 90% of the Test set is used and 10% for Val?

Hi,

Thank you very much for sharing the code for the paper, it helps a lot. I noticed that in the CREMA-D loader you have the lines:
self.item = self.test_item[:int(len(self.test_item) * 0.9)]
self.image = self.test_image[:int(len(self.test_image) * 0.9)]
self.audio = self.test_audio[:int(len(self.test_audio) * 0.9)]
self.label = self.test_label[:int(len(self.test_label) * 0.9)]

Is there a specific reason you keep 90% of the test set and use the remaining 10% as the validation set?
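
Just to make the question concrete, the remaining 10% of the test items would presumably form the validation slice, i.e. the complement of the quoted slicing (an assumption about the loader's intent, not code from the repository):

n = len(self.test_item)
test_part = self.test_item[:int(n * 0.9)]  # the 90% kept as the reported test set
val_part = self.test_item[int(n * 0.9):]   # the remaining 10%, presumably used for validation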

Crema-D: train-test splits

Hi,

Could you share the train/test splits for CREMA-D?

From your unimodal numbers I assume you used the OGM-GE splits. However, in those splits the train and test sets include clips from the same actors. Could you verify that the splits do not have leakage?

Processing of Shapley Module

It is a good idea to incorporate the Shapley value into balanced multi-modality learning. However, I couldn't locate the Shapley module in this project. Could you please help me find its location? Thank you.
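
While the module's location is a question for the authors, the two-modality Shapley attribution itself is standard: each modality's value is the average of its marginal gain over the empty coalition and over the other modality. A minimal sketch for reference (f is a placeholder for evaluating the model on a subset of modalities; this is the textbook two-player formula, not necessarily the exact form used in the paper):

def shapley_two_modalities(f):
    # f: callable mapping a frozenset of modality names to a scalar score,
    # e.g. validation accuracy of the model restricted to that subset.
    empty = frozenset()
    a, v = frozenset({"audio"}), frozenset({"visual"})
    both = a | v
    phi_audio = 0.5 * ((f(a) - f(empty)) + (f(both) - f(v)))
    phi_visual = 0.5 * ((f(v) - f(empty)) + (f(both) - f(a)))
    return phi_audio, phi_visual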

Experiment Parameters

Hi,

Congratulations on the great work and thank you for open-sourcing the codebase. I am trying to use this codebase to experiment with different multi-modal methods. However, I found that the methods I am testing do not achieve results comparable to those reported in the paper. I am wondering whether this is due to my experiment parameters. Currently, I am using the defaults in the codebase where the paper does not specify otherwise. Could you please clarify the

  1. Number of epochs,
  2. alpha value for AGM method, and
  3. Modulation window (start and end epoch for modulation)

used for AVMNIST, URFUNNY, and CREMA-D datasets? Many thanks in advance!

Bugs in codes

Hi,
I think your code needs to be carefully reviewed, because there are many bugs and unused variables inside, such as 'iteration = (epoch -1) * cfgs.batch_size + step + 1' in URFunny_main.py. The coding style even differs between the scripts for different datasets, which makes replication and migration very difficult and annoying.
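
For context on the quoted line: a global step counter would normally advance by the number of batches per epoch rather than by the batch size, so the presumably intended computation is along these lines (an assumption about the author's intent, not code from the repository):

# Assumed fix: index by batches per epoch, not batch size.
# train_loader is the epoch's DataLoader; step is the 0-based batch index.
iteration = (epoch - 1) * len(train_loader) + step + 1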
