lihong2303 / agm

[ICCV2023] The repo for "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".

License: MIT License

Python 100.00%

agm's Introduction

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

This is the official PyTorch implementation of AGM, proposed in "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".

Paper Title: Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Authors: Hong Li*, Xingyu Li*, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

Accepted by: ICCV 2023

[arXiv] [ICCV Proceedings]

Dataset

1. AV-MNIST

This dataset can be downloaded from here.

2. CREMA-D

This dataset can be downloaded from here. For data preprocessing, refer to here.

3. UR-Funny

The raw dataset can be downloaded from here. Alternatively, the processed data can be obtained from here.

4. AVE

This dataset can be downloaded from here.

5. CMU-MOSEI

This dataset can be downloaded from here.

Training

Environment config

  1. Python: 3.9.13
  2. CUDA Version: 11.3
  3. PyTorch: 1.12.1
  4. Torchvision: 0.13.1
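
To sanity-check a local environment against the versions above, here is a quick Python snippet (the expected values in the comments come from the list above; the check itself is just a convenience, not part of the repository):

import torch
import torchvision

print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should print True for GPU training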

Train

To train the model, run the following command:

python main.py --data_root '' --device cuda:0 --methods Normal --modality Multimodal --fusion_type late_fusion --random_seed 999 --expt_dir checkpoint --expt_name test --batch_size 64 --EPOCHS 100 --learning_rate 0.0001 --dataset AV-MNIST --alpha 2.5 --SHAPE_contribution False
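
For reference, below is a minimal argparse sketch that mirrors the flags in the command above. The flag names are taken from the command; the types and defaults are assumptions, and this is not the repository's actual main.py. Note in particular that a boolean flag passed as the string "False" needs an explicit string-to-bool conversion, since argparse's type=bool would treat any non-empty string as True.

import argparse

def str2bool(s):
    # argparse pitfall: bool("False") is True, so convert the string explicitly.
    return s.lower() in ("true", "1", "yes")

def build_parser():
    # Hypothetical parser mirroring the documented flags; defaults are guesses.
    p = argparse.ArgumentParser(description="AGM training (illustrative sketch)")
    p.add_argument("--data_root", type=str, default="")
    p.add_argument("--device", type=str, default="cuda:0")
    p.add_argument("--methods", type=str, default="Normal")
    p.add_argument("--modality", type=str, default="Multimodal")
    p.add_argument("--fusion_type", type=str, default="late_fusion")
    p.add_argument("--random_seed", type=int, default=999)
    p.add_argument("--expt_dir", type=str, default="checkpoint")
    p.add_argument("--expt_name", type=str, default="test")
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--EPOCHS", type=int, default=100)
    p.add_argument("--learning_rate", type=float, default=1e-4)
    p.add_argument("--dataset", type=str, default="AV-MNIST")
    p.add_argument("--alpha", type=float, default=2.5)
    p.add_argument("--SHAPE_contribution", type=str2bool, default=False)
    return p

if __name__ == "__main__":
    print(build_parser().parse_args())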

Citation

@inproceedings{li2023boosting,
  title={Boosting Multi-modal Model Performance with Adaptive Gradient Modulation},
  author={Li, Hong and Li, Xingyu and Hu, Pengbo and Lei, Yinuo and Li, Chunxiao and Zhou, Yi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22214--22224},
  year={2023}
}


agm's Issues

Grad Clipping

Hi,

I noticed that you have a custom mechanism for enabling gradient clipping, based on the gradients of the weights of the perpendicular layer. Could you motivate this decision?

Furthermore, did you also use it in the OGM experiments? According to my experiments it seems to be a key point in AGM, and I am curious what its contribution is; let me know if you have a clear view on this.

Thanks!
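
For readers following this thread, here is a generic sketch of what gating gradient clipping on a reference layer's gradient norm could look like. It is only an illustration of the idea under discussion, not the repository's implementation; the parameter name, threshold, and max norm are assumptions.

import torch

def maybe_clip(model, ref_param_name="classifier.weight", threshold=1.0, max_norm=1.0):
    # Illustrative only: clip all gradients when the reference parameter's
    # gradient norm exceeds a threshold (names and values are assumptions).
    ref_param = dict(model.named_parameters())[ref_param_name]
    if ref_param.grad is not None and ref_param.grad.norm() > threshold:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

This would be called after loss.backward() and before optimizer.step().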

Parameter settings of CREMA-D

May I ask what the parameter settings for CREMA-D are to reproduce the results in the paper? Additionally, what is the unimodal setting that lets the visual modality reach 75.93? Looking forward to your reply.

Parameter settings of URFunny

May I ask what the alpha setting for URFunny is to reproduce the results in the paper? Looking forward to your reply.

Question regarding the config files

Hi,

This is not an issue, just a question. I wanted to understand whether the configs to reproduce the results are provided; specifically, for CREMA-D, are alpha=1.0 and end_epoch=20 (for modulation) correct, and what does grad_norm_clip=0 mean?

Regarding grad_norm_clip, I didn't find any usage of it in the files; do you still clip all your gradients despite that?

Thanks once again!

AVE dataset

Could you also release the preprocessing steps for the AVE dataset? Since you use it in h5 format although it ships as mp4 files, I assume there is a whole preprocessing pipeline.

CREMA-D: Why 90% of the Test set is used and 10% for Val?

Hi,

Thank you very much for sharing the code for the paper, it helps a lot. I noticed that in the CREMA-D loader you have the lines:
self.item = self.test_item[:int(len(self.test_item) * 0.9)]
self.image = self.test_image[:int(len(self.test_image) * 0.9)]
self.audio = self.test_audio[:int(len(self.test_audio) * 0.9)]
self.label = self.test_label[:int(len(self.test_label) * 0.9)]

Is there a specific reason you keep 90% of the test set and use the remaining 10% as the validation set?
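
Just to make the question concrete, the remaining 10% of the test items would presumably form the validation slice, i.e. the complement of the quoted slicing (an assumption about the loader's intent, not code from the repository):

n = len(self.test_item)
test_part = self.test_item[:int(n * 0.9)]  # the 90% kept as the reported test set
val_part = self.test_item[int(n * 0.9):]   # the remaining 10%, presumably used for validation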

Crema-D: train-test splits

Hi,

Could you share the train/test splits for CREMA-D?

From your unimodal numbers I assume you used the OGM-GE splits. However, in those splits the train and test sets include clips from the same actors. Could you verify that the splits do not have leakage?

Processing of Shapley Module

It is a good idea to incorporate the Shapley value into balanced multi-modality learning. However, I couldn't locate the Shapley module in this project. Could you please help me find its location? Thank you.
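
While the module's location is a question for the authors, the two-modality Shapley attribution itself is standard: each modality's value is the average of its marginal gain over the empty coalition and over the other modality. A minimal sketch for reference (f is a placeholder for evaluating the model on a subset of modalities; this is the textbook two-player formula, not necessarily the exact form used in the paper):

def shapley_two_modalities(f):
    # f: callable mapping a frozenset of modality names to a scalar score,
    # e.g. validation accuracy of the model restricted to that subset.
    empty = frozenset()
    a, v = frozenset({"audio"}), frozenset({"visual"})
    both = a | v
    phi_audio = 0.5 * ((f(a) - f(empty)) + (f(both) - f(v)))
    phi_visual = 0.5 * ((f(v) - f(empty)) + (f(both) - f(a)))
    return phi_audio, phi_visual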

Experiment Parameters

Hi,

Congratulations on the great work and thank you for open-sourcing the codebase. I am trying to use this codebase to experiment with different multi-modal methods. However, I found that the methods I am testing do not achieve results comparable to those reported in the paper. I am wondering whether this is due to my experiment parameters. Currently, I am using the defaults in the codebase where the paper does not specify otherwise. Could you please clarify the

  1. Number of epochs,
  2. alpha value for AGM method, and
  3. Modulation window (start and end epoch for modulation)

used for AVMNIST, URFUNNY, and CREMA-D datasets? Many thanks in advance!

Bugs in codes

Hi,
I think your code needs to be carefully reviewed, because there are many bugs and unused variables inside, such as 'iteration = (epoch -1) * cfgs.batch_size + step + 1' in URFunny_main.py. The coding style even differs between the scripts for different datasets, which makes replication and migration very difficult and annoying.
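
For context on the quoted line: a global step counter would normally advance by the number of batches per epoch rather than by the batch size, so the presumably intended computation is along these lines (an assumption about the author's intent, not code from the repository):

# Assumed fix: index by batches per epoch, not batch size.
# train_loader is the epoch's DataLoader; step is the 0-based batch index.
iteration = (epoch - 1) * len(train_loader) + step + 1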
