yilunlee / missing_aware_prompts

Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23

Home Page: https://yilunlee.github.io/missing_aware_prompts/

Python 100.00%

transformer computer-vision visual-recognition cvpr missing-modality multimodal-learning

missing_aware_prompts's Introduction

Multimodal Prompting with Missing Modalities for Visual Recognition (CVPR 2023)

Official PyTorch implementation of the CVPR 2023 paper "Multimodal Prompting with Missing Modalities for Visual Recognition".
You can visit our project website at https://yilunlee.github.io/missing_aware_prompts/.

Introduction

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) a modality may be missing during training or testing in real-world situations; and 2) computational resources may be insufficient to fine-tune heavy transformer models. To this end, we propose to use prompt learning to mitigate both challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% learnable parameters compared to training the entire model.

Usage

Environment

Prerequisites

Python = 3.7.13

PyTorch = 1.10.0

CUDA = 11.3

Other requirements

pip install -r requirements.txt

Prepare Dataset

We use three vision-and-language datasets: MM-IMDb, UPMC Food-101, and Hateful Memes. Please download the datasets yourself. We use pyarrow to serialize the datasets; the conversion code is located in vilt/utils/write_*.py. Please see DATA.md for how to organize the datasets; otherwise, you may need to revise the write_*.py files to match your dataset paths and files. Run the following script to create the pyarrow binary file:

python make_arrow.py --dataset [DATASET] --root [YOUR_DATASET_ROOT]

Evaluation

python run.py with data_root=<ARROW_ROOT> \
        num_gpus=<NUM_GPUS> \
        num_nodes=<NUM_NODES> \
        per_gpu_batchsize=<BS_FITS_YOUR_GPU> \
        <task_finetune_mmimdb or task_finetune_food101 or task_finetune_hatememes> \
        load_path=<MODEL_PATH> \
        exp_name=<EXP_NAME> \
        prompt_type=<PROMPT_TYPE> \
        test_ratio=<TEST_RATIO> \
        test_type=<TEST_TYPE> \
        test_only=True
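
The evaluation command above is a list of Sacred-style `key=value` overrides plus one named task config. As a minimal sketch, the same argument list could be assembled from Python as follows. `build_eval_command` is a hypothetical helper that is not part of this repository, and every value in the usage is a placeholder; valid values for options such as `prompt_type`, `test_ratio`, and `test_type` are defined by the repository's config and are not assumed here.

```python
import shlex

# Task configs named in the command above.
TASKS = (
    "task_finetune_mmimdb",
    "task_finetune_food101",
    "task_finetune_hatememes",
)


def build_eval_command(task, **overrides):
    """Hypothetical helper: build the argv list for an evaluation run."""
    if task not in TASKS:
        raise ValueError(f"unknown task config: {task}")
    argv = ["python", "run.py", "with", task]
    # Each override becomes one key=value token, as in the command above.
    argv += [f"{key}={value}" for key, value in overrides.items()]
    argv.append("test_only=True")
    return argv


cmd = build_eval_command(
    "task_finetune_mmimdb",
    data_root="/path/to/arrow",        # placeholder path
    num_gpus=1,
    num_nodes=1,
    per_gpu_batchsize=4,
    load_path="/path/to/model.ckpt",   # placeholder path
    exp_name="eval_mmimdb",            # placeholder name
)
# Quote each token so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(token) for token in cmd))
```

The list form can be passed to `subprocess.run(cmd)` directly, which avoids shell quoting issues with paths that contain spaces.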

Train

Download the pre-trained ViLT model weights from here.
Start to train.

python run.py with data_root=<ARROW_ROOT> \
        num_gpus=<NUM_GPUS> \
        num_nodes=<NUM_NODES> \
        per_gpu_batchsize=<BS_FITS_YOUR_GPU> \
        <task_finetune_mmimdb or task_finetune_food101 or task_finetune_hatememes> \
        load_path=<PRETRAINED_MODEL_PATH> \
        exp_name=<EXP_NAME>

Citation

If you find this work useful for your research, please cite:

@inproceedings{lee2023cvpr,
 title = {Multimodal Prompting with Missing Modalities for Visual Recognition},
 author = {Yi-Lun Lee and Yi-Hsuan Tsai and Wei-Chen Chiu and Chen-Yu Lee},
 booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 year = {2023}
}

Acknowledgements

This code is based on ViLT.


missing_aware_prompts's Issues

Testing issue

When I test with weights I trained on a dataset, e.g., MM-IMDb, I get results that differ from those obtained during training. How can this be resolved?

About dataset hateful_memes

The name 'text_aug_dir' is not defined at line 26 of write_hatefulmemes.py. How can this be solved?

Hateful Meme testset

Thank you for releasing the code for your interesting paper! I was trying to test on the Hateful Memes dataset. However, it seems the test set of this dataset does not contain the 'label' field, which is required by the write_*.py functions. Could you kindly let me know where to download the labels for the test set?
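
One way to confirm the problem before running the conversion is to count the records that lack a 'label' key. The sketch below does this for JSON Lines input; `split_by_label` is a hypothetical helper, and the sample records are invented for illustration (field names other than 'label' are assumptions). It does not address where test labels can be obtained.

```python
import json


def split_by_label(lines):
    """Separate JSON Lines records into those with and without a 'label' key."""
    labeled, unlabeled = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "label" in record:
            labeled.append(record)
        else:
            unlabeled.append(record)
    return labeled, unlabeled


# Invented sample records; a real file would be read with open(path).
sample = [
    '{"id": 1, "img": "img/00001.png", "text": "example a", "label": 0}',
    '{"id": 2, "img": "img/00002.png", "text": "example b"}',
]
labeled, unlabeled = split_by_label(sample)
print(len(labeled), "labeled;", len(unlabeled), "unlabeled")
```

For a real split, passing the open file object (for example `split_by_label(open("test.jsonl"))`, with the path being an assumption) works because file objects iterate line by line.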

About the hatememes_dataset.py

I notice that the class HateMemesDataset sets text_column_name to 'plots', as follows:

class HateMemesDataset(BaseDataset):
    def __init__(self, *args, split="", missing_info={}, **kwargs):
        assert split in ["train", "val", "test"]
        self.split = split

        if split == "train":
            names = ["hatememes_train"]
        elif split == "val":
            names = ["hatememes_dev"]
        elif split == "test":
            names = ["hatememes_test"]

        super().__init__(
            *args,
            **kwargs,
            names=names,
            text_column_name="plots",
            remove_duplicate=False,
        )

However, in make_arrow in write_hatememes.py, no column named 'plots' is defined:
dataframe = pd.DataFrame(data_list, columns=["image", "text", "label", "split"])
This may cause the error "KeyError: 'Field "plots" does not exist in schema'" during training. Is this a mistake, or am I misunderstanding something?
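
The mismatch described in this issue can be shown without loading any data. The sketch below only compares the column names quoted above; `check_text_column` is a hypothetical helper, and whether the right fix is to change `text_column_name` or to rename the written column is left to the maintainers.

```python
def check_text_column(columns, text_column_name):
    """Return True if the requested text column exists among the table's columns."""
    return text_column_name in columns


# Columns written by the DataFrame quoted in this issue.
arrow_columns = ["image", "text", "label", "split"]

# Column requested by HateMemesDataset versus the column actually written.
print(check_text_column(arrow_columns, "plots"))  # False
print(check_text_column(arrow_columns, "text"))   # True
```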

a training problem

Hi, thanks for your great work! I am trying to train the model on the Hateful Memes dataset, but when the first epoch reaches 34%, the training process always gets stuck. I don't know why. Have you encountered this situation?

About dataset UPMC Food-101

Thanks for your interesting work! When I try to download the UPMC Food-101 dataset, the link https://visiir.isir.upmc.fr/explore returns a 502 Bad Gateway error. I have downloaded the images of UPMC Food-101 from other sources; can you send me the text files?

Looking forward to your reply!

Inquiry about dataset

Hi there,

Thank you for your contribution to this project! I noticed a couple of issues while running the code and I was hoping to get some clarification:

1. I encountered an error when trying to use the 'hateful_memes' dataset with the vilt.utils.write_hatememes function. Specifically, there is no 'label' key in the 'test.jsonl' file, which causes the function to raise an error. Could you please advise on how to resolve this issue?
2. I am also unclear about how to handle modalities with missing values in the make_arrow function. Specifically, how should we set the 'img' and 'text' fields in such cases? Should we set 'img' to None and 'text' to an empty string ('')? Any guidance on this would be much appreciated.

Could you please provide the three json files of the upmcfood-101 dataset, I can’t download them online, thanks

multi-layer prompt query

Hello, I am interested in your work, but I have some questions after reading the code. As you can see in the picture, why choose prompts[:, self.prompt_layers.index(i)] (shape: prompt_number, emb_dim) instead of prompts[self.prompt_layers.index(i), :] (shape: prompt_length, emb_dim)? I think we need to attach a tensor of shape (prompt_length, emb_dim) to the input embedding.

loss does not drop

Hi, Thank you for releasing the code. I was trying to repeat the experiments. However, it seems the loss does not drop at all. Maybe I was not running it correctly. Would you please help me to check the training procedure?

I was training on the mmimdb dataset (but the same issue was observed for the other 2 datasets as well).

CUDA_VISIBLE_DEVICES=0 python run.py with data_root=PATH_TO_DATA/mmimdb \
        num_gpus=1 \
        num_nodes=1 \
        per_gpu_batchsize=16 \
        task_finetune_mmimdb \
        load_path=PATH_TO_PRETRAIN/vilt_200k_mlm_itm.ckpt \
        exp_name=finetune_mmimdb \
        missing_table_root=PATH_TO_RESULT/missing_table \
        log_dir=PATH_TO_RESULT/log \
        prompt_type=input

Here is the printout in console

WARNING - root - Changed type of config entry "max_steps" from int to NoneType
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - torch.distributed.nn.jit.instantiator - Created a temporary directory at /tmp/tmp3iqqs12z
INFO - torch.distributed.nn.jit.instantiator - Writing /tmp/tmp3iqqs12z/_remote_module_non_scriptable.py
here -- multilayer enabled=True
Learning of Prompt is enabled
256 16 1 1
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:147: UserWarning: You passed `deterministic=True` and `benchmark=True`. Note that PyTorch ignores torch.backends.cudnn.deterministic=True when torch.backends.cudnn.benchmark=True.
  rank_zero_warn(
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:455: UserWarning: The flag `devices=gpu` will be ignored, instead the device specific number 1 will be used
  rank_zero_warn(
Using 16bit native Automatic Mixed Precision (AMP)
/dev38/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:52: LightningDeprecationWarning: Setting `max_steps = None` is deprecated in v1.5 and will no longer be supported in v1.7. Use `max_steps = -1` instead.
  rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:61: LightningDeprecationWarning: Setting `Trainer(flush_logs_every_n_steps=10)` is deprecated in v1.5 and will be removed in v1.7. Please configure flushing in the logger instead.
  rank_zero_deprecation(
Global seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/dev38/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(

  | Name                   | Type              | Params
-------------------------------------------------------------
0 | text_embeddings        | BertEmbeddings    | 24.2 M
1 | token_type_embeddings  | Embedding         | 1.5 K 
2 | transformer            | VisionTransformer | 87.5 M
3 | pooler                 | Pooler            | 590 K 
4 | mmimdb_classifier      | Sequential        | 1.2 M 
5 | train_mmimdb_F1_scores | F1_Score          | 0     
6 | train_mmimdb_loss      | Scalar            | 0     
7 | val_mmimdb_F1_scores   | F1_Score          | 0     
8 | val_mmimdb_loss        | Scalar            | 0     
-------------------------------------------------------------
2.0 M     Trainable params
111 M     Non-trainable params
113 M     Total params
227.583   Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/dev38/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
  rank_zero_warn(
Sanity Checking DataLoader 0:   0%|                                              | 0/2 [00:00<?, ?it/s]/dev38/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/dev38/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:72: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 16. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`

Is it appropriate to use only the titles of the Food 101 as text?

Thanks for your work! The UPMC-Food-101 dataset contains rich textual information, is it appropriate to only use the sample titles? I also noticed that the maximum text length is set to 512, but you know, most of the titles are short, and this doesn't seem to make sense.

integer division or modulo by zero

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python run.py with data_root=/data1/yq/004_intention/missing_aware_prompts/datasets/mmimdb num_gpus=8 num_nodes=1 per_gpu_batchsize=64 task_finetune_mmimdb load_path=/data1/yq/004_intention/missing_aware_prompts/vilt/models/vilt_200k_mlm_itm.ckpt exp_name='test1'

ERROR - ViLT - Failed after 0:00:21!
ERROR - ViLT - Failed after 0:00:17!
Traceback (most recent calls WITHOUT Sacred internals):
File "/data1/yq/004_intention/missing_aware_prompts/run.py", line 75, in main
trainer.fit(model, datamodule=dm)
File "/home/yangqu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
results = self.accelerator_backend.train()
File "/home/yangqu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
results = self.ddp_train(process_idx=self.task_idx, model=model)
File "/home/yangqu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 286, in ddp_train
self.setup_optimizers(model)
File "/home/yangqu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 145, in setup_optimizers
optimizers, lr_schedulers, optimizer_frequencies = self.trainer.init_optimizers(model)
File "/home/yangqu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/optimizers.py", line 31, in init_optimizers
optim_conf = model.configure_optimizers()
File "/data1/yq/004_intention/missing_aware_prompts/vilt/modules/vilt_missing_aware_prompt_module.py", line 366, in configure_optimizers
return vilt_utils.set_schedule(self)
File "/data1/yq/004_intention/missing_aware_prompts/vilt/modules/vilt_utils.py", line 323, in set_schedule
// pl_module.trainer.accumulate_grad_batches
ZeroDivisionError: integer division or modulo by zero
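
The traceback ends at a floor division by `pl_module.trainer.accumulate_grad_batches`, so that value must be zero. A possible explanation is sketched below under two assumptions that this issue does not confirm: that run.py derives the accumulation steps as in upstream ViLT, i.e. `batch_size // (per_gpu_batchsize * num_gpus * num_nodes)`, and that the configured `batch_size` is 256 (the console output in the "loss does not drop" issue prints `256 16 1 1`, which may correspond to these four settings).

```python
def accumulate_grad_batches(batch_size, per_gpu_batchsize, num_gpus, num_nodes):
    """Assumed derivation (as in upstream ViLT's run.py), using integer division."""
    return batch_size // (per_gpu_batchsize * num_gpus * num_nodes)


# batch_size=256 is an assumption; the other values come from the command above.
print(accumulate_grad_batches(256, 64, 8, 1))  # 0: 64 * 8 = 512 exceeds 256
# Settings from the single-GPU run in the "loss does not drop" issue.
print(accumulate_grad_batches(256, 16, 1, 1))  # 16
```

Under these assumptions, keeping `per_gpu_batchsize * num_gpus * num_nodes` at or below `batch_size` (for example by lowering `per_gpu_batchsize`), or raising `batch_size`, would keep the divisor nonzero.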


ERROR When running this code.

INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
ERROR - ViLT - Failed after 0:01:33!
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
return self.run(
File "/root/miniconda3/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
run()
File "/root/miniconda3/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
self.result = self.main_function(*args)
File "/root/miniconda3/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "/root/missing_aware_prompts-main/run.py", line 16, in main
dm = MTDataModule(_config, dist=True)
File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 49, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "/root/missing_aware_prompts-main/vilt/datamodules/multitask_datamodule.py", line 19, in __init__
self.dm_dicts = {key: _datamodules[key](_config) for key in datamodule_keys}
File "/root/missing_aware_prompts-main/vilt/datamodules/multitask_datamodule.py", line 19, in <dictcomp>
self.dm_dicts = {key: _datamodules[key](_config) for key in datamodule_keys}
KeyError: 'coco'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
print_filtered_stacktrace()
File "/root/miniconda3/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
File "/root/miniconda3/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
return "".join(filtered_traceback_format(tb_exception))
File "/root/miniconda3/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'
python-BaseException

Looking forward to your reply, thanks!

The number of prompts

Hi, thanks for the excellent work!

I am confused by the number of prompts, i.e., M^2 - 1. When M = 3, why is the number of prompts 8? Suppose there are three modalities a, b, and c; the prompts for only a, only b, only c, a+b, b+c, a+c, and a+b+c total 7.

Looking forward to the reply, thanks again!
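
The count in this question can be checked by enumeration. The sketch below lists every non-empty combination of available modalities and prints the count next to 2^M - 1 and M^2 - 1; `nonempty_subsets` is a hypothetical helper. The two formulas agree at M = 2 and differ at M = 3. This only verifies the arithmetic and does not settle which formula the paper intends for M > 2.

```python
from itertools import combinations


def nonempty_subsets(modalities):
    """All non-empty combinations of the given modalities."""
    return [
        combo
        for size in range(1, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


for modalities in (["a", "b"], ["a", "b", "c"]):
    m = len(modalities)
    # Columns: M, enumerated count, 2^M - 1, M^2 - 1
    print(m, len(nonempty_subsets(modalities)), 2**m - 1, m**2 - 1)
# Prints:
# 2 3 3 3
# 3 7 7 8
```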

Training settings

Many thanks for your great work! May I ask how many GPUs are needed during training?

About experiment setting

Thx for your great work!
I have some questions about your code.

Did you set the random seed to 0 for all experiments?
I reproduced as you did (NUM_GPUS=2, BS_FITS_YOUR_GPU=2) like below

ARROW_ROOT=./datasets/mmimdb
NUM_GPUS=2
NUM_NODES=1
BS_FITS_YOUR_GPU=2
PRETRAINED_MODEL_PATH=./pretrained_weight/vilt_200k_mlm_itm.ckpt
EXP_NAME=mmimdb

python run.py with data_root=${ARROW_ROOT} \
        num_gpus=${NUM_GPUS} \
        num_nodes=${NUM_NODES} \
        per_gpu_batchsize=${BS_FITS_YOUR_GPU} \
        task_finetune_mmimdb \
        load_path=${PRETRAINED_MODEL_PATH} \
        exp_name=${EXP_NAME}

and I got 40.65 (paper: 42.66) on the test set with the same settings. Can I reproduce the paper's results without changing parameters such as the learning rate, or are there optimized hyperparameters for each dataset?

Inquiry about Dataset

Hi Lee, your work on missing modalities is fascinating. I tried to understand your code through debugging and to apply it to a private dataset, but failed. I don't quite understand how the dataset and dataloader are written, so I'm not sure how to integrate my data into the model. Could you provide some examples, such as how hatememes_dataset.py and hatememes_datamodule.py create the modality-missing dataset from the original dataset? Thank you sincerely for your help.

Error when running code due to missing 'label' key in 'test.jsonl'