facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Home Page: https://mmf.sh/

License: Other

Languages: Python 98.89%, Shell 0.11%, C 0.20%, JavaScript 0.55%, CSS 0.16%, MDX 0.10%
Topics: pytorch, vqa, pretrained-models, multimodal, deep-learning, captioning, dialog, textvqa, hateful-memes, multi-tasking

mmf's Introduction


MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at Facebook AI Research. See the full list of projects inside or built on MMF here.

MMF is powered by PyTorch, supports distributed training, and is un-opinionated, scalable and fast. Use MMF to bootstrap your next vision and language multimodal research project by following the installation instructions. Take a look at the list of MMF features here.

MMF also acts as a starter codebase for challenges around vision and language datasets (the Hateful Memes, TextVQA, TextCaps and VQA challenges). MMF was formerly known as Pythia. For an overview of how datasets and models work inside MMF, check out MMF's video overview.

Installation

Follow installation instructions in the documentation.

Documentation

Learn more about MMF here.

Citation

If you use MMF in your work or use any models published in MMF, please cite:

@misc{singh2020mmf,
  author =       {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and
                 Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
  title =        {MMF: A multimodal framework for vision and language research},
  howpublished = {\url{https://github.com/facebookresearch/mmf}},
  year =         {2020}
}

License

MMF is licensed under the BSD license, available in the LICENSE file.

mmf's People

Contributors

amyreese, ananthsub, ankitade, antonk52, apsdehal, dependabot[bot], deviparikh, ebsmothers, endernewton, four4fish, huaizhengzhang, jjenniferdai, jknoxville, lichengunc, meetps, ninginthecloud, pushkalkatara, rizavelioglu, ronghanghu, ryan-qiyu-jiang, shirgur, shubhamagarwal92, simran2905, stanislavglebik, stmugisha, suzyahyah, tsungyu, ultrons, vedanuj, yujiang01


mmf's Issues

some errors about loading pretrained model

Hi, when I loaded the detectron_100_resnet_most_data model, I ran into the following problems:

While copying the parameter named "module.image_embedding_models_list.0.0.image_attention_model.modal_combine.Fa_image.main.0.weight_g", whose dimensions in the model are torch.Size([]) and whose dimensions in the checkpoint are torch.Size([1]). While copying the parameter named "module.image_embedding_models_list.0.0.image_attention_model.modal_combine.Fa_txt.main.0.weight_g", whose dimensions in the model are torch.Size([]) and whose dimensions in the checkpoint are torch.Size([1]).

I think weight_norm might have caused it, but I don't know the reason. Can you give me some advice?
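
For anyone hitting the same mismatch, here is a minimal, hedged sketch (not an official fix) of one way to work around it: squeeze the 1-element weight_g tensors in the checkpoint so they match the scalar shapes the freshly built model expects. The checkpoint filename is a placeholder, and the variable model stands for the already constructed network.

import torch

checkpoint = torch.load("detectron_100_resnet_most_data_model.pth", map_location="cpu")  # hypothetical path
state_dict = checkpoint.get("state_dict", checkpoint)  # checkpoint layout assumed; adjust to the real file

for name, tensor in state_dict.items():
    if name.endswith("weight_g") and tensor.dim() == 1 and tensor.numel() == 1:
        # torch.Size([1]) in the checkpoint vs torch.Size([]) expected by the model
        state_dict[name] = tensor.squeeze()

model.load_state_dict(state_dict)  # `model` is the already built Pythia model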

Training Lorra on VQA2

I'm trying to train the LoRRA model on the VQA2 dataset, but I'm getting the following error:
ValueError: /raid/saransh/pythia/pythia/.vector_cache/wiki.en.bin cannot be opened for loading!

What is the average accuracy?

> avg_accuracy += (1 - accuracy_decay) * (accuracy - avg_accuracy)

I tried to search for "average accuracy" but didn't find anything useful, and I couldn't find it in the paper either.
Can anyone tell me what this line does, and what the "average accuracy" is? Does it go by other names in the literature? I haven't seen anything similar before.
I am not a deep learning expert, so maybe I still need to learn these things 😅.
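
For what it's worth, the quoted line is an exponential moving average (EMA, also called a running or smoothed average) of the reported accuracy rather than a separate metric from the paper. A standalone sketch of the same update rule, with an assumed decay value:

accuracy_decay = 0.99   # assumed value; the real one comes from the config
avg_accuracy = 0.0

for accuracy in [0.30, 0.35, 0.40, 0.42]:
    # identical to: avg_accuracy = accuracy_decay * avg_accuracy + (1 - accuracy_decay) * accuracy
    avg_accuracy += (1 - accuracy_decay) * (accuracy - avg_accuracy)
    print(round(avg_accuracy, 5))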

Getting Object Labels

Is there a way to get the object labels for the bounding boxes produced by the (fine-tuned) Detectron model?

docker demo doesn't run

Ran the instructions at https://github.com/facebookresearch/pythia#docker-demo :

git clone https://github.com/facebookresearch/pythia.git
nvidia-docker build pythia -t pythia:latest
nvidia-docker run -ti --net=host pythia:latest

I then loaded localhost:8888 in my web browser, which showed a file listing containing 'vqa_demo' and 'vqa_standalone_image_demo'. I tried opening both of these and doing 'restart and run all'. Both gave errors, though different ones:

vqa_demo:

FileNotFoundError: [Errno 2] No such file or directory: 'data/imdb/imdb_test2015.npy'

vqa_standalone_image_demo:

FileNotFoundError: [Errno 2] No such file or directory: '/private/home/nvivek/VQA/pythia/vqa_detectron_master/config.yaml'

md5sum checksum target values in wrong order?

Hi, yesterday I downloaded the coco.tar.gz features file (240 GB), and when I computed its md5sum I got ab7947b04f3063c774b87dfbf4d0e981 instead of the target value b22e80997b2580edaf08d7e3a896e324. The funny thing is that ab7947b04f3063c774b87dfbf4d0e981 is the target value listed for the OpenImages features file, so I believe the target values were misplaced by accident. Is that correct? Thanks.
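
In case it helps others verify their download, a small sketch for computing the archive's md5 in chunks (the file is far too large to read at once) and comparing it against both published values:

import hashlib

def md5sum(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

digest = md5sum("coco.tar.gz")
print("matches coco target:", digest == "b22e80997b2580edaf08d7e3a896e324")
print("matches OpenImages target:", digest == "ab7947b04f3063c774b87dfbf4d0e981")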

Out of memory issue

Hello there,

I get a CUDA runtime error after iteration 3000 when running the following command:

$ python train.py --config config/keep/detectron_100_resnet_most_data.yaml

I am running PyTorch 0.3.1 on a single V100 (32 GB memory):

>>> torch.__version__
'0.3.1'

Has anyone else encountered this error?

Thanks for the help

Traceback:

i_epoch: 1 i_iter: 2000 val_loss:3.4700 val_acc:0.6148 runtime: 67.63 min
iter: 2100 train_loss: 2.8821  train_score: 0.6225  avg_train_score: 0.6105 val_score: 0.6008 val_loss: 3.4801 time(s): 561.6 s
iter: 2200 train_loss: 2.6174  train_score: 0.6195  avg_train_score: 0.6135 val_score: 0.6803 val_loss: 3.1912 time(s): 218.7 s
iter: 2300 train_loss: 2.7957  train_score: 0.6426  avg_train_score: 0.6190 val_score: 0.6205 val_loss: 3.3420 time(s): 412.5 s
iter: 2400 train_loss: 2.4924  train_score: 0.6453  avg_train_score: 0.6207 val_score: 0.6117 val_loss: 3.4666 time(s): 192.8 s
iter: 2500 train_loss: 2.7591  train_score: 0.6234  avg_train_score: 0.6243 val_score: 0.6293 val_loss: 3.4114 time(s): 190.6 s
iter: 2600 train_loss: 2.9420  train_score: 0.5928  avg_train_score: 0.6237 val_score: 0.6400 val_loss: 3.2718 time(s): 185.9 s
iter: 2700 train_loss: 2.6800  train_score: 0.6441  avg_train_score: 0.6247 val_score: 0.6590 val_loss: 3.0637 time(s): 176.4 s
iter: 2800 train_loss: 2.7028  train_score: 0.6506  avg_train_score: 0.6303 val_score: 0.6828 val_loss: 3.1584 time(s): 189.8 s
iter: 2900 train_loss: 2.6380  train_score: 0.6432  avg_train_score: 0.6326 val_score: 0.6340 val_loss: 3.3097 time(s): 183.0 s
iter: 3000 train_loss: 2.7275  train_score: 0.6227  avg_train_score: 0.6311 val_score: 0.6725 val_loss: 3.1253 time(s): 187.7 s
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 230, in <module>
    scheduler=scheduler,best_val_accuracy=best_accuracy)
  File "/home/rcadene/pythia/train_model/Engineer.py", line 159, in one_stage_train
    data_reader_eval)
  File "/home/rcadene/pythia/train_model/Engineer.py", line 87, in save_a_snapshot
    loss_criterion=loss_criterion)
  File "/home/rcadene/pythia/train_model/Engineer.py", line 204, in one_stage_eval_model
    score, loss, n_sample = compute_a_batch(batch, myModel, eval_mode=True, loss_criterion=loss_criterion)
  File "/home/rcadene/pythia/train_model/Engineer.py", line 191, in compute_a_batch
    logit_res = one_stage_run_model(batch, my_model, add_graph, log_dir, eval_mode)
  File "/home/rcadene/pythia/train_model/Engineer.py", line 249, in one_stage_run_model
    image_feat_variables=image_feat_variables)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcadene/pythia/top_down_bottom_up/top_down_bottom_up_model.py", line 103, in forward
    question_embedding_total, image_dim_variable_use)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcadene/pythia/top_down_bottom_up/image_embedding.py", line 39, in forward
    image_feat_variable, question_embedding, image_dims)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcadene/pythia/top_down_bottom_up/image_attention.py", line 140, in forward
    joint_feature = self.modal_combine(image_feat, question_embedding)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcadene/pythia/top_down_bottom_up/multi_modal_combine.py", line 142, in forward
    joint_feature = self.dropout(joint_feature)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/modules/dropout.py", line 46, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/functional.py", line 526, in dropout
    return _functions.dropout.Dropout.apply(input, p, training, inplace)
  File "/home/rcadene/.conda/envs/pythia/lib/python3.6/site-packages/torch/nn/_functions/dropout.py", line 32, in forward
    output = input.clone()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
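
Since the failure is raised inside the validation pass (one_stage_eval_model), one common cause on old PyTorch versions is that the autograd graph is still being built during evaluation. A hedged sketch of the usual workaround, not the repository's code (model and batch are placeholders): mark the inputs volatile on PyTorch 0.3.x, or wrap the pass in torch.no_grad() on 0.4 and later.

import torch
from torch.autograd import Variable

def eval_step(model, batch):
    # `batch` is assumed to be a dict of input tensors keyed by argument name.
    if hasattr(torch, "no_grad"):                 # PyTorch >= 0.4
        with torch.no_grad():
            return model(**batch)
    # PyTorch 0.3.x: volatile inputs prevent graph construction and save memory
    volatile_batch = {k: Variable(v, volatile=True) for k, v in batch.items()}
    return model(**volatile_batch)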

Some questions about the demo

Hi, I am trying to run the demo, but when I load the pretrained model something goes wrong:

RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/THCTensorCopy.cu:31
During handling of the above exception, another exception occurred:
RuntimeError: While copying the parameter named question_embedding_models.0.embedding.weight, whose dimensions in the model are torch.Size([25541, 300]) and whose dimensions in the checkpoint are torch.Size([17871, 300]).

If you have time, I would really appreciate your help. Thank you very much!
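
A size mismatch like 25541 vs 17871 rows in question_embedding_models.0.embedding.weight usually means the vocabulary file the demo builds the model from differs from the one the checkpoint was trained with (the embedding has one row per vocabulary entry). A quick, hedged sanity check; the file and checkpoint names are examples and the checkpoint layout is assumed:

import torch

with open("data/vocabulary_vqa.txt") as f:      # the vocab file the demo config points to
    vocab_size = sum(1 for _ in f)

ckpt = torch.load("pythia_demo_model.pth", map_location="cpu")   # hypothetical checkpoint name
state_dict = ckpt.get("state_dict", ckpt)
emb_rows = state_dict["question_embedding_models.0.embedding.weight"].shape[0]

print("vocabulary size:", vocab_size, "| checkpoint embedding rows:", emb_rows)
# If these disagree (e.g. 25541 vs 17871), the model was built from a different
# vocabulary_vqa.txt than the one used to train the checkpoint.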

Colab demo fails

The Colab demo fails. I believe the problem is that the CUDA version currently installed on Colab is not the one this version of PyTorch was built against. I noticed that, in general, the Colab Python package versions are ahead of what this demo uses.

If you want a permanently working Colab demo, I suspect you need to nuke and pave the standard Colab runtime: remove everything installed by pip, the CUDA runtime, and whatever else you can think of, and install from the repos. For example, for the Python packages you would need:
!pip freeze > /tmp/all_packages.txt
!pip uninstall -r /tmp/all_packages.txt

Also, the demo demands a GPU and will not work in CPU-only mode. I have not tried the TPU runtime, but I suspect it will not work either.

Stack trace:

/content/pythia/pythia/.vector_cache/glove.6B.zip: 862MB [01:03, 13.5MB/s]
100%|█████████▉| 399163/400000 [00:50<00:00, 7829.19it/s]

RuntimeError Traceback (most recent call last)
in <module>()
----> 1 demo = PythiaDemo()

8 frames
in __init__(self)
40 def __init__(self):
41 self._init_processors()
---> 42 self.pythia_model = self._build_pythia_model()
43 self.detection_model = self._build_detection_model()
44 self.resnet_model = self._build_resnet_model()

in _build_pythia_model(self)
82
83 model.load_state_dict(state_dict)
---> 84 model.to("cuda")
85 model.eval()
86

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
379 return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
380
--> 381 return self._apply(convert)
382
383 def register_backward_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
185 def _apply(self, fn):
186 for module in self.children():
--> 187 module._apply(fn)
188
189 for param in self._parameters.values():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
185 def _apply(self, fn):
186 for module in self.children():
--> 187 module._apply(fn)
188
189 for param in self._parameters.values():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
185 def _apply(self, fn):
186 for module in self.children():
--> 187 module._apply(fn)
188
189 for param in self._parameters.values():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
185 def _apply(self, fn):
186 for module in self.children():
--> 187 module._apply(fn)
188
189 for param in self._parameters.values():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in _apply(self, fn)
115 def _apply(self, fn):
116 ret = super(RNNBase, self)._apply(fn)
--> 117 self.flatten_parameters()
118 return ret
119

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in flatten_parameters(self)
111 all_weights, (4 if self.bias else 2),
112 self.input_size, rnn.get_cudnn_mode(self.mode), self.hidden_size, self.num_layers,
--> 113 self.batch_first, bool(self.bidirectional))
114
115 def _apply(self, fn):

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
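
A small diagnostic sketch that can be run in a Colab cell before the demo, to compare the CUDA version this PyTorch build expects with what the VM actually provides; a mismatch there is consistent with the CUDNN_STATUS_EXECUTION_FAILED above:

import subprocess
import torch

print("torch version:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
# The first lines of nvidia-smi output include the driver and CUDA runtime versions
for line in subprocess.check_output(["nvidia-smi"]).decode().splitlines()[:3]:
    print(line)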

The training time increases every iteration, especially after 1000 iterations

Hi guys,
I don't know why the training time increases with every iteration, especially after 1000 iterations. Is this a bug?

BEGIN TRAINING...
iter: 100 train_loss: 3.6883  train_score: 0.2910  avg_train_score: 0.1358 val_score: 0.2867 val_loss: 3.3840 time(s): 307.7 s
iter: 200 train_loss: 3.2849  train_score: 0.3172  avg_train_score: 0.2345 val_score: 0.2809 val_loss: 3.0615 time(s): 268.5 s
iter: 300 train_loss: 2.7410  train_score: 0.3990  avg_train_score: 0.2964 val_score: 0.3617 val_loss: 2.9113 time(s): 282.1 s
iter: 400 train_loss: 2.4218  train_score: 0.4266  avg_train_score: 0.3522 val_score: 0.3578 val_loss: 2.6786 time(s): 299.6 s
iter: 500 train_loss: 2.1596  train_score: 0.4867  avg_train_score: 0.3994 val_score: 0.3617 val_loss: 2.5952 time(s): 285.3 s
iter: 600 train_loss: 2.1245  train_score: 0.4711  avg_train_score: 0.4355 val_score: 0.3736 val_loss: 2.5149 time(s): 329.9 s
iter: 700 train_loss: 2.0011  train_score: 0.4941  avg_train_score: 0.4629 val_score: 0.3406 val_loss: 2.7058 time(s): 356.2 s
iter: 800 train_loss: 1.9192  train_score: 0.5035  avg_train_score: 0.4865 val_score: 0.3521 val_loss: 2.5427 time(s): 324.3 s
iter: 900 train_loss: 1.8463  train_score: 0.5258  avg_train_score: 0.5033 val_score: 0.3668 val_loss: 2.4809 time(s): 335.9 s
iter: 1000 train_loss: 1.7650  train_score: 0.5379  avg_train_score: 0.5152 val_score: 0.3984 val_loss: 2.5709 time(s): 351.2 s
i_epoch: 1 i_iter: 1000 val_loss:2.5339 val_acc:0.3811 runtime: 58.30 min
iter: 1100 train_loss: 1.6619  train_score: 0.5543  avg_train_score: 0.5284 val_score: 0.3645 val_loss: 2.7063 time(s): 761.0 s
iter: 1200 train_loss: 1.6474  train_score: 0.5939  avg_train_score: 0.5371 val_score: 0.3746 val_loss: 2.4699 time(s): 412.1 s
iter: 1300 train_loss: 1.7397  train_score: 0.5527  avg_train_score: 0.5425 val_score: 0.3492 val_loss: 2.4378 time(s): 464.8 s
iter: 1400 train_loss: 1.6970  train_score: 0.5656  avg_train_score: 0.5492 val_score: 0.3955 val_loss: 2.4459 time(s): 506.1 s
iter: 1500 train_loss: 1.6235  train_score: 0.5543  avg_train_score: 0.5565 val_score: 0.3902 val_loss: 2.3059 time(s): 518.4 s
iter: 1600 train_loss: 1.6067  train_score: 0.5924  avg_train_score: 0.5603 val_score: 0.3059 val_loss: 2.6804 time(s): 515.8 s
iter: 1700 train_loss: 1.5721  train_score: 0.5746  avg_train_score: 0.5633 val_score: 0.3906 val_loss: 2.5084 time(s): 555.8 s
iter: 1800 train_loss: 1.4407  train_score: 0.5904  avg_train_score: 0.5684 val_score: 0.4020 val_loss: 2.3391 time(s): 577.6 s
iter: 1900 train_loss: 1.8080  train_score: 0.5508  avg_train_score: 0.5692 val_score: 0.4139 val_loss: 2.3122 time(s): 630.0 s
iter: 2000 train_loss: 1.5992  train_score: 0.5533  avg_train_score: 0.5716 val_score: 0.3705 val_loss: 2.6689 time(s): 1204.8 s
i_epoch: 1 i_iter: 2000 val_loss:2.4600 val_acc:0.3718 runtime: 113.33 min
iter: 2100 train_loss: 1.5500  train_score: 0.5908  avg_train_score: 0.5771 val_score: 0.3785 val_loss: 2.5471 time(s): 2415.6 s
iter: 2200 train_loss: 1.6981  train_score: 0.5525  avg_train_score: 0.5822 val_score: 0.3852 val_loss: 2.5870 time(s): 1327.5 s
iter: 2300 train_loss: 1.4888  train_score: 0.5959  avg_train_score: 0.5826 val_score: 0.4281 val_loss: 2.3235 time(s): 1194.0 s
iter: 2400 train_loss: 1.5351  train_score: 0.6010  avg_train_score: 0.5838 val_score: 0.4047 val_loss: 2.3183 time(s): 1595.9 s
iter: 2500 train_loss: 1.5369  train_score: 0.5912  avg_train_score: 0.5889 val_score: 0.3975 val_loss: 2.3280 time(s): 2286.7 s
iter: 2600 train_loss: 1.5912  train_score: 0.5668  avg_train_score: 0.5916 val_score: 0.4189 val_loss: 2.2325 time(s): 3049.3 s
iter: 2700 train_loss: 1.5094  train_score: 0.5900  avg_train_score: 0.5932 val_score: 0.3729 val_loss: 2.3636 time(s): 2395.5 s
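
To narrow down where the extra time goes, here is a hedged diagnostic sketch (a hypothetical loop, not the code in Engineer.py) that times data loading and the optimization step separately; note that the time(s) column in the log covers everything that happens between two reports, including any evaluation:

import time

def timed_training(data_loader, step_fn, num_iters, report_every=100):
    data_time, step_time = 0.0, 0.0
    start = time.time()
    for i, batch in enumerate(data_loader, start=1):
        data_time += time.time() - start
        t0 = time.time()
        step_fn(batch)                      # forward + backward + optimizer step
        step_time += time.time() - t0
        if i % report_every == 0:
            print("iter %d: data %.1f s, step %.1f s" % (i, data_time, step_time))
            data_time, step_time = 0.0, 0.0
        if i >= num_iters:
            break
        start = time.time()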

Are the different kinds of attention in "image_attention.py" redundant?

Hi 😅
In image_attention.py there are three classes:

concatenate_attention
project_attention
double_project_attention

But they are not used anywhere; no other file or function ever calls them. (I think we don't need them because we use the MFH model.)
Only the top_down_attention class is used, in the build_image_attention_module function.
My questions are: are they redundant? And if we wanted to use the plain concatenate_attention or project_attention, should I modify the build_image_attention_module function to

return concatenate_attention(image_feat_dim, txt_rnn_embeding_dim, hidden_size)

?
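
For reference, a purely hypothetical sketch of the kind of dispatch being asked about, mirroring the pattern of build_question_encoding_module quoted further down in this page; the real build_image_attention_module signature and constructor arguments may well differ:

def build_image_attention_module(method, image_feat_dim, txt_rnn_embeding_dim, hidden_size):
    # Hypothetical dispatcher; argument names follow the snippet in the question above.
    if method == "top_down_attention":
        return top_down_attention(image_feat_dim, txt_rnn_embeding_dim, hidden_size)
    elif method == "concatenate_attention":
        return concatenate_attention(image_feat_dim, txt_rnn_embeding_dim, hidden_size)
    elif method == "project_attention":
        return project_attention(image_feat_dim, txt_rnn_embeding_dim, hidden_size)
    else:
        raise NotImplementedError("unknown image attention method %s" % method)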

fc7_w.pkl not found

I am trying to run vqa_standalone_image_demo. I have followed the steps to preprocess the data, but there are some files that are apparently necessary and there is no mention of how to get them.
fc7_w.pkl and fc7_b.pkl are the ones I am stuck on.

data_preprocess

In the step Preprocess dataset, the first command is cd ../../VQA_suite.
But there is no such directory, so I created it and ran the following commands.
First, I ran python data_prep/vqa_v2.0/extract_vocabulary.py --input_files ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_train2014_questions.json ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_val2014_questions.json ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_test2015_questions.json --out_dir data/
directly, and the result was "python: can't open file 'data_prep/vqa_v2.0/extract_vocabulary.py': [Errno 2] No such file or directory".

So I changed the command and ran python ../data_prep/vqa_v2.0/extract_vocabulary.py --input_files ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_train2014_questions.json ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_val2014_questions.json ../orig_data/vqa_v2.0/v2_OpenEnded_mscoco_test2015_questions.json --out_dir data/
and the result was "Traceback (most recent call last):
File "../data_prep/vqa_v2.0/extract_vocabulary.py", line 13, in <module>
from dataset_utils.text_processing import tokenize
ImportError: No module named dataset_utils.text_processing"

I don't know where my problem is and need your help. Thanks in advance!
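
A hedged workaround sketch for the ImportError: it suggests the repository root (the directory containing dataset_utils/) is not on the Python path when the script is run from another directory. Adding it explicitly at the top of extract_vocabulary.py, or exporting it via PYTHONPATH before running, should let the import succeed; the relative path below assumes the script stays at data_prep/vqa_v2.0/.

import os
import sys

# The repository root is two levels up from data_prep/vqa_v2.0/extract_vocabulary.py
REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
sys.path.insert(0, REPO_ROOT)

from dataset_utils.text_processing import tokenize  # the import that previously failed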

Inference on K40m GPU

I tried running inference with the pre-trained Pythia model on a K40m. It didn't start for quite some time, and then the ETA oscillated around 10-15 hours.
So I enabled multi-GPU training using the dataparallel flag. But now it doesn't start at all; I waited around 30 minutes before stopping it. On stopping, I got the following error:
"
File "/mnt/data_g/saransh/anaconda3/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
".
I've tried re-running it, but the problem persists. Earlier I tried it on a V100 and it worked fine there.
Could you suggest something?

Question on TextVQA - Pythia vs LoRRA

Hey

I have a few questions about the ablation studies conducted for the LoRRA model.

In Table 2 of the TextVQA paper, what is the difference between Pythia + O + C and Pythia + LoRRA? Is it that the second one also gets to choose its answers from a fixed lexicon (either SA or LA)? Is that the only difference between the two?

While reading section 3 of the paper I get the impression that LoRRA has a VQA part (image + question), a reading part (OCR tokens + question) and an answering module. This suggests that by LoRRA you mean the complete system. But in the experiments, Pythia + LoRRA is used to denote the best-performing model. This nomenclature is a bit confusing. Does it mean that to an existing Pythia-style model you add LoRRA, which on its own has VQA + reading + answering modules?

Broken urls

I tried on two of my computers, but the links after wget in "README/Quick Start" seem to be broken. Please check.

Also mkdir data seems unnecessary in "README/Quick Start".

Also in README, https://www.continuum.io/downloads seems to be an obsolete link to Anaconda.

size of rcnn_10_100.tar.gz

I downloaded rcnn_10_100.tar.gz twice and found that its size is about 33.8 GB.
But it should be 71.0 GB according to the AWS S3 dataset summary.

Using detectron features

Hi,

I am trying to run the model, but I am unable to download detectron or detectron_fix_100 (gunzip detectron_fix_100.tar.gz outputs gzip: detectron_fix_100.tar.gz: Input/output error). Is there a different link for the detectron features?

Thanks!

Why do we need to split config['data']['image_feat_train'][0]?

In train_model/helper.py

> num_image_feat = len(config['data']['image_feat_train'][0].split(','))

although in the config there is
__C.data.image_feat_train = ["rcnn_10_100/vqa/train2014", "rcnn_10_100/vqa/val2014"]
so config['data']['image_feat_train'][0] is equal to "rcnn_10_100/vqa/train2014".
Why do we need to split that string using .split(',')?

Also, what does this if-condition mean?

> if hasattr(my_model, 'module'):
>    model = my_model.module

This if-condition is false in my case (I mean hasattr(my_model, 'module') is false), but I don't know what those two lines mean.
Thank you so much for answering my last two questions, by the way 😊 your code and your paper are great guides for me ☺️
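
On the second question, a minimal sketch of what the hasattr(my_model, 'module') check is about: wrapping a model in nn.DataParallel stores the original network under the .module attribute, so code that needs the raw model (for example, to save a state_dict without the "module." prefix) unwraps it first.

import torch.nn as nn

net = nn.Linear(4, 2)
wrapped = nn.DataParallel(net)

print(hasattr(net, "module"))      # False: a plain model has no .module attribute
print(hasattr(wrapped, "module"))  # True: DataParallel keeps the original under .module

model = wrapped.module if hasattr(wrapped, "module") else wrapped
print(model is net)                # True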

What happens when you register a duplicate class name?

Let's say you register the following classes:

@registry.register_model("lorra")
class LoRRA(Pythia):
     ...


@registry.register_model("lorra")
class LoRRA_mod(Pythia):
     ...

I assume the class LoRRA_mod will override the class LoRRA. Is my assumption correct?
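
A minimal registry sketch (not MMF's actual implementation, which might warn or raise on duplicates): if the mapping is a plain dict, the later @register_model("lorra") simply overwrites the earlier entry, so only the last registered class is returned for that name.

class Registry:
    model_registry = {}

    @classmethod
    def register_model(cls, name):
        def wrap(model_cls):
            cls.model_registry[name] = model_cls   # silently replaces an existing entry
            return model_cls
        return wrap


@Registry.register_model("lorra")
class LoRRA:
    pass


@Registry.register_model("lorra")
class LoRRA_mod:
    pass


print(Registry.model_registry["lorra"].__name__)   # LoRRA_mod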

Pretrained model performance?

Hi. Thanks for the resourceful repository.

I was wondering if you could share the pre-trained model performances for the validation and the test sets for the various datasets.

Did you experience an increase in time while training?

In the code, the time is printed in the save_a_report and save_a_snapshot functions. I found that the time increases while training (it starts small and keeps increasing with more iterations), and sometimes it increases dramatically.
Example in save_a_snapshot:
from [iteration 6000]

i_epoch: 2 i_iter: 6000 val_loss:2.4745 val_acc:0.3974 runtime: 33.43 min

to [iteration 7000]

i_epoch: 3 i_iter: 7000 val_loss:2.4929 val_acc:0.3963 runtime: 260.19 min

Example in save_a_report:
from [iteration 9200]

iter: 9200 train_loss: 1.1883  train_score: 0.6186  avg_train_score: 0.6580 val_score: 0.3975 val_loss: 2.6476 time(s): 195.1 s

to [iterations 9300, 9400]

iter: 9300 train_loss: 1.1513  train_score: 0.6588  avg_train_score: 0.6578 val_score: 0.3871 val_loss: 2.5129 time(s): 1371.4 s
iter: 9400 train_loss: 1.1269  train_score: 0.6826  avg_train_score: 0.6573 val_score: 0.4008 val_loss: 2.6888 time(s): 205.5 s

So in general it increases continuously (by small steps) while training, and sometimes it increases dramatically (by big steps).
Another example:
from [the first thousand iterations]

BEGIN TRAINING...
iter: 100 train_loss: 3.6458  train_score: 0.3031  avg_train_score: 0.1377 val_score: 0.3258 val_loss: 3.4013 time(s): 199.5 s
iter: 200 train_loss: 3.1655  train_score: 0.3158  avg_train_score: 0.2387 val_score: 0.3410 val_loss: 2.9842 time(s): 228.1 s
iter: 300 train_loss: 2.6502  train_score: 0.3777  avg_train_score: 0.3034 val_score: 0.3326 val_loss: 2.8516 time(s): 192.2 s
iter: 400 train_loss: 2.3548  train_score: 0.4258  avg_train_score: 0.3544 val_score: 0.3467 val_loss: 2.5927 time(s): 193.4 s
iter: 500 train_loss: 2.1484  train_score: 0.4705  avg_train_score: 0.4003 val_score: 0.3934 val_loss: 2.5520 time(s): 215.1 s
iter: 600 train_loss: 2.1211  train_score: 0.4840  avg_train_score: 0.4367 val_score: 0.3977 val_loss: 2.4975 time(s): 183.0 s
iter: 700 train_loss: 2.0060  train_score: 0.4648  avg_train_score: 0.4661 val_score: 0.3475 val_loss: 2.6645 time(s): 182.8 s
iter: 800 train_loss: 1.8998  train_score: 0.5230  avg_train_score: 0.4891 val_score: 0.3543 val_loss: 2.5015 time(s): 187.2 s
iter: 900 train_loss: 1.8344  train_score: 0.5258  avg_train_score: 0.5037 val_score: 0.3783 val_loss: 2.4491 time(s): 185.4 s
iter: 1000 train_loss: 1.7774  train_score: 0.5184  avg_train_score: 0.5165 val_score: 0.3938 val_loss: 2.5243 time(s): 183.8 s
i_epoch: 1 i_iter: 1000 val_loss:2.4742 val_acc:0.3838 runtime: 34.87 min

to [the thirteenth thousand iterations]

i_epoch: 5 i_iter: 13000 val_loss:2.7267 val_acc:0.3917 runtime: 54.67 min
iter: 13100 train_loss: 1.0795  train_score: 0.6867  avg_train_score: 0.6843 val_score: 0.3723 val_loss: 2.8208 time(s): 1550.9 s
iter: 13200 train_loss: 1.1232  train_score: 0.6627  avg_train_score: 0.6836 val_score: 0.4021 val_loss: 2.8624 time(s): 196.6 s
iter: 13300 train_loss: 1.0556  train_score: 0.6756  avg_train_score: 0.6826 val_score: 0.4186 val_loss: 2.5904 time(s): 210.9 s
iter: 13400 train_loss: 1.0774  train_score: 0.6979  avg_train_score: 0.6825 val_score: 0.4125 val_loss: 2.5742 time(s): 207.8 s
iter: 13500 train_loss: 1.0958  train_score: 0.6840  avg_train_score: 0.6843 val_score: 0.4084 val_loss: 2.5981 time(s): 201.2 s
iter: 13600 train_loss: 1.0693  train_score: 0.6816  avg_train_score: 0.6870 val_score: 0.4365 val_loss: 2.5409 time(s): 202.8 s
iter: 13700 train_loss: 1.1302  train_score: 0.6598  avg_train_score: 0.6871 val_score: 0.3939 val_loss: 2.7158 time(s): 197.0 s
iter: 13800 train_loss: 1.0662  train_score: 0.6736  avg_train_score: 0.6859 val_score: 0.3746 val_loss: 2.7563 time(s): 197.9 s
iter: 13900 train_loss: 1.0325  train_score: 0.6984  avg_train_score: 0.6857 val_score: 0.3762 val_loss: 2.9416 time(s): 214.2 s
iter: 14000 train_loss: 0.9614  train_score: 0.7232  avg_train_score: 0.6857 val_score: 0.3832 val_loss: 2.6673 time(s): 270.2 s
i_epoch: 5 i_iter: 14000 val_loss:2.6989 val_acc:0.3935 runtime: 60.91 min

I tried PyTorch 0.4 and PyTorch 1.0.
PS: I am training with the datasets [imdb_train2014.npy, imdb_val2train2014.npy, imdb_genome.npy, imdb_vdtrain.npy], but I don't think this makes any difference.

LR hyperparameters tuning method

Hi,

Thanks again for your code. Unfortunately, I ran into a little issue. I can't reproduce some of your results because I am obliged to reduce my batch size (from 512 (yours) to 75). Thus I need to change the hyperparameters related to the learning rate.

Finding the right learning rate can easily be done with a small grid search. However, I would like to know how you tuned the hyperparameters related to the scheduler.
Especially:

  • __C.training_parameters.wu_factor = 0.2
  • __C.training_parameters.wu_iters = 1000
  • __C.training_parameters.lr_steps = [5000, 7000, 9000, 11000]
  • __C.training_parameters.lr_ratio = 0.1

Sharing your method would be awesome :)

Thanks for your help!
Remi
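
One common heuristic (the linear scaling rule; not necessarily the procedure the authors used) is to scale the learning rate with the batch size and stretch the warmup and decay schedule so it covers the same number of examples. A sketch with an assumed base learning rate:

base_batch_size = 512
base_lr = 0.01                      # assumed; take the real base LR from the config

new_batch_size = 75
scale = new_batch_size / base_batch_size

new_lr = base_lr * scale
new_wu_iters = int(1000 / scale)                           # warmup sees the same number of examples
new_lr_steps = [int(s / scale) for s in (5000, 7000, 9000, 11000)]

print(new_lr, new_wu_iters, new_lr_steps)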

404 File not found when downloading VQA2.0

Hi, guys!
I was wondering why the Annotations and Questions for VQA 2.0 are inaccessible to me. I get
ERROR 404: Not Found.
when running the file download_vqa_2.0.sh.
Any advice? Thanks!

Adding mirror data makes training worse

data:
  batch_size: 512
  data_root_dir: data
  dataset: vqa_2.0
  image_depth_first: false
  image_fast_reader: false
  image_feat_test:
  - detectron_fix_100/fc6/vqa/val2014/
  image_feat_train:
  - detectron_fix_100/fc6/vqa/train2014/
  - detectron_fix_100/fc6/vqa_mirror/train2014
  image_feat_val:
  - detectron_fix_100/fc6/vqa/val2014/
  image_max_loc: 100
  imdb_file_test:
  - imdb/imdb_minival2014.npy
  imdb_file_train:
  - imdb/imdb_train2014.npy
  - imdb/imdb_mirror_train2014.npy
  imdb_file_val:
  - imdb/imdb_val2014.npy
  num_workers: 20
  question_max_len: 14
  vocab_answer_file: answers_vqa.txt
  vocab_question_file: vocabulary_vqa.txt
exp_name: intrainter_baseline

Hi, when I add the mirror data to training, the validation performance drops a lot.
Do you have any suggestions?

Broken link in README

The link for BAN in the README is broken

Model Zoo: Reference implementations for state-of-the-art vision and language models including LoRRA (SoTA on VQA and TextVQA), Pythia model (VQA 2018 challenge winner) and BAN.

Is a GRU used in question embedding?

In the paper, you mention that "we used 300D GloVe [11] vectors to initialize the word embeddings and then passed it to a GRU network and a question attention module to extract attentive text features".

However, in question_embeding.py there are two methods for question embedding. As I see in the config, you're using att_que_embed, which does not pass through a GRU layer.

def build_question_encoding_module(method, par, num_vocab):
    if method == "default_que_embed":
        return QuestionEmbeding(num_vocab, **par)
    elif method == "att_que_embed":
        return AttQuestionEmbedding(num_vocab, **par)
    else:
        raise NotImplementedError(
            "unknown question encoding model %s" % method)

class QuestionEmbeding(nn.Module):
    def __init__(self, **kwargs):
        super(QuestionEmbeding, self).__init__()
        self.text_out_dim = kwargs['LSTM_hidden_size']
        self.num_vocab = kwargs['num_vocab']
        self.embedding_dim = kwargs['embedding_dim']
        self.embedding = nn.Embedding(
            kwargs['num_vocab'], kwargs['embedding_dim'])
        self.gru = nn.GRU(
            input_size=kwargs['embedding_dim'],
            hidden_size=kwargs['LSTM_hidden_size'],
            num_layers=kwargs['lstm_layer'],
            dropout=kwargs['lstm_dropout'],
            batch_first=True)
        self.batch_first = True

        if 'embedding_init' in kwargs and kwargs['embedding_init'] is not None:
            self.embedding.weight.data.copy_(
                torch.from_numpy(kwargs['embedding_init']))

    def forward(self, input_text):
        embeded_txt = self.embedding(input_text)
        out, hidden_state = self.gru(embeded_txt)
        res = out[:, -1]
        return res


class AttQuestionEmbedding(nn.Module):
    def __init__(self, num_vocab, **kwargs):
        super(AttQuestionEmbedding, self).__init__()
        self.embedding = nn.Embedding(num_vocab, kwargs['embedding_dim'])
        self.LSTM = nn.LSTM(input_size=kwargs['embedding_dim'],
                            hidden_size=kwargs['LSTM_hidden_size'],
                            num_layers=kwargs['LSTM_layer'],
                            batch_first=True)
        self.Dropout = nn.Dropout(p=kwargs['dropout'])
        self.conv1 = nn.Conv1d(
            in_channels=kwargs['LSTM_hidden_size'],
            out_channels=kwargs['conv1_out'],
            kernel_size=kwargs['kernel_size'],
            padding=kwargs['padding'])
        self.conv2 = nn.Conv1d(
            in_channels=kwargs['conv1_out'],
            out_channels=kwargs['conv2_out'],
            kernel_size=kwargs['kernel_size'],
            padding=kwargs['padding'])
        self.text_out_dim = kwargs['LSTM_hidden_size'] * kwargs['conv2_out']

        if 'embedding_init_file' in kwargs \
                and kwargs['embedding_init_file'] is not None:
            if os.path.isabs(kwargs['embedding_init_file']):
                embedding_file = kwargs['embedding_init_file']
            else:
                embedding_file = os.path.join(
                    cfg.data.data_root_dir, kwargs['embedding_init_file'])
            embedding_init = np.load(embedding_file)
            self.embedding.weight.data.copy_(torch.from_numpy(embedding_init))

    def forward(self, input_text):
        batch_size, _ = input_text.data.shape
        embed_txt = self.embedding(input_text)          # N * T * embedding_dim

        # self.LSTM.flatten_parameters()
        lstm_out, _ = self.LSTM(embed_txt)  # N * T * LSTM_hidden_size
        lstm_drop = self.Dropout(lstm_out)  # N * T * LSTM_hidden_size
        lstm_reshape = lstm_drop.permute(0, 2, 1)  # N * LSTM_hidden_size * T

        qatt_conv1 = self.conv1(lstm_reshape)  # N x conv1_out x T
        qatt_relu = F.relu(qatt_conv1)
        qatt_conv2 = self.conv2(qatt_relu)  # N x conv2_out x T

        qtt_softmax = F.softmax(qatt_conv2, dim=2)
        # N * conv2_out * LSTM_hidden_size
        qtt_feature = torch.bmm(qtt_softmax, lstm_drop)
        # N * (conv2_out * LSTM_hidden_size)
        qtt_feature_concat = qtt_feature.view(batch_size, -1)

        return qtt_feature_concat

Performance of pre-trained model

Using the downloaded pretrained Pythia model, I'm only getting 66.7% overall accuracy on test-dev, which is much lower than the reported single-model accuracy. Am I doing something wrong?
I tried downloading the train+dev model from https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/pythia_train_val.pth
but the link seems broken.

Also, how much GPU memory is needed to train the model without any changes? I have a V100 but I'm getting out-of-memory errors.
Thanks

Error during setup

Hi, I keep hitting this error and cannot figure out what is happening. Can you give me some suggestions on how to solve it?

Below is the screen output.

running develop
Checking .pth file support in /usr/local/lib/python3.5/dist-packages/
/usr/bin/python3 -E -c pass
TEST PASSED: /usr/local/lib/python3.5/dist-packages/ appears to support .pth files
running egg_info
writing pythia.egg-info/PKG-INFO
writing dependency_links to pythia.egg-info/dependency_links.txt
writing requirements to pythia.egg-info/requires.txt
writing top-level names to pythia.egg-info/top_level.txt
reading manifest file 'pythia.egg-info/SOURCES.txt'
writing manifest file 'pythia.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.5/dist-packages/pythia.egg-link (link to .)
pythia 0.3 is already the active version in easy-install.pth

Installed /home/victor/VQA/Pythia
Processing dependencies for pythia==0.3
Searching for fastText
Best match: fastText [unknown version]
Downloading https://github.com/facebookresearch/fastText/tarball/master#egg=fastText

Processing master
Writing /tmp/easy_install-37swd4hf/facebookresearch-fastText-6dd2e11/setup.cfg
Running facebookresearch-fastText-6dd2e11/setup.py -q bdist_egg --dist-dir /tmp/easy_install-37swd4hf/facebookresearch-fastText-6dd2e11/egg-dist-tmp-htcy2yd7
warning: no files found matching 'PATENTS'
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
error: Setup script exited with error: SandboxViolation: mkdir('/home/victor/.local/lib', 448) {}

The package setup script has attempted to modify files on your system
that are not within the EasyInstall build area, and has been aborted.

This package cannot be safely installed by EasyInstall, and may not
support alternate installation locations even if you run its setup
script by hand.  Please inform the package's author and the EasyInstall
maintainers to find out if a fix or workaround is available.

What does "imdb" refer to?

I know it's a silly question 😅 Also, what are layout_max_len, vocab_layout_file and has_gt_layout? I mean, what is a "layout"? 😅
