linjieli222 / vqa_regat

176.0 6.0 38.0 1.36 MB

Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"

Home Page: https://arxiv.org/abs/1903.12314

License: MIT License

Python 97.05% Shell 2.95%

pytorch attention vqa

vqa_regat's Issues

about spa_adj_matrix & sem_adj_matrix

Sincere thanks for sharing your code.

I have some questions about adjacency matrix generation. Generally, the semantic graph is directed and the spatial graph is undirected, so I would like to know how both adjacency matrices are generated. Could you share the code for them?

Some questions about semantic relationship classification

Hi author,

I find your paper very interesting and want to use it in my model, but I have some questions about semantic relationship classification.

You said you keep the top 15 relationships after normalizing the predicates with relationship aliases, such as wearing and holding. Can I ask:

How do you handle the other relationships that are not in the top-15 set? Do you set them to no-relation or drop them? If you set them to the no-relation label, the no-relation set could be far larger than the other labels. How do you deal with this problem?

Thanks.
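
The two options raised in this question can be compared on toy data. The sketch below is not the authors' preprocessing: the alias table, the two-item stand-in for the top-15 set, the toy predicate list, and the function name `encode_predicates` are all hypothetical and only illustrate how the label counts differ between folding rare predicates into no-relation and dropping them.

```python
from collections import Counter

# Hypothetical alias table and label set, for illustration only.
ALIASES = {"wears": "wearing", "holds": "holding"}
TOP_K = ["wearing", "holding"]  # stand-in for the top-15 predicates
LABELS = {name: idx + 1 for idx, name in enumerate(TOP_K)}  # 0 is no-relation


def encode_predicates(predicates, keep_rare_as_no_relation):
    """Map raw predicates to label ids, folding rare ones into 0 or dropping them."""
    encoded = []
    for raw in predicates:
        name = ALIASES.get(raw, raw)  # normalize aliases first
        if name in LABELS:
            encoded.append(LABELS[name])
        elif keep_rare_as_no_relation:
            encoded.append(0)
        # otherwise the rare predicate is dropped
    return encoded


toy = ["wears", "holding", "eating", "riding", "wearing", "behind"]
folded = Counter(encode_predicates(toy, keep_rare_as_no_relation=True))
dropped = Counter(encode_predicates(toy, keep_rare_as_no_relation=False))
print("folded:", dict(folded))
print("dropped:", dict(dropped))
```

On this toy list, folding makes label 0 the largest class, which is the imbalance the question describes, while dropping keeps only the named predicates.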

Loss can't backward

Hi, I encountered an issue where the loss can't be backpropagated. It seems some operation modifies a tensor in place, but I can't find where it is. Could you please look into it?
Here is the stack trace:
Traceback (most recent call last):
  File "/home/qiyuan/2021summer/VQA_ReGAT/main.py", line 275, in <module>
    main()
  File "/home/qiyuan/2021summer/VQA_ReGAT/main.py", line 271, in main
    train(model, train_loader, eval_loader, args, device)
  File "/home/qiyuan/2021summer/VQA_ReGAT/train.py", line 111, in train
    loss.backward()
  File "/home/qiyuan/miniconda3/envs/torch14/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/qiyuan/miniconda3/envs/torch14/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.
0%| | 0/3467 [00:04<?, ?it/s]

Process finished with exit code 1

Thanks.

Training with Visual Genome

When trying to train with the train/val splits and Visual Genome
python main.py --config config/butd_vqa.json --seed 1 --use_both --use_vg

I got this error
nParams= 41192896
optim: adamax lr=0.0010, decay_step=2, decay_rate=0.25,grad_clip=0.25
LR decay epochs: 15,17,19
gradual warmup lr: 0.0005
Traceback (most recent call last):
  File "main.py", line 289, in <module>
    train(model, train_loader, eval_loader, args, device)
  File "train.py", line 91, in train
    sem_adj_matrix) in enumerate(train_loader):
  File "python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "utils.py", line 172, in trim_collate
    return [trim_collate(samples) for samples in transposed]
  File "utils.py", line 172, in <listcomp>
    return [trim_collate(samples) for samples in transposed]
  File "utils.py", line 161, in trim_collate
    return torch.LongTensor(batch)
TypeError: an integer is required (got type str)

I think the problem is in the process of encoding Genome questions. Do you have any ideas?
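
The traceback ends in `torch.LongTensor(batch)` receiving a `str`, so at least one field of a sample is text where an integer is expected. A minimal sketch for locating such a field is shown below. The helper name `find_string_fields` and the toy sample are hypothetical, and which field is affected in the real Visual Genome entries is not known from the issue.

```python
def find_string_fields(sample, path="sample"):
    """Return the paths of all str values nested inside a dataset sample."""
    found = []
    if isinstance(sample, str):
        found.append(path)
    elif isinstance(sample, (list, tuple)):
        for idx, item in enumerate(sample):
            found.extend(find_string_fields(item, f"{path}[{idx}]"))
    elif isinstance(sample, dict):
        for key, item in sample.items():
            found.extend(find_string_fields(item, f"{path}[{key!r}]"))
    return found


# Toy sample: the third field is a hypothetical id stored as text.
toy_sample = ([0.1, 0.2], [3, 7, 9], "2370799", 0)
print(find_string_fields(toy_sample))
```

Calling the helper on `dataset[i]` for a few Visual Genome indices would show which position in the returned tuple needs an `int(...)` conversion before collation.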

How to test your model with image and text input by a user?

Following the steps in the README, I successfully ran eval.py, but it only gives me an eval score.
So, how can I test your pretrained models with an image and text supplied by a user?

Questions for categories

Dear scholar, where can I find the 11 categories for the spatial graph and the 14 categories for the semantic graph? Thanks.

pos_box/bb

What is the meaning of pos_boxes?
In the adaptive setting, where the number of bounding boxes per image is not fixed, how can the final bounding boxes be obtained from pos_boxes?

GPU will low memory

Thank you for your code.
I have 8 GPUs with 12 GB each. How can I modify the code to run training?

Learning rate related issues

I am very interested in your work, but I have some doubts about the learning rate setting. I currently have only one 2080 Ti graphics card. How should the learning rate be adjusted?

Test with new images

Can you provide the pretrained model for generating visual features, so that I can test with new images?

Can you update a new download.sh?

Thank you for sharing your great work.
When I use "source tools/download.sh" to download the preprocessed data, some download links in download.sh are invalid. Can you provide an updated download.sh?

Code to build the spatial adjacency matrix

Hi,

Thanks for sharing the code. Could you also share the code for building the adjacency matrix of spatial relations from bounding boxes?

Thanks.
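
Until the original preprocessing code is shared, the sketch below shows one way such a matrix could be built from bounding boxes. It is not the authors' code. It assumes an 11-class scheme of inside, cover, overlap, and eight direction bins plus a no-relation value of 0. The label numbering, the 0.5 thresholds, the `(x1, y1, x2, y2)` box format, and all function names are assumptions made for illustration.

```python
import math

NO_RELATION, INSIDE, COVER, OVERLAP = 0, 1, 2, 3  # labels 4..11 are direction bins


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True when inner lies completely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def spatial_label(box_i, box_j, image_w, image_h):
    """Label for the directed edge i -> j."""
    if contains(box_j, box_i):
        return INSIDE
    if contains(box_i, box_j):
        return COVER
    if iou(box_i, box_j) >= 0.5:
        return OVERLAP
    dx = (box_j[0] + box_j[2]) / 2 - (box_i[0] + box_i[2]) / 2
    dy = (box_j[1] + box_j[3]) / 2 - (box_i[1] + box_i[3]) / 2
    # Boxes far apart relative to the image diagonal get no edge.
    if math.hypot(dx, dy) / math.hypot(image_w, image_h) >= 0.5:
        return NO_RELATION
    # Angle in image coordinates (y grows downwards), split into 8 bins of 45 degrees.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return 4 + min(7, int(angle // 45.0))


def spatial_adj_matrix(boxes, image_w, image_h):
    """N x N matrix of edge labels with a zero diagonal."""
    n = len(boxes)
    return [[NO_RELATION if i == j
             else spatial_label(boxes[i], boxes[j], image_w, image_h)
             for j in range(n)] for i in range(n)]


boxes = [(0, 0, 10, 10), (2, 2, 4, 4), (20, 0, 30, 10), (90, 90, 100, 100)]
adj = spatial_adj_matrix(boxes, image_w=100, image_h=100)
for row in adj:
    print(row)
```

Because the label depends on the direction of the edge, `adj[i][j]` and `adj[j][i]` generally differ (inside versus cover, or opposite direction bins), even though an edge exists in both directions.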

Can you tell me the specific label of the semantic relationship type?

Hi, It's a great job!
In your semantic relationship encoding, you use one-hot encoding in the semantic matrix. I want to use word embeddings to encode the semantic relationships, so can you tell me the specific label of each semantic relationship type? For example, 0: no_relation, 1: wearing, 2: sitting on, 3: standing on, ..., 15: looking at. Or have you tried encoding them with word embeddings, and how effective was it?

Thank you very much. I'm looking forward to your reply.

Does the setting of num_workers in DataLoader affect the final result?

Hello, I am very interested in your work. Does the setting of num_workers in DataLoader affect the final result? Because of a problem with my workstation, I had to set num_workers=0 in DataLoader; otherwise I get an error.

What are the specific categories in your semantic relationship classification?

about weighted sum of the three modules

Dear scholar,
Thanks for your perfect work. I want to ask on which dataset the trade-off weights were decided. Were they chosen on the val set? If so, could the result of the model that integrates the three modules be slightly biased? I think hyper-parameters can be tuned on the val set, but the final model result should be decided on the test set.
I eagerly await your reply.

Question about dir(i,j) matrix in equation(8)

Sincere thanks for sharing your code.

However, I found that the dir matrices dir[0] and dir[1] are actually the same for the spatial relation. That is, the condensed_adj_matrix in condensed_adj_matrix = torch.sum(input_adj_matrix, dim=-1). If condensed_adj_matrix[0][i][j] = 1, then condensed_adj_matrix[1][i][j] = 1 too, which means bbx[0] and bbx[1] have a spatial relation (edge). So I am very confused about it.
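
One way to read this observation: summing a one-hot label tensor over its last axis discards the label and keeps only an edge indicator. If every spatial edge i -> j has a reverse edge j -> i, that indicator is symmetric, so it equals its own transpose even though the labels in the two directions differ. The toy example below illustrates this in plain Python. The three-label tensor is made up, and the idea that the second direction uses the transposed matrix is an assumption about the code, not something stated in the issue.

```python
# Toy one-hot adjacency for 2 boxes and 3 hypothetical labels.
# input_adj[i][j][k] == 1 means edge i -> j carries label k.
input_adj = [
    [[0, 0, 0], [0, 1, 0]],  # edge 0 -> 1 has label 1
    [[0, 0, 1], [0, 0, 0]],  # edge 1 -> 0 has label 2 (the reverse relation)
]

# Plain-Python equivalent of torch.sum(input_adj_matrix, dim=-1).
condensed = [[sum(one_hot) for one_hot in row] for row in input_adj]

# Transpose of the condensed matrix (assumed here to stand for the other direction).
transposed = [list(col) for col in zip(*condensed)]

# Recover the label of each edge, or None when there is no edge.
label_of = [[one_hot.index(1) if 1 in one_hot else None for one_hot in row]
            for row in input_adj]

print(condensed)
print(transposed)
print(label_of)
```

In this toy case the two edge masks are identical, while `label_of[0][1]` and `label_of[1][0]` still differ, so any direction-specific information would have to come from the labels or from separate weights per direction rather than from the mask.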

Best parameter configuration?

Could I ask for the best parameter configuration for your code?
Looking forward to your reply!

Code for learned image and question attentions

Dear author, you showed some examples of learned image attentions (Figure 4).
Can you post the code you used to extract and create the learned image attentions? Thank you!

The semantic_embedding and spatial_embedding types

Hi, this is great work for VQA. I didn't download the datasets, so I want to know the types of semantic_embedding and spatial_embedding: are they one-hot embeddings, word embeddings, or features extracted from a model? I'm looking forward to your reply, thanks!

How can I compute sem_adj_matrix and spa_adj_matrix?

How can I compute sem_adj_matrix and spa_adj_matrix? Where is the related code?

Code for extracting image features

Could you provide the code for extracting image features for this paper?

loading features from the fixed h5 file

Hi Lin, thanks for your code.
When I used the fixed image features from the h5 file, I got the following error:
loading dictionary from data/glove/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
  File "/home/xatu/anaconda2/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/home/xatu/anaconda2/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'data/Bottom-up-features-adaptive/train36.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The adaptive image features work fine, though. Can you suggest a solution? Thank you very much!

unhandled cuda error

How should I deal with this error?
terminate called after throwing an instance of 'std::runtime_error'
  what(): NCCL Error 1: unhandled cuda error
Aborted (core dumped)

Low accuracy when retraining with the provided config files of the pretrained models

Thank you for sharing your great work.
I used pretrained_models/regat_implicit/ban_1_implicit_vqa_196/hps.json for training on 2 GPUs with 10 GB each. The CUDA version is 10.0 and Python is 3.6. All datasets are downloaded.
The training dataset is VQA 2.0.

Here is my detailed config:

     epochs: 20  
     base_lr: 0.001  
     lr_decay_start: 15  
     lr_decay_rate: 0.25  
     lr_decay_step: 2  
     lr_decay_based_on_val: false  
     grad_accu_steps: 1  
     grad_clip: 0.25  
     weight_decay: 0  
     batch_size: 128  
     output: "saved_models/regat_implicit/ban_1_implicit_vqa_196"  
     save_optim: false  
     log_interval: -1  
     seed: 196  
     checkpoint: ""  
     dataset: "vqa"  
     data_folder: "./data"  
     use_both: false  
     use_vg: false  
     adaptive: true  
     relation_type: "implicit"  
     fusion: "ban"  
     tfidf: true  
     op: "c"  
     num_hid: 1024  
     ban_gamma: 1  
     mutan_gamma: 2  
     imp_pos_emb_dim: 64  
     spa_label_num: 11  
     sem_label_num: 15  
     dir_num: 2  
     relation_dim: 1024  
     nongt_dim: 20  
     num_heads: 16  
     num_steps: 1  
     residual_connection: true  
     label_bias: false  
     lr_decrease_start: 15

The results are very poor. After 20 epochs, the log.txt shows:

epoch 15, time: 746.65  
	train_loss: 4.19, norm: 3.2776, score: 44.76  
	eval score: 43.99 (92.66)  
	entropy:  0.00  
saving current model weights to folder  
lr: 0.0005  
epoch 16, time: 726.26  
	train_loss: 4.16, norm: 4.1697, score: 45.21  
	eval score: 44.36 (92.66)  
	entropy:  0.01  
saving current model weights to folder  
decreased lr: 0.0001  
epoch 17, time: 750.88  
	train_loss: 4.12, norm: 2.4974, score: 45.56  
	eval score: 44.41 (92.66)  
	entropy:  0.01  
saving current model weights to folder  
lr: 0.0001  
epoch 18, time: 745.58  
	train_loss: 4.16, norm: 4.6050, score: 45.61  
	eval score: 44.47 (92.66)  
	entropy:  0.01  
saving current model weights to folder  
decreased lr: 0.0000  
epoch 19, time: 743.70  
	train_loss: 4.10, norm: 3.3921, score: 45.80  
	eval score: 44.46 (92.66)  
	entropy:  0.01

I also trained on the dataset using pretrained_models/regat_implicit/butd_implicit_vqa_6371, and it reaches 58.

Can you give me some advice about reproducing the accuracy score of the paper?
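
One detail of the log above can be checked with simple arithmetic: the line "decreased lr: 0.0000" does not necessarily mean the learning rate became zero. The sketch below assumes the log prints the value with four decimals (an assumption about the logging format) and applies the posted `lr_decay_rate` of 0.25 twice to the logged value of 0.0005.

```python
lr_at_decay_start = 0.0005  # value logged as "lr: 0.0005" at epoch 15
lr_decay_rate = 0.25        # from the posted config

lr = lr_at_decay_start
exact_values = []
printed_values = []
for _ in range(2):  # the log shows two "decreased lr" lines
    lr *= lr_decay_rate
    exact_values.append(lr)
    printed_values.append(f"{lr:.4f}")  # assumed four-decimal formatting

print(exact_values)
print(printed_values)
```

Under that assumption the two decays give roughly 1.25e-4 and 3.1e-5, which would be shown as 0.0001 and 0.0000, matching the two "decreased lr" lines. The schedule itself therefore looks consistent with the config, and the low score probably has another cause.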

Training confusion

During training, when constructing the model, is it intended that val_dset is used?
model = build_regat(val_dset, args).to(device)

Where do you add b_{lab(i,j)} to your value term (see eq{2} in the paper)

Hi Authors,

Thanks for releasing the code! This is very helpful.

I was wondering where you add the bias term $b_{lab(i,j)}$, as mentioned in equation 2 of your paper, in this code.

I was looking at Line 155 here:
https://github.com/linjieli222/VQA_ReGAT/blob/master/model/graph_att_layer.py#L155

It seems like the attention weights are directly multiplied to Value vector. If you can point us to where in the code are you adding the bias term $b_{lab}$, that will be great.

Thanks!

download.sh

Hi, I don't understand the two identical download codes......

# VQA cp-v2 Annotations
mkdir data/cp_v2_annotations
wget -P data/cp_v2_annotations https://computing.ece.vt.edu/~aish/vqacp/vqacp_v2_train_annotations.json
wget -P data/cp_v2_annotations https://computing.ece.vt.edu/~aish/vqacp/vqacp_v2_train_annotations.json

attention map

Hello, I would like to know how to obtain the attention map in Figure 4, and which attention map it is.

Could you please share the code on how to get semantic and spatial information in the .hdf5 files?

Hi, this is great work. I'm very interested in 'image_adj_matrix' and 'semantic_adj_matrix', which represent the spatial and semantic information, respectively. In your project, these two important pieces of information are already written into the .hdf5 files, so could you please share the code or pretrained models used to obtain them? I'm looking forward to your reply, thanks!
Best wishes to you!

W_{dir(i,j)} in Equation 8 of the explicit model

Dear scholar,
I want to ask whether the dimension of $W_{dir(i,j)}$ is $d_h \times (d_q + d_v)$ and whether the bias $b_{lab(i,j)}$ is a one-hot vector.
I am also unsure about the meaning of $W_{dir(i,j)}$.

Can you provide a well-trained model?

Hello, I just want to use your model to test directly on other data sets. Can you provide the model that is finally used for testing?

weighted sum confusion

I want to work on the VQA 2.0 dataset.

Could you please explain how you implemented the following statement in code?
Our best results are achieved by combining the best single relation models through weighted sum

I mean, how do we combine all the models when evaluating with
python3 eval.py --output_folder pretrained_models/regat_implicit/ban_1_implicit_vqa_196

Also, please let me know whether you used the BAN model in place of BUTD for the best results when using the weighted sum.
python3 main.py --config config/butd_vqa.json
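
As a rough illustration of what "combining through weighted sum" could mean, the sketch below adds up per-answer scores from three single-relation models using fixed weights and then picks the highest combined score. It is only a guess at the general idea, not the authors' implementation. The weights, the toy scores, and the function name `weighted_sum` are hypothetical, and in practice each score list would come from a separate evaluation run of one model.

```python
def weighted_sum(scores_per_model, weights):
    """Combine per-model answer scores for one question with a weighted sum."""
    num_answers = len(scores_per_model[0])
    combined = [0.0] * num_answers
    for scores, weight in zip(scores_per_model, weights):
        for idx, score in enumerate(scores):
            combined[idx] += weight * score
    return combined


# Toy scores over 3 candidate answers from the three relation models.
implicit = [0.2, 0.7, 0.1]
spatial = [0.6, 0.3, 0.1]
semantic = [0.5, 0.4, 0.1]
weights = [0.5, 0.25, 0.25]  # hypothetical trade-off weights

combined = weighted_sum([implicit, spatial, semantic], weights)
best_answer = max(range(len(combined)), key=combined.__getitem__)
print(combined, best_answer)
```

Here two of the three models prefer answer 0, but the larger weight on the implicit model makes answer 1 win, which is why the choice of weights (and the data they are tuned on) matters.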

GAT Loss Function

Hi,
Thanks for your great work! I have a question about the GAT. I can't find a loss function for the GAT in your work. Is the GAT supervised, and how is it trained in the current work?

Loading cache/val_target.pkl

When cache/val_target.pkl is loaded, some labels appear with no content. I looked at the annotations on the official website and saw that your answers differ from them. For example, for question_id=393225000 the answer is foodiebakercom, but your labels are []. Some labels also differ from the official ones. For example, for question_id=393225001 your label is 4, but it should correspond to 55.
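
A possible explanation, offered here as an assumption rather than a confirmed fact about this repository: in many VQA codebases the target file stores indices into a separate answer vocabulary, so a label such as 4 is a position in that vocabulary rather than an official answer id, and a rare answer that is not in the vocabulary leaves an empty label list. The sketch below shows the decoding step with a toy vocabulary. The entry keys, the `label2ans` list, and the function name are hypothetical.

```python
def decode_entry(entry, label2ans):
    """Translate label indices of one target entry back into answer strings."""
    return [(label2ans[label], score)
            for label, score in zip(entry["labels"], entry["scores"])]


# Toy answer vocabulary and target entries (made-up values).
label2ans = ["yes", "no", "2", "white", "red"]
entries = [
    {"question_id": 1, "labels": [4], "scores": [1.0]},
    {"question_id": 2, "labels": [], "scores": []},  # answer outside the vocabulary
]

for entry in entries:
    print(entry["question_id"], decode_entry(entry, label2ans))
```

If the real files follow this layout, decoding the labels through the accompanying vocabulary file would be the way to compare them with the official annotations.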

The difference between implicit relation and spatial relation.

Sincere thanks for sharing your code.

I am not clear about the difference between the implicit relation and the spatial relation. The implicit relation uses bbox coordinates to compute a bbox weight that encodes relative geometric position, while the spatial relation uses bbox coordinates to build a spatial graph over which attention is computed. Is there a fundamental difference between the two?

There may be a better memory allocation scheme when loading datasets

I found this code in dataset.py.

with h5py.File(h5_path, 'r') as hf:
  self.features = np.array(hf.get('image_features'))
  self.normalized_bb = np.array(hf.get('spatial_features'))

That will cause huge memory usage. I think you could use something like feat_dset = hf['image_features']; feat_dset[index] in __getitem__.
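
The suggestion above can be sketched as a small wrapper that opens the file on first access and reads one row per call. This is a minimal illustration, not a drop-in replacement for dataset.py: the class name `LazyH5Features` and the `opener` parameter are hypothetical (the latter exists only so the sketch runs without h5py installed), and it assumes one row per image, which does not cover the adaptive layout indexed through pos_boxes.

```python
class LazyH5Features:
    """Read one item at a time instead of copying whole datasets into RAM."""

    def __init__(self, h5_path, opener=None):
        self.h5_path = h5_path
        self._opener = opener  # hypothetical hook for supplying a stand-in file
        self._file = None

    def _open(self):
        # Open on first access rather than in __init__.
        if self._file is None:
            if self._opener is not None:
                self._file = self._opener(self.h5_path)
            else:
                import h5py  # imported only when a real file is read
                self._file = h5py.File(self.h5_path, "r")
        return self._file

    def __getitem__(self, index):
        hf = self._open()
        # Indexing an h5py dataset reads only the requested rows from disk.
        return hf["image_features"][index], hf["spatial_features"][index]


# Usage with an in-memory stand-in for the file (toy values).
fake_file = {
    "image_features": [[0.1, 0.2], [0.3, 0.4]],
    "spatial_features": [[0, 0, 1, 1], [0, 0, 2, 2]],
}
store = LazyH5Features("unused.hdf5", opener=lambda path: fake_file)
features, boxes = store[1]
print(features, boxes)
```

The trade-off is slower per-item access in exchange for lower memory use, so whether it helps depends on disk speed and on how much RAM is available.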

Some questions about the union bounding box feature vector for the classifier model

Hi author,

I'm very interested in your paper and I want to use it for another model.

But I'm a little confused about the union bounding box feature vector. How do you get the union bounding box features from the feature maps of two regions? The bottom-up attention features only provide the region feature maps, not the union bounding box features.

Does this mean we should use the pre-trained Faster R-CNN model to extract the features and then map the union bounding boxes onto the feature maps?

Thanks.
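
The geometric part of this question is straightforward: the union box of two regions is the smallest box that encloses both. The sketch below computes only those coordinates, assuming `(x1, y1, x2, y2)` boxes. How features are then pooled for the union region (for example from a detector's feature map) is not shown, and the function name is hypothetical.

```python
def union_box(box_a, box_b):
    """Smallest (x1, y1, x2, y2) box that encloses both input boxes."""
    return (
        min(box_a[0], box_b[0]),
        min(box_a[1], box_b[1]),
        max(box_a[2], box_b[2]),
        max(box_a[3], box_b[3]),
    )


subject_box = (10, 20, 50, 60)
object_box = (30, 10, 80, 40)
print(union_box(subject_box, object_box))
```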

NOthing

Thanks for your perfect work！

Features/models interrupted during download don't resume from the last checkpoint

I am using Ubuntu 16.04 on Tesla servers.
I am trying to download the models and features using the links given in download.sh. However, after some time the connection to the server keeps dropping, and when I try to resume the remaining file from the last checkpoint, it doesn't work and starts from scratch.

wget -c https://convaisharables.blob.core.windows.net/vqa-regat/pretrained_models.zip

Could you tell me why it isn't working? I have used the -c flag with other links in the past and it worked well: wget creates a wget-log that records the last checkpoint of the downloaded file.

The error of disconnection

pretrained_models.zip 94%[=================> ] 2.63G 243KB/s in 2h 4m
2021-04-14 01:09:30 (368 KB/s) - Read error at byte 2822012928/2970938152 (Connection reset by peer). Retrying.
--2021-04-14 01:09:33-- (try: 4) https://convaisharables.blob.core.windows.net/vqa-regat/pretrained_models.zip
Connecting to convaisharables.blob.core.windows.net (convaisharables.blob.core.windows.net)|13.77.184.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2970938152 (2.8G) [application/x-zip-compressed]
Saving to: ‘pretrained_models.zip’
pretrained_models.zip 0%[ ] 13.16M 449KB/s eta 3h 11m ^
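
In the log above, the retry is answered with "200 OK" and the full length of 2970938152 bytes, after which progress restarts at 0%. A server that honors a resume request normally answers a `Range` header with "206 Partial Content", so one plausible reading is that the range was not honored for this retry. The sketch below shows the same resume logic in Python so the server's behavior can be checked directly. The function names are hypothetical, error handling (for example a 416 response when the file is already complete) is left out, and nothing here is specific to this storage host.

```python
import os
import urllib.request


def resume_headers(bytes_on_disk):
    """Headers asking the server to send only the bytes after those already saved."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def server_resumed(status_code):
    """206 means the range was honored; 200 means the whole file is being resent."""
    return status_code == 206


def download_with_resume(url, local_path, chunk_size=1 << 20):
    """Append to local_path when the server resumes, otherwise start over."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url, headers=resume_headers(offset))
    with urllib.request.urlopen(request) as response:
        mode = "ab" if server_resumed(response.status) else "wb"
        with open(local_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)


# Header that would be sent after the read error at byte 2822012928.
print(resume_headers(2822012928))
```

Calling `download_with_resume` repeatedly until it completes continues from the saved bytes whenever the server answers with 206, and falls back to a fresh download otherwise.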

Can you provide a Google Drive link for dataset?

Thank you for sharing your perfect job.
Can you provide a Google Drive link for the datasets? It is too slow to download these datasets.
