linjieli222 / vqa_regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Home Page: https://arxiv.org/abs/1903.12314
License: MIT License
Thank you sincerely for sharing your code.
I have some questions about adjacency matrix generation. Generally, the semantic graph is directed and the spatial graph is undirected, so I want to know how to generate both adjacency matrices. Could you share the code for them?
Hi author,
I find your paper very interesting and want to use it in my model, but I have some questions about semantic relationship classification.
You said you keep the top 15 relationships after normalizing the predicates with relationship aliases, such as wearing and holding. Can I ask:
Thanks.
Hi, I encountered an issue where loss.backward() fails. It seems that some operation modifies a tensor in place, but I can't find where it is. Could you please look into it?
Here is the stacktrace:
Traceback (most recent call last):
File "/home/qiyuan/2021summer/VQA_ReGAT/main.py", line 275, in <module>
main()
File "/home/qiyuan/2021summer/VQA_ReGAT/main.py", line 271, in main
train(model, train_loader, eval_loader, args, device)
File "/home/qiyuan/2021summer/VQA_ReGAT/train.py", line 111, in train
loss.backward()
File "/home/qiyuan/miniconda3/envs/torch14/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/qiyuan/miniconda3/envs/torch14/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
Variable._execution_engine.run_backward(
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.
0%| | 0/3467 [00:04<?, ?it/s]
Process finished with exit code 1
Thanks.
When trying to train with the train/val splits and Visual Genome:
python main.py --config config/butd_vqa.json --seed 1 --use_both --use_vg
I got this error:
nParams= 41192896
optim: adamax lr=0.0010, decay_step=2, decay_rate=0.25,grad_clip=0.25
LR decay epochs: 15,17,19
gradual warmup lr: 0.0005
Traceback (most recent call last):
File "main.py", line 289, in <module>
train(model, train_loader, eval_loader, args, device)
File "train.py", line 91, in train
sem_adj_matrix) in enumerate(train_loader):
File "python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "utils.py", line 172, in trim_collate
return [trim_collate(samples) for samples in transposed]
File "utils.py", line 172, in <listcomp>
return [trim_collate(samples) for samples in transposed]
File "utils.py", line 161, in trim_collate
return torch.LongTensor(batch)
TypeError: an integer is required (got type str)
I think the problem is in how the Visual Genome questions are encoded. Do you have any ideas?
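The traceback only shows that a batch field expected to hold integers contained a string; it does not say which field. One way to narrow it down is to scan a few raw samples for fields whose Python type varies from sample to sample. The sketch below is a hypothetical debugging helper (the name `find_mixed_type_fields` and the tuple-shaped samples are assumptions, not part of the repository):

```python
def find_mixed_type_fields(samples):
    """Return {field_index: set of type names} for fields whose type varies.

    `samples` is any iterable of tuples/lists, e.g. the first few items
    pulled from a dataset. This is a hypothetical helper for debugging,
    not code from the repository.
    """
    seen = {}
    for sample in samples:
        for idx, value in enumerate(sample):
            seen.setdefault(idx, set()).add(type(value).__name__)
    # Keep only the fields where more than one type was observed.
    return {idx: types for idx, types in seen.items() if len(types) > 1}


# Toy data: field 0 is an int in one sample and a str in the other.
toy_samples = [(1, "what color"), ("2", "how many")]
mixed = find_mixed_type_fields(toy_samples)
print(mixed)
```

A field that is an int in some samples and a str in others would be a candidate for the value that reaches torch.LongTensor as a string.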
Following the steps in the README, I successfully ran eval.py, but it only gives me an eval score.
How can I test your pretrained models with an image and a question supplied by a user?
Dear scholar, where can I find the 11 categories for the spatial graph and the 14 categories for the semantic graph? Thanks.
What is the meaning of pos_boxes?
In the adaptive setting, where the number of bounding boxes is not fixed, how are the final bounding boxes obtained from pos_boxes?
Thank you for your code.
I have 8 GPUs with 12 GB each. How can I modify the code to run training?
I am very interested in your work, but I have some doubts about the learning rate setting. I currently only have a single 2080 Ti graphics card. How should the learning rate be adjusted?
Can you provide the pretrained model for generating visual features, so that I can test with new images?
Thank you for sharing your excellent work.
When I use "source tools/download.sh" to download the preprocessed data, some of the download links in download.sh are invalid. Can you provide an updated download.sh?
Hi,
Thanks for sharing the code. Could you also share the code for building the adjacency matrix of spatial relations from bounding boxes?
Thanks.
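For readers looking for a starting point, here is a minimal sketch of how a spatial adjacency matrix could be built from bounding boxes. It assumes the 11-category scheme as I understand it from the paper: inside, cover, overlap (IoU of at least 0.5), and eight direction bins when the center distance is less than half the image diagonal. The function names, the box format (x1, y1, x2, y2), and the numbering of the direction bins are assumptions; this is not the authors' released code.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def spatial_label(a, b, image_w, image_h):
    """Label of the edge from box a to box b; 0 means no edge (assumed scheme)."""
    if contains(b, a):
        return 1  # a is inside b
    if contains(a, b):
        return 2  # a covers b
    if iou(a, b) >= 0.5:
        return 3  # strong overlap
    dx = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    dy = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    if math.hypot(dx, dy) / math.hypot(image_w, image_h) >= 0.5:
        return 0  # too far apart to be connected
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Eight 45-degree bins mapped to labels 4..11; min() guards rounding at 2*pi.
    return 4 + min(7, int(angle // (math.pi / 4)))


def spatial_adjacency(boxes, image_w, image_h):
    """N x N matrix of edge labels with a zero diagonal."""
    n = len(boxes)
    return [[0 if i == j else spatial_label(boxes[i], boxes[j], image_w, image_h)
             for j in range(n)] for i in range(n)]


boxes = [(10, 10, 20, 20), (0, 0, 100, 100), (30, 10, 40, 20)]
adj = spatial_adjacency(boxes, image_w=100, image_h=100)
```

Under this scheme an edge exists in both directions or in neither, but the labels are direction-dependent: the entry for (i, j) can be "inside" while the entry for (j, i) is "cover".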
Hi, great work!
In your semantic relationship encoding, you use one-hot encoding in the semantic matrix. I want to use word embeddings to encode the semantic relationships, so can you tell me the specific label of each semantic relationship type? For example, 0: no_relation, 1: wearing, 2: sitting on, 3: standing on, ..., 15: looking at. Or have you tried encoding with word embeddings, and how effective was it?
Thank you very much, I'm looking forward to your reply.
Hello, I am very interested in your work. Does the num_workers setting in DataLoader affect the final result? Because of a problem with my workstation, I had to set num_workers=0 in DataLoader; otherwise I get an error.
What are the specific categories in your semantic relationship classification?
Dear scholar,
Thanks for your excellent work. I want to ask which dataset the trade-off weights are tuned on. Is it the val set? If so, could the result of the integrated three-module model be slightly biased? I think hyper-parameters can be tuned on the val set, but the final result should be reported on the test set.
I look forward to your reply.
Thank you sincerely for sharing your code.
However, I found that the dir matrices [0] and [1] are actually the same for the spatial relation. I mean the condensed_adj_matrix in condensed_adj_matrix = torch.sum(input_adj_matrix, dim=-1).
If condensed_adj_matrix[0][i][j] = 1, then condensed_adj_matrix[1][i][j] = 1 too, which means bbx[0] and bbx[1] have a spatial relation (edge). So I feel very confused about it.
Could I ask for the best parameter configuration in your code?
Looking forward to your reply!
Dear author, you showed some examples of learned image attentions (Figure 4).
Can you post the code you used to extract and create the learned image attentions? Thank you!
Hi, this is great work for VQA. I didn't download the datasets, so I want to know the types of semantic_embedding and spatic_embedding: are they one-hot embeddings, word embeddings, or features extracted from a model? I'm looking forward to your reply, thanks!
How can I compute sem_adj_matrix and spa_adj_matrix? Where is the related code?
Could you provide the code for extracting image features for this paper?
Hi Lin! Thanks for your code.
When I used the fixed image features from the h5 file, I got the following error:
loading dictionary from data/glove/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
File "/home/xatu/anaconda2/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/home/xatu/anaconda2/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'data/Bottom-up-features-adaptive/train36.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
The adaptive image features work fine, though. Can you suggest a solution? Thank you very much!
How should I deal with this error: "terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL Error 1: unhandled cuda error
Aborted (core dumped)"?
Thank you for sharing your excellent work.
I use pretrained_models/regat_implicit/ban_1_implicit_vqa_196/hps.json
for the training phase with 2 GPUs, 10 GB each. The CUDA version is 10.0 and Python is 3.6. All datasets are downloaded.
The training dataset is VQA 2.0.
Here is my detailed config:
epochs: 20
base_lr: 0.001
lr_decay_start: 15
lr_decay_rate: 0.25
lr_decay_step: 2
lr_decay_based_on_val: false
grad_accu_steps: 1
grad_clip: 0.25
weight_decay: 0
batch_size: 128
output: "saved_models/regat_implicit/ban_1_implicit_vqa_196"
save_optim: false
log_interval: -1
seed: 196
checkpoint: ""
dataset: "vqa"
data_folder: "./data"
use_both: false
use_vg: false
adaptive: true
relation_type: "implicit"
fusion: "ban"
tfidf: true
op: "c"
num_hid: 1024
ban_gamma: 1
mutan_gamma: 2
imp_pos_emb_dim: 64
spa_label_num: 11
sem_label_num: 15
dir_num: 2
relation_dim: 1024
nongt_dim: 20
num_heads: 16
num_steps: 1
residual_connection: true
label_bias: false
lr_decrease_start: 15
The results are very poor. After 20 epochs, the log.txt shows:
epoch 15, time: 746.65
train_loss: 4.19, norm: 3.2776, score: 44.76
eval score: 43.99 (92.66)
entropy: 0.00
saving current model weights to folder
lr: 0.0005
epoch 16, time: 726.26
train_loss: 4.16, norm: 4.1697, score: 45.21
eval score: 44.36 (92.66)
entropy: 0.01
saving current model weights to folder
decreased lr: 0.0001
epoch 17, time: 750.88
train_loss: 4.12, norm: 2.4974, score: 45.56
eval score: 44.41 (92.66)
entropy: 0.01
saving current model weights to folder
lr: 0.0001
epoch 18, time: 745.58
train_loss: 4.16, norm: 4.6050, score: 45.61
eval score: 44.47 (92.66)
entropy: 0.01
saving current model weights to folder
decreased lr: 0.0000
epoch 19, time: 743.70
train_loss: 4.10, norm: 3.3921, score: 45.80
eval score: 44.46 (92.66)
entropy: 0.01
I also trained using pretrained_models/regat_implicit/butd_implicit_vqa_6371, and it reaches 58.
Can you give me some advice on reproducing the accuracy reported in the paper?
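To make the schedule in this config easier to check against the log, here is a small sketch of the learning rate per epoch. The decay part follows the config values above (base_lr 0.001, lr_decay_start 15, lr_decay_step 2, lr_decay_rate 0.25, 20 epochs). The warm-up part is an assumption: it supposes the rate ramps over the first four epochs up to twice base_lr, which is consistent with the 0.0005 logged after epoch 15 but is not confirmed by anything in this thread. The function names are hypothetical.

```python
def decay_epochs(decay_start=15, decay_step=2, epochs=20):
    """Epochs at which the learning rate is multiplied by the decay rate."""
    return list(range(decay_start, epochs, decay_step))


def lr_at_epoch(epoch, base_lr=0.001, decay_rate=0.25,
                warmup_factors=(0.5, 1.0, 1.5, 2.0),  # assumed warm-up ramp
                decay_start=15, decay_step=2, epochs=20):
    """Learning rate used during `epoch` (0-indexed) under the assumed schedule."""
    if epoch < len(warmup_factors):
        return base_lr * warmup_factors[epoch]
    peak_lr = base_lr * warmup_factors[-1]
    # Count how many decay points have been reached so far.
    n_decays = sum(1 for e in decay_epochs(decay_start, decay_step, epochs)
                   if e <= epoch)
    return peak_lr * decay_rate ** n_decays


schedule = [lr_at_epoch(e) for e in range(20)]
for e in (0, 14, 15, 17, 19):
    print(f"epoch {e}: lr = {schedule[e]:.6f}")
```

If the real warm-up differs, only `warmup_factors` needs to change; the decay epochs follow directly from the three lr_decay_* settings.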
During training, is it intentional that the model is constructed with val_dset?
model = build_regat(val_dset, args).to(device)
Hi Authors,
Thanks for releasing the code! This is very helpful.
I was wondering where you add the bias term.
I was looking at Line 155 here:
https://github.com/linjieli222/VQA_ReGAT/blob/master/model/graph_att_layer.py#L155
It seems like the attention weights are multiplied directly with the Value vector. Could you point us to where in the code you add the bias term?
Thanks!
Hi, I don't understand why these two download commands are identical:
mkdir data/cp_v2_annotations
wget -P data/cp_v2_annotations https://computing.ece.vt.edu/~aish/vqacp/vqacp_v2_train_annotations.json
wget -P data/cp_v2_annotations https://computing.ece.vt.edu/~aish/vqacp/vqacp_v2_train_annotations.json
Hello, I would like to know how to obtain the attention map in Figure 4, and which attention map it is.
Hi, this is great work. I'm very interested in 'image_adj_matrix' and 'semantic_adj_matrix', which represent the spatial and semantic information, respectively. In your project, however, both are already written into the .hdf5 files. Could you please share the code or pretrained models used to produce them? I'm looking forward to your reply, thanks!
Best wishes!
Dear scholar,
I want to ask whether the dimension of W_dir(i,j) is d_h × (d_q + d_v), and whether the bias b_lab(i,j) is a one-hot vector.
I am also unsure about the meaning of W_dir(i,j).
Hello, I just want to use your model to test directly on other data sets. Can you provide the model that is finally used for testing?
I want to work on the VQA2 dataset.
Could you please explain how you implemented the following statement in code?
"Our best results are achieved by combining the best single relation models through weighted sum"
I mean, how do we combine all the models when evaluating with
python3 eval.py --output_folder pretrained_models/regat_implicit/ban_1_implicit_vqa_196
Also, please let me know whether you used the BAN model in place of BUTD for the best results with the weighted sum.
python3 main.py --config config/butd_vqa.json
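As an illustration of what "combining through weighted sum" could look like, the sketch below adds up per-answer scores from several single-relation models with one weight per model and picks the highest-scoring answer. It is a generic ensemble sketch, not the authors' evaluation script: the function name `combine_predictions`, the example scores, and the weights are all hypothetical.

```python
def combine_predictions(model_scores, weights):
    """Weighted sum of per-answer scores from several models.

    model_scores: dict mapping model name -> list of scores, one per answer.
    weights: dict mapping model name -> float weight (hypothetical values).
    Returns (index of the best answer, combined score list).
    """
    names = list(model_scores)
    n_answers = len(model_scores[names[0]])
    combined = [0.0] * n_answers
    for name in names:
        scores = model_scores[name]
        if len(scores) != n_answers:
            raise ValueError(f"{name} has {len(scores)} scores, expected {n_answers}")
        for k, score in enumerate(scores):
            combined[k] += weights[name] * score
    best = max(range(n_answers), key=combined.__getitem__)
    return best, combined


# Toy scores over three candidate answers for one question.
scores = {
    "implicit": [0.1, 0.7, 0.2],
    "spatial": [0.6, 0.3, 0.1],
    "semantic": [0.5, 0.2, 0.3],
}
weights = {"implicit": 0.3, "spatial": 0.3, "semantic": 0.4}
best, combined = combine_predictions(scores, weights)
print(best, combined)
```

In practice the per-model scores would come from running each saved model over the same questions, and the weights would have to be chosen on held-out data.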
Hi,
Thanks for your great work! I have a question about the GAT. I can't find a loss function for the GAT in your work. Is the GAT model supervised, and how is it trained in the current work?
When the cache/val_target.pkl file is loaded, some entries have empty labels. I compared them with the annotations on the official website and found that your answers differ. For example, for question_id=393225000 the answer is foodiebakercom, but your labels are []. Some labels also differ from the official ones. For example, for question_id 393225001, label=4 should correspond to 55.
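One possible explanation, offered only as an assumption: if the targets were built the way many bottom-up-attention VQA pipelines build them, answers outside a fixed vocabulary of frequent answers are dropped, and label values are indices into that vocabulary rather than ids from the official annotation files. The sketch below shows how such preprocessing would yield empty labels for a rare answer. The score thresholds, `ans2label`, and the function names are assumptions and may not match what this repository does.

```python
def soft_score(occurrences):
    """Soft accuracy for an answer given by `occurrences` annotators (assumed thresholds)."""
    if occurrences == 0:
        return 0.0
    if occurrences == 1:
        return 0.3
    if occurrences == 2:
        return 0.6
    if occurrences == 3:
        return 0.9
    return 1.0


def build_target(answers, ans2label):
    """Map one question's annotator answers to (labels, scores).

    Answers missing from `ans2label` (the assumed vocabulary of frequent
    answers) are silently dropped, so the result can be empty.
    """
    counts = {}
    for answer in answers:
        counts[answer] = counts.get(answer, 0) + 1
    labels, scores = [], []
    for answer, count in counts.items():
        if answer in ans2label:
            labels.append(ans2label[answer])
            scores.append(soft_score(count))
    return labels, scores


ans2label = {"yes": 0, "no": 1}  # toy vocabulary
rare = build_target(["foodiebakercom"] * 10, ans2label)
common = build_target(["yes", "yes", "yes", "no"], ans2label)
print(rare, common)
```

Under this assumption, a label such as 4 is a position in the repository's own answer vocabulary and would not be expected to match an id taken from the official files.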
Thank you sincerely for sharing your code.
I am also not clear about the difference between the implicit relation and the spatial relation. The implicit relation uses bbox coordinates to compute a bbox weight from relative geometric position, while the spatial relation uses bbox coordinates to build a spatial graph for computing attention. Is there a fundamental difference between the two?
I found this code in dataset.py:
with h5py.File(h5_path, 'r') as hf:
    self.features = np.array(hf.get('image_features'))
    self.normalized_bb = np.array(hf.get('spatial_features'))
This causes huge memory usage. I think you could use something like feat_dset = hf['image_features']; feat_dset[index] in __getitem__ instead.
Hi author,
I'm very interested in your paper and want to use it in another model.
But I'm a little confused about the union bounding box feature vector. How do you get the union bounding box features from the feature maps of two regions? The bottom-up attention features only provide the region feature maps, not the union bounding box features.
Does this mean we should use the pre-trained Faster R-CNN model to extract features and then map the union bounding boxes onto the feature maps?
Thanks.
I am using Ubuntu 16.04 on Tesla servers.
I am trying to download the models and features using the links given in the download.sh file. However, after some time the connection to the server keeps dropping, and when I try to resume the remaining download from the last checkpoint, it doesn't work and starts from scratch.
wget -c https://convaisharables.blob.core.windows.net/vqa-regat/pretrained_models.zip
Could you tell me why it isn't working? I have used the -c flag with other links in the past and it worked well; wget creates a wget-log and resumes from the last checkpoint of the file.
The disconnection error:
pretrained_models.zip 94%[=================> ] 2.63G 243KB/s in 2h 4m
2021-04-14 01:09:30 (368 KB/s) - Read error at byte 2822012928/2970938152 (Connection reset by peer). Retrying.
--2021-04-14 01:09:33-- (try: 4) https://convaisharables.blob.core.windows.net/vqa-regat/pretrained_models.zip
Connecting to convaisharables.blob.core.windows.net (convaisharables.blob.core.windows.net)|13.77.184.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2970938152 (2.8G) [application/x-zip-compressed]
Saving to: ‘pretrained_models.zip’
pretrained_models.zip 0%[ ] 13.16M 449KB/s eta 3h 11m ^
Thank you for sharing your excellent work.
Can you provide a Google Drive link for the datasets? Downloading them is too slow.