Comments (17)
We tried a number of other configurations and finally obtained excellent results on the state description version of the dataset.
By applying your learning rate schedule, the accuracy trend is as follows:
This is already a very good result: accuracy reached 94%.
Then, we tried feeding the question words into the LSTM in reverse order (activated in our code with the --invert-questions option).
By training using your learning rate policy we achieved state-of-the-art results for the state description version of the dataset: 98% accuracy, as shown in the following accuracy graph:
As you can see, we were able to reach the best results in far fewer training epochs.
These results are reproducible by using the following command:
python3 train.py --clevr-dir path/to/CLEVR_v1.0 --batch-size 640 --lr 0.000005 --lr-step 20 --lr-gamma 2 --lr-max 0.0005 --epochs 250 --state-description --clip-norm 50 --invert-questions
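The flags above encode a warm-up-style schedule: start at --lr, multiply by --lr-gamma every --lr-step epochs, and cap at --lr-max. A minimal sketch of that rule (the helper name and exact flag semantics are my assumption, not the repo's actual code):

```python
def scheduled_lr(epoch, base_lr=5e-6, gamma=2, step=20, lr_max=5e-4):
    """Learning rate multiplied by `gamma` every `step` epochs, capped at `lr_max`."""
    return min(base_lr * gamma ** (epoch // step), lr_max)

# Warm-up phase: 5e-6 -> 1e-5 -> 2e-5 -> ... until the 5e-4 cap is hit
for epoch in (0, 20, 40, 140, 200):
    print(epoch, scheduled_lr(epoch))
```

With gamma = 2 this doubles the learning rate every 20 epochs, reaching the 5e-4 ceiling after roughly 7 steps.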
We reached a similar result using question inversion together with our batch size policy (constant learning rate while doubling batch size every 50 epochs, starting from 32 up to 640):
Even in this case we reached excellent results in far fewer training epochs. Note that, since the initial batch size in this configuration is small, training can be quite slow during the first epochs.
Code to reproduce the above result:
python3 train.py --clevr-dir path/to/CLEVR_v1.0 --batch-size 32 --bs-gamma 2 --bs-step 50 --bs-max 640 --lr 0.0001 --epochs 250 --state-description --clip-norm 50 --invert-questions
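The batch size policy mirrors the learning rate one: start at --batch-size, multiply by --bs-gamma every --bs-step epochs, and clamp at --bs-max. A sketch under those assumed flag semantics (the helper is hypothetical, not the repo's code):

```python
def scheduled_batch_size(epoch, bs=32, gamma=2, step=50, bs_max=640):
    """Batch size multiplied by `gamma` every `step` epochs, clamped at `bs_max`."""
    return min(bs * gamma ** (epoch // step), bs_max)

# 32 -> 64 -> 128 -> 256 -> 512, then clamped to the 640 ceiling
for epoch in (0, 50, 100, 200, 250):
    print(epoch, scheduled_batch_size(epoch))
```

Since pure doubling from 32 would jump past 640 (to 1024), the cap is what produces the final 640 value.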
We hope these settings will give good results when training the full architecture, including the visual pipeline, as well.
from relationnetworks-clevr.
Hi @aelnouby,
we are currently able to reach around 65% on the CLEVR image test set, still far from the results in the paper. We are in contact with the authors, and we are trying to figure out where the problem might be. We'll let you know if there's any news about it.
Here are the training plots and hyper-parameters of some of our best tries: plots
Hope it helps.
Hi @aelnouby, we can provide you with some accuracy plots from training with state descriptions (details in the paper), which is useful for debugging since it is much faster than the full architecture including the visual pipeline (that is, using the CNN before the RN). We broke down the accuracy by answer class to get more detailed insights.
Using paper parameters (Batch size: 640, 2% dropout, 0.0001 learning rate):
The test loss plots (not reported here) show premature overfitting.
We discovered that by doubling the batch size during training, starting from 32 up to 640 every 45 epochs, accuracy starts rising. However, we are only reaching 85% accuracy this way, while the authors claim 96%. We are still investigating this behavior.
Thanks for your reply.
Yeah, that is exactly what I am getting. The accuracy increases very fast until 40%, then it increases extremely slowly until it gets to around 60% after 1M iterations!
I will keep working at it and will let you know if I find anything useful.
@mesnico Thanks so much for sharing.
This is a very interesting finding! I will run experiments with the visual pipeline as well and notify you of the results. Thanks again for sharing.
Hi @fabiocarrara , @mesnico ,
I have run an experiment on the visual pipeline. However, applying the same schedule you used with the state descriptions would be very slow, so instead of going from a small batch size to a big one, I applied a scheme similar to the warm-up used in https://arxiv.org/pdf/1706.02677.pdf . I used a batch size of 640 throughout training, but started with a small learning rate of 1.56e-5 and doubled it every 20 epochs up to 5e-4. The plots are below; this showed an improvement, reaching 72% validation accuracy before overfitting, but it took almost two full days to run on 2 P100 GPUs.
Also, please note that I found a bug in my data augmentation; I am not sure how much it will affect the results.
I am now running another experiment with SGD instead of Adam, stopping at 2.5e-4, and gradient clipping similar to what you did.
The model converged faster (both in terms of # epochs and total training time) when I doubled the size of the layers:
{
"state_description": false,
"g_layers": [512,512,512,512],
"question_injection_position": 0,
"f_fc1": 512,
"f_fc2": 512,
"dropout": 0.5,
"lstm_hidden": 256,
"lstm_word_emb": 32
}
Got 96% within 63 epochs with the bigger model (batch size 128 and lr scheduling).
Got 93% in 140 epochs with the smaller model (original-fp config, batch size 320, lr scheduling).
My experiments were with bottom-up features though.
Hi @erobic, thank you for sharing these interesting findings!
I will keep this in mind for future trainings and, as soon as possible, I'll share this interesting configuration in the readme file.
Hi @LMdeLiangMi, this finding was surprising to us too. There is an explanation for this choice in the context of sequence-to-sequence translation, as explained here.
Concerning VQA, we have not yet performed an extensive study of this phenomenon. However, I think it could be because some of the most important question details are at the beginning of the sentence, and LSTMs have trouble memorizing information from tokens seen too far in the past.
For example, for the question [1, 2, 3, 4], we pad with zeros at the end to get [1, 2, 3, 4, 0, 0, 0, 0], and then reverse it to get [0, 0, 0, 0, 4, 3, 2, 1].
Maybe we should instead pad the question [1, 2, 3, 4] as [0, 0, 0, 0, 1, 2, 3, 4]? The initial state of the RNN is 0, too.
========
Reached 0.924 (from pixels), so [0, 0, 0, 0, 4, 3, 2, 1] seems to be better.
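The two orderings discussed above can be illustrated with a tiny sketch (plain Python lists for clarity; the actual code operates on padded tensors, and the helper names are mine):

```python
def pad_then_reverse(q, max_len):
    """Zero-pad at the end, then reverse: padding ends up in front of the tokens."""
    padded = q + [0] * (max_len - len(q))
    return padded[::-1]

def reverse_then_pad(q, max_len):
    """Reverse first, then zero-pad: padding stays after the tokens."""
    rev = q[::-1]
    return rev + [0] * (max_len - len(rev))

print(pad_then_reverse([1, 2, 3, 4], 8))   # [0, 0, 0, 0, 4, 3, 2, 1]
print(reverse_then_pad([1, 2, 3, 4], 8))   # [4, 3, 2, 1, 0, 0, 0, 0]
```

The difference matters because in the first form the LSTM consumes the zeros before any real token, while in the second the real tokens come first and the trailing zeros can be skipped entirely with packed sequences.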
@LMdeLiangMi thank you for sharing your experiment.
You are right, in my code the inverted question looks like [0, 0, 0, 0, 4, 3, 2, 1].
In this branch I changed how the inverted question is handled: the question is first inverted and then padded, resulting in [4, 3, 2, 1, 0, 0, 0, 0]. Then I used the PyTorch sequence API (torch.nn.utils.rnn.pack_padded_sequence) to make the LSTM handle variable-length sequences in a batch by ignoring the padding. The padding is only used to put the questions into tensor form so that we can build a batch of questions, gaining efficiency.
However, I performed some experiments with this setup and the changes do not seem to make a significant difference. I hope to merge them as soon as possible anyway, since this is the correct way of handling batches of sequences.
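A minimal sketch of the packed-sequence approach described above (vocabulary size, embedding and hidden dimensions, and the toy batch are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Two already-reversed questions, padded at the end: true lengths 4 and 2
batch = torch.tensor([[4, 3, 2, 1],
                      [7, 6, 0, 0]])
lengths = torch.tensor([4, 2])  # must be sorted descending when enforce_sorted=True

emb = nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Packing tells the LSTM where each sequence really ends, so padding is ignored
packed = pack_padded_sequence(emb(batch), lengths, batch_first=True,
                              enforce_sorted=True)
_, (h_n, _) = lstm(packed)   # h_n holds the state at each sequence's last REAL step
print(h_n.shape)  # torch.Size([1, 2, 16])
```

Without packing, the final hidden state of the shorter question would be contaminated by the zero-padding steps; with packing, h_n for each example corresponds to its last non-padded token.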
Hi @aelnouby, these are very interesting results! Thank you for your contribution. In the background we are still training with the visual pipeline, hoping to get out of the plateau. Accuracy is increasing, but very slowly. Maybe your settings can help speed up training (and, as we can see from your plot, possibly reach higher accuracies). We'll give it a try. Thanks!
@mesnico That's very cool! Great results.
I have not tried the state descriptions at all, but with the visual pipeline the results are not as good (at least in my implementation). Maybe it needs more learning rate schedule tuning, but that is extremely hard since one experiment takes 2-3 days to train.
I am wondering how batch size scheduling and learning rate scheduling differed for the "from pixels" configuration. The README reports the lr scheduling approach, and it seems to reach 80+% accuracy only after 200 epochs. Did batch scheduling converge even slower?
In fact, I am trying to train the RN with pre-trained features (from bottom-up attention), but the overall accuracy remains below 50% even after 60 epochs with either of the scheduling mechanisms. I'm wondering if there are new findings regarding faster convergence.
@erobic As of now, we have not trained the network using batch size scheduling for the "from pixels" configuration. Regarding lr, during the "from pixels" experiments we used the very same schedule used for "state description".
We preferred lr scheduling because the batch-size-increasing policy is quite inefficient, especially during the first epochs, when the batch size is really small and GPU utilization is very low. Even if it seems to converge faster when looking at the accuracy graph, we must take into account that those first epochs are very slow in wall-clock terms. We did not compare the two approaches comprehensively in terms of elapsed time rather than elapsed epochs, but measured that way they could turn out to be perfectly comparable.
Regarding your use case, 60 epochs may simply not be enough. We noticed that training is highly sensitive to the hyper-parameter configuration: a slight variation can lead to longer training times to obtain the same accuracy. Also, we observed that accuracy tends to increase non-uniformly: it remains almost constant for a relatively long time before rising.
Hope this helps.
@mesnico Thank you for the reply. I will use lr scheduling and run for more epochs. However, the staircase-like convergence, and not knowing whether it will actually converge after all those epochs, is a bit concerning. The paper does mention a very large number of iterations (1.4M), but I'm not sure whether they also saw staircase convergence. I will update once my experiments are done.
Thanks again!
Hi @mesnico, would you mind telling me why --invert-questions works?