Comments (17)

mesnico commented on May 25, 2024

We tried a number of other configurations and finally got excellent results on the state-description version of the dataset.
By applying your learning rate schedule, the accuracy trend is as follows:
[accuracy plot]
This is already a very good result: accuracy reached 94%.
Then, we tried feeding the question words to the LSTM in reverse order (enabled in our code with the --invert-questions option).
By training with your learning rate policy we achieved state-of-the-art results for the state-description version of the dataset: 98% accuracy, as shown in the following accuracy graph:
[accuracy plot]
As you can see, we reached the best results in far fewer training epochs.
These results are reproducible with the following command:

python3 train.py --clevr-dir path/to/CLEVR_v1.0 --batch-size 640 --lr 0.000005 --lr-step 20 --lr-gamma 2 --lr-max 0.0005 --epochs 250 --state-description --clip-norm 50 --invert-questions
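For reference, a minimal sketch of what such a doubling learning-rate policy could look like (hypothetical helper functions, not the repo's actual implementation in train.py):

def scheduled_lr(epoch, base_lr=5e-6, step=20, gamma=2, lr_max=5e-4):
    # Warm-up policy: start at base_lr and multiply by gamma every `step` epochs, capped at lr_max.
    return min(base_lr * gamma ** (epoch // step), lr_max)

def apply_lr(optimizer, epoch):
    # Set the scheduled learning rate on every parameter group (called once per epoch).
    for param_group in optimizer.param_groups:
        param_group['lr'] = scheduled_lr(epoch)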

We reached a similar result using question inversion together with our batch size policy (constant learning rate while doubling the batch size every 50 epochs, starting from 32 up to 640):
[accuracy plot]

Even in this case we reached excellent results in far fewer training epochs. Note that, since the initial batch size in this configuration is small, training can be quite slow during the first epochs.
Code to reproduce the above result:

python3 train.py --clevr-dir path/to/CLEVR_v1.0 --batch-size 32 --bs-gamma 2 --bs-step 50 --bs-max 640 --lr 0.0001 --epochs 250 --state-description --clip-norm 50 --invert-questions
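Similarly, a minimal sketch of the batch-size doubling policy implied by --batch-size 32 --bs-gamma 2 --bs-step 50 --bs-max 640 (hypothetical helper, not the repo's actual code):

def scheduled_batch_size(epoch, base_bs=32, step=50, gamma=2, bs_max=640):
    # Doubling policy: 32 for epochs 0-49, 64 for 50-99, ..., capped at 640.
    return min(base_bs * gamma ** (epoch // step), bs_max)

# In practice the training DataLoader would be rebuilt with the new batch size whenever it changes,
# e.g. DataLoader(train_set, batch_size=scheduled_batch_size(epoch), shuffle=True).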

Hope these settings also give good results when training the full architecture, including the visual pipeline.

fabiocarrara commented on May 25, 2024

Hi @aelnouby,
we are currently able to reach around 65% on the CLEVR image test set, still far from the results in the paper. We are in contact with the authors, and we are trying to figure out where the problem might be. We'll let you know if there's any news about it.

Here are the training plots and hyper-parameters of some of our best runs: [training plots]

Hope it helps.

mesnico commented on May 25, 2024

Hi @aelnouby, we can provide you with some accuracy plots from training with state descriptions (details in the paper), which is useful for debugging since it is much faster than the full architecture including the visual pipeline (that is, with the CNN before the RN). We broke the accuracy down by answer class to get more detailed insights.

Using the paper's parameters (batch size 640, 2% dropout, learning rate 0.0001):
[accuracy plot]
The test loss plots (not reported here) show premature overfitting.

We discovered that by doubling the batch size during training every 45 epochs, starting from 32 up to 640, accuracy starts rising. However, we only reach about 85% accuracy this way, while the authors report 96%. We are still investigating this behavior.
[accuracy plot]

aelnouby commented on May 25, 2024

Thanks for your reply.

Yeah, that is exactly what I am getting. The accuracy increases very quickly up to 40%, then rises extremely slowly until it reaches around 60% after 1M iterations!

I will keep working on it and will let you know if I find anything useful.

aelnouby commented on May 25, 2024

@mesnico Thanks so much for sharing.

This is a very interesting finding! I will run experiments with the visual pipeline as well and will let you know the results. Thanks again for sharing.

aelnouby commented on May 25, 2024

Hi @fabiocarrara, @mesnico,

I have run an experiment on the visual pipeline. Applying the same schedule you used with the state descriptions would be very slow, so instead of going from a small batch size to a large one, I applied a scheme similar to the warm-up used in https://arxiv.org/pdf/1706.02677.pdf: I kept the batch size at 640 throughout training, but started with a small learning rate of 1.56e-5 and doubled it every 20 epochs up to 5e-4. The plots are below. This showed an improvement: I got 72% validation accuracy before overfitting, but it took almost two full days to run on 2 P100 GPUs.

Also, please note that I found a bug in my data augmentation; I am not sure how much it affects the results.
[accuracy plot]
[learning rate plot]

I am now running another experiment with SGD instead of Adam, stopping the warm-up at 2.5e-4 and using gradient clipping similar to what you did.
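For reference, a minimal sketch of how this kind of gradient clipping (the repo's --clip-norm 50) is typically applied in PyTorch; the model below is just a stand-in for illustration:

import torch

model = torch.nn.Linear(8, 4)                                    # stand-in model, not the actual RN
optimizer = torch.optim.SGD(model.parameters(), lr=1.56e-5)      # SGD with the warm-up starting lr

loss = model(torch.randn(2, 8)).sum()                            # dummy forward pass and loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=50)  # clip gradient norm to 50 before stepping
optimizer.step()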

erobic commented on May 25, 2024

The model converged faster (both in terms of # epochs and total training time) when I doubled the size of the layers:

{
   "state_description": false,
   "g_layers": [512,512,512,512],
   "question_injection_position": 0,
   "f_fc1": 512,
   "f_fc2": 512,
   "dropout": 0.5,
   "lstm_hidden": 256,
   "lstm_word_emb": 32
}

Got 96% within 63 epochs with the bigger model (batch size 128 and lr scheduling).
Got 93% in 140 epochs with the smaller model (the original-fp config, batch size 320, lr scheduling).

My experiments were with bottom-up features though.
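For context, here is a rough sketch of how these sizes typically map onto the g and f MLPs of a Relation Network, assuming the keys mean what their names suggest (the pair input dimension and the number of answer classes below are placeholders, not values from the repo):

import torch.nn as nn

cfg = {"g_layers": [512, 512, 512, 512], "f_fc1": 512, "f_fc2": 512, "dropout": 0.5}

def build_mlp(in_dim, sizes):
    # Stack of Linear+ReLU layers with the given output sizes.
    layers = []
    for out_dim in sizes:
        layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
        in_dim = out_dim
    return nn.Sequential(*layers)

pair_dim = 2 * 256 + 128                               # placeholder: two object features plus a question embedding
g_theta = build_mlp(pair_dim, cfg["g_layers"])         # g processes each object pair conditioned on the question
f_phi = nn.Sequential(                                 # f maps the aggregated pair representations to answer scores
    build_mlp(cfg["g_layers"][-1], [cfg["f_fc1"], cfg["f_fc2"]]),
    nn.Dropout(cfg["dropout"]),
    nn.Linear(cfg["f_fc2"], 28),                       # placeholder: assumed number of answer classes
)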

mesnico commented on May 25, 2024

Hi @erobic, thank you for sharing these interesting findings!
I will keep them in mind for future training runs and, as soon as possible, I'll share this configuration in the README file.

mesnico commented on May 25, 2024

Hi @LMdeLiangMi, this finding was surprising to us too. There is an explanation for this choice in the context of sequence-to-sequence translation, as explained here.

Concerning VQA, we have not yet performed an extensive study of this phenomenon. However, I think it could be because some of the most important question details are at the beginning of the sentence, and LSTMs have trouble retaining information from tokens seen too far in the past.

LinkToPast1990 commented on May 25, 2024

For example, for the question [1, 2, 3, 4], we pad with zeros at the end to get [1, 2, 3, 4, 0, 0, 0, 0], and then reverse it to get [0, 0, 0, 0, 4, 3, 2, 1].

Maybe we should instead pad the question [1, 2, 3, 4] as [0, 0, 0, 0, 1, 2, 3, 4]? The initial state of the RNN is 0 too.

========
Reached 0.924 (from pixels), so [0, 0, 0, 0, 4, 3, 2, 1] should be better.

[accuracy plot]
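For concreteness, a small sketch of the two layouts being compared (plain PyTorch, not the repo's code):

import torch
import torch.nn.functional as F

question = torch.tensor([1, 2, 3, 4])
max_len = 8

right_padded = F.pad(question, (0, max_len - len(question)))   # [1, 2, 3, 4, 0, 0, 0, 0]
inverted = right_padded.flip(0)                                 # [0, 0, 0, 0, 4, 3, 2, 1], the current --invert-questions behaviour
left_padded = F.pad(question, (max_len - len(question), 0))     # [0, 0, 0, 0, 1, 2, 3, 4], the suggested alternative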

mesnico commented on May 25, 2024

@LMdeLiangMi thank you for sharing your experiment.
You are right, in my code the inverted question looks like [0, 0, 0, 0, 4, 3, 2, 1].

In this branch I changed how the inverted question is handled: the question is first inverted and then padded, resulting in [4, 3, 2, 1, 0, 0, 0, 0]. Then, I used the PyTorch sequence APIs (torch.nn.utils.rnn.pack_padded_sequence) so that the LSTM can handle variable-length sequences in a batch while ignoring the padding. The padding is only used to put the questions into tensor form, so that we can build a batch of questions and gain efficiency.

However, I performed some experiments with this setup and it seems these changes are not very significant. I hope to merge them as soon as possible, since this is the correct way of handling batches of sequences.
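As a rough illustration of the invert-then-pad approach combined with the sequence APIs (the vocabulary, embedding, and hidden sizes below are made up for the sketch):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# questions already inverted, to be right-padded: [4, 3, 2, 1, 0, ...] and [7, 6, 5, 0, ...]
questions = [torch.tensor([4, 3, 2, 1]), torch.tensor([7, 6, 5])]
lengths = torch.tensor([len(q) for q in questions])

batch = pad_sequence(questions, batch_first=True)     # zero padding goes at the end of each row
embedded = torch.nn.Embedding(100, 32)(batch)         # assumed vocabulary size and word embedding size
packed = pack_padded_sequence(embedded, lengths, batch_first=True)

lstm = torch.nn.LSTM(input_size=32, hidden_size=128, batch_first=True)
_, (h_n, _) = lstm(packed)                            # the LSTM never processes the padded positions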

mesnico commented on May 25, 2024

Hi @aelnouby, these are very interesting results! Thank you for your contribution. In the background we are still training with the visual pipeline, hoping to get out of the plateau: accuracy is increasing slightly, but very slowly. Maybe your settings can help speed up training (and, as your plot suggests, possibly reach higher accuracy). We'll give it a try. Thanks!

aelnouby commented on May 25, 2024

@mesnico That's very cool! Great results.

I have not tried the state descriptions at all, but with the visual pipeline the results are not as good (at least in my implementation). Maybe the learning rate schedule needs more tuning, but that is extremely hard since one experiment takes 2-3 days to train.

erobic commented on May 25, 2024

I am wondering how batch size scheduling and learning rate scheduling differed for the "from pixels" configuration. The README reports the lr scheduling approach, and it seems to reach 80%+ accuracy only after 200 epochs. Did batch size scheduling converge even more slowly?

In fact, I am trying to train the RN with pre-trained features (from bottom-up attention), but the overall accuracy remains below 50% even after 60 epochs with either scheduling mechanism. I am wondering if there are new findings regarding faster convergence.

mesnico commented on May 25, 2024

@erobic As of now, we have not trained the network with batch size scheduling for the "from pixels" configuration. Regarding the lr, during the "from pixels" experiments we used the very same schedule as for "state description".
We preferred lr scheduling because the batch size increase policy is quite inefficient, especially during the first epochs, when the batch size is very small and GPU utilization is low. Even if it seems to converge faster on the accuracy graph, we must keep in mind that those first epochs are very slow. We have not compared the two approaches comprehensively in terms of elapsed time rather than elapsed epochs, but they could turn out to be comparable when measured by wall-clock time.

Regarding your use case, 60 epochs may not be enough. We noticed that training is highly sensitive to the hyper-parameter configuration: a slight variation can lead to much longer training times to reach the same accuracy. Also, we observed that accuracy tends to increase non-uniformly: it remains almost constant for a relatively long time before rising.

Hope this helps

erobic commented on May 25, 2024

@mesnico Thank you for the reply. I will use lr scheduling and run for more epochs. However, the staircase-like convergence, and not knowing whether it will actually converge after all those epochs, is a bit concerning. The paper does mention a very large number of iterations (1.4M), but I am not sure whether they also observed staircase convergence. I will update once my experiments are done.

Thanks again!

LinkToPast1990 commented on May 25, 2024

Hi @mesnico, would you mind telling me why --invert-questions works?
