Incorrect Hyperparameters? (simmc, closed, 8 comments)

datarpit commented on May 29, 2024

Incorrect Hyperparameters ?

from simmc.

Comments (8)

shanemoon commented on May 29, 2024

Hi @datarpit, thank you for your comment. The baseline reported in the README does use num_train_epochs=1, but please feel free to do a further hyperparameter search.

The GPT2-based model does seem to have some train/inference variability, but it's odd that the trained model is achieving accuracy below 1%. Which metric (act/slot F1/prec/recall) are you referring to? Also, please note that the reported numbers are fractions (not percentages), so the maximum value is 1.0 (= 100%) for all metrics.

datarpit commented on May 29, 2024

The numbers I see are consistently low, around 0.0012. During inference, for most of devtest the model doesn't predict anything for the belief state, only the system response. When I increased the epochs to 100, the results changed to those below, but they are still much lower than what the README describes.
fashion

{
  "joint_accuracy": 0.14777937901218394,
  "act_rec": 0.2267784619415695,
  "act_prec": 0.7445161290322581,
  "act_f1": 0.34766017272544686,
  "slot_rec": 0.24814931485273273,
  "slot_prec": 0.7452696310312205,
  "slot_f1": 0.37232659813304975
}

fashion_to

{
  "joint_accuracy": 0.052142014935150006,
  "act_rec": 0.09236211188261496,
  "act_prec": 0.7654723127035831,
  "act_f1": 0.16483516483516483,
  "slot_rec": 0.09867695700110253,
  "slot_prec": 0.6976614699331849,
  "slot_f1": 0.17289913067476195
}

Another thing I wanted to mention: the evaluation script seems to pick targets from a folder gpt2_dst/data/v2, but there is no such folder, so I had to change the script to pick from gpt2_dst/data/. Can you please check that everything is in order and that there isn't a silly bug?
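
As a quick consistency check on the reports above, each F1 value should be the harmonic mean of the corresponding precision and recall, expressed as a fraction in [0, 1]. The sketch below assumes that definition (the evaluation script itself is not shown in this thread) and uses a hypothetical helper, `f1_from`; the inputs are the fashion act_prec and act_rec values quoted above.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the fashion report above (fractions, not percentages).
act_prec = 0.7445161290322581
act_rec = 0.2267784619415695

act_f1 = f1_from(act_prec, act_rec)
print(f"act_f1 = {act_f1:.4f} ({act_f1 * 100:.1f}%)")
```

The result agrees with the reported act_f1 of about 0.3477, which suggests the low F1 comes from low recall (the model emitting no belief state for many turns) rather than from the metric arithmetic.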

shanemoon commented on May 29, 2024

Hi @datarpit, thank you for sharing the results. Would you mind sharing the train configuration you used as well (e.g. --n_gpu, --nocuda, batchsize, fp16 training, etc.), and perhaps the version of the gpt2 model please? Alternatively, if you have a log file of the training process, I'll take a look. We'll soon share the baseline checkpoints to mitigate the issue for now.

Also thank you for catching gpt2_dst/data/v2, it should read gpt2_dst/data and the fix is now pushed.

datarpit commented on May 29, 2024

I didn't change anything in the script, except that I ran it with CUDA_VISIBLE_DEVICES=0. Sharing the baseline checkpoints would be great.

chetannaik commented on May 29, 2024

Same issue. I ran the baseline using the code in the repo (without any changes), and the F1 numbers I see are lower than what's reported.

Here's what I got for "Fashion (multimodal)":

Obtained results:

~/simmc/mm_dst/gpt2_dst/results/fashion
❯ jq . fashion_devtest_dials_report.json
{
  "joint_accuracy": 0.06052666055286257,
  "act_rec": 0.09603039434036421,
  "act_prec": 0.5154711673699015,
  "act_f1": 0.16189950303699616,
  "slot_rec": 0.08459525885990644,
  "slot_prec": 0.5049692380501657,
  "slot_f1": 0.1449137579790846
}

Expected/Reported baseline results:

Baseline	Dialog Act F1	Slot F1
GPT2 - Fashion (multimodal)	44.3	46.6

Note:
I ran this on just 1 GPU; multi-GPU training was throwing the following error during the eval step.

07/17/2020 00:15:53 - INFO - __main__ -   ***** Running evaluation  *****
07/17/2020 00:15:53 - INFO - __main__ -     Num examples = 3513
07/17/2020 00:15:53 - INFO - __main__ -     Batch size = 32
Evaluating:  99%|█████████▉| 109/110 [00:26<00:00,  4.18it/s]
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/chetnaik/simmc/mm_dst/gpt2_dst/scripts/run_language_modeling.py", line 821, in <module>
    main()
  File "/home/chetnaik/simmc/mm_dst/gpt2_dst/scripts/run_language_modeling.py", line 813, in main
    result = evaluate(args, model, tokenizer, prefix=prefix)
  File "/home/chetnaik/simmc/mm_dst/gpt2_dst/scripts/run_language_modeling.py", line 459, in evaluate
    outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
    return self.gather(outputs, self.output_device)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/chetnaik/simmc_env/lib/python3.7/site-packages/torch/cuda/comm.py", line 165, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Gather got an input of invalid size: got [2, 1, 12, 168, 64], but expected [2, 4, 12, 168, 64]
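
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the eval batch size of 32 suggests 8 GPUs with 4 examples each (the GPU count is not stated in the log). The last of the 110 batches would then hold 3513 - 109 * 32 = 25 examples, and a torch.chunk-style split across 8 replicas leaves one replica with a single example. The mismatched tensor, [2, 1, 12, 168, 64] versus [2, 4, 12, 168, 64], appears to carry its batch dimension at index 1, so a gather along dim 0 cannot concatenate it. The sketch below reproduces only the chunk arithmetic in plain Python; `chunk_sizes` is a hypothetical helper.

```python
import math
from typing import List


def chunk_sizes(num_items: int, num_chunks: int) -> List[int]:
    """Mimic torch.chunk sizing: chunks of ceil(n / k), with a smaller last chunk."""
    size = math.ceil(num_items / num_chunks)
    sizes = []
    remaining = num_items
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


num_examples = 3513  # from the log
batch_size = 32      # from the log
n_gpu = 8            # ASSUMPTION: inferred from 32 = 8 GPUs x 4 per GPU

num_batches = math.ceil(num_examples / batch_size)
last_batch = num_examples - (num_batches - 1) * batch_size

print("batches:", num_batches)
print("last batch size:", last_batch)
print("full batch split:", chunk_sizes(batch_size, n_gpu))
print("last batch split:", chunk_sizes(last_batch, n_gpu))
```

Under these assumptions the crash would only appear on the final, partial batch, which matches the progress bar stopping at 109/110.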

skiingpacman commented on May 29, 2024

Hi @chetannaik, re. the multi-GPU crashes: I've seen the same and have a patch, which I'll try to push this week.

shanemoon commented on May 29, 2024

Hi @chetannaik, the patch for the issue above has just been pushed.

@datarpit @chetannaik - please take a look at the model snapshots for the MM-DST baselines (link), which should give a good starting point; please feel free to fine-tune them further, etc. You can download them and put them under /simmc/mm_dst/save/.

The README file has been updated with the results obtained from these snapshots (trained with 2 GPUs; you can load training_args.bin for more details). Since the machine's n_gpu effectively changes the training batch size (to which the GPT2 model is very sensitive), it is recommended that you find the epoch count and batch size that work best (among other hyperparameters) to avoid overfitting and underfitting. Please feel free to re-open this if the issue persists after a hyperparameter sweep. Thank you!
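
The effect of n_gpu on batch size can be sketched as follows. This assumes the training script multiplies a per-GPU batch size by the number of visible GPUs and by any gradient accumulation steps, which is a common pattern but is not verified here against the repo. The per-GPU value of 4 and the 10,000-example dataset are purely illustrative, and `effective_batch_size` and `steps_per_epoch` are hypothetical helpers.

```python
def effective_batch_size(per_gpu_batch_size: int, n_gpu: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Examples consumed per optimizer step under data parallelism."""
    return per_gpu_batch_size * max(1, n_gpu) * gradient_accumulation_steps


def steps_per_epoch(num_examples: int, batch: int) -> int:
    """Optimizer steps in one epoch (ceiling division)."""
    return -(-num_examples // batch)


PER_GPU_BATCH = 4      # ASSUMPTION: illustrative value only
NUM_EXAMPLES = 10_000  # ASSUMPTION: illustrative dataset size

for n_gpu in (1, 2, 8):
    batch = effective_batch_size(PER_GPU_BATCH, n_gpu)
    steps = steps_per_epoch(NUM_EXAMPLES, batch)
    print(f"n_gpu={n_gpu}: effective batch={batch}, steps per epoch={steps}")
```

Under this assumption, a run restricted to one GPU (for example with CUDA_VISIBLE_DEVICES=0) takes twice as many, smaller updates per epoch as a 2-GPU run with the same flags, so the epoch count and batch size may need retuning to match the snapshots.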

cccntu commented on May 29, 2024

I encountered the same issue, and I was using Transformers v3.0.2.
Switching to v2.8.0 seems to solve the problem.
@shanemoon Can you share your exact version? Thanks. 😃
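
A minimal sketch for checking the installed Transformers version before training. It assumes Python 3.8+ (for importlib.metadata) and treats 2.8.0 as the target only because it is the version reported to work in this thread; `parse_version` and `check_transformers` are hypothetical helpers.

```python
from importlib import metadata

KNOWN_GOOD = "2.8.0"  # reported to work in this thread; not an official requirement


def parse_version(text: str) -> tuple:
    """Turn '3.0.2' into (3, 0, 2); non-numeric parts are dropped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_transformers(expected: str = KNOWN_GOOD) -> str:
    """Return a short status message about the installed transformers package."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if parse_version(installed) == parse_version(expected):
        return f"transformers {installed} matches {expected}"
    return f"transformers {installed} differs from {expected}; results may not reproduce"


print(check_transformers())
```

If the versions differ, `pip install transformers==2.8.0` pins the version reported above.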
