Training error (num_gpu argument) about dl4mt-nonauto HOT 8 CLOSED

butsugiri commented on July 20, 2024

from dl4mt-nonauto.

Comments (8)

mansimov commented on July 20, 2024

Hi,

I forgot to mention in the README that you need to use my modified torchtext, which supports the num_gpus argument: https://github.com/mansimov/pytorch_text_multigpu

I will update the README. Also, try using PyTorch 0.4.* for consistency.
Could you try it and let me know?

baoy-nlp commented on July 20, 2024

Thank you for sharing the code!

I tried running your model in a multi-GPU setting as follows, and I got an error from BucketIterator.
It seems that BucketIterator (from torchtext) does not accept the num_gpus argument.
I am using torchtext (version 0.3.1).

python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation --num_gpus 3
2019-02-20 16:05:59 INFO: - random seed is 19920206
2019-02-20 16:05:59 INFO: - TRAINING CORPUS : /work01/kiyono/dl4mt-nonauto-data/iwslt/en-de/distill/ende/train.tags.en-de.bpe
2019-02-20 16:06:02 INFO: - before pruning : 195897 training examples
2019-02-20 16:06:02 INFO: - after pruning : 195897 training examples
Traceback (most recent call last):
  File "run.py", line 572, in <module>
    num_gpus=args.num_gpus)
TypeError: __init__() got an unexpected keyword argument 'num_gpus'

Do you have any ideas about how to avoid this error?

You can check out the "multigpu" branch.

butsugiri commented on July 20, 2024

I am already using the multigpu branch (commit: e15acb2).

In https://github.com/nyu-dl/dl4mt-nonauto/blob/multigpu/run.py#L536-L537, BucketIterator is called with a num_gpus argument, which is not available in torchtext.data.BucketIterator (https://torchtext.readthedocs.io/en/latest/data.html#torchtext.data.BucketIterator).
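
For anyone who wants to confirm which torchtext is installed before launching a long run, here is a minimal sketch of a signature check. The helper name accepts_kwarg and the two stand-in classes are hypothetical and only there so the snippet runs without torchtext; they are not part of dl4mt-nonauto or torchtext.

```python
import inspect


def accepts_kwarg(callable_obj, name):
    """Return True if callable_obj's signature allows the keyword `name`."""
    params = inspect.signature(callable_obj).parameters
    if name in params:
        return True
    # A **kwargs catch-all also lets the keyword through at call time.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins: one mimics an iterator without num_gpus,
# the other mimics a patched iterator that declares it.
class StockIterator:
    def __init__(self, dataset, batch_size, device=None):
        pass


class PatchedIterator:
    def __init__(self, dataset, batch_size, device=None, num_gpus=1):
        pass


print(accepts_kwarg(StockIterator, "num_gpus"))
print(accepts_kwarg(PatchedIterator, "num_gpus"))
```

With torchtext installed, the same helper could presumably be pointed at torchtext.data.BucketIterator. A result of False would match the TypeError in the traceback above. A True result caused only by a **kwargs catch-all would not guarantee that the argument is actually used.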

butsugiri commented on July 20, 2024

Thank you for your reply!
I will try the modified version and see what happens.

baoy-nlp commented on July 20, 2024

Thank you very much for sharing. How can we run the code to reproduce the performance reported in the paper, specifically for the IWSLT 16 EN-DE experiment? I've tried running it, but BLEU is consistently about five to six points below the paper's result. Could you also give us the specific settings for IWSLT-ENDE? Thank you very much.

mansimov commented on July 20, 2024

Off the top of my head, try running the following script on the main branch:

python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab --ffw_block highway --params small --batch_size 2048 --eval_every 1000 --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --denoising_prob --layerwise_denoising_weight --use_distillation

After training it, you need to train the length prediction module by running the script above again, adding --load_from with the trained model specified, plus --resume --trg_len_option predict --finetune_trg_len.
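
The second stage described above can be sketched as a small helper that appends the fine-tuning flags to whatever training command was used. The function name, the shortened base command, and the MODEL_NAME placeholder are hypothetical; the flags themselves are the ones listed in this comment. What value --load_from expects (a name or a path) is an assumption to check against run.py.

```python
import shlex


def length_finetune_command(base_command, checkpoint):
    """Append the length-prediction fine-tuning flags to a training command."""
    args = shlex.split(base_command)
    args += [
        "--load_from", checkpoint,   # trained model from the first stage
        "--resume",
        "--trg_len_option", "predict",
        "--finetune_trg_len",
    ]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Shortened, illustrative base command; substitute the full one you trained with.
base = "python run.py --dataset iwslt-ende --params small"
print(length_finetune_command(base, "MODEL_NAME"))
```

Building the command from the first-stage one keeps the remaining flags identical between the two stages, which seems to be what "running the script above" implies.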

The script should be similar on the multigpu branch:

python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab --ffw_block highway --params small --batch_size 2048 --num_gpus 2 --eval_every 1000 --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --denoising_prob --layerwise_denoising_weight --use_distillation

butsugiri commented on July 20, 2024

@mansimov I installed the modified version of torchtext and confirmed that the training actually works.
Thank you again for your advice.

mansimov commented on July 20, 2024

Great!
@butsugiri & @baoy-nlp feel free to ask me any other questions and update me on your progress!
