I encountered a little problem when training the Single-View model. After deploying all the resources, I found that I could not run the program. Could you please help me check it? Thank you (multi-view-seq2seq issue, closed)

PYMAQ commented on May 27, 2024

I encountered a little problem when training the Single-View model. After deploying all the resources, I found that I could not run the program. Could you please help me check it? Thank you.

from multi-view-seq2seq.

Comments (13)

jiaaoc commented on May 27, 2024

It means that you need a GPU with more memory.

Multi-GPU training does not work that way. Please check the relevant docs (fairseq/PyTorch).

Multi-GPU training lets you increase the batch size, but it does not help with max_seq_len in generation tasks, because each GPU still has to hold the full model and complete sequences. Basically, it assigns different training examples in a batch to different GPUs and then aggregates the results to achieve a larger effective batch size.
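
To make the distinction concrete, here is a rough sketch of the arithmetic. It assumes that fairseq's --max-tokens applies per GPU and that --update-freq accumulates gradients over several steps (both flags appear in the train.py usage output later in this thread). The numbers are made up for illustration and are not taken from the repository's scripts.

```shell
#!/usr/bin/env bash
# Illustrative numbers only; none of these values come from the repository.
NUM_GPUS=2          # e.g. CUDA_VISIBLE_DEVICES=0,1
MAX_TOKENS=800      # hypothetical per-GPU --max-tokens
UPDATE_FREQ=4       # hypothetical --update-freq (gradient accumulation steps)

# Memory pressure on each GPU follows the per-GPU batch, not the total.
PER_GPU_TOKENS=$MAX_TOKENS

# Data parallelism multiplies how many tokens contribute to one update.
EFFECTIVE_TOKENS=$((MAX_TOKENS * NUM_GPUS * UPDATE_FREQ))

echo "tokens held on one GPU per step: $PER_GPU_TOKENS"
echo "tokens per optimizer update:     $EFFECTIVE_TOKENS"
```

Under these assumptions, a second GPU doubles the tokens per update but leaves the per-GPU figure unchanged. Lowering --max-tokens and raising --update-freq to compensate may reduce per-GPU memory, but it cannot help when a single example is already too large for one 11 GB card.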

jiaaoc commented on May 27, 2024

Please replace the path with the actual path to your pre-trained BART model.

PYMAQ commented on May 27, 2024

Thank you, it works. But when I train the Multi-View model, I get this error:
OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 9.78 GiB already allocated; 21.75 MiB free; 9.97 GiB reserved in total by PyTorch)

The GPUs are two GeForce RTX 2080 Ti cards with 11 GB each, and I set CUDA_VISIBLE_DEVICES=0,1, but it didn't work.

Maybe I need a GPU with more than 16 GB? But I don't understand why two GPUs with 22 GB in total did not work.

Thanks for everything.

PYMAQ commented on May 27, 2024

By the way, can this be used with a Chinese dialogue dataset? (So you are from Zhejiang University!)

jiaaoc commented on May 27, 2024

You could do that, but you need a model pre-trained on Chinese. Maybe mBART could work.

chostyouwang commented on May 27, 2024

I changed BART_PATH= PATH-TO-BART-MODEL (./bart.large/model.pt) to BART_PATH= (./bart.large/model.pt). Why does bash train_single_view.sh report "train.py: error: argument --restore-file: expected one argument"? How should I change it?

PYMAQ commented on May 27, 2024

That is not the right change. It should be BART_PATH=”./bart.large/model.pt”
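
The "expected one argument" error is what argparse reports when --restore-file is followed by no value, which is what happens if BART_PATH expands to nothing. A minimal sketch, assuming the training script passes the variable as --restore-file $BART_PATH (count_args is a hypothetical helper, not part of the repository):

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Empty variable, unquoted: the flag is passed with no value after it.
BART_PATH=""
EMPTY_COUNT=$(count_args --restore-file $BART_PATH)

# No spaces around '=', plain ASCII double quotes: the flag gets its value.
BART_PATH="./bart.large/model.pt"
SET_COUNT=$(count_args --restore-file "$BART_PATH")

echo "with empty BART_PATH: $EMPTY_COUNT argument(s)"
echo "with BART_PATH set:   $SET_COUNT argument(s)"
```

Note that the quotes in the script must be the straight ASCII character ("). The quotes in the comment above are typographic (”), which the shell treats as ordinary characters and keeps as part of the value.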

chostyouwang commented on May 27, 2024

train_single_view.sh: line 6: /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt: Permission denied
usage: train.py [-h] [--no-progress-bar] [--log-interval N]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir DIR] [--seed N] [--cpu] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale D]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE] [--multi-views]
[--balance] [--lr-weight LR_WEIGHT] [--T T]
[--criterion {adaptive_loss,binary_cross_entropy,composite_loss,cross_entropy,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,legacy_masked_lm_loss,masked_lm,nat_loss,sentence_prediction,sentence_ranking}]
[--tokenizer {moses,nltk,space}]
[--bpe {fastbpe,gpt2,bert,sentencepiece,subword_nmt}]
[--optimizer {adadelta,adafactor,adagrad,adam,adamax,lamb,nag,sgd}]
[--lr-scheduler {cosine,fixed,inverse_sqrt,polynomial_decay,reduce_lr_on_plateau,tri_stage,triangular}]
[--task TASK] [--num-workers N]
[--skip-invalid-size-inputs-valid-test] [--max-tokens N]
[--max-sentences N] [--required-batch-size-multiple N]
[--dataset-impl FORMAT] [--train-subset SPLIT]
[--valid-subset SPLIT] [--validate-interval N]
[--fixed-validation-seed N] [--disable-validation]
[--max-tokens-valid N] [--max-sentences-valid N]
[--curriculum N] [--distributed-world-size N]
[--distributed-rank DISTRIBUTED_RANK]
[--distributed-backend DISTRIBUTED_BACKEND]
[--distributed-init-method DISTRIBUTED_INIT_METHOD]
[--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID]
[--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}]
[--bucket-cap-mb MB] [--fix-batches-to-gpus]
[--find-unused-parameters] [--fast-stat-sync]
[--broadcast-buffers] [--arch ARCH] [--max-epoch N]
[--max-update N] [--clip-norm NORM] [--sentence-avg]
[--update-freq N1,N2,...,N_K] [--lr LR_1,LR_2,...,LR_N]
[--min-lr LR] [--use-bmuf] [--save-dir DIR]
[--restore-file RESTORE_FILE] [--reset-dataloader]
[--reset-lr-scheduler] [--reset-meters] [--reset-optimizer]
[--optimizer-overrides DICT] [--save-interval N]
[--save-interval-updates N] [--keep-interval-updates N]
[--keep-last-epochs N] [--keep-best-checkpoints N] [--no-save]
[--no-epoch-checkpoints] [--no-last-checkpoints]
[--no-save-optimizer-state]
[--best-checkpoint-metric BEST_CHECKPOINT_METRIC]
[--maximize-best-checkpoint-metric] [--patience N]
train.py: error: argument --restore-file: expected one argument

Still the same problem. Is another file path supposed to come after BART_PATH=”./bart.large/model.pt”?
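
The "Permission denied" line at the top of this log is consistent with a space after the equals sign: the shell then treats BART_PATH= as an empty, temporary assignment and tries to run the path as a command. This is only a reading of the log, since line 6 of the script is not shown here. A small sketch with a throwaway file (the file is created by mktemp and is purely illustrative):

```shell
#!/usr/bin/env bash
# Stand-in for model.pt: a regular file without execute permission.
FAKE_MODEL=$(mktemp)
chmod 600 "$FAKE_MODEL"

unset BART_PATH
STATUS=0

# Space after '=': the shell tries to execute the file as a command.
BART_PATH= "$FAKE_MODEL" 2>/dev/null || STATUS=$?

echo "exit status: $STATUS"            # 126 means found but not executable
echo "BART_PATH is '${BART_PATH:-}'"   # still empty in the calling shell

rm -f "$FAKE_MODEL"
```

If that is what happened, BART_PATH stays empty for the rest of the script, which would also explain why --restore-file then received no value.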

PYMAQ commented on May 27, 2024

Have you downloaded the BART model? It needs to be placed in the right location.

chostyouwang commented on May 27, 2024

Thanks, it runs now.

chostyouwang commented on May 27, 2024

| INFO | fairseq.trainer | no existing checkpoint found ”./bart.large/model.pt”
I put the downloaded model.pt file at /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt. Why does it say the checkpoint cannot be found?

chostyouwang commented on May 27, 2024

And it still cannot be found even when I switch to an absolute path...

chostyouwang commented on May 27, 2024

Hmm, it turns out I had typed the quotation marks around the file path as Chinese (typographic) quotes...
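
Since typographic quotes are easy to miss by eye, a small pre-flight check before launching training can catch them. This is a sketch only: check_checkpoint_path is a hypothetical helper, not something the repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reject paths that contain typographic quotes
# or that do not point to an existing regular file.
check_checkpoint_path() {
    case "$1" in
        *“*|*”*|*‘*|*’*)
            echo "path contains typographic quotes: $1" >&2
            return 1 ;;
    esac
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    return 0
}

# Usage: stop early instead of relying on the INFO line in the training log.
BART_PATH="./bart.large/model.pt"
if check_checkpoint_path "$BART_PATH"; then
    echo "checkpoint found: $BART_PATH"
else
    echo "fix BART_PATH before running train_single_view.sh"
fi
```

The quote test runs first so that a path like ”./bart.large/model.pt” is reported as a quoting problem rather than as a missing file.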
