I encountered a small problem when training the Single-View model. After setting up all the resources, I found that I could not run the program. Could you please help me check it? Thank you. [multi-view-seq2seq, CLOSED]

PYMAQ commented on May 27, 2024
I encountered a small problem when training the Single-View model. After setting up all the resources, I found that I could not run the program. Could you please help me check it? Thank you.

from multi-view-seq2seq.

Comments (13)

jiaaoc commented on May 27, 2024

It means that you need a GPU with more memory.

Multi-GPU training does not work that way. Please check the relevant docs (fairseq/PyTorch).

Multiple GPUs let you increase the batch size, but they do not help with max_seq_len in generation tasks. Basically, multi-GPU training assigns different training examples in a batch to different GPUs and then aggregates their results to achieve a larger effective batch size.
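
The arithmetic behind this can be sketched in a few lines of shell. This is only an illustration: the numbers are hypothetical placeholders, and the mapping to the fairseq options --max-tokens and --update-freq (both listed in the train.py usage output later in this thread) is stated as an assumption about how the training script is configured.

```shell
#!/bin/sh
# All numbers below are hypothetical placeholders, not values from this repository.
MAX_TOKENS=800   # --max-tokens: upper bound on tokens per batch on EACH GPU
NUM_GPUS=2       # e.g. CUDA_VISIBLE_DEVICES=0,1
UPDATE_FREQ=4    # --update-freq: batches accumulated before each optimizer step

# Peak memory per GPU follows the per-GPU batch, so adding GPUs does not lower it.
PER_GPU_TOKENS=$MAX_TOKENS
# Adding GPUs or accumulating gradients only grows the effective batch.
EFFECTIVE_TOKENS=$((MAX_TOKENS * NUM_GPUS * UPDATE_FREQ))

echo "tokens per GPU per forward pass: $PER_GPU_TOKENS"
echo "tokens per optimizer step:       $EFFECTIVE_TOKENS"
```

Under these assumptions, the options that affect per-GPU memory are a smaller --max-tokens or half precision (--fp16, --memory-efficient-fp16). Whether they are enough to fit this model on an 11 GB card is not established in this thread.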

jiaaoc commented on May 27, 2024

Please replace the placeholder path with the actual path to your pre-trained BART model.

PYMAQ commented on May 27, 2024

Thank you, it works. But when I train the Multi-View model, I get this:
OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 9.78 GiB already allocated; 21.75 MiB free; 9.97 GiB reserved in total by PyTorch)

I have two GeForce RTX 2080 Ti GPUs with 11 GB each, and I set CUDA_VISIBLE_DEVICES=0,1, but it didn't work.

Maybe I need a GPU with more than 16 GB? But I don't understand why two GPUs totaling 22 GB did not work. Oh my god.

Thanks for everything.

PYMAQ commented on May 27, 2024

By the way, can this be used with Chinese dialogue datasets? Hmm. (So you are from Zhejiang University!)

jiaaoc commented on May 27, 2024

You could do that, but you would need a model pre-trained on Chinese. Maybe mBART could work.

chostyouwang commented on May 27, 2024

The script has BART_PATH= PATH-TO-BART-MODEL (./bart.large/model.pt). I changed it to BART_PATH= (./bart.large/model.pt). Why does bash train_single_view.sh show train.py: error: argument --restore-file: expected one argument? How should I change it?

PYMAQ commented on May 27, 2024

That is not the right change. It should be BART_PATH=”./bart.large/model.pt”
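
A minimal sketch of that assignment, written with plain ASCII double quotes and no spaces around the equals sign, plus a check that the file is actually there before training starts. The path is the one quoted in this thread. The helper name check_checkpoint is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Path as quoted in this thread; replace it with wherever you saved the checkpoint.
BART_PATH="./bart.large/model.pt"

# check_checkpoint is a hypothetical helper: succeeds only if $1 is an existing file.
check_checkpoint() {
    if [ -f "$1" ]; then
        echo "found checkpoint: $1"
    else
        # A relative path is resolved against the current directory, so print it.
        echo "missing checkpoint: $1 (current directory: $(pwd))" >&2
        return 1
    fi
}

check_checkpoint "$BART_PATH" || echo "fix BART_PATH before running train.py"
```

Running a check like this from the same directory as the training script shows quickly whether a relative path points where you expect.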

chostyouwang commented on May 27, 2024

train_single_view.sh: line 6: /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt: Permission denied
usage: train.py [-h] [--no-progress-bar] [--log-interval N]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir DIR] [--seed N] [--cpu] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale D]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE] [--multi-views]
[--balance] [--lr-weight LR_WEIGHT] [--T T]
[--criterion {adaptive_loss,binary_cross_entropy,composite_loss,cross_entropy,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,legacy_masked_lm_loss,masked_lm,nat_loss,sentence_prediction,sentence_ranking}]
[--tokenizer {moses,nltk,space}]
[--bpe {fastbpe,gpt2,bert,sentencepiece,subword_nmt}]
[--optimizer {adadelta,adafactor,adagrad,adam,adamax,lamb,nag,sgd}]
[--lr-scheduler {cosine,fixed,inverse_sqrt,polynomial_decay,reduce_lr_on_plateau,tri_stage,triangular}]
[--task TASK] [--num-workers N]
[--skip-invalid-size-inputs-valid-test] [--max-tokens N]
[--max-sentences N] [--required-batch-size-multiple N]
[--dataset-impl FORMAT] [--train-subset SPLIT]
[--valid-subset SPLIT] [--validate-interval N]
[--fixed-validation-seed N] [--disable-validation]
[--max-tokens-valid N] [--max-sentences-valid N]
[--curriculum N] [--distributed-world-size N]
[--distributed-rank DISTRIBUTED_RANK]
[--distributed-backend DISTRIBUTED_BACKEND]
[--distributed-init-method DISTRIBUTED_INIT_METHOD]
[--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID]
[--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}]
[--bucket-cap-mb MB] [--fix-batches-to-gpus]
[--find-unused-parameters] [--fast-stat-sync]
[--broadcast-buffers] [--arch ARCH] [--max-epoch N]
[--max-update N] [--clip-norm NORM] [--sentence-avg]
[--update-freq N1,N2,...,N_K] [--lr LR_1,LR_2,...,LR_N]
[--min-lr LR] [--use-bmuf] [--save-dir DIR]
[--restore-file RESTORE_FILE] [--reset-dataloader]
[--reset-lr-scheduler] [--reset-meters] [--reset-optimizer]
[--optimizer-overrides DICT] [--save-interval N]
[--save-interval-updates N] [--keep-interval-updates N]
[--keep-last-epochs N] [--keep-best-checkpoints N] [--no-save]
[--no-epoch-checkpoints] [--no-last-checkpoints]
[--no-save-optimizer-state]
[--best-checkpoint-metric BEST_CHECKPOINT_METRIC]
[--maximize-best-checkpoint-metric] [--patience N]
train.py: error: argument --restore-file: expected one argument

It is still the same problem. Is there supposed to be another file path after BART_PATH=”./bart.large/model.pt”?
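
One possible reading of the two errors above, offered as a sketch rather than a diagnosis: in a shell, writing NAME= value with a space after the equals sign does not assign the value. It runs the value as a command with NAME set to an empty string, which would explain the Permission denied line. If the script then passes an unquoted $BART_PATH to --restore-file (an assumption about train_single_view.sh, which is not shown in this thread), the empty variable expands to nothing and the flag receives no argument. The helper count_args is hypothetical and only counts the words it receives.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Space after "=": this runs the command `true` with BART_PATH set to "" for
# that command, and leaves BART_PATH without a usable value in the script.
unset BART_PATH
BART_PATH= true

# An unquoted empty variable expands to zero words, so the flag has no value.
EMPTY_CASE=$(count_args --restore-file $BART_PATH)

# No spaces around "=", plain ASCII quotes: the flag gets exactly one value.
BART_PATH="./bart.large/model.pt"
FIXED_CASE=$(count_args --restore-file "$BART_PATH")

echo "arguments with broken assignment: $EMPTY_CASE"
echo "arguments with fixed assignment:  $FIXED_CASE"
```

The first count is 1 (only the flag itself) and the second is 2 (the flag plus its path), which mirrors the difference between "expected one argument" and a working invocation.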

PYMAQ commented on May 27, 2024

Did you download the BART model? Make sure it is placed in the right location.

chostyouwang commented on May 27, 2024

Thanks, it runs now.

chostyouwang commented on May 27, 2024

| INFO | fairseq.trainer | no existing checkpoint found ”./bart.large/model.pt”
I put the downloaded model.pt file at /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt. Why does it say it cannot be found?

chostyouwang commented on May 27, 2024

And it still cannot be found even when I switch to an absolute path...

chostyouwang commented on May 27, 2024

Hmm, it turns out I had typed the quotation marks around the file path as Chinese quotation marks...
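
Why this breaks can be shown with a short sketch: the shell treats curly quotation marks as ordinary characters, so they become part of the path instead of quoting it. That matches the log line earlier in the thread, where the quotes appear inside the reported filename. The helper has_curly_quotes is hypothetical and only illustrates one way to catch the mistake.

```shell
#!/bin/sh
# Curly quotes are ordinary characters to the shell, so they end up inside the value.
BAD=”./bart.large/model.pt”
# ASCII double quotes are real shell quoting and are removed from the value.
GOOD="./bart.large/model.pt"

# has_curly_quotes is a hypothetical helper that flags typographic quotes in a string.
has_curly_quotes() {
    case $1 in
        *“*|*”*|*‘*|*’*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in "$BAD" "$GOOD"; do
    if has_curly_quotes "$p"; then
        echo "curly quotes found in: $p"
    else
        echo "looks clean: $p"
    fi
done
```

A path copied from a chat window or typed with a Chinese input method is worth running through a check like this before passing it to train.py.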
