
antoyang / frozenbilm

153 stars, 4 watchers, 23 forks, 98 KB

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Home Page: https://arxiv.org/abs/2206.08155

License: Apache License 2.0

Languages: Python 99.93%, Shell 0.07%

Topics: multimodal-learning, video-understanding, vqa, weakly-supervised-learning, large-language-models, pre-training, video-question-answering, videoqa, vision-and-language, visual-question-answering

frozenbilm's People

Contributors: antoyang, dependabot[bot]


frozenbilm's Issues

Bad zero-shot results on TVQA

Hi, I ran zero-shot evaluation on the TVQA dataset with the provided zero-shot checkpoint frozenbilm.pth and the provided TVQA video features clipvitl14.pth. I also used the microsoft/deberta-v2-xlarge checkpoint. However, I got a val acc of 31.59 instead of the reported 59.7.

Error on zero-shot VQA

Hi. Thanks for providing the code! I'm having the same issue as #3 with the VQA demo. I have Microsoft's deberta-v2-xlarge (https://huggingface.co/microsoft/deberta-v2-xlarge) downloaded from Hugging Face into a folder called transformers_cache, and I've set the TRANSFORMERS_CACHE environment variable to point at it (if I remove this, it complains that deberta is missing, so I assume this part is correct). Do you have any idea why it might be failing?

The command I'm running is:

python demo_videoqa.py --combine_datasets msrvtt --combine_datasets_val msrvtt \
  --suffix="." --max_tokens=256 --ds_factor_ff=8 --ds_factor_attn=8 \
  --load=models/frozenbilm.pth --msrvtt_vocab_path=data/MSRVTT-QA/vocab.json \
  --question_example question --video_example test.mp4 --device='cpu'

And the error is:

Traceback (most recent call last):
  File "demo_videoqa.py", line 170, in <module>
    main(args)
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "demo_videoqa.py", line 32, in main
    tokenizer = get_tokenizer(args)
  File "/user/work/tp8961/FrozenBiLM/model/__init__.py", line 96, in get_tokenizer
    tokenizer = DebertaV2Tokenizer.from_pretrained(
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1923, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 145, in __init__
    self._tokenizer = SPMTokenizer(vocab_file, split_by_punct=split_by_punct, sp_model_kwargs=self.sp_model_kwargs)
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 296, in __init__
    spm.load(vocab_file)
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(890) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
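For what it's worth, this RuntimeError comes from sentencepiece failing to parse the spm.model tokenizer file, which usually indicates a truncated or corrupted download rather than a problem in the FrozenBiLM code. A minimal diagnostic sketch, assuming the transformers_cache layout described above (the exact path is an assumption; adjust it to wherever the deberta-v2-xlarge files actually live):

import os
import sentencepiece as spm

# assumed location of the downloaded tokenizer model; adjust to your cache layout
model_file = "transformers_cache/deberta-v2-xlarge/spm.model"

# a truncated download is usually obvious from the size (the intact file is a few MB)
print("size on disk:", os.path.getsize(model_file), "bytes")

# if the file is corrupt, this raises the same ParseFromArray RuntimeError as above
sp = spm.SentencePieceProcessor()
sp.Load(model_file)
print("loaded OK, vocab size:", sp.GetPieceSize())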

Conda Environment Setting

Hi.

The instructions say to run "pip install requirements.txt", but it should be "pip install -r requirements.txt", right?

My actual question is about this error:

$ pip install -r requirements.txt
ERROR: Could not find a version that satisfies the requirement clip==1.0 (from versions: 0.0.1, 0.1.0, 0.2.0)
ERROR: No matching distribution found for clip==1.0

How can I download clip==1.0?
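Note: clip==1.0 presumably refers to OpenAI's CLIP package, which is not published on PyPI under that name (the 0.0.1/0.1.0/0.2.0 releases listed in the error belong to an unrelated project); OpenAI's CLIP is normally installed from GitHub with pip install git+https://github.com/openai/CLIP.git. A quick sanity check that the right package was picked up:

# verify that "clip" resolves to OpenAI's CLIP, not the unrelated PyPI package
import clip

# OpenAI's package exposes available_models(); the unrelated PyPI one does not
print(clip.available_models())  # e.g. ['RN50', ..., 'ViT-L/14']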

Errors in MSRVTT-QA test set

Hi, I have found some spelling errors in the MSRVTT-QA test set, for example "badmitten", "peson", and "tenni". How did you handle such ground-truth errors during testing?

How2qa

What is the difference between the following checkpoints?

frozenbilm_how2qa.pth
frozenbilm_how2qa1p.pth
frozenbilm_how2qa10p.pth

webvid_clipvitl14_features

Hi Antoine,
Thanks for your great and open work! I failed to find the WebVid video features among the provided files. Could you please share a download link?

[Import Error] with demo_videoqa.py

python demo_videoqa.py --combine_datasets msrvtt --combine_datasets_val msrvtt --suffix="." --max_tokens=256 --ds_factor_ff=8 --ds_factor_attn=8 --load=checkpoints/frozenbilm_msrvtt10p.pth --msrvtt_vocab_path=data/MSRVTT-QA/vocab.json --question_example "what is that dog doing?" --video_example ./angry_cute_dog.mp4

I downloaded all the data and checkpoint files, and I also installed the transformers library from Hugging Face, but I get the error below:

ImportError: cannot import name 'GreedySearchOutput' from 'transformers.generation_utils' (FrozenBiLM/transformers/src/transformers/generation_utils.py)

What version of the transformers library are you using?
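A plausible cause is a transformers version mismatch: GreedySearchOutput is only importable from transformers.generation_utils in certain 4.x releases (the generation utilities were later reorganized into transformers.generation), and the path in the error suggests a local transformers checkout inside FrozenBiLM is being picked up. A minimal check sketch, assuming the version pinned in requirements.txt is the one the repo expects:

import transformers

# confirm which installation and version are actually being imported
print(transformers.__version__, transformers.__file__)

# this is the import that fails above; it only works on releases where the
# generation utilities still live in transformers.generation_utils
from transformers.generation_utils import GreedySearchOutput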

Few-shot VideoQA training details

Hi,

Thanks for the great work and publicly available code.

Could you please share the few-shot training parameters (batch size, learning rate, etc.)? I could not reproduce the results.

Thanks in advance.

Unexpected Zero-shot Results

Hi,

I tried to evaluate the fine-tuned checkpoints provided in the repo. My environment is configured correctly and I followed all steps up to the Zero-shot VideoQA section. As I only have one GPU, I didn't use distributed inference.
Here is what I used to run the evaluation:

python videoqa.py --test --eval --combine_datasets <dataset> --combine_datasets_val <dataset> \
  --save_dir=zs<dataset> --ds_factor_ff=8 --ds_factor_attn=8 --suffix="." \
  --batch_size_val=32 --max_tokens=256 --load=checkpoints/frozenbilm_<dataset>.pth \
  --<dataset>_vocab_path <data_folder>/vocab1000.json

I tried ActivityNet-QA and iVQA and could not reproduce the expected results for either. For instance, here is what I got on ActivityNet-QA:

number of params: 29735424
loading from checkpoints/frozenbilm_activitynet.pth
test:  [  0/250]  eta: 0:07:27  acc: 0.0000 (0.0000)  time: 1.7891  data: 0.3052  max mem: 6485
test:  [100/250]  eta: 0:03:35  acc: 0.0000 (0.0006)  time: 1.4358  data: 0.0020  max mem: 7765
test:  [200/250]  eta: 0:01:11  acc: 0.0000 (0.0005)  time: 1.4355  data: 0.0021  max mem: 7765
test:  [249/250]  eta: 0:00:01  acc: 0.0000 (0.0006)  time: 1.4344  data: 0.0020  max mem: 7765
test: Total time: 0:05:59 (1.4361 s / it)
activitynet
test acc1:  0.06%
test acc10:  0.55%
acc motion:  0.00%
acc spatial:  0.12%
acc temporal:  0.00%
acc yesno:  0.00%
acc color:  0.57%
acc object:  0.00%
acc location:  0.00%
acc number:  0.00%
acc other:  0.00%
acc sub:  0.10%; proportion  25.25%

And the results on iVQA:
number of params: 29735424

loading from checkpoints/frozenbilm_ivqa.pth
test:  [ 0/63]  eta: 0:02:40  acc: 0.0000 (0.0000)  time: 2.5405  data: 0.2846  max mem: 6485
test:  [62/63]  eta: 0:00:01  acc: 0.0000 (0.0000)  time: 1.1953  data: 0.0018  max mem: 7766
test: Total time: 0:01:16 (1.2169 s / it)
ivqa
test acc1:  0.00%
test acc10:  0.95%
acc sub:  0.00%; proportion  14.20%

Do you have any ideas on this issue?

Cheers

Problems in reproducing the code process

Hello, thank you very much for sharing your work! I've run into a couple of problems while trying to reproduce it:

  1. Running main.py or videoqa.py gives the error: "ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists."
  2. How should combine_datasets and combine_datasets_val be set?

I hope you can take time out of your busy schedule to help me out!

Problematic Tokenizer?

Hi! I am trying zero-shot inference with the command below:

DATA_DIR=data
DATASET=activitynet
DATASET_FILE=ActivityNet-QA
CKPT_PATH=checkpoints/frozenbilm_activitynet.pth

TRANSFORMERS_CACHE=/root/.cache/huggingface/transformers \
CUDA_VISIBLE_DEVICES=4,5,6,7 \
CUDA_LAUNCH_BLOCKING=1 \
python -m torch.distributed.run --nproc_per_node 4 videoqa.py --test --eval \
--combine_datasets $DATASET --combine_datasets_val $DATASET --save_dir=zs${DATASET} \
--ds_factor_ff=8 --ds_factor_attn=8 --suffix="." \
--batch_size_val=32 --max_tokens=256 --load=$CKPT_PATH \
"--${DATASET}_vocab_path"=$DATA_DIR/$DATASET_FILE/vocab1000.json \
"--${DATASET}_train_csv_path"=$DATA_DIR/$DATASET_FILE/train.json "--${DATASET}_test_csv_path"=$DATA_DIR/$DATASET_FILE/test.csv

However, I encountered the following sentencepiece issue:

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
ERROR:root:No token file found. Also make sure that a [prod] section with a 'token = value' assignment exists.
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
Namespace(combine_datasets=['activitynet'], combine_datasets_val=['activitynet'], webvid_features_path='webvid_clipvitl14_features', webvid_train_csv_path='data/WebVid/train_captions.csv', webvid_val_csv_path='data/WebVid/val_captions.csv', lsmdc_features_path='data/LSMDC/clipvitl14.pth', lsmdc_train_csv_path='data/LSMDC/training.csv', lsmdc_val_csv_path='data/LSMDC/val.csv', lsmdc_test_csv_path='data/LSMDC/test.csv', lsmdc_vocab_path='data/LSMDC/vocab.json', lsmdc_subtitles_path='data/LSMDC/subtitles.pkl', ivqa_features_path='data/iVQA/clipvitl14.pth', ivqa_train_csv_path='data/iVQA/train.csv', ivqa_val_csv_path='data/iVQA/val.csv', ivqa_test_csv_path='data/iVQA/test.csv', ivqa_vocab_path='data/iVQA/vocab.json', ivqa_subtitles_path='data/iVQA/subtitles.pkl', msrvtt_features_path='data/MSRVTT-QA/clipvitl14.pth', msrvtt_train_csv_path='data/MSRVTT-QA/train.csv', msrvtt_val_csv_path='data/MSRVTT-QA/val.csv', msrvtt_test_csv_path='data/MSRVTT-QA/test.csv', msrvtt_vocab_path='data/MSRVTT-QA/vocab.json', msrvtt_subtitles_path='data/MSRVTT-QA/subtitles.pkl', msvd_features_path='data/MSVD-QA/clipvitl14.pth', msvd_train_csv_path='data/MSVD-QA/train.csv', msvd_val_csv_path='data/MSVD-QA/val.csv', msvd_test_csv_path='data/MSVD-QA/test.csv', msvd_vocab_path='data/MSVD-QA/vocab.json', msvd_subtitles_path='data/MSVD-QA/subtitles.pkl', activitynet_features_path='data/ActivityNet-QA/clipvitl14.pth', activitynet_train_csv_path='data/ActivityNet-QA/train.json', activitynet_val_csv_path='data/ActivityNet-QA/val.csv', activitynet_test_csv_path='data/ActivityNet-QA/test.csv', activitynet_vocab_path='data/ActivityNet-QA/vocab1000.json', activitynet_subtitles_path='data/ActivityNet-QA/subtitles.pkl', tgif_features_path='data/TGIF-QA/clipvitl14.pth', tgif_frameqa_train_csv_path='data/TGIF-QA/train_frameqa.csv', tgif_frameqa_test_csv_path='data/TGIF-QA/test_frameqa.csv', tgif_vocab_path='data/TGIF-QA/vocab.json', how2qa_features_path='data/How2QA/clipvitl14_split.pth', how2qa_train_csv_path='data/How2QA/train.csv', how2qa_val_csv_path='data/How2QA/public_val.csv', how2qa_subtitles_path='data/How2QA/subtitles.pkl', tvqa_features_path='data/TVQA/clipvitl14.pth', tvqa_train_csv_path='data/TVQA/train.csv', tvqa_val_csv_path='data/TVQA/val.csv', tvqa_test_csv_path='data/TVQA/test_public.csv', tvqa_subtitles_path='data/TVQA/subtitles.pkl', vqa_features_path='data/VQA/clipvitl14.pth', vqa_train_pkl_path='data/VQA/train_list.pkl', vqa_val_pkl_path='data/VQA/val_list.csv', vqa_vocab_path='data/VQA/vocab.json', mlm_prob=0.15, lr=0.0003, beta1=0.9, beta2=0.95, batch_size=32, batch_size_val=32, weight_decay=0, epochs=10, lr_drop=10, optimizer='adam', clip_max_norm=0.1, schedule='', fraction_warmup_steps=0.1, eval_skip=1, print_freq=100, freeze_lm=True, model_name='/root/.cache/huggingface/transformers/deberta-v2-xlarge', ds_factor_attn=8, ds_factor_ff=8, ft_ln=True, freeze_mlm=True, dropout=0.1, scratch=False, n_ans=0, freeze_last=True, test=True, save_dir='zsactivitynet', presave_dir='', device='cuda', seed=42, load='checkpoints/frozenbilm_activitynet.pth', resume=False, start_epoch=0, eval=True, num_workers=3, world_size=4, dist_url='env://', max_feats=10, features_dim=768, use_video=True, use_context=True, max_tokens=256, max_atokens=5, prefix='', suffix='.', rank=0, gpu=0, distributed=True, dist_backend='nccl')
Traceback (most recent call last):
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/videoqa.py", line 530, in <module>
    main(args)
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/videoqa.py", line 266, in main
    tokenizer = get_tokenizer(args)
  File "/mnt/lustre/lychen/code/sm/FrozenBiLM/model/__init__.py", line 96, in get_tokenizer
    tokenizer = DebertaV2Tokenizer.from_pretrained(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1777, in from_pretrained
    return cls._from_pretrained(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1932, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 149, in __init__
    self._tokenizer = SPMTokenizer(vocab_file, split_by_punct=split_by_punct, sp_model_kwargs=self.sp_model_kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 301, in __init__
    spm.load(vocab_file)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

(the same traceback is printed, interleaved, by each of the four ranks)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1066196) of binary: /mnt/lustre/anaconda3/envs/dream/bin/python
Traceback (most recent call last):
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 766, in <module>
    main()
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/lustre/anaconda3/envs/dream/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
videoqa.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 1066197)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1066198)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 1066199)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-11-07_10:48:31
  host      : localhost.vm
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1066196)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

This issue is the same as the one below; it looks like some problem with the vocab/spm model file. How can we fix it?

sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())] · Issue #20011 · huggingface/transformers
huggingface/transformers#20011
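One way to rule out a corrupted local copy (a hedged sketch, assuming network access and the huggingface_hub package) is to force a fresh download of the tokenizer model straight from the Hub:

import os
from huggingface_hub import hf_hub_download

# fetch a pristine copy of the sentencepiece model, bypassing any cached file
path = hf_hub_download(
    repo_id="microsoft/deberta-v2-xlarge",
    filename="spm.model",
    force_download=True,
)
print(path, os.path.getsize(path), "bytes")

If this fresh file loads while the cached one still raises the ParseFromArray error, the cached copy was truncated or corrupted; if both fail, the sentencepiece/transformers version pairing is the more likely suspect (see the linked issue above).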
