prefixtuning's People

Contributors

xiangli1999
prefixtuning's Issues

hyper-parameters setting in low-data scenario

Very good work!
What I'd like to ask is: what hyper-parameter settings (e.g., learning rate and number of epochs) did you use in the low-resource summarization scenario? I have tried applying prefix tuning to low-resource summarization tasks, but it does not seem to work very well...

environment creation fails

Hi
I am getting this error; thanks for your help.
Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement en-core-web-sm==2.3.1 (from -r /users/dara/seq2seq/temps/PrefixTuning/condaenv.2wv1bbj7.requirements.txt (line 26)) (from versions: none)
ERROR: No matching distribution found for en-core-web-sm==2.3.1 (from -r /users/dara/seq2seq/temps/PrefixTuning/condaenv.2wv1bbj7.requirements.txt (line 26))

failed

CondaEnvException: Pip failed
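
A workaround that often helps (an assumption about the cause: en-core-web-sm==2.3.1 is distributed via spaCy's GitHub releases rather than PyPI, so pip inside conda env create cannot resolve it) is to drop that line from the requirements and install the model afterwards:

# Workaround sketch: install the pinned spaCy model straight from the
# spacy-models release archive once the rest of the environment exists.
import subprocess, sys

subprocess.run(
    [
        sys.executable, "-m", "pip", "install",
        "https://github.com/explosion/spacy-models/releases/download/"
        "en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz",
    ],
    check=True,
)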

python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory

Hi,

I got the following error, which says python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory, when I run the command CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache

Has anyone met this issue, or does anyone know how to deal with it? Thank you so much.

cat
True False
control code is  None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
 name : Zizzi | Type : pub | customer rating : average | near : Burger King <|endoftext|> Zizzi is a pub near Burger King. It has an average customer rating.  <|endoftext|>

cat
True False
control code is  None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
 name : Zizzi | Type : pub | customer rating : high | near : Burger King <|endoftext|> Zizzi is a pub near Burger King with a high customer rating.  <|endoftext|>

cat
True False
control code is  None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
 name : Zizzi | Type : pub | near : The Sorrento <|endoftext|> Zizzi is a pub near The Sorrento.  <|endoftext|>

/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam_eval
 /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
 /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory

Here is my environment configuration:
Package                Version     Location
---------------------- ----------- -------------------------------------------
absl-py                0.14.1
cachetools             4.2.4
certifi                2021.10.8
charset-normalizer     2.0.7
click                  8.0.3
filelock               3.3.0
future                 0.18.2
google-auth            1.35.0
google-auth-oauthlib   0.4.6
grpcio                 1.41.0
idna                   3.3
joblib                 1.1.0
Markdown               3.3.4
nltk                   3.6.5
numpy                  1.21.2
oauthlib               3.1.1
packaging              21.0
Pillow                 8.3.2
pip                    20.0.2
pkg-resources          0.0.0
protobuf               3.18.1
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pyparsing              2.4.7
pytorch-lightning      0.9.0
PyYAML                 6.0
regex                  2021.10.8
requests               2.26.0
requests-oauthlib      1.3.0
rsa                    4.7.2
sacremoses             0.0.46
sentencepiece          0.1.96
setuptools             44.0.0
six                    1.16.0
tensorboard            2.2.0
tensorboard-plugin-wit 1.8.0
tokenizers             0.8.1rc2
torch                  1.8.0+cu111
torchvision            0.9.0+cu111
tqdm                   4.62.3
transformers           3.2.0       /data/qbao775/PrefixTuning/transformers/src
typing-extensions      3.10.0.2
urllib3                1.26.7
Werkzeug               2.0.2
wheel                  0.37.0
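
The path /u/scr/xlisali/e2e-metrics/measure_scores.py is hard-coded to the author's cluster. A common workaround (my assumption; all paths below are hypothetical placeholders) is to clone the official e2e-metrics repository and point the evaluation at the local copy, which takes a reference file and a system-output file:

# Workaround sketch: call a local clone of measure_scores.py on the generated
# *_gold (references) and *_beam (system output) files.
import subprocess

E2E_METRICS = "/path/to/e2e-metrics/measure_scores.py"  # local clone
REFS = "/path/to/e2e_results_conv2/..._test_gold"       # placeholder
SYS_OUT = "/path/to/e2e_results_conv2/..._test_beam"    # placeholder

subprocess.run(["python", E2E_METRICS, REFS, SYS_OUT], check=True)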

The evaluation code of train.bart

I noticed that line 572 in your seq2seq finetune.py uses the generate method implemented in the Hugging Face BART model, and the parameter passed to generate is input_ids. I think self.model.generate only uses the BART model, which does not contain the prefix part, and the input of this method does not contain the prompt embedding either. We also know that the BART model parameters are not updated during the prefix-tuning process.

Therefore, I am confused about how you bring the prompt embedding into the evaluation process.

GPT 2 prefix tuning. Input data format.

Hi Lisa,

I saw your video and have read your paper. Great work.
I want to try prefix-tuning GPT2 for a code summarization task and want to bring my data into the right format so it can be fed to the code as input. My data has pairs of code snippets and the corresponding summaries. Can you please guide me on bringing it into the right format?

Thank you,
Regards,
Manasi

Should've mentioned the "CRITICAL" modifications done to the transformers source code

Thanks for open-sourcing your work. I really appreciate your simple yet parameter-efficient method for tuning PLMs.

In fact, I had a hard time re-implementing your original experiment.
Until I realized that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and maybe made small modifications in generation_utils.py), the results were truly mysterious.

The function mentioned above is necessary for making this method actually work. It preserves the past_key_values that are passed in. Otherwise, the PLM will not incorporate the learned prefix embedding during generation.

It was a really painful process to track this down. You hinted at the modifications to the data collators, but not at the generation part of transformers, which is a critical part of the implementation. Meh 😕.

Hope this helps the other visitors.
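
For later visitors, here is a minimal sketch of the idea (my own reconstruction against a recent transformers API, not the author's actual diff to the bundled 3.2.0 copy): prepare_inputs_for_generation has to keep handing the prefix past_key_values back to forward(), otherwise generation ignores the learned prefix. The prefix_len attribute below is hypothetical.

# Sketch (assumption: recent transformers; not the repo's modified code).
from transformers import GPT2LMHeadModel

class PrefixAwareGPT2(GPT2LMHeadModel):
    prefix_len = 5  # how many cache slots belong to the learned prefix

    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
        if past_key_values is not None:
            cached = past_key_values[0][0].shape[-2]  # prefix + already-processed tokens
            if cached > self.prefix_len:
                # Real tokens are already cached, so only the newest one is fed.
                input_ids = input_ids[:, -1:]
            # If only the prefix is cached (first step), keep the full prompt.
        return {
            "input_ids": input_ids,
            "past_key_values": past_key_values,  # do NOT drop the prefix cache
            "use_cache": True,
        }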

RuntimeError: Input, output and indices must be on the current device

Hi, I met a RuntimeError when training a prefix model. Do you have any suggestions?

Here is the environment:
certifi (2021.5.30)
charset-normalizer (2.0.4)
click (8.0.1)
dataclasses (0.8)
filelock (3.0.12)
idna (3.2)
importlib-metadata (4.8.1)
itsdangerous (2.0.1)
Jinja2 (3.0.1)
joblib (1.0.1)
MarkupSafe (2.0.1)
nltk (3.6.2)
numpy (1.19.5)
packaging (21.0)
Pillow (8.3.2)
pip (9.0.3)
pyparsing (2.4.7)
Python-dev (2.0.0.dev0)
regex (2021.8.28)
requests (2.26.0)
sacremoses (0.0.45)
sentencepiece (0.1.96)
setuptools (39.2.0)
six (1.16.0)
tokenizers (0.8.1rc2)
torch (1.8.0+cu111)
torchvision (0.9.0+cu111)
tqdm (4.62.2)
transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
typing-extensions (3.10.0.2)
urllib3 (1.26.6)
Werkzeug (2.0.1)
zipp (3.5.0)

Here is the command line:
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache

Here is the error information:

webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
python run_language_modeling.py         --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --model_type=gpt2         --model_name_or_path=gpt2-medium         --tokenizer_name=gpt2-medium         --per_device_train_batch_size 5         --per_device_eval_batch_size 5         --save_steps 500000         --num_train_epochs 5         --do_train         --train_data_file=../data/webnlg_challenge_2017/train.json         --do_eval         --line_by_line         --save_total_limit 1         --overwrite_output_dir         --task_mode webnlg         --eval_data_file=../data/webnlg_challenge_2017/dev.json          --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000  --cache_dir cache/gpt2-medium-s3 
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
  FutureWarning,
09/16/2021 10:22:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
09/16/2021 10:22:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
objective is 1
False
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
  FutureWarning,
prefixtune
adapting the size of the model embedding to include [PAD]
len(tokenizer) =  50257
len(tokenizer) =  50258
<|endoftext|> 50256
<|endoftext|> 50256
loading the prefix model from  None
training the prefix model from scratch. 
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
webnlg
tgt_avg:  30.665242718446603
src_avg:  49.62568654646324
ratios:  1.6183040519881826
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
  | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
  | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
webnlg
tgt_avg:  31.644375553587246
src_avg:  51.023914968999115
ratios:  1.6124165535386898
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
  | Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
[50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[3554, 5376]

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
  | Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
[50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[23443, 24539]
FORMAT MODE IS  cat
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
  FutureWarning,
09/16/2021 10:22:53 - WARNING - trainer_prefix -   You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
  warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
{'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
09/16/2021 10:22:53 - INFO - trainer_prefix -   ***** Running training *****
09/16/2021 10:22:53 - INFO - trainer_prefix -     Num examples = 18025
09/16/2021 10:22:53 - INFO - trainer_prefix -     Num Epochs = 5
09/16/2021 10:22:53 - INFO - trainer_prefix -     Instantaneous batch size per device = 5
09/16/2021 10:22:53 - INFO - trainer_prefix -     Total train batch size (w. parallel, distributed & accumulation) = 40
09/16/2021 10:22:53 - INFO - trainer_prefix -     Gradient Accumulation steps = 1
09/16/2021 10:22:53 - INFO - trainer_prefix -     Total optimization steps = 2255
Traceback (most recent call last):
  File "run_language_modeling.py", line 1159, in <module>
    main()
  File "run_language_modeling.py", line 993, in main
    trainer.train(model_path=model_path)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
    tr_loss += self.training_step(model, inputs)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
    loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
    outputs = model(**inputs, gpt2_model=gpt2_model)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
    return_dict=return_dict, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
    return_dict=return_dict,
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
    inputs_embeds = self.wte(input_ids)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
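
The replica error above comes from torch.nn.DataParallel splitting the batch across the 8 visible GPUs (the log shows n_gpu: 8) while some tensors remain on cuda:0. One workaround (an assumption on my part, not the author's fix) is to expose a single GPU before CUDA is initialized:

# Workaround sketch: restrict the run to one GPU so the trainer does not wrap
# the model in DataParallel; equivalently, prefix the command with
# CUDA_VISIBLE_DEVICES=0.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # expected: 1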

PyTorch Lightning Version?

What version of PyTorch Lightning was this built with? I followed the setup instructions to install the requirements, but I keep getting errors from misnamed parameters in the seq2seq module (the gpt-2 module works fine). I can fix the errors as they come up by consulting the current PyTorch Lightning documentation (filepath in the trace should be dirpath, for example), but I'd rather use the code as written instead of manually updating it.

Traceback (most recent call last):
  File "finetune.py", line 876, in <module>
    main(args)
  File "finetune.py", line 782, in main
    checkpoint_callback=get_checkpoint_callback(args.output_dir, model.val_metric, args.save_top_k, lower_is_better), #LISA
  File "/workspace/PrefixTuning/seq2seq/callbacks.py", line 105, in get_checkpoint_callback
    period=0, # maybe save a checkpoint every time val is run, not just end of epoch.
TypeError: __init__() got an unexpected keyword argument 'filepath'
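
The environment listings elsewhere on this page pin pytorch-lightning 0.9.0, while the traceback above points at a newer release in which ModelCheckpoint's filepath (and period) arguments were replaced. A minimal sketch of the newer API (an assumption about versions, not the repo's code):

# Sketch (assumption: pytorch-lightning >= 1.x, where `filepath` was split into
# `dirpath` + `filename`; the monitored metric name here is hypothetical).
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="output_dir/checkpoints",
    filename="{epoch}-{step}",
    monitor="val_loss",
    mode="min",
    save_top_k=1,
)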

About the evaluation scripts

Hi,

Thanks for your contribution.
Could you share your evaluation scripts on WebNLG dataset? (e.g. run_eval_on_webnlg.sh)

Data issue and file location query

Is the dataset used for the GPT-2-based training portion in the code located under 'data'? Where can I find the files that start with '/u/scr/xlisali/'?

OOM error

Hi, I tried the seq2seq prefixtuning and found:

RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)

I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!

About the e2e version

Hi Lisa~ I've recently been following your work on NLG tasks, and I'm wondering whether you could provide the version of, and access to, the data you used (for example E2E and WebNLG), since your code uses some paths that start with absolute paths on your disk, and the code for reading the "e2e dataset" splits each line by "||". So I'm not sure whether I used the same data as yours. Thanks (and forgive me if I missed something important)!

AttributeError: 'tuple' object has no attribute 'detach'

Hi,

I'm doing an experiment with this package on 'classify-sentiment' tasks. I'm using the code in the folder 'gpt2'. I modified 'prediction_loss_only = False' in 'run_language_modeling.py' in order to get the predicted labels for the testing dataset.

Unfortunately, I get an error saying 'AttributeError: 'tuple' object has no attribute 'detach''. The training and evaluation processes are fine if 'prediction_loss_only = True'. However, I can only get the perplexity in this case.

Traceback (most recent call last):
  File "run_language_modeling.py", line 1173, in <module>
    main()
  File "run_language_modeling.py", line 993, in main
    trainer.train(model_path=model_path)
  File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 867, in train
    metrics = self.evaluate()
  File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1419, in evaluate
    output = self.prediction_loop(eval_dataloader, description="Evaluation")
  File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1514, in prediction_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only)
  File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1625, in prediction_step
    logits = tuple(logit.detach() for logit in logits)
  File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1625, in <genexpr>
    logits = tuple(logit.detach() for logit in logits)
AttributeError: 'tuple' object has no attribute 'detach'

Does anyone know why it happens? Is there any workaround for it? Thanks!

p.s.
I have tried a few suggestions like:
huggingface/transformers#7760
and
huggingface/transformers#7539
But I had no luck.

IndexError: list index out of range

Hi, Lisa. I failed to use GPT-2, trained in prefix-tuning mode on the webnlg task, to evaluate the validation dataset. I used the scripts you provide here. The detailed information is shown below.

Traceback (most recent call last):
  File "evaluation.py", line 356, in <module>
    read_participant(team, sys.argv[1])
  File "evaluation.py", line 250, in read_participant
    output_reduced = [output[i-1] for i in sorted(entry_ids)]
  File "evaluation.py", line 250, in <listcomp>
    output_reduced = [output[i-1] for i in sorted(entry_ids)]
IndexError: list index out of range

the command is bash evaluation/run_eval_on_webnlg.sh ~/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam >> ~/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam_eval

I have observed that "sorted(entry_ids)" starts with element 0 and that its length does not match output. I read the code and found that output is just the file I provide on the command line, and I have checked that it is valid. Am I missing something? Thank you very much.

Understanding the Seq2Seq Encoder-Decoder Prefix Implementation

Hi @XiangLi1999, thank you for open-sourcing this amazing work! I have been trying to understand your seq2seq implementation:

class PrefixTuning(PretrainedBartModel):

I was wondering if you could help me with a few doubts that I had regarding the same:

  1. What does the mode_para attribute mean?
    self.mode_para = 1
  2. What is the difference between the multiple methods for getting the prompt prefixes? Which ones are used in the paper?
  3. How is the Prefix on Encoder side implemented? I saw that the use_encoder_prefix attribute was only used in one of the prompt methods: def get_prompt_p5
    if self.use_encoder_prefix:
  4. What does the use_cross_prefix attribute do?
  5. How is the Encoder side prefix being attended by the Decoder, given that we are using past_key_values to feed the prefixes?
  6. How are the past_key_values fed to the model? As per my understanding, it should contain the key-value pairs for all the preceding tokens on the decoding side. How is the encoder side prefix included in the past_key_values? https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartForConditionalGeneration.forward.past_key_values
  7. Where is the inference code implemented for the seq2seq model? (if one needs to deploy/serve it). In this case we would need a token-by-token decoding, right?

How to fully train the model?

Hello, I want to fine-tune the prefix along with the whole BART model, so I commented out the freezing code in seq2seq/finetune.py#L95.
I don't know if that is right (I see GPU usage getting bigger, so it may be).

However, when I load the model, I find that only the prefix part is saved.
So, I want to know how to train, save, and load both the prefix and the BART model.

Thank you very much!
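
If the goal is to update the BART weights as well, one option (a minimal sketch under my own assumptions, not the repo's saving logic) is to checkpoint both modules explicitly rather than relying on the prefix-only checkpoint:

# Sketch (assumption: `prefix_model` and `seq2seq_model` are the two nn.Modules
# used during training; the names are hypothetical).
import torch

def save_all(prefix_model, seq2seq_model, path="full_checkpoint.pt"):
    torch.save(
        {"prefix": prefix_model.state_dict(), "bart": seq2seq_model.state_dict()},
        path,
    )

def load_all(prefix_model, seq2seq_model, path="full_checkpoint.pt"):
    ckpt = torch.load(path, map_location="cpu")
    prefix_model.load_state_dict(ckpt["prefix"])
    seq2seq_model.load_state_dict(ckpt["bart"])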

TypeError: setup() got an unexpected keyword argument 'stage'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1132, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1432, in _call_setup_hook
    self.call_hook("setup", stage=fn)
  File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1483, in call_hook
    output = model_fx(*args, **kwargs)
TypeError: setup() got an unexpected keyword argument 'stage'

Process finished with exit code 1

About prefix tuning input

Hi, Lisa!
I read in your paper that you have tried different inputs for the prefix, like 'random init', 'active', 'summarize', etc.
I would like to ask in what form you applied words as input to the prefix. Were these tokenized words or something else?
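
For what it's worth, word-based prefix initialization is often done by tokenizing the chosen word and reusing either its embeddings or its per-layer key/value activations as the starting point for the prefix parameters. A minimal sketch against a recent transformers release (an assumption, not this repo's exact code):

# Sketch (assumption: recent transformers, plain GPT-2; "summarize" is a
# hypothetical initialization word).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("summarize", return_tensors="pt").input_ids  # tokenized word

with torch.no_grad():
    out = model(ids, use_cache=True)

# One (key, value) entry per layer; these activations (or the raw word
# embeddings) can be copied into the trainable prefix before training starts.
init_activations = out.past_key_values
print(len(init_activations))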

FileNotFoundError for e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold

Hi,

I got the following error, which says FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold', when I run the command CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache

Has anyone met this issue, or does anyone know how to deal with it? Thank you so much.

Training completed. Do not forget to share your model on huggingface.co/models =)


10/15/2021 20:14:10 - INFO - trainer_prefix -   Saving model checkpoint to save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
10/15/2021 20:14:11 - INFO - __main__ -   *** Evaluate ***
10/15/2021 20:14:11 - INFO - trainer_prefix -   ***** Running Evaluation *****
10/15/2021 20:14:11 - INFO - trainer_prefix -     Num examples = 42061
10/15/2021 20:14:11 - INFO - trainer_prefix -     Batch size = 10
False
False
{'eval_loss': 25.165123616772462, 'epoch': 5.0, 'total_flos': 2514722051589120, 'step': 21035}
10/15/2021 20:18:41 - INFO - __main__ -   ***** Eval results *****
10/15/2021 20:18:41 - INFO - __main__ -     perplexity = 25.165123616772462
running evaluation on  /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
suggested code:
python gen.py data2text yes valid /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python gen.py data2text yes test /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python run_generation.py         --model_type=gpt2         --length 100         --model_name_or_path=gpt2-medium         --num_return_sequences 5         --stop_token [EOS]         --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --task_mode=data2text         --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset valid     --optim_prefix no --preseqlen 20 --prefix_mode activation  --format_mode cat  --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:18:42 - WARNING - __main__ -   device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "_objective_mode": 2,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

GPT2Config {
  "_my_arg_control": true,
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "activation_function": "gelu_new",
  "architectures": [
    "PrefixTuning"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "format_mode": "cat",
  "init_random": "no",
  "init_shallow": "no",
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "lowdata": false,
  "mid_dim": 512,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "optim_prefix": true,
  "predict_special_tokens": true,
  "prefix_dropout": 0.0,
  "preseqlen": 5,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "train_weights": "no",
  "use_infix": false,
  "vocab_size": 50258
}

under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:00 - INFO - __main__ -   Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='valid', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path  /data/qbao775/PrefixTuning/data/e2e_data/src1_valid.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold
547
Traceback (most recent call last):
  File "run_generation.py", line 1356, in <module>
    main()
  File "run_generation.py", line 825, in main
    write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
  File "run_generation.py", line 360, in write_e2e_corr
    with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold'
python run_generation.py         --model_type=gpt2         --length 100         --model_name_or_path=gpt2-medium         --num_return_sequences 5         --stop_token [EOS]         --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --task_mode=data2text         --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset test     --optim_prefix no --preseqlen 20 --prefix_mode activation  --format_mode cat  --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:19:02 - WARNING - __main__ -   device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "_objective_mode": 2,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

GPT2Config {
  "_my_arg_control": true,
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "activation_function": "gelu_new",
  "architectures": [
    "PrefixTuning"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "format_mode": "cat",
  "init_random": "no",
  "init_shallow": "no",
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "lowdata": false,
  "mid_dim": 512,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "optim_prefix": true,
  "predict_special_tokens": true,
  "prefix_dropout": 0.0,
  "preseqlen": 5,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "train_weights": "no",
  "use_infix": false,
  "vocab_size": 50258
}

under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:20 - INFO - __main__ -   Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='test', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path  /data/qbao775/PrefixTuning/data/e2e_data/src1_test.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
630
Traceback (most recent call last):
  File "run_generation.py", line 1356, in <module>
    main()
  File "run_generation.py", line 825, in main
    write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
  File "run_generation.py", line 360, in write_e2e_corr
    with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold'

Here is my environment configuration:

Package                Version     Location
---------------------- ----------- -------------------------------------------
absl-py                0.14.1
cachetools             4.2.4
certifi                2021.10.8
charset-normalizer     2.0.7
click                  8.0.3
filelock               3.3.0
future                 0.18.2
google-auth            1.35.0
google-auth-oauthlib   0.4.6
grpcio                 1.41.0
idna                   3.3
joblib                 1.1.0
Markdown               3.3.4
nltk                   3.6.5
numpy                  1.21.2
oauthlib               3.1.1
packaging              21.0
Pillow                 8.3.2
pip                    20.0.2
pkg-resources          0.0.0
protobuf               3.18.1
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pyparsing              2.4.7
pytorch-lightning      0.9.0
PyYAML                 6.0
regex                  2021.10.8
requests               2.26.0
requests-oauthlib      1.3.0
rsa                    4.7.2
sacremoses             0.0.46
sentencepiece          0.1.96
setuptools             44.0.0
six                    1.16.0
tensorboard            2.2.0
tensorboard-plugin-wit 1.8.0
tokenizers             0.8.1rc2
torch                  1.8.0+cu111
torchvision            0.9.0+cu111
tqdm                   4.62.3
transformers           3.2.0       /data/qbao775/PrefixTuning/transformers/src
typing-extensions      3.10.0.2
urllib3                1.26.7
Werkzeug               2.0.2
wheel                  0.37.0
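
A guess at the cause (an assumption, not a confirmed fix): run_generation.py writes the *_valid_gold / *_test_gold files into the gen_dir output directory, and open(corr_path, 'w') fails if that directory does not exist yet, so creating it up front may help:

# Workaround sketch: create the generation output directory before running
# run_generation.py so the *_gold / *_beam files can be written.
import os

gen_dir = "e2e_results_conv2"  # matches the --gen_dir argument in the log above
os.makedirs(gen_dir, exist_ok=True)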

XSUM dataset differences with original

Hello,
You shared the xsum dataset link here #2

However, I see from the CodaLab link https://worksheets.codalab.org/bundles/0x58f85171b43f4e61bf411c35faab369d
and from the hyperparameters/data directory in https://worksheets.codalab.org/bundles/0xa3f0cd3c10c7490ab508a351968cbdcf that you used the xsum_news data. When I checked xsum_news, I found that the validation file has 7,186 examples, whereas the original dataset has 11,327 examples. The test set is also different, with 11,333 examples in xsum_news vs. 20,418 in the original XSUM.

I was wondering if you could explain the differences in eval/test dataset sizes compared to the original and perhaps provide your script for preprocessing the original xsum.

Thanks!

Applying PrefixTuning with T5ForConditionalGeneration model

Hello! I'm trying to use PrefixTuning with the T5 model. After reading the source code in seq2seq, I figured out that, generally speaking, the prefix is added to the BART model via the past_key_values parameter.

But in T5, when past_key_values is provided together with decoder_input_ids (the ground truth during training), the forward() function will only use the last token of decoder_input_ids:

if past_key_values is not None:
    assert labels is None, "Decoder should not use cached key value states when training."
    if decoder_input_ids is not None:
        decoder_input_ids = decoder_input_ids[:, -1:]
    if decoder_inputs_embeds is not None:
        decoder_inputs_embeds = decoder_inputs_embeds[:, -1:]

while BART uses the full decoder_input_ids:

if labels is not None:
    use_cache = False
    if decoder_input_ids is None:
        decoder_input_ids = shift_tokens_right(labels, self.config.pad_token_id)
outputs = self.model(
    input_ids,
    attention_mask=attention_mask,
    decoder_input_ids=decoder_input_ids,
    encoder_outputs=encoder_outputs,
    decoder_attention_mask=decoder_attention_mask,
    past_key_values=past_key_values,
    use_cache=use_cache,
    use_prefix=use_prefix,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
)

However, I don't see any code handling this difference in the seq2seq folder. The only code I find related to T5 handles input ids or freezes embeddings.

Is PrefixTuning compatible with the T5 model? If not, could you give some advice on how to make it so? Thanks a lot!

control code is not used in PrefixTuning.get_prompt()

Hi, thanks for sharing the codes.

I have tried the webnlg task and the data2text task with the 'cleaned' branch, but I found that the "control_code" argument is not used in any of the implementations of PrefixTuning.get_prompt(). Does this mean that different categories of the WebNLG dataset use the same soft prompt? I found that there are get_prompt_p3, get_prompt_p1 and get_prompt_p4 in the master branch. Can I use them to reproduce the results of the paper?

Thanks.

Use --init_shallow_word for seq2seq model

Hi, thanks for your wonderful work!
But I have some questions about your released code. I saw that "--init_shallow_word" is used in the GPT-2 model (GPT2LMHeadModel), so that prev_key and prev_value can be initialized from a provided word such as "summarize".

parser.add_argument('--init_shallow', type=str, default='no', help='')
parser.add_argument('--init_shallow_word', type=str, default='summarize', help='')

If I want to use this trick in the seq2seq model (BartForConditionalGeneration), where and how should I change your code?
I have found that directly using the "get_gold_init" function didn't work.

def get_gold_init(self, gpt2, sample_input):
    gpt2 = gpt2.cuda()
    with torch.no_grad():
        output = gpt2(sample_input.to(gpt2.device), return_dict=True, use_cache=True)
        output = output.past_key_values
        print(len(output), output[0].shape)
        output = torch.cat(output, dim=0)
    return output

It seems that the BartModel forward function doesn't return "past_key_values", either because "use_cache" is set to False or because its return format is different from that of the GPT2LMHeadModel forward function. I haven't figured this problem out, and any reply would be helpful :) @XiangLi1999

if decoder_input_ids is None:
    use_cache = False
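
For reference, here is a rough BART analogue of get_gold_init, written against a recent transformers release (an assumption on my part; the modified 3.2.0 copy bundled with this repo may use a different cache format). It only shows that past_key_values does come back once use_cache=True and some decoder input is supplied:

# Sketch (assumption: recent transformers, plain BART, not this repo's modified copy).
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

ids = tok("summarize", return_tensors="pt").input_ids
with torch.no_grad():
    out = bart(
        input_ids=ids,
        decoder_input_ids=ids,  # the decoder needs some input to build a cache
        use_cache=True,
        return_dict=True,
    )

# One entry per decoder layer, holding decoder self-attention and cross-attention
# key/value tensors (the exact layout depends on the transformers version).
print(len(out.past_key_values), [t.shape for t in out.past_key_values[0]])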

The version of pytorch_lightning

Thank you for your open source code. I tried to run your program on the server, but the interface of pytorch_lightning has changed, so I got some errors. May I know the version of pytorch_lightning you and your team use? Thank you!

Looking forward to your reply.

About the training speed verification

Hi Lisa~ I rewrote the code for BART, following yours, on top of the newest Hugging Face transformers, and I want to verify one thing: in my training runs, the speed of prefix-tuning is only about 60%~70% of that of full-parameter fine-tuning, even when I use a very, very small prefix prompt module. Does that make sense? And where might the speed bottleneck be? Hoping for your reply.

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

I'm sorry to disturb you, but I have tried many ways to solve this and failed. I have successfully trained a prefix-tuning model (refer to this), but a RuntimeError occurs when decoding. The command is the following:
python gen.py webnlg yes valid /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no

And the Error:

python run_generation.py         --model_type=gpt2         --length 100         --model_name_or_path=gpt2-medium         --num_return_sequences 5         --stop_token [EOS]         --tokenizer_name=/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --task_mode=webnlg         --control_mode=yes --tuning_mode prefixtune --gen_dir webNLG_results2 --eval_dataset valid     --optim_prefix yes --preseqlen 20 --prefix_mode activation  --format_mode cat  --prefixModel_name_or_path /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3 
09/21/2021 20:13:41 - WARNING - __main__ -   device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
  "_my_arg_task_mode": "webnlg",
  "_my_arg_tune_mode": "prefixtune",
  "_objective_mode": 2,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

GPT2Config {
  "_my_arg_control": true,
  "_my_arg_task_mode": "webnlg",
  "_my_arg_tune_mode": "prefixtune",
  "activation_function": "gelu_new",
  "architectures": [
    "PrefixTuning"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "format_mode": "cat",
  "init_random": "no",
  "init_shallow": "no",
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "lowdata": false,
  "mid_dim": 512,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "optim_prefix": true,
  "predict_special_tokens": true,
  "prefix_dropout": 0.0,
  "preseqlen": 5,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "train_weights": "no",
  "use_infix": false,
  "vocab_size": 50258
}

under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
09/21/2021 20:14:05 - INFO - __main__ -   Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='valid', format_mode='cat', fp16=False, gen_dir='webNLG_results2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='yes', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='webnlg', temperature=1.0, tokenizer_name='/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
871 871
/home/yanzhongxiang/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam
/home/yanzhongxiang/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold
871
['Aarhus', ':', 'leaderName', ':', 'Jacob_Bundsgaard']
('leaderName',)
cat
True True
control code is  None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
Traceback (most recent call last):
  File "run_generation.py", line 1357, in <module>
    main()
  File "run_generation.py", line 1194, in main
    num_return_sequences=1,
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/generation_utils.py", line 511, in generate
    model_kwargs=model_kwargs,
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/generation_utils.py", line 700, in _generate_beam_search
    outputs = self(**model_inputs, return_dict=True)  # (batch_size * num_beams, cur_len, vocab_size)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
    return_dict=return_dict,
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 645, in forward
    output_attentions=output_attentions,
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 327, in forward
    feed_forward_hidden_states = self.mlp(self.ln_2(hidden_states))
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 267, in forward
    h = self.act(self.c_fc(x))
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_utils.py", line 1094, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

If additional information is needed, please contact me. Thank you.

OSError: [Errno 30] Read-only file system: '/u'

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101
causes the error below on my local PC. I just did the environment setup and installed nothing else. Should I install something else?

Traceback (most recent call last):
  File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 355, in get_config_dict
    local_files_only=local_files_only,
  File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 719, in cached_path
    local_files_only=local_files_only,
  File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 821, in get_from_cache
    os.makedirs(cache_dir, exist_ok=True)
  File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 4 more times]
  File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/u'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_language_modeling.py", line 1159, in <module>
    main()
  File "run_language_modeling.py", line 546, in main
    config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
  File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_auto.py", line 310, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'gpt2-medium'. Make sure that:

  • 'gpt2-medium' is a correct model identifier listed on 'https://huggingface.co/models'

  • or 'gpt2-medium' is the correct path to a directory containing a config.json file

Is it necessary to arrange position ids between [prefix_len, prefix_len+seq_len) ?

I found that the position ids are in [prefix_len, prefix_len + seq_len) in modeling_gpt2.py:

position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)

Is it OK to just put the position ids in [0, seq_len)? I have not found any use of position embeddings for the prefix matrix.
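
For concreteness, a minimal sketch of the two options (illustration only, not the repository's code). Since the prefix enters as free key/value parameters via past_key_values and never passes through the position-embedding table, the choice only changes which positions the real tokens receive; whether the pretrained LM prefers one offset over the other is an empirical question.

import torch

prefix_len, seq_len = 5, 10
past_length = prefix_len  # the prepended prefix counts as "past"

# default GPT-2 behaviour: positions continue after the past
position_ids_default = torch.arange(past_length, seq_len + past_length, dtype=torch.long)

# alternative: start from zero, ignoring the prefix
position_ids_zero_based = torch.arange(0, seq_len, dtype=torch.long)

print(position_ids_default)     # positions 5..14
print(position_ids_zero_based)  # positions 0..9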

Question: Do we need a supervised dataset with X & Y pairs?

One thing is unclear: for prefix-tuning, do we always need a labelled dataset? Say we want to do Q/A on proprietary data; would we need Q/A as x/y pairs? You use WebNLG data for summarisation, which does not seem to have summaries defined. Would you have any views on the kind of dataset needed for fine-tuning with this method?

About the results on lowdata table2text task.

Hi Lisa,
Thanks a lot for conducting such helpful experiments comparing prompt-based fine-tuning with prompt tuning. However, I have difficulty reproducing the BLEU and ROUGE-L scores shown in Figure 3. Could you please give me some clues on how BLEU is calculated in your code in the low-data setting? (I only saw perplexity.)
I also re-implemented your prefix prompt with activations using GPT-2 (117M). However, it only achieves a BLEU score of 15 on WikiData. Could you please give me some insight into the low-data prefix-tuning setting?

Thanks in advance!

notation typo in the paper

Hi, thanks for the great work!

I noticed one notation typo in the paper. It's in footnote 4 of page 5. The second P_theta should be P_theta'.

Hope this helps.

Not able to replicate XSUM results

Thanks for making your code public. I am having difficulty replicating the XSUM results. I think it could be a data-processing issue. Can you provide more details on the XSUM data and preprocessing?

I got the data from here:
http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz

Then, I tried to preprocess it myself since you use a .len file in https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/utils.py#L104-L106

self.src_file = Path(data_dir).joinpath(type_path + ".source")
self.tgt_file = Path(data_dir).joinpath(type_path + ".target")
self.len_file = Path(data_dir).joinpath(type_path + ".len")

Still, the results are significantly lower than in the paper.
Thanks
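
For reference, a minimal sketch of producing such a .len file, assuming it is a pickled list of per-example source token lengths used for length-based batching (this is an assumption; the expected format should be double-checked against seq2seq/utils.py). The tokenizer name here is just an example.

import pickle
from pathlib import Path
from transformers import AutoTokenizer

def save_len_file(data_dir, type_path="train", tokenizer_name="facebook/bart-large"):
    """Write <type_path>.len as a pickled list with one source token count per example."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    src_file = Path(data_dir).joinpath(type_path + ".source")
    lens = [len(tok.encode(line.rstrip("\n"))) for line in src_file.open(encoding="utf-8")]
    with Path(data_dir).joinpath(type_path + ".len").open("wb") as f:
        pickle.dump(lens, f)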

question about the initialization experiment

Hi, thanks for the great work!

In Section 7.4, the paper conducts an initialization experiment with real words. I am just wondering: does this initialization apply to the prompts in every layer, or just the prompts in the first layer? And how does it work together with the re-parameterization method, since the input dimension of the re-param MLP is much smaller?

I also noticed that in your code, instead of directly adding prompts to the input of each layer (as described in your paper), what you actually do is append vectors to the key/value matrices directly via the past_key_values argument. Just wondering, how does the initialization experiment work in this setup/implementation? Do you directly initialize the key/value vectors? It seems the dimensions would not match.

Thanks!
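
For readers puzzling over the same shapes, here is a minimal sketch of how a reparameterized prefix can be turned into past_key_values (illustration with GPT-2-medium-sized dimensions and an arbitrary mid_dim, not the repository's exact code). Under this view, one way to make a real-word initialization dimension-compatible would be to run those words through the frozen LM once and use the resulting per-layer key/value activations as the starting point, rather than initializing the word-embedding-sized input of the MLP.

import torch
import torch.nn as nn

preseqlen, n_layer, n_head, n_embd, mid_dim = 5, 24, 16, 1024, 512
head_dim = n_embd // n_head
bsz = 1

prefix_tokens = nn.Embedding(preseqlen, n_embd)   # free "virtual token" embeddings
control_trans = nn.Sequential(                     # reparameterization MLP
    nn.Linear(n_embd, mid_dim),
    nn.Tanh(),
    nn.Linear(mid_dim, 2 * n_layer * n_embd),
)

idx = torch.arange(preseqlen).unsqueeze(0).expand(bsz, -1)    # (bsz, preseqlen)
kv = control_trans(prefix_tokens(idx))                         # (bsz, preseqlen, 2*n_layer*n_embd)
kv = kv.view(bsz, preseqlen, 2 * n_layer, n_head, head_dim)
past_key_values = kv.permute(2, 0, 3, 1, 4).split(2)           # per-layer (key, value) pairs
print(len(past_key_values), past_key_values[0].shape)          # 24 torch.Size([2, 1, 16, 5, 64])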

How to evaluate DART ? The test set may be changed ?

Hi, Lisa!
I read your paper and you have done brilliant work. I want to fine-tune GPT on the DART dataset. However, I don't know how to evaluate my results. The official scripts (https://github.com/Yale-LILY) provide a different test set (5,097 samples), which has different references, too.
I used your test set (12,552 samples) for generation and evaluated it against the target sentences in that test set (the 12,552 samples are aligned, so each sample has only one reference). However, I can only get a BLEU of about 26.28 (GPT-large), much lower than yours.
Could you please tell me how to evaluate it? Thank you!
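
On the multi-reference point specifically: scoring each output against a single aligned target will understate BLEU whenever the official test set groups several references per input. A minimal sketch with sacrebleu (illustrative sentences only, not the official DART evaluation pipeline):

import sacrebleu

# one hypothesis per test input, references grouped per input (two each here)
hyps = ["Mary is a student at Yale University.",
        "The Eagle is a cheap coffee shop by the river."]
refs_per_input = [
    ["Mary studies at Yale University.", "Mary is a Yale University student."],
    ["The Eagle is an inexpensive coffee shop near the river.",
     "Near the river, The Eagle serves cheap coffee."],
]

# sacrebleu expects one reference stream per reference position, aligned with the hypotheses
ref_streams = [list(refs) for refs in zip(*refs_per_input)]
print(sacrebleu.corpus_bleu(hyps, ref_streams).score)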

Can this model be trained on a single GPU?

I set PYTORCH_NO_CUDA_MEMORY_CACHING=1 to disable the caching allocator and work around the OOM, but training is very slow.
Is this normal?
Epoch 0: 0%| | 3/12784 [00:07<8:32:20, 2.41s/it, loss=7.35, v_num=5]/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
saving checkpoint now
saving models now/22..
try calling the pl_module save
Epoch 0: 0%| | 3/12784 [00:07<9:08:09, 2.57s/it, loss=7.35, v_num=5]
Traceback (most recent call last):
File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 878, in
main(args)
File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 779, in main
trainer: pl.Trainer = generic_train(
File "/data/lirongzhen/PrefixTuning/seq2seq/lightning_base.py", line 795, in generic_train
trainer.fit(model)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 470, in fit
results = self.accelerator_backend.train()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 68, in train
results = self.train_or_test()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
results = self.trainer.train()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 521, in train
self.train_loop.run_training_epoch()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 560, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 687, in run_training_batch
self.training_step_and_backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 816, in training_step_and_backward
self.backward(result, optimizer, opt_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 836, in backward
result.closure_loss = self.trainer.accelerator_backend.backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 98, in backward
closure_loss = self.trainer.precision_connector.backend.backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/plugins/native_amp.py", line 46, in backward
model.backward(closure_loss, optimizer, opt_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1152, in backward
loss.backward(*args, **kwargs)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/autograd/init.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 47.54 GiB total capacity; 41.43 GiB already allocated; 900.75 MiB free; 44.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
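
A gentler alternative to PYTORCH_NO_CUDA_MEMORY_CACHING (which disables the caching allocator entirely and is what makes training crawl) is to cap the allocator's split size, as the error message itself suggests, and to trade per-step batch size for gradient accumulation. A minimal sketch:

import os

# must be set before the first CUDA allocation (e.g. at the very top of finetune.py)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

PyTorch Lightning's Trainer also accepts accumulate_grad_batches, so halving the batch size and doubling the accumulation keeps the effective batch size unchanged while reducing peak activation memory.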

Data preparation step

Hi @XiangLi1999

Thanks for releasing your code. I was wondering how to download the "webnlg" dataset? I was not able to find any .json version of the webnlg dataset. Could you please share your data version as well?

Best,
Mohammad

License

Hi, thanks for the great repo! Would it be possible to add a proper license statement to it? Thank you!

Possible mistake in prefix model parameter count? I am getting 15% not 2% like in the paper

Hi,

I calculated the number of parameters used in the embedding and linear layers of the prefix model from self.control_trans, self.control_trans_enc, self.control_trans2, wte, wte_enc, wte_2 and I am getting 62.1M. Since BART-large is 406M, that works out to about 15% added parameters, not the 2% reported in Table 2 of your paper.

I tried the following code:
sum(p.numel() for p in list(self.model.control_trans.parameters()))
which gives 20505376, or 20.5M, using the hyperparameters to replicate the XSUM results.

Here's the prefix model (Embedding is not included):

Sequential(
(0): Linear(in_features=1024, out_features=800, bias=True)
(1): Tanh()
(2): Linear(in_features=800, out_features=24576, bias=True)
)

There is one such model for the encoder inputs, the decoder inputs, and the cross inputs, so the 20.5M has to be multiplied by 3 (see here: https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/prefixTuning.py#L260-L279).

Thanks
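
For anyone re-checking the arithmetic, a minimal sketch that counts only the trainable prefix parameters against the frozen total (assuming the base model's parameters have requires_grad=False), together with the back-of-the-envelope numbers quoted above:

def count_params(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total

# the MLP printed above: 1024 -> 800 -> 24576
mlp = (1024 * 800 + 800) + (800 * 24576 + 24576)   # 20,505,376, i.e. the 20.5M figure
emb = 200 * 1024                                    # one prefix embedding table, assuming preseqlen=200
print(3 * (mlp + emb))                              # 62,130,528, roughly 15% of BART-large's 406M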
