salt-nlp / multi-view-seq2seq

Source code for the paper "Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization"

License: MIT License

Python 92.95% Jupyter Notebook 2.09% Shell 1.74% Makefile 0.03% Batchfile 0.04% C++ 0.76% Cuda 1.76% Lua 0.20% Cython 0.43%

summarization conversation seq2seq multiview

multi-view-seq2seq's Introduction

Multi-View-Seq2Seq

This repo contains the code for the following paper:

Jiaao Chen, Diyi Yang: Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization, EMNLP 2020

If you would like to refer to it, please cite the paper mentioned above.

Getting Started

These instructions will get you up and running with the Multi-View Conversation Summarization code.

Requirements

Python 3.6 or higher
PyTorch >= 1.3.0
Pandas, NumPy, pickle
rouge (https://github.com/pltrdy/rouge)
Fairseq
sentence_transformers

Code Structure

|__ data/
        |__ C99.py, C99utils.py --> C99 topic segmentation functions
        |__ Sentence_Embeddings.ipynb --> Jupyter Notebook for getting the embeddings for utterances using SentBert
        |__ Topic_Segment.ipynb --> Jupyter Notebook for getting the topic segments using C99
        |__ Stage_Segment.ipynb --> Jupyter Notebook for getting the stage segments using HMM
        |__ Read_Labels.ipynb --> Jupyter Notebook for getting the formatted data for training/evaluation
        |__ Please download the full data folder from here https://drive.google.com/file/d/1-W42dS74MuFQUKBIru6_yc2Sm7LObc7o/view?usp=sharing

|__fairseq_multi_view/ --> Source code built on fairseq, containing the multi-view model code
|__train_sh/
        |__*_data_bin --> Store the binarized files
        |__bpe.sh, binarize.sh --> Pre-process the data for fairseq training
        |__train_multi_view.sh, train_single_view.sh --> Train the models

Install the multi-view-fairseq

cd fairseq_multi_view

pip install --editable ./

Downloading the data

Please download the SAMSum dataset.

Pre-processing the data

The data folder you downloaded from the link above already contains all the pre-processed files from the SAMSum corpus.

Segment conversations

For your own data, first go through Sentence_Embeddings.ipynb to store the embeddings of all utterances in pickle files. Then use Topic_Segment.ipynb and Stage_Segment.ipynb to read the utterance representations and segment the conversations. This generates the *_label.pkl files, which contain the segment id of each utterance in each conversation. Finally, use Read_Labels.ipynb to generate the segmented data files *.source and *.target for the fairseq framework.
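As a rough illustration of the last step, the sketch below turns the utterances of one conversation and their per-utterance segment ids into a single segmented line. The function name to_source_line, the example segment ids, the list-of-lists label layout, and the exact placement of the " | " separator are assumptions made for illustration; Read_Labels.ipynb remains the authoritative implementation.

```python
import pickle


def to_source_line(utterances, segment_ids, sep=" | "):
    """Join utterances into one line, inserting `sep` where the segment id changes."""
    if len(utterances) != len(segment_ids):
        raise ValueError("need exactly one segment id per utterance")
    pieces = []
    previous = None
    for text, seg in zip(utterances, segment_ids):
        if previous is not None and seg != previous:
            pieces.append(sep)  # segment boundary
        elif pieces:
            pieces.append(" ")  # same segment: plain space
        pieces.append(text)
        previous = seg
    return "".join(pieces)


utterances = [
    "A: Hi Tom, are you busy tomorrow?",
    "B: I think so. What's up?",
    "A: Can you come to the animal shelter?",
]
segment_ids = [0, 1, 1]  # hypothetical output of the segmentation notebooks

# Round-trip through pickle to mimic reading a *_label.pkl file
# (the one-list-per-conversation layout is an assumption).
labels = pickle.loads(pickle.dumps([segment_ids]))
line = to_source_line(utterances, labels[0])
print(line)
```

Each conversation becomes one line of the *.source file, with the matching summary on the same line number of the *.target file.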

BPE preprocess:

cd train_sh

wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'

./bpe.sh

Binarize dataset:

cd train_sh

./binarize.sh

Download the pre-trained BART

Please download the pre-trained model from here, then set BART_PATH in ./train_single_view.sh or ./train_multi_view.sh to point to it.

Training models

This section contains instructions for training the conversation summarization models.

The trained multi-view summarization models used in the paper can be downloaded here. The generated summaries on the test set are in the data folder.

Note that during training, the script automatically evaluates on the val and test sets after every epoch (you might need to change the dataset path in ./fairseq_multi_view/fairseq_cli/train.py for single_view training). The best model is the one with the lowest loss on the val set. Training was performed on one P100 GPU (any other GPU with at least 16 GB of memory should also work). The best model is usually reached after 6 or 7 epochs, after which you can stop training.

Training Single-View model

Please run ./train_single_view.sh to train the single-view models. Note that you might need to modify the data folder name.

Training Multi-View model

Please run ./train_multi_view.sh to train the multi-view model, which combines the topic view and the stage view. To combine different views, modify the corresponding data folder names as well.

Evaluating models

An example Jupyter notebook (Eval_Sum.ipynb) is provided for evaluating the model on the test set.
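To make the reported numbers easier to read, here is a simplified sketch of what a ROUGE-1 score with precision (p), recall (r) and F-score (f) measures. It uses plain whitespace tokenization and clipped unigram counts, and it is not the implementation of the rouge package listed in the requirements, so its values will generally differ from those produced by Eval_Sum.ipynb. The function names rouge_1 and corpus_rouge_1 are hypothetical.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall and F-score for one summary pair."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped unigram matches
    p = overlap / max(sum(hyp.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"f": f, "p": p, "r": r}


def corpus_rouge_1(hypotheses, references):
    """Average the per-pair scores over a whole test set."""
    scores = [rouge_1(h, r) for h, r in zip(hypotheses, references)]
    n = max(len(scores), 1)
    return {key: sum(s[key] for s in scores) / n for key in ("f", "p", "r")}


example = rouge_1("the cat sat", "the cat sat down")
print(example)
```

Because scores are averaged per pair, the averaged f is not, in general, the harmonic mean of the averaged p and r.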

multi-view-seq2seq's Issues

Reproduced scores are slightly lower than those in the paper

Test {'rouge-1': {'f': 0.47346011281155474, 'p': 0.4381161507283877, 'r': 0.5239694254911272}, 'rouge-2': {'f': 0.21751781836114528, 'p': 0.21087599615939312, 'r': 0.2522172885765042}, 'rouge-l': {'f': 0.438524176324828, 'p': 0.4235244020818687, 'r': 0.4957524251398071}}
I used BART-large. Why is the score still below the 49.3% that the authors report in the paper?

Here are the parameters in my train_multi_view.sh:

TOTAL_NUM_UPDATES=5000
WARMUP_UPDATES=200
LR=3e-05
MAX_TOKENS=800
UPDATE_FREQ=32
BART_PATH='./bart.large/model.pt'

CUDA_VISIBLE_DEVICES=0 python train.py cnn_dm-bin_2 \
    --restore-file $BART_PATH \
    --max-tokens $MAX_TOKENS \
    --task translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test \
    --find-unused-parameters \
    --ddp-backend=no_c10d \
    --required-batch-size-multiple 1 \
    --no-epoch-checkpoints \
    --save-dir checkpoints \
    --lr-weight 1000 \
    --T 0.2 \
    --multi-views \
    --balance \
    --seed 14632

Why is the stage-view segmentation in the paper's example different from the test set?

This is from the file generated by the source code, test_sent_trans_cons_label.source:
| James: Hey! I’ve been thinking about you ;)
| Hannah: Oh, that’s nice ;)
| James: what are you up to? Hannah: i'm about to sleep James: I miss u I was hoping to see you Hannah: have to get up early for work tomorrow James: what about tomorrow? Hannah: to be honest i have plans for tomorrow evening James: oh ok, what about Sat then? Hannah: yeah, sure i’m available on sat James: i’ll pick you up at 8?
| Hannah: sounds good. See u then.

Is the segmentation different from the example in the paper? This is from Table 1 of the paper:
| James: Hey! I’ve been thinking about you ;)
Hannah: Oh, that’s nice ;)
James: what are you up to?
| Hannah: i'm about to sleep James: I miss u I was hoping to see you
| Hannah: have to get up early for work tomorrow James: what about tomorrow? Hannah: to be honest i have plans for tomorrow evening James: oh ok, what about Sat then? Hannah: yeah, sure i’m available on sat James: i’ll pick you up at 8?
| Hannah: sounds good. See u then.

Thank you!

Could you elaborate on how to evaluate topic segmentation task?

Hi.
Thanks for sharing interesting work with the community 👍
I am trying to understand how the combination of sentence-transformer embeddings and C99 works for topic segmentation.
For example, I get 7 topic segments from running Topic_Segment.ipynb:

Segment 1. A: Hi Tom, are you busy tomorrow’s afternoon?
Segment 2. B: I’m pretty sure I am. What’s up?, A: Can you go with me to the animal shelter?., B: What do you want to do?
Segment 3. A: I want to get a puppy for my son., B: That will make him so happy., A: Yeah, we’ve discussed it many times. I think he’s ready now.
Segment 4. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-), A: I'll get him one of those little dogs.
Segment 5. B: One that won't grow up too big;-), A: And eat too much;-)), B: Do you know which one he would like?
Segment 6. A: Oh, yes, I took him there last Monday. He showed me one that he really liked., B: I bet you had to drag him away., A: He wanted to take it home right away ;-).
Segment 7. B: I wonder what he'll name it., A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))

Could you suggest any tips for evaluating the segmentation quality of the example above?
As I understand it, you chose the C99 hyperparameters (window size 4 and std coefficient 1) through qualitative evaluation by several human annotators. Personally, I think topic segmentation is a difficult problem because people disagree on where the correct segment boundaries are. So if you could elaborate on how this qualitative evaluation was done, it would be very helpful for the research community :)

Error when running ./train_single_view.sh

Running ./train_single_view.sh fails with:
train.py: error: argument --restore-file: expected one argument
How can I solve this?
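One plausible cause, offered as a guess rather than a confirmed diagnosis, is that BART_PATH is empty or unset when the script runs, so the shell expands `--restore-file $BART_PATH` to a bare `--restore-file`. The sketch below reproduces that failure mode with Python's argparse. The parser is a hypothetical stand-in that declares only a few of the flags from the training command, not the real fairseq parser.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical stand-in for the training script's argument parser.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("data")
    parser.add_argument("--restore-file")
    parser.add_argument("--max-tokens", type=int)
    return parser


def parse_like_shell(bart_path):
    """Mimic the shell expanding $BART_PATH into the command line."""
    command = f"cnn_dm-bin --restore-file {bart_path} --max-tokens 800"
    try:
        return build_parser().parse_args(shlex.split(command))
    except SystemExit:
        # argparse prints "expected one argument" to stderr and exits.
        return None


print(parse_like_shell(""))                       # empty variable: parsing fails
print(parse_like_shell("./bart.large/model.pt"))  # path set: parsing succeeds
```

If this is indeed the cause, setting BART_PATH in the script to the location of the downloaded model.pt should make the error go away.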

Unexpected key(s) in state_dict while using pre-trained model

I downloaded the pre-trained multi-view summarization model from the link you posted (https://drive.google.com/file/d/1Rhzxk1B7oaKi85Gsxr_8WcqTRx23HO-y/view?usp=sharing).

I am trying to run Eval_Sum.py with the pre-trained model, but I get unexpected key(s) in state_dict. The error message is below.

  File "Eval_Sum.py", line 25, in <module>
    bart = BARTModel.from_pretrained(
  File "/home1/irteam/users/geonminkim/library/fairseq/fairseq/models/bart/model.py", line 128, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/home1/irteam/users/geonminkim/library/fairseq/fairseq/hub_utils.py", line 71, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home1/irteam/users/geonminkim/library/fairseq/fairseq/checkpoint_utils.py", line 307, in load_model_ensemble_and_task
    model.load_state_dict(state["model"], strict=strict, model_cfg=cfg.model)
  File "/home1/irteam/users/geonminkim/library/fairseq/fairseq/models/fairseq_model.py", line 115, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home1/irteam/geonminkim/anaconda3/envs/gm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for BARTModel:
	Unexpected key(s) in state_dict: "section_positions.weight", "section_layernorm_embedding.weight", "section_layernorm_embedding.bias", "section.weight_ih_l0", "section.weight_hh_l0", "section.bias_ih_l0", "section.bias_hh_l0", "w_proj_layer_norm.weight", "w_proj_layer_norm.bias", "w_proj.weight", "w_proj.bias", "w_context_vector.weight".

Could you suggest any solution?
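The file paths in the traceback point to a fairseq installation under .../library/fairseq/ rather than to fairseq_multi_view, and the unexpected keys (section.*, w_proj.*, w_context_vector.weight) look like parameters that only the multi-view model defines. One possible reading, which the thread does not confirm, is that Python is importing a stock fairseq instead of the fork installed with pip install --editable ./. The sketch below shows one way to check where a module would be loaded from; module_location is a hypothetical helper name.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints None if fairseq is not installed in the current environment.
print(module_location("fairseq"))
```

If the printed path is not inside fairseq_multi_view, installing the fork into a clean environment is worth trying before anything else.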

Error(s) in loading state_dict for BARTModel

I have the same problem as #6. I installed fairseq as you suggested. Two paths are required for the pre-trained model, and I defined them as follows:

checkpoint_file: I downloaded your model from here.
data_name_or_path: "Multi-View-Seq2Seq/train_sh/cnn_dm-bin_2/" (It is already in the repo)

Could you suggest any solution?

Link to dataset expired

It seems that the link to the dataset no longer works. Could anyone help out? Thanks

GT-SALT/Multi-View-Seq2Seq, Please help with data format.

Hi,
Thank you for the great work.
When I run "train_multi_view.sh", I encounter this error:

epoch 001 | valid on 'valid' subset | loss 5.318 | nll_loss 3.057 | ppl 8.324 | wps 3513.7 | wpb 524.6 | bsz 20 | num_updates 24
here bpe NONE
here!
Test on val set:
Test on val set:
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    cli_main()
  File "/export/liujunpeng/code/Multi-View-Seq2Seq/fairseq_multi_view/fairseq_cli/train.py", line 435, in cli_main
    nprocs=args.distributed_world_size,
  File "/export/liujunpeng/anaconda3/envs/multivie/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/export/liujunpeng/anaconda3/envs/multivie/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/export/liujunpeng/anaconda3/envs/multivie/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/export/liujunpeng/anaconda3/envs/multivie/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/export/liujunpeng/code/Multi-View-Seq2Seq/fairseq_multi_view/fairseq_cli/train.py", line 401, in distributed_main
    main(args, init_distributed=True)
  File "/export/liujunpeng/code/Multi-View-Seq2Seq/fairseq_multi_view/fairseq_cli/train.py", line 135, in main
    s1 = source.readlines()
  File "/export/liujunpeng/anaconda3/envs/multivie/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 35: ordinal not in range(128)

I checked the eval-set file "../data/val_sent_trans_cons_label.source", which looks like this:

There is "[34m~ " at position 35. Could you help me with this? Thanks!
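Byte 0xe2 is the first byte of the UTF-8 encoding of characters such as the typographic apostrophe ’, which appears in the SAMSum dialogues quoted elsewhere on this page. The traceback shows the file being decoded with the ascii codec, so a likely fix is to pass an explicit encoding when the file is opened. This assumes the data files are UTF-8 and that the failing open() call in train.py can be edited. The sketch below reproduces the error on a temporary file and shows the explicit-encoding variant.

```python
import os
import tempfile

line = "James: Hey! I’ve been thinking about you ;)\n"

# Write a small UTF-8 file standing in for a *.source file.
with tempfile.NamedTemporaryFile("wb", suffix=".source", delete=False) as tmp:
    tmp.write(line.encode("utf-8"))
    path = tmp.name

# Decoding as ASCII fails on the first non-ASCII byte (0xe2).
try:
    with open(path, encoding="ascii") as source:
        source.readlines()
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 reads the same file without error.
with open(path, encoding="utf-8") as source:
    s1 = source.readlines()

os.remove(path)
print(ascii_ok, s1[0].strip())
```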

RuntimeError: result type Float can't be cast to the desired output type Long

count = 1
bsz = 8
with open('/content/drive/MyDrive/bishe/Multi-View-Seq2Seq/data/test_sent_trans_cons_label.source') as source, open('/content/drive/MyDrive/bishe/Multi-View-Seq2Seq/data/test_sent_c99_label.source') as source2, open('/content/drive/MyDrive/bishe/Multi-View-Seq2Seq/data/test_best_multi_attn.hypo', 'wt', encoding='utf-8') as fout:
    s1 = source.readlines()
    s2 = source2.readlines()

    slines = [s1[0].strip()]
    slines2 = [s2[0].strip()]

    for i in tqdm(range(1, len(s1))):
        if count % bsz == 0:
            with torch.no_grad():
                hypotheses_batch = bart.sample(slines, sentences2 = slines2, balance = True, beam=4, lenpen=2.0, max_len_b=100, min_len=5, no_repeat_ngram_size=3)

            for hypothesis in hypotheses_batch:
                fout.write(hypothesis + '\n')
                fout.flush()
            slines = []
            slines2 = []

        slines.append(s1[i].strip())
        slines2.append(s2[i].strip())

        count += 1

    if slines != []:

        hypotheses_batch = bart.sample(slines, sentences2 = slines2, balance = True, beam=4, lenpen=2.0, max_len_b=100, min_len=5, no_repeat_ngram_size=3)

        for hypothesis in hypotheses_batch:
            fout.write(hypothesis + '\n')
            fout.flush()

RuntimeError Traceback (most recent call last)
in ()
11 if count % bsz == 0:
12 with torch.no_grad():
---> 13 hypotheses_batch = bart.sample(slines, sentences2 = slines2, balance = True, beam=4, lenpen=2.0, max_len_b=100, min_len=5, no_repeat_ngram_size=3)
14
15 for hypothesis in hypotheses_batch:
/content/drive/MyDrive/bishe/Multi-View-Seq2Seq/fairseq_multi_view/fairseq/search.py in step(self, step, lprobs, scores)
79 out=(self.scores_buf, self.indices_buf),
80 )
---> 81 torch.floor_divide(self.indices_buf, vocab_size, out=self.beams_buf)
82 self.indices_buf.fmod_(vocab_size)
83 return self.scores_buf, self.indices_buf, self.beams_buf

RuntimeError: result type Float can't be cast to the desired output type Long
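For context, lines 81 and 82 of search.py in the traceback split each flat top-k index into a beam index (integer division by vocab_size) and a token index (remainder). The error message says the division produced a floating-point result that cannot be written into the integer buffer beams_buf, which hints at a mismatch between the installed PyTorch version and what the bundled fairseq code expects. That diagnosis is an assumption and is not confirmed anywhere in the thread. The pure-Python sketch below only illustrates what the step is meant to compute; split_flat_indices is a hypothetical name, and 50264 is the dictionary size reported in the training log.

```python
def split_flat_indices(flat_indices, vocab_size):
    """Split flat top-k indices into (beam index, token index) pairs."""
    beams = [idx // vocab_size for idx in flat_indices]  # integer division
    tokens = [idx % vocab_size for idx in flat_indices]  # remainder
    return beams, tokens


beams, tokens = split_flat_indices([5, 50264, 100530], vocab_size=50264)
print(beams, tokens)
```

Whatever fix is applied in search.py, both results have to stay integer-typed, because they are used as indices afterwards.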

unintended bug on sep (Read_Labels.ipynb)

Hello, I'm leadawon. I'm a beginner in NLP.

It's been a long time since you published your paper. I think it is great research!

I think I found a bug, which baffled me.

I ran Read_Labels.ipynb to create *.source. When executing

cons, sum = concat_conversion(data, labels, label_type)

within the

transform_format(prefix, label_type = '_sent_c99_label') function

, label_type is passed positionally and therefore ends up in the sep parameter of the concat_conversion(data, labels, sep=' |', label_type = '_sent_c99_label') function. If you agree that this is a problem, I would appreciate it if you could fix it. I'm always rooting for you!

My solution:
cons, sums = concat_conversation(data, labels, label_type) -> cons, sums = concat_conversation(data, labels, " | ", label_type)
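The behaviour described above follows from how Python binds positional arguments: the third positional value fills the third parameter, which is sep. The sketch below uses a stub with the signature quoted in the issue; the stub body is hypothetical and only reports how its arguments were bound. Passing label_type by keyword is an alternative to the fix proposed above and keeps the default separator.

```python
def concat_conversation(data, labels, sep=" |", label_type="_sent_c99_label"):
    """Stub with the quoted signature; returns how the arguments were bound."""
    return {"sep": sep, "label_type": label_type}


label_type = "_sent_trans_cons_label"

# Third positional argument binds to `sep`, not to `label_type`.
buggy = concat_conversation([], [], label_type)

# Keyword argument binds to the intended parameter.
fixed = concat_conversation([], [], label_type=label_type)

print(buggy)
print(fixed)
```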

Unable to replicate results reported in the paper?

I've tried running the code from this repo but couldn't replicate the results reported in the paper. For example, I don't reach the best model at around 7 epochs as the README says. The best model that I got performed significantly worse than your reported results: I only get to around 0.26 ROUGE-1. Do you have any idea why this might be? Which version of PyTorch did you use? I'm using PyTorch 1.4 and the preprocessed data included in the repo. See below for the log of the single-view model.

epoch 016 | loss 6.261 | nll_loss 4.916 | ppl 30.193 | wps 234.3 | ups 0.06 | wpb 4165.4 | bsz 158.3 | num_updates 1488 | lr 2.195e-05 | gnorm 2.165 | clip 100 | oom 0 | train_wall 1064 | wall 25366
epoch 016 | valid on 'valid' subset | loss 7.379 | nll_loss 6.115 | ppl 69.293 | wps 1017.4 | wpb 132.8 | bsz 5 | num_updates 1488 | best_loss 7.379
here bpe NONE
here!
Test on val set: 
100% 817/817 [02:35<00:00,  5.27it/s]
Val {'rouge-1': {'f': 0.26769580254553177, 'p': 0.30399684645069164, 'r': 0.26228723498609796}, 'rouge-2': {'f': 0.07173007955995553, 'p': 0.08290470011345255, 'r': 0.07046128497657979}, 'rouge-l': {'f': 0.264904383518601, 'p': 0.3149244870641518, 'r': 0.24536414711376517}}
2020-10-30 05:17:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 16 @ 1488 updates, score 7.379) (writing took 236.98618674099998 seconds)
Test on testing set: 
100% 818/818 [02:42<00:00,  5.03it/s]
Test {'rouge-1': {'f': 0.2707510254925983, 'p': 0.30304375457878013, 'r': 0.27045976455175946}, 'rouge-2': {'f': 0.07069378120884638, 'p': 0.08043789892742863, 'r': 0.07085366466696506}, 'rouge-l': {'f': 0.26921047464426007, 'p': 0.3131869557940146, 'r': 0.25498452981146014}}

The generation results

Can the author make the generation results public? We want to use the pyrouge metric to evaluate our result, and for a fair comparison, we also need to evaluate the result of MV-BART.
Thanks a lot!

How do I run this model on an 11 GB 1080 Ti?

For example, can I lower a certain parameter? Thank you!
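In fairseq, --max-tokens caps the number of tokens in one batch and --update-freq accumulates gradients over that many batches before each optimizer step. A common way to reduce memory use is therefore to lower MAX_TOKENS and raise UPDATE_FREQ by the same factor, which keeps the number of tokens per update roughly constant. The arithmetic below uses the values from the train_multi_view.sh settings quoted on this page. Whether 400 tokens per batch actually fits in 11 GB has not been tested here and is only an assumption.

```python
def tokens_per_update(max_tokens, update_freq, num_gpus=1):
    """Upper bound on the number of tokens contributing to one optimizer step."""
    return max_tokens * update_freq * num_gpus


original = tokens_per_update(max_tokens=800, update_freq=32)
smaller = tokens_per_update(max_tokens=400, update_freq=64)  # hypothetical setting

print(original, smaller)
```

Halving MAX_TOKENS roughly halves the activation memory per batch, at the cost of about twice as many forward and backward passes per update.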

Error while reproducing results

I have been stuck on this for days! I have already installed fairseq, but I still get: ModuleNotFoundError: No module named 'fairseq.logging'. The log is below.
Thank you!

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    cli_main()
  File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq_cli/train.py", line 451, in cli_main
    main(args)
  File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq_cli/train.py", line 89, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/checkpoint_utils.py", line 133, in load_checkpoint
    reset_meters=args.reset_meters,
  File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/trainer.py", line 269, in load_checkpoint
    state = checkpoint_utils.load_checkpoint_to_cpu(filename)
  File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/checkpoint_utils.py", line 165, in load_checkpoint_to_cpu
    f, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'fairseq.logging'
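The traceback ends inside unpickler.load(), so the error is raised while the checkpoint file is being read, not while fairseq itself is imported. A pickle stores the module path of every object it contains. If the file named by --restore-file was saved by a fairseq version that has a fairseq.logging module and the bundled fairseq_multi_view does not, loading fails in exactly this way. That explanation is a guess based on the traceback alone. The sketch below reproduces the mechanism with a throwaway module; the names newer_only_module and Meter are hypothetical.

```python
import pickle
import sys
import types

# Build a throwaway module that plays the role of a module which exists
# only in the environment where the checkpoint was saved.
module_name = "newer_only_module"
module = types.ModuleType(module_name)


class Meter:
    """Stand-in for an object stored inside a checkpoint."""


Meter.__module__ = module_name
Meter.__qualname__ = "Meter"
module.Meter = Meter
sys.modules[module_name] = module

# Saving works while the module is importable.
blob = pickle.dumps({"meter": Meter()})

# Loading in an environment without the module fails.
del sys.modules[module_name]
try:
    pickle.loads(blob)
    message = None
except ModuleNotFoundError as err:
    message = str(err)

print(message)
```

If this is what happened, a checkpoint saved by a fairseq version compatible with the fork would be needed.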

2021-03-25 15:34:40 | INFO | fairseq_cli.train | Namespace(T=1.0, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.1, balance=True, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin_2', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format=None, log_interval=1000, lr=[3e-05], 
lr_scheduler='polynomial_decay', lr_weight=1000.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=True, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.large/model.pt', save_dir='checkpoints', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=5000, train_subset='train', truncate_source=True, update_freq=[32], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=200, weight_decay=0.01)
2021-03-25 15:34:41 | INFO | fairseq.tasks.translation | [source] dictionary: 50264 types
2021-03-25 15:34:41 | INFO | fairseq.tasks.translation | [target] dictionary: 50264 types
2021-03-25 15:34:41 | INFO | fairseq.data.data_utils | loaded 3685 examples from: cnn_dm-bin_2/valid.source-target.source
2021-03-25 15:34:41 | INFO | fairseq.data.data_utils | loaded 3685 examples from: cnn_dm-bin/valid.source-target.source
2021-03-25 15:34:41 | INFO | fairseq.data.data_utils | loaded 3685 examples from: cnn_dm-bin_2/valid.source-target.target
2021-03-25 15:34:41 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 valid source-target 3685 examples
!!! 3685 3685
2021-03-25 15:34:53 | INFO | fairseq_cli.train | BARTModel(
(encoder): TransformerEncoder(
(embed_tokens): Embedding(50264, 1024, padding_idx=1)
(embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(6): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(7): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(8): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(9): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(10): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(11): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(embed_tokens): Embedding(50264, 1024, padding_idx=1)
(embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(6): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(7): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(8): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(9): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(10): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(11): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(classification_heads): ModuleDict()
(section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
(section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(section): LSTM(1024, 1024)
(w_proj_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(w_proj): Linear(in_features=1024, out_features=1024, bias=True)
(w_context_vector): Linear(in_features=1024, out_features=1, bias=False)
(softmax): Softmax(dim=1)
)
2021-03-25 15:34:53 | INFO | fairseq_cli.train | model bart_large, criterion LabelSmoothedCrossEntropyCriterion
2021-03-25 15:34:53 | INFO | fairseq_cli.train | num. model params: 416791552 (num. trained: 416791552)
2021-03-25 15:34:58 | INFO | fairseq_cli.train | training on 1 GPUs
2021-03-25 15:34:58 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
Namespace(T=1.0, activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.1, balance=True, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin_2', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source=True, left_pad_target=False, load_alignments=False, log_format=None, log_interval=1000, lr=[3e-05], lr_scheduler='polynomial_decay', 
lr_weight=1000.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=True, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.large/model.pt', save_dir='checkpoints', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=5000, train_subset='train', truncate_source=True, update_freq=[32], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=200, weight_decay=0.01) <fairseq.trainer.Trainer object at 0x7f8c143bccf8>
Traceback (most recent call last):
File "train.py", line 11, in <module>
cli_main()
File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq_cli/train.py", line 451, in cli_main
main(args)
File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq_cli/train.py", line 89, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/checkpoint_utils.py", line 133, in load_checkpoint
reset_meters=args.reset_meters,
File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/trainer.py", line 269, in load_checkpoint
state = checkpoint_utils.load_checkpoint_to_cpu(filename)
File "/tf/pym/Multi-View-Seq2Seq-master/fairseq_multi_view/fairseq/checkpoint_utils.py", line 165, in load_checkpoint_to_cpu
f, map_location=lambda s, l: default_restore_location(s, "cpu")
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
ModuleNotFoundError: No module named 'fairseq.logging'
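
One plausible reading of this traceback, not confirmed anywhere in the thread: the BART checkpoint was pickled by a fairseq release that contains a `fairseq.logging` package, while the fairseq bundled under `fairseq_multi_view` does not, so unpickling cannot import it. A workaround sometimes used for this class of error is to register an alias in `sys.modules` before loading. The sketch below only demonstrates the mechanism with plain `pickle`; the module names `newlib_demo.logging.meters` and `oldlib_demo_meters` and the class `AverageMeter` are hypothetical, and whether the bundled fairseq has a suitable module to alias is an assumption to verify.

```python
import pickle
import sys
import types


class AverageMeter:
    """Hypothetical stand-in for a helper object stored inside a checkpoint."""

    def __init__(self, value=0.0):
        self.value = value


def register_alias(dotted_name, module):
    """Make `dotted_name` importable by aliasing it (and missing parents) in sys.modules."""
    parts = dotted_name.split(".")
    for i in range(1, len(parts)):
        parent = ".".join(parts[:i])
        # setdefault keeps a real parent package if one is already imported.
        sys.modules.setdefault(parent, types.ModuleType(parent))
    sys.modules[dotted_name] = module


# Simulate a checkpoint written under a "newer" module layout (hypothetical path).
NEW_NAME = "newlib_demo.logging.meters"
new_module = types.ModuleType(NEW_NAME)
new_module.AverageMeter = AverageMeter
AverageMeter.__module__ = NEW_NAME
register_alias(NEW_NAME, new_module)
payload = pickle.dumps(AverageMeter(3.5))

# Simulate the "older" install: the module path no longer exists.
for name in ("newlib_demo", "newlib_demo.logging", NEW_NAME):
    del sys.modules[name]
try:
    pickle.loads(payload)
    load_failed = False
except ModuleNotFoundError:
    load_failed = True

# Workaround: alias the missing path to a module that defines the class.
old_module = types.ModuleType("oldlib_demo_meters")
old_module.AverageMeter = AverageMeter
register_alias(NEW_NAME, old_module)
restored = pickle.loads(payload)
print(load_failed, restored.value)
```

The alias has to be registered before the checkpoint is loaded, and it only helps if the aliased module really defines the classes the checkpoint refers to.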

A tiny question on the code

Line 265 of "Multi-View-Seq2Seq/fairseq_multi_view/fairseq/data/language_pair_dataset.py":
'''
if self.src2 and self.src2[index][-2] == eos:
'''

Why "-2" here instead of "-1", as on line 262? Thanks a lot!

A variable in the paper

What does the variable "A^k" mean in the transformer structure? Does it denote the attention matrix for each view?
In Section 3.2: "Then the multi-head attention is performed over conversation tokens h^k_{i:j} from different views k and form A^k separately."

Minor bug fixes

Fairseq bug:
torch.div(self.indices_buf, vocab_size, out=self.beams_buf) raises an error; it should be replaced by torch.floor_divide(self.indices_buf, vocab_size, out=self.beams_buf).
File naming error:
The data/stage_segmentation notebook generates files named _sent_trans_cons_label_2.pkl, but other parts of the code expect _sent_trans_cons_label.pkl.
Syntax error:
In the data/read_labels notebook, line 33 is missing the ":" at the end of an if condition.
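
For context on the first item: beam search typically takes the top-k over scores flattened across (beam, vocabulary), so the beam index has to be recovered with integer division by the vocabulary size. In recent PyTorch releases `torch.div` performs true division, which is presumably why floor division is suggested above. The sketch below shows the arithmetic in plain Python; `split_flat_index` is a hypothetical helper, not part of fairseq, and 50264 is the embedding size from the model printout earlier on this page.

```python
def split_flat_index(flat_index, vocab_size):
    """Recover (beam, token) from an index into scores flattened over beams x vocab."""
    beam = flat_index // vocab_size   # floor division yields an integer beam id
    token = flat_index % vocab_size   # remainder is the token id within that beam
    return beam, token


vocab_size = 50264                    # taken from Embedding(50264, 1024) in the printout
flat_index = 2 * vocab_size + 17      # third beam (index 2), token id 17
beam, token = split_flat_index(flat_index, vocab_size)

# True division gives a non-integer quotient, which is not a usable beam index.
true_quotient = flat_index / vocab_size
print(beam, token, true_quotient)
```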

Cannot reproduce the reported results for Multi-View BART

I want to reproduce the results in the paper, but my results are still lower than those reported.
Any suggestions or solutions would be greatly appreciated.

I followed these steps:

Clone this repo.
Create a Conda environment and pip install using the commands in the README.
Download the data from the link in the README (data.tar.gz on Google Drive).
Run bpe.sh and binarize.sh in train_sh.
Run train_multi_view.sh in train_sh (I downloaded the pretrained BART model and specified its path in the training script).
The observed results remain below those in the paper.

My train.log follows:
`2021-05-30 16:31:37 | INFO | fairseq_cli.train | model bart_large, criterion LabelSmoothedCrossEntropyCriterion
2021-05-30 16:31:37 | INFO | fairseq_cli.train | num. model params: 416791552 (num. trained: 416791552)
2021-05-30 16:31:44 | INFO | fairseq_cli.train | training on 1 GPUs
2021-05-30 16:31:44 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2021-05-30 16:31:47 | INFO | fairseq.trainer | loaded checkpoint /home/data_ti4_c/gengx/PGN/DialogueSum/bart.large/bart.large/model.pt (epoch 41 @ 0 updates)
group1:
511
group2:
12
2021-05-30 16:31:47 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
2021-05-30 16:31:47 | INFO | fairseq.trainer | loading train data for epoch 0
2021-05-30 16:31:47 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.source
2021-05-30 16:31:47 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2021-05-30 16:31:47 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.target
2021-05-30 16:31:47 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 train source-target 14731 examples
!!! 14731 14731
2021-05-30 16:31:48 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269]
True
2021-05-30 16:43:49 | INFO | train | epoch 001 | loss 5.35 | nll_loss 3.418 | ppl 10.686 | wps 549.2 | ups 0.13 | wpb 4165.4 | bsz 158.3 | num_updates 93 | lr 1.395e-05 | gnorm 30.101 | clip 100 | oom 0 | train_wall 706 | wall 725
2021-05-30 16:44:02 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 4.067 | nll_loss 2.182 | ppl 4.537 | wps 1721.8 | wpb 132.8 | bsz 5 | num_updates 93
100%|██████████| 817/817 [02:32<00:00, 5.35it/s]here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.466633180309513, 'p': 0.49140138382586446, 'r': 0.48556837413794035}, 'rouge-2': {'f': 0.2283604408486965, 'p': 0.23967396780627975, 'r': 0.2406360296133875}, 'rouge-l': {'f': 0.45239921360854707, 'p': 0.4768419669949866, 'r': 0.46298214107054253}}
2021-05-30 16:46:51 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_best.pt (epoch 1 @ 93 updates, score 4.067) (writing took 13.230378480977379 seconds)

100%|██████████| 818/818 [02:34<00:00, 5.30it/s]Test on testing set:
Test {'rouge-1': {'f': 0.46401575684701973, 'p': 0.4876149230960775, 'r': 0.4856753031382108}, 'rouge-2': {'f': 0.22558266819086983, 'p': 0.23804809718663697, 'r': 0.2380510356102369}, 'rouge-l': {'f': 0.4507830089574146, 'p': 0.47369243895761404, 'r': 0.463735231898608}}

2021-05-30 17:01:12 | INFO | train | epoch 002 | loss 4.071 | nll_loss 2.233 | ppl 4.702 | wps 371.3 | ups 0.09 | wpb 4165.4 | bsz 158.3 | num_updates 186 | lr 2.79e-05 | gnorm 3.805 | clip 100 | oom 0 | train_wall 690 | wall 1768
2021-05-30 17:01:25 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 3.943 | nll_loss 2.093 | ppl 4.267 | wps 1714.4 | wpb 132.8 | bsz 5 | num_updates 186 | best_loss 3.943
100%|██████████| 817/817 [02:40<00:00, 5.10it/s]here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.48628543166111304, 'p': 0.4849894455764677, 'r': 0.5314782969117535}, 'rouge-2': {'f': 0.24927341750101978, 'p': 0.24701375251072502, 'r': 0.2760416390474299}, 'rouge-l': {'f': 0.47186839505679423, 'p': 0.47350846043003486, 'r': 0.5050869486974158}}
2021-05-30 17:04:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_best.pt (epoch 2 @ 186 updates, score 3.943) (writing took 23.186852996994276 seconds)

100%|██████████| 818/818 [02:43<00:00, 5.00it/s]Test on testing set:
Test {'rouge-1': {'f': 0.4786341588084106, 'p': 0.4786291670320602, 'r': 0.5265071766157988}, 'rouge-2': {'f': 0.2388174902966539, 'p': 0.23853517761149043, 'r': 0.2657237453293394}, 'rouge-l': {'f': 0.46179904611032535, 'p': 0.4642844043532549, 'r': 0.496457510321212}}

2021-05-30 17:18:59 | INFO | train | epoch 003 | loss 3.863 | nll_loss 2.03 | ppl 4.083 | wps 363.2 | ups 0.09 | wpb 4165.4 | bsz 158.3 | num_updates 279 | lr 2.95062e-05 | gnorm 3.972 | clip 100 | oom 0 | train_wall 684 | wall 2835
2021-05-30 17:19:12 | INFO | valid | epoch 003 | valid on 'valid' subset | loss 3.886 | nll_loss 2.05 | ppl 4.141 | wps 1659.8 | wpb 132.8 | bsz 5 | num_updates 279 | best_loss 3.886
100%|██████████| 817/817 [02:32<00:00, 5.36it/s]here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.48681990701428274, 'p': 0.5021639425012622, 'r': 0.514617894002924}, 'rouge-2': {'f': 0.25267840837027383, 'p': 0.2601730865438312, 'r': 0.2694340002654152}, 'rouge-l': {'f': 0.47263410942856693, 'p': 0.48723857579759916, 'r': 0.4928951658275421}}
2021-05-30 17:22:09 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_best.pt (epoch 3 @ 279 updates, score 3.886) (writing took 21.174986565019935 seconds)

100%|██████████| 818/818 [02:33<00:00, 5.32it/s]Test on testing set:
Test {'rouge-1': {'f': 0.48538106530910036, 'p': 0.5043341635928553, 'r': 0.5140378691028713}, 'rouge-2': {'f': 0.24471431210883865, 'p': 0.2551209404134376, 'r': 0.2601614283339945}, 'rouge-l': {'f': 0.468263423938775, 'p': 0.4845770907657205, 'r': 0.4894307801054096}}

2021-05-30 17:36:32 | INFO | train | epoch 004 | loss 3.672 | nll_loss 1.826 | ppl 3.545 | wps 367.9 | ups 0.09 | wpb 4165.4 | bsz 158.3 | num_updates 372 | lr 2.8925e-05 | gnorm 2.403 | clip 100 | oom 0 | train_wall 692 | wall 3888
2021-05-30 17:36:45 | INFO | valid | epoch 004 | valid on 'valid' subset | loss 3.866 | nll_loss 2.047 | ppl 4.133 | wps 1719.5 | wpb 132.8 | bsz 5 | num_updates 372 | best_loss 3.866
100%|██████████| 817/817 [02:36<00:00, 5.23it/s]here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4827315494753739, 'p': 0.50526373048346, 'r': 0.5041134859110057}, 'rouge-2': {'f': 0.24840728492515798, 'p': 0.2592307191193046, 'r': 0.2622010621533662}, 'rouge-l': {'f': 0.4675137179138125, 'p': 0.4863900933656205, 'r': 0.4838663279210453}}
2021-05-30 17:39:46 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_best.pt (epoch 4 @ 372 updates, score 3.866) (writing took 22.199564302980434 seconds)

100%|██████████| 818/818 [02:34<00:00, 5.31it/s]Test on testing set:
Test {'rouge-1': {'f': 0.47986946672852465, 'p': 0.505675068662463, 'r': 0.5010302542324921}, 'rouge-2': {'f': 0.24667273240725318, 'p': 0.2601824293731443, 'r': 0.2595407303752472}, 'rouge-l': {'f': 0.4649056998973506, 'p': 0.4862129586100383, 'r': 0.4809156933510188}}

2021-05-30 17:54:06 | INFO | train | epoch 005 | loss 3.507 | nll_loss 1.646 | ppl 3.131 | wps 367.7 | ups 0.09 | wpb 4165.4 | bsz 158.3 | num_updates 465 | lr 2.83438e-05 | gnorm 2.009 | clip 100 | oom 0 | train_wall 687 | wall 4941
2021-05-30 17:54:18 | INFO | valid | epoch 005 | valid on 'valid' subset | loss 3.882 | nll_loss 2.059 | ppl 4.167 | wps 1719.8 | wpb 132.8 | bsz 5 | num_updates 465 | best_loss 3.866
100%|██████████| 817/817 [02:58<00:00, 4.59it/s]here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4843439334064069, 'p': 0.4676299223115565, 'r': 0.5524586002963356}, 'rouge-2': {'f': 0.24974488509752005, 'p': 0.24019083936995303, 'r': 0.28883689957033615}, 'rouge-l': {'f': 0.46841726367851844, 'p': 0.4529545727532749, 'r': 0.5246822708277491}}
2021-05-30 17:57:33 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_last.pt (epoch 5 @ 465 updates, score 3.882) (writing took 13.954715799016412 seconds)

100%|██████████| 818/818 [02:54<00:00, 4.68it/s]Test on testing set:
Test {'rouge-1': {'f': 0.4831880715673854, 'p': 0.4698315545550697, 'r': 0.5481711003208287}, 'rouge-2': {'f': 0.24967258791161379, 'p': 0.24298097108018568, 'r': 0.28566565132721744}, 'rouge-l': {'f': 0.47041083526959915, 'p': 0.45832861492381066, 'r': 0.5230912021841242}}

(base) gengx@v100-13:~/Multi-View-Seq2Seq-master/Multi-View-Seq2Seq-master/train_sh$
`
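
To compare runs like the one above against the paper, it can help to pull the ROUGE F-scores out of the log rather than reading the raw dictionaries. The following is a minimal sketch, assuming the log keeps the `Val {...}` / `Test {...}` line format shown in this issue; `rouge_f_scores` is a hypothetical helper, and the numbers in `sample` are dummy values, not results.

```python
import ast
import re

# Matches lines such as "Val {...}" or "Test {...}", but not "Test on val set:".
ROUGE_LINE = re.compile(r"^(Val|Test) (\{.*\})\s*$")


def rouge_f_scores(log_text):
    """Return [(split, {metric: F-score in percent})] in log order."""
    rows = []
    for line in log_text.splitlines():
        match = ROUGE_LINE.match(line.strip())
        if not match:
            continue
        split, payload = match.groups()
        scores = ast.literal_eval(payload)  # the logged dict is a Python literal
        rows.append(
            (split, {name: round(100 * vals["f"], 2) for name, vals in scores.items()})
        )
    return rows


# Dummy values for illustration only.
sample = """Test on val set:
Val {'rouge-1': {'f': 0.5, 'p': 0.4, 'r': 0.6}, 'rouge-2': {'f': 0.25, 'p': 0.2, 'r': 0.3}, 'rouge-l': {'f': 0.45, 'p': 0.4, 'r': 0.5}}
"""
rows = rouge_f_scores(sample)
for split, scores in rows:
    print(split, scores)
```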

I encountered a problem when training the single-view model

After setting up all the resources, I found that I could not run the program. Could you please help me check it? Thank you.

When I train the single-view model, I get the following error:

./train_single_view.sh: line 6: syntax error near unexpected token `('
./train_single_view.sh: line 6: `BART_PATH= PATH-TO-BART-MODEL (./bart.large/model.pt)'

Issue with unrecognized arguments when running train_multi_view.sh

When I run ./train_multi_view.sh, I get the following error:

train.py: error: unrecognized arguments: --lr-weight 1000 --T 0.2 --multi-views --balance

Running ./train_single_view.sh works fine. Wondering if there's a fix or something I can do?
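
A possible cause, assumed rather than confirmed in this thread: the flags `--lr-weight`, `--T`, `--multi-views` and `--balance` correspond to entries in the `Namespace` printed by the bundled fairseq earlier on this page (`lr_weight`, `T`, `multi_views`, `balance`), so they would not be recognized if Python resolves `fairseq` to a stock installation instead of `fairseq_multi_view`. The sketch below checks where a module name resolves from; `module_origin` is a hypothetical helper.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# If this path is not inside fairseq_multi_view/, a different fairseq is being used.
print(module_origin("fairseq"))
```

If the path points elsewhere, rerunning `pip install --editable ./` inside `fairseq_multi_view`, as described in the README, may be worth trying.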
