A baseline method for the DTA Dialogue Summarization task.
Please install all the dependency packages using the following command:

```shell
pip install -r requirements.txt
```
- Download the dataset (password: 7mvn) and unzip it under the `data` folder.
- Download the pretrained model mbart-large-50.
- Execute `python3 preprocess.py` to generate data for model training. This will generate `train.jsonl` and `dev.jsonl` in the `data` folder. Note that it will take a few minutes.
- You can execute `bash train.sh` or the following command to train a baseline model:

  ```shell
  python3 -u pipeline.py \
      --do_train \
      --do_eval \
      --src_lang zh_CN \
      --tgt_lang zh_CN \
      --train_filename data/train.jsonl \
      --val_filename data/dev.jsonl \
      --max_src_len ${max_src_len} \
      --max_tgt_len ${max_tgt_len} \
      --remark ${remark} \
      --pretrained_model_path ${save_dir} \
      --vocab_path ${vocab_dir} \
      --save_dir ${save_dir} \
      --batch_size ${batch_size} \
      --num_train_epochs ${iter} \
      --skip_eval_epochs ${skip_iter} \
      --learning_rate ${learning_rate}
  ```
- You can execute `bash test.sh` or the following command to generate dialogue summaries with the previously trained model. The generated results will be written to the `test.pred` file:

  ```shell
  python3 -u pipeline.py \
      --do_test \
      --src_lang zh_CN \
      --tgt_lang zh_CN \
      --test_filename data/dev.jsonl \
      --max_src_len ${max_src_len} \
      --max_tgt_len ${max_tgt_len} \
      --remark ${remark} \
      --pretrained_model_path ${model_dir} \
      --vocab_path ${vocab_dir} \
      --save_dir ${save_dir} \
      --batch_size ${batch_size} \
      --num_train_epochs ${iter} \
      --skip_eval_epochs ${skip_iter} \
      --learning_rate ${learning_rate}
  ```
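The pipeline above (preprocess, train, predict) can be sanity-checked with a short script. This is a minimal sketch assuming only the file formats implied above: `train.jsonl`/`dev.jsonl` are one-JSON-object-per-line files, and `test.pred` contains one generated summary per input record. The actual field names inside the JSONL records are not specified here, so the checks are format-level only.

```python
import json


def count_jsonl_records(path):
    """Count records in a JSONL file, verifying each line parses as JSON.

    Raises ValueError on the first malformed line.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc})")
            count += 1
    return count


def count_lines(path):
    """Count non-empty lines in a plain-text prediction file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_outputs(dev_path="data/dev.jsonl", pred_path="test.pred"):
    """Verify that one summary was generated per input dialogue."""
    n_inputs = count_jsonl_records(dev_path)
    n_preds = count_lines(pred_path)
    if n_inputs != n_preds:
        print(f"Warning: {n_inputs} inputs but {n_preds} predictions")
    return n_inputs, n_preds
```

Running `check_outputs()` from the repository root after inference gives a quick signal: a count mismatch usually means generation stopped early or an input line was malformed.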
If you have any questions about the code, please open an issue or contact us at [email protected].
Please describe the problem in detail so we can help you better and faster!