
qags: Question Answering and Generation for Summarization

This repo contains the code for the paper Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, which appeared at ACL 2020.

Usage

To compute QAGS scores, we need to

  1. generate questions
  2. answer questions
  3. compare answers

1. Generating Questions

Extracting answer candidates

We use an answer-conditional question generation model, so we first need to extract answer candidates. Use the following command, where data_file is a text file containing one example per line and out_dir is the directory to write the processed files to. The script will produce test.txt, test_{n_ans_per_txt}.txt, and test_w_{n_ans_per_txt}ans.txt in out_dir, which respectively contain the examples, the extracted answers, and the answers and examples formatted as input to the QG model.

python qg_utils.py --command extract_ans \
                   --data_file ${data_file} \
                   --out_dir ${out_dir}

Generating questions

To generate the questions, we rely on BART finetuned on NewsQA, implemented in fairseq. A frozen version of fairseq for doing so is available in qags/fairseq. Our pretrained QG model is available here.

To generate from these models, we must first preprocess the data (tokenize and binarize) using the command ./fairseq/scripts/aw/preprocess.sh preprocess. In the script, make sure to change dat_dir to point to the directory containing your files. The script expects dat_dir to contain test.src and test.trg: test.src contains the inputs that will actually be fed into the QG model to generate from; test.trg can be a dummy file with the same number of lines (e.g., a copy of test.src).
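The dummy target file can be produced with a simple copy. A minimal sketch (demo_data and the file contents are placeholders for illustration; point the path at your actual dat_dir):

```python
from pathlib import Path

data_dir = Path("demo_data")  # placeholder; use your dat_dir here
data_dir.mkdir(exist_ok=True)

# Stand-in for the QG inputs produced by qg_utils.py.
(data_dir / "test.src").write_text("example input one\nexample input two\n")

# test.trg is never read during generation; it only needs to exist with the
# same number of lines as test.src, so a straight copy is sufficient.
(data_dir / "test.trg").write_text((data_dir / "test.src").read_text())
```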

Then, to generate, use the command ./scripts/gen_qg.sh. Change model_path to point to the pretrained QG checkpoint, data_path to the directory containing the processed data (typically the processed directory created during preprocessing), and out_file to the file to log to. Due to a code quirk, in fairseq/fairseq/models/summerization_encoder_only.py, set HACK_PATH (line 107) to the best_pretrained_bert.pt checkpoint, located here.

Finally, extract the generated questions using

python qg_utils.py --command extract-gen \
                   --data_file ${fseq_log_file} \
                   --out_dir ${out_dir}

which will extract the generations and the corresponding probabilities respectively to gen.txt and prob.txt in out_dir.

2. Answering Questions

To prepare the QA data, use the following command:

python qa_utils.py --command format-qa-data --out_dir tmp \
                   --src_txt_file ${src_txt_file} --gen_txt_file ${gen_txt_file} \
                   --gen_qst_file ${gen_qst_file} --gen_prob_file ${gen_prob_file} 

where gen_{qst/prob}_file are the files generated in the previous step (gen.txt and prob.txt), and {src/gen}_txt_file are respectively the source and model-generated texts (e.g., for summarization, the source articles and the model-generated summaries to be evaluated).

As part of this step, we filter questions by quality using a number of heuristics. Most importantly, we enforce answer consistency: we use a QA model to answer the generated questions, and if the predicted answer doesn't match the original answer, we throw out the question. To do this, we need to run the QA model on the generated questions, which will produce an answer file. For this step, use the flag --use_all_qsts and then run the QA model on the resulting data file.
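The answer-consistency heuristic can be sketched as follows. filter_by_answer_consistency is a hypothetical helper, not the repo's code, and it uses a simplified normalized exact match where the repo's filtering applies additional heuristics:

```python
def filter_by_answer_consistency(questions, expected_answers, predicted_answers):
    """Keep questions whose QA-model answer matches the answer the question
    was generated from (simplified: lowercased exact match)."""
    kept = []
    for qst, exp, prd in zip(questions, expected_answers, predicted_answers):
        if exp.strip().lower() == prd.strip().lower():
            kept.append((qst, exp))
    return kept

questions = ["Who wrote the report?", "When was it filed?"]
expected = ["the auditor", "2019"]        # answers the questions were conditioned on
predicted = ["The auditor", "last year"]  # answers from the QA model
# Only the first question survives the consistency check.
filtered = filter_by_answer_consistency(questions, expected, predicted)
```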

Once you have answers for each question, we need to compare the expected and predicted answers, which we do via the flags --use_exp_anss --gen_ans_file ${gen_ans_file} --gen_prd_file ${gen_prd_file}, where the latter two files respectively contain the expected and predicted answers.

To evaluate our QA models, use the following command, which evaluates the model on pred_file and writes the predictions to out_dir/out_file. Our models are based on pytorch-pretrained-BERT (now transformers), and pretrained checkpoints are located here. Make sure model_dir points to the QA model directory. To compute QAGS scores, evaluate the QA model using both the article as context and the summary as context, so you will need to run this command twice.

python finetune_pt_squad.py \
              --bert_model bert-large-uncased \
              --load_model_from_dir ${model_dir} \
              --version_2_with_negative \
              --do_lower_case \
              --do_predict \
              --predict_file ${pred_file} \
              --output_dir ${out_dir} \
              --prediction_file ${out_file} \
              --overwrite_output_dir

3. Comparing Answers

Finally, to get the actual QAGS scores, we compare answers. The following command will write the scores to out_dir/qags_scores.txt.

python qa_utils.py --command compute-qags \
                   --src-ans-file ${src_ans_file} \
                   --trg-ans-file ${trg_ans_file} \
                   --out-dir ${out_dir}
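Per the paper, answers are compared with token-level F1 (the SQuAD answer metric) and averaged over questions. A minimal sketch of that comparison, omitting the repo's answer normalization:

```python
from collections import Counter

def token_f1(pred, gold):
    """Token-level F1 between two answer strings (SQuAD-style, simplified:
    lowercase whitespace tokenization, no punctuation/article stripping)."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def qags_score(src_answers, trg_answers):
    """Average answer similarity over all questions for one summary:
    src_answers come from answering against the source article,
    trg_answers from answering against the summary."""
    f1s = [token_f1(s, t) for s, t in zip(src_answers, trg_answers)]
    return sum(f1s) / len(f1s)
```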

Data

The crowdsourced annotations of summary sentences we collected are available in data/mturk_{cnndm,xsum}.jsonl. Each line contains an article, a model-generated summary divided into sentences, and three annotations per sentence. Each annotation is a binary judgment of whether the summary sentence is factually supported by the article, along with an anonymized annotator ID.
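Each line of the annotation files can be loaded as a JSON record; one common use is a per-sentence majority vote over the three annotations. A sketch below, where the field names ("summary_sentences", "responses") are assumptions based on the description above and may differ from the actual files:

```python
import json

def majority_factual(annotations):
    """Majority vote over binary factual-support annotations for one sentence."""
    return sum(annotations) > len(annotations) / 2

# Hypothetical record mirroring the description; check the jsonl for real keys.
line = '{"article": "...", "summary_sentences": ["s1", "s2"], "responses": [[1, 1, 0], [0, 0, 1]]}'
record = json.loads(line)
votes = [majority_factual(r) for r in record["responses"]]
```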

For CNNDM, the summarization model is Bottom-Up Summarization (Gehrmann et al., 2018). For XSUM, the summarization model is BART finetuned on the XSUM training data.

Citation

If you use this code or data, please cite us.

@inproceedings{wang2020asking,
   title={Asking and Answering Questions to Evaluate the Factual Consistency of Summaries},
   url={http://dx.doi.org/10.18653/v1/2020.acl-main.450},
   DOI={10.18653/v1/2020.acl-main.450},
   booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
   publisher={Association for Computational Linguistics},
   author={Wang, Alex and Cho, Kyunghyun and Lewis, Mike},
   year={2020}
}


qags's Issues

Generate more questions

I used gen_qg.sh's setting but I can only generate 5 questions.
Does anyone know how to generate more questions?

Batch size while generating questions

Hi! Is it possible to change the batch size during question generation?
Whenever I pass a value different than 1 to fairseq/summerization_generate.py, I get an error.
e.g. for a batch size 10 (it's a weird value for the batch size but I was just playing around):

RuntimeError: The expanded size of the tensor (40) must match the existing size (8) at non-singleton dimension 0.
Target sizes: [40, 90].  Tensor sizes: [8, 90]

Thanks!

Missing dictionary file for preprocess.sh

Dear Alex, thanks for the great work.
When I run the command ./fairseq/scripts/aw/preprocess.sh preprocess, it raises the error of FileNotFoundError: [Errno 2] No such file or directory: 'models/fseq/dict.txt'.
I also searched this file in this repository but I cannot find it.
Could you please provide this file?
I also noticed that the tokenization script uses the BERT tokenizer rather than BART's. Do these two models use the same tokenizer? Thanks!

ImportError: cannot import name 'libbleu' from 'fairseq'

When executing the below script
./scripts/gen_qg.sh

I get these errors

  1. ERROR: missing libbleu.so. run 'python setup.py install' --> even running this command does not help.
  2. ImportError: cannot import name 'libbleu' from 'fairseq'
  • Whereas, when manually importing from python prompt, it works with below command
    'from fairseq import libbleu'

Requirements

Hello! First, thank you for your wonderful work.
I was wondering whether you can upload the requirements.txt file or just state the versions of the dependencies you're using.
Thanks!

dataset

Where to find dataset for qg_utils.py?
