GithubHelp home page GithubHelp logo

xiaoanshi / openqa Goto Github PK

View Code? Open in Web Editor NEW

This project forked from thunlp/openqa

0.0 1.0 0.0 73 KB

The source code of ACL 2018 paper "Denoising Distantly Supervised Open-Domain Question Answering".

License: MIT License

Python 100.00%

openqa's Introduction

Open-QA

The source codes for paper "Denoising Distantly Supervised Open-Domain Question Answering", which is modified based on the code of paper " Reading Wikipedia to Answer Open-Domain Questions."

Requirements

pytorch 0.3.0 numpy scikit-learn termcolor regex tqdm prettytable scipy nltk pexpect 4.2.1

Evaluation Results

Dataset Quasar-T SearchQA TrivialQA SQuAD
Models EM F1 EM F1 EM F1 EM F1
GA (Dhingra et al., 2017) 26.4 26.4 - - - -
BiDAF (Seo et al., 2017) 25.9 28.5 28.6 34.6 - - - -
AQA (Buck et al., 2017) - - 40.5 47.4 - - - -
R^3 (Wang et al., 2018a) 35.3 41.7 49 55.3 47.3 53.7 29.1 37.5
Our Model 42.2 49.3 58.8 64.5 48.7 56.3 28.7 36.6

Data

We provide Quasar-T, SearchQA and TrivialQA dataset we used for the task in data/ directory. We preprocess the original data to make it satisfy the input format of our codes, and can be download at here.

To run our code, the dataset should be put in the folder data/ using the following format:

datasets/

  • train.txt, dev.txt, test.txt: format for each line: {"question": quetion, "answers":[answer1, answer2, ...]}.

  • train.json, dev.json, test.json: format [{"question": question, "document":document1},{"question": question, "document":document2}, ...].

embeddings/

  • glove.840B.300d.txt: word vectors obtained from here.

corenlp/

  • all jar files from Stanford Corenlp.

Codes

The source codes of our models are put in the folders src/.

Train and Test

For training and test, you need to:

  1. Pre-train the paragraph reader: python main.py --batch-size 256 --model-name quasart_reader --num-epochs 10 --dataset quasart --mode reader

  2. Pre-train the paragraph selector: python main.py --batch-size 64 --model-name quasart_selector --num-epochs 10 --dataset quasart --mode selector --pretrained models/quasart_reader.mdl

  3. Train the whole model: python main.py --batch-size 32 --model-name quasart_all --num-epochs 10 --dataset quasart --mode all --pretrained models/quasart_selector.mdl

Cite

If you use the code, please cite the following paper:

  1. Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising Distantly Supervised Open-Domain Question Answering. In Proceedings of ACL. pages 1736--1745. [pdf]

Reference

  1. Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. 2017. Gated-attention readers for text comprehension. In Proceedings of ACL. pages 1832--1846.

  2. Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In Proceedings of ICLR.

  3. Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Andrea Gesmundo, Neil Houlsby, Wojciech Gajewski, and Wei Wang. 2017. Ask the right questions: Active question reformulation with reinforcement learning. arXiv preprint arXiv:1705.07830.

  4. Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang,Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. 2018. R3: Reinforced ranker-reader for open-domain question answering. In Proceedings of AAAI. pages 5981--5988.

openqa's People

Contributors

mrlyk423 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.