
strategist922 / denspi


This project forked from seominjoon/denspi


Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)

Home Page: https://nlp.cs.washington.edu/denspi

License: Apache License 2.0

Languages: Python 95.88%, Shell 0.52%, Dockerfile 0.38%, CSS 0.15%, HTML 3.08%


Dense-Sparse Phrase Index (DenSPI)

This is the official code for Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, to appear at ACL 2019. Check out our Live Demo.


BibTeX:

@inproceedings{denspi,
  title={Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index},
  author={Seo, Minjoon and Lee, Jinhyuk and Kwiatkowski, Tom and Parikh, Ankur P and Farhadi, Ali and Hajishirzi, Hannaneh},
  booktitle={ACL},
  year={2019}
}

While the entire codebase is here, please note that the documentation still needs substantial work. For now, we only have instructions for hosting your own demo with the pre-dumped index and pre-trained model that we provide. Please stay tuned for the full documentation, including how to start from scratch (though you are more than welcome to look into our undocumented code).

Demo

This section explains how to host the demo on your own machine. You can also try the hosted Live Demo. You will need to download ~1.5 TB of files, but once you have them, it will take less than a minute to start serving.

Prerequisites

A. Hardware

  • CPUs: at least 4 cores recommended.
  • RAM: at least 32GB needed.
  • Storage: at least 2TB of SSD needed.
  • GPUs: not needed.

If you are using Google Cloud (our demo is also hosted there, with 24 vCPUs, 128 GB RAM, and 6 local SSDs), we highly recommend local SSDs, which are not only cheaper but also better for low-latency applications (at the cost of persistence).
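Before starting a 1.5 TB download, a quick preflight check can save time. The Python sketch below compares the machine against the figures above. The core and RAM thresholds are copied from the list; disk is checked as free space against the ~1.5 TB dump size, which is our own simplification of the 2 TB guideline. Total RAM is read through `os.sysconf`, which works on Linux and may be unavailable on other platforms, in which case the check reports "unknown".

```python
import os
import shutil

# Thresholds from the hardware list above (decimal units, so a nominal 32 GB box passes).
MIN_CORES = 4
MIN_RAM_BYTES = 32 * 10**9
# Simplification: check free space against the ~1.5 TB dump rather than the 2 TB total.
MIN_FREE_DISK_BYTES = int(1.5 * 10**12)


def total_ram_bytes():
    """Total physical memory via sysconf (Linux); None if the platform lacks these keys."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


def check_hardware(root_dir="."):
    """Map each requirement to True/False, or None when it cannot be determined."""
    cores = os.cpu_count() or 0
    ram = total_ram_bytes()
    free_disk = shutil.disk_usage(root_dir).free
    return {
        "cpu_cores": cores >= MIN_CORES,
        "ram": None if ram is None else ram >= MIN_RAM_BYTES,
        "disk": free_disk >= MIN_FREE_DISK_BYTES,
    }


if __name__ == "__main__":
    # Check the disk that will hold $ROOT_DIR (falls back to the current directory).
    target = os.environ.get("ROOT_DIR", ".")
    for name, ok in check_hardware(target).items():
        status = "unknown" if ok is None else ("ok" if ok else "below minimum")
        print(name + ": " + status)
```

GPUs are not checked because the demo does not need one.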

B. Environment

We highly recommend a Conda environment, since faiss cannot be installed with pip. Note that there are two requirements.txt files: one in this directory and one in the open subfolder. This directory's file is for hosting a (PyTorch-based) server that maps the input question to a vector; open's file is for hosting the search server and the demo itself. In this tutorial, we simply install both in the same environment.

  1. Make sure you are using python=3.6 through Conda.
  2. Manually install faiss with conda:
conda install faiss-cpu=1.5.2 -c pytorch
  3. Before installing with pip, make sure that you have installed DrQA. See the DrQA repository for instructions.
  4. Then install both requirements files:
pip install -r requirements.txt
pip install -r open/requirements.txt

Note that the pip installation will fail if faiss and DrQA are not already installed.
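Because pip fails without faiss and DrQA, a small check like the following can confirm they are importable from the active interpreter before you run pip. It is a sketch that assumes DrQA is importable as `drqa` (the package name used by the DrQA repository); it checks only the Python version pinned above, not package versions.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the prerequisites above are visible to this interpreter."""
    return {
        # The instructions pin Python 3.6.
        "python_3_6": sys.version_info[:2] == (3, 6),
        # faiss must come from conda; find_spec only locates it, it does not import it.
        "faiss": importlib.util.find_spec("faiss") is not None,
        # Assumption: DrQA is installed as the top-level package `drqa`.
        "drqa": importlib.util.find_spec("drqa") is not None,
    }


if __name__ == "__main__":
    missing = [name for name, ok in check_environment().items() if not ok]
    if missing:
        sys.exit("Missing or mismatched: " + ", ".join(missing))
    print("Environment looks ready for the pip install step.")
```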

C. Download

Model and dump files are currently provided through Google Cloud Storage in the bucket denspi, so first make sure that you have installed gsutil. You will then need to download four directories.

  1. Create $ROOT_DIR and cd to it:
mkdir -p $ROOT_DIR && cd $ROOT_DIR
  2. Download the model files.
gsutil cp -r gs://denspi/v1-0/model .
  3. Download the BERT-related files.
gsutil cp -r gs://denspi/v1-0/bert .
  4. Download the TF-IDF-related information from DrQA.
gsutil cp -r gs://denspi/v1-0/wikipedia .
  5. Download the entire phrase index dump. Warning: this will take up 1.5 TB!
gsutil cp -r gs://denspi/v1-0/dump .

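You can also download all four at once. Copying the contents of v1-0 (rather than the v1-0 directory itself) keeps model, bert, wikipedia, and dump directly under $ROOT_DIR, where the commands below expect them:

gsutil cp -r "gs://denspi/v1-0/*" $ROOT_DIR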
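The run commands expect `model`, `bert`, `wikipedia`, and `dump` directly under `$ROOT_DIR`. The sketch below verifies that layout before you start serving. The hint about a nested `v1-0` directory is a heuristic based on how `gsutil cp -r` usually names a copied prefix when the destination already exists.

```python
import os
import sys

# The four directories downloaded from gs://denspi/v1-0 in the steps above.
EXPECTED_DIRS = ("model", "bert", "wikipedia", "dump")


def missing_dirs(root_dir):
    """Return the expected subdirectories that are not present under root_dir."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root_dir, d))]


if __name__ == "__main__":
    root = os.environ.get("ROOT_DIR")
    if not root:
        sys.exit("Set ROOT_DIR first.")
    missing = missing_dirs(root)
    nested = os.path.join(root, "v1-0")
    # Heuristic: a whole-prefix copy may have landed one level too deep.
    if missing and os.path.isdir(nested):
        print("Found " + nested + "; the files may be nested one level too deep.")
    print("All four directories present." if not missing else "Missing: " + ", ".join(missing))
```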

Run Demo

Serve the API on port $API_PORT:

python run_piqa.py --do_serve --load_dir $ROOT_DIR/model --metadata_dir $ROOT_DIR/bert --do_load --parallel --port $API_PORT

This lets you send GET requests to $API_PORT to obtain the embedding of a question in JSON (list) format.
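A minimal standard-library client for this API is sketched below. The route `/api` and the parameter name `query` are assumptions, since this README does not document them; verify both against `run_piqa.py` before relying on them. The response is returned as the decoded JSON list without assuming its exact shape.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: route and parameter name are not documented here; check run_piqa.py.
API_ROUTE = "/api"
QUERY_PARAM = "query"


def build_url(question, port, host="localhost"):
    """Build the GET URL for a question; `port` stands in for $API_PORT."""
    qs = urllib.parse.urlencode({QUERY_PARAM: question})
    return "http://{}:{}{}?{}".format(host, port, API_ROUTE, qs)


def parse_embedding(body):
    """Decode the server's JSON body and check that it is a list."""
    vec = json.loads(body)
    if not isinstance(vec, list):
        raise ValueError("expected a JSON list, got " + type(vec).__name__)
    return vec


def embed(question, port, host="localhost", timeout=30):
    """Fetch the question embedding from a running API server."""
    with urllib.request.urlopen(build_url(question, port, host), timeout=timeout) as resp:
        return parse_embedding(resp.read().decode("utf-8"))
```

For example, `embed("Who wrote Hamlet?", int(os.environ["API_PORT"]))` should return the embedding once the server is up, provided the assumed route is correct.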

Serve the demo on $DEMO_PORT:

cd open/
python run_demo.py $ROOT_DIR/dump $ROOT_DIR/wikipedia --api_port $API_PORT --port $DEMO_PORT

The demo will be served within about a minute.
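The two serving steps can be wrapped in a single launcher, sketched below under the assumption that it is run from the repository root. It starts the API server with the exact flags above, waits until $API_PORT accepts TCP connections (an open socket does not guarantee the model has finished loading), then starts the demo from `open/` and stops the API when the demo exits. `sys.executable` replaces `python` so both processes use the current Conda environment, and `$ROOT_DIR` is made absolute because the demo runs from a different working directory.

```python
import os
import socket
import subprocess
import sys
import time


def build_commands(root_dir, api_port, demo_port):
    """Return the README's two serving commands as argument lists."""
    api_cmd = [
        sys.executable, "run_piqa.py", "--do_serve",
        "--load_dir", os.path.join(root_dir, "model"),
        "--metadata_dir", os.path.join(root_dir, "bert"),
        "--do_load", "--parallel", "--port", str(api_port),
    ]
    demo_cmd = [
        sys.executable, "run_demo.py",
        os.path.join(root_dir, "dump"), os.path.join(root_dir, "wikipedia"),
        "--api_port", str(api_port), "--port", str(demo_port),
    ]
    return api_cmd, demo_cmd


def wait_for_port(port, host="localhost", timeout=600.0):
    """Poll until host:port accepts a TCP connection; False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)
    return False


if __name__ == "__main__":
    root = os.path.abspath(os.environ["ROOT_DIR"])
    api_port, demo_port = int(os.environ["API_PORT"]), int(os.environ["DEMO_PORT"])
    api_cmd, demo_cmd = build_commands(root, api_port, demo_port)
    api = subprocess.Popen(api_cmd)
    if not wait_for_port(api_port):
        api.terminate()
        raise SystemExit("API server did not come up on port " + str(api_port))
    # Equivalent to `cd open/` before running the demo.
    demo = subprocess.Popen(demo_cmd, cwd="open")
    try:
        demo.wait()
    finally:
        api.terminate()
```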

Acknowledgment

Our code makes heavy use of faiss, DrQA, and BERT (in particular, Hugging Face's PyTorch implementation). We thank them for open-sourcing these projects!

Contributors: seominjoon, jhyuklee
