bert-based-faqir

An FAQ retrieval system that considers both the similarity between a user’s query and a question and the relevance between the query and an answer. Details are in our paper (arXiv).

Requirements

tensorflow >= 1.11.0

Usage

Data

Download the BERT repository, the BERT Japanese pre-trained model, the QA pairs from the Amagasaki City FAQ, the test set (localgovFAQ), and sample prediction results.

./download.sh

The resulting directory structure is shown below.

data
├── bert : a fork of the original BERT repository *1
├── Japanese_L-12_H-768_A-12_E-30_BPE : BERT Japanese pre-trained model 
└── localgovfaq *2
    ├── qas : QA pairs in Amagasaki City FAQ
    ├── testset_segmentation.txt : the testset for evaluation
    └── samples : the retrieval results by TSUBAKI, BERT, and hybrid model

*1 We modified the original BERT code so that it can handle Japanese sentences and load our FAQ retrieval format. See ku-nlp/bert for the differences from the original code.

*2 Details about localgovFAQ are in localgovFAQ.md.

BERT application for FAQ retrieval

Generate the dataset (train/test), fine-tune the model, and evaluate it.

make -f Makefile.generate_dataset OUTPUT_DIR=/path/to/data_dir
make -f Makefile.run_classifier BERT_DATA_DIR=/path/to/data_dir \
    OUTPUT_DIR=/path/to/somewhere \
    JAPANESE=true

An example result is shown below.

Hit@1 : 381, 3: 524, 5 : 578, all : 784
SR@1 : 0.486, 3: 0.668, 5 : 0.737
P@1 : 0.486, 3: 0.349, 5 : 0.286
MAP : 0.550, MRR : 0.596, MDCG : 0.524
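
The SR@k line can be cross-checked against the Hit@k line. The sketch below assumes that SR stands for success rate and that it equals Hit@k divided by the total number of test queries ("all"); the numbers in the example output are consistent with that reading. The function name `success_rate` is hypothetical and is not taken from scripts/calculate_score.py.

```python
def success_rate(hits: int, total: int) -> float:
    """Fraction of queries with at least one correct QA pair in the top k.

    Assumption: SR@k = Hit@k / all, which matches the example output above.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Hit@k counts and the query total copied from the example output.
hits_at_k = {1: 381, 3: 524, 5: 578}
total_queries = 784

sr_at_k = {k: round(success_rate(h, total_queries), 3) for k, h in hits_at_k.items()}
print(sr_at_k)  # {1: 0.486, 3: 0.668, 5: 0.737}
```

The rounded values agree with the reported SR@1, SR@3, and SR@5.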

TSUBAKI + BERT

TSUBAKI (paper, GitHub) is an open search engine based on BM25. Using TSUBAKI and BERT together yields higher scores than BERT alone.

The hybrid model can be evaluated with the following commands.

python scripts/merge_tsubaki_bert_results.py --bert data/localgovfaq/samples/bert.txt \
    --tsubaki data/localgovfaq/samples/tsubaki.txt \
    --threshold 0.3 \
    --tsubaki_ratio 10 > /path/to/resultfile.txt
python scripts/calculate_score.py --testset data/localgovfaq/testset_segmentation.txt \
    --target_qs data/localgovfaq/qas/questions_in_Amagasaki.txt \
    --target_as data/localgovfaq/qas/answers_in_Amagasaki.txt \
    --search_result /path/to/resultfile.txt | tail -n 4

These commands use the results pre-computed by TSUBAKI and BERT.

An example result is shown below.

Hit@1 : 498, 3: 611, 5 : 661, all : 784
SR@1 : 0.635, 3: 0.779, 5 : 0.843
P@1 : 0.635, 3: 0.446, 5 : 0.360
MAP : 0.660, MRR : 0.720, MDCG : 0.625
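
The README does not describe how the two result lists are combined. The following is only an illustrative sketch of one common way to merge a lexical (BM25-style) ranking with a neural one: candidates whose lexical score clears a gate are ranked first, and the rest are ordered by a weighted sum. The function `combine_rankings` and its parameters `gate` and `lexical_weight` are hypothetical; they are not claimed to match the behavior of scripts/merge_tsubaki_bert_results.py or the meaning of its `--threshold` and `--tsubaki_ratio` options.

```python
from typing import Dict, List


def combine_rankings(
    lexical: Dict[str, float],
    neural: Dict[str, float],
    gate: float,
    lexical_weight: float,
) -> List[str]:
    """Return candidate IDs ordered best-first (illustrative scheme only)."""
    candidates = set(lexical) | set(neural)

    # Candidates with a confident lexical match go first, by lexical score.
    confident = sorted(
        (c for c in candidates if lexical.get(c, 0.0) >= gate),
        key=lambda c: lexical[c],
        reverse=True,
    )

    # Remaining candidates are ordered by a weighted sum of both scores.
    rest = sorted(
        (c for c in candidates if lexical.get(c, 0.0) < gate),
        key=lambda c: lexical_weight * lexical.get(c, 0.0) + neural.get(c, 0.0),
        reverse=True,
    )
    return confident + rest


# Made-up scores for four hypothetical QA pairs.
lexical_scores = {"qa1": 0.5, "qa2": 0.1, "qa3": 0.05}
neural_scores = {"qa2": 0.2, "qa3": 0.9, "qa4": 0.95}

ranking = combine_rankings(lexical_scores, neural_scores, gate=0.3, lexical_weight=10.0)
print(ranking)  # ['qa1', 'qa3', 'qa2', 'qa4']
```

In this toy example, qa1 is placed first because its lexical score exceeds the gate, while the others are ordered by the weighted sum (1.4, 1.2, and 0.95).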

Reference

Wataru Sakata (LINE Corporation), Tomohide Shibata (Kyoto University), Ribeka Tanaka (Kyoto University) and Sadao Kurohashi (Kyoto University):
FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance,
Proceedings of SIGIR 2019: 42nd Intl ACM SIGIR Conference on Research and Development in Information Retrieval, (2019.7). arXiv
