
This project forked from tuzhucheng/sentence-similarity


PyTorch implementations of various deep learning models for paraphrase detection, semantic similarity, and textual entailment

License: MIT License

Shell 0.36% Python 99.64%


sentence-similarity

I plan to implement sentence similarity models from the literature in order to reproduce and study them. These models have a wide variety of applications, including:

  • Paraphrase Detection: Given two sentences, are the sentences paraphrases of each other?
  • Semantic Textual Similarity: Given two sentences, how close are they in terms of semantic equivalence?
  • Natural Language Inference / Textual Entailment: Can one sentence be inferred from another sentence (the premise)?
  • Answer Selection: Given question-answer pairs, rank candidate answers based on relevance to the question.

Setup

Install packages in requirements.txt.

The ignite library, currently in alpha, needs to be installed from source. See https://github.com/pytorch/ignite.

Download the spaCy English model:

python -m spacy download en

Compile trec_eval for computing MAP/MRR metrics for the WikiQA dataset:

cd metrics
./get_trec_eval.sh
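
For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how MAP (mean average precision) and MRR (mean reciprocal rank) can be computed from ranked candidate answers. It is not the trec_eval implementation and is not taken from this repository. The function names, the input layout (question id mapped to `(score, label)` pairs), and the choice to skip questions without a relevant answer are assumptions made for illustration, so the numbers may differ slightly from trec_eval, for example on tied scores.

```python
from typing import Dict, List, Tuple


def average_precision(labels: List[int]) -> float:
    """Average precision for one question; labels are in ranked order (1 = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def map_mrr(candidates: Dict[str, List[Tuple[float, int]]]) -> Tuple[float, float]:
    """Compute (MAP, MRR) over questions.

    `candidates` maps a question id to (score, label) pairs (hypothetical layout).
    """
    aps, rrs = [], []
    for pairs in candidates.values():
        # Rank candidate answers by descending model score.
        ranked = [label for _, label in sorted(pairs, key=lambda p: p[0], reverse=True)]
        if not any(ranked):
            continue  # assumption: questions with no relevant answer are skipped
        aps.append(average_precision(ranked))
        rrs.append(reciprocal_rank(ranked))
    n = len(aps)
    return (sum(aps) / n, sum(rrs) / n) if n else (0.0, 0.0)


if __name__ == "__main__":
    example = {
        "q1": [(0.9, 0), (0.8, 1), (0.1, 1)],
        "q2": [(0.5, 1), (0.4, 0)],
    }
    print(map_mrr(example))
```
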

Running

Baseline

SICK

# Unsupervised
$ python main.py --model sif --dataset sick --unsupervised
Test Results - Epoch: 0 pearson: 0.7199 spearman: 0.5956
# Supervised
$ python main.py --model sif --dataset sick
Test Results - Epoch: 15 pearson: 0.7763 spearman: 0.6637
$ python main.py --model mpcnn --dataset sick
$ python main.py --model bimpm --dataset sick
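
The pearson and spearman values in the SICK results are correlation coefficients, presumably computed between predicted and gold similarity scores. The sketch below shows one way to compute both in pure Python. It is illustrative only and not this repository's evaluation code; the function names are hypothetical, and ties are handled by average ranks.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks where tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [1.2, 3.4, 2.2, 4.8]  # made-up scores for illustration
    gold = [1.0, 3.0, 2.5, 5.0]
    print(pearson(predicted, gold), spearman(predicted, gold))
```
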

WikiQA

$ python main.py --model sif --dataset wikiqa --epochs 15 --lr 0.001
Test Results - Epoch: 15 map: 0.6295 mrr: 0.6404
$ python main.py --model mpcnn --dataset wikiqa
$ python main.py --model bimpm --dataset wikiqa

Attribution

The English Wikipedia token frequency dataset for estimating p(w) in the baseline model is obtained from the official SIF implementation: https://github.com/PrincetonML/SIF.
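
To illustrate where p(w) enters the baseline, here is a minimal sketch of the weighting step of SIF (smooth inverse frequency): each word vector is weighted by a / (a + p(w)) and the weighted vectors are averaged. This is not the code from this repository or from the official SIF implementation. The function names, the dictionary-based inputs, the default `a = 1e-3`, and the treatment of words missing from the frequency table (p(w) = 0, so weight 1) are assumptions for illustration. The SIF method also removes a common component from the sentence vectors afterwards; that step is omitted here.

```python
from typing import Dict, List


def sif_weight(word: str, word_prob: Dict[str, float], a: float = 1e-3) -> float:
    """SIF weight a / (a + p(w)); words missing from word_prob get p(w) = 0 (assumption)."""
    return a / (a + word_prob.get(word, 0.0))


def sif_embedding(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    word_prob: Dict[str, float],
    a: float = 1e-3,
) -> List[float]:
    """Weighted average of word vectors; tokens without a vector are ignored."""
    known = [t for t in tokens if t in vectors]
    if not known:
        return []
    dim = len(vectors[known[0]])
    total = [0.0] * dim
    for token in known:
        weight = sif_weight(token, word_prob, a)
        for i, value in enumerate(vectors[token]):
            total[i] += weight * value
    return [value / len(known) for value in total]


if __name__ == "__main__":
    # Toy vectors and probabilities, made up for illustration.
    vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
    word_prob = {"the": 0.05, "cat": 0.0001}
    print(sif_embedding(["the", "cat"], vectors, word_prob))
```

Frequent words such as "the" receive a small weight, so the sentence vector is dominated by rarer, more informative words.
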

Contributors

tuzhucheng
