
brightmart / nlu_sim

Stars: 296 · Watchers: 17 · Forks: 89 · Size: 16.75 MB

All kinds of baseline models for sentence similarity (semantic similarity models for sentence pairs).

Languages: Python 99.97%, Shell 0.03%
Topics: question-answering sentence-similarity nlu word2vec atec qa questions-and-answers similarity-measurement semantic-similarity

nlu_sim's People

Contributors

brightmart


nlu_sim's Issues

Enhancement - vocab mashup

I got this repo working (it's a few years old). It can take a sentence and swap out vocabulary based on each word's part of speech.
https://github.com/johndpope/vocab-mashup

This functionality could be useful in this repo, since it could generate similar sentences. Or is this already possible?
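A minimal sketch of the idea, assuming the sentence has already been POS-tagged and a substitution table is available. The toy tagged sentence, the substitution table, and the `pos_substitutions` function are hypothetical; this is not how vocab-mashup itself is implemented. Keying substitutions on (word, tag) pairs means a word is only swapped when it appears with the expected part of speech.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag) pairs

def pos_substitutions(tagged: Tagged,
                      substitutes: Dict[Tuple[str, str], List[str]]) -> Iterator[str]:
    """Yield every sentence obtained by swapping words for same-POS alternatives.

    The original sentence is yielded first, because each slot lists the
    original word before its substitutes.
    """
    slots = [[word] + substitutes.get((word, tag), []) for word, tag in tagged]
    for combo in itertools.product(*slots):
        yield " ".join(combo)

# Hypothetical toy input: real tags would come from a POS tagger and the
# substitutes from a thesaurus or nearest neighbours in an embedding space.
tagged = [("how", "ADV"), ("do", "VERB"), ("i", "PRON"),
          ("change", "VERB"), ("my", "PRON"), ("password", "NOUN")]
substitutes = {("change", "VERB"): ["reset", "update"],
               ("password", "NOUN"): ["passcode"]}

variants = list(pos_substitutions(tagged, substitutes))
for sentence in variants:
    print(sentence)
```

The number of variants grows multiplicatively with the number of substitutable words, so a real generator would likely sample from the product rather than enumerate all of it.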

assign_pretrained_word_embedding

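In word2id, PAD_ID = 0 and UNK_ID = 1, so assigning the pretrained embeddings should start at index 2. Currently only PAD_ID is set to zeros; UNK_ID is not handled, and the assignment starts from index 1.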
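The fix described in this issue can be sketched as follows. This is a minimal pure-Python sketch, not the repository's `assign_pretrained_word_embedding`; the function name `build_embedding_matrix`, the vocabulary-as-list layout (`vocab[i]` is the word with id `i`), and the small random initialisation for UNK and out-of-vocabulary words are assumptions. PAD keeps an all-zero row, UNK gets its own vector, and copying pretrained vectors starts at index 2 so neither reserved row is touched by the word loop.

```python
import random
from typing import Dict, List, Sequence

PAD_ID = 0         # reserved padding id (from the issue)
UNK_ID = 1         # reserved unknown-word id (from the issue)
FIRST_WORD_ID = 2  # first id available for real vocabulary words

def build_embedding_matrix(vocab: Sequence[str],
                           pretrained: Dict[str, Sequence[float]],
                           dim: int,
                           seed: int = 0) -> List[List[float]]:
    """Return one embedding row per id; vocab[i] is the word with id i (hypothetical layout)."""
    rng = random.Random(seed)
    bound = 1.0 / dim ** 0.5  # assumed scale for randomly initialised rows

    def random_row() -> List[float]:
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    matrix: List[List[float]] = [[] for _ in vocab]
    matrix[PAD_ID] = [0.0] * dim   # padding contributes nothing
    matrix[UNK_ID] = random_row()  # UNK gets its own starting vector
    # Start at index 2, not 1, so the UNK row set above is not overwritten.
    for idx in range(FIRST_WORD_ID, len(vocab)):
        vector = pretrained.get(vocab[idx])
        matrix[idx] = list(vector) if vector is not None else random_row()
    return matrix

vocab = ["<PAD>", "<UNK>", "hello", "world", "rare"]
pretrained = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}
matrix = build_embedding_matrix(vocab, pretrained, dim=2)
```

In practice the rows would be stacked into a NumPy array or framework tensor before initialising an embedding layer.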

Some questions

Thanks for sharing this great work!
How much do these features improve your F1 score?

1) n-gram similarity (BLEU score for n = 1, 2, 3, ...)

2) Length of each question and the difference in length

3) Number of shared words and number of unique words

4) Whether question 1 and question 2 start with how/why/when (wei shen me, zenme, ruhe, weihe)

5) Edit distance

6) Cosine similarity between bag-of-words sentence representations (combining TF-IDF with word embeddings from word2vec or fastText)

7) manhattan_distance, canberra_distance, minkowski_distance, euclidean_distance
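Features 1 to 5 can be computed with plain Python. This is a minimal sketch under several assumptions: questions are whitespace-tokenized (Chinese text would first need a word segmenter), feature 1 uses BLEU's clipped n-gram precision for each n rather than the full BLEU score with its brevity penalty, "unique words" is read as words that appear in only one of the two questions, and English how/why/when stand in for the Chinese question words. All function and feature names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

QUESTION_WORDS = ("how", "why", "when")  # hypothetical English stand-ins

def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams; empty if the sequence is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(a: Sequence[str], b: Sequence[str], n: int) -> float:
    """Clipped n-gram precision of a against b (the per-n term inside BLEU)."""
    grams_a, grams_b = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    total = sum(grams_a.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, grams_b[g]) for g, count in grams_a.items())
    return matched / total

def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance over any two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

def lexical_features(q1: str, q2: str, max_n: int = 3) -> Dict[str, float]:
    t1, t2 = q1.split(), q2.split()
    feats: Dict[str, float] = {}
    for n in range(1, max_n + 1):                      # feature 1
        feats[f"ngram_overlap_{n}"] = ngram_overlap(t1, t2, n)
    feats["len_q1"], feats["len_q2"] = len(t1), len(t2)  # feature 2
    feats["len_diff"] = abs(len(t1) - len(t2))
    s1, s2 = set(t1), set(t2)                          # feature 3
    feats["shared_words"] = len(s1 & s2)
    feats["unique_words"] = len(s1 ^ s2)
    for w in QUESTION_WORDS:                           # feature 4
        feats[f"q1_starts_{w}"] = int(bool(t1) and t1[0] == w)
        feats[f"q2_starts_{w}"] = int(bool(t2) and t2[0] == w)
    feats["edit_distance"] = edit_distance(t1, t2)     # feature 5
    return feats

features = lexical_features("how do i reset my password",
                            "how can i reset the password")
print(features)
```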
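Features 6 and 7 operate on fixed-length sentence vectors. The sketch below, with hypothetical names and toy data, builds each sentence vector as a TF-IDF-weighted average of word embeddings. The embeddings are assumed to be a plain dict (for example, loaded from word2vec or fastText output), and IDF uses the smoothed form log((1 + N) / (1 + df)) + 1. Two vectors are then compared with cosine similarity and the four listed distances; Manhattan and Euclidean are the p = 1 and p = 2 cases of Minkowski distance.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]

def idf_weights(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    n_docs = len(corpus)
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}

def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, Vector],
                    idf: Dict[str, float], dim: int) -> Vector:
    """TF-IDF-weighted average of word vectors; out-of-vocabulary words are skipped."""
    vec, total = [0.0] * dim, 0.0
    for w, count in Counter(tokens).items():
        if w not in embeddings:
            continue
        weight = count * idf.get(w, 1.0)
        for k, x in enumerate(embeddings[w]):
            vec[k] += weight * x
        total += weight
    return [x / total for x in vec] if total else vec

def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def minkowski(u: Vector, v: Vector, p: float) -> float:
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def manhattan(u: Vector, v: Vector) -> float:
    return minkowski(u, v, 1)

def euclidean(u: Vector, v: Vector) -> float:
    return minkowski(u, v, 2)

def canberra(u: Vector, v: Vector) -> float:
    # Coordinates where both values are 0 contribute 0 (avoids division by zero).
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)

# Hypothetical toy corpus and 2-d embeddings, for illustration only.
corpus = [["reset", "password"], ["change", "password"], ["open", "account"]]
embeddings = {"reset": [0.9, 0.1], "change": [0.8, 0.2], "password": [0.1, 0.9],
              "open": [0.5, 0.5], "account": [0.2, 0.7]}
idf = idf_weights(corpus)
v1 = sentence_vector(["reset", "password"], embeddings, idf, dim=2)
v2 = sentence_vector(["change", "password"], embeddings, idf, dim=2)
print(cosine(v1, v2), manhattan(v1, v2), euclidean(v1, v2),
      minkowski(v1, v2, 3), canberra(v1, v2))
```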
