brightmart / nlu_sim

All kinds of baseline models for sentence similarity (sentence-pair semantic similarity models)

Python 99.97% Shell 0.03%

question-answering sentence-similarity nlu word2vec atec qa questions-and-answers similarity-measurement semantic-similarity

nlu_sim's People

Contributors

Stargazers

Watchers

Forkers

zdstandup lrisliu 1oscar youed michael-wzhu dfenglei 791879258 amandalmia14 lwhzju sduchh suntofly maybefeicun jokerdu huibibi guhay zsunpku yorick76ee zldeng havesupper 52nlp newenglandml tianke0711 sunnymarkliu ben2017a yorkchu1995 s4sarath tsingcoo ramananm jkhlot ringwraith ramonyeung jobqiu inistlwq johndpope jllan ghiblifield izarek hans208 ywl0911 smallfresh lrpopeyou cecgreg shubhampachori12110095 delaiahz drjzhou mzdu wsgan001 redinton wurentidai topdreamer zhongyunuestc padadox stevenlol liutong-cnu zzdgit asirem16 wushicanasl meccy yuanjie-ai jennfer0808 fan2048 xingchengxu lc222 chunlinx yangshuodelove saurabhkulkarni77 navpreetsamra andrew05200 zddllc lvcheer decmaker strategist922 bytearchive anoop2019 frostjsy tricoffee luweishuang cchengz sumepr lwj-code xyyhcl jinsongpan bobwang0813 novellll hell-to-heaven hunterkruger shaowen-open jdadong yueyedeai

nlu_sim's Issues

The pretrained word embedding link is dead

pretrained word embedding

Enhancement - vocab mashup

I got this repo working, although it's a few years old. It can take a sentence and swap out vocabulary based on each word's part of speech.
https://github.com/johndpope/vocab-mashup

It seems like this functionality could be useful in this repo, in that similar sentences could be generated. Or perhaps this is already possible?

assign_pretrained_word_embedding

In word2id, PAD_ID = 0 and UNK_ID = 1, so the embedding assignment should start at index 2. Currently only PAD_ID is set to zeros, UNK_ID is not set at all, and the start index is 1.
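The indexing described in this issue can be illustrated with a minimal sketch. This is not the repository's `assign_pretrained_word_embedding` code: the function name `build_embedding_matrix`, the `_PAD`/`_UNK` token names, the initialization range, and the choice to leave UNK randomly initialized are all assumptions made for illustration. Only `PAD_ID = 0` and `UNK_ID = 1` come from the issue.

```python
import random

PAD_ID = 0  # reserved for padding (from the issue)
UNK_ID = 1  # reserved for unknown words (from the issue)


def build_embedding_matrix(word2id, pretrained, dim, seed=0):
    """Hypothetical helper: row i of the result is the vector for word id i.

    Assumes word2id already contains the PAD and UNK entries, so the
    matrix has len(word2id) rows.
    """
    rng = random.Random(seed)
    bound = (3.0 / dim) ** 0.5  # assumed init range, not taken from the repo
    # Every row starts as a small random vector.
    matrix = [[rng.uniform(-bound, bound) for _ in range(dim)]
              for _ in range(len(word2id))]
    # PAD is all zeros; UNK keeps its random vector (one possible choice).
    matrix[PAD_ID] = [0.0] * dim
    for word, idx in word2id.items():
        if idx < 2:
            continue  # never overwrite the two reserved rows
        vector = pretrained.get(word)
        if vector is not None:
            matrix[idx] = list(vector)
    return matrix


# Toy data, invented for the example.
word2id = {"_PAD": 0, "_UNK": 1, "花呗": 2, "借呗": 3}
pretrained = {"花呗": [1.0, 2.0], "_PAD": [9.0, 9.0]}
matrix = build_embedding_matrix(word2id, pretrained, dim=2)
print(matrix[PAD_ID], matrix[2])
```

Skipping ids below 2 means a pretrained file that happens to contain a `_PAD` or `_UNK` entry cannot overwrite the reserved rows.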

some question

Thanks for sharing this good work!
How much do these features improve your F1 score?

1) n-gram similarity (BLEU score for n-gram = 1, 2, 3, ...)

2) length of each question, and the difference in length

3) how many words are the same, how many words are unique

4) whether questions 1 and 2 start with how/why/when (wei shen me, zenme, ruhe, weihe)

5) edit distance

6) cosine similarity using bag of words for sentence representation (combining TF-IDF with word embeddings from word2vec, fasttext)

7) manhattan_distance, canberra_distance, minkowski_distance, euclidean_distance
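To make the feature list above concrete, here is a rough, standard-library-only sketch of how such pairwise features could be computed on tokenized questions. It is not the repository's implementation. All function names are hypothetical, the n-gram feature is plain clipped n-gram precision (the building block of BLEU, without the brevity penalty), the question-word list uses English stand-ins, and the vectors are raw word counts rather than the TF-IDF-weighted embeddings mentioned in item 6.

```python
import math
from collections import Counter

QUESTION_WORDS = ("how", "why", "when")  # assumed stand-ins for the pinyin forms


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(count, ref[gram]) for gram, count in cand.items()) / total


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def bow_vectors(t1, t2):
    """Word-count vectors over the joint vocabulary of both questions."""
    vocab = sorted(set(t1) | set(t2))
    c1, c2 = Counter(t1), Counter(t2)
    return [float(c1[w]) for w in vocab], [float(c2[w]) for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def minkowski(u, v, p):
    """p=1 gives Manhattan distance, p=2 gives Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def canberra(u, v):
    # Terms where both coordinates are zero are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


def pair_features(t1, t2):
    """Feature dict for one question pair, given as two token lists."""
    u, v = bow_vectors(t1, t2)
    s1, s2 = set(t1), set(t2)
    return {
        "unigram_precision": ngram_precision(t1, t2, 1),
        "bigram_precision": ngram_precision(t1, t2, 2),
        "len_1": len(t1),
        "len_2": len(t2),
        "len_diff": abs(len(t1) - len(t2)),
        "shared_words": len(s1 & s2),
        "unique_words": len(s1 ^ s2),
        "q1_starts_with_question_word": bool(t1) and t1[0] in QUESTION_WORDS,
        "q2_starts_with_question_word": bool(t2) and t2[0] in QUESTION_WORDS,
        "edit_distance": edit_distance(t1, t2),
        "cosine": cosine(u, v),
        "manhattan": minkowski(u, v, 1),
        "euclidean": minkowski(u, v, 2),
        "minkowski_3": minkowski(u, v, 3),
        "canberra": canberra(u, v),
    }


features = pair_features("how do i repay".split(), "how can i repay early".split())
print(features)
```

For Chinese input the token lists would come from a word segmenter or from splitting into characters; the sketch only assumes that each question arrives as a list of tokens.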

What F1 score can the methods here reach in the Ant Financial competition?

https://dc.cloud.alipay.com/index#/topic/intro?id=3
Roughly where would that place in the rankings?

pretrained word embedding in data\asttext_fin_model_50.vec

I can't find the file referred to as "pretrained word embedding in data\asttext_fin_model_50.vec".
