brightmart / nlu_sim
All kinds of baseline models for sentence similarity (sentence-pair semantic similarity models)
pretrained word embedding
I got this repo working, although it's a few years old. It can take a sentence and swap out vocabulary based on each word's part of speech.
https://github.com/johndpope/vocab-mashup
It seems like this functionality could be useful in this repo, in that similar sentences could be generated. Or perhaps this is already possible?
In word2id, PAD_ID = 0 and UNK_ID = 1, so the embedding assignment should start at index 2. Currently, zeros are set only for PAD_ID, UNK_ID is not set, and the starting index is 1.
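To make the indexing concrete, here is a minimal sketch of the layout I would expect. It is not the repo's actual code: `build_embedding_matrix`, the `<PAD>`/`<UNK>` token strings, and the small random initialization for the UNK row are all assumptions made for illustration.

```python
import random

PAD_ID = 0         # padding token, kept as an all-zero vector
UNK_ID = 1         # unknown token, gets its own row
FIRST_WORD_ID = 2  # first index available for real vocabulary words


def build_embedding_matrix(pretrained, embed_size, seed=0):
    """Hypothetical helper: return (word2id, matrix) with PAD at 0, UNK at 1,
    and pretrained words assigned from index 2 onward."""
    rng = random.Random(seed)
    # Token strings below are assumed names, not taken from the repo.
    word2id = {"<PAD>": PAD_ID, "<UNK>": UNK_ID}
    matrix = [
        [0.0] * embed_size,                                   # PAD_ID row: zeros
        [rng.uniform(-0.1, 0.1) for _ in range(embed_size)],  # UNK_ID row: assumed random init
    ]
    # Sort words so the id assignment is deterministic.
    for offset, (word, vector) in enumerate(sorted(pretrained.items())):
        word2id[word] = FIRST_WORD_ID + offset
        matrix.append(list(vector))
    return word2id, matrix
```

With this layout, row `word2id[w]` of the matrix always holds the vector for `w`, and no pretrained vector is written into the PAD or UNK rows.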
Thanks for sharing this good work!
How much will these features improve your F1 score?
1) n-gram similarity (BLEU score for n-gram = 1, 2, 3, ...)
2) length of each question, and the difference in length
3) how many words are shared, and how many words are unique
4) whether questions 1 and 2 start with how/why/when (wei shen me, zenme, ruhe, weihe)
5) edit distance
6) cosine similarity using a bag-of-words sentence representation (combining TF-IDF with word embeddings from word2vec or fastText)
7) manhattan_distance, canberra_distance, minkowski_distance, euclidean_distance
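For reference, the features listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, inputs are assumed to be pre-tokenized lists, the n-gram feature is a clipped n-gram precision (the core of BLEU, without the brevity penalty), and plain word counts stand in for the TF-IDF and word-embedding vectors.

```python
import math
from collections import Counter

QUESTION_WORDS = ("how", "why", "when")  # assumed set; Chinese equivalents could be added


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def ngram_overlap(tokens1, tokens2, n):
    """Clipped n-gram precision of tokens1 against tokens2."""
    grams1 = Counter(tuple(tokens1[i:i + n]) for i in range(len(tokens1) - n + 1))
    grams2 = Counter(tuple(tokens2[i:i + n]) for i in range(len(tokens2) - n + 1))
    total = sum(grams1.values())
    if total == 0:
        return 0.0
    return sum(min(c, grams2[g]) for g, c in grams1.items()) / total


def pair_features(tokens1, tokens2, p=3):
    """Hypothetical feature extractor for one question pair."""
    # Count vectors over the joint vocabulary (stand-in for TF-IDF or embeddings).
    vocab = sorted(set(tokens1) | set(tokens2))
    c1, c2 = Counter(tokens1), Counter(tokens2)
    u = [c1[w] for w in vocab]
    v = [c2[w] for w in vocab]
    diffs = [abs(a - b) for a, b in zip(u, v)]
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return {
        "len1": len(tokens1),
        "len2": len(tokens2),
        "len_diff": abs(len(tokens1) - len(tokens2)),
        "common_words": len(set(tokens1) & set(tokens2)),
        "unique_words": len(set(tokens1) ^ set(tokens2)),
        "both_start_question": bool(tokens1) and bool(tokens2)
            and tokens1[0] in QUESTION_WORDS and tokens2[0] in QUESTION_WORDS,
        "edit_distance": edit_distance(tokens1, tokens2),
        "cosine": sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0,
        "manhattan": sum(diffs),
        "euclidean": math.sqrt(sum(d * d for d in diffs)),
        "minkowski": sum(d ** p for d in diffs) ** (1.0 / p),
        # Canberra skips coordinates where both entries are zero.
        "canberra": sum(d / (a + b) for d, a, b in zip(diffs, u, v) if a + b > 0),
    }
```

The resulting dictionary could be concatenated with the model's own sentence representation or fed to a separate classifier; whether it improves F1 would need to be measured on the actual data.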
I can't find the pretrained word embedding file at data\asttext_fin_model_50.vec.