Some questions

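Thanks for sharing! Don't "the translation" and "Chinese" here both refer to the Chinese version of the Quora data?
Also, did you try translating the Ant Financial dataset as well?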

Request for the dataset

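Specifically, the file referenced by embedding_path = 'CnCorpus-vectors-negative64.bin'.
Could you please find some time to upload the dataset? If it is too large, Baidu Cloud (Baidu Netdisk) would be fine.
Thanks!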

quora_test.csv file missing

Hello, the project is missing the file quora_test.csv. Could you upload it? Thanks.

Got 0 accuracy when running the project on a Colab GPU

Hello, I ran your project on Google Colab, but the reported accuracy was 0. Have you run into this? Thanks.
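The issue gives no further details, so the cause is unknown. One common way to get an accuracy of exactly 0 from a similarity model (offered only as a possibility to check, not a confirmed diagnosis for this project) is comparing raw float scores directly against 0/1 labels instead of thresholding them first. The scores, labels, and function names below are hypothetical:

```python
# Hypothetical scores and labels, for illustration only (not from the project).
scores = [0.91, 0.12, 0.67, 0.30]   # model outputs in (0, 1]
labels = [1, 0, 1, 1]               # 1 = duplicate pair, 0 = not duplicate


def naive_accuracy(scores, labels):
    # Compares raw float scores to integer labels: they are almost never equal.
    hits = sum(1 for s, y in zip(scores, labels) if s == y)
    return hits / len(labels)


def thresholded_accuracy(scores, labels, threshold=0.5):
    # Turns each score into a 0/1 prediction before comparing it to the label.
    hits = sum(1 for s, y in zip(scores, labels) if int(s > threshold) == y)
    return hits / len(labels)


print(naive_accuracy(scores, labels))        # 0.0
print(thresholded_accuracy(scores, labels))  # 0.75
```

If the project's own evaluation already thresholds the scores, the cause lies elsewhere (for example in how the test labels are loaded).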

A Keras Implementation of Attention_based Siamese Manhattan LSTM

Hey Junru

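My name is Haoran Sun (孙浩然). While looking up how to build a Siamese model, I came across your code on GitHub, which uses Keras. I basically copied your code and ran it locally, with a few differences.
Instead of an embedding layer, I feed the input directly into the LSTMs: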
activations = Bidirectional(LSTM(n_hidden, return_sequences=True), merge_mode='concat')(_input)
activations = Bidirectional(LSTM(n_hidden, return_sequences=True), merge_mode='concat')(activations)
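I'll explain why later. I then fed the input data directly as [[0.21, 0.2135, ..., 0.2354], [0.33, 0.26, ..., 0.25], [0.235, 0.235, ..., 0.235]], using GloVe 100d vectors. In short, I moved the embedding-layer computation outside the model and turned each word into a vector beforehand.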
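The `ManDist` layer passed through `custom_objects` below is not shown in the issue. Assuming it follows the usual Manhattan LSTM (MaLSTM) definition, the two branch encodings h1 and h2 are compared as exp(-||h1 - h2||_1), which gives a score in (0, 1]. A plain-Python sketch of just that scoring step, with toy vectors standing in for the BiLSTM outputs (the function name is hypothetical, not the repository's code):

```python
import math


def manhattan_similarity(h1, h2):
    """exp(-L1 distance): 1.0 for identical encodings, approaching 0 as they diverge."""
    if len(h1) != len(h2):
        raise ValueError("encodings must have the same length")
    l1 = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1)


# Toy encodings standing in for the final BiLSTM outputs of each branch.
left = [0.2, 0.5, -0.1]
right = [0.1, 0.4, 0.0]
print(manhattan_similarity(left, left))   # 1.0
print(manhattan_similarity(left, right))  # exp(-0.3) ≈ 0.741
```

Because the head only sees the L1 distance, it is the encoder that decides which sentences end up "close"; the head itself has no notion of meaning.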
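The validation accuracy came out at around 80%, about the same as your original code. I then saved the model, and since I had the GloVe 100d vectors, I wrote a few sentences of my own and converted them to the same input shape: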
model = load_model('my_10length_model.h5', custom_objects={'ManDist': ManDist})
texttarget = ['How do I make friends?']
text1 = ['How to make friends?']
text2 = ['Can you tell me how to play the piano?']
text3 = ['Can you tell me the truth of the computer games?']
texttarget = prepare_data(texttarget,MAX_SEQ_LENGTH,embeddings_index)
candidate_1 = prepare_data(text1,MAX_SEQ_LENGTH,embeddings_index)
candidate_2 = prepare_data(text2,MAX_SEQ_LENGTH,embeddings_index)
candidate_3 = prepare_data(text3,MAX_SEQ_LENGTH,embeddings_index)

print(texttarget)

result1 = model.predict([texttarget,candidate_1])
result2 = model.predict([texttarget,candidate_2])
result3 = model.predict([texttarget,candidate_3])
print(result1,result2,result3)
The results were:
[[0.5311898]] [[0.60761184]] [[0.42349315]]
Here, prepare_data converts any sentence, via GloVe, into a list of MAX_SEQ_LENGTH 100-dimensional vectors.
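The issue does not include prepare_data itself. A minimal sketch of what such a helper might look like, assuming `embeddings_index` is a dict mapping each word to its GloVe vector, a simple lowercase word tokenizer, zero vectors for unknown words, and zero padding at the end (all of these are assumptions, not the author's code):

```python
import re


def prepare_data(texts, max_seq_length, embeddings_index, dim=100):
    """Hypothetical sketch: map each sentence to max_seq_length word vectors.

    Unknown words and padding positions become zero vectors; longer
    sentences are truncated. Returns a nested list shaped
    (len(texts), max_seq_length, dim).
    """
    zero = [0.0] * dim
    batch = []
    for text in texts:
        # Lowercase and keep only runs of word characters (an assumed tokenizer).
        tokens = re.findall(r"\w+", text.lower())[:max_seq_length]
        vectors = [list(embeddings_index.get(tok, zero)) for tok in tokens]
        # Pad at the end with zero vectors up to max_seq_length.
        vectors += [list(zero) for _ in range(max_seq_length - len(vectors))]
        batch.append(vectors)
    return batch


# Toy 3-dimensional "embeddings" in place of GloVe 100d.
toy_index = {"how": [0.1, 0.2, 0.3], "friends": [0.4, 0.5, 0.6]}
out = prepare_data(["How to make friends?"], 5, toy_index, dim=3)
print(len(out), len(out[0]), len(out[0][0]))  # 1 5 3
```

The result would still need `numpy.asarray(...)` before `model.predict`. Tokenization, casing, and padding side should match whatever was used when building the training inputs; Keras's `pad_sequences`, for example, pads at the front by default.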
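From these results, the model considers the second candidate ("Can you tell me how to play the piano?", 0.608) the most similar to my target sentence, scoring it above the actual paraphrase "How to make friends?" (0.531). This further shows that the model does not really generalize and cannot truly compute the similarity of arbitrary sentences.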
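Ranking the three candidates by the scores reported in the issue makes the problem concrete: the paraphrase lands in second place, below the unrelated piano question. The sketch below only sorts the numbers copied from the output above; it does not call the model:

```python
# Scores copied from the issue's output: [[0.5311898]] [[0.60761184]] [[0.42349315]]
candidates = [
    "How to make friends?",
    "Can you tell me how to play the piano?",
    "Can you tell me the truth of the computer games?",
]
scores = [0.5311898, 0.60761184, 0.42349315]

# Sort candidate indices from most to least similar.
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
for rank, i in enumerate(ranking, start=1):
    print(rank, f"{scores[i]:.3f}", candidates[i])
# 1 0.608 Can you tell me how to play the piano?
# 2 0.531 How to make friends?
# 3 0.423 Can you tell me the truth of the computer games?
```

All three scores also sit within about 0.2 of each other, so even the top-ranked candidate is only weakly separated from the rest.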
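My reason for opening this issue is to ask: do you have a better way to improve the model's accuracy when computing similarity for sentence pairs outside the training set?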

cheers
Haoran

lujunru / sentences_pair_similarity_calculation_siamese_lstm

sentences_pair_similarity_calculation_siamese_lstm's Issues

print(texttarget)

How can I get the cn vocab file: CnCorpus-vectors-negative64.bin
