lixuanhng / nlp_related_projects

130.0 2.0 79.0 46.42 MB

Notes on and summaries of the NLP projects the author has worked through

Python 63.57% Jupyter Notebook 36.43%

nlp_related_projects's Issues

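Similarity result does not change

When I use a model I pretrained myself for English text similarity analysis, why is the final result always close to 0.5, no matter how I change the two sentences used for prediction? Also, the label predicted for the two sentences does not seem to be computed at all. Isn't it just the label that was set at the start?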

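How is the text similarity result produced?

Text similarity is usually computed with cosine distance, Euclidean distance, or a similar measure. Which method or principle does this project use?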
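For reference, the two measures named in the question can be sketched as follows. This is only an illustration of cosine similarity and Euclidean distance on plain numeric vectors; the thread does not say which method the project uses, and the function names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to magnitude, so the two can rank the same pair of sentence vectors differently.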

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 144: invalid continuation byte

How can I resolve this?
Traceback (most recent call last):
  File "D:/down/NLP_related_projects-master/BERT/Bert_sim/run_similarity.py", line 716, in <module>
    sim = BertSim()
  File "D:/down/NLP_related_projects-master/BERT/Bert_sim/run_similarity.py", line 141, in __init__
    self.tokenizer = tokenization.FullTokenizer(vocab_file=cf.vocab_file, do_lower_case=True)
  File "D:\down\NLP_related_projects-master\BERT\Bert_sim\bert_model\tokenization.py", line 165, in __init__
    self.vocab = load_vocab(vocab_file)
  File "D:\down\NLP_related_projects-master\BERT\Bert_sim\bert_model\tokenization.py", line 127, in load_vocab
    token = convert_to_unicode(reader.readline())
  File "D:\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 169, in readline
    self._preread_check()
  File "D:\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 79, in _preread_check
    self.__name, 1024 * 512)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 144: invalid continuation byte

Process finished with exit code 1
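
One possible reading of the traceback above, not confirmed in the thread: the error is raised in `_preread_check`, the step that opens the file, before any vocabulary line has been read. That would be consistent with the path in `cf.vocab_file` not existing and the operating system's error message, written in a non-UTF-8 locale encoding, being the text that fails to decode. A vocabulary file saved in another encoding is a second possibility. The sketch below checks for both before the tokenizer is built. `check_vocab_file` is a hypothetical helper, not part of the repository.

```python
import os


def check_vocab_file(vocab_file):
    """Return the absolute path of vocab_file after two sanity checks.

    Hypothetical helper: raises FileNotFoundError if the path is not a file,
    and ValueError if the file's bytes are not valid UTF-8.
    """
    path = os.path.abspath(vocab_file)
    if not os.path.isfile(path):
        raise FileNotFoundError("vocab file not found: " + path)
    # Read raw bytes so the check does not depend on the platform's default encoding.
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError("vocab file is not valid UTF-8: " + str(err)) from err
    return path
```

Calling `check_vocab_file(cf.vocab_file)` just before `tokenization.FullTokenizer(...)` would turn either problem into a readable message instead of a decode error.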

The BERT model files seem to be missing

from bert_dir.bert.bert import modeling
from bert_dir.bert.bert import tokenization
from bert_dir.bert.bert import optimization

bert_dir does not exist

bert_model is missing

What does bert_model refer to here?
