Comments (6)
The cause is unclear. Let's set aside the absolute values for now and just compare the relative values. Could you post your comparison?
from roberta_zh.
These are the results from the original BERT model. You can see that each character gets a relatively high probability, which matches expectations, since each prediction masks only one character at a time (a sketch of this leave-one-out evaluation follows the JSON below).
{
"tokens": [
{
"token": "是",
"prob": 0.30322134494781494
},
{
"token": "啊",
"prob": 0.0012240558862686157
},
{
"token": "国",
"prob": 0.9883688688278198
},
{
"token": "内",
"prob": 0.8388231992721558
},
{
"token": "的",
"prob": 0.13244634866714478
},
{
"token": "话",
"prob": 0.2772337794303894
},
{
"token": "换",
"prob": 0.012012508697807789
},
{
"token": "运",
"prob": 0.9997902512550354
},
{
"token": "营",
"prob": 0.9997696280479431
},
{
"token": "商",
"prob": 0.9707292318344116
},
{
"token": "就",
"prob": 0.11638925969600677
},
{
"token": "得",
"prob": 0.049220748245716095
},
{
"token": "换",
"prob": 0.9010285139083862
},
{
"token": "号",
"prob": 0.4484074115753174
},
{
"token": "码",
"prob": 1.2002858511550585e-06
}
],
"ppl": 10.693282416916839
}
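For reference, a minimal sketch of this mask-one-character-at-a-time evaluation. The actual script is not shown in this thread; the HuggingFace transformers API and bert-base-chinese are assumptions, not necessarily what was used here.

import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

text = "是啊国内的话换运营商就得换号码"
ids = tokenizer(text, return_tensors="pt")["input_ids"][0]  # [CLS] ... [SEP]

log_probs = []
for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
    masked = ids.clone()
    masked[i] = tokenizer.mask_token_id   # mask exactly one character
    with torch.no_grad():
        logits = model(masked.unsqueeze(0)).logits[0, i]
    prob = torch.softmax(logits, dim=-1)[ids[i]].item()  # P(true character)
    log_probs.append(math.log(prob))

# Pseudo-perplexity: exp of the mean negative log-probability.
print(math.exp(-sum(log_probs) / len(log_probs)))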
These are the results from running the exact same script with the Roberta_large model. I paid attention to the relative values as well, and they still look clearly unreasonable.
{
"tokens": [
{
"token": "是",
"prob": 8.387277193833143e-05
},
{
"token": "啊",
"prob": 1.7700522221275605e-05
},
{
"token": "国",
"prob": 2.2484027795144357e-05
},
{
"token": "内",
"prob": 5.781384970759973e-06
},
{
"token": "的",
"prob": 4.081234692421276e-06
},
{
"token": "话",
"prob": 6.827569904999109e-06
},
{
"token": "换",
"prob": 6.573647624463774e-06
},
{
"token": "运",
"prob": 5.2257790230214596e-05
},
{
"token": "营",
"prob": 2.9685045319638448e-06
},
{
"token": "商",
"prob": 5.854314076714218e-05
},
{
"token": "就",
"prob": 3.97300100303255e-05
},
{
"token": "得",
"prob": 1.9959677956649102e-05
},
{
"token": "换",
"prob": 2.198752781623625e-06
},
{
"token": "号",
"prob": 3.740817874131608e-06
},
{
"token": "码",
"prob": 0.00041163168498314917
}
]
}
from roberta_zh.
The probabilities do look very low, while the BERT model's PPL is quite normal.
from roberta_zh.
On this point, I compared against HIT's RoBERTa-wwm-ext, Chinese; probability-wise it is basically on par with the original BERT model. The most likely explanation, I think, is that this model was trained with whole-word masking from the very start, whereas the HIT model did incremental whole-word masking on top of the character-level BERT model. Masking whole words is a harder task, and since the test masks out one character at a time to compute probabilities, the probabilities this model produces are naturally lower (see the toy sketch below).
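A toy illustration of the mismatch (the segmentation is hand-made for illustration, not from the thread): a model trained to reconstruct a whole word from ["[MASK]", "[MASK]", "[MASK]"] never practices the easier single-character cloze that the leave-one-out evaluation poses.

def char_mask(chars, i):
    # Original BERT-style cloze: hide one character, neighbours stay visible.
    return ["[MASK]" if j == i else c for j, c in enumerate(chars)]

def whole_word_mask(chars):
    # roberta_zh-style training: hide every character of the word at once.
    return ["[MASK]"] * len(chars)

word = ["运", "营", "商"]
print(char_mask(word, 0))     # ['[MASK]', '营', '商']
print(whole_word_mask(word))  # ['[MASK]', '[MASK]', '[MASK]']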
from roberta_zh.
That should be the case.
from roberta_zh.
Hi, roberta_zh_large does not include the language-model (MLM head) weights, so during your test those weights were probably randomly initialized.
You can try this version that does include the MLM parameters (roeberta_zh_L-24_H-1024_A-16_lm_layer.zip):
https://drive.google.com/file/d/1MmVWOGTsCdeUMfeCePDcatsui9zL3lND/view
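One way to confirm whether a checkpoint actually carries the MLM head is to list its variables; a minimal sketch, assuming TensorFlow and the Google BERT variable-name convention (the checkpoint path is hypothetical, adjust it to wherever you unpack the zip):

import tensorflow as tf

ckpt = "roeberta_zh_L-24_H-1024_A-16_lm_layer/model.ckpt"  # hypothetical path
mlm_vars = [name for name, shape in tf.train.list_variables(ckpt)
            if name.startswith("cls/predictions")]
if mlm_vars:
    print("MLM head present:", mlm_vars)
else:
    print("No MLM head weights found; they would be randomly initialized.")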
from roberta_zh.