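Hello, when I use the released roberta_large as a language model to compute the PPL of a sentence, the probability of each character is very small, which does not seem reasonable; under the same conditions, the BERT model assigns a fairly high probability to each character in the sentence. What is the cause? (roberta_zh, CLOSED)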

brightmart commented on May 13, 2024
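Hello, when I use the released roberta_large as a language model to compute the PPL of a sentence, the probability of each character is very small, which does not seem reasonable; under the same conditions, the BERT model assigns a fairly high probability to each character in the sentence. What is the cause?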

from roberta_zh.

Comments (6)

brightmart commented on May 13, 2024

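The cause is not clear yet. Set the absolute values aside for now and look only at the relative values. Could you post your comparison?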

Jethu1 commented on May 13, 2024

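These are the results from the original BERT model. The probability of each character is relatively high, which matches expectations, because each prediction masks only a single character.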
{
  "tokens": [
    {"token": "是", "prob": 0.30322134494781494},
    {"token": "啊", "prob": 0.0012240558862686157},
    {"token": "国", "prob": 0.9883688688278198},
    {"token": "内", "prob": 0.8388231992721558},
    {"token": "的", "prob": 0.13244634866714478},
    {"token": "话", "prob": 0.2772337794303894},
    {"token": "换", "prob": 0.012012508697807789},
    {"token": "运", "prob": 0.9997902512550354},
    {"token": "营", "prob": 0.9997696280479431},
    {"token": "商", "prob": 0.9707292318344116},
    {"token": "就", "prob": 0.11638925969600677},
    {"token": "得", "prob": 0.049220748245716095},
    {"token": "换", "prob": 0.9010285139083862},
    {"token": "号", "prob": 0.4484074115753174},
    {"token": "码", "prob": 1.2002858511550585e-06}
  ],
  "ppl": 10.693282416916839
},
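These are the results from the Roberta_large model, produced with exactly the same script. I also looked at the relative values, and they still seem clearly unreasonable.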
{
  "tokens": [
    {"token": "是", "prob": 8.387277193833143e-05},
    {"token": "啊", "prob": 1.7700522221275605e-05},
    {"token": "国", "prob": 2.2484027795144357e-05},
    {"token": "内", "prob": 5.781384970759973e-06},
    {"token": "的", "prob": 4.081234692421276e-06},
    {"token": "话", "prob": 6.827569904999109e-06},
    {"token": "换", "prob": 6.573647624463774e-06},
    {"token": "运", "prob": 5.2257790230214596e-05},
    {"token": "营", "prob": 2.9685045319638448e-06},
    {"token": "商", "prob": 5.854314076714218e-05},
    {"token": "就", "prob": 3.97300100303255e-05},
    {"token": "得", "prob": 1.9959677956649102e-05},
    {"token": "换", "prob": 2.198752781623625e-06},
    {"token": "号", "prob": 3.740817874131608e-06},
    {"token": "码", "prob": 0.00041163168498314917}
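
For reference, the "ppl" value reported for the BERT run is consistent with the usual pseudo-perplexity definition: the exponential of the negative mean log probability over the tokens. Below is a minimal Python sketch, assuming that definition. The function name `pseudo_perplexity` is illustrative and is not taken from the script used in this thread.

```python
import math

def pseudo_perplexity(probs):
    """Return exp(-mean(log p)) over the per-token probabilities."""
    if not probs:
        raise ValueError("probs must be non-empty")
    mean_log_prob = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log_prob)

# Per-token probabilities copied from the BERT output above.
bert_probs = [
    0.30322134494781494, 0.0012240558862686157, 0.9883688688278198,
    0.8388231992721558, 0.13244634866714478, 0.2772337794303894,
    0.012012508697807789, 0.9997902512550354, 0.9997696280479431,
    0.9707292318344116, 0.11638925969600677, 0.049220748245716095,
    0.9010285139083862, 0.4484074115753174, 1.2002858511550585e-06,
]

# Should land close to the reported ppl of 10.693.
print(pseudo_perplexity(bert_probs))
```

Because this is a geometric mean, a single very small probability (here the last token, 码) contributes a large share of the total.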

brightmart commented on May 13, 2024

The probabilities do look very low, whereas the PPL from the BERT model looks quite normal.

Jethu1 commented on May 13, 2024

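On this point, I also tried HIT's (Harbin Institute of Technology) RoBERTa-wwm-ext, Chinese model. In terms of probabilities, it is basically close to the original BERT model. I think the most likely explanation is that this model was trained with word-level masking from the very beginning, whereas the HIT model was built by incrementally training with word-level masking on top of the character-level BERT model. Masking whole words is a harder task, and at test time the probability is computed by masking out one character at a time, so the probabilities this model produces are naturally lower.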
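
The contrast between the two masking schemes can be made concrete. Below is a minimal sketch, assuming character-level tokens and a hand-written word segmentation. The helper names and the segmentation are illustrative, not taken from either model's training code. It only shows the shape of the masked inputs, not the random sampling used during pretraining.

```python
MASK = "[MASK]"

def mask_each_char(tokens):
    """Yield (position, masked_copy) with exactly one character masked."""
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = MASK
        yield i, masked

def mask_each_word(tokens, word_lengths):
    """Yield (start, end, masked_copy) with one whole word masked at a time.

    word_lengths is a hypothetical segmentation: characters per word.
    """
    if sum(word_lengths) != len(tokens):
        raise ValueError("segmentation does not cover the token sequence")
    start = 0
    for length in word_lengths:
        end = start + length
        masked = list(tokens)
        masked[start:end] = [MASK] * length
        yield start, end, masked
        start = end

tokens = list("换运营商就得换号码")
# Hypothetical segmentation: 换 / 运营商 / 就 / 得 / 换 / 号码
word_lengths = [1, 3, 1, 1, 1, 2]

# Evaluation as described in this thread: one character hidden per pass.
for _, masked in mask_each_char(tokens):
    print("".join(masked))

# Whole-word masking: every character of a word is hidden together.
for _, _, masked in mask_each_word(tokens, word_lengths):
    print("".join(masked))
```

With character-level masking, the neighbors 运 and 商 stay visible when 营 is predicted. With whole-word masking, all of 运营商 is hidden at once, which is the harder setting referred to above.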

brightmart commented on May 13, 2024

That is probably right.

brightmart commented on May 13, 2024

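Hello, roberta_zh_large does not include the language-model (MLM) weights. So when you ran your test, could those weights have been randomly initialized?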

You can try this version, which includes the MLM parameters (roeberta_zh_L-24_H-1024_A-16_lm_layer.zip):
https://drive.google.com/file/d/1MmVWOGTsCdeUMfeCePDcatsui9zL3lND/view
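
Two quick sanity checks follow from this. Below is a rough sketch under stated assumptions: the checkpoint uses the original BERT TensorFlow variable naming, where the MLM head lives under `cls/predictions/`, and the vocabulary has 21128 entries, the size of Google's Chinese BERT vocab. The helper names and the `factor` threshold are illustrative. In practice the variable names could come from `tf.train.list_variables(checkpoint_path)`, which returns (name, shape) pairs.

```python
import math

# Assumption: original BERT TensorFlow naming for the masked-LM head.
MLM_PREFIX = "cls/predictions/"

def has_mlm_head(variable_names):
    """True if any checkpoint variable belongs to the masked-LM head."""
    return any(name.startswith(MLM_PREFIX) for name in variable_names)

def looks_uniform(probs, vocab_size, factor=10.0):
    """True if the geometric mean of probs is within `factor` of 1/vocab_size."""
    geo_mean = math.exp(sum(math.log(p) for p in probs) / len(probs))
    uniform = 1.0 / vocab_size
    return uniform / factor <= geo_mean <= uniform * factor

# Example variable names in the original BERT naming scheme.
encoder_only = ["bert/embeddings/word_embeddings",
                "bert/encoder/layer_0/attention/self/query/kernel"]
with_lm_layer = encoder_only + ["cls/predictions/transform/dense/kernel",
                                "cls/predictions/output_bias"]
print(has_mlm_head(encoder_only))   # False
print(has_mlm_head(with_lm_layer))  # True

# Per-token probabilities copied from the Roberta_large output in this thread.
roberta_probs = [
    8.387277193833143e-05, 1.7700522221275605e-05, 2.2484027795144357e-05,
    5.781384970759973e-06, 4.081234692421276e-06, 6.827569904999109e-06,
    6.573647624463774e-06, 5.2257790230214596e-05, 2.9685045319638448e-06,
    5.854314076714218e-05, 3.97300100303255e-05, 1.9959677956649102e-05,
    2.198752781623625e-06, 3.740817874131608e-06, 0.00041163168498314917,
]
VOCAB_SIZE = 21128  # assumption, see the note above
print(looks_uniform(roberta_probs, VOCAB_SIZE))  # True
```

Probabilities on the order of 1/vocab_size are roughly what an untrained output layer would tend to produce, so a True result here is at least compatible with the suggestion that the MLM weights were not loaded. It does not prove it.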
