
Comments (8)

brightmart commented on May 12, 2024

Could you paste your training configuration here so we can take a look? For example: which model size you are training, the hyperparameter settings, batch_size, the amount of data, and so on.

from albert_zh.

yyht commented on May 12, 2024

albert-12, embedding_size=128, hidden_size=768, intermediate_size=1024, number_of_layers=12
batch_size=24 (on 32 cards, equivalent to batch_size=768). MLM only; no SOP or other objectives.
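
For reference, the effective batch size quoted above is just the per-card batch size multiplied by the number of cards. A tiny sketch of that arithmetic (the variable names are illustrative, not taken from the training script):

```python
# Values quoted in the comment above; the names are hypothetical.
per_device_batch_size = 24
num_devices = 32

# With data parallelism, each step consumes one batch per device.
effective_batch_size = per_device_batch_size * num_devices
print(effective_batch_size)  # 768
```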


brightmart commented on May 12, 2024

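OK, I see.
1. intermediate_size should be 4*hidden_size.
2. The earlier run_pretraining.py did not add the paragraph-continuity loss to the total loss. This has now been fixed and pushed.

You can make these changes first and try again. If the loss is still large, wait a day or two; there will be further updates.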
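
The two suggestions above can be sketched roughly as follows. This is only an illustration in plain Python, not the code in run_pretraining.py: `check_ffn_width` and `total_pretraining_loss` are hypothetical helpers, and the dictionary keys simply mirror the settings quoted earlier in the thread.

```python
def check_ffn_width(config, ratio=4):
    """Return (is_ok, expected) for the intermediate_size = ratio * hidden_size rule."""
    expected = ratio * config["hidden_size"]
    return config["intermediate_size"] == expected, expected


# Settings as reported in the thread.
reported = {
    "embedding_size": 128,
    "hidden_size": 768,
    "intermediate_size": 1024,
    "number_of_layers": 12,
}

ok, expected = check_ffn_width(reported)
print(ok, expected)  # False 3072


def total_pretraining_loss(masked_lm_loss, continuity_loss):
    """Sum both objectives so that both contribute gradients (assumes equal weighting)."""
    return masked_lm_loss + continuity_loss
```

With hidden_size=768 the rule gives intermediate_size=3072. The second helper only matters when the continuity objective is enabled; equal weighting of the two terms is an assumption here.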


yyht commented on May 12, 2024

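I don't think this is strongly related to intermediate_size; a somewhat smaller value should still converge. Looking at the curve, the MLM loss seems to be stuck near a local extremum and is decreasing very slowly (or it has entered a saturation region, or something similar).
Also, could you share the 40G of text?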


yyht commented on May 12, 2024

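I found the cause. The embedding matrix and the projection matrix are applied as consecutive matrix multiplications, which makes the resulting elements much smaller than before. Multiplying the tensor after projection by 2 keeps the value range similar, and convergence is much better.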
image
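
A rough way to see the scale effect described above is to compare a directly initialized embedding table with the product of a factorized embedding and its projection. The sketch below is an assumption-laden illustration, not the repository's code: it assumes both matrices are drawn from a zero-mean normal with standard deviation 0.02, and it shrinks the vocabulary and hidden sizes so that plain Python stays fast. Under those assumptions each projected element has a standard deviation of about 0.02 * 0.02 * sqrt(128), roughly 0.0045, which is several times smaller than 0.02.

```python
import random
import statistics

random.seed(0)

# Small sizes for speed; only embedding_size (128) matches the thread.
vocab_size, embedding_size, hidden_size = 100, 128, 64
init_std = 0.02  # assumed initializer scale, not confirmed by the thread


def gaussian_matrix(rows, cols, std):
    """Matrix of independent N(0, std^2) samples as a list of rows."""
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def flat_std(matrix):
    """Standard deviation over all elements of a matrix."""
    return statistics.pstdev([v for row in matrix for v in row])


# Unfactorized table: vocab_size x hidden_size, initialized directly.
direct = gaussian_matrix(vocab_size, hidden_size, init_std)

# Factorized version: small embedding followed by a projection.
embedding = gaussian_matrix(vocab_size, embedding_size, init_std)
projection = gaussian_matrix(embedding_size, hidden_size, init_std)
factorized = matmul(embedding, projection)

# The fix proposed in the thread: scale the projected tensor by 2.
scaled = [[2.0 * v for v in row] for row in factorized]

print("direct std:    ", flat_std(direct))
print("factorized std:", flat_std(factorized))
print("scaled std:    ", flat_std(scaled))
```

Under these assumptions the factor of 2 moves the projected values closer to the scale of the unfactorized table without fully matching it; the exact gap depends on the initializer actually used.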


brightmart commented on May 12, 2024

How should I understand this? Is there any basis for it? Is there a better explanation?


yyht commented on May 12, 2024

from albert_zh.

lonePatient commented on May 12, 2024

It looks like the amount of data is probably fairly small. With 12 layers, the learning rate needs to be adjusted.

