GithubHelp home page GithubHelp logo

albert_zh's Introduction

albert_zh

海量中文语料上预训练ALBERT模型:参数更少,效果更好

Chinese version of ALBERT pre-trained model

ALBERT模型介绍

ALBERT模型是BERT的改进版,与最近其他State of the art的模型不同的是,这次是预训练小模型,效果更好、参数更少。

预训练小模型也能拿下13项NLP任务,ALBERT三大改造登顶GLUE基准

它对BERT进行了三个改造:

1)词嵌入向量参数的因式分解 Factorized embedding parameterization

 O(V * H) to O(V * E + E * H)
 
 如以ALBert_xxlarge为例,V=30000, H=4096, E=128
   
 那么原先参数为V * H= 30000 * 4096 = 1.23亿个参数,现在则为V * E + E * H = 30000*128+128*4096 = 384万 + 52万 = 436万,
   
 词嵌入相关的参数变化前是变换后的28倍。

2)跨层参数共享 Cross-Layer Parameter Sharing

 参数共享能显著减少参数。共享可以分为全连接层、注意力层的参数共享;注意力层的参数对效果的减弱影响小一点。

3)段落连续性任务 Inter-sentence coherence loss.

 使用段落连续性任务。正例,使用从一个文档中连续的两个文本段落;负例,使用从一个文档中连续的两个文本段落,但位置调换了。
 
 避免使用原有的NSP任务,原有的任务包含隐含了预测主题这类过于简单的任务。

  We maintain that inter-sentence modeling is an important aspect of language understanding, but we propose a loss 
  based primarily on coherence. That is, for ALBERT, we use a sentence-order prediction (SOP) loss, which avoids topic 
  prediction and instead focuses on modeling inter-sentence coherence. The SOP loss uses as positive examples the 
  same technique as BERT (two consecutive segments from the same document), and as negative examples the same two 
  consecutive segments but with their order swapped. This forces the model to learn finer-grained distinctions about
  discourse-level coherence properties. 

发布计划 Release Plan

1、albert_base, 参数量12M, 层数12,10月5号

2、albert_large, 参数量18M, 层数24,10月13号

3、albert_xlarge, 参数量59M, 层数24,10月6号

4、albert_xxlarge, 参数量233M, 层数12,10月7号(效果最佳的模型)

训练语料

40g中文语料,超过100亿汉字,包括多个百科、新闻、互动社区、小说、评论。

模型性能与对比

模型参数和配置

Reference

1、ALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations

2、预训练小模型也能拿下13项NLP任务,ALBERT三大改造登顶GLUE基准

3、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

4、SpanBERT: Improving Pre-training by Representing and Predicting Spans

albert_zh's People

Contributors

brightmart avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.