itongxiaojun / paddlenlp

This project forked from paddlepaddle/paddlenlp

NLP Core Library and Model Zoo based on PaddlePaddle 2.0

License: Apache License 2.0

paddlenlp's Introduction

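Introduction

PaddleNLP 2.0 offers a model zoo covering a wide range of scenarios, simple and easy-to-use end-to-end APIs, and high-performance distributed training with unified dynamic and static graphs. It aims to improve text-modeling efficiency for PaddlePaddle developers and to provide best practices for NLP based on PaddlePaddle 2.0.

Features

Installation

Requirements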

  • python >= 3.6
  • paddlepaddle >= 2.0.0
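
The two requirements above can be checked from Python before installing. The following is only a minimal sketch: `parse_version` and `meets_minimum` are hypothetical helpers written for this illustration, not part of PaddleNLP, and the parser ignores pre-release suffixes, so `2.0.0rc1` is treated the same as `2.0.0`.

```python
import sys

# Minimum versions taken from the requirements list above.
MIN_PYTHON = (3, 6)
MIN_PADDLE = (2, 0, 0)


def parse_version(version):
    """Turn a string such as '2.0.0rc1' into (2, 0, 0).

    Hypothetical helper: only the leading digits of the first three
    dot-separated parts are kept, so pre-release suffixes are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if the parsed version is at least the minimum tuple."""
    return parse_version(version) >= minimum


print("python ok:", sys.version_info[:2] >= MIN_PYTHON)

try:
    import paddle  # installed by the paddlepaddle package
    print("paddlepaddle ok:", meets_minimum(paddle.__version__, MIN_PADDLE))
except ImportError:
    print("paddlepaddle is not installed")
```

For strict handling of pre-release versions, a full PEP 440 parser would be a better choice than this simplified comparison.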

Install with pip

pip install "paddlenlp>=2.0.0rc"

Quick Start

Load datasets quickly

from paddlenlp.datasets import ChnSentiCorp

train_ds, dev_ds, test_ds = ChnSentiCorp.get_datasets(['train', 'dev', 'test'])

See the Dataset documentation for more datasets.

Load Chinese word embeddings in one line

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
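# Similarity between "国王" (king) and "王后" (queen)
print(wordemb.cosine_sim("国王", "王后"))
# >>> 0.63395125
# Similarity between "艺术" (art) and "火车" (train)
print(wordemb.cosine_sim("艺术", "火车"))
# >>> 0.14792643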

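More than 50 Chinese word embeddings are built in. See the Embedding documentation for more usage.

Load high-quality Chinese pretrained models in one line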
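
To make the scores above easier to interpret, here is a minimal sketch of standard cosine similarity. It assumes that `cosine_sim` computes the usual cosine of the angle between the two word vectors, which the method name suggests but this README does not confirm. The `cosine_similarity` function and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-dimensional vectors, not real word embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```

Under that assumption, a score near 1 means the two words point in nearly the same direction in the embedding space, and a score near 0 means they are largely unrelated.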

from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')

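See the Transformer API documentation for the currently supported pretrained models.

Model Zoo and Applications

For an overview of the PaddleNLP model zoo, see the PaddleNLP Model Zoo documentation. For model application scenarios, see PaddleNLP Examples.

Advanced Applications

API Documentation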

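  • Transformer API
    • Pretrained-model APIs built on the Transformer architecture, covering mainstream classic architectures such as ERNIE, BERT, RoBERTa, and Electra, along with downstream tasks.
  • Data API
    • APIs for the text data processing pipeline.
  • Dataset API
    • Dataset APIs, covering custom datasets, dataset contribution, and quick dataset loading.
  • Embedding API
    • Word embedding APIs, covering one-line loading of pretrained Chinese word embeddings and high-dimensional visualization with VisualDL.
  • Metrics API
    • Evaluation metrics for NLP scenarios, compatible with the high-level API of the PaddlePaddle 2.0 framework.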

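Interactive Notebook Tutorials

For more tutorials, see PaddleNLP on AI Studio.

Community Contribution and Technical Discussion

  • You are welcome to join the PaddleNLP SIG community and contribute high-quality model implementations, public datasets, tutorials and case studies, and auxiliary tools.
  • Join the PaddleNLP QQ technical discussion group to talk about NLP.

License

PaddleNLP is released under the Apache-2.0 open-source license.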

paddlenlp's People

Contributors

zeyuchen, liuchiachi, smallv0221, zhui, wawltor, joey12300, guoshengcs, frostml, kinghuin, steffy-zxf, xiemoyuan, jiangjiajun, raindrops2sea, huangxu96, weiwei1115, zhengya01
