Language Models

Repository of pre-trained Language Models.

WARNING: A bidirectional LM using the MultiFiT configuration works well for text classification, but with only 46 million parameters it is far from able to compete with GPT-2 or BERT on NLP tasks such as text generation. That is my next step ;-)

Note: The training times shown in the tables on this page are the sum of the time needed to create the fastai DataBunch (forward and backward) and the time needed to train the bidirectional model for 10 epochs. The download and preparation times of the Wikipedia corpus are not counted.

Portuguese

I trained one Portuguese Bidirectional Language Model (PBLM) with the MultiFiT configuration on a single NVIDIA V100 GPU on GCP.

MultiFiT configuration (architecture: 4 QRNN layers with 1550 hidden units per layer / tokenizer: SentencePiece (15 000 tokens))
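As a rough illustration, the configuration described above could be written down as a plain Python dictionary. This is only a sketch: the key names follow the fastai v1 `awd_lstm_lm_config` convention, and only the number of layers, the hidden size, the QRNN flag and the vocabulary size come from this page. The embedding size is an assumption, not a value taken from the notebooks.

```python
# Sketch of a MultiFiT-style configuration as a plain dictionary.
# Values marked "from this page" come from the description above;
# everything else is an assumption made for illustration.
multifit_config = {
    "n_layers": 4,    # from this page: 4 QRNN layers
    "n_hid": 1550,    # from this page: 1550 hidden units per layer
    "qrnn": True,     # from this page: QRNN instead of AWD-LSTM
    "emb_sz": 400,    # assumption: embedding size is not stated on this page
}

# From this page: SentencePiece tokenizer with 15 000 tokens.
VOCAB_SIZE = 15_000


def describe(config, vocab_size):
    """Return a one-line, human-readable summary of the configuration."""
    cell = "QRNN" if config["qrnn"] else "AWD-LSTM"
    return (
        f"{config['n_layers']} {cell} layers, "
        f"{config['n_hid']} hidden units per layer, "
        f"vocabulary of {vocab_size} tokens"
    )


print(describe(multifit_config, VOCAB_SIZE))
```

In fastai v1, a dictionary of this kind is typically passed through the `config` argument of `language_model_learner`; the exact call used in the notebooks may differ.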

| PBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 39.68% | 21.76 | 8h |
| backward | 43.67% | 22.16 | 8h |
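To relate the perplexity column to a training loss: perplexity is the exponential of the average cross-entropy loss per token (in nats), so each perplexity above corresponds to one loss value. The short sketch below does this conversion for the two Portuguese models. The helper names are made up for illustration and are not taken from the notebooks.

```python
import math


def perplexity_from_loss(loss):
    """Perplexity is exp(average cross-entropy loss per token, in nats)."""
    return math.exp(loss)


def loss_from_perplexity(perplexity):
    """Inverse conversion: cross-entropy loss implied by a perplexity."""
    return math.log(perplexity)


# Perplexities reported for the Portuguese forward and backward models.
forward_loss = loss_from_perplexity(21.76)
backward_loss = loss_from_perplexity(22.16)

print(f"forward loss:  {forward_loss:.3f}")
print(f"backward loss: {backward_loss:.3f}")
```

A lower perplexity means the model is, on average, less uncertain about the next token.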

[ WARNING ] The code of the notebook lm3-portuguese-classifier-olist.ipynb must be updated so that it uses the SentencePiece model and vocab already trained for the Portuguese Language Model in the notebook lm3-portuguese.ipynb, as was done in the notebook lm3-portuguese-classifier-TCU-jurisprudencia.ipynb (see the explanations at the top of that notebook).

Here's an example of using the classifier to predict the category of a TCU legal text:

[Figure: using the classifier to predict the category of TCU legal texts]
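One common way to use a bidirectional model for classification is to average the class probabilities of a forward classifier and a backward classifier, then pick the most probable class. The sketch below shows only that averaging step. It assumes this is how the two directions are combined, and the labels and probabilities are hypothetical placeholders, not outputs of the TCU classifier.

```python
def average_predictions(forward_probs, backward_probs):
    """Average two lists of class probabilities element by element."""
    if len(forward_probs) != len(backward_probs):
        raise ValueError("both classifiers must predict the same classes")
    return [(f + b) / 2 for f, b in zip(forward_probs, backward_probs)]


def predict_label(labels, forward_probs, backward_probs):
    """Return the most probable label and its averaged probability."""
    averaged = average_predictions(forward_probs, backward_probs)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best], averaged[best]


# Hypothetical labels and probabilities, for illustration only.
labels = ["category_a", "category_b", "category_c"]
forward_probs = [0.20, 0.70, 0.10]
backward_probs = [0.30, 0.50, 0.20]

label, prob = predict_label(labels, forward_probs, backward_probs)
print(label, round(prob, 2))
```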

French

I trained 3 French Bidirectional Language Models (FBLM) on a single NVIDIA V100 GPU on GCP; the best is the one trained with the MultiFiT configuration.

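| French Bidirectional Language Models (FBLM) | direction | accuracy | perplexity | training time |
|---|---|---|---|---|
| MultiFiT with 4 QRNN + SentencePiece (15 000 tokens) | forward | 43.77% | 16.09 | 8h40 |
| MultiFiT with 4 QRNN + SentencePiece (15 000 tokens) | backward | 49.29% | 16.58 | 8h10 |
| ULMFiT with 3 QRNN + SentencePiece (15 000 tokens) | forward | 40.99% | 19.96 | 5h30 |
| ULMFiT with 3 QRNN + SentencePiece (15 000 tokens) | backward | 47.19% | 19.47 | 5h30 |
| ULMFiT with 3 AWD-LSTM + spaCy (60 000 tokens) | forward | 36.44% | 25.62 | 11h |
| ULMFiT with 3 AWD-LSTM + spaCy (60 000 tokens) | backward | 42.65% | 27.09 | 11h |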

1. MultiFiT configuration (architecture: 4 QRNN layers with 1550 hidden units per layer / tokenizer: SentencePiece (15 000 tokens))

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 43.77% | 16.09 | 8h40 |
| backward | 49.29% | 16.58 | 8h10 |

Here's an example of using the classifier to predict the sentiment of comments on an Amazon product:

[Figure: using the classifier to predict the sentiment of comments on an Amazon product]

2. Architecture QRNN / tokenizer SentencePiece

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 40.99% | 19.96 | 5h30 |
| backward | 47.19% | 19.47 | 5h30 |

3. Architecture AWD-LSTM / tokenizer spaCy

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 36.44% | 25.62 | 11h |
| backward | 42.65% | 27.09 | 11h |

Contributors

piegu