This project forked from n-waves/multifit

Temporary repository used for collaboration on the application of ULMFiT to multiple languages.

Home Page: https://forums.fast.ai/t/multilingual-ulmfit/28117

ulmfit-multilingual

How to train a classifier

$ python -m ulmfit lm --dataset-path data/wiki/wikitext-103 --bidir=False --qrnn=False --tokenizer=vf --name 'bs40' --bs=40 --cuda-id=0 - train 20 --drop-mult=0.9
...
Model dir: data/wiki/wikitext-103/models/vf60k/lstm_bs40.m
...
$ python -m ulmfit cls --dataset-path data/imdb --base-lm-path data/wiki/wikitext-103/models/vf60k/lstm_bs40.m - train 20   
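
The two steps above (train a language model, then train a classifier on top of it) can also be driven from a script. The following is a minimal sketch that only assembles the argument lists, using the flags shown in the commands above. The helper names `lm_command` and `cls_command` are hypothetical and not part of this repository, and the model directory is passed in as printed by the first step rather than derived.

```python
import shlex


def lm_command(dataset_path, name, bs, epochs, drop_mult, cuda_id=0):
    """Argument list for the language-model step (flags copied from the example above)."""
    return [
        "python", "-m", "ulmfit", "lm",
        "--dataset-path", dataset_path,
        "--bidir=False", "--qrnn=False", "--tokenizer=vf",
        "--name", name, f"--bs={bs}", f"--cuda-id={cuda_id}",
        "-", "train", str(epochs), f"--drop-mult={drop_mult}",
    ]


def cls_command(dataset_path, base_lm_path, epochs):
    """Argument list for the classifier step, pointing at a trained base LM."""
    return [
        "python", "-m", "ulmfit", "cls",
        "--dataset-path", dataset_path,
        "--base-lm-path", base_lm_path,
        "-", "train", str(epochs),
    ]


if __name__ == "__main__":
    lm = lm_command("data/wiki/wikitext-103", "bs40", bs=40, epochs=20, drop_mult=0.9)
    # The model dir is the one reported in the "Model dir:" line of the first step.
    cls = cls_command(
        "data/imdb", "data/wiki/wikitext-103/models/vf60k/lstm_bs40.m", epochs=20
    )
    print(shlex.join(lm))
    print(shlex.join(cls))
```

Each list can be handed to `subprocess.run(cmd, check=True)` to execute the step; printing first makes it easy to compare against the commands above.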

Data directory structure

Directory structure after changes to the way we process wiki dumps.

data
├── imdb
│   ├── aclImdb
│   ├── imdb_lm
│   └── tmp
├── wiki
│   ├── de-100
│   │   └── models
│   ├── de-100-unk
│   │   └── models
│   ├── de-2
│   │   └── models
│   ├── de-2-unk
│   │   └── models
│   ├── de-all
│   │   └── models
│   ├── wikitext-103
│   │   └── models
│   └── wikitext-2
│       └── models
├── wiki_dumps
├── wiki_extr
│   └── de
│       ├── AA
│       ├── AB
...
│       └── CC
└── xnli
    ├── XNLI-1.0
    └── XNLI-MT-1.0
        ├── multinli
        └── xnli
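
Before starting a long training run it can help to check that the layout above is in place. The sketch below reports which directories are missing under a data root. The `EXPECTED_DIRS` list and the `missing_dirs` helper are assumptions made for illustration: the list is a subset of the tree above and leaves out the language-specific wiki folders (de-100, de-2, ...), which vary per experiment.

```python
from pathlib import Path

# Subset of the tree above, chosen for illustration only.
EXPECTED_DIRS = [
    "imdb/aclImdb",
    "wiki/wikitext-103/models",
    "wiki/wikitext-2/models",
    "wiki_dumps",
    "wiki_extr",
    "xnli/XNLI-1.0",
    "xnli/XNLI-MT-1.0/multinli",
    "xnli/XNLI-MT-1.0/xnli",
]


def missing_dirs(data_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs("data"):
        print(f"missing: data/{rel}")
```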

How to contribute

We have a fork of fastai to propose changes to fastai.text, with a branch for this project: https://github.com/n-waves/fastai/tree/ulmfit_multilingual

To start collaborating, let us know in the fastai forum thread "Multilingual ULMFIT" (https://forums.fast.ai/t/multilingual-ulmfit/28117) and you will get access to both repositories.

Here is what I did to set up my local fastai clone:

$ cd fastai
$ git remote add n-waves https://github.com/n-waves/fastai.git
$ git remote -v 
n-waves	https://github.com/n-waves/fastai.git (fetch)
n-waves	https://github.com/n-waves/fastai.git (push)
origin	https://github.com/fastai/fastai.git (fetch)
origin	https://github.com/fastai/fastai.git (push)

$ git fetch n-waves
$ git checkout ulmfit_multilingual
Branch 'ulmfit_multilingual' set up to track remote branch 'ulmfit_multilingual' from 'n-waves'.
Switched to a new branch 'ulmfit_multilingual'

$ git push --set-upstream n-waves ulmfit_multilingual  # to automatically push ulmfit_multilingual branch to the n-waves repo
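
To confirm that the `n-waves` remote is configured before pushing, the output of `git remote -v` can be checked from a script. The following is a small sketch: `parse_remotes` is a hypothetical helper, and the sample text is the output shown above.

```python
SAMPLE = """\
n-waves\thttps://github.com/n-waves/fastai.git (fetch)
n-waves\thttps://github.com/n-waves/fastai.git (push)
origin\thttps://github.com/fastai/fastai.git (fetch)
origin\thttps://github.com/fastai/fastai.git (push)
"""


def parse_remotes(remote_v_output):
    """Map remote name -> {"fetch": url, "push": url} from `git remote -v` text."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


if __name__ == "__main__":
    remotes = parse_remotes(SAMPLE)
    if "n-waves" in remotes:
        print("n-waves push URL:", remotes["n-waves"]["push"])
    else:
        print("n-waves remote is not configured")
```

In a real clone, the text would come from `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout` instead of `SAMPLE`.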

Repo structure

  • fastai_contrib -- anything that can be ported to fastai once we finish the project, such as NLI models and the SentencePiece tokenizer
  • ulmfit
    • data -- scripts to fetch and prepare data: Wikipedia, XNLI, classification data sets
    • lm -- scripts to train language models
    • bilm -- scripts to train bidirectional language models (ELMo style, BERT style)
    • class -- scripts to test classifiers on multiple languages
    • xnli -- scripts to test NLI

Contributors

piotrczapla, eisenjulian, aayux, sebastianruder, nirantk
