kangzhonghua / gpt-2-tensorflow2.0

This project forked from akanyaani/gpt-2-tensorflow2.0

OpenAI GPT-2 pre-training implementation in TensorFlow 2.0

Home Page: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

License: MIT License

Python 79.88% Jupyter Notebook 20.12%

gpt-2-tensorflow2.0's Introduction

GPT-2 pre-training and text generation, implemented in TensorFlow 2.0

Originally implemented in TensorFlow 1.14 by OpenAI: "openai/gpt-2". OpenAI GPT-2 paper: "Language Models are Unsupervised Multitask Learners"

This repository contains an OpenAI GPT-2 pre-training implementation in TensorFlow 2.0. I am also working on text generation using this model and will push that code in a couple of days.

Requirements

  • python >= 3.6
  • setuptools==41.0.1
  • ftfy==5.6
  • tqdm==4.32.1
  • Click==7.0
  • sentencepiece==0.1.83
  • tensorflow-gpu==2.0.0rc0
  • numpy==1.16.4

Setup

$ git clone https://github.com/akanyaani/gpt-2-tensorflow2.0
$ cd gpt-2-tensorflow2.0
$ pip install -r requirements.txt

You can pre-train the model using the sample data available in the repository, or you can download data using this GitHub repo: https://github.com/eukaryote31/openwebtext

Pre-Training model on sample data available in repository

$ python pre_process.py --help

Options:
  --data-dir TEXT        training data path  [default: /data/scraped]
  --vocab-size INTEGER   byte pair vocab size  [default: 32000]
  --min-seq-len INTEGER  minimum sequence length  [default: 15]
  --max-seq-len INTEGER  maximum sequence length  [default: 512]
  --help                 Show this message and exit.
  
  
$ python pre_process.py
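
The --min-seq-len and --max-seq-len options suggest that tokenized sequences are filtered by length. The sketch below shows one way such a filter could work. It is an illustration only: the function name filter_by_length is hypothetical, and whether pre_process.py drops, truncates, or splits over-long sequences is an assumption not confirmed by this README.

```python
from typing import List, Sequence


def filter_by_length(
    sequences: Sequence[List[int]],
    min_seq_len: int = 15,   # mirrors the --min-seq-len default
    max_seq_len: int = 512,  # mirrors the --max-seq-len default
) -> List[List[int]]:
    """Keep only token sequences whose length lies in [min_seq_len, max_seq_len].

    Assumption: out-of-range sequences are dropped, not truncated.
    """
    return [seq for seq in sequences if min_seq_len <= len(seq) <= max_seq_len]


# Toy token-id sequences of lengths 3, 20 and 600.
samples = [list(range(3)), list(range(20)), list(range(600))]
kept = filter_by_length(samples)
kept_lengths = [len(seq) for seq in kept]
print(kept_lengths)
```

Only the 20-token sequence falls inside the default range, so it is the only one kept.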

Pre-Training model on openwebtext or any other data

$ python pre_process.py --data-dir=data_directory --vocab-size=32000

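
The --vocab-size option sets the size of the byte pair vocabulary. As a rough illustration of what a byte pair vocabulary is, the toy sketch below learns merges by repeatedly joining the most frequent adjacent symbol pair. It is not the repository's code (the requirements list sentencepiece, which presumably does this work), and the function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Word = Tuple[str, ...]


def most_frequent_pair(words: Dict[Word, int]) -> Optional[Tuple[str, str]]:
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words: Dict[Word, int], pair: Tuple[str, str]) -> Dict[Word, int]:
    """Replace every occurrence of `pair` in every word with one merged symbol."""
    merged: Dict[Word, int] = {}
    for symbols, freq in words.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: str, num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    words: Dict[Word, int] = dict(Counter(tuple(w) for w in corpus.split()))
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges


merges = learn_bpe("low low lower lowest", num_merges=2)
print(merges)
```

Each merge adds one symbol to the vocabulary, so a larger --vocab-size corresponds to more merges and longer subword units.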
$ python train_gpt2.py --help

Options:
  --num-layers INTEGER      No. of decoder layers  [default: 8]
  --embedding-size INTEGER  Embedding size  [default: 768]
  --num-heads INTEGER       Number of heads  [default: 8]
  --dff INTEGER             Filter Size  [default: 3072]
  --max-seq-len INTEGER     Seq length  [default: 515]
  --vocab-size INTEGER      Vocab size  [default: 32000]
  --optimizer TEXT          optimizer type  [default: adam]
  --batch-size INTEGER      batch size  [default: 8]
  --learning-rate FLOAT     learning rate  [default: 0.001]
  --distributed BOOLEAN     distributed training  [default: False]
  --help                    Show this message and exit.
  
  
$ python train_gpt2.py --num-layers=8 --embedding-size=768 --batch-size=32
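
To get a feel for how the training options affect model size, the sketch below estimates the parameter count from --num-layers, --embedding-size, --dff, --vocab-size and --max-seq-len. It assumes a standard GPT-2 style decoder (learned position embeddings, biased projections, two layer norms per block, a final layer norm, and an output layer tied to the token embedding). The repository's actual layers may differ, so treat the result as an estimate. The function name gpt2_param_count is hypothetical.

```python
def gpt2_param_count(
    num_layers: int = 8,
    embedding_size: int = 768,
    dff: int = 3072,
    vocab_size: int = 32000,
    max_seq_len: int = 515,
) -> int:
    """Estimate trainable parameters for an assumed GPT-2 style decoder."""
    d = embedding_size
    token_emb = vocab_size * d            # token embedding (assumed tied with output layer)
    pos_emb = max_seq_len * d             # learned position embedding
    attention = 4 * (d * d + d)           # Q, K, V and output projections, with biases
    feed_forward = (d * dff + dff) + (dff * d + d)  # two dense layers, with biases
    layer_norms = 2 * (2 * d)             # two layer norms per block (scale and shift)
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * d                    # layer norm after the last block
    return token_emb + pos_emb + num_layers * per_layer + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters with the default options")
```

Under these assumptions --num-heads does not change the count, because the heads only partition the same projection matrices.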

Distributed training on multiple GPUs.

$ python train_gpt2.py --num-layers=8 --embedding-size=768 --batch-size=32 --distributed=True
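
In synchronous data-parallel training, a global batch is typically split evenly across the available GPUs. The sketch below shows that arithmetic. Whether train_gpt2.py treats --batch-size as the global or the per-GPU batch size is not stated in this README, so the global interpretation here is an assumption, and per_replica_batch_size is a hypothetical helper.

```python
def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Split a global batch evenly across replicas (assumed: one replica per GPU)."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    if global_batch_size % num_replicas != 0:
        raise ValueError("global batch size must be divisible by the number of replicas")
    return global_batch_size // num_replicas


# With --batch-size=32 interpreted as the global batch and 4 GPUs:
size = per_replica_batch_size(32, 4)
print(size)
```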

Start TensorBoard through the command line.

$ tensorboard --logdir /log

References:

Contribution

  • Your issues and PRs are always welcome.

Author

License

Computation Graph of GPT-2 Model.

Decoder Graph

[Image: GPT-2_Graph]
