zhaoyu-li / cs229_project

This is a course project for Natural Language Processing (CS229).

CS229_Project

This is a course project for Natural Language Processing (CS229), ACM Class 2017, SJTU. Written in 2020.

Introduction

The General Language Understanding Evaluation (GLUE) benchmark is a diverse set of existing natural language understanding tasks. In this course project, we choose CoLA as the task for evaluating our language models. I first tried the ERNIE, MT-DNN, and RoBERTa models with careful hyperparameter selection, but failed to achieve good scores on the dev set. After reading the latest papers and codebases, I chose two pretrained models, ALBERT (from Google Research and the Toyota Technological Institute at Chicago) and ELECTRA (from Google Research and Stanford University), and finetuned each of them on CoLA. I then used an ensemble method to combine the results of the two models and obtained a score of 74.2 on the CoLA task.
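The paragraph above does not say how the two models' outputs are combined, so the following is only a minimal sketch under one common assumption: the ensemble averages each model's class probabilities per sentence and takes the argmax. CoLA is scored on GLUE with the Matthews correlation coefficient (MCC), so the sketch also computes it; the reported scores presumably correspond to MCC multiplied by 100. The function names and the toy probabilities are hypothetical and are not taken from this repository.

```python
import math


def average_probs(prob_sets):
    """Average per-example class probabilities across several models.

    prob_sets: one list per model, each holding [p_class0, p_class1] per example.
    """
    n_models = len(prob_sets)
    return [
        [sum(p[c] for p in per_example) / n_models
         for c in range(len(per_example[0]))]
        for per_example in zip(*prob_sets)
    ]


def predict(probs):
    """Pick the most probable class index for each example."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient; 0.0 if a margin is empty."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy values for illustration only (1 = acceptable, 0 = unacceptable).
labels = [1, 0, 1, 1, 0, 1]
albert_probs = [[0.2, 0.8], [0.6, 0.4], [0.55, 0.45],
                [0.1, 0.9], [0.4, 0.6], [0.3, 0.7]]
electra_probs = [[0.3, 0.7], [0.7, 0.3], [0.2, 0.8],
                 [0.45, 0.55], [0.7, 0.3], [0.6, 0.4]]

ensemble_preds = predict(average_probs([albert_probs, electra_probs]))
print("ALBERT MCC:  ", matthews_corrcoef(labels, predict(albert_probs)))
print("ELECTRA MCC: ", matthews_corrcoef(labels, predict(electra_probs)))
print("Ensemble MCC:", matthews_corrcoef(labels, ensemble_preds))
```

In the toy data the two models make errors on different sentences, which is the situation where averaging helps. With real checkpoints the gain depends on how correlated the models' mistakes are.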

For details of my method and evaluation, please refer to my report. I also provide some checkpoints of my model, which you can download if you like.

Prerequisites

  • Python 3
  • NumPy
  • scikit-learn
  • SciPy
  • comet_ml

The code is compatible with PyTorch 1.5.0 and TensorFlow 1.15. In addition, run the following commands to download the CoLA dataset and install the other packages.

# Install apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
# Return to the repository root before entering albert/
cd ..

# For ALBERT
cd albert
# Download CoLA data
wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python download_glue_data.py --data_dir glue_data --tasks CoLA

pip install --editable .
mkdir logs

# For ELECTRA
cd ../electra
# Download CoLA data
wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python download_glue_data.py --data_dir glue_data --tasks CoLA

cd glue_data
mkdir finetuning_data
mv CoLA finetuning_data/cola

# Download model
mkdir models
cd models
wget https://storage.googleapis.com/electra-data/electra_large.zip
unzip electra_large.zip
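Since the ELECTRA steps above move and unzip several things under glue_data, a quick layout check before finetuning can save a failed run. The following is a minimal sketch, not part of this repository: the helper name missing_paths is hypothetical, and the file names inside cola/ and the name of the unzipped model folder are assumptions about what the download script and the zip archive produce.

```python
from pathlib import Path

# Relative paths the commands above are expected to leave under electra/glue_data.
# ASSUMPTION: the TSV file names and the "electra_large" folder name are guesses
# about the download script's output and the archive's contents.
EXPECTED = [
    "finetuning_data/cola/train.tsv",
    "finetuning_data/cola/dev.tsv",
    "finetuning_data/cola/test.tsv",
    "models/electra_large",
]


def missing_paths(data_dir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the electra/ directory.
    for rel in missing_paths("glue_data"):
        print("missing:", rel)
```

If the script prints nothing, every expected path was found. Adjust EXPECTED if your download produces different names.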

Finetune on CoLA task

I provide two scripts, one for ALBERT and one for ELECTRA, to finetune on the CoLA task. For ALBERT, I release the hyperparameters I used so that the result can be reproduced. For ELECTRA, just run the script: it uses several different random seeds, all of which give good and stable performance.

# For albert
cd albert
bash run_cola.sh

# For electra
cd electra
bash run_cola.sh

Inference on CoLA task

I provide two Python files for inference on the CoLA task. Just modify the path to your model or checkpoint in inference.py before running prediction.

# For albert
# The average score of the three models on the CoLA test set is above 69.8.
cd albert
python inference.py

# For electra
# Almost all of the models score above 70.1 on the CoLA test set.
cd electra
python inference.py

Appendix

A screenshot of my test result is shown below: Result
