
This project was forked from kwchurch/gft.


General Fine-Tuning: A little language for Deep Nets (ACL-2022 Tutorial)

License: Apache License 2.0



gft (general fine-tuning): A Little Language for Deepnets

1-line programs for fine-tuning, inference and more

Quick Links

  1. Papers
    1. ACL-2022 Tutorial
    2. JNLE
  2. Videos 📽️
    1. 📽️ 10 minute TEASER
    2. 🆕📽️ First half of ACL-2022 Tutorial (1 hour 16 minutes) UNABRIDGED
  3. Installation
  4. Documentation
  5. ACL-2022 Tutorial

Four Functions and Four Arguments

gft contains 4 main functions:

  1. gft_fit: fit a pretrained model to data (aka fine-tuning)
  2. gft_predict: apply a model to inputs (aka inference)
  3. gft_eval: score a model on a split of a dataset
  4. gft_summary: find good stuff (popular models and datasets) and explain what's in them

These gft functions share 4 main arguments (most arguments supported by the underlying hubs are also accepted):

  1. data: standard datasets hosted on hubs such as HuggingFace or PaddleNLP, or custom datasets on the local filesystem
  2. model: standard models hosted on hubs such as HuggingFace or PaddleNLP, or custom models on the local filesystem
  3. equation: a string such as "classify: label ~ text", where classify is a task, and label and text refer to columns in a dataset (see the sketch after this list)
  4. task: classify, classify_tokens, classify_spans, classify_audio, classify_images, regress, text-generation, translation, ASR, fill-mask
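
For example, an equation can name more than one input column on its right-hand side. The sketch below is illustrative rather than documented usage: it assumes gft accepts the GLUE STS-B dataset as H:glue,stsb (columns label, sentence1, sentence2), and it uses only flags that appear elsewhere in this README.

# hypothetical: fit a regression over two input columns (STS-B)
gft_fit --eqn 'regress: label ~ sentence1 + sentence2' \
	--model H:bert-base-cased \
	--data H:glue,stsb \
	--output_dir $outdir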

A Few Simple Examples

Here are some simple examples:

emodel=H:bhadresh-savani/roberta-base-emotion

# Summarize a dataset and/or model
gft_summary --data H:emotion
gft_summary --model $emodel
gft_summary --data H:emotion --model $emodel

# find some popular datasets and models that contain "emotion"
gft_summary --data H:__contains__emotion --topn 5
gft_summary --model H:__contains__emotion --topn 5

# make predictions on inputs from stdin
echo 'I love you.' | gft_predict --task classify

# The default model (for the classification task) performs sentiment analysis
# The model, $emodel, outputs emotion classes (as opposed to POSITIVE/NEGATIVE)
echo 'I love you.' | gft_predict --task classify --model $emodel

# some other tasks (beyond classification)
echo 'I love New York.' | gft_predict --task H:token-classification
echo 'I <mask> you.' | gft_predict --task H:fill-mask

# make predictions on inputs from a split of a standard dataset
gft_predict --eqn 'classify: label ~ text' --model $emodel --data H:emotion --split test

# return a single score (as opposed to a prediction for each input)
gft_eval --eqn 'classify: label ~ text' --model $emodel --data H:emotion --split test

# Input a pre-trained model (bert) and output a post-trained model
gft_fit --eqn 'classify: label ~ text' \
	--model H:bert-base-cased \
	--data H:emotion \
	--output_dir $outdir
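
Because these are ordinary shell commands, they compose with ordinary shell constructs. A small sketch, assuming H:emotion follows the usual HuggingFace split names (train, validation, test):

# score the same model on every split of the dataset
for split in train validation test; do
    gft_eval --eqn 'classify: label ~ text' --model $emodel --data H:emotion --split $split
done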

Pre-Training, Fine-Tuning and Inference

The table below shows a 3-step recipe, which has become standard in the literature on deep nets.

Step   gft Support    Description     Time              Hardware
1                     Pre-Training    Days/Weeks        Large GPU Cluster
2      gft_fit        Fine-Tuning     Hours/Days        1+ GPUs
3      gft_predict    Inference       Seconds/Minutes   0+ GPUs

This repo provides support for step 2 (gft_fit) and step 3 (gft_predict). Most gft_fit and gft_predict programs are short (1 line), much shorter than examples such as these, which typically run to a few hundred lines of Python. With gft, users should not need to read or modify any Python code for steps 2 and 3 in the table above.
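
Putting steps 2 and 3 together, fine-tune-then-predict is two lines of shell. One caveat: the sketch assumes gft_predict accepts the checkpoint directory written by gft_fit as its --model argument; consult the documentation for the exact syntax for local models.

# step 2 (fine-tune), then step 3 (inference with the fine-tuned checkpoint)
gft_fit --eqn 'classify: label ~ text' --model H:bert-base-cased --data H:emotion --output_dir $outdir
echo 'I love you.' | gft_predict --task classify --model $outdir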

Step 1, pre-training, is beyond the scope of this work. We recommend starting with models from the HuggingFace and PaddleHub/PaddleNLP hubs, as illustrated in the examples above.

Citations, Documentation, etc.

To cite gft, please use the ACL-2022 tutorial paper and the JNLE article below.

@inproceedings{church-etal-2022-gentle,
    title = "A Gentle Introduction to Deep Nets and Opportunities for the Future",
    author = "Church, Kenneth  and
      Kordoni, Valia  and
      Marcus, Gary  and
      Davis, Ernest  and
      Ma, Yanjun  and
      Chen, Zeyu",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-tutorials.1",
    pages = "1--6",
    abstract = "The first half of this tutorial will make deep nets more accessible to a broader audience, following {``}Deep Nets for Poets{''} and {``}A Gentle Introduction to Fine-Tuning.{''} We will also introduce GFT (general fine tuning), a little language for fine tuning deep nets with short (one line) programs that are as easy to code as regression in statistics packages such as R using glm (general linear models). Based on the success of these methods on a number of benchmarks, one might come away with the impression that deep nets are all we need. However, we believe the glass is half-full: while there is much that can be done with deep nets, there is always more to do. The second half of this tutorial will discuss some of these opportunities.",
}

@article{church-etal-2022-gft,
    title = "Emerging trends: General fine-tuning (gft)",
    author = "Church, Kenneth and Cai, Xingyu and Ying, Yibiao and Chen, Zeyu and Xun, Guangxu and Bian, Yuchen",
    journal = "Natural Language Engineering",
    publisher = "Cambridge University Press",
    doi = "10.1017/S1351324922000237",
    year = "2022",
    pages = "1--17",
}

