
imvijay23 / apiq


This project is forked from baohaoliao/apiq.


Finetuning of low-bit quantized large language model

License: Apache License 2.0

Shell 3.12% Python 96.88%


ApiQ

Finetuning of 2-Bit Quantized Large Language Model

arXiv: 2402.05147

ApiQ is a framework for quantizing and finetuning an LLM in low-bit format. It can:

  • act as a post-training quantization framework, achieving superior performance at various bit levels
  • finetune the quantized model, saving GPU memory while obtaining superior finetuning results

Supports

  • ApiQ-bw for quantizing Llama-2 and Mistral-7B-v0.1 in 4, 3 and 2 bits
  • Finetuning of real/fake quantized LLMs on WikiText-2, GSM8K, four arithmetic reasoning tasks and eight commonsense reasoning tasks

News

  • [2024.06.19] Release of code


Install

conda create -n apiq python=3.10 -y
conda activate apiq
git clone https://github.com/BaohaoLiao/ApiQ.git
cd ApiQ
pip install --upgrade pip 
pip install -e .

To finetune a real quantized LLM, ApiQ uses the kernel from AutoGPTQ. Install AutoGPTQ and optimum as follows:

git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install gekko
pip install -vvv --no-build-isolation -e .
# Quote the version specifier so the shell does not treat > as an output redirect
pip install "optimum>=0.20.0"

Model Zoo

We provide fake/real and symmetrically/asymmetrically quantized models on Hugging Face.

  • fake: the weights are quantized but still stored in FP16
  • real: the weights are stored in GPTQ format
  • symmetric: the quantization is symmetric; compatible with vLLM
  • asymmetric: the quantization is asymmetric

Note:

  • To finetune a real quantized LLM, you need the real and symmetric version, because AutoGPTQ has a bug with asymmetric quantization (see discussion).
  • Fortunately, the difference between symmetric and asymmetric quantization is very small. All results in the paper use asymmetric quantization.
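The symmetric/asymmetric distinction comes down to how each group's scale and zero point are chosen. Below is a minimal sketch of group-wise round-to-nearest fake quantization, assuming GPTQ-style conventions (integer codes 0 to 2^bits - 1, with the zero point fixed at the midpoint in the symmetric case). `quantize_group` and `fake_quantize` are hypothetical helpers; ApiQ's own quantizer also trains the quantization parameters (saved in lwc.pth) and PEFT parameters, which this sketch omits.

```python
# Group-wise round-to-nearest fake quantization (illustrative sketch only).
# Assumption: GPTQ-style conventions; not ApiQ's implementation.


def quantize_group(weights, bits, symmetric):
    """Fake-quantize one group: return (dequantized weights, scale, zero point)."""
    maxq = 2 ** bits - 1  # integer codes are 0 .. maxq
    if symmetric:
        # Float range is symmetric around 0; the zero point is a fixed midpoint.
        xmax = max(abs(w) for w in weights)
        xmin = -xmax
    else:
        # Float range follows the group's own min/max (always including 0).
        xmin = min(min(weights), 0.0)
        xmax = max(max(weights), 0.0)
    scale = (xmax - xmin) / maxq if xmax > xmin else 1.0
    zero = (maxq + 1) // 2 if symmetric else round(-xmin / scale)
    # Round to the nearest code and clamp into the representable range.
    codes = [min(max(round(w / scale) + zero, 0), maxq) for w in weights]
    # "Fake" quantization: map the integer codes back to floats.
    dequantized = [scale * (q - zero) for q in codes]
    return dequantized, scale, zero


def fake_quantize(weights, bits=2, group_size=64, symmetric=True):
    """Quantize a flat list of weights group by group (cf. --wbits / --group_size)."""
    out = []
    for start in range(0, len(weights), group_size):
        dequantized, _, _ = quantize_group(weights[start:start + group_size], bits, symmetric)
        out.extend(dequantized)
    return out


if __name__ == "__main__":
    w = [-0.8, -0.3, 0.0, 0.25, 0.6, 0.9, -0.1, 0.45]
    for sym in (True, False):
        label = "symmetric" if sym else "asymmetric"
        print(label, fake_quantize(w, bits=2, group_size=4, symmetric=sym))
```

In the symmetric variant every group's grid is centered on 0 with a constant zero point; the asymmetric grid instead tracks each group's own minimum and maximum, which can fit skewed groups slightly better.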

Quantization

  1. Quantize an LLM on a GPU, as in ./scripts/quantize.sh:
SIZE=7
BIT=2
GS=64

SAVE_DIR=./model_zoos/Llama-2-${SIZE}b-hf-w${BIT}g${GS}-fake-sym
mkdir -p $SAVE_DIR

python ./apiq/main.py \
    --model_name_or_path meta-llama/Llama-2-${SIZE}b-hf \
    --lwc --wbits ${BIT} --group_size ${GS} \
    --epochs 20 --seqlen 2048 --nsamples 128 \
    --peft_lr 0.0005 --peft_wd 0.1 --lwc_lr 0.005 --lwc_wd 0.1 \
    --symmetric \
    --eval_ppl \
    --aug_loss \
    --save_dir $SAVE_DIR  

The quantization run writes the following files to --save_dir:

  • peft.pth: PEFT parameters
  • lwc.pth: quantization parameters
  • apiq_init folder: contains the files needed to finetune a PEFT model
  • Other files: the quantized LLM in FP16 format, tokenizer files, etc.
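The evaluation and conversion steps pass this directory to --resume, so it can help to confirm that a quantization run actually produced these outputs first. A minimal stdlib sketch; the script name check_apiq_outputs.py and the helper `missing_apiq_outputs` are hypothetical, and the expected names are taken from the list above.

```python
import sys
from pathlib import Path

# Outputs listed above for a quantization run of ./apiq/main.py.
EXPECTED_FILES = ("peft.pth", "lwc.pth")
EXPECTED_DIRS = ("apiq_init",)


def missing_apiq_outputs(save_dir):
    """Return the names of expected outputs that are absent from save_dir."""
    root = Path(save_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Usage: python check_apiq_outputs.py SAVE_DIR  (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_apiq_outputs(target)
    print("missing: " + ", ".join(missing) if missing else "all expected outputs found")
```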
  2. Evaluate a quantized LLM with peft.pth and lwc.pth. After quantization, you can evaluate the model again with --resume:
SIZE=7
BIT=2
GS=64

SAVE_DIR=./model_zoos/Llama-2-${SIZE}b-hf-w${BIT}g${GS}-fake-sym

# --epochs is set to 0, so no further training is run; the saved parameters are loaded via --resume
python ./apiq/main.py \
    --model_name_or_path meta-llama/Llama-2-${SIZE}b-hf \
    --lwc --wbits ${BIT} --group_size ${GS} \
    --epochs 0 --seqlen 2048 --nsamples 128 \
    --symmetric \
    --eval_ppl \
    --save_dir $SAVE_DIR  \
    --resume $SAVE_DIR
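The --eval_ppl flag reports perplexity. Assuming the standard definition, perplexity is the exponential of the average negative log-likelihood per token, pooled over all evaluation windows (each of length --seqlen). A minimal sketch; `perplexity` is a hypothetical helper, not part of ApiQ, and the log-probabilities are toy values.

```python
import math


def perplexity(window_logprobs):
    """Token-level perplexity from per-window lists of natural-log probabilities.

    Assumption: standard definition, exp(total NLL / total tokens),
    pooled over all evaluation windows.
    """
    total_nll = 0.0
    total_tokens = 0
    for logprobs in window_logprobs:
        total_nll -= sum(logprobs)  # negative log-likelihood of this window
        total_tokens += len(logprobs)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return math.exp(total_nll / total_tokens)


if __name__ == "__main__":
    # A model that gives every token probability 1/4 has perplexity 4.
    windows = [[math.log(0.25)] * 3, [math.log(0.25)] * 2]
    print(perplexity(windows))  # approximately 4.0
```

Pooling the NLL over all tokens (rather than averaging per-window perplexities) keeps the result independent of how the text is split into windows of equal length.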
  3. Convert the fake quantized LLM to a real quantized LLM in GPTQ format (this only works for symmetric quantization):
SIZE=7
BIT=2
GS=64

RESUME_DIR=./model_zoos/Llama-2-${SIZE}b-hf-w${BIT}g${GS}-fake-sym
SAVE_DIR=./model_zoos/Llama-2-${SIZE}b-hf-w${BIT}g${GS}-real-sym
mkdir -p $SAVE_DIR

# --epochs is set to 0, so no further training is run; the saved parameters are loaded via --resume
python ./apiq/main.py \
    --model_name_or_path meta-llama/Llama-2-${SIZE}b-hf \
    --lwc --wbits ${BIT} --group_size ${GS} \
    --epochs 0 --seqlen 2048 --nsamples 128 \
    --symmetric \
    --eval_ppl \
    --save_dir $SAVE_DIR  \
    --resume $RESUME_DIR \
    --convert_to_gptq --real_quant
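A real GPTQ-format checkpoint stores integer codes packed into 32-bit words, plus per-group quantization parameters, which is why the w2g64 model costs slightly more than 2 bits per weight. The sketch below shows the general bit-packing idea and a back-of-the-envelope storage estimate. Assumptions: little-endian code order within a word, and one FP16 scale plus one `bits`-bit zero point per group; the exact AutoGPTQ tensor layout is not reproduced, and `pack`, `unpack`, and `effective_bits` are hypothetical helpers.

```python
# Illustrative sketch of low-bit packing and storage cost; not AutoGPTQ's code.


def pack(codes, bits):
    """Pack integer codes in [0, 2**bits) into a list of 32-bit words."""
    if 32 % bits != 0:
        raise ValueError("this sketch only handles bit widths that divide 32 (e.g. 2, 4)")
    per_word = 32 // bits
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for i, code in enumerate(codes[start:start + per_word]):
            if not 0 <= code < 2 ** bits:
                raise ValueError(f"code {code} does not fit in {bits} bits")
            word |= code << (bits * i)  # i-th code occupies bits [bits*i, bits*(i+1))
        words.append(word)
    return words


def unpack(words, bits, count):
    """Inverse of pack(): recover the first `count` codes."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    codes = [(word >> (bits * i)) & mask for word in words for i in range(per_word)]
    return codes[:count]


def effective_bits(bits, group_size, scale_bits=16, zero_bits=None):
    """Average stored bits per weight, including the per-group scale and zero point."""
    if zero_bits is None:
        zero_bits = bits
    return bits + (scale_bits + zero_bits) / group_size


if __name__ == "__main__":
    codes = [3, 0, 1, 2] * 8  # 32 two-bit codes fit in two 32-bit words
    words = pack(codes, bits=2)
    print(len(words), unpack(words, bits=2, count=len(codes)) == codes)  # prints: 2 True
    # BIT=2, GS=64: 2 + (16 + 2) / 64 = 2.28125 bits per weight.
    print(effective_bits(2, 64))
```

GPTQ's 3-bit format packs codes across word boundaries, which this simple scheme does not attempt.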

Finetuning

  1. WikiText-2
bash ./scripts/train_clm.sh
  2. GSM8K
bash ./scripts/train_test_gsm8k.sh
  3. Arithmetic / commonsense reasoning
# Download the training and test sets
bash ./scripts/download_datasets.sh

# Finetune
bash ./scripts/train_multitask.sh
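All three recipes train PEFT parameters on top of the frozen quantized model. The quantization command takes --peft_lr and --peft_wd and writes peft.pth, which suggests the adapters are already fitted during quantization and serve as the starting point for finetuning (via apiq_init). The sketch below assumes LoRA-style adapters and shows only the forward pass of one adapted linear layer; `matvec` and `lora_forward` are hypothetical names, and plain lists stand in for tensors.

```python
# LoRA-style forward pass over a frozen (fake-)quantized weight (illustrative sketch).
# Assumptions: A is (rank x in), B is (out x rank), y = W_q x + (alpha / rank) * B(Ax).


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w_q, lora_a, lora_b, x, alpha):
    """Frozen quantized path plus a low-rank correction."""
    rank = len(lora_a)
    base = matvec(w_q, x)                      # W_q would stay frozen during finetuning
    delta = matvec(lora_b, matvec(lora_a, x))  # only A and B would be trained
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    w_q = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 "quantized" weight
    lora_a = [[1.0, 1.0]]           # rank-1 factors
    lora_b = [[1.0], [0.0]]
    print(lora_forward(w_q, lora_a, lora_b, x=[2.0, 3.0], alpha=1.0))  # prints: [7.0, 3.0]
```

Because only the small A and B factors are updated, the memory cost of finetuning is dominated by the low-bit base weights rather than by FP16 copies of them.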

Acknowledgement

Citation

If you find ApiQ or our code useful, please cite our paper:

@misc{ApiQ,
      title={ApiQ: Finetuning of 2-Bit Quantized Large Language Model}, 
      author={Baohao Liao and Christian Herold and Shahram Khadivi and Christof Monz},
      year={2024},
      eprint={2402.05147},
      archivePrefix={arXiv}
}

Contributors

baohaoliao
