GithubHelp home page GithubHelp logo

longqlora's Introduction

LongQLoRA: Efficient and Effective Method to Extend Context Length of LLMs

Technical Report

Technical Report: LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

Introduction

LongQLoRA is a memory-efficient and effective method to extend context length of Large Language Models with less training GPUs. On a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k. LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile dataset after only 1000 finetuning steps, our model outperforms LongLoRA and is very close to MPT-7B-8K.

Evaluation perplexity on PG19 validation and Proof-pile test datasets in evaluation context length of 8192:

Model PG19 Proof-pile
LLaMA2-7B >1000 >1000
MPT-7B-8K 7.98 2.67
LongLoRA-LoRA-7B-8K 8.20 2.78
LongLoRA-Full-7B-8K 7.93 2.73
LongQLoRA-7B-8K 7.96 2.73

Evaluation perplexity of 7B models on PG19 validation and Proof-pile test datasets in evaluation context length from 1024 to 8192:

Dataset

We sample about 54k long text from Redpajama dataset to finetune pretrained models, whose token lengths ranging from 4096 to 32768.

We also build a long context instruction dataset for supervised finetuning chat models. This dataset contains 39k instruction data, mainly including book summarization, Natural Questions, subset of LongQA and Evol-Instruct of WizardLM. In order to adapt to the target length of 8192, the max token number of each data is 8192. The distribution is as follows.

Dataset Description
๐Ÿค—LongQLoRA-Pretrain-Data-54k Include 54212 data, used to finetune pretrained model
๐Ÿค—LongQLoRA-SFT-Data-39k Include 38821 data, used to finetune chat model

Model

Model Context Length Description
๐Ÿค—LongQLoRA-Llama2-7b-8k 8192 Finetuned with LongQLoRA-Pretrain-Data-54k for 1k steps based on LLaMA2-7B
๐Ÿค—LongQLoRA-Vicuna-13b-8k 8192 Finetuned with LongQLoRA-SFT-Data-39k for 1.7k steps based on Vicuna-13B-V1.5
๐Ÿค—LongQLoRA-Llama2-7b-8k-lora 8192 LoRA weights
๐Ÿค—LongQLoRA-Vicuna-13b-8k-lora 8192 LoRA weights

Training

The training configs are saved in the train_args directory, some parameters are as follows:

  • sft: Do sft task if set as True, otherwise do pretraining task.
  • model_max_length: The target context length.
  • max_seq_length: The max sequence length in training, should be less than or equal to model_max_length
  • logging_steps: Log training loss every n steps.
  • save_steps: Save model every n steps.
  • lora_rank: The LoRA rank in training.

Extend context length of pretrained model LLaMA2-7B:

deepspeed train.py --train_args_file ./train_args/llama2-7b-pretrain.yaml

Extend context length of chat Model Vicuna-13B:

deepspeed train.py --train_args_file ./train_args/vicuna-13b-sft.yaml

Inference

You can merge the lora weight to base model:

cd script
python merge_lora.py

Inference with pretrained model:

cd script/inference
python inference.py

Chat with chat model:

cd script/inference
python chat.py

Evaluation

Download the evaluation dataset tokenized by LLaMA2 by LongLoRA.

Dataset
๐Ÿค—PG19-validation.bin
๐Ÿค—PG19-test.bin
๐Ÿค—Proof-pile-test.bin

Evaluate the perplexity of models. You can set load_in_4bit as True to save memory:

cd script/evaluate
python evaluate.py \
      --batch_size 1 \
      --base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
      --seq_len 8192 \
      --context_size 8192 \
      --sliding_window 8192 \
      --data_path pg19-validation.bin

Evaluate the perplexity of models with LoRA weights:

cd script/evaluate
python evaluate.py \
      --batch_size 1 \
      --base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
      --peft_model LongQLoRA-Llama2-7b-8k-lora\
      --seq_len 8192 \
      --context_size 8192 \
      --sliding_window 8192 \
      --data_path pg19-validation.bin

Examples

The examples generated by LongQLoRA-Vicuna-13b-8k ars as follows.

Examples of long context generartion, the input context lengths are between 4096 and 8192 which are larger than original context length of LLaMA2.

Examples of short context generartion, model keep the performance of short instruction following.

Citation

@misc{yang2023longqlora,
      title={LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models}, 
      author={Jianxin Yang},
      year={2023},
      eprint={2311.04879},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

longqlora's People

Contributors

yangjianxin1 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.