thapasujit / xturingllmfinetuning

This project forked from stochasticai/xturing

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

Home Page: https://xturing.stochastic.ai

License: Apache License 2.0

Python 100.00%

xturingllmfinetuning's Introduction

Build, modify, and control your own personalized LLMs



xTuring provides fast, efficient and simple fine-tuning of open-source LLMs, such as Mistral, LLaMA, GPT-J, and more. By providing an easy-to-use interface for fine-tuning LLMs on your own data and application, xTuring makes it simple to build, modify, and control LLMs. The entire process can run on your own computer or in your private cloud, which keeps your data private and secure.

With xTuring you can:

  • Ingest data from different sources and preprocess them to a format LLMs can understand
  • Scale from single to multiple GPUs for faster fine-tuning
  • Leverage memory-efficient methods (e.g. INT4, LoRA fine-tuning) to reduce hardware costs by up to 90%
  • Explore different fine-tuning methods and benchmark them to find the best performing model
  • Evaluate fine-tuned models on well-defined metrics for in-depth analysis
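To make the preprocessing bullet above more concrete, here is a minimal sketch that reshapes Alpaca-style records into column-oriented lists. The input field names (`instruction`, `input`, `output`) follow the public Alpaca dataset. The output column names (`instruction`, `text`, `target`) and the helper `to_instruction_columns` are assumptions made for this illustration, so check the xTuring dataset documentation for the exact schema `InstructionDataset` expects.

```python
from typing import Dict, List


def to_instruction_columns(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Reshape row-oriented Alpaca-style records into column-oriented lists.

    Hypothetical helper: the output column names are an assumption,
    not a confirmed xTuring schema.
    """
    columns: Dict[str, List[str]] = {"instruction": [], "text": [], "target": []}
    for record in records:
        instruction = record.get("instruction", "").strip()
        target = record.get("output", "").strip()
        if not instruction or not target:
            continue  # skip records that lack an instruction or an answer
        columns["instruction"].append(instruction)
        columns["text"].append(record.get("input", "").strip())
        columns["target"].append(target)
    return columns


records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "", "input": "", "output": "dropped: no instruction"},
]
columns = to_instruction_columns(records)
print(columns)
```

Records with an empty instruction or answer are dropped rather than padded, so every remaining row is a complete training example.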

⚙️ Installation

pip install xturing

🚀 Quickstart

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the dataset
instruction_dataset = InstructionDataset("./examples/models/llama/alpaca_data")

# Initialize the model
model = BaseModel.create("llama_lora")

# Finetune the model
model.finetune(dataset=instruction_dataset)

# Perform inference
output = model.generate(texts=["Why LLM models are becoming so important?"])

print("Generated output by the model: {}".format(output))

You can find the data folder here.


🌟 What's new?

We are excited to announce the latest enhancements to our xTuring library:

  1. LLaMA 2 integration - You can use and fine-tune the LLaMA 2 model in different configurations through the GenericModel wrapper: off-the-shelf, off-the-shelf with INT8 precision, LoRA fine-tuning, LoRA fine-tuning with INT8 precision, and LoRA fine-tuning with INT4 precision. Alternatively, you can use the Llama2 class from xturing.models to test and fine-tune the model.
from xturing.models import Llama2
model = Llama2()

## or
from xturing.models import BaseModel
model = BaseModel.create('llama2')
  1. Evaluation - You can now evaluate any causal language model on any dataset. The only metric currently supported is perplexity.
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model
model = BaseModel.create('gpt2')

# Run the Evaluation of the model on the dataset
result = model.evaluate(dataset)

# Print the result
print(f"Perplexity of the evaluation: {result}")
  1. INT4 Precision - You can now use and fine-tune any LLM with INT4 Precision using GenericLoraKbitModel.
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Run the fine-tuning
model.finetune(dataset)
  1. CPU inference - LLM inference can now run on CPUs, including laptop CPUs. We integrated Intel® Extension for Transformers, which conserves memory by compressing the model with weight-only quantization algorithms and accelerates inference through its highly optimized kernels on Intel platforms.
# Make the necessary imports
from xturing.models import BaseModel

# Initialize the model: quantize it with weight-only algorithms
# and replace the linear layers with Itrex's qbits_linear kernel
model = BaseModel.create("llama2_int8")

# Once the model has been quantized, run inference directly
output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
  1. Batch integration - You can speed up the .generate() and .evaluate() functions by adjusting their 'batch_size' argument. A 'batch_size' greater than 1 typically improves processing efficiency.
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Generate outputs on desired prompts
outputs = model.generate(dataset = dataset, batch_size=10)
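To see why a larger `batch_size` can help, here is a minimal sketch of how prompts might be grouped before being handed to a model. The `batched` helper is hypothetical and is not xTuring's internal implementation. It only illustrates that 25 prompts with a batch size of 10 need 3 model calls instead of 25.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


prompts = [f"prompt {i}" for i in range(25)]
batches = list(batched(prompts, batch_size=10))

# Each batch would correspond to one model call; the last batch may be smaller.
print([len(batch) for batch in batches])
```

Larger batches use more memory per call, so in practice the best value depends on the model size and the available hardware.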

To see how this is applied in practice, explore the Llama LoRA INT4 working example.

For more detail, see the GenericModel working example available in the repository.
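The Evaluation item above reports perplexity. As a reminder of what that number means, here is a minimal sketch, assuming perplexity is computed as the exponential of the mean negative log-likelihood per token. That is the standard definition, but xTuring's own implementation may differ in details such as tokenization or padding handling. The function name and its input are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    `token_log_probs` holds the natural-log probability the model assigned
    to each reference token (hypothetical input for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every reference token is as
# uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Lower values are better: a perplexity close to 1 means the model assigned high probability to the reference text.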


CLI playground

$ xturing chat -m "<path-to-model-folder>"

UI playground

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground

dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("<model_name>")

model.finetune(dataset=dataset)

model.save("llama_lora_finetuned")

Playground().launch() ## launches localhost UI

📚 Tutorials


📊 Performance

Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the Alpaca dataset, which contains 52K instructions, for fine-tuning.

Hardware:

4xA100 40GB GPU, 335GB CPU RAM

Fine-tuning parameters:

{
  'maximum sequence length': 512,
  'batch size': 1,
}

| LLaMA-7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
| --- | --- | --- | --- |
| GPU | 33.5 GB | 23.7 GB | 21.9 GB |
| CPU | 190 GB | 10.2 GB | 14.9 GB |
| Time/epoch | 21 hours | 20 mins | 20 mins |

You can contribute your own performance results on other GPUs by creating an issue that lists your hardware specifications, memory consumption and time per epoch.
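The figures in the table above can be turned into relative savings with a few lines of arithmetic. The sketch below only restates the reported numbers and takes the "DeepSpeed + CPU Offloading" column as the baseline. The dictionary layout and the `compare_to_baseline` helper are made up for this illustration.

```python
# Figures copied from the table above (LLaMA-7B, 4xA100 40GB GPU).
results = {
    "DeepSpeed + CPU Offloading": {"gpu_gb": 33.5, "cpu_gb": 190.0, "minutes_per_epoch": 21 * 60},
    "LoRA + DeepSpeed": {"gpu_gb": 23.7, "cpu_gb": 10.2, "minutes_per_epoch": 20},
    "LoRA + DeepSpeed + CPU Offloading": {"gpu_gb": 21.9, "cpu_gb": 14.9, "minutes_per_epoch": 20},
}


def compare_to_baseline(name: str, baseline_name: str = "DeepSpeed + CPU Offloading") -> dict:
    """Return speedup and GPU memory saving (percent) relative to the baseline."""
    run, base = results[name], results[baseline_name]
    return {
        "speedup": base["minutes_per_epoch"] / run["minutes_per_epoch"],
        "gpu_saving_pct": round(100 * (1 - run["gpu_gb"] / base["gpu_gb"]), 1),
    }


for name in ("LoRA + DeepSpeed", "LoRA + DeepSpeed + CPU Offloading"):
    print(name, compare_to_baseline(name))
```

These ratios describe this one hardware setup and batch size only, so results on other GPUs may differ.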


📎 Fine-tuned model checkpoints

We have already fine-tuned some models that you can use as your base or start playing with. Here is how you would load them:

from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")

| Model | Dataset | Path |
| --- | --- | --- |
| DistilGPT-2 LoRA | alpaca | x/distilgpt2_lora_finetuned_alpaca |
| LLaMA LoRA | alpaca | x/llama_lora_finetuned_alpaca |

Supported Models

Below is a list of all models supported through xTuring's BaseModel class, along with the keys used to load them.

| Model | Key |
| --- | --- |
| Bloom | bloom |
| Cerebras | cerebras |
| DistilGPT-2 | distilgpt2 |
| Falcon-7B | falcon |
| Galactica | galactica |
| GPT-J | gptj |
| GPT-2 | gpt2 |
| LLaMA | llama |
| LLaMA 2 | llama2 |
| OPT-1.3B | opt |

The models listed above are the base variants of the LLMs. Below are the templates to get their LoRA, INT8, INT8 + LoRA and INT4 + LoRA versions.

| Version | Template |
| --- | --- |
| LoRA | <model_key>_lora |
| INT8 | <model_key>_int8 |
| INT8 + LoRA | <model_key>_lora_int8 |

Note: To load any model's INT4 + LoRA version, use the GenericLoraKbitModel class from xturing.models, as shown here:

from xturing.models import GenericLoraKbitModel
model = GenericLoraKbitModel('<model_path>')

Replace model_path with the path to a local directory or with the name of any Hugging Face Hub model, such as facebook/opt-1.3b.
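The key templates above can be captured in a small helper. This is only a sketch: `model_key`, `BASE_KEYS` and `SUFFIXES` are hypothetical names, not part of xTuring, and the helper simply applies the documented templates without checking that a given combination is actually registered in the library. The resulting string is what you would pass to `BaseModel.create`.

```python
# Base keys and suffix templates copied from the tables above.
BASE_KEYS = {
    "bloom", "cerebras", "distilgpt2", "falcon", "galactica",
    "gptj", "gpt2", "llama", "llama2", "opt",
}
SUFFIXES = {"base": "", "lora": "_lora", "int8": "_int8", "lora_int8": "_lora_int8"}


def model_key(base_key: str, version: str = "base") -> str:
    """Build a BaseModel key from a base key and a version (hypothetical helper)."""
    if base_key not in BASE_KEYS:
        raise ValueError(f"unknown base key: {base_key!r}")
    if version == "lora_int4":
        # INT4 + LoRA has no key template; it goes through GenericLoraKbitModel.
        raise ValueError("INT4 + LoRA is loaded with GenericLoraKbitModel, not a key")
    if version not in SUFFIXES:
        raise ValueError(f"unknown version: {version!r}")
    return base_key + SUFFIXES[version]


print(model_key("llama", "lora"))      # llama_lora
print(model_key("llama2", "int8"))     # llama2_int8
print(model_key("gptj", "lora_int8"))  # gptj_lora_int8
```

Rejecting `lora_int4` explicitly mirrors the note above: that variant is loaded from a model path rather than from a key.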

📈 Roadmap

  • Support for LLaMA, GPT-J, GPT-2, OPT, Cerebras-GPT, Galactica and Bloom models
  • Dataset generation using self-instruction
  • Low-precision LoRA fine-tuning and unsupervised fine-tuning
  • INT8 low-precision fine-tuning support
  • OpenAI, Cohere and AI21 Studio model APIs for dataset generation
  • Added fine-tuned checkpoints for some models to the hub
  • INT4 LLaMA LoRA fine-tuning demo
  • INT4 LLaMA LoRA fine-tuning with INT4 generation
  • Support for a Generic model wrapper
  • Support for Falcon-7B model
  • INT4 low-precision fine-tuning support
  • Evaluation of LLM models
  • INT3, INT2, INT1 low-precision fine-tuning support
  • Support for Stable Diffusion

🤝 Help and Support

If you have any questions, you can create an issue on this repository.

You can also join our Discord server and start a discussion in the #xturing channel.


📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🌎 Contributing

As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.
