mlabonne / llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Home Page: https://mlabonne.github.io/blog/

License: Apache License 2.0

Jupyter Notebook 100.00%
course large-language-models llm machine-learning roadmap

llm-course's Introduction

🐦 Follow me on X • 🤗 Hugging Face • 💻 Blog • 📙 Hands-on GNN


Hi, I'm a Machine Learning Scientist, Author, Blogger, and LLM Developer.

💼 Projects

🤗 Models

llm-course's People

Contributors

mlabonne, pitmonticone


llm-course's Issues

Not able to quantize after fine-tuning

I am not able to quantize, and I am getting this error:
FileNotFoundError: Could not find tokenizer.model in llama-2-7b-meditext or its parent; if it's in another directory, pass the directory as --vocab-dir
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5
What do I do?
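
A minimal workaround sketch (an assumption, not the notebook's official fix): the llama.cpp convert script needs the SentencePiece tokenizer.model next to the fine-tuned weights, so one option is to copy it from the base model repo before converting, or pass its directory via --vocab-dir as the error suggests. The base repo id below is a placeholder for whichever Llama-2 checkpoint was fine-tuned.

from huggingface_hub import hf_hub_download

# Download tokenizer.model from the base model repo (placeholder id) and place it
# in the fine-tuned model folder so the convert script can find it.
hf_hub_download(
    repo_id="NousResearch/Llama-2-7b-chat-hf",  # assumption: your base model
    filename="tokenizer.model",
    local_dir="llama-2-7b-meditext",
)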

Turkish Version

Hi! Is it okay with you if we try to do a similar one for Turkish users, e.g. referencing this repo and using its sources as well?
Best Regards,
Zaur Samedov

Please help, stuck with AutoGGUF

I tried to make GGUFs of different models (one that was already available and one that I made using LazyMergekit).

I always get the same error, however. It's this one (I edited the model name out, but it happens with both models I tested; they are Mistral-7B-based):

GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5, VMM: yes
main: build = 2151 (704359e2)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'ModelName/modelname.fp16.bin' to 'ModelName/modelname.Q4_K_S.gguf' as Q4_K_S
llama_model_quantize: failed to quantize: failed to open ModelName/modelname.fp16.bin: No such file or directory
main: failed to quantize model from 'ModelName/modelname.fp16.bin'

Also, before that error, I get another error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 4.25.2 which is incompatible.
torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchvision 0.16.0+cu121 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
Successfully installed gguf-0.6.0 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 protobuf-4.25.2 torch-2.1.2

WARNING: The following packages were previously imported in this runtime:
  [numpy]
You must restart the runtime in order to use newly installed versions.

Is there any solution? I would like to try the model I merged locally; I was even able to evaluate it on the leaderboard, but I can't turn it into GGUF.
Also, is there a dedicated GitHub page for that notebook?
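
A minimal debugging sketch (an assumption about the cause, not a confirmed fix): the quantize step fails because the fp16 GGUF was never written, which usually means the earlier convert step silently failed, often because the runtime was not restarted after the numpy upgrade shown in the pip log. Checking for the file before quantizing makes the failure obvious; the path below is a placeholder taken from the log.

from pathlib import Path

fp16_path = Path("ModelName/modelname.fp16.bin")  # placeholder path from the log
if not fp16_path.exists():
    # The conversion step did not produce the fp16 file: restart the runtime
    # after the pip installs and re-run the convert step before quantizing.
    raise FileNotFoundError(f"Missing {fp16_path}; re-run the fp16 conversion first.")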

Prompt is getting repeated in response

I tried to retrain the Llama-2 model, following the steps you mentioned. But when I generate text with the following code snippet:

prompt = "What is a large language model?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

I get a weird response like the one below:

[INST] What is a large language model? [/INST]
[INST] What is a large language model? [/INST]
[INST] What is a large language model? [INST]
[INST] What is a large language model? [INST]
[INST] What is a large language model? [INST] [INST] What is a large language model? [/INST]
[INST] What is a large language model? [INST] [INST] What is a large language model? [/INST] [INST] What is a large language model? [INST] [INST] What is a large language model? [/INST] [INST] What is a large language model? [INST] [INST] What is a large language model? [/INST] [INST] What is a large language model? [INST] [INST] What is a large language model? [/INST] [INST] What is a

What could be the issue?
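
A minimal generation-side sketch (assumptions: the repetition may come from greedy decoding and the echoed prompt; these are standard Hugging Face pipeline/generate options, not a confirmed fix for this notebook):

from transformers import pipeline

# Drop the echoed prompt and discourage loops during generation
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(
    f"[INST] {prompt} [/INST]",
    return_full_text=False,      # don't repeat the prompt in the output
    do_sample=True,              # sample instead of greedy decoding
    temperature=0.7,
    repetition_penalty=1.15,     # penalize repeated spans
    eos_token_id=tokenizer.eos_token_id,
)
print(result[0]["generated_text"])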

Cannot quantize after fine-tuning on Colab

Getting this error when quantizing after fine-tuning following the Colab instructions.

FileNotFoundError: Could not find tokenizer.model in llama-2-7b-meditext or its parent; if it's in another directory, pass the directory as --vocab-dir
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5
main: build = 1267 (bc9d3e3)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin' to 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.q4_k_m.bin' as Q4_K_M
llama_model_quantize: failed to quantize: failed to open llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin: No such file or directory
main: failed to quantize model from 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin'
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5
main: build = 1267 (bc9d3e3)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin' to 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.q5_k_m.bin' as Q5_K_M
llama_model_quantize: failed to quantize: failed to open llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin: No such file or directory
main: failed to quantize model from 'llama-2-7b-meditext/llama-2-7b-meditext.gguf.fp16.bin'

Can I translate it into Chinese?

Hi Maxime
Thank you very much for writing such a tutorial.

Your tutorial is the most outstanding one I have seen, with comprehensive coverage and very complete explanations and experiments. May I translate it into Chinese?

Collaboration: Unsloth + llm-course

Hey @mlabonne! Actually found this repo via LinkedIn! :) Happy New Year!

Had a look through your notebooks - they look sick! Interestingly, I was trying to run Axolotl via Google Colab myself, to no avail.

Anyways, I'm the maintainer of Unsloth, which makes QLoRA 2.2x faster and uses 62% less memory! It would be awesome if we could somehow collaborate :)

I have a few examples:

  1. Mistral 7b + Alpaca: https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing
  2. DPO Zephyr replication: https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing
  3. TinyLlama automatic RoPE Scaling from 2048 to 4096 tokens + full Alpaca dataset in 80 minutes. https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing (still running since TinyLlama was just released!)

Anyways great work again!

Hi, please tell me the approach to solve this problem

  1. You have to solve a multi-label classification problem statement.
  2. It contains two files: train.csv and test.csv.
  3. The dataset contains the following columns:
    • LossDescription: Description of Event
    • ResultingInjuryDesc: Injury Description
    • PartInjuredDesc: Body Part Injured Description
    • Cause - Hierarchy 1: Cause Hierarchy 1
    • Body Part - Hierarchy 1: Body Part Hierarchy 1
    • Index: Identifier
  4. Tasks:
    • Perform exploratory data analysis (EDA) on the dataset.
    • Train multi-label classification models to predict "Cause - Hierarchy 1" and "Body Part - Hierarchy 1" when other columns are given.
      Two models will be required to predict each target variable.
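
A minimal baseline sketch for this task (everything here is hypothetical: the column names come from the description above, the train.csv file name is assumed, and TF-IDF with logistic regression is just one reasonable starting point, with one classifier per target as the task requires):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

df = pd.read_csv("train.csv")  # assumption: file from the task description

# Combine the free-text columns into a single text feature
text = (
    df["LossDescription"].fillna("") + " "
    + df["ResultingInjuryDesc"].fillna("") + " "
    + df["PartInjuredDesc"].fillna("")
)

# One classifier per target variable, as the task requires
for target in ["Cause - Hierarchy 1", "Body Part - Hierarchy 1"]:
    clf = make_pipeline(TfidfVectorizer(max_features=50_000), LogisticRegression(max_iter=1000))
    clf.fit(text, df[target])
    print(target, "train accuracy:", clf.score(text, df[target]))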

Please specify a license

Hi, great articles, great Colabs, thanks!

My request: please specify a license for the repository so I would know if there are any limitations on the use of this code.

Cheers!

Issue after fine-tuning

Hi, I have fine-tuned on my custom dataset but I am having difficulties loading the model during inference. Can you help me with that?
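
A minimal inference sketch (assumptions: the fine-tuning followed the QLoRA notebook, so the saved artifact is a LoRA adapter; the names below are placeholders): reload the base model, attach the adapter with PeftModel, and optionally merge it for plain transformers inference.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "NousResearch/Llama-2-7b-chat-hf"   # assumption: your base model
adapter_dir = "llama-2-7b-miniguanaco"          # assumption: the saved adapter folder

base_model = AutoModelForCausalLM.from_pretrained(base_name, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_dir)
model = model.merge_and_unload()  # optional: bake the adapter into the weights

tokenizer = AutoTokenizer.from_pretrained(base_name)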

LazyAxolotl RunPod not running

... seemingly because the template is not found anymore.
You can use image_name="winglian/axolotl-runpod:main-latest",
without #template_id="eul6o46pab",
but then you get this in the container: ... ServerApp] Bad config encountered during initialization: /workspace is outside root contents directory
I currently have no time to look into this further.

Train on a custom dataset in PDF format

Hello, super interesting. I would like to train the model on my own data in PDF format.
How do I adapt the code? Instead of using the instruction dataset
dataset_name = "mlabonne/guanaco-llama2-1k", I would like to replace it with
dataset = doc.pdf, but that doesn't work.
Do you have an idea? Thanks!
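
A minimal data-preparation sketch (assumptions: pypdf is installed, doc.pdf and the chunking are placeholders; the SFT notebook expects a dataset with a "text" column, so the idea is to turn the PDF into such a dataset rather than passing the file directly):

from pypdf import PdfReader
from datasets import Dataset

# Extract raw text from the PDF (placeholder file name)
reader = PdfReader("doc.pdf")
pages = [page.extract_text() or "" for page in reader.pages]
full_text = "\n".join(pages)

# Naive chunking into training samples; adapt to your own formatting/prompt template
chunk_size = 2000
chunks = [full_text[i:i + chunk_size] for i in range(0, len(full_text), chunk_size)]

dataset = Dataset.from_dict({"text": chunks})
# ...then pass `dataset` to the trainer instead of load_dataset(dataset_name, ...)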

Issue with pad_token == eos_token : model not "learning when to stop"

Hey @mlabonne thanks a lot for the great resources!

I have been reading the Fine_tune_Llama_2_in_Google_Colab.ipynb notebook and I am encountering an issue.

Just to play around I have tried adapting your notebook to fine-tune a model to perform PII masking using this dataset (to do it very quickly I adapted the format such that examples look like this: <s>[INST] Mise à jour : l'heure de début de la thérapie physique a été modifiée à 8:46 AM. Lieu : Suite 348 Iva Junctions. Veuillez nous excuser pour le désagrément. [/INST] Mise à jour : l'heure de début de la thérapie physique a été modifiée à [TIME_1]. Lieu : [SECONDARYADDRESS_1] [STREET_1]. Veuillez nous excuser pour le désagrément. </s>).

After fine-tuning the model I noticed that it was continuously generating text, effectively never producing the EOS_TOKEN and thus only stopping at the max sequence length.

By looking online, it seems that this might be related to the default DataCollatorForLanguageModeling (which gets passed to the SFTTrainer class by default). During training with that collator, I think the PAD tokens are masked out and excluded from the loss computation, thus leading the model not to "learn when to stop", and I see that you have set the PAD token to be the same as the EOS token with the following lines:

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

Do you know if this might actually be the issue here, or do you have an idea for a fix? I tried to comment out the line where you set the two tokens to be the same, but in that case my model trains for a while and then the loss suddenly drops to 0, so something must be wrong!
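
A minimal sketch of one commonly suggested workaround (an assumption, not the notebook's endorsed fix): give the tokenizer a dedicated pad token distinct from EOS and resize the embeddings, so padding can be masked out without also masking the EOS token the model needs to learn.

from transformers import AutoTokenizer

# Use a dedicated pad token instead of reusing EOS
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.add_special_tokens({"pad_token": "<pad>"})
tokenizer.padding_side = "right"

model.resize_token_embeddings(len(tokenizer))       # account for the new token
model.config.pad_token_id = tokenizer.pad_token_id  # keep the config consistent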

DPO with Axolotl

It is possible to perform DPO with Axolotl. If I were to create a notebook for DPO fine-tuning, do you think it would be suitable for your repository?

Kernel is dying on Fine-tune Llama 2

Libraries & Versions:
Package Version

absl-py 1.4.0
accelerate 0.21.0
aiohttp 3.8.5
aiosignal 1.3.1
anyio 3.7.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.2.1
astunparse 1.6.3
async-lru 2.0.4
async-timeout 4.0.2
attrs 23.1.0
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.40.2
bleach 6.0.0
cachetools 5.3.1
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
cmake 3.27.1
comm 0.1.4
datasets 2.14.3
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
exceptiongroup 1.1.2
executing 1.2.0
fastjsonschema 2.18.0
filelock 3.12.2
flatbuffers 23.5.26
frozenlist 1.4.0
fsspec 2023.6.0
gast 0.4.0
google-auth 2.22.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.56.2
h5py 3.9.0
huggingface-hub 0.16.4
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.0.1
ipykernel 6.25.0
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 8.1.0
jedi 0.19.0
Jinja2 3.1.2
json5 0.9.14
jsonschema 4.18.6
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter-client 8.3.0
jupyter-console 6.6.3
jupyter-core 5.3.1
jupyter-events 0.7.0
jupyter-lsp 2.2.0
jupyter-server 2.7.0
jupyter-server-terminals 0.4.4
jupyterlab 4.0.4
jupyterlab-pygments 0.2.2
jupyterlab-server 2.24.0
jupyterlab-widgets 3.0.8
keras 2.10.0
Keras-Preprocessing 1.1.2
libclang 16.0.6
lit 16.0.6
Markdown 3.4.4
MarkupSafe 2.1.3
matplotlib-inline 0.1.6
mistune 3.0.1
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nbclient 0.8.0
nbconvert 7.7.3
nbformat 5.9.2
nest-asyncio 1.5.7
networkx 3.1
notebook 7.0.2
notebook-shim 0.2.3
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
opt-einsum 3.3.0
overrides 7.4.0
packaging 23.1
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
peft 0.4.0
pexpect 4.8.0
pickleshare 0.7.5
pip 20.0.2
pip-autoremove 0.10.0
pkg-resources 0.0.0
pkgutil-resolve-name 1.3.10
platformdirs 3.10.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
protobuf 3.19.6
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 12.0.1
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
Pygments 2.16.1
pyspark 3.4.1
python-dateutil 2.8.2
python-json-logger 2.0.7
python-version 0.0.2
pytz 2023.3
PyYAML 6.0.1
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
referencing 0.30.2
regex 2023.6.3
requests 2.31.0
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.9.2
rsa 4.9
safetensors 0.3.1
scipy 1.10.1
Send2Trash 1.8.2
setuptools 44.0.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
stack-data 0.6.2
sympy 1.12
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.6.2
tensorflow-estimator 2.10.0
tensorflow-io-gcs-filesystem 0.33.0
termcolor 2.3.0
terminado 0.17.1
tinycss2 1.2.1
tokenizers 0.13.3
tomli 2.0.1
torch 2.0.1
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.31.0
triton 2.0.0
trl 0.4.7
typing-extensions 4.5.0
tzdata 2023.3
urllib3 1.26.16
wcwidth 0.2.6
webencodings 0.5.1
websocket-client 1.6.1
Werkzeug 2.3.6
wheel 0.34.2
widgetsnbextension 4.0.8
wrapt 1.15.0
xxhash 3.3.0
yarl 1.9.2
zipp 3.16.2

Script:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"

# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"

# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

Error at trainer.train():

  • You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
  • Error operation not supported at line 351 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/pythonInterface.c
  • /arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

All fine-tuned models should be available for inference with HF TGI

model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

All fine-tuned models should be available for inference with HF TGI.
However, it showed NotSupportedError: Model fine-tuned mode is not available for inference with this client.
Is there any way to cope with this?
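
A minimal sketch of one common route (an assumption about the setup: the NotSupportedError comes from the client not serving an unmerged PEFT adapter): merge the adapter into the base weights and push the merged model plus tokenizer to the Hub, then point TGI or the client at that merged repo. The repo id below is a placeholder.

from peft import PeftModel

model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Push the merged weights and tokenizer so a TGI endpoint can load them directly
model.push_to_hub("your-username/llama-2-7b-merged")      # placeholder repo id
tokenizer.push_to_hub("your-username/llama-2-7b-merged")  # placeholder repo id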

LazyMergeKit - Tensor model.final_layernorm.weight required but not present in model ...

Hi there, I'm trying to merge Phi-2 models using the following config:

MODEL_NAME = "..."
yaml_config = """
models:
  - model: microsoft/phi-2
    # no parameters necessary for base model
  - model: rhysjones/phi-2-orange
    parameters:
      density: 0.5
      weight: 0.5
  - model: cognitivecomputations/dolphin-2_6-phi-2
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: microsoft/phi-2
parameters:
  normalize: true
dtype: float16
"""

but I get the following error:
RuntimeError: Tensor model.final_layernorm.weight required but not present in model rhysjones/phi-2-orange

I tried with lxuechen/phi-2-dpo before instead of phi-2-orange but got the same error.

I'm executing on Google Colab with a CPU runtime and trust_remote_code set to true.

Can someone help and tell me if I'm doing something wrong, or if it just doesn't work with Phi-2?

Here is the full log:
mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle --trust-remote-code
Warmup loader cache:   0% 0/3 [00:00<?, ?it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 9925.00it/s]
Warmup loader cache:  33% 1/3 [00:00<00:00, 5.18it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 71977.14it/s]
Warmup loader cache:  67% 2/3 [00:00<00:00, 5.58it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 31583.61it/s]
Warmup loader cache: 100% 3/3 [00:00<00:00, 5.69it/s]
  0% 1/2720 [00:00<00:02, 1276.42it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 76, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 90, in run_merge
    for _task, value in exec.run():
  File "/content/mergekit/mergekit/graph.py", line 191, in run
    res = task.execute(**arguments)
  File "/content/mergekit/mergekit/io/tasks.py", line 73, in execute
    raise RuntimeError(
RuntimeError: Tensor model.final_layernorm.weight required but not present in model rhysjones/phi-2-orange

4-bit LLM Quantization with GPTQ Tokenizer stuck

I'm trying to run the 4-bit LLM Quantization with GPTQ notebook with my own fine-tuned Llama2 7b model. However, it is getting stuck at the tokenizer step:

tokenized_data = tokenizer("\n\n".join(data['text']), return_tensors='pt')

I already tried using the tokenizer from the merged fine-tuned model as well as the tokenizer from the Llama-2 repo. However, it still hangs on this step. I would appreciate any help or tips on how to fix this.
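
A minimal sketch of a workaround to try (assumptions: the hang comes from tokenizing one huge concatenated string with a slow tokenizer; tokenizing examples individually with the fast tokenizer is standard Hugging Face usage, not a confirmed fix):

from transformers import AutoTokenizer

# Load the fast (Rust) tokenizer and tokenize examples one by one
# instead of joining the whole dataset into a single giant string.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)  # model_id: placeholder
tokenized_data = tokenizer(data["text"], truncation=True, max_length=2048)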

Error in mergeKit

File "/usr/local/bin/mergekit-moe", line 8, in <module>
  sys.exit(main())

File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
  return self.main(*args, **kwargs)

File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
  rv = self.invoke(ctx)

File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
  return ctx.invoke(self.callback, **ctx.params)

File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
  return __callback(*args, **kwargs)

File "/content/mergekit/mergekit/options.py", line 76, in wrapper
  f(*args, **kwargs)

File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 452, in main
  config = MistralMOEConfig.model_validate(yaml.safe_load(config_source))

File "/usr/local/lib/python3.10/dist-packages/pydantic/main.py", line 503, in model_validate
  return cls.__pydantic_validator__.validate_python(

pydantic_core._pydantic_core.ValidationError: 1 validation error for MistralMOEConfig
experts
  Field required [type=missing, input_value={'slices': [{'sources': [...}]}, 'dtype': 'float16'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
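
A minimal config sketch of the shape mergekit-moe expects (the validation error says the required experts field is missing, i.e. a regular merge config with models/slices was passed to the MoE script). Everything below is illustrative: the model names and prompts are placeholders, not recommendations.

base_model: mistralai/Mistral-7B-Instruct-v0.2   # placeholder base model
dtype: float16
experts:
  - source_model: your-org/expert-model-a        # placeholder expert
    positive_prompts: ["chat", "general questions"]
  - source_model: your-org/expert-model-b        # placeholder expert
    positive_prompts: ["math", "step-by-step reasoning"]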

`ref_model` not needed in `Fine_tune_a_Mistral_7b_model_with_DPO.ipynb`

Hi @mlabonne! Congratulations on your awesome work with this course 🤝🏻

After going through Fine_tune_a_Mistral_7b_model_with_DPO.ipynb, I realised that there's no need to define the ref_model required by DPO: when fine-tuning with LoRA, the reference model is not required, since the model without the adapters is used to compute the log-probs. So you can remove ref_model and the result will still be the same, while using even fewer resources.

Finally, as a tip, when using the DPOTrainer for full fine-tunes you can also specify precompute_ref_log_probs to compute those in advance before the actual fine-tune starts, so that the ref_model is not needed either.
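
A minimal sketch of what that looks like with trl's DPOTrainer (assumptions: the variable names mirror the notebook, and exact arguments vary between trl versions):

from trl import DPOTrainer

dpo_trainer = DPOTrainer(
    model=model,              # the policy model with LoRA adapters
    ref_model=None,           # no reference model needed when peft_config is set
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,  # trl uses the adapter-disabled model as the reference
)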

LazyMergeKit ERROR

mergekit-moe: command not found

mergekit-moe config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle --trust-remote-code
/bin/bash: line 1: mergekit-moe: command not found

RAG

Hi sir again! Do you plan to add some content for RAG? If not, I'd like to summarize some content and push it here.
Best Regards,

Error when fine-tuning an LLM using Axolotl

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to Accelerator is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an accelerate.DataLoaderConfiguration instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/content/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/content/axolotl/src/axolotl/train.py", line 104, in train
    trainer = setup_trainer(
  File "/content/axolotl/src/axolotl/utils/trainer.py", line 338, in setup_trainer
    return trainer_builder.build(total_num_steps)
  File "/content/axolotl/src/axolotl/core/trainer_builder.py", line 1245, in build
    trainer = trainer_cls(
  File "/content/axolotl/src/axolotl/core/trainer_builder.py", line 223, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 539, in __init__
    self.callback_handler = CallbackHandler(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer_callback.py", line 313, in __init__
    self.add_callback(cb)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer_callback.py", line 330, in add_callback
    cb = callback() if isinstance(callback, type) else callback
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 954, in __init__
    raise RuntimeError("MLflowCallback requires mlflow to be installed. Run pip install mlflow.")
RuntimeError: MLflowCallback requires mlflow to be installed. Run pip install mlflow.
Exception ignored in: <function MLflowCallback.__del__ at 0x7d8a76cbf400>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 1105, in __del__
    self._auto_end_run
AttributeError: 'MLflowCallback' object has no attribute '_auto_end_run'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'config.yaml']' returned non-zero exit status 1.

RuntimeError: Expected to mark a variable ready only once... error while finetuning Llama 2

I am following along with the "Fine-tune Llama 2 in Google Colab" example notebook in Databricks, but I am receiving this error when I attempt to fine-tune the model:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 127 has been marked as ready twice. This means that multiple autograd engine  hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

And here is the final block of the traceback:

File /databricks/python/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

I have tried turning off gradient checkpointing, but I received the same error. I am using a g4dn.4xl cluster. Is the problem due to my version of torch or CUDA? I'm not sure how to set the environment variable, and from what I've seen online it's not very helpful when dealing with these higher-level libraries (peft, transformers). Some solutions mention fiddling with find_unused_parameters and _set_static_graph(), but I believe that is at the PyTorch level and not a changeable parameter in the code as it stands.
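
A minimal sketch of settings that are commonly suggested for this DDP plus gradient-checkpointing conflict (all assumptions to experiment with, not a verified fix for this notebook; gradient_checkpointing_kwargs requires a recent transformers release):

from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # non-reentrant checkpointing plays nicer with DDP
    ddp_find_unused_parameters=False,                        # avoid extra autograd hooks under DDP
    # ...other arguments as in the notebook
)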

Any reason why the fine-tuning Llama notebook only runs on Colab?

I tried running the same notebook on a GCP A100 machine, and it failed on:

File ~/.local/lib/python3.9/site-packages/transformers/utils/bitsandbytes.py:109, in set_module_quantized_tensor_to_device(module, tensor_name, device, value, fp16_statistics)
    107     new_value = old_value.to(device)
    108 elif isinstance(value, torch.Tensor):
--> 109     new_value = value.to(device)
    110 else:
    111     new_value = torch.tensor(value, device=device)

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

On Colab it works perfectly.
Any idea?
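
A small diagnostic sketch (an assumption about a likely cause: another process or a previous run is already holding GPU memory on the GCP machine, since an A100 has far more memory than the Colab T4 this notebook fits on). Checking free memory before loading can confirm this; it is not a fix by itself.

import torch

free, total = torch.cuda.mem_get_info()  # bytes of free/total memory on the current device
print(f"GPU memory free: {free / 1e9:.1f} GB / {total / 1e9:.1f} GB")
# If most of the memory is already in use, stop stale processes (e.g. old notebooks)
# or restart the machine before loading the 4-bit model.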

Dependency Map and minimum path for each category

This repo is stunning! Kudos to the creators and maintainers, first and foremost!

I want to contribute a suggestion.

For each "path", add visual guidance that marks the minimal path, using colors or another approach.

Also, I want to know whether, as the course is currently organized, it's possible to start with the LLM Engineer path without the LLM Fundamentals path, since much of the audience is developers without math or data science skills who just want to create applications with APIs, vector DBs, and all the surrounding tools and techniques for using LLM models, without going deep into how LLMs work under the hood.

The best roadmap and a must-follow repo for this decade for everyone who needs to acquire knowledge in this field, or at least learn how to use AI and LLMs, or risk being unemployed in the near future. Sad but true.
