georgian-io / llm-finetuning-toolkit

Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

License: Apache License 2.0

Shell 1.20% Python 98.17% Dockerfile 0.52% Makefile 0.11%

classification fine-tuning finetuning large-language-models nlp nlp-machine-learning summarization falcon flan-t5 llama2

llm-finetuning-toolkit's Introduction

LLM Finetuning Toolkit

Overview

LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. From one single yaml config file, control all elements of a typical experimentation pipeline - prompts, open-source LLMs, optimization strategy and LLM testing.

Installation

pipx (recommended)

pipx installs the package and dependencies in a separate virtual environment

pipx install llm-toolkit

pip

pip install llm-toolkit

Quick Start

This guide contains 3 stages that will enable you to get the most out of this toolkit!

Basic: Run your first LLM fine-tuning experiment
Intermediate: Run a custom experiment by changing the components of the YAML configuration file
Advanced: Launch a series of fine-tuning experiments across different prompt templates, LLMs, and optimization techniques, all through one YAML configuration file

Basic

llmtune generate config
llmtune run ./config.yml

The first command generates a starter config.yml file and saves it in the current working directory. It gives users a quick way to get started and a base for further modification.

The second command then starts the fine-tuning process using the settings specified in that default YAML configuration file, config.yml.

Intermediate

The configuration file is the central piece that defines the behavior of the toolkit. It is written in YAML format and consists of several sections that control different aspects of the process, such as data ingestion, model definition, training, inference, and quality assurance. We highlight some of the critical sections.

Flash Attention 2

To enable Flash Attention 2 for supported models, first install flash-attn:

pipx

pipx inject llm-toolkit flash-attn --pip-args=--no-build-isolation

pip

pip install flash-attn --no-build-isolation

Then add the following to the config file:

model:
  torch_dtype: "bfloat16" # or "float16" if using older GPU
  attn_implementation: "flash_attention_2"

Data Ingestion

An example of what the data ingestion may look like:

data:
  file_type: "huggingface"
  path: "yahma/alpaca-cleaned"
  prompt: >-
    ### Instruction: {instruction}
    ### Input: {input}
    ### Output:
  prompt_stub: "{output}"
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42

While the above example illustrates using a public dataset from Hugging Face, the config file can also ingest your own data.

   file_type: "json"
   path: "<path to your data file>"

   file_type: "csv"
   path: "<path to your data file>"

The prompt fields help create instructions to fine-tune the LLM on. It reads data from specific columns, mentioned in {} brackets, that are present in your dataset. In the example provided, it is expected for the data file to have column names: instruction, input and output.
The prompt fields use both prompt and prompt_stub during fine-tuning. However, during testing, only the prompt section is used as input to the fine-tuned LLM.
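
As a rough illustration of the behaviour described above, the sketch below fills both templates from one dataset row with Python's built-in str.format. The helper name build_example and the sample row are hypothetical, and this is not the toolkit's actual implementation.

```python
def build_example(row, prompt, prompt_stub, is_test=False):
    """Fill the templates from one dataset row (hypothetical helper)."""
    text = prompt.format(**row)
    if not is_test:
        # The stub (the expected answer) is appended only for training examples.
        text += prompt_stub.format(**row)
    return text


# Made-up row with the three columns the example config expects.
row = {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
prompt = "### Instruction: {instruction} ### Input: {input} ### Output:"
prompt_stub = "{output}"

train_text = build_example(row, prompt, prompt_stub)
test_text = build_example(row, prompt, prompt_stub, is_test=True)
print(train_text)
print(test_text)
```

The training text ends with the ground-truth output, while the test text stops at "### Output:" so the model has to produce the answer itself.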

LLM Definition

model:
  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"

# LoRA Params -------------------
lora:
  task_type: "CAUSAL_LM"
  r: 32
  lora_dropout: 0.1
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - up_proj
    - down_proj
    - gate_proj

While the above example showcases using Llama2 7B, in theory, any open-source LLM supported by Hugging Face can be used in this toolkit.

hf_model_ckpt: "mistralai/Mistral-7B-v0.1"

hf_model_ckpt: "tiiuae/falcon-7b"

The parameters for LoRA, such as the rank r and dropout, can be altered.

lora:
  r: 64
  lora_dropout: 0.25

Quality Assurance

qa:
  llm_metrics:
    - length_test
    - word_overlap_test

To ensure that the fine-tuned LLM behaves as expected, you can add tests that check if the desired behaviour is being attained. Example: for an LLM fine-tuned for a summarization task, we may want to check if the generated summary is indeed smaller in length than the input text. We would also like to learn the overlap between words in the original text and generated summary.
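
To make the two checks concrete, here is a minimal sketch of what a length test and a word-overlap test could compute. The function names echo the config entries, but the logic (character counts, whitespace tokenisation) is an assumption for illustration, not the toolkit's own code.

```python
def length_test(input_text, summary):
    """Pass if the summary is shorter than the input, measured in characters."""
    return len(summary) < len(input_text)


def word_overlap(input_text, summary):
    """Fraction of distinct summary words that also appear in the input."""
    source_words = set(input_text.lower().split())
    summary_words = set(summary.lower().split())
    if not summary_words:
        return 0.0
    return len(source_words & summary_words) / len(summary_words)


text = "the quick brown fox jumps over the lazy dog"
summary = "quick fox jumps"
print(length_test(text, summary))
print(word_overlap(text, summary))
```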

Artifact Outputs

This config will run fine-tuning and save the results under directory ./experiment/[unique_hash]. Each unique configuration will generate a unique hash, so that our tool can automatically pick up where it left off. For example, if you need to exit in the middle of the training, by relaunching the script, the program will automatically load the existing dataset that has been generated under the directory, instead of doing it all over again.
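
One way such a hash could be derived is sketched below: serialise the config deterministically and hash the result. The function name config_hash, the use of SHA-256, and the six-character length are all assumptions; the toolkit's real scheme may differ.

```python
import hashlib
import json


def config_hash(config, length=6):
    """Short, deterministic identifier for a config dict (hypothetical scheme)."""
    # sort_keys makes the serialisation independent of key order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


config_a = {"lora": {"r": 32, "lora_dropout": 0.1}, "data": {"path": "yahma/alpaca-cleaned"}}
config_b = {"data": {"path": "yahma/alpaca-cleaned"}, "lora": {"lora_dropout": 0.1, "r": 32}}
config_c = {"lora": {"r": 64, "lora_dropout": 0.1}, "data": {"path": "yahma/alpaca-cleaned"}}

# Same settings in a different key order map to the same directory name.
print(config_hash(config_a) == config_hash(config_b))
```

Because the identifier depends only on the settings, relaunching with an unchanged config resolves to the same directory, which is what makes resuming possible.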

After the script finishes running you will see these distinct artifacts:

/dataset # generated pkl file in hf datasets format
/model # peft model weights in hf format
/results # csv of prompt, ground truth, and predicted values
/qa # csv of test results: e.g. vector similarity between ground truth and prediction

Once all the changes have been incorporated in the YAML file, you can simply use it to run a custom fine-tuning experiment!

llmtune run <path to custom YAML file>

Advanced

Fine-tuning workflows typically involve running ablation studies across various LLMs, prompt designs and optimization techniques. The configuration file can be altered to support running ablation studies.

Specify different prompt templates to experiment with while fine-tuning.

data:
  file_type: "huggingface"
  path: "yahma/alpaca-cleaned"
  prompt:
    - >-
      This is the first prompt template to iterate over
      ### Input: {input}
      ### Output:
    - >-
      This is the second prompt template
      ### Instruction: {instruction}
      ### Input: {input}
      ### Output:
  prompt_stub: "{output}"
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42

Specify various LLMs that you would like to experiment with.

model:
  hf_model_ckpt:
    [
      "NousResearch/Llama-2-7b-hf",
      "mistralai/Mistral-7B-v0.1",
      "tiiuae/falcon-7b",
    ]
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"

Specify different configurations of LoRA that you would like to ablate over.

lora:
  r: [16, 32, 64]
  lora_dropout: [0.25, 0.50]
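
Conceptually, list-valued settings like these expand into one experiment per combination. The sketch below shows that expansion with itertools.product; expand_grid is a hypothetical helper, and it naively treats every list as an ablation axis, which a real implementation would need to refine (target_modules, for example, is legitimately a list).

```python
from itertools import product


def expand_grid(options):
    """Expand list-valued settings into one dict per combination."""
    keys = list(options)
    # Scalars are wrapped so they behave like a single-choice axis.
    value_lists = [v if isinstance(v, list) else [v] for v in options.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


lora_grid = expand_grid(
    {"r": [16, 32, 64], "lora_dropout": [0.25, 0.50], "task_type": "CAUSAL_LM"}
)
for lora_config in lora_grid:
    print(lora_config)
```

Three ranks and two dropout values give six LoRA configurations; combined with several prompts and models, the number of runs grows multiplicatively.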

Extending

The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, fine-tuning, inference, and quality assurance testing, is designed to be easily extendable.

Contributing

Open-source contributions to this toolkit are welcome and encouraged. If you would like to contribute, please see CONTRIBUTING.md.

llm-finetuning-toolkit's Issues

quickstart basic - missing qa/llm_tests:?

Ran:
llmtune generate config
llmtune run ./config.yml

Things worked well (once I fixed my mistake with Mistral/huggingface repo permissions). The job ran very fast and put results into the "experiment" directory. But the experiment/XXX/results/ directory only has a "results.csv" file in it. I expected there to be results from the qa/llm_tests section in the config.yml file, which looks like this:
qa:
  llm_tests:
    - jaccard_similarity
    - dot_product
    - rouge_score
    - word_overlap
    - verb_percent
    - adjective_percent
    - noun_percent
    - summary_length

Do I have to do something extra to get the qa to run?

cuda device-side runtime error when training on custom dataset for JSON outputs

Describe the bug
A CUDA device-side runtime error occurs when attempting to train on this dataset: https://huggingface.co/datasets/azizshaw/text_to_json

To Reproduce
Steps to reproduce the behaviour:
Checkout main branch
Replace the data ingestion portion of llmtune/config.yml with:

data:
  file_type: "huggingface" # one of 'json', 'csv', 'huggingface'
  path: "azizshaw/text_to_json"
  prompt:
    >- # prompt, make sure column inputs are enclosed in {} brackets and that they match your data
    {instruction}
    Now create a json object for the following scenario
    {input}
  prompt_stub:
    >- # Stub to add for training at the end of prompt, for test set or inference, this is omitted; make sure only one variable is present
    {output}
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42

And then run

llmtune run llmtune/config.yml

Expected behavior
To my knowledge, this should run without error.

Environment:

OS: Ubuntu 20.04
running locally on a 3090
using the developer poetry environment/shell

This bug doesn't occur on the normal dataset, just on this other one. So, it could be something with a specific token or encoding in this dataset? Or there could be an issue with JSON outputs interfering with YAML syntax in the config.

[LLM Test] Exact Match

Evaluate exact match between ground truth vs predicted
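
A minimal sketch of what such a test could look like is shown below. The function name and the optional whitespace stripping are assumptions about the desired behaviour, not an existing toolkit API.

```python
def exact_match(ground_truth, prediction, strip=True):
    """Return True when the prediction equals the ground truth.

    With strip=True, leading and trailing whitespace is ignored, since
    generated text often carries a stray space or newline.
    """
    if strip:
        ground_truth, prediction = ground_truth.strip(), prediction.strip()
    return ground_truth == prediction


print(exact_match("positive", " positive\n"))
print(exact_match("positive", "Positive"))
```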

`pipx` installation doesn't work

Describe the bug
Having trouble installing with pipx

To Reproduce
Steps to reproduce the behavior:

brew install pipx
pipx install llm-toolkit

Expected behavior
Installs fine

Environment:

OS: MacOS

No output after running the command llmtune run ./config.yml

PS C:\Users\Administrator> llmtune run ./config.yml
C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\Administrator\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\Administrator\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] C:\Users\Administrator\AppData\Roaming\nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
───────────────────────────────── Loading Data ─────────────────────────────────
Loading formatted dataset from directory experiment\sHR8ns\dataset
╭─ Train Example - Raw ────────────────╮╭─ Inference Example - Raw ────────────╮
│ ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓ ││ ┏━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ │
│ ┃ output ┃ input ┃ instruc… ┃ ││ ┃ output ┃ input ┃ instruct… ┃ │
│ ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩ ││ ┡━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │
│ │ This is a │ The city │ Identify │ ││ │ Early, │ She left │ Rearrange │ │
│ │ grade 3 │ lights │ the │ ││ │ she left │ the │ the │ │
│ │ level │ that │ level of │ ││ │ the │ party │ following │ │
│ │ poem. │ spring to │ this │ ││ │ party. │ early │ sentence │ │
│ │ │ life, │ poem. │ ││ │ │ │ to make │ │
│ │ │ Bright │ │ ││ │ │ │ the │ │
│ │ │ fires to │ │ ││ │ │ │ sentence │ │
╰──────────────────────────────────────╯╰──────────────────────────────────────╯
╭─ Train Example - Formatted ──────────╮╭─ Inference Example - Formatted ──────╮
│ Below is an instruction that ││ Below is an instruction that │
│ describes a task. Write a response ││ describes a task. Write a response │
│ that appropriately completes the ││ that appropriately completes the │
│ request. ### Instruction: Identify ││ request. ### Instruction: Rearrange │
│ the level of this poem. ### Input: ││ the following sentence to make the │
│ The city lights that spring to life, ││ sentence more interesting. ### │
│ Bright fires to the endless night, ││ Input: She left the party early ### │
│ A promise of a better life ││ Output: │
│ That never quite can put things ││ │
│ right. ### Output:This is a grade 3 ││ │
│ level poem. ││ │
╰──────────────────────────────────────╯╰──────────────────────────────────────╯
Dataset Saved at experiment\sHR8ns\dataset
Post-Split data size:
Train: 500
Test: 25
──────────────────────────────── 😃 Finetuning ─────────────────────────────────
Fine-Tuned Model Found at experiment\sHR8ns\weights... skipping training
────────────────────────────────── 🧐 Testing ──────────────────────────────────
Inference Results Found at experiment\sHR8ns\results

Add data distributed training capabilities.

There is no good & easy-to-start end-to-end distributed training example on the web. Plus, there are so many ways of doing this: via raw PyTorch, via Ray Train, via TorchX, via Accelerate or via DeepSpeed.

How could I do this with the toolkit?

Publish the documentation on Netlify or GitHub Pages instead of the current solution.

It's much better to publish documentation on a dedicated static hosting solution.

https://docs.github.com/en/pages/getting-started-with-github-pages or https://medium.com/swlh/publish-a-static-website-in-a-day-with-mkdocs-and-netlify-3cc076d0efaf

[Toolkit] Parameterize config file path

Would be nice to be able to specify where the config file is.

`jsonl` support

see related issue: ICRAR/ijson#96

Change this file to include JSONL and pass the correct parameter as mentioned above. https://github.com/georgian-io/LLM-Finetuning-Toolkit/blob/main/llmtune/data/ingestor.py

[LoRA] Use Validation Set

If I have:

test_split: 0.1
train_split: 0.8

Maybe we can use the remainder, calc_val_split = 1 - 0.1 - 0.8 = 0.1, as the validation split. Maybe also apply something like min(calc_val_split, 0.05) to prevent the validation split from being too big.
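
The proposal can be sketched as follows. Capping the split means taking the smaller of the remainder and the limit, so the sketch uses min; the function name, the default cap of 0.05, and the rounding step are assumptions.

```python
def validation_split(train_split, test_split, cap=0.05):
    """Use whatever is left after train and test as validation, up to a cap."""
    # Round to absorb floating-point noise such as 1 - 0.9 - 0.1 being slightly negative.
    remainder = round(1.0 - train_split - test_split, 10)
    if remainder < 0:
        raise ValueError("train_split + test_split must not exceed 1")
    return min(max(remainder, 0.0), cap)


print(validation_split(0.8, 0.1))  # remainder is 0.1, capped at 0.05
print(validation_split(0.9, 0.1))  # nothing left over
```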

Make column variable format in prompt `{{ var }}` instead of `{ var }`

This improves the user experience when the desired output is JSON.

E.g.

prompt_stub:
  >-
  {
    "foo": {col_1},
    "bar": {col_2}
  }

The current approach would not work as we're capturing everything inside {}
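
To show why double braces help, the sketch below substitutes only {{ var }} placeholders with a regular expression and leaves single braces alone, so a JSON-shaped stub survives intact. The render function, the pattern, and the sample row are hypothetical and only illustrate the proposal.

```python
import json
import re

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, row):
    """Replace each {{ column }} placeholder with the row's value."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), template)


template = '{"foo": {{ col_1 }}, "bar": {{ col_2 }}}'
rendered = render(template, {"col_1": 1, "col_2": 2})
print(rendered)
print(json.loads(rendered))
```

The outer single braces are never matched by the pattern, so they pass through as literal JSON syntax.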

JSON output test

A lot of people want LLMs to output JSON for easy parsing/post-processing

A test that passes if the output is valid JSON, fails otherwise.
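
A minimal version of this check can be built on the standard library's json module, as sketched below. The function name is an assumption.

```python
import json


def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


print(is_valid_json('{"name": "Ada", "age": 36}'))
print(is_valid_json("{name: Ada}"))
```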

[README] Image not showing up on PyPI

Caused by using relative path instead of full link

How do you handle history?

Some of my examples contain historical chat logs. How should I ideally incorporate these into the input?

Add unit tests for the code.

We should cover most of our functionality with unit tests and report the coverage back to the README.

[CLI] Add `llmtune inference [experiment_dir]`

The command llmtune inference [experiment_dir] aims to provide a versatile interface for running inference on pre-trained language models, allowing users to:

Load and run inference on a dataset; or
Provide arbitrary text inputs for inference for spot checks; or
Specify specific inputs to be injected in prompt template for inference

Proposed CLI

llmtune inference [experiment_dir] [options]

Arguments

experiment_dir: The experiment directory from finetuning experiments

Options

--dataset [dataset_path]: Path to a dataset (e.g., CSV, JSON, or Huggingface)
--text-input [text]: An arbitrary text input to run inference on. This option can be used for a single text input or for quick manual inference.
--column [name=value]: Allows specification of a column name and value for custom inputs. This option can be used multiple times to specify different column values.

Examples

Inference on a dataset:

llmtune inference ./my_experiment --dataset ./data/my_dataset.csv

Inference on arbitrary text:

llmtune inference ./my_experiment --text-input "This is an example text input for inference."

Inference with specific input values:

llmtune inference ./my_experiment --column column_1="foo" --column column_2="bar"
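
The proposed interface could be prototyped as below. This sketch uses argparse from the standard library purely for illustration (the toolkit's actual CLI framework may differ), and parse_columns is a hypothetical helper for the repeatable --column name=value option.

```python
import argparse


def parse_columns(pairs):
    """Turn ['name=value', ...] into a dict, rejecting malformed entries."""
    columns = {}
    for pair in pairs or []:
        name, separator, value = pair.partition("=")
        if not separator or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        columns[name] = value
    return columns


parser = argparse.ArgumentParser(prog="llmtune inference")
parser.add_argument("experiment_dir")
parser.add_argument("--dataset")
parser.add_argument("--text-input")
# action="append" lets the option be given multiple times.
parser.add_argument("--column", action="append", metavar="NAME=VALUE")

args = parser.parse_args(
    ["./my_experiment", "--column", "column_1=foo", "--column", "column_2=bar"]
)
columns = parse_columns(args.column)
print(args.experiment_dir, columns)
```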

Related to: #160

example config file to run inference only on fine-tuned model

Is it possible to provide a config file that shows how to run inference on an already fine-tuned model?

I have run the starter config, and it looks like the final PEFT model weights are in experiment/XXX/weights/.

So how do I re-run inference only (and possibly qa checks) on that model?

Change `infer_all` to `infer_test_set` for inference module

Makes it clear that this method performs inference on the test set

Add ROADMAP section to the Readme

Add versioning convention for the package.

References:

https://py-pkgs.org/07-releasing-versioning.html
https://packaging.python.org/en/latest/discussions/versioning/
https://peps.python.org/pep-0440/

CLI to Generate Example `config.yml`

For better usability, instead of having users copy config.yml out of the source repo, we can write a simple script that downloads the file and writes it to the user's current working directory.

Remove unused `accelerate` code

There is a bunch of unused code relating to accelerate. We should remove it to keep the code cleaner.

[Feature Request]: Official Dockerfile / Docker-Compose

It would be nice if this project had an official Docker image and a docker-compose example - that would make trying it out easier for a lot of folks 😄

Rename the current Installation section to the Development section, and add a new Installation section with simplified instructions for installing the PyPI package

Quickstart Basic uses a very large model and is slow.

The basic "quickstart" example downloads Mistral-7B-Instruct-v0.2, which is ~15GB, taking me over 20 minutes to download. A smaller model should be used as a quickstart example.

To Reproduce
Steps to reproduce the behavior:

Follow the "basic" level of quickstart

The basic version of the quickstart should be, in my opinion, a 10 minute (max) process and not require so much disk space.

Environment:

OS: Ubuntu 22.04
Packages Installed

Add docker image package.

Make sure we have docker image for toolkit.

Latency and Throughput Calculation

How are latency and throughput measured for Llama 2 7B model inference benchmarking using TGI?

Trying to access gated repo error, Quickstart Basic

After installation, run:
llmtune generate config
==> works fine
llmtune run ./config.yml
==> get this error

OSError: You are trying to access a gated repo.
Make sure to request access at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 and pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>.

So then I do:
huggingface-cli login
===> login successfully
llmtune run ./config.yml
==> get same error

Any ideas?

Update README with instructions on how to pull the Docker image and install the PyPI package.

Make sure we have:

Tested PyPI package.
Tested Docker package.
Agreed on versioning.
Have stable release process.

[Dataset] Dataset Generation Always Returns Cached Version

Describe the bug
During dataset creation, the toolkit always loads the cached version of the dataset, even when the underlying data file has changed.

To Reproduce

Run toolkit.py
Ctrl-C
Add a line in the dataset
toolkit.py will not create a new dataset with desired changes

Expected behavior

Dataset to be generated with new data

Environment:

OS: Ubuntu

Add comment to indicate tf32 won't be available for older GPUs

Describe the bug
I'm trying to run this toolkit on colab notebook with T4 GPU and ran into errors. In order to get it working, I needed to turn bf16 and tf32 to false, and fp16 to true. There's already a note for the bf16 and fp16, maybe we can add a note for tf32 as well.
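
As a sketch of the kind of guidance such a note could encode, the helper below picks precision flags from a GPU's CUDA compute capability. It assumes that bf16 and tf32 need compute capability 8.0 or higher (Ampere and newer) and that the T4 reports 7.5; the function name is hypothetical. With PyTorch installed, torch.cuda.get_device_capability() returns a (major, minor) tuple that could be passed in.

```python
def precision_flags(compute_capability):
    """Suggest training precision flags for a (major, minor) compute capability."""
    major, _minor = compute_capability
    # Assumption: bf16 and tf32 are only usable on Ampere (8.x) or newer GPUs.
    ampere_or_newer = major >= 8
    return {
        "bf16": ampere_or_newer,
        "tf32": ampere_or_newer,
        "fp16": not ampere_or_newer,
    }


print(precision_flags((7, 5)))  # e.g. a T4
print(precision_flags((8, 0)))  # e.g. an A100
```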

Add a Makefile to simplify the execution of tests, styling, and other Bash commands.

Ensure that we include a Makefile containing all the necessary development commands, such as how to run tests, perform releases, and execute style checks, among others.

For a great example, see the Makefile at: https://github.com/huggingface/transformers/blob/main/Makefile

Add flash_attention_1/flash_attention_2 support & examples

Are we supporting flash_attention feature? https://github.com/Dao-AILab/flash-attention/tree/main

Add code coverage report for each PR, block PRs if coverage decreases

Reference:

https://coverage.readthedocs.io/en/7.4.4/
https://pypi.org/project/pytest-cov/

https://github.com/marketplace/actions/code-coverage-summary
https://github.com/marketplace/actions/code-coverage-report-difference

Allow custom train/test datasets

Is your feature request related to a problem? Please describe.
I'm working on a problem that requires me to split my data in a specific way (based on dates). Right now the config only allows a single dataset to be provided, and it internally does a train-test split based on the values provided for the test_size and train_size parameters.

Describe the solution you'd like
Ideally, an option to specify paths to both train and test data.

Describe alternatives you've considered
The alternative would be to add in support for other types of data splitting which I don't think makes sense for this repo to include.

Additional context
None

Add GitHub Actions CI for checking style, running tests, publishing Docker images, PyPI packages, and documentation.

Ensure all releases, style checks, and unit tests can be run via CI, blocking any PRs that fail CI.

For Docker packages, use: https://github.com/orgs/georgian-io/packages

For PyPI packages, use: https://pypi.org/

question about fine tuning falcon

Hello
I ran the falcon classification task using the following command:
!python falcon_classification.py --lora_r 64 --epochs 1 --dropout 0.1 # finetune Falcon-7B on newsgroup classification dataset
Upon inspecting the model, I find that many of the layers are full rank and not the lower rank

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("experiments/classification-sampleFraction-0.99_epochs-1_rank-64_dropout-0.1/assets")

Here is a screenshot showing this

Is this expected behavior ?

Add PyPI package.

Make sure we have a PyPI package for the toolkit.

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f648' in position 0: character maps to <undefined>

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f648' in position 0: character maps to <undefined>

During handling of the above exception, another exception occurred:

+--------------------- Traceback (most recent call last) ---------------------+
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\cli\toolki |
| t.py:122 in run |
| |
| 119 config = yaml.safe_load(file) |
| 120 config = Config(**config) |
| 121 |
| > 122 run_one_experiment(config, config_path) |
| 123 |
| 124 |
| 125 @generate_app.command("config") |
| |
| +-------------------------------- locals ---------------------------------+ |
| | config = Config( | |
| | save_dir='./experiment/', | |
| | ablation=AblationConfig( | |
| | use_ablate=False, | |
| | study_name='ablation' | |
| | ), | |
| | data=DataConfig( | |
| | file_type='json', | |
| | path='alpaca_data_cleaned.json', | |
| | prompt='Below is an instruction that describes a | |
| | task. Write a response that appropriat'+89, | |
| | prompt_stub='{output}', | |
| | train_size=500, | |
| | test_size=25, | |
| | train_test_split_seed=42 | |
| | ), | |
| | model=ModelConfig( | |
| | hf_model_ckpt='facebook/opt-125m', | |
| | device_map='auto', | |
| | torch_dtype='bfloat16', | |
| | attn_implementation=None, | |
| | quantize=True, | |
| | bitsandbytes=BitsAndBytesConfig( | |
| | load_in_8bit=False, | |
| | llm_int8_threshold=6.0, | |
| | llm_int8_skip_modules=None, | |
| | llm_int8_enable_fp32_cpu_offload=False, | |
| | llm_int8_has_fp16_weight=False, | |
| | load_in_4bit=True, | |
| | bnb_4bit_compute_dtype='bfloat16', | |
| | bnb_4bit_quant_type='nf4', | |
| | bnb_4bit_use_double_quant=True | |
| | ) | |
| | ), | |
| | lora=LoraConfig( | |
| | r=32, | |
| | task_type='CAUSAL_LM', | |
| | lora_alpha=64, | |
| | bias='none', | |
| | lora_dropout=0.1, | |
| | target_modules='all-linear', | |
| | fan_in_fan_out=False, | |
| | modules_to_save=None, | |
| | layers_to_transform=None, | |
| | layers_pattern=None | |
| | ), | |
| | training=TrainingConfig( | |
| | training_args=TrainingArgs( | |
| | num_train_epochs=1, | |
| | per_device_train_batch_size=4, | |
| | gradient_accumulation_steps=4, | |
| | gradient_checkpointing=True, | |
| | optim='paged_adamw_32bit', | |
| | logging_steps=1, | |
| | learning_rate=0.0002, | |
| | bf16=True, | |
| | tf32=True, | |
| | fp16=False, | |
| | max_grad_norm=0.3, | |
| | warmup_ratio=0.03, | |
| | lr_scheduler_type='constant', | |
| | save_steps=500 | |
| | ), | |
| | sft_args=SftArgs( | |
| | max_seq_length=1024, | |
| | neftune_noise_alpha=None | |
| | ) | |
| | ), | |
| | inference=InferenceConfig( | |
| | max_length=None, | |
| | max_new_tokens=256, | |
| | min_length=0, | |
| | min_new_tokens=None, | |
| | early_stopping=False, | |
| | max_time=None, | |
| | do_sample=True, | |
| | num_beams=1, | |
| | num_beam_groups=1, | |
| | penalty_alpha=None, | |
| | use_cache=True, | |
| | temperature=0.8, | |
| | top_k=50, | |
| | top_p=0.9, | |
| | typical_p=1.0, | |
| | epsilon_cutoff=0.0, | |
| | eta_cutoff=0.0, | |
| | diversity_penalty=0.0, | |
| | repetition_penalty=1.0, | |
| | encoder_repetition_penalty=1.0, | |
| | length_penalty=1.0, | |
| | no_repeat_ngram_size=0, | |
| | bad_words_ids=None, | |
| | force_words_ids=None, | |
| | renormalize_logits=False | |
| | ), | |
| | qa=QaConfig( | |
| | llm_tests=[ | |
| | 'jaccard_similarity', | |
| | 'dot_product', | |
| | 'rouge_score', | |
| | 'word_overlap', | |
| | 'verb_percent', | |
| | 'adjective_percent', | |
| | 'noun_percent', | |
| | 'summary_length' | |
| | ] | |
| | ) | |
| | ) | |
| | config_path = './config.yml' | |
| | configs = [ | |
| | { | |
| | 'save_dir': './experiment/', | |
| | 'ablation': {'use_ablate': False}, | |
| | 'data': { | |
| | 'file_type': 'json', | |
| | 'path': 'alpaca_data_cleaned.json', | |
| | 'prompt': 'Below is an instruction that | |
| | describes a task. Write a response that appropriat'+89, | |
| | 'prompt_stub': '{output}', | |
| | 'test_size': 25, | |
| | 'train_size': 500, | |
| | 'train_test_split_seed': 42 | |
| | }, | |
| | 'model': { | |
| | 'hf_model_ckpt': 'facebook/opt-125m', | |
| | 'torch_dtype': 'bfloat16', | |
| | 'quantize': True, | |
| | 'bitsandbytes': { | |
| | 'load_in_4bit': True, | |
| | 'bnb_4bit_compute_dtype': 'bfloat16', | |
| | 'bnb_4bit_quant_type': 'nf4' | |
| | } | |
| | }, | |
| | 'lora': { | |
| | 'task_type': 'CAUSAL_LM', | |
| | 'r': 32, | |
| | 'lora_alpha': 64, | |
| | 'lora_dropout': 0.1, | |
| | 'target_modules': 'all-linear' | |
| | }, | |
| | 'training': { | |
| | 'training_args': { | |
| | 'num_train_epochs': 1, | |
| | 'per_device_train_batch_size': 4, | |
| | 'gradient_accumulation_steps': 4, | |
| | 'gradient_checkpointing': True, | |
| | 'optim': 'paged_adamw_32bit', | |
| | 'logging_steps': 1, | |
| | 'learning_rate': 0.0002, | |
| | 'bf16': True, | |
| | 'tf32': True, | |
| | 'max_grad_norm': 0.3, | |
| | ... +2 | |
| | }, | |
| | 'sft_args': {'max_seq_length': 1024} | |
| | }, | |
| | 'inference': { | |
| | 'max_new_tokens': 256, | |
| | 'use_cache': True, | |
| | 'do_sample': True, | |
| | 'top_p': 0.9, | |
| | 'temperature': 0.8 | |
| | }, | |
| | 'qa': { | |
| | 'llm_tests': [ | |
| | 'jaccard_similarity', | |
| | 'dot_product', | |
| | 'rouge_score', | |
| | 'word_overlap', | |
| | 'verb_percent', | |
| | 'adjective_percent', | |
| | 'noun_percent', | |
| | 'summary_length' | |
| | ] | |
| | } | |
| | } | |
| | ] | |
| | dir_helper = <llmtune.utils.save_utils.DirectoryHelper object at | |
| | 0x000002B9A12FC550> | |
| | file = <_io.TextIOWrapper | |
| | name='experiment\eH6u\config\config.yml' mode='r' | |
| | encoding='cp1252'> | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\cli\toolki |
| t.py:46 in run_one_experiment |
| |
| 43 # Loading Data ------------------------------- |
| 44 RichUI.before_dataset_creation() |
| 45 |
| > 46 with RichUI.during_dataset_creation("Injecting Values into Prompt |
| 47 dataset_generator = DatasetGenerator(**config.data.model_dump |
| 48 |
| 49 _ = dataset_generator.train_columns |
| |
| +-------------------------------- locals ---------------------------------+ |
| | config = Config( | |
| | save_dir='./experiment/', | |
| | ablation=AblationConfig( | |
| | use_ablate=False, | |
| | study_name='ablation' | |
| | ), | |
| | data=DataConfig( | |
| | file_type='json', | |
| | path='alpaca_data_cleaned.json', | |
| | prompt='Below is an instruction that describes a | |
| | task. Write a response that appropriat'+89, | |
| | prompt_stub='{output}', | |
| | train_size=500, | |
| | test_size=25, | |
| | train_test_split_seed=42 | |
| | ), | |
| | model=ModelConfig( | |
| | hf_model_ckpt='facebook/opt-125m', | |
| | device_map='auto', | |
| | torch_dtype='bfloat16', | |
| | attn_implementation=None, | |
| | quantize=True, | |
| | bitsandbytes=BitsAndBytesConfig( | |
| | load_in_8bit=False, | |
| | llm_int8_threshold=6.0, | |
| | llm_int8_skip_modules=None, | |
| | llm_int8_enable_fp32_cpu_offload=False, | |
| | llm_int8_has_fp16_weight=False, | |
| | load_in_4bit=True, | |
| | bnb_4bit_compute_dtype='bfloat16', | |
| | bnb_4bit_quant_type='nf4', | |
| | bnb_4bit_use_double_quant=True | |
| | ) | |
| | ), | |
| | lora=LoraConfig( | |
| | r=32, | |
| | task_type='CAUSAL_LM', | |
| | lora_alpha=64, | |
| | bias='none', | |
| | lora_dropout=0.1, | |
| | target_modules='all-linear', | |
| | fan_in_fan_out=False, | |
| | modules_to_save=None, | |
| | layers_to_transform=None, | |
| | layers_pattern=None | |
| | ), | |
| | training=TrainingConfig( | |
| | training_args=TrainingArgs( | |
| | num_train_epochs=1, | |
| | per_device_train_batch_size=4, | |
| | gradient_accumulation_steps=4, | |
| | gradient_checkpointing=True, | |
| | optim='paged_adamw_32bit', | |
| | logging_steps=1, | |
| | learning_rate=0.0002, | |
| | bf16=True, | |
| | tf32=True, | |
| | fp16=False, | |
| | max_grad_norm=0.3, | |
| | warmup_ratio=0.03, | |
| | lr_scheduler_type='constant', | |
| | save_steps=500 | |
| | ), | |
| | sft_args=SftArgs( | |
| | max_seq_length=1024, | |
| | neftune_noise_alpha=None | |
| | ) | |
| | ), | |
| | inference=InferenceConfig( | |
| | max_length=None, | |
| | max_new_tokens=256, | |
| | min_length=0, | |
| | min_new_tokens=None, | |
| | early_stopping=False, | |
| | max_time=None, | |
| | do_sample=True, | |
| | num_beams=1, | |
| | num_beam_groups=1, | |
| | penalty_alpha=None, | |
| | use_cache=True, | |
| | temperature=0.8, | |
| | top_k=50, | |
| | top_p=0.9, | |
| | typical_p=1.0, | |
| | epsilon_cutoff=0.0, | |
| | eta_cutoff=0.0, | |
| | diversity_penalty=0.0, | |
| | repetition_penalty=1.0, | |
| | encoder_repetition_penalty=1.0, | |
| | length_penalty=1.0, | |
| | no_repeat_ngram_size=0, | |
| | bad_words_ids=None, | |
| | force_words_ids=None, | |
| | renormalize_logits=False | |
| | ), | |
| | qa=QaConfig( | |
| | llm_tests=[ | |
| | 'jaccard_similarity', | |
| | 'dot_product', | |
| | 'rouge_score', | |
| | 'word_overlap', | |
| | 'verb_percent', | |
| | 'adjective_percent', | |
| | 'noun_percent', | |
| | 'summary_length' | |
| | ] | |
| | ) | |
| | ) | |
| | config_path = './config.yml' | |
| | dir_helper = <llmtune.utils.save_utils.DirectoryHelper object at | |
| | 0x000002B9A12FD210> | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\ui\rich_ui |
| .py:28 in __exit__ |
| |
| 25 return self # This allows you to use variables from this con |
| 26 |
| 27 def __exit__(self, exc_type, exc_val, exc_tb): |
| > 28 self.task.__exit__(exc_type, exc_val, exc_tb) # Cleanly exit |
| 29 |
| 30 |
| 31 class LiveContext: |
| |
| +-------------------------------- locals ---------------------------------+ |
| | exc_tb = <traceback object at 0x000002B9A108ED00> | |
| | exc_type = <class 'UnicodeEncodeError'> | |
| | exc_val = UnicodeEncodeError('charmap', '\U0001f648 ', 0, 1, 'character maps | |
| | to <undefined>') | |
| | self = <llmtune.ui.rich_ui.StatusContext object at | |
| | 0x000002B9A12EFD90> | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\status.py:106 |
| in __exit__ |
| |
| 103 exc_val: Optional[BaseException], |
| 104 exc_tb: Optional[TracebackType], |
| 105 ) -> None: |
| > 106 self.stop() |
| 107 |
| 108 |
| 109 if __name__ == "__main__": # pragma: no cover |
| |
| +-------------------------------- locals ---------------------------------+ |
| | exc_tb = <traceback object at 0x000002B9A108ED00> | |
| | exc_type = <class 'UnicodeEncodeError'> | |
| | exc_val = UnicodeEncodeError('charmap', '\U0001f648 ', 0, 1, 'character maps | |
| | to <undefined>') | |
| | self = <rich.status.Status object at 0x000002B9FFA44A50> | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\status.py:91 |
| in stop |
| |
| 88 |
| 89 def stop(self) -> None: |
| 90 """Stop the spinner animation.""" |
| > 91 self._live.stop() |
| 92 |
| 93 def rich(self) -> RenderableType: |
| 94 return self.renderable |
| |
| +------------------------- locals -------------------------+ |
| | self = <rich.status.Status object at 0x000002B9FFA44A50> | |
| +----------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\live.py:147 |
| in stop |
| |
| 144 self._refresh_thread = None |
| 145 # allow it to fully render on the last even if overflow |
| 146 self.vertical_overflow = "visible" |
| > 147 with self.console: |
| 148 try: |
| 149 if not self._alt_screen and not self.console.is_j |
| 150 self.refresh() |
| |
| +----------------------- locals -----------------------+ |
| | self = <rich.live.Live object at 0x000002B9A12FC3D0> | |
| +------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\console.py:86 |
| 5 in __exit__ |
| |
| 862 |
| 863 def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any |
| 864 """Exit buffer context.""" |
| > 865 self._exit_buffer() |
| 866 |
| 867 def begin_capture(self) -> None: |
| 868 """Begin capturing console output. Call :meth:`end_capture` |
| |
| +---------------------- locals ----------------------+ |
| | exc_type = None | |
| | exc_value = None | |
| | self = | |
| | traceback = None | |
| +----------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\console.py:82 |
| 3 in _exit_buffer |
| |
| 820 def _exit_buffer(self) -> None: |
| 821 """Leave buffer context, and render content if required.""" |
| 822 self._buffer_index -= 1 |
| > 823 self._check_buffer() |
| 824 |
| 825 def set_live(self, live: "Live") -> None: |
| 826 """Set Live instance. Used by Live context manager. |
| |
| +------------------- locals --------------------+ |
| | self = | |
| +-----------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\console.py:20 |
| 27 in _check_buffer |
| |
| 2024 if self.no_color and self._color_system: |
| 2025 buffer = list(Segment.remove_color(b |
| 2026 |
| > 2027 legacy_windows_render(buffer, LegacyWind |
| 2028 else: |
| 2029 # Either a non-std stream on legacy Wind |
| 2030 text = self._render_buffer(self._buffer[ |
| |
| +-------------------------------- locals ---------------------------------+ |
| | buffer = [ | |
| | Segment( | |
| | 'Generating train split: ', | |
| | Style() | |
| | ), | |
| | Segment( | |
| | '0', | |
| | Style( | |
| | color=Color( | |
| | 'cyan', | |
| | ColorType.STANDARD, | |
| | number=6 | |
| | ), | |
| | bold=True, | |
| | italic=False | |
| | ) | |
| | ), | |
| | Segment(' examples ', Style()), | |
| | Segment('[', Style(bold=True)), | |
| | Segment( | |
| | '00:00', | |
| | Style( | |
| | color=Color( | |
| | 'bright_green', | |
| | ColorType.STANDARD, | |
| | number=10 | |
| | ), | |
| | bold=True | |
| | ) | |
| | ), | |
| | Segment(', ? examples/s', Style()), | |
| | Segment(']', Style(bold=True)), | |
| | Segment('\n'), | |
| | Segment( | |
| | '\U0001f648 ', | |
| | Style( | |
| | color=Color( | |
| | 'green', | |
| | ColorType.STANDARD, | |
| | number=2 | |
| | ) | |
| | ) | |
| | ), | |
| | Segment( | |
| | ' Injecting Values into Prompt', | |
| | Style() | |
| | ), | |
| | ... +6 | |
| | ] | |
| | fileno = 1 | |
| | legacy_windows_render = <function legacy_windows_render at | |
| | 0x000002B9A12F2B60> | |
| | LegacyWindowsTerm = <class | |
| | 'rich._win32_console.LegacyWindowsTerm'> | |
| | self = | |
| | use_legacy_windows_render = True | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\_windows_rend |
| erer.py:17 in legacy_windows_render |
| |
| 14 for text, style, control in buffer: |
| 15 if not control: |
| 16 if style: |
| > 17 term.write_styled(text, style) |
| 18 else: |
| 19 term.write_text(text) |
| 20 else: |
| |
| +-------------------------------- locals ---------------------------------+ |
| | buffer = [ | |
| | Segment('Generating train split: ', Style()), | |
| | Segment( | |
| | '0', | |
| | Style( | |
| | color=Color( | |
| | 'cyan', | |
| | ColorType.STANDARD, | |
| | number=6 | |
| | ), | |
| | bold=True, | |
| | italic=False | |
| | ) | |
| | ), | |
| | Segment(' examples ', Style()), | |
| | Segment('[', Style(bold=True)), | |
| | Segment( | |
| | '00:00', | |
| | Style( | |
| | color=Color( | |
| | 'bright_green', | |
| | ColorType.STANDARD, | |
| | number=10 | |
| | ), | |
| | bold=True | |
| | ) | |
| | ), | |
| | Segment(', ? examples/s', Style()), | |
| | Segment(']', Style(bold=True)), | |
| | Segment('\n'), | |
| | Segment( | |
| | '\U0001f648 ', | |
| | Style( | |
| | color=Color( | |
| | 'green', | |
| | ColorType.STANDARD, | |
| | number=2 | |
| | ) | |
| | ) | |
| | ), | |
| | Segment(' Injecting Values into Prompt', Style()), | |
| | ... +6 | |
| | ] | |
| | control = None | |
| | style = Style(color=Color('green', ColorType.STANDARD, number=2)) | |
| | term = <rich._win32_console.LegacyWindowsTerm object at | |
| | 0x000002B9A108EC90> | |
| | text = '\U0001f648 ' | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\_win32_consol |
| e.py:442 in write_styled |
| |
| 439 SetConsoleTextAttribute( |
| 440 self._handle, attributes=ctypes.c_ushort(fore | (back << |
| 441 ) |
| > 442 self.write_text(text) |
| 443 SetConsoleTextAttribute(self._handle, attributes=self._defaul |
| 444 |
| 445 def move_cursor_to(self, new_position: WindowsCoordinates) -> Non |
| |
| +-------------------------------- locals ---------------------------------+ |
| | back = 0 | |
| | bgcolor = None | |
| | color = Color('green', ColorType.STANDARD, number=2) | |
| | fore = 2 | |
| | self = <rich._win32_console.LegacyWindowsTerm object at | |
| | 0x000002B9A108EC90> | |
| | style = Style(color=Color('green', ColorType.STANDARD, number=2)) | |
| | text = '\U0001f648 ' | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\rich\_win32_consol |
| e.py:403 in write_text |
| |
| 400 Args: |
| 401 text (str): The text to write to the console |
| 402 """ |
| > 403 self.write(text) |
| 404 self.flush() |
| 405 |
| 406 def write_styled(self, text: str, style: Style) -> None: |
| |
| +-------------------------------- locals ---------------------------------+ |
| | self = <rich._win32_console.LegacyWindowsTerm object at | |
| | 0x000002B9A108EC90> | |
| | text = '\U0001f648 ' | |
| +-------------------------------------------------------------------------+ |
| |
| C:\ProgramData\anaconda3\envs\llm_tkit\Lib\encodings\cp1252.py:19 in encode |
| |
| 16 |
| 17 class IncrementalEncoder(codecs.IncrementalEncoder): |
| 18 def encode(self, input, final=False): |
| > 19 return codecs.charmap_encode(input,self.errors,encoding_table |
| 20 |
| 21 class IncrementalDecoder(codecs.IncrementalDecoder): |
| 22 def decode(self, input, final=False): |
| |
| +-------------------------------- locals ---------------------------------+ |
| | final = False | |
| | input = '\U0001f648 ' | |
| | self = <encodings.cp1252.IncrementalEncoder object at | |
| | 0x000002B9A5964D50> | |
| +-------------------------------------------------------------------------+ |
+-----------------------------------------------------------------------------+
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f648' in
position 0: character maps to <undefined>
Exception ignored in: <function tqdm.__del__ at 0x000002B9DEC44A40>
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\tqdm\std.py", line 1148, in __del__
self.close()
File "C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\tqdm\std.py", line 1277, in close
if self.last_print_t < self.start_t + self.delay:
^^^^^^^^^^^^^^^^^
AttributeError: 'tqdm' object has no attribute 'last_print_t'
!set PYTHONIOENCODINGS='utf-8'
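
The last frame of the traceback above is in `cp1252.py`, which suggests the console stream was using the Windows-1252 code page, and that code page has no mapping for the emoji `\U0001f648`. Two details may matter for the `set` attempt above: the environment variable Python reads is `PYTHONIOENCODING` (without the trailing S), and it is read when the interpreter starts, so it has to be set in the environment that launches `llmtune`. The following is a minimal standard-library sketch, not toolkit code; `can_encode` and `force_utf8_stdout` are hypothetical helper names, and whether reconfiguring stdout is enough for the spinner output has not been verified against llmtune.

```python
import sys

EMOJI = "\U0001f648 "  # the text that could not be written in the traceback


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def force_utf8_stdout() -> None:
    """Switch sys.stdout to UTF-8 if the stream supports reconfigure().

    Assumption: this runs before any console output is produced.
    Unencodable characters are replaced instead of raising.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding="utf-8", errors="replace")
```

Calling `can_encode(EMOJI, "cp1252")` reproduces the failing condition without a console, which can help confirm the diagnosis before changing any environment settings.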

Update Entrypoint of the CLI

See #115 (comment)

[RichUI] Better Dataset Generation Display

Is your feature request related to a problem? Please describe.

The dataset creation table always displays every column of the dataset, instead of only the ones needed by prompt and prompt_stub.
The table's highlighting uses string matching, which leads to odd output when values overlap.

Describe the solution you'd like

Fix these issues!
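
One possible direction, sketched with the standard library only: read the placeholder names out of the templates to decide which columns to show, and record character spans while formatting so highlighting does not rely on string matching. `placeholder_columns` and `format_with_spans` are hypothetical names, not existing toolkit functions, and the sketch assumes simple `{column}` placeholders such as `{instruction}`, `{input}` and `{output}`.

```python
from string import Formatter


def placeholder_columns(*templates: str) -> list[str]:
    """Return the unique {field} names used across the templates, in order."""
    seen: list[str] = []
    for template in templates:
        for _literal, field, _spec, _conv in Formatter().parse(template):
            # `field` is None for trailing literal text and "" for a bare "{}".
            if field and field not in seen:
                seen.append(field)
    return seen


def format_with_spans(
    template: str, row: dict[str, object]
) -> tuple[str, list[tuple[int, int, str]]]:
    """Fill the template and return (text, [(start, end, column), ...]).

    Each span marks where an injected value landed, so identical or
    overlapping values in different columns are still highlighted correctly.
    """
    parts: list[str] = []
    spans: list[tuple[int, int, str]] = []
    pos = 0
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(literal)
        pos += len(literal)
        if field:
            value = str(row[field])
            spans.append((pos, pos + len(value), field))
            parts.append(value)
            pos += len(value)
    return "".join(parts), spans
```

With the spans available, the display layer could style each range directly instead of searching the rendered prompt for the injected values.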

Allow users to set verbosity of outputs

Right now, debug outputs and warnings are suppressed in favor of a cleaner UI.
Users should be able to choose a more verbose output by running something like:

llmtune run --verbose
llmtune run -v

[Package] Rename source directory `src` to something more descriptive

Describe the solution you'd like

As specified in the title.

Training of FlanT5 for summarization

I tried following the same framework used for training the other LLMs (Falcon, Mistral, etc.) with SFTTrainer to train the FlanT5 model as well.
But the results are bad, as if the LLM doesn't learn anything.
Training it with the Seq2Seq method works. Why did you use this method for FlanT5 and SFTTrainer for all the other LLMs?

UnicodeEncodeError: 'charmap' codec can't encode character '\u2264' in position in command prompt

Generating on test set: 25/25
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ prompt ┃ ground truth ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Below is an instruction that │ The language of the text "Buonasera, │
│ describes a task. Write a response │ il mio nome e Giuseppe" is Italian. │
│ that appropriately completes the │ │
│ request. ### Instruction: Identify │ │
│ the language of the text "Buonasera, │ │
│ il mio nome e Giuseppe." ### Input: │ │
│ ### Output: │ │
└──────────────────────────────────────┴───────────────────────────────────────┘
Prediction >

Output: ### Input: ### Output: ### Input: ### Input: ### Output:

Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output: ### Output: ### Output: ###
Output: ### Output: ### Output: ### Output:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\cli\toolkit │
│ .py:122 in run │
│ │
│ 119 │ │ │ config = yaml.safe_load(file) │
│ 120 │ │ │ config = Config(**config) │
│ 121 │ │ │
│ ❱ 122 │ │ run_one_experiment(config, config_path) │
│ 123 │
│ 124 │
│ 125 @generate_app.command("config") │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ config = Config( │ │
│ │ │ save_dir='./experiment/', │ │
│ │ │ ablation=AblationConfig( │ │
│ │ │ │ use_ablate=False, │ │
│ │ │ │ study_name='ablation' │ │
│ │ │ ), │ │
│ │ │ data=DataConfig( │ │
│ │ │ │ file_type='huggingface', │ │
│ │ │ │ path='yahma/alpaca-cleaned', │ │
│ │ │ │ prompt='Below is an instruction that describes a │ │
│ │ task. Write a response that appropriat'+89, │ │
│ │ │ │ prompt_stub='{output}', │ │
│ │ │ │ train_size=500, │ │
│ │ │ │ test_size=25, │ │
│ │ │ │ train_test_split_seed=42 │ │
│ │ │ ), │ │
│ │ │ model=ModelConfig( │ │
│ │ │ │ hf_model_ckpt='facebook/opt-125m', │ │
│ │ │ │ device_map='auto', │ │
│ │ │ │ torch_dtype='bfloat16', │ │
│ │ │ │ attn_implementation=None, │ │
│ │ │ │ quantize=True, │ │
│ │ │ │ bitsandbytes=BitsAndBytesConfig( │ │
│ │ │ │ │ load_in_8bit=False, │ │
│ │ │ │ │ llm_int8_threshold=6.0, │ │
│ │ │ │ │ llm_int8_skip_modules=None, │ │
│ │ │ │ │ llm_int8_enable_fp32_cpu_offload=False, │ │
│ │ │ │ │ llm_int8_has_fp16_weight=False, │ │
│ │ │ │ │ load_in_4bit=True, │ │
│ │ │ │ │ bnb_4bit_compute_dtype='bfloat16', │ │
│ │ │ │ │ bnb_4bit_quant_type='nf4', │ │
│ │ │ │ │ bnb_4bit_use_double_quant=True │ │
│ │ │ │ ) │ │
│ │ │ ), │ │
│ │ │ lora=LoraConfig( │ │
│ │ │ │ r=32, │ │
│ │ │ │ task_type='CAUSAL_LM', │ │
│ │ │ │ lora_alpha=64, │ │
│ │ │ │ bias='none', │ │
│ │ │ │ lora_dropout=0.1, │ │
│ │ │ │ target_modules='all-linear', │ │
│ │ │ │ fan_in_fan_out=False, │ │
│ │ │ │ modules_to_save=None, │ │
│ │ │ │ layers_to_transform=None, │ │
│ │ │ │ layers_pattern=None │ │
│ │ │ ), │ │
│ │ │ training=TrainingConfig( │ │
│ │ │ │ training_args=TrainingArgs( │ │
│ │ │ │ │ num_train_epochs=1, │ │
│ │ │ │ │ per_device_train_batch_size=4, │ │
│ │ │ │ │ gradient_accumulation_steps=4, │ │
│ │ │ │ │ gradient_checkpointing=True, │ │
│ │ │ │ │ optim='paged_adamw_32bit', │ │
│ │ │ │ │ logging_steps=1, │ │
│ │ │ │ │ learning_rate=0.0002, │ │
│ │ │ │ │ bf16=True, │ │
│ │ │ │ │ tf32=True, │ │
│ │ │ │ │ fp16=False, │ │
│ │ │ │ │ max_grad_norm=0.3, │ │
│ │ │ │ │ warmup_ratio=0.03, │ │
│ │ │ │ │ lr_scheduler_type='constant', │ │
│ │ │ │ │ save_steps=500 │ │
│ │ │ │ ), │ │
│ │ │ │ sft_args=SftArgs( │ │
│ │ │ │ │ max_seq_length=1024, │ │
│ │ │ │ │ neftune_noise_alpha=None │ │
│ │ │ │ ) │ │
│ │ │ ), │ │
│ │ │ inference=InferenceConfig( │ │
│ │ │ │ max_length=None, │ │
│ │ │ │ max_new_tokens=256, │ │
│ │ │ │ min_length=0, │ │
│ │ │ │ min_new_tokens=None, │ │
│ │ │ │ early_stopping=False, │ │
│ │ │ │ max_time=None, │ │
│ │ │ │ do_sample=True, │ │
│ │ │ │ num_beams=1, │ │
│ │ │ │ num_beam_groups=1, │ │
│ │ │ │ penalty_alpha=None, │ │
│ │ │ │ use_cache=True, │ │
│ │ │ │ temperature=0.8, │ │
│ │ │ │ top_k=50, │ │
│ │ │ │ top_p=0.9, │ │
│ │ │ │ typical_p=1.0, │ │
│ │ │ │ epsilon_cutoff=0.0, │ │
│ │ │ │ eta_cutoff=0.0, │ │
│ │ │ │ diversity_penalty=0.0, │ │
│ │ │ │ repetition_penalty=1.0, │ │
│ │ │ │ encoder_repetition_penalty=1.0, │ │
│ │ │ │ length_penalty=1.0, │ │
│ │ │ │ no_repeat_ngram_size=0, │ │
│ │ │ │ bad_words_ids=None, │ │
│ │ │ │ force_words_ids=None, │ │
│ │ │ │ renormalize_logits=False │ │
│ │ │ ), │ │
│ │ │ qa=QaConfig( │ │
│ │ │ │ llm_tests=[ │ │
│ │ │ │ │ 'jaccard_similarity', │ │
│ │ │ │ │ 'dot_product', │ │
│ │ │ │ │ 'rouge_score', │ │
│ │ │ │ │ 'word_overlap', │ │
│ │ │ │ │ 'verb_percent', │ │
│ │ │ │ │ 'adjective_percent', │ │
│ │ │ │ │ 'noun_percent', │ │
│ │ │ │ │ 'summary_length' │ │
│ │ │ │ ] │ │
│ │ │ ) │ │
│ │ ) │ │
│ │ config_path = './config.yml' │ │
│ │ configs = [ │ │
│ │ │ { │ │
│ │ │ │ 'save_dir': './experiment/', │ │
│ │ │ │ 'ablation': {'use_ablate': False}, │ │
│ │ │ │ 'data': { │ │
│ │ │ │ │ 'file_type': 'huggingface', │ │
│ │ │ │ │ 'path': 'yahma/alpaca-cleaned', │ │
│ │ │ │ │ 'prompt': 'Below is an instruction that │ │
│ │ describes a task. Write a response that appropriat'+89, │ │
│ │ │ │ │ 'prompt_stub': '{output}', │ │
│ │ │ │ │ 'test_size': 25, │ │
│ │ │ │ │ 'train_size': 500, │ │
│ │ │ │ │ 'train_test_split_seed': 42 │ │
│ │ │ │ }, │ │
│ │ │ │ 'model': { │ │
│ │ │ │ │ 'hf_model_ckpt': 'facebook/opt-125m', │ │
│ │ │ │ │ 'torch_dtype': 'bfloat16', │ │
│ │ │ │ │ 'quantize': True, │ │
│ │ │ │ │ 'bitsandbytes': { │ │
│ │ │ │ │ │ 'load_in_4bit': True, │ │
│ │ │ │ │ │ 'bnb_4bit_compute_dtype': 'bfloat16', │ │
│ │ │ │ │ │ 'bnb_4bit_quant_type': 'nf4' │ │
│ │ │ │ │ } │ │
│ │ │ │ }, │ │
│ │ │ │ 'lora': { │ │
│ │ │ │ │ 'task_type': 'CAUSAL_LM', │ │
│ │ │ │ │ 'r': 32, │ │
│ │ │ │ │ 'lora_alpha': 64, │ │
│ │ │ │ │ 'lora_dropout': 0.1, │ │
│ │ │ │ │ 'target_modules': 'all-linear' │ │
│ │ │ │ }, │ │
│ │ │ │ 'training': { │ │
│ │ │ │ │ 'training_args': { │ │
│ │ │ │ │ │ 'num_train_epochs': 1, │ │
│ │ │ │ │ │ 'per_device_train_batch_size': 4, │ │
│ │ │ │ │ │ 'gradient_accumulation_steps': 4, │ │
│ │ │ │ │ │ 'gradient_checkpointing': True, │ │
│ │ │ │ │ │ 'optim': 'paged_adamw_32bit', │ │
│ │ │ │ │ │ 'logging_steps': 1, │ │
│ │ │ │ │ │ 'learning_rate': 0.0002, │ │
│ │ │ │ │ │ 'bf16': True, │ │
│ │ │ │ │ │ 'tf32': True, │ │
│ │ │ │ │ │ 'max_grad_norm': 0.3, │ │
│ │ │ │ │ │ ... +2 │ │
│ │ │ │ │ }, │ │
│ │ │ │ │ 'sft_args': {'max_seq_length': 1024} │ │
│ │ │ │ }, │ │
│ │ │ │ 'inference': { │ │
│ │ │ │ │ 'max_new_tokens': 256, │ │
│ │ │ │ │ 'use_cache': True, │ │
│ │ │ │ │ 'do_sample': True, │ │
│ │ │ │ │ 'top_p': 0.9, │ │
│ │ │ │ │ 'temperature': 0.8 │ │
│ │ │ │ }, │ │
│ │ │ │ 'qa': { │ │
│ │ │ │ │ 'llm_tests': [ │ │
│ │ │ │ │ │ 'jaccard_similarity', │ │
│ │ │ │ │ │ 'dot_product', │ │
│ │ │ │ │ │ 'rouge_score', │ │
│ │ │ │ │ │ 'word_overlap', │ │
│ │ │ │ │ │ 'verb_percent', │ │
│ │ │ │ │ │ 'adjective_percent', │ │
│ │ │ │ │ │ 'noun_percent', │ │
│ │ │ │ │ │ 'summary_length' │ │
│ │ │ │ │ ] │ │
│ │ │ │ } │ │
│ │ │ } │ │
│ │ ] │ │
│ │ dir_helper = <llmtune.utils.save_utils.DirectoryHelper object at │ │
│ │ 0x00000259901554D0> │ │
│ │ file = <_io.TextIOWrapper │ │
│ │ name='experiment\sHR8ns\config\config.yml' mode='r' │ │
│ │ encoding='cp1252'> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\cli\toolkit │
│ .py:84 in run_one_experiment │
│ │
│ 81 │ results_file_path = dir_helper.save_paths.results_file │
│ 82 │ if not results_file_path.exists(): │
│ 83 │ │ inference_runner = LoRAInference(test, test_column, config, di │
│ ❱ 84 │ │ inference_runner.infer_all() │
│ 85 │ │ RichUI.after_inference(results_path) │
│ 86 │ else: │
│ 87 │ │ RichUI.results_found(results_path) │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ _ = ['instruction', 'input'] │ │
│ │ config = Config( │ │
│ │ │ save_dir='./experiment/', │ │
│ │ │ ablation=AblationConfig( │ │
│ │ │ │ use_ablate=False, │ │
│ │ │ │ study_name='ablation' │ │
│ │ │ ), │ │
│ │ │ data=DataConfig( │ │
│ │ │ │ file_type='huggingface', │ │
│ │ │ │ path='yahma/alpaca-cleaned', │ │
│ │ │ │ prompt='Below is an instruction that │ │
│ │ describes a task. Write a response that │ │
│ │ appropriat'+89, │ │
│ │ │ │ prompt_stub='{output}', │ │
│ │ │ │ train_size=500, │ │
│ │ │ │ test_size=25, │ │
│ │ │ │ train_test_split_seed=42 │ │
│ │ │ ), │ │
│ │ │ model=ModelConfig( │ │
│ │ │ │ hf_model_ckpt='facebook/opt-125m', │ │
│ │ │ │ device_map='auto', │ │
│ │ │ │ torch_dtype='bfloat16', │ │
│ │ │ │ attn_implementation=None, │ │
│ │ │ │ quantize=True, │ │
│ │ │ │ bitsandbytes=BitsAndBytesConfig( │ │
│ │ │ │ │ load_in_8bit=False, │ │
│ │ │ │ │ llm_int8_threshold=6.0, │ │
│ │ │ │ │ llm_int8_skip_modules=None, │ │
│ │ │ │ │ llm_int8_enable_fp32_cpu_offload=False, │ │
│ │ │ │ │ llm_int8_has_fp16_weight=False, │ │
│ │ │ │ │ load_in_4bit=True, │ │
│ │ │ │ │ bnb_4bit_compute_dtype='bfloat16', │ │
│ │ │ │ │ bnb_4bit_quant_type='nf4', │ │
│ │ │ │ │ bnb_4bit_use_double_quant=True │ │
│ │ │ │ ) │ │
│ │ │ ), │ │
│ │ │ lora=LoraConfig( │ │
│ │ │ │ r=32, │ │
│ │ │ │ task_type='CAUSAL_LM', │ │
│ │ │ │ lora_alpha=64, │ │
│ │ │ │ bias='none', │ │
│ │ │ │ lora_dropout=0.1, │ │
│ │ │ │ target_modules='all-linear', │ │
│ │ │ │ fan_in_fan_out=False, │ │
│ │ │ │ modules_to_save=None, │ │
│ │ │ │ layers_to_transform=None, │ │
│ │ │ │ layers_pattern=None │ │
│ │ │ ), │ │
│ │ │ training=TrainingConfig( │ │
│ │ │ │ training_args=TrainingArgs( │ │
│ │ │ │ │ num_train_epochs=1, │ │
│ │ │ │ │ per_device_train_batch_size=4, │ │
│ │ │ │ │ gradient_accumulation_steps=4, │ │
│ │ │ │ │ gradient_checkpointing=True, │ │
│ │ │ │ │ optim='paged_adamw_32bit', │ │
│ │ │ │ │ logging_steps=1, │ │
│ │ │ │ │ learning_rate=0.0002, │ │
│ │ │ │ │ bf16=True, │ │
│ │ │ │ │ tf32=True, │ │
│ │ │ │ │ fp16=False, │ │
│ │ │ │ │ max_grad_norm=0.3, │ │
│ │ │ │ │ warmup_ratio=0.03, │ │
│ │ │ │ │ lr_scheduler_type='constant', │ │
│ │ │ │ │ save_steps=500 │ │
│ │ │ │ ), │ │
│ │ │ │ sft_args=SftArgs( │ │
│ │ │ │ │ max_seq_length=1024, │ │
│ │ │ │ │ neftune_noise_alpha=None │ │
│ │ │ │ ) │ │
│ │ │ ), │ │
│ │ │ inference=InferenceConfig( │ │
│ │ │ │ max_length=None, │ │
│ │ │ │ max_new_tokens=256, │ │
│ │ │ │ min_length=0, │ │
│ │ │ │ min_new_tokens=None, │ │
│ │ │ │ early_stopping=False, │ │
│ │ │ │ max_time=None, │ │
│ │ │ │ do_sample=True, │ │
│ │ │ │ num_beams=1, │ │
│ │ │ │ num_beam_groups=1, │ │
│ │ │ │ penalty_alpha=None, │ │
│ │ │ │ use_cache=True, │ │
│ │ │ │ temperature=0.8, │ │
│ │ │ │ top_k=50, │ │
│ │ │ │ top_p=0.9, │ │
│ │ │ │ typical_p=1.0, │ │
│ │ │ │ epsilon_cutoff=0.0, │ │
│ │ │ │ eta_cutoff=0.0, │ │
│ │ │ │ diversity_penalty=0.0, │ │
│ │ │ │ repetition_penalty=1.0, │ │
│ │ │ │ encoder_repetition_penalty=1.0, │ │
│ │ │ │ length_penalty=1.0, │ │
│ │ │ │ no_repeat_ngram_size=0, │ │
│ │ │ │ bad_words_ids=None, │ │
│ │ │ │ force_words_ids=None, │ │
│ │ │ │ renormalize_logits=False │ │
│ │ │ ), │ │
│ │ │ qa=QaConfig( │ │
│ │ │ │ llm_tests=[ │ │
│ │ │ │ │ 'jaccard_similarity', │ │
│ │ │ │ │ 'dot_product', │ │
│ │ │ │ │ 'rouge_score', │ │
│ │ │ │ │ 'word_overlap', │ │
│ │ │ │ │ 'verb_percent', │ │
│ │ │ │ │ 'adjective_percent', │ │
│ │ │ │ │ 'noun_percent', │ │
│ │ │ │ │ 'summary_length' │ │
│ │ │ │ ] │ │
│ │ │ ) │ │
│ │ ) │ │
│ │ config_path = './config.yml' │ │
│ │ dataset_generator = <llmtune.data.dataset_generator.DatasetGenerator │ │
│ │ object at 0x00000259FF8B8750> │ │
│ │ dataset_path = WindowsPath('experiment/sHR8ns/dataset') │ │
│ │ dir_helper = <llmtune.utils.save_utils.DirectoryHelper object at │ │
│ │ 0x000002599017BE90> │ │
│ │ finetuner = <llmtune.finetune.lora.LoRAFinetune object at │ │
│ │ 0x000002598EED0310> │ │
│ │ inference_runner = <llmtune.inference.lora.LoRAInference object at │ │
│ │ 0x000002598FFD99D0> │ │
│ │ results_file_path = WindowsPath('experiment/sHR8ns/results/results.csv') │ │
│ │ results_path = WindowsPath('experiment/sHR8ns/results') │ │
│ │ test = Dataset({ │ │
│ │ │ features: ['output', 'input', 'instruction', │ │
│ │ 'formatted_prompt'], │ │
│ │ │ num_rows: 25 │ │
│ │ }) │ │
│ │ test_column = 'output' │ │
│ │ train = Dataset({ │ │
│ │ │ features: ['output', 'input', 'instruction', │ │
│ │ 'formatted_prompt'], │ │
│ │ │ num_rows: 500 │ │
│ │ }) │ │
│ │ weights_path = WindowsPath('experiment/sHR8ns/weights') │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ C:\ProgramData\anaconda3\envs\llm_tkit\Lib\site-packages\llmtune\inference\l │
│ ora.py:80 in infer_all │
│ │
│ 77 │ │ │ writer = csv.writer(f) │
│ 78 │ │ │ writer.writerow(header) │
│ 79 │ │ │ for row in results: │
│ ❱ 80 │ │ │ │ writer.writerow(row) │
│ 81 │ │
│ 82 │ def infer_one(self, prompt: str) -> str: │
│ 83 │ │ input_ids = self.tokenizer(prompt, return_tensors="pt", trunca │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ f = <_io.TextIOWrapper │ │
│ │ name='experiment\sHR8ns\results\results.csv' mode='w' │ │
│ │ encoding='cp1252'> │ │
│ │ header = ['Prompt', 'Ground Truth', 'Predicted'] │ │
│ │ idx = 24 │ │
│ │ label = 'The language of the text "Buonasera, il mio nome e Giuseppe" │ │
│ │ is Italian.' │ │
│ │ labels = [ │ │
│ │ │ 'Early, she left the party.', │ │
│ │ │ 'We solve the equation f(x) = 0 on the domains x ≤ 1 and x │ │
│ │ > 1.\n\nIf x ≤ 1, then f'+261, │ │
│ │ │ "Warm breeze on my face\nEndless sun brings joy and │ │
│ │ peace\nSummer, please don't fad"+1, │ │
│ │ │ 'Here are several methods to improve the accuracy of │ │
│ │ machine learning models:\n\n1.'+1416, │ │
│ │ │ 'Global warming can be reversed by reducing greenhouse gas │ │
│ │ emissions and deforest'+6, │ │
│ │ │ 'Putting the pieces of a puzzle together is like crafting │ │
│ │ a well-written piece. J'+349, │ │
│ │ │ 'We can solve for x by using algebraic operations to │ │
│ │ isolate the variable on one '+340, │ │
│ │ │ '1. "We all make mistakes,"\n2. "but",\n3. "learning from │ │
│ │ them is important."', │ │
│ │ │ '1. Brazil\n2. Argentina\n3. Colombia\n4. Chile', │ │
│ │ │ 'Based on the given nutritional information, the food item │ │
│ │ can be classified as a'+72, │ │
│ │ │ ... +15 │ │
│ │ ] │ │
│ │ prompt = 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+139 │ │
│ │ prompts = [ │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+164, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+175, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+109, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+145, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+180, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+124, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+107, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+174, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+114, │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+184, │ │
│ │ │ ... +15 │ │
│ │ ] │ │
│ │ result = ' ### Output: ### Input: ### Output: ### Input: ### │ │
│ │ Input: ### Output: ###'+749 │ │
│ │ results = [ │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+164, │ │
│ │ │ │ 'Early, she left the party.', │ │
│ │ │ │ ' You left the party early\n> ### Input: She left the │ │
│ │ party early > ### Output: '+147 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+175, │ │
│ │ │ │ 'We solve the equation f(x) = 0 on the domains x ≤ 1 │ │
│ │ and x > 1.\n\nIf x ≤ 1, then f'+261, │ │
│ │ │ │ ' * 1\n\nThat’s a good idea. If you’ve got a lot of │ │
│ │ time, you can try to solve thi'+876 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+109, │ │
│ │ │ │ "Warm breeze on my face\nEndless sun brings joy and │ │
│ │ peace\nSummer, please don't fad"+1, │ │
│ │ │ │ ' ### Input: ### Output: ### Input: ### Output: ### │ │
│ │ Output: ### Output: ### Inpu'+938 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+145, │ │
│ │ │ │ 'Here are several methods to improve the accuracy of │ │
│ │ machine learning models:\n\n1.'+1416, │ │
│ │ │ │ ' ### Input: ### Output: ### Input: ### Output: │ │
│ │ ### Input: ### Output: ###'+954 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+180, │ │
│ │ │ │ 'Global warming can be reversed by reducing greenhouse │ │
│ │ gas emissions and deforest'+6, │ │
│ │ │ │ ' ________ ### ### ### ### ### ### ###\n\nBelow │ │
│ │ is an instruction that descr'+657 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+124, │ │
│ │ │ │ 'Putting the pieces of a puzzle together is like │ │
│ │ crafting a well-written piece. J'+349, │ │
│ │ │ │ ' You need to be a certain type of person ### │ │
│ │ Character: Do you have a good idea '+1100 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+107, │ │
│ │ │ │ 'We can solve for x by using algebraic operations to │ │
│ │ isolate the variable on one '+340, │ │
│ │ │ │ ' ### Input: ### Output: ### Output: ### Input: │ │
│ │ ### Output: ### Input: ###'+744 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+174, │ │
│ │ │ │ '1. "We all make mistakes,"\n2. "but",\n3. "learning │ │
│ │ from them is important."', │ │
│ │ │ │ " We don't have the answers. ### Results: Our mistakes │ │
│ │ do not make us lazy, but i"+130 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+114, │ │
│ │ │ │ '1. Brazil\n2. Argentina\n3. Colombia\n4. Chile', │ │
│ │ │ │ ' ### * Requested by: ### * Requested by: ### * │ │
│ │ Requested by: ### ### #'+733 │ │
│ │ │ ), │ │
│ │ │ ( │ │
│ │ │ │ 'Below is an instruction that describes a task. Write │ │
│ │ a response that appropriat'+184, │ │
│ │ │ │ 'Based on the given nutritional information, the food │ │
│ │ item can be classified as a'+72, │ │
│ │ │ │ ' Calories: 1,843 | Protein: 1g ### Input: Calories: │ │
│ │ 11,731 | Protein: 1g ### Out'+1077 │ │
│ │ │ ), │ │
│ │ │ ... +15 │ │
│ │ ] │ │
│ │ row = ( │ │
│ │ │ 'Below is an instruction that describes a task. Write a │ │
│ │ response that appropriat'+175, │ │
│ │ │ 'We solve the equation f(x) = 0 on the domains x ≤ 1 and x │ │
│ │ > 1.\n\nIf x ≤ 1, then f'+261, │ │
│ │ │ ' * 1\n\nThat’s a good idea. If you’ve got a lot of time, │ │
│ │ you can try to solve thi'+876 │ │
│ │ ) │ │
│ │ self = <llmtune.inference.lora.LoRAInference object at │ │
│ │ 0x000002598FFD99D0> │ │
│ │ writer = <_csv.writer object at 0x00000259901B3100> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ C:\ProgramData\anaconda3\envs\llm_tkit\Lib\encodings\cp1252.py:19 in encode │
│ │
│ 16 │
│ 17 class IncrementalEncoder(codecs.IncrementalEncoder): │
│ 18 │ def encode(self, input, final=False): │
│ ❱ 19 │ │ return codecs.charmap_encode(input,self.errors,encoding_table) │
│ 20 │
│ 21 class IncrementalDecoder(codecs.IncrementalDecoder): │
│ 22 │ def decode(self, input, final=False): │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ final = False │ │
│ │ input = '"Below is an instruction that describes a task. Write a │ │
│ │ response that appropria'+1482 │ │
│ │ self = <encodings.cp1252.IncrementalEncoder object at │ │
│ │ 0x0000025990187AD0> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
UnicodeEncodeError: 'charmap' codec can't encode character '\u2264' in position
154: character maps to <undefined>
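
The traceback above ends in the Windows cp1252 codec while a csv writer is saving a row that contains '≤' (U+2264), a character cp1252 cannot represent. One common remedy is to open the results file as UTF-8. This is a minimal sketch that assumes the file is currently opened without an explicit encoding. The helper names `save_rows` and `load_rows` and the file name are hypothetical and are not taken from the toolkit's code.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write rows to a CSV file, always encoding as UTF-8.

    Hypothetical helper: without encoding="utf-8", open() falls back to the
    locale's preferred encoding (cp1252 on many Windows setups), which has
    no mapping for characters such as U+2264.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_rows(path):
    """Read the rows back with the same explicit encoding."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# A row shaped like the one in the traceback: (prompt, ground truth, prediction).
demo_rows = [("Prompt", "If x \u2264 1, then ...", "Prediction")]
demo_path = os.path.join(tempfile.mkdtemp(), "results.csv")
save_rows(demo_path, demo_rows)
```

Passing the encoding explicitly keeps the output identical across operating systems, whereas relying on the default ties the result to the machine's locale.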

Test Metric table should give units, where applicable

It's quite difficult to interpret the values in the Test Metrics table, produced after inference. What is a mean word overlap of 21.19? What is a summary length of 570?

Describe the solution you'd like
An extra column in the test metric table, to contextualize the metrics and give their units, and maybe their minimum and maximum.

I can't assign this myself, but I would like to fix this one. Please assign to @SinclairHudson 🎸
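
The extra column requested above could look roughly like the sketch below. The two values are the ones quoted in the issue. The units and ranges are assumptions made up for illustration and are marked as such, since the issue does not say what they are. The function name `format_metric_table` is hypothetical and is not part of the toolkit.

```python
def format_metric_table(rows):
    """Render (metric, value, unit, min, max) rows as an aligned text table."""
    header = ("Metric", "Value", "Unit", "Min", "Max")
    table = [header] + [tuple(str(cell) for cell in row) for row in rows]
    # Column width = longest cell in that column.
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in table
    ]
    return "\n".join(lines)


# Values come from the issue text; units and ranges are ASSUMED placeholders.
rows = [
    ("mean word overlap", 21.19, "% (assumed)", 0, 100),
    ("summary length", 570, "characters (assumed)", 0, "unbounded"),
]
table_text = format_metric_table(rows)
print(table_text)
```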

Inferencing script not executable due to package dependency errors

I followed the README as described, but when I execute llama2_baseline_inference.py I get the following error:

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or pip install bitsandbytes`

even though the packages are installed in my environment. I was able to circumvent this problem by upgrading the datasets library with pip install -U datasets, but I then received another error, described in this link

To avoid this issue, I downgraded my transformers library to 4.3, and now I am unable to download some of the checkpoints. I think the package requirements need to be updated to work with the latest versions.
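
Since the ImportError above appears even though the packages are reported as installed, a useful first step is to confirm which versions the interpreter running the script actually sees. The sketch below uses only the standard library. The function name `installed_versions` is hypothetical, and the package list is simply the four libraries named in this report.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The four libraries named in the report above.
report = installed_versions(["transformers", "accelerate", "bitsandbytes", "datasets"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this with the same interpreter that launches the inference script shows whether a package is missing from that particular environment, which pip alone does not always make obvious.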

[Workflow] Automatically run `black`, `flake8`, `isort` via Github Action

Is your feature request related to a problem? Please describe.

Automatic formatting and linting to improve code consistency

Describe the solution you'd like

Have pre-commit hooks that run before commits
Example

Describe alternatives you've considered

I've been manually running black on the whole repo from time to time, but that is not the best solution for collaboration.
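
As a stopgap between manual runs and the hooks requested above, the three tools can be invoked in check mode from one script. This is a sketch and not the repository's actual workflow. The function names are hypothetical, and it assumes black, isort and flake8 are installed in the active environment.

```python
import subprocess
import sys


def lint_commands(target="."):
    """Return check-only invocations of black, isort and flake8 for target."""
    py = sys.executable
    return [
        [py, "-m", "black", "--check", target],
        [py, "-m", "isort", "--check-only", target],
        [py, "-m", "flake8", target],
    ]


def run_checks(target="."):
    """Run every command and return the names of the tools that failed."""
    failed = []
    for cmd in lint_commands(target):
        # cmd[2] is the module name, e.g. "black".
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[2])
    return failed
```

Calling `run_checks()` from the repository root returns an empty list when all three tools pass. A CI job or hook could then exit non-zero whenever the list is not empty.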
