🤗 PEFT-SFT

Sparse Fine-Tuning for Large Language Models

This is a fork of 🤗 PEFT implementing efficient sparse fine-tuning (SFT) as described in the paper Scaling Sparse Fine-Tuning to Large Language Models. The scripts for the instruction-tuning experiments from the paper can be found at https://github.com/ducdauge/sft-llm. You can also find a simple QA example with 🤗 Trainer here.

Installation

You can install this package as follows:

git clone https://github.com/AlanAnsell/peft.git
cd peft
python setup.py develop # or "pip install .", but this way is recommended

or use

pip install git+https://github.com/AlanAnsell/peft.git

Creating an SFT model

You can prepare a model for SFT as follows:

from transformers import AutoModelForCausalLM
from peft import get_peft_config, get_peft_model, SftConfig, TaskType
model_name_or_path = "meta-llama/Llama-2-7b-hf"

peft_config = SftConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    density=0.01,
    selection_algorithm="rigl", # or "sm3" for moment approximation SFT
    target_modules=["q_proj", "o_proj", "v_proj", "k_proj", "gate_proj", "up_proj", "down_proj"],
)

model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
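
As a rough illustration of what density means, the sketch below converts a density into an absolute weight budget. The helper name tunable_weight_budget, the rounding behaviour, and the round figure of 7 billion parameters are assumptions made for illustration; they are not part of this package.

```python
def tunable_weight_budget(total_params: int, density: float) -> int:
    """Hypothetical helper: number of tunable weights implied by a density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    # Rounding to the nearest integer is an assumption; the package may differ.
    return round(total_params * density)


# Illustrative round figure for a ~7B-parameter model (not an exact count).
budget = tunable_weight_budget(7_000_000_000, 0.01)
print(f"{budget:,} tunable weights")
```

Under these assumptions, density=0.01 corresponds to roughly one tunable weight per hundred model parameters.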

SFT with 🤗 Trainer

Because SFT updates the set of trainable parameters during training, some code needs to be added to the training loop. If you are using 🤗 Trainer, pass your Trainer class to SftTrainer to create a subclass, then construct that subclass as usual, supplying your peft_config as the sft_config argument:

from peft import SftTrainer

...

trainer_cls = SftTrainer(MyTrainer) # MyTrainer = Trainer or any subclass thereof
trainer = trainer_cls(
    model=model,
    args=training_args,
    ...
    sft_config=peft_config,
)

You should then be able to use trainer as you would normally.
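
SftTrainer(MyTrainer) takes a class and returns a new class, which is a class-factory pattern. The sketch below shows that general pattern with stand-in classes only; BaseTrainer, with_reselection, and the hook body are hypothetical and do not reflect how SftTrainer is actually implemented.

```python
class BaseTrainer:
    """Stand-in for a Trainer class; hypothetical, for illustration only."""

    def __init__(self, model=None):
        self.model = model
        self.log = []

    def training_step(self, batch):
        self.log.append(f"train:{batch}")
        return 0.0


def with_reselection(base_cls):
    """Hypothetical class factory: returns a subclass of base_cls with an extra hook."""

    class _Wrapped(base_cls):
        def __init__(self, *args, sft_config=None, **kwargs):
            super().__init__(*args, **kwargs)
            self.sft_config = sft_config

        def training_step(self, batch):
            loss = super().training_step(batch)
            # Placeholder for extra per-step work added by the wrapper.
            self.log.append(f"reselect-hook:{batch}")
            return loss

    return _Wrapped


trainer_cls = with_reselection(BaseTrainer)
trainer = trainer_cls(model="dummy", sft_config={"density": 0.01})
trainer.training_step(0)
print(trainer.log)
```

Because the factory subclasses whatever class it is given, any customisation already present in the base class is preserved.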

SFT with a custom training loop

If you are using a custom training loop, you should use the SftAdamW/SftSM3 optimizer depending on whether you are using accumulated gradient or moment approximation SFT, and construct an SftSelector object:

import torch  # needed for torch.float32 below
from peft import SftAdamW, SftSM3, SftSelector

...

optimizer_grouped_parameters = [
    {
        "params": [
            p for n, p in model.named_parameters()
            if p.requires_grad
        ],
        "weight_decay": weight_decay,
    },
]

if peft_config.selection_algorithm == "sm3":
    deltas = {
        delta.values: delta
        for _1, _2, delta in model.active_deltas()
    }
    optimizer = SftSM3(
        optimizer_grouped_parameters,
        deltas,
        lr=learning_rate,
    )
else:
    optimizer = SftAdamW(
        optimizer_grouped_parameters,
        lr=learning_rate,
        momentum_dtype=torch.float32,
    )

...

selector = SftSelector(
    model,
    optimizer,
    peft_config,
    num_train_steps, # total expected duration of training in update steps
    gradient_accumulation_steps, # grad accumulation steps per update step
)

Then call the selector's .step() method at the end of each update step, e.g.

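for i, batch in enumerate(train_dataloader):
    ...
    outputs = model(**batch)
    loss = outputs.loss  # assumes the batch includes labels so the model returns a loss
    loss.backward()
    ...

    # an update step completes once every gradient_accumulation_steps batches
    if (i + 1) % gradient_accumulation_steps == 0:
        ...
        optimizer.step()
        optimizer.zero_grad()
        selector.step()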
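
To make the call ordering concrete, the sketch below replays the loop with stub objects that only record which methods are called. CallRecorder and run_loop are hypothetical names used for illustration; no real optimizer or selector is involved.

```python
class CallRecorder:
    """Hypothetical stub that records calls in place of a real optimizer or selector."""

    def __init__(self, name, events):
        self.name = name
        self.events = events

    def step(self):
        self.events.append(f"{self.name}.step")

    def zero_grad(self):
        self.events.append(f"{self.name}.zero_grad")


def run_loop(num_batches, gradient_accumulation_steps):
    events = []
    optimizer = CallRecorder("optimizer", events)
    selector = CallRecorder("selector", events)
    for i in range(num_batches):
        events.append("backward")  # stands in for loss.backward()
        if (i + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            selector.step()
    return events


events = run_loop(num_batches=8, gradient_accumulation_steps=4)
print(events.count("selector.step"))
```

With 8 batches and 4 accumulation steps there are two update steps, so the selector is stepped twice, each time after the optimizer.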

SFT options

The following hyperparameters can be modified through the SftConfig:

  • density/num_tunable_weights set the number of tunable parameters as a proportion of total model params / as an absolute number respectively. Defaults to density=0.01.
  • selection_algorithm: sets the SFT selection algorithm. Supply "rigl" for gradient accumulation/RigL-style SFT or "sm3" for moment approximation SFT with the SM3 optimizer. Defaults to "rigl".
  • reselection_steps: sets the number of steps between parameter reselections. Defaults to 20. You may want to use a larger value for small batch sizes.
  • selection_accumulation_steps: for gradient accumulation SFT, controls the number of steps over which gradients are accumulated.
  • initial_reselection_rate: the proportion of parameters that will be reselected initially. This is reduced linearly to zero over the course of training. Defaults to 0.2.
  • target_modules: controls which linear modules SFT is applied to. If not supplied, SFT will be applied to all linear modules within Transformer blocks.
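
The linear decay described for initial_reselection_rate can be sketched as follows. The function name reselection_rate and the exact form of the schedule (for example, whether it is evaluated per update step and how endpoints are handled) are assumptions; the package's own schedule may differ in detail.

```python
def reselection_rate(step, num_train_steps, initial_rate=0.2):
    """Hypothetical linear decay from initial_rate at step 0 to zero at num_train_steps."""
    if num_train_steps <= 0:
        raise ValueError("num_train_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay well defined.
    progress = min(max(step / num_train_steps, 0.0), 1.0)
    return initial_rate * (1.0 - progress)


for step in (0, 500, 1000):
    print(step, reselection_rate(step, 1000))
```

Under this sketch, half of the initial rate remains at the midpoint of training and no parameters are reselected at the final step.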

PEFT

For details on using PEFT please refer to the HuggingFace documentation or the 🤗 PEFT repository.

Citing

If you use our SFT implementation, please use the following snippet to cite our work:

@misc{ansell2024scaling,
      title={Scaling Sparse Fine-Tuning to Large Language Models}, 
      author={Alan Ansell and Ivan Vulić and Hannah Sterz and Anna Korhonen and Edoardo M. Ponti},
      year={2024},
      eprint={2401.16405},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

If you want to cite 🤗 PEFT in your publication, use the following snippet:

@Misc{peft,
  title =        {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author =       {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul},
  howpublished = {\url{https://github.com/huggingface/peft}},
  year =         {2022}
}

peft's Issues

What does the sft_delta array represent and how is it intended to be updated?

Hi,

I've been examining the select_rigl method of SftSelector in peft/src/peft/tuners/sft/trainer.py. From my understanding, this method is the core of the selection procedure. However, upon inspection, I noticed that the sft_delta variable used within the function does not seem to be modified anywhere else in the code.

I want to ensure that I am interpreting the code correctly. Could you please provide some insight into what the sft_delta array represents and how it is intended to be updated?

Thank you for your assistance.
