This project forked from chaoyi-wu/finetune_llama

A simple, easy-to-follow guide to fine-tuning LLaMA.

finetune_llama's Introduction

A Guide to Fine-Tuning LLaMA for Chinese-Speaking Users

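This project aims to guide Chinese-speaking users through fine-tuning the LLaMA large language model. It brings together several existing frameworks (Minimal LLaMA, Alpaca, LMFlow) and avoids unnecessary wrappers so that the code stays readable.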

S1:

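Go to Python_Package and install the bundled peft and transformers packages.

We recommend first installing the released versions of these packages with pip so that all dependencies are resolved, then running pip install -e . to replace them with the local versions.

Notes
Install PyTorch with conda, not pip: conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
Do not forget to install sentencepiece: pip install sentencepiece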
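
Before moving on, it can help to confirm that the packages from this step are importable. The following is a minimal sketch and not part of the repository: the function name check_packages is hypothetical, and only the Python standard library is used.

```python
from importlib import metadata, util


def check_packages(names):
    """Map each import name to its installed version, or None if it is missing.

    Packages that import fine but expose no distribution metadata are
    reported as "unknown".
    """
    report = {}
    for name in names:
        if util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no version recorded
    return report


if __name__ == "__main__":
    # Import names of the packages mentioned in S1.
    packages = ["torch", "transformers", "peft", "sentencepiece"]
    for pkg, version in check_packages(packages).items():
        print(f"{pkg:15s} {version if version else 'MISSING'}")
```

Any package reported as MISSING should be installed before continuing to S2.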

S2:

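Go to LLAMA_Model and download the model weights from https://huggingface.co/decapoda-research/llama-7b-hf. Alternatively, download the original LLaMA weights from the official source and convert them with convert_llama_weights_to_hf.py.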

S3:

Go to Data_sample and prepare your data following the examples there.

S4:

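Edit the parameters in finetune_pp.py or finetune_pp_peft.py, select the GPUs to use, and start training. The former fine-tunes all parameters of the network; the latter follows LoRA and fine-tunes only a small subset of parameters.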
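
To see why the LoRA variant trains far fewer parameters, here is a back-of-the-envelope count. It is a sketch under stated assumptions, not the repository's configuration: a rank of 8 applied to two square projection matrices per layer is a common choice but is assumed here, and the helper name lora_trainable_params is hypothetical. The LLaMA-7B shape (32 layers, hidden size 4096) comes from the LLaMA paper.

```python
def lora_trainable_params(hidden_size, num_layers, rank, adapted_per_layer):
    """Count LoRA parameters when every adapted matrix is hidden_size x hidden_size.

    LoRA replaces the update to a d x k weight with two low-rank factors,
    B (d x r) and A (r x k), so each adapted matrix adds r * (d + k) parameters.
    """
    per_matrix = rank * (hidden_size + hidden_size)
    return num_layers * adapted_per_layer * per_matrix


# LLaMA-7B shape; the rank and the two adapted matrices per layer are assumptions.
lora = lora_trainable_params(hidden_size=4096, num_layers=32, rank=8, adapted_per_layer=2)
full = 6.7e9  # approximate parameter count of LLaMA-7B

print(f"LoRA trainable parameters: {lora:,}")  # 4,194,304
print(f"Share of the full model: {lora / full:.4%}")
```

Under these assumptions the adapters amount to well under 0.1% of the full model, which is why the LoRA script needs far less optimizer and gradient memory than full fine-tuning.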

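Note: finetune_pp.py and finetune_pp_peft.py do not use multi-GPU acceleration, so training is slow. They do, however, effectively avoid OOM errors, which makes them well suited to debugging. For faster training, see the FSDP and DeepSpeed multi-GPU sections below.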

S5:

Use test_sample.py as a reference for testing. Avoid using multiple GPUs at test time where possible.

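FSDP multi-GPU parallelism:

finetune_pp_peft_trainer_lora.py and finetune_pp_peft_trainer.py use the transformers Trainer for a simple single-node, multi-GPU setup. FSDP resolves the out-of-memory failures seen on a single GPU and speeds up training significantly.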
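
With the transformers Trainer, FSDP is usually switched on through TrainingArguments options passed on the command line. The sketch below only assembles such a launch command; it does not run it. The values are assumptions modelled on the Stanford Alpaca launch command (full_shard auto_wrap, wrapping LlamaDecoderLayer), and whether this repository's scripts accept these flags unchanged is also an assumption. build_fsdp_command is a hypothetical helper.

```python
import shlex


def build_fsdp_command(script, num_gpus=8):
    """Assemble a torchrun command that enables FSDP via TrainingArguments flags."""
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        script,
        # Shard parameters, gradients and optimizer state across GPUs,
        # wrapping each transformer block as its own FSDP unit.
        "--fsdp", "full_shard auto_wrap",
        "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer",
    ]


cmd = build_fsdp_command("finetune_pp_peft_trainer.py")
print(shlex.join(cmd))
```

The layer class name must match the transformers version in use; some early LLaMA forks of transformers spelled it LLaMADecoderLayer.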

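DeepSpeed multi-GPU parallelism:

Installing the DeepSpeed library:
conda install -c omgarcia gcc-6 (installs gcc 6 through conda)
conda install -c anaconda libstdcxx-ng (updates the gcc runtime libraries)
git clone https://github.com/microsoft/DeepSpeed (fetches the DeepSpeed source)
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . (builds and installs DeepSpeed from the cloned directory)
If you run into CUDA environment problems, see issue #2684.

Run sh finetune_pp_peft_trainer_deepspeed.sh to train. Pass --lora_used True or --lora_used False to control whether LoRA is used.

The DeepSpeed version supports fast fine-tuning of 33B LLaMA with LoRA, with training time on par with LMFlow.

In practice: avoid DeepSpeed where possible. It uses cpu_offload by default, which slows training down considerably.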
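
CPU offloading is controlled from the DeepSpeed JSON config. The sketch below builds a ZeRO stage 3 config as a Python dict so that offloading can be toggled. The keys are standard DeepSpeed config fields, but the chosen values and the helper name make_ds_config are assumptions, and this is not the config shipped with the repository.

```python
import json


def make_ds_config(offload_optimizer=True, offload_params=True):
    """Build a ZeRO stage 3 DeepSpeed config with optional CPU offloading."""
    zero = {"stage": 3}
    if offload_optimizer:
        # Keep optimizer state in pinned host memory.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    if offload_params:
        # Keep model parameters in host memory (supported with ZeRO stage 3).
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "bf16": {"enabled": "auto"},
        "zero_optimization": zero,
        # "auto" lets the Hugging Face Trainer fill these in from its own arguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


# No offloading at all: everything stays on the GPUs.
print(json.dumps(make_ds_config(False, False), indent=2))
```

Roughly speaking, make_ds_config(True, True) corresponds to offloading both optimizer state and parameters, and make_ds_config(True, False) to offloading the optimizer state only.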

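Memory-optimized parallel training with checkpointing:

See finetune_pp_peft_trainer_checkpointing.sh for the implementation.

For models larger than 7B, memory becomes a serious bottleneck. Gradient checkpointing greatly reduces memory usage and allows a larger batch size, which makes pre-training practical for large models on large datasets. The price is a lower step rate, since activations are recomputed during the backward pass.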
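
The memory saving can be illustrated with a toy count of stored activations. This is a simplified model, not a measurement: it assumes every layer keeps the same number of intermediate tensors, the value 16 is arbitrary, and the names stored_activations and tensors_per_layer are hypothetical.

```python
def stored_activations(num_layers, tensors_per_layer, checkpointing):
    """Toy count of activation tensors held in memory for the backward pass.

    Without checkpointing, every intermediate tensor of every layer is kept.
    With per-layer checkpointing, only each layer's input is kept; the
    intermediates of one layer at a time are recomputed during backward.
    """
    if not checkpointing:
        return num_layers * tensors_per_layer
    return num_layers + tensors_per_layer


# 40 layers matches LLaMA-13B; 16 tensors per layer is an arbitrary assumption.
baseline = stored_activations(num_layers=40, tensors_per_layer=16, checkpointing=False)
saved = stored_activations(num_layers=40, tensors_per_layer=16, checkpointing=True)
print(baseline, saved)  # 640 56
```

The freed memory is what allows the larger batch size; the recomputation costs roughly one extra forward pass per step.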

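Training time statistics:

We measured the time per epoch of several training setups on 4.8M PMCOA papers.

Training uses 8 A100 GPUs by default. For each paper, a random 512-token segment is sampled, so one epoch processes about 2.5B tokens.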
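
The sampling scheme and the token count can be sketched as follows. This is an illustration, not the repository's data loader: sample_window is a hypothetical helper, and the handling of papers shorter than 512 tokens (returned whole here) is an assumption.

```python
import random


def sample_window(token_ids, window=512, rng=random):
    """Return a random contiguous slice of at most `window` tokens."""
    if len(token_ids) <= window:
        return list(token_ids)  # assumption: short papers are used whole
    start = rng.randrange(len(token_ids) - window + 1)
    return list(token_ids[start:start + window])


paper = list(range(3000))  # stand-in for one tokenized paper
segment = sample_window(paper, rng=random.Random(0))
print(len(segment))  # 512

# One 512-token segment per paper over 4.8M papers:
tokens_per_epoch = 4_800_000 * 512
print(f"{tokens_per_epoch / 1e9:.2f}B tokens per epoch")  # 2.46B tokens per epoch
```

Because a fresh window is drawn each time, different epochs see different parts of the same paper.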

Statistics on S2ORC (4.8M PMCOA papers)
Model_Size  Batch_Size  Accelerate_Strategy  Time/epoch
13B         384         DS*(Opt&Par)         ~122h
7B          768         DS(Opt&Par)          ~100h
7B          128         DS(Opt&Par)          ~100h
7B          384         DS(Opt)              ~90h
7B          384         FSDP_no_cpu          ~35h
7B          128         FSDP_no_cpu          ~36h

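DS(Opt&Par): optimizer and persistent parameters offloaded to CPU
DS(Opt): optimizer offloaded to CPU
FSDP_no_cpu: no CPU involved
Note: involving the CPU slows training down, but at larger scales such as 13B, CPU offloading is required for multi-GPU training to complete. A superscript * in the table marks a strategy that is required to avoid OOM.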
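
The epoch times in the table can be converted into approximate throughput. The calculation below assumes one 512-token segment per paper per epoch and uses only the 7B, batch size 384 rows; tokens_per_second is a hypothetical helper, and the results are rough because the table values are themselves approximate.

```python
TOKENS_PER_EPOCH = 4_800_000 * 512  # 4.8M papers, one 512-token segment each


def tokens_per_second(hours_per_epoch):
    """Convert an epoch time in hours into tokens processed per second."""
    return TOKENS_PER_EPOCH / (hours_per_epoch * 3600)


# Approximate epoch times for the 7B model at batch size 384, from the table.
epoch_hours = {"DS(Opt)": 90, "FSDP_no_cpu": 35}
for strategy, hours in epoch_hours.items():
    print(f"{strategy:12s} ~{tokens_per_second(hours):,.0f} tokens/s on 8 A100s")

speedup = epoch_hours["DS(Opt)"] / epoch_hours["FSDP_no_cpu"]
print(f"FSDP_no_cpu is about {speedup:.1f}x faster than DS(Opt)")  # about 2.6x
```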

For parameter settings, see: https://github.com/mosaicml/examples/tree/release/v0.0.4/examples/llm/throughput

Acknowledgements:

Built on the Minimal LLaMA implementation https://github.com/zphang/minimal-llama, mainly fixing several bugs.

FSDP support was added following Alpaca https://github.com/tatsu-lab/stanford_alpaca.

The DeepSpeed module was added following LMFlow https://github.com/OptimalScale/LMFlow/tree/main/src/lmflow.

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

Contributors: chaoyi-wu
