Comments (5)
Yeah, no worries. I thought I'd just add my example for future reference by anyone who stumbles across this issue :)
from pai-megatron-patch.
Hi, thanks for your interest.
For Qwen1.5 performance, please refer to this report.
Please refer to the convert scripts for model conversion and the train scripts for model training.
For full finetuning of Qwen-72B, you need 4 nodes, each with 8 A100/A800-80G GPUs.
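A quick back-of-envelope check makes the 4-node requirement plausible. This is only a sketch: it counts weight, gradient, and Adam optimizer state memory for mixed-precision training and ignores activations, buffers, and parallelism overheads, so the real footprint is larger.

```python
# Rough memory estimate for full-parameter finetuning of a 72B model.
# Sketch only: activation memory and framework overheads are ignored.

def full_finetune_bytes_per_param() -> int:
    # bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
    # + fp32 Adam first moment (4) + fp32 second moment (4)
    return 2 + 2 + 4 + 4 + 4  # 16 bytes per parameter

params = 72e9
state_gb = params * full_finetune_bytes_per_param() / 1e9  # ~1152 GB

# 4 nodes x 8 GPUs x 80 GB each
cluster_gb = 4 * 8 * 80  # 2560 GB total

print(f"model + optimizer state: ~{state_gb:.0f} GB")
print(f"cluster HBM:             {cluster_gb} GB")
```

So the model and optimizer states alone already need roughly half the aggregate HBM of a 32-GPU cluster before activations are counted, which is why a single node cannot do full-parameter training.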
Hi @lwmlyy , thanks for the response.
Just in case anyone is interested, I was able to train Qwen 1.5 72B GPTQ 4bit with a LoRA adapter using this config with Axolotl using only 1 A100 (80GB) card.
base_model: Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
gptq: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: ABCDEFG.json
    ds_type: json # see other options below
    type: sharegpt
    conversation: chatml
dataset_prepared_path: /workspace/ABCDEFG
val_set_size: 0.0002
output_dir: /workspace/ABCDEFG
sequence_len: 6000
sample_packing: true
pad_to_sequence_len:
adapter: lora
lora_model_dir:
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
use_wandb: true
wandb_project: ABCDEFG
wandb_entity: ABCDEFG
wandb_name: ABCDEFG
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00005
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention:
warmup_steps: 10
evals_per_epoch: 40
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 40
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
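For a sense of why this fits on one 80 GB card, here is a rough estimate of the trainable adapter size implied by lora_r: 64 with lora_target_linear: true. The hidden size, layer count, and projection shapes below are illustrative assumptions, not values read from the Qwen1.5-72B checkpoint (a real model also has GQA and MLP projections with different shapes).

```python
# Back-of-envelope LoRA adapter size; dimensions are assumptions.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA keeps the frozen d_in x d_out weight and trains two
    # low-rank factors: A (d_in x r) and B (r x d_out).
    return r * (d_in + d_out)

r = 64          # lora_r from the config above
hidden = 8192   # assumed hidden size
layers = 80     # assumed number of transformer layers

# Assume 4 hidden x hidden projections per layer (q, k, v, o) as a
# lower bound; lora_target_linear also wraps the MLP linears.
per_layer = 4 * lora_params(hidden, hidden, r)
total = layers * per_layer
print(f"~{total / 1e6:.0f}M trainable LoRA parameters")
```

A few hundred million trainable bf16 parameters plus their optimizer states is on the order of a few GB, so the 4-bit-quantized base model dominates the memory budget.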
Hi, the resource requirement I mentioned is for full-parameter training.
Hi, qwen1.5 32B and 72B are both properly supported now. Please pull the latest code and try again.
Related Issues (20)
- The DingTalk group is full
- [rank31]: OSError: error stat()ing file — dataset map issue
- Is sharegpt-format data supported? Or multi-turn dialogue data with a "history" field?
- Sorry to bother you, an issue about multi-node training
- Flash-Attn 3 support
- The group is full
- optimizer offloading is really powerful
- Does Mcore not support pp?
- Which version of megatron-lm does starcoder depend on?
- distrib_optim.pt is missing from the saved checkpoints
- Channel Loss support
- Resuming training from a checkpoint
- About finetuning qwen2 with the idxmap format
- mmap data format issue
- Failed to install pyarrow
- mcore weight conversion does not support pp>1
- No obvious speedup when training Qwen1.5 1.8B with flash-attn
- TypeError: get_cpu_offload_context() missing 1 required positional argument: 'weight_offloading'
- qwen2-sft training hangs right at the start
- deepseek model conversion issue