ymcui / chinese-mixtral

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

Home Page: https://arxiv.org/abs/2403.01851

License: Apache License 2.0

Shell 2.89% Python 97.11%
32k large-language-models llm mixtral mixture-of-experts moe nlp 64k

chinese-mixtral's Introduction

🇨🇳Chinese | 🌐English | 📖Docs | ❓Issues | 💬Discussions | ⚔️Arena




This project builds on the Mixtral model released by Mistral.ai, which uses a sparse mixture-of-experts (Sparse MoE) architecture. We further pre-trained Mixtral on large-scale unlabeled Chinese data to obtain the Chinese Mixtral base model, and then applied instruction fine-tuning to obtain the Chinese Mixtral-Instruct model. The models natively support a 32K context (up to 128K in our tests), handle long texts effectively, and show significant gains in mathematical reasoning and code generation. With quantized inference via llama.cpp, as little as 16 GB of RAM (or VRAM) is required.

Technical report: [Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral [paper walkthrough]

What's in this project

  • 🚀 Open-sources the Chinese Mixtral base model, further pre-trained on Chinese data from Mixtral-8x7B-v0.1
  • 🚀 Open-sources the Chinese Mixtral-Instruct model, instruction fine-tuned on top of Chinese Mixtral
  • 🚀 Open-sources the pre-training and instruction fine-tuning scripts, so users can further train or fine-tune the models as needed
  • 🚀 Provides tutorials for quickly quantizing and deploying the models locally on a personal computer's CPU/GPU
  • 🚀 Supports the Mixtral ecosystem: 🤗transformers, llama.cpp, text-generation-webui, LangChain, privateGPT, vLLM, etc.

Chinese LLaMA-2 & Alpaca-2 LLMs | Chinese LLaMA & Alpaca LLMs | Multimodal Chinese LLaMA & Alpaca LLMs | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation toolkit TextBrewer | Model pruning toolkit TextPruner | Joint distillation-and-pruning GRAIN

News

[2024/04/30] Chinese-LLaMA-Alpaca-3 has been officially released, open-sourcing the Llama-3-based Llama-3-Chinese-8B and Llama-3-Chinese-8B-Instruct. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3

[2024/03/27] Added 1-bit/2-bit/3-bit quantized GGUF models: [🤗HF]. The project has also joined the 机器之心 (Synced) SOTA! model platform: https://sota.jiqizhixin.com/project/chinese-mixtral

[2024/03/26] Added an OpenAI-compatible API deployment mode. See: 📚 v1.2 release notes

[2024/03/05] Open-sourced the model training and fine-tuning code and released the technical report. See: 📚 v1.1 release notes

[2024/01/29] 🚀 Officially released Chinese-Mixtral (base model) and Chinese-Mixtral-Instruct (instruction/chat model). See: 📚 v1.0 release notes

Guide

| Section | Description |
| --- | --- |
| 💁🏻‍♂️ Model introduction | Briefly introduces the technical features of the models in this project |
| ⏬ Model download | Download links for the Chinese Mixtral models |
| 💻 Inference and deployment | How to quantize the models and deploy and experience them on a personal computer |
| 💯 Model performance | The models' results on selected tasks |
| 📝 Training and fine-tuning | How to train and fine-tune the Chinese Mixtral models |
| ❓ FAQ | Answers to frequently asked questions |

Model Introduction

This project open-sources the Chinese Mixtral and Chinese Mixtral-Instruct models developed on top of the Mixtral model. Their main features are as follows:

📖 Sparse mixture-of-experts model

Mixtral is a sparse mixture-of-experts (MoE) model. Its architecture differs markedly from earlier mainstream LLMs such as LLaMA, mainly in the following respects:

  • Each FFN layer contains 8 distinct "experts" (fully connected sublayers), of which the top 2 are activated according to the gating values
  • Each token in the input sequence selects its experts independently, rather than the whole sequence sharing one set of experts
  • The total parameter count is about 46.7B, of which about 13B are activated at inference time (a routing sketch follows this list)
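
To make the per-token routing concrete, here is a minimal sketch of top-2 expert selection (illustrative only, not the actual Mixtral implementation; `gate` and `experts` are assumed to be an `nn.Linear` router and a list of FFN modules):

```python
import torch
import torch.nn.functional as F

def moe_ffn(x, gate, experts, top_k=2):
    """Mix the outputs of each token's top-k experts, weighted by the gate."""
    logits = gate(x)                                   # (num_tokens, num_experts)
    weights, expert_idx = torch.topk(logits, top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)               # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        for k in range(top_k):
            mask = expert_idx[:, k] == e               # tokens whose k-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
    return out
```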

Below is the architecture diagram from the Mixtral paper:



🚄 Natively supports a 32K context (128K in our tests)

Unlike the Chinese-LLaMA-Alpaca and Chinese-LLaMA-Alpaca-2 projects, the Mixtral model natively supports a 32K context (up to 128K in our tests), so a single model can handle tasks of widely varying lengths.

Model Download

Model selection guide

Below is a comparison of the models in this project with suggested usage scenarios. For chat-style interaction, choose the Instruct version.

| Comparison | Chinese-Mixtral | Chinese-Mixtral-Instruct |
| --- | --- | --- |
| Model type | Base model | Instruction/chat model (ChatGPT-like) |
| Model size | 8x7B (about 13B activated) | 8x7B (about 13B activated) |
| Number of experts | 8 (2 activated) | 8 (2 activated) |
| Training type | Causal-LM (CLM) | Instruction fine-tuning |
| Training method | QLoRA + full emb/lm-head | QLoRA + full emb/lm-head |
| Trained from | Original Mixtral-8x7B-v0.1 | Chinese-Mixtral |
| Training corpus | Unlabeled general corpus | Labeled instruction data |
| Vocabulary size | Original vocabulary, 32000 | Original vocabulary, 32000 |
| Context length | 32K (up to 128K in tests) | 32K (up to 128K in tests) |
| Input template | Not required | Must use the Mixtral-Instruct template |
| Suitable scenarios | Text continuation: given a prefix, generate the continuation | Instruction following: Q&A, writing, chat, interaction, etc. |

Download links

Three types of models are provided:

  • Full model: ready to use after download, with no merging steps required; recommended for users with ample bandwidth;
  • LoRA model: cannot be used alone; it must be merged with the original Mixtral-8x7B-v0.1 to obtain a full model. Recommended for users with limited bandwidth who already have the original Mixtral. For the merging procedure, see: 💻 model merging steps (a minimal sketch also follows this list)
  • GGUF model: GGUF-format quantized models compatible with llama.cpp and similar tools; recommended for users who only need inference and deployment.
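
For reference, a minimal merge sketch using 🤗 peft (an illustration under the assumption that the LoRA weights are published in standard PEFT format; the local paths are placeholders, and the wiki's merge steps remain the authoritative procedure):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the original base model released by Mistral.ai.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)
# Apply the Chinese-Mixtral LoRA weights (placeholder path), then bake them in.
model = PeftModel.from_pretrained(base, "path/to/chinese-mixtral-lora")
merged = model.merge_and_unload()
merged.save_pretrained("chinese-mixtral-full")
```
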
| Model name | Type | Size | Full (87 GB) | LoRA (2.4 GB) | GGUF |
| --- | --- | --- | --- | --- | --- |
| Chinese-Mixtral | Base model | 8x7B | [Baidu] [🤗HF] [🤖ModelScope] | [Baidu] [🤗HF] [🤖ModelScope] | [🤗HF] |
| Chinese-Mixtral-Instruct | Instruction model | 8x7B | [Baidu] [🤗HF] [🤖ModelScope] | [Baidu] [🤗HF] [🤖ModelScope] | [🤗HF] |

Note

If you cannot access HF, consider a mirror site (e.g. hf-mirror.com); please look up the exact steps yourself.
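
One possible route (our assumption, not an officially documented method): huggingface_hub honors the HF_ENDPOINT environment variable, which hf-mirror.com supports; the repo id below is a placeholder taken from the download table.

```python
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Placeholder repo id; substitute the actual 🤗HF repo from the table above.
snapshot_download(repo_id="hfl/chinese-mixtral-instruct", local_dir="chinese-mixtral-instruct")
```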

Inference and Deployment

The models in this project mainly support the following quantization, inference, and deployment methods; support for CPU/GPU inference, quantization, GUI, API serving, and vLLM varies by tool, so see the corresponding tutorial for details.

| Tool | Features | Tutorial |
| --- | --- | --- |
| llama.cpp | Rich quantization options and efficient local inference | [link] |
| 🤗Transformers | Native transformers inference interface | [link] |
| OpenAI-compatible API | Server demo with an OpenAI-style API | [link] |
| text-generation-webui | Deployment with a web UI front end | [link] |
| LangChain | Open-source LLM application framework suited to secondary development | [link] |
| privateGPT | Local multi-document Q&A framework | [link] |
| LM Studio | Multi-platform chat app (with GUI) | [link] |
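
As a taste of the 🤗Transformers route, here is a minimal generation sketch (the repo id is our assumption based on the project's HF links; the prompt follows the Mixtral-Instruct template described under "Instruction template" below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/chinese-mixtral-instruct"  # assumed HF repo id; see the download table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# [INST]/[/INST] are plain strings; the tokenizer prepends the <s> BOS token itself.
prompt = "[INST] 请用一句话介绍中文Mixtral模型。 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```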

Model Performance

To evaluate the models, this project ran both a generation-quality evaluation and objective benchmarks (NLU-style), assessing the models from different angles. We recommend testing on the tasks you care about and choosing the model suited to them.

Generation quality

  • Modeled on the Fastchat Chatbot Arena, this project runs an online model arena where you can browse and rate model responses. The arena provides win rates, Elo scores, and other metrics, and shows pairwise win rates between models. ⚔️ Model arena: http://llm-arena.ymcui.com
  • The examples directory provides output samples from Chinese-Mixtral-Instruct and Chinese-Alpaca-2-13B, scored and compared with GPT-4: Chinese-Mixtral-Instruct averaged 8.20 and Chinese-Alpaca-2-13B averaged 7.05. 📄 Output comparison: examples

Objective benchmarks

C-Eval

C-Eval is a comprehensive evaluation suite for Chinese foundation models; its validation and test sets contain 1.3K and 12.3K multiple-choice questions respectively, covering 52 subjects. For C-Eval inference code, see this project's 📖 GitHub Wiki.

| Model | Type | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
| --- | --- | --- | --- | --- | --- |
| Chinese-Mixtral-Instruct | Instruct | 51.7 | 55.0 | 50.0 | 51.5 |
| Chinese-Mixtral | Base | 45.8 | 54.2 | 43.1 | 49.1 |
| Mixtral-8x7B-Instruct-v0.1 | Instruct | 51.6 | 54.0 | 48.7 | 50.7 |
| Mixtral-8x7B-v0.1 | Base | 47.3 | 54.6 | 46.1 | 50.3 |
| Chinese-Alpaca-2-13B | Instruct | 44.3 | 45.9 | 42.6 | 44.0 |
| Chinese-LLaMA-2-13B | Base | 40.6 | 42.7 | 38.0 | 41.6 |

CMMLU

CMMLU is another comprehensive Chinese evaluation dataset, designed to assess language models' knowledge and reasoning in Chinese contexts. It covers 67 topics, from basic subjects to advanced professional levels, with 11.5K multiple-choice questions in total. For CMMLU inference code, see this project's 📖 GitHub Wiki.

| Model | Type | Test (0-shot) | Test (5-shot) |
| --- | --- | --- | --- |
| Chinese-Mixtral-Instruct | Instruct | 50.0 | 53.0 |
| Chinese-Mixtral | Base | 42.5 | 51.0 |
| Mixtral-8x7B-Instruct-v0.1 | Instruct | 48.2 | 51.6 |
| Mixtral-8x7B-v0.1 | Base | 44.3 | 51.6 |
| Chinese-Alpaca-2-13B | Instruct | 43.2 | 45.5 |
| Chinese-LLaMA-2-13B | Base | 38.9 | 42.5 |

MMLU

MMLU is an English benchmark for natural language understanding and one of the main datasets used to evaluate large models today; its validation and test sets contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects. For MMLU inference code, see this project's 📖 GitHub Wiki.

| Model | Type | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
| --- | --- | --- | --- | --- | --- |
| Chinese-Mixtral-Instruct | Instruct | 65.1 | 69.6 | 67.5 | 69.8 |
| Chinese-Mixtral | Base | 63.2 | 67.1 | 65.5 | 68.3 |
| Mixtral-8x7B-Instruct-v0.1 | Instruct | 68.5 | 70.4 | 68.2 | 70.2 |
| Mixtral-8x7B-v0.1 | Base | 64.9 | 69.0 | 67.0 | 69.5 |
| Chinese-Alpaca-2-13B | Instruct | 49.6 | 53.2 | 50.9 | 53.5 |
| Chinese-LLaMA-2-13B | Base | 46.8 | 50.0 | 46.6 | 51.8 |

LongBench

LongBench is a benchmark for evaluating large models' long-text understanding. It consists of 20 tasks in 6 categories, with most tasks averaging 5K-15K in length, and about 4.75K test instances in total. Below are this project's results on its Chinese tasks (including the code tasks). For LongBench inference code, see this project's 📖 GitHub Wiki.

| Model | Single-doc QA | Multi-doc QA | Summarization | Few-shot learning | Code completion | Synthetic tasks | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chinese-Mixtral-Instruct | 50.3 | 34.2 | 16.4 | 42.0 | 56.1 | 89.5 | 48.1 |
| Chinese-Mixtral | 32.0 | 23.7 | 0.4 | 42.5 | 27.4 | 14.0 | 23.3 |
| Mixtral-8x7B-Instruct-v0.1 | 56.5 | 35.7 | 15.4 | 46.0 | 63.6 | 98.0 | 52.5 |
| Mixtral-8x7B-v0.1 | 35.5 | 9.5 | 16.4 | 46.5 | 57.2 | 83.5 | 41.4 |
| Chinese-Alpaca-2-13B-16K | 47.9 | 26.7 | 13.0 | 22.3 | 46.6 | 21.5 | 29.7 |
| Chinese-LLaMA-2-13B-16K | 36.7 | 17.7 | 3.1 | 29.8 | 13.8 | 3.0 | 17.3 |
| Chinese-Alpaca-2-7B-64K | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| Chinese-LLaMA-2-7B-64K | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |

Quantization performance

The performance of quantized Chinese-Mixtral models was tested under llama.cpp, as shown in the table below.

| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | IQ3_XXS | Q2_K | IQ2_XS | IQ2_XXS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 17.1 | 16.1 | 12.7 | 11.4 |
| BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 3.14 | 2.96 | 2.34 | 2.10 |
| PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 4.5990 | 5.1846 | 6.9784 | 8.5981 |
| M3 Max speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | - | 29.1 | - | - |
| A100 speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.6 | 20.3 | 23.7 | 22.5 |

Note

  • Model size: in GB
  • BPW (bits-per-weight): bits per parameter; e.g. Q6_K averages 6.57 bits per weight in practice (a worked size check follows these notes)
  • PPL (perplexity): measured with a 4K context; lower is better
  • Generation speed: measured on Apple M3 Max (Metal) and NVIDIA A100 (40G), in ms/token; lower is better
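
As a quick sanity check on how the Size and BPW rows relate, using the ~46.7B total parameter count quoted above (all experts count toward file size, even though only 2 are active per token); this rough estimate ignores GGUF metadata overhead:

```python
params = 46.7e9   # total Mixtral-8x7B parameters
bpw = 4.53        # Q4_0 bits-per-weight from the table
size_gib = params * bpw / 8 / 2**30
print(f"{size_gib:.1f} GiB")  # ~24.6, matching the Q4_0 Size row
```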

Taking Chinese-Mixtral-Q4_0 as an example, the figure below shows how PPL varies with context length on two different plain-text datasets. The results indicate that Mixtral supports context lengths well beyond the nominal 32K, still performing well at 64K+ (up to 128K in our tests).



Training and Fine-tuning

Pre-training

  • Starting from the original Mixtral, we ran further pre-training on large-scale unlabeled data to obtain the Chinese-Mixtral base model
  • The training data is the same as that used for the base models in the Chinese-LLaMA-Alpaca project, about 20 GB of plain text in total
  • Training code and tutorial: 📖 pre-training script wiki

Instruction fine-tuning

  • Starting from Chinese-Mixtral, we ran further fine-tuning on labeled instruction data to obtain the Chinese-Mixtral-Instruct model
  • The training data is the instruction data used in the Chinese-LLaMA-Alpaca-2 project, about 5 million instructions in total
  • Training code and tutorial: 📖 instruction fine-tuning script wiki

Instruction template

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

Note: <s> and </s> are special tokens marking the start and end of a sequence, while [INST] and [/INST] are plain strings.
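
A minimal helper that assembles a multi-turn prompt according to this template (a sketch assuming the tokenizer prepends the leading <s> and that each completed model answer is closed with </s>):

```python
def build_mixtral_prompt(history, user_msg):
    """Build a Mixtral-Instruct prompt from (instruction, answer) turns.

    [INST]/[/INST] are literal strings; </s> closes each finished answer,
    while the leading <s> is left for the tokenizer to add.
    """
    parts = [f"[INST] {inst} [/INST] {ans}</s> " for inst, ans in history]
    parts.append(f"[INST] {user_msg} [/INST]")
    return "".join(parts)

# Second-turn example:
print(build_mixtral_prompt([("你好", "你好!有什么可以帮你?")], "介绍一下Mixtral"))
```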

FAQ

Before opening an issue, be sure to check whether the FAQ already contains a solution. For the specific questions and answers, see this project's 📖 GitHub Wiki.

Question 1: Will you train with more data later? Will you do RLHF/DPO alignment?
Question 2: Why was the Chinese vocabulary not expanded this time?
Question 3: Is Mixtral's downstream ecosystem supported?

Citation

@article{chinese-mixtral,
      title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral}, 
      author={Cui, Yiming and Yao, Xin},
      journal={arXiv preprint arXiv:2403.01851},
      url={https://arxiv.org/abs/2403.01851},
      year={2024}
}

Disclaimer

This project is developed on top of the Mixtral model released by Mistral.ai; please strictly comply with Mixtral's open-source license when using it. If third-party code is involved, comply with the relevant open-source licenses. Content generated by the models may be affected by computation methods, random factors, and loss of quantization precision, so this project makes no guarantee about the accuracy of model outputs and assumes no liability for any losses arising from the use of these resources and outputs. If the models are used commercially, developers must comply with local laws and regulations and ensure the compliance of model outputs; this project assumes no responsibility for any products or services derived from it.

Feedback

If you have questions, please submit them in a GitHub Issue. Please ask politely and help build a friendly discussion community.

  • Before submitting an issue, check whether the FAQ solves the problem, and look through past issues for a solution.
  • Use the issue template provided by this project so the problem can be located quickly.
  • Duplicate issues and issues unrelated to this project will be handled by the stale bot; thanks for your understanding.

chinese-mixtral's People

Contributors

imounttai, ymcui


chinese-mixtral's Issues

Training error

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

None

Operating system

None

Describe the problem in detail

I hit a problem with both full-parameter and LoRA fine-tuning, the same one as hiyouga/LLaMA-Factory#1998.
Currently only 4-bit + zero2_no_offload runs through. Did the authors encounter this problem during training, and how was it solved? Thanks.

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

Will the training scripts be released?

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

Model training and fine-tuning

Operating system

Linux

Describe the problem in detail

(paste your code here)

none

Dependencies (must be provided for code-related issues)

none

Execution logs or screenshots

none

Question about training cost

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

Other issues

Operating system

Linux

Describe the problem in detail

(Describe your problem here in detail)
Could you share the size of the training dataset, and the model, number, and memory of the GPUs used? We hope to fine-tune further on our own dataset to adapt the model to our domain. Thanks for your support!

Dependencies (must be provided for code-related issues)

Not applicable

Execution logs or screenshots

Not applicable

During SFT fine-tuning, training was interrupted after saving a checkpoint; resuming from the saved checkpoint raises the same error. How can this be solved?

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

Model training and fine-tuning

Operating system

Linux

Describe the problem in detail

(Describe your problem here in detail)

# CUDA_VISIBLE_DEVICES=0,1,2,3
lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate,w1,w2,w3"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/share1/zouff/llm_model/chinese-mixtral-instruct
dataset_dir=/share1/zouff/py_pro/Chinese-Mixtral/train_data/zybm_mix_1_0318
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
max_seq_length=512
output_dir=/share1/zouff/py_pro/Chinese-Mixtral/output_model/zybm_mix_1_0318
validation_file=/share1/zouff/py_pro/Chinese-Mixtral/scripts/data/zybm_medical_dev_new.json

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 4 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${pretrained_model} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 3 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.1 \
    --logging_strategy steps \
    --logging_steps 200 \
    --save_strategy steps \
    --save_total_limit 5 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --load_in_kbits 4 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False \
    --output_router_logits \
    --resume_from_checkpoint /share1/zouff/py_pro/Chinese-Mixtral/output_model/zybm_mix_1_0318/checkpoint-200

Dependencies (must be provided for code-related issues)

accelerate==0.27.2
addict==2.4.0
aiofiles==23.2.1
aiohttp==3.9.3
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.0
aliyun-python-sdk-kms==2.16.2
altair==5.2.0
annotated-types==0.6.0
anyio==4.3.0
arxiv==2.1.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.42.0
blessed==1.20.0
blinker==1.7.0
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1704278392174/work
contourpy==1.2.0
cpm-kernels==1.0.11
crcmod==1.7
cryptography==42.0.5
cupy-cuda12x==12.1.0
cycler==0.12.1
dataclasses-json==0.6.4
datasets==2.17.1
debugpy @ file:///croot/debugpy_1690905042057/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
deepspeed==0.13.1
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docstring-parser==0.15
einops==0.7.0
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
fastapi==0.110.0
fastrlock==0.8.2
feedparser==6.0.10
ffmpy==0.3.2
filelock==3.13.1
fonttools==4.49.0
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.42
gpustat==1.1.1
gradio==4.19.2
gradio_client==0.10.1
greenlet==3.0.3
h11==0.14.0
hjson==3.1.0
httpcore==1.0.4
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.21.3
idna==3.6
importlib-metadata==7.0.1
importlib_resources==6.1.2
interegular==0.3.3
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1708996548741/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1709559745751/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
jieba==0.42.1
Jinja2==3.1.3
jmespath==0.10.0
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1654730843242/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1704727030956/work
kiwisolver==1.4.5
langchain==0.1.9
langchain-community==0.0.24
langchain-core==0.1.27
langchainhub==0.1.14
langsmith==0.1.10
lark==1.1.9
latex2mathml==3.77.0
llvmlite==0.42.0
loguru==0.7.2
Markdown==3.5.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.0
matplotlib==3.8.3
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mdtex2html==1.3.0
mdurl==0.1.2
modelscope==1.13.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numba==0.59.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.133
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.13.3
orjson==3.9.15
oss2==2.18.4
outlines==0.0.34
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1696202382185/work
pandas==2.2.1
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
peft==0.9.0
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.2.0
pip==23.3.1
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1706713388748/work
prometheus_client==0.20.0
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1702399386289/work
protobuf==4.25.3
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py-cpuinfo==9.0.0
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.3
pydantic_core==2.16.3
pydeck==0.8.1b0
pydub==0.25.1
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1700607939962/work
PyJWT==2.8.0
pynvml==11.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq @ file:///croot/pyzmq_1705605076900/work
ray==2.9.3
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rouge-chinese==1.0.3
rpds-py==0.18.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
ruff==0.2.2
safetensors==0.4.2
scikit-learn==1.4.1.post1
scipy==1.12.0
semantic-version==2.10.0
sentence-transformers==2.4.0
sentencepiece==0.2.0
setuptools==68.2.2
sgmllib3k==1.0.0
shellingham==1.5.4
shtab==1.7.0
simplejson==3.19.2
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.27
sse-starlette==2.0.0
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.36.3
streamlit==1.31.1
sympy==1.12
tenacity==8.2.3
threadpoolctl==3.3.0
tiktoken==0.6.0
timm==0.9.16
tokenizers==0.15.2
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch==2.1.2
torchvision==0.17.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1648827254365/work
tqdm==4.66.2
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1704212992681/work
transformers==4.38.2
triton==2.1.0
trl==0.7.11
typer==0.9.0
types-requests==2.31.0.20240218
typing-inspect==0.9.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1708904622550/work
tyro==0.7.3
tzdata==2024.1
tzlocal==5.2
urllib3==2.2.1
uvicorn==0.27.1
uvloop==0.19.0
validators==0.22.0
vllm==0.3.3
watchdog==4.0.0
watchfiles==0.21.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
websockets==11.0.3
wheel==0.41.2
xformers==0.0.23.post1
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zhipuai==2.0.1
zipp==3.17.0

Execution logs or screenshots

 26%|██▋       | 201/762 [00:35<01:38,  5.67it/s]
 27%|██▋       | 202/762 [01:09<03:53,  2.40it/s]
 27%|██▋       | 203/762 [01:44<07:02,  1.32it/s]
 27%|██▋       | 204/762 [02:18<11:24,  1.23s/it]
 27%|██▋       | 205/762 [02:53<17:33,  1.89s/it]
 27%|██▋       | 206/762 [03:27<25:40,  2.77s/it]
 27%|██▋       | 207/762 [04:01<36:33,  3.95s/it]
 27%|██▋       | 208/762 [04:35<50:38,  5.49s/it]
 27%|██▋       | 209/762 [05:10<1:08:46,  7.46s/it]
 28%|██▊       | 210/762 [05:44<1:30:08,  9.80s/it]
 28%|██▊       | 211/762 [06:17<1:54:33, 12.47s/it]
 28%|██▊       | 212/762 [06:51<2:21:25, 15.43s/it]
 28%|██▊       | 213/762 [07:25<2:49:23, 18.51s/it]
 28%|██▊       | 214/762 [07:59<3:15:56, 21.45s/it]
 28%|██▊       | 215/762 [08:34<3:41:47, 24.33s/it]
 28%|██▊       | 216/762 [09:08<4:02:33, 26.65s/it]
 28%|██▊       | 217/762 [09:42<4:18:47, 28.49s/it]
 29%|██▊       | 218/762 [10:16<4:31:55, 29.99s/it]
 29%|██▊       | 219/762 [10:51<4:41:46, 31.14s/it]
 29%|██▉       | 220/762 [11:25<4:49:47, 32.08s/it]
 29%|██▉       | 221/762 [12:00<4:55:29, 32.77s/it]
 29%|██▉       | 222/762 [12:34<4:59:26, 33.27s/it]
 29%|██▉       | 223/762 [13:08<5:01:07, 33.52s/it]
 29%|██▉       | 224/762 [13:42<5:01:59, 33.68s/it]
 30%|██▉       | 225/762 [14:17<5:02:52, 33.84s/it]
 30%|██▉       | 226/762 [14:51<5:03:45, 34.00s/it]
 30%|██▉       | 227/762 [15:25<5:04:03, 34.10s/it]
 30%|██▉       | 228/762 [15:59<5:03:23, 34.09s/it]
 30%|███       | 229/762 [16:33<5:02:05, 34.01s/it]
 30%|███       | 230/762 [17:07<5:01:36, 34.02s/it]
 30%|███       | 231/762 [17:42<5:02:07, 34.14s/it]
 30%|███       | 232/762 [18:15<5:00:55, 34.07s/it]
 31%|███       | 233/762 [18:49<5:00:10, 34.05s/it]
 31%|███       | 234/762 [19:24<4:59:41, 34.06s/it]
 31%|███       | 235/762 [19:58<4:59:40, 34.12s/it]
 31%|███       | 236/762 [20:32<4:58:51, 34.09s/it]
 31%|███       | 237/762 [21:06<4:58:00, 34.06s/it]
 31%|███       | 238/762 [21:40<4:57:09, 34.03s/it]
 31%|███▏      | 239/762 [22:14<4:56:19, 33.99s/it]
 31%|███▏      | 240/762 [22:48<4:56:25, 34.07s/it][E ProcessGroupNCCL.cpp:475] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800084 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:475] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800654 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 2] NCCL watchdog thread terminated with exception: [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800084 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800654 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:475] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800128 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=6909, OpType=ALLREDUCE, NumelIn=131072000, NumelOut=131072000, Timeout(ms)=1800000) ran for 1800128 milliseconds before timing out.
[2024-03-19 10:54:46,862] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 128173 closing signal SIGTERM
[2024-03-19 10:54:46,862] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 128174 closing signal SIGTERM
[2024-03-19 10:54:54,867] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -6) local_rank: 0 (pid: 128172) of binary: /home/zouff/anaconda3/envs/glm/bin/python
Traceback (most recent call last):
  File "/home/zouff/anaconda3/envs/glm/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
=======================================================
run_clm_sft_with_peft.py FAILED
-------------------------------------------------------
Failures:
[1]:
  time      : 2024-03-19_10:54:46
  host      : g01
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 128175)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 128175
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-19_10:54:46
  host      : g01
  rank      : 0 (local_rank: 0)
  exitcode  : -6 (pid: 128172)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 128172
=======================================================

Vocabulary expansion question

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

Other issues

Operating system

Linux

Describe the problem in detail

Why was the vocabulary not expanded? Chinese-LLaMA expanded the vocabulary; wouldn't that be friendlier for Chinese?

Dependencies (must be provided for code-related issues)

# (paste dependencies here; place images outside the code block, or they will not display)

Execution logs or screenshots

# (paste execution logs here; place images outside the code block, or they will not display)

Training details

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

None

Operating system

Linux

Describe the problem in detail

Could you provide the training code? I ran into some problems when training with my own code. Thanks.

# (paste execution code here)

Dependencies (must be provided for code-related issues)

# (paste dependencies here; place images outside the code block, or they will not display)

Execution logs or screenshots

# (paste execution logs here; place images outside the code block, or they will not display)

Was the vocabulary expanded?

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

None

Operating system

None

Describe the problem in detail

(Describe your problem here in detail)

# (paste execution code here)

Dependencies (must be provided for code-related issues)

# (paste dependencies here; place images outside the code block, or they will not display)

Execution logs or screenshots

# (paste execution logs here; place images outside the code block, or they will not display)

Computation assessment

Check before submitting issues

  • Make sure to pull the latest code, as some issues and bugs have been fixed.
  • I have read the Wiki and FAQ section AND searched for similar issues and did not find a similar problem or solution
  • Third-party plugin issues - e.g., llama.cpp, LangChain, text-generation-webui, we recommend checking the corresponding project for solutions

Type of Issue

Other issues

Operating System

None

Describe your issue in detail

You described the training as using 48 A40 GPUs. Can you also share (or estimate) how long the pre-training phase and the instruction-tuning phase took?

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

Setting load_in_kbits to 8 or 16 raises errors; only 4 works for fine-tuning. What is the reason?

Check the following before submitting

  • Make sure you are using the latest code from the repository (git pull); some problems have already been fixed.
  • I have read the FAQ section of the project documentation and searched the issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), we also recommend looking for a solution in the corresponding project.

Type of issue

Model training and fine-tuning

Operating system

Linux

Describe the problem in detail

Setting load_in_kbits to 8 or 16 both raise errors; only 4 can be used for fine-tuning. What is the reason?

# CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate,w1,w2,w3"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/share1/zouff/llm_model/chinese-mixtral-instruct
dataset_dir=/share1/zouff/py_pro/Chinese-Mixtral/train_data/zybm_mix_1_0318
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
max_seq_length=512
output_dir=/share1/zouff/py_pro/Chinese-Mixtral/output_model/zybm_mix_30_0320
validation_file=/share1/zouff/py_pro/Chinese-Mixtral/scripts/data/zybm_medical_dev_new.json

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 8 --master_port 12355 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${pretrained_model} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 30 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.1 \
    --logging_strategy steps \
    --logging_steps 200 \
    --save_strategy steps \
    --save_total_limit 40 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --load_in_kbits 8 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False \
    --output_router_logits

Dependencies (must be provided for code-related issues)

accelerate                0.27.2
addict                    2.4.0
aiofiles                  23.2.1
aiohttp                   3.9.3
aiosignal                 1.3.1
aliyun-python-sdk-core    2.15.0
aliyun-python-sdk-kms     2.16.2
altair                    5.2.0
annotated-types           0.6.0
anyio                     4.3.0
arxiv                     2.1.0
asttokens                 2.4.1
async-timeout             4.0.3
attrs                     23.2.0
bitsandbytes              0.42.0
blessed                   1.20.0
blinker                   1.7.0
cachetools                5.3.3
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
colorama                  0.4.6
comm                      0.2.1
contourpy                 1.2.0
cpm-kernels               1.0.11
crcmod                    1.7
cryptography              42.0.5
cupy-cuda12x              12.1.0
cycler                    0.12.1
dataclasses-json          0.6.4
datasets                  2.17.1
debugpy                   1.6.7
decorator                 5.1.1
deepspeed                 0.13.1
dill                      0.3.8
diskcache                 5.6.3
distro                    1.9.0
docstring-parser          0.15
einops                    0.7.0
entrypoints               0.4
exceptiongroup            1.2.0
executing                 2.0.1
fastapi                   0.110.0
fastrlock                 0.8.2
feedparser                6.0.10
ffmpy                     0.3.2
filelock                  3.13.1
fonttools                 4.49.0
frozenlist                1.4.1
fsspec                    2023.10.0
gast                      0.5.4
gitdb                     4.0.11
GitPython                 3.1.42
gpustat                   1.1.1
gradio                    4.19.2
gradio_client             0.10.1
greenlet                  3.0.3
h11                       0.14.0
hjson                     3.1.0
httpcore                  1.0.4
httptools                 0.6.1
httpx                     0.27.0
huggingface-hub           0.21.3
idna                      3.6
importlib-metadata        7.0.1
importlib_resources       6.1.2
interegular               0.3.3
ipykernel                 6.29.3
ipython                   8.22.2
jedi                      0.19.1
jieba                     0.42.1
Jinja2                    3.1.3
jmespath                  0.10.0
joblib                    1.3.2
jsonpatch                 1.33
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter-client            7.3.4
jupyter_core              5.7.1
kiwisolver                1.4.5
langchain                 0.1.9
langchain-community       0.0.24
langchain-core            0.1.27
langchainhub              0.1.14
langsmith                 0.1.10
lark                      1.1.9
latex2mathml              3.77.0
llvmlite                  0.42.0
loguru                    0.7.2
Markdown                  3.5.2
markdown-it-py            3.0.0
MarkupSafe                2.1.5
marshmallow               3.21.0
matplotlib                3.8.3
matplotlib-inline         0.1.6
mdtex2html                1.3.0
mdurl                     0.1.2
modelscope                1.13.0
mpmath                    1.3.0
msgpack                   1.0.8
multidict                 6.0.5
multiprocess              0.70.16
mypy-extensions           1.0.0
nest_asyncio              1.6.0
networkx                  3.2.1
ninja                     1.11.1.1
nltk                      3.8.1
numba                     0.59.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-ml-py              12.535.133
nvidia-nccl-cu12          2.18.1
nvidia-nvjitlink-cu12     12.3.101
nvidia-nvtx-cu12          12.1.105
openai                    1.13.3
orjson                    3.9.15
oss2                      2.18.4
outlines                  0.0.34
packaging                 23.2
pandas                    2.2.1
parso                     0.8.3
peft                      0.9.0
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.2.0
pip                       23.3.1
platformdirs              4.2.0
prometheus_client         0.20.0
prompt-toolkit            3.0.42
protobuf                  4.25.3
psutil                    5.9.0
ptyprocess                0.7.0
pure-eval                 0.2.2
py-cpuinfo                9.0.0
pyarrow                   15.0.0
pyarrow-hotfix            0.6
pycparser                 2.21
pycryptodome              3.20.0
pydantic                  2.6.3
pydantic_core             2.16.3
pydeck                    0.8.1b0
pydub                     0.25.1
Pygments                  2.17.2
PyJWT                     2.8.0
pynvml                    11.5.0
pyparsing                 3.1.1
python-dateutil           2.8.2
python-dotenv             1.0.1
python-multipart          0.0.9
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     25.1.2
ray                       2.9.3
referencing               0.33.0
regex                     2023.12.25
requests                  2.31.0
rich                      13.7.1
rouge-chinese             1.0.3
rpds-py                   0.18.0
ruamel.yaml               0.18.6
ruamel.yaml.clib          0.2.8
ruff                      0.2.2
safetensors               0.4.2
scikit-learn              1.4.1.post1
scipy                     1.12.0
semantic-version          2.10.0
sentence-transformers     2.4.0
sentencepiece             0.2.0
setuptools                68.2.2
sgmllib3k                 1.0.0
shellingham               1.5.4
shtab                     1.7.0
simplejson                3.19.2
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.1
sortedcontainers          2.4.0
SQLAlchemy                2.0.27
sse-starlette             2.0.0
stack-data                0.6.2
starlette                 0.36.3
streamlit                 1.31.1
sympy                     1.12
tenacity                  8.2.3
threadpoolctl             3.3.0
tiktoken                  0.6.0
timm                      0.9.16
tokenizers                0.15.2
toml                      0.10.2
tomli                     2.0.1
tomlkit                   0.12.0
toolz                     0.12.1
torch                     2.1.2
torchvision               0.17.1
tornado                   6.1
tqdm                      4.66.2
traitlets                 5.14.1
transformers              4.38.2
triton                    2.1.0
trl                       0.7.11
typer                     0.9.0
types-requests            2.31.0.20240218
typing_extensions         4.10.0
typing-inspect            0.9.0
tyro                      0.7.3
tzdata                    2024.1
tzlocal                   5.2
urllib3                   2.2.1
uvicorn                   0.27.1
uvloop                    0.19.0
validators                0.22.0
vllm                      0.3.3
watchdog                  4.0.0
watchfiles                0.21.0
wcwidth                   0.2.13
websockets                11.0.3
wheel                     0.41.2
xformers                  0.0.23.post1
xxhash                    3.4.1
yapf                      0.40.2
yarl                      1.9.4
zhipuai                   2.0.1
zipp                      3.17.0

Execution logs or screenshots

(Identical tracebacks are raised on all 8 ranks.)

Traceback (most recent call last):
  File "/share1/zouff/py_pro/Chinese-Mixtral/scripts/training/run_clm_sft_with_peft.py", line 424, in <module>
    main()
  File "/share1/zouff/py_pro/Chinese-Mixtral/scripts/training/run_clm_sft_with_peft.py", line 396, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/transformers/trainer.py", line 2911, in training_step
    self.accelerator.backward(loss)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1960, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1964, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2040, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 485, in backward
    .mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0
    return inner_training_loop(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/transformers/trainer.py", line 2911, in training_step
    self.accelerator.backward(loss)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1960, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1964, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2040, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 485, in backward
    .mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0

  0%|          | 0/3810 [00:03<?, ?it/s]
[2024-03-21 09:55:50,608] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 34314) of binary: /home/zouff/anaconda3/envs/glm/bin/python
Traceback (most recent call last):
  File "/home/zouff/anaconda3/envs/glm/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/zouff/anaconda3/envs/glm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
run_clm_sft_with_peft.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 34315)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 34316)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 34317)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 34318)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 34319)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 34320)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 34321)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-21_09:55:50
  host      : g01
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 34314)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
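
Note on the error class: the RuntimeError above reports a failed broadcast, i.e. the two tensors being multiplied in place disagree in their leading dimension (32 vs 8). The traceback shows this happening inside bitsandbytes' 8-bit backward, where a gradient is rescaled by the per-channel quantization scales `state.SCB`. A minimal standalone snippet (an illustrative sketch, not the project's training code; tensor names and sizes below are made up) that reproduces the same message:

```python
import torch

# Stand-ins for the two tensors in the failing bitsandbytes line:
#   <gradient>.mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
grad = torch.randn(32, 16)  # hypothetical gradient with 32 rows
scb = torch.randn(8)        # hypothetical per-channel scale vector with only 8 entries

# The in-place multiply tries to broadcast (8, 1) against (32, 16) and fails at dimension 0:
# RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0
grad.mul_(scb.unsqueeze(1).mul(1.0 / 127.0))
```

In other words, the quantization scale tensor carried by the 8-bit weight state does not match the shape of the tensor being rescaled during backward; the traceback itself does not say why the shapes diverge in this particular training setup.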

Is there any plan to upload Chinese-Mixtral-Instruct to the ModelScope community?

The following items must be checked before submitting an issue

  • Please make sure you are using the latest code from this repository (git pull); some issues have already been resolved and fixed.
  • I have read the FAQ section of the project documentation and searched the existing Issues, and found no similar problem or solution.
  • For third-party plugin problems (e.g. llama.cpp, LangChain, text-generation-webui), it is also recommended to look for a solution in the corresponding project.

Issue type

None

Operating system

None

Describe the issue in detail

Both download channels (Baidu Netdisk and HF) are very slow, and neither is friendly to Linux systems.
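
For reference, the project README points to mirror sites such as hf-mirror.com when huggingface.co is slow or unreachable. A minimal sketch of one way to pull a model through such a mirror with `huggingface_hub` (the repo id and mirror URL below are assumptions; check the project's model cards for the exact names):

```python
import os

# Route huggingface_hub through a mirror endpoint; must be set before the import below.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # assumed mirror, see the README note

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hfl/chinese-mixtral-instruct",  # assumed repo id; verify on the model card
    local_dir="chinese-mixtral-instruct",    # where to place the downloaded files
)
```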

Dependencies (must be provided for code-related issues)

# Paste your dependency information here (put images outside the code block, otherwise they will not be displayed)
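
If it helps, a minimal sketch of one way to print the versions this template asks for (the package list is an assumption based on this project's training stack):

```python
# Print versions of the packages most relevant to the training scripts.
import importlib.metadata as md

for pkg in ("torch", "transformers", "peft", "bitsandbytes", "deepspeed", "accelerate"):
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```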

Run logs or screenshots

# Paste your run log here (put images outside the code block, otherwise they will not be displayed)
