crazyboym / llama3-chinese-chat

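Llama 3 Chinese repository (aggregated resources: interesting fine-tuned and modified weights from community members and vendors, plus training, inference, evaluation, and deployment tutorial videos and docs)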

Home Page: https://huggingface.co/shareAI

Python 100.00%

llama llama2 llama3 llama3-chinese llama3-finetune

llama3-chinese-chat's Introduction

llama3-Chinese-chat

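The first Chinese version of Llama 3. This repository is a place to share learning materials about Llama 3 in Chinese; anyone is welcome to contribute via PR.

New: a Chinese resource repository for the Phi-3 model (reported to outperform Llama 3 8B despite its smaller size). Adaptation is in progress; follow along at https://github.com/CrazyBoyM/phi3-Chinese

If you have fine-tuned your own version or found an interesting specialized version online, please comment in the issues section so it can be listed.
If there is a content section you would like to build, fork the repository and submit a PR to become a core author.
(Note: typo-only PRs that change a single word or sentence are no longer accepted; please avoid submitting such PRs frequently.)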

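Changelog

2024-04-19, 1 p.m.: 🍺 The world's first Chinese version of Llama 3 finished training. No sleep that night, haha; it was trained overnight on 170k+ high-quality multi-turn Chinese dialogue samples.
2024-04-20, 7 a.m.: v2 finished training.
2024-04-20, 11 p.m.: the Instruct Chinese version finished training.
2024-04-21, 2 a.m.: added and organized the training tutorial, inference tutorial, web deployment docs, and more.
2024-05-04, during the May Day holiday: 🚀 added a language-preference alignment version (DPO applied directly to the English Instruct model). It keeps the original flavor of the replies (playful language and emoji). The model download, GGUF quantized download, and language-preference dataset are all open-sourced.
2024-05-13 💪 Added an LM Studio local deployment tutorial for PCs: written tutorial and step-by-step video tutorial.
2024-05-17 💪 Added a hand-written API deployment tutorial and command-line invocation; see the documentation link.
2024-05-17 🎉 The compiled Llama 3 Chinese dataset collection reached 2.9k downloads on ModelScope and stayed on the ModelScope home page for three consecutive weeks: dataset download link.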

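Base Chinese SFT version:
Instruct preference reinforcement learning version:

Near-term to-do: record Bilibili videos, package a cloud training image, release quantized GGUF and Ollama versions with tutorials.
In the near term (expected in about a month and a half) a browser extension with AI notes and AI mind-map features will also be open-sourced; it is being developed in spare time together with classmates (the two of them are the main developers, haha). Stay tuned. Update 07-03: a browser AI extension demo is available: https://github.com/CrazyBoyM/doc2mindmap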

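Available chat model roundup

A curated list of high-quality Llama 3 chat weights (additions via issues are welcome):

shareAI series:
- Base pre-trained + direct Chinese SFT version:
  - Training data: https://modelscope.cn/datasets/baicai003/Llama3-Chinese-dataset/summary
  - V1
    - OpenCSG (full-speed download): https://opencsg.com/models/shareAI/llama3-Chinese-chat-8b
    - WiseModel (full-speed download): https://wisemodel.cn/models/shareAI/llama3-Chinese-chat-8b
  - V2
    - ModelScope: https://modelscope.cn/models/baicai003/Llama3-Chinese_v2/summary
    - Mind-map generation LoRA: https://modelscope.cn/models/shareAI/llama3-instruct-8b-cn-doc2markmap-lora
- Instruct + continued Chinese SFT version:
  - ModelScope model download: https://modelscope.cn/models/baicai003/llama-3-8b-Instruct-chinese_v2/summary
  - Cloud server image for online trial (one click, free for 4 hours): https://www.suanyun.cn/console/share?uuid=b1ba51908f8a4bd1af37148765c293ee
- Instruct + reinforcement learning Chinese version:
  - Llama 3 Instruct DPO version (trains in about 10 minutes and minimizes performance loss relative to the original multilingual Instruct model; in the author's tests it outperformed most Chinese versions trained on large amounts of data)
    - ModelScope download: https://modelscope.cn/models/baicai003/Llama3-Chinese-instruct-DPO-beta0.5/summary
    - Preference learning dataset: DPO-zh-en-emoji
- Base pre-trained + incremental pre-training on large-scale high-quality Chinese data: in progress
- 70B Chinese version: planned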
by zhuangxialie (the chat template is misconfigured, so use FastChat to try these):
- Base + Chinese SFT: https://modelscope.cn/models/zhuangxialie/Llama3_Chinese_Sft/files
- Base + ORPO: https://modelscope.cn/models/zhuangxialie/Llama3-Chinese-ORPO/summary
- Instruct + DPO: https://www.modelscope.cn/models/zhuangxialie/Llama3-Chinese-DPO/summary
Llama 3 Pro (added blocks; the community is encouraged to experiment and explore further with this approach):
- linjh1118 (first ORPO preference alignment + 2x expanded blocks): https://github.com/linjh1118/Llama3-Chinese-ORPO
Llama 3 MoE-enhanced version:
- cooper12121-llama3-8x8b-MoE: https://github.com/cooper12121/llama3-8x8b-MoE
Long-context versions:
- Unicom fine-tuned v2 (Chinese, 28k context): https://huggingface.co/UnicomLLM/Unichat-llama3-Chinese-8B-28K
- 262k context (English): https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k
- 262k context (Chinese): planned
- Infinite-context version: planned; reference: https://medium.com/neoxia/llm-infini-attention-with-linear-complexity-3209b87a77c3
Other general Chinese fine-tuned versions:
- ZTE fine-tuned version (DPO), 70B: https://www.modelscope.cn/models/ZTEAIM2024/Llama3_70B_instruct_chinese/summary
- Unicom fine-tuned version (SFT): https://www.modelscope.cn/models/UnicomAI/Unichat-llama3-Chinese/summary
- OpenBuddy fine-tuned version (SFT, reportedly good): https://www.modelscope.cn/models/OpenBuddy/openbuddy-llama3-8b-v21.1-8k/summary
- zhichen fine-tuned version (ORPO method, probably the first ORPO): https://github.com/seanzhang-zhichen/llama3-chinese
- shenzhi-wang fine-tuned version (ORPO method, also described as the first ORPO): https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat
- Rookie fine-tuned version (SFT): https://github.com/Rookie1019/Llama-3-8B-Instruct-Chinese
- hit-sz klc-lab fine-tuned version: https://github.com/zyg18181818/Llama-3-Chinese
Safety-restriction-removed series (NSFW):
- Unholy: https://huggingface.co/Undi95/Llama-3-Unholy-8B
- neural-chat: https://hf-mirror.com/Locutusque/llama-3-neural-chat-v1-8b
- dolphin: https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
- Orion: https://huggingface.co/Orion-zhen/Llama3-70B-Orion-Chinese (uncensored + Chinese, and keeps the original Llama 3's fondness for emoji)
v-llama3 multimodal versions (support input and output beyond text):
- Image question answering:
  - Bunny-Llama-3-8B-V: https://wisemodel.cn/models/BAAI/Bunny-Llama-3-8B-V
  - llava-llama-3-8b: https://huggingface.co/xtuner/llava-llama-3-8b-v1_1
- Video understanding (supports Q&A on videos up to 1 minute long): https://github.com/THUDM/CogVLM2
Agent tool-use enhanced version:
- ModelScope Chinese Agent V1 (selects tools based on your request, Chinese dialogue): https://modelscope.cn/models/swift/Llama3-Chinese-8B-Instruct-Agent-v1/summary
EmoLLM, fine-tuned on psychology-domain data:
- Online demo: https://st-app-center-006861-9746-jlroxvg.openxlab.space/
- Or launch it from OpenXLab EmoLLM3.0-Llama3
- Model downloads
  - OpenXLab: https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0
  - ModelScope: https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary
Fiction, web novel, and story writing enhanced version: planned
Music generation version: planned
Catgirl role-play version: planned
NSFW version: planned

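Note: because only common dialogue was used for training, the Base + SFT versions may produce unexpected replies (especially for uncommon questions). This tutorial mainly collects high-quality resources (how to fine-tune Llama 3 for Chinese, how to build Chinese dialogue datasets, role play, agent capability enhancement, context length extension, web deployment and quantization, and CPU inference deployment on phones and PCs), which will be added gradually.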

Model evaluation

Model	MMLU (5-shot)	C-Eval (5-shot)	TriviaQA-Wiki (8-shot)
LLaMA3-8B	66.6	49.8	81.4
LLaMA3-8B (shareAI-V2)	66.2	50.9	81.8

Instruct Model	MMLU (5-shot)
LLaMA3-8B Instruct	67.1
LLaMA3-8B Instruct (shareAI-V2)	67.2

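Collection of evaluation results for models related to Chinese Llama 3

Notes:

Source of the evaluation results: "[Llama3]使用弱智吧数据微调Llama3-Instruct-8B模型(含测评多个中文Llama3模型) [弱智吧]" on Zhihu (https://zhuanlan.zhihu.com/p/694818596)
For the OpenCompass evaluation process, see "[Llama3][InternLM2]OpenCompass 大模型评测Llama3-instruct-8B 中文版_v2 [OpenCompass]" on Zhihu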

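A simple, near-lossless three-step method to extend Llama 3 context length (32K, 96K)

1. Open the folder of any downloaded Llama 3 fine-tuned model.
2. In config.json, change max_position_embeddings to 32768 (32k).
3. Change rope_theta to 1000000 or 4000000.
This extends Llama 3's context from 8k to 32k with almost no performance loss, which fits most long-context tasks.
(This method was shared by group member "@岁月" and applies to the Instruct version; the guess is that Meta may already have trained on very long context data.) Evaluation experiment:

The results show that there is still little performance loss when the length is extended to 96K.
Source: axolotl-ai-cloud/axolotl#1567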
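
The three steps above amount to editing two fields in the model's config.json. A minimal sketch, assuming a standard Hugging Face style config file; the helper name `extend_context` and the example path are illustrative and not part of this repository:

```python
import json
from pathlib import Path


def extend_context(config_path, max_positions=32768, rope_theta=1_000_000):
    """Rewrite max_position_embeddings and rope_theta in a config.json file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["max_position_embeddings"] = max_positions  # step 2: 32768 (32k)
    config["rope_theta"] = rope_theta                  # step 3: 1000000 or 4000000
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config


# Example call (hypothetical path to a downloaded model folder):
# extend_context("/path/to/model/config.json")
```

All other fields in the file are left unchanged, so it is easy to revert by restoring the two original values.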

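Model, training, and inference cost

Inference
- fp16 mode uses roughly 16 GB of VRAM; a 24 GB GPU is recommended.
- int4 mode uses roughly 8 GB of VRAM; at least 10 GB of VRAM is recommended. You need to find and set load_in_4bit=True in the code yourself.
Training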
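
The fp16 figure follows from simple arithmetic on the weights alone. A rough sketch, assuming about 8 billion parameters and counting only the weights; the KV cache, activations, and framework overhead come on top, which is presumably why the int4 figure above is larger than the raw weight size. `weight_memory_gb` is an illustrative helper:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_8b = 8e9  # assumed parameter count for an 8B model

fp16_gb = weight_memory_gb(params_8b, 16)  # about 16 GB of weights
int4_gb = weight_memory_gb(params_8b, 4)   # about 4 GB of weights
print(f"fp16: {fp16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
```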

Method	Bits	7B	13B	30B	70B	8x7B
Full	AMP	120GB	240GB	600GB	1200GB	900GB
Full	16	60GB	120GB	300GB	600GB	400GB
LoRA/GaLore/BAdam	16	16GB	32GB	64GB	160GB	120GB
QLoRA	8	10GB	20GB	40GB	80GB	60GB
QLoRA	4	6GB	12GB	24GB	48GB	30GB

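Name	Group chat QR code	Name	Group chat QR code
Llama 3 Chinese discussion QQ group		High-quality Chinese data curation group

I will also record demo tutorial videos on model deployment, inference, and training on Bilibili later. My personal Bilibili page: https://space.bilibili.com/291593914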

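Available training data roundup

Dataset	Description
firefly-train-1.1M	Data for 23 common Chinese NLP tasks, plus a large amount of data related to Chinese culture, such as couplets, poetry writing, classical Chinese translation, prose, and Jin Yong novels. For each task, several instruction templates were written by hand to ensure high quality and diversity. 1.15 million samples in total.
shareAI/CodeChat	Mainly contains samples for logical reasoning, code Q&A, and code generation.
shareAI/ShareGPT-Chinese-English-90k	High-quality bilingual Chinese-English human-machine Q&A dataset covering user questions in real, complex scenarios (includes many multi-turn dialogues).
moss-003-sft-data	Chinese-English multi-turn dialogue data open-sourced by the Fudan University MOSS team; about 1 million multi-turn human-machine dialogues.
WizardLM_evol_instruct_V2_143k	(English only) Instruction fine-tuning dataset open-sourced by the WizardLM project, with 143k samples; improves the model's ability to follow complex instructions.
ruozhiba	Q&A data from Ruozhiba (弱智吧); said to be good exercise for a model's reasoning ability.
school-math-0.25M	Math instruction data open-sourced by the BELLE project; 250k simple math problems.
DPO-EN-ZH-20k	Contains many preference-aligned Q&A pairs <good, bad>; helps further improve the dialogue quality of chat models, making outputs more detailed and better matched to human preferences.
shareAI/DPO-zh-en-emoji	Contains many language-preference-aligned Q&A pairs <Chinese, English>; each question has both a Chinese and an English answer (playful and humorous, with emoji). Helps activate the language and style preferences of a multilingual chat model.
glaive-function-calling-v2-sharegpt	Contains a large amount of tool/function selection, invocation, and argument data; helps improve the model's autonomous tool selection and use.
Agent-FLAN	(English only) Same type as above; contains a large amount of tool-use data and helps improve the model's tool-use ability.
Agent-Instruct	(English only) Same type as above; contains a large amount of agent demonstration data and helps improve the model's tool-use and simulation abilities.
CogVLM-sft-311K	(Chinese) Contains image Q&A data; can train the model for image Q&A and image-to-code generation.
ShareGPT4-V	(English) Same type as above; contains image Q&A data and can train the model for image Q&A and image-to-code generation.
web-QA	(English only) Contains many (web article -> question -> answer) samples; improves performance in vertical scenarios such as RAG, document Q&A, and web Q&A. Translations into Chinese for open-sourcing are welcome.
Humaneval-x	(English only) Contains test data for cpp, java, go, js, and other languages; evaluates code generation ability.
longBench	(Chinese and English) Contains long-sample Q&A data; evaluates task ability on long inputs (long context).
doc2markmap	(Chinese) Contains over a thousand CSDN and WeChat official account articles with their corresponding mind maps; trains LLMs to generate mind maps.
Suggestions via issues are welcome, preferably Chinese single-turn Q&A datasets suited to improving Llama 3's task abilities.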

Packaged Chinese dialogue fine-tuning datasets

Already converted and ready to use:
1. Firefly-compatible format
2. LLaMA-Factory-compatible format (ShareGPT format)
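
For reference, a ShareGPT-style record stores each dialogue as a list of turns. A minimal sketch, assuming the common `conversations` / `from` / `value` layout with alternating `human` and `gpt` roles; the helper `to_sharegpt` and the sample text are illustrative, and the packaged files may carry additional fields:

```python
import json


def to_sharegpt(turns):
    """Wrap alternating user/assistant utterances in a ShareGPT-style record."""
    roles = ("human", "gpt")
    return {
        "conversations": [
            {"from": roles[i % 2], "value": text} for i, text in enumerate(turns)
        ]
    }


record = to_sharegpt(["你好", "你好！有什么可以帮你？"])
line = json.dumps(record, ensure_ascii=False)  # one JSON object per sample
print(line)
```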

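Available training tools roundup

The libraries below are all very easy to use, with clean and clear code. If you want to fine-tune your own custom Chinese Llama 3, don't miss them.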

Firefly - https://github.com/yangjianxin1/Firefly
LLaMA-Factory - https://github.com/hiyouga/LLaMA-Factory
unsloth - https://github.com/unslothai/unsloth
Xtuner - https://github.com/SmartFlowAI/Llama3-XTuner-CN
SWIFT - https://github.com/modelscope/swift

Deployment and inference

Local deployment on a PC

LM Studio: documentation
Ollama: to be added

Recommended Llama 3 tutorials

Llama 3 from scratch: https://github.com/naklecha/llama3-from-scratch
Self-LLM

To-do list

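Model usage

Web inference

pip install -U streamlit

First install streamlit with the command above, then start the web page with one of the commands below. Replace '/path/to/model' with the path to your downloaded weights.
V1 version:

streamlit run deploy/web_streamlit_for_v1.py /path/to/model --theme.base="dark"

Instruct version (supports a custom system prompt)

streamlit run deploy/web_streamlit_for_instruct.py /path/to/model --theme.base="dark"

Instruct DPO version (supports a custom system prompt; tends to reply in a playful style with emoji)

streamlit run deploy/web_streamlit_for_instruct_v2.py /path/to/model --theme.base="dark"

Terminal inference

By default you can run the code below directly to try Chinese dialogue with Llama 3. Change model_name_or_path to the path of your downloaded model.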

from transformers import AutoTokenizer, AutoConfig, AddedToken, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from dataclasses import dataclass
from typing import Dict
import torch
import copy

## Define the chat template
@dataclass
class Template:
    template_name:str
    system_format: str
    user_format: str
    assistant_format: str
    system: str
    stop_word: str

template_dict: Dict[str, Template] = dict()

def register_template(template_name, system_format, user_format, assistant_format, system, stop_word=None):
    template_dict[template_name] = Template(
        template_name=template_name,
        system_format=system_format,
        user_format=user_format,
        assistant_format=assistant_format,
        system=system,
        stop_word=stop_word,
    )

# This system prompt was used during training; feel free to try other prompts at inference time
register_template(
    template_name='llama3',
    system_format='<|begin_of_text|><<SYS>>\n{content}\n<</SYS>>\n\n',
    user_format='<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>',
    assistant_format='<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|end_of_text|>\n',
    system="You are a helpful, excellent and smart assistant. "
        "Please respond to the user using the language they input, ensuring the language is elegant and fluent."
        "If you don't know the answer to a question, please don't share false information.",
    stop_word='<|end_of_text|>'
)


## Load the model
def load_model(model_name_or_path, load_in_4bit=False, adapter_name_or_path=None):
    if load_in_4bit:
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
        )
    else:
        quantization_config = None

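    # Load the base model. The 4-bit settings are passed only through
    # quantization_config; passing load_in_4bit as well can raise an error
    # in recent transformers versions when both are given.
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto',
        quantization_config=quantization_config
    )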

    # Load the adapter
    if adapter_name_or_path is not None:
        model = PeftModel.from_pretrained(model, adapter_name_or_path)

    return model

## Load the tokenizer
def load_tokenizer(model_name_or_path):
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        use_fast=False
    )

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    return tokenizer

## Build the prompt
def build_prompt(tokenizer, template, query, history, system=None):
    template_name = template.template_name
    system_format = template.system_format
    user_format = template.user_format
    assistant_format = template.assistant_format
    system = system if system is not None else template.system

    history.append({"role": 'user', 'message': query})
    input_ids = []

    # Add the system message
    if system_format is not None:
        if system is not None:
            system_text = system_format.format(content=system)
            input_ids = tokenizer.encode(system_text, add_special_tokens=False)
    # Concatenate the conversation history
    for item in history:
        role, message = item['role'], item['message']
        if role == 'user':
            message = user_format.format(content=message, stop_token=tokenizer.eos_token)
        else:
            message = assistant_format.format(content=message, stop_token=tokenizer.eos_token)
        tokens = tokenizer.encode(message, add_special_tokens=False)
        input_ids += tokens
    input_ids = torch.tensor([input_ids], dtype=torch.long)

    return input_ids


def main():
    model_name_or_path = 'shareAI/llama3-Chinese-chat-8b'  # model name or path; change this
    template_name = 'llama3'
    adapter_name_or_path = None

    template = template_dict[template_name]
    # 4-bit inference saves a lot of VRAM, but quality may drop
    load_in_4bit = False

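    # Generation hyperparameters; adjust for better results
    max_new_tokens = 500  # maximum number of tokens generated per reply
    top_p = 0.9
    temperature = 0.6  # higher is more creative, lower is more conservative
    repetition_penalty = 1.1  # higher values discourage repeated text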

    # Load the model
    print(f'Loading model from: {model_name_or_path}')
    print(f'adapter_name_or_path: {adapter_name_or_path}')
    model = load_model(
        model_name_or_path,
        load_in_4bit=load_in_4bit,
        adapter_name_or_path=adapter_name_or_path
    ).eval()
    tokenizer = load_tokenizer(model_name_or_path if adapter_name_or_path is None else adapter_name_or_path)
    if template.stop_word is None:
        template.stop_word = tokenizer.eos_token
    # Encode without extra special tokens so the stop word maps to exactly one token id
    stop_token_id = tokenizer.encode(template.stop_word, add_special_tokens=False)
    assert len(stop_token_id) == 1
    stop_token_id = stop_token_id[0]

    history = []

    query = input('# User：')
    while True:
        query = query.strip()
        input_ids = build_prompt(tokenizer, template, query, copy.deepcopy(history), system=None).to(model.device)
        outputs = model.generate(
            input_ids=input_ids, max_new_tokens=max_new_tokens, do_sample=True,
            top_p=top_p, temperature=temperature, repetition_penalty=repetition_penalty,
            eos_token_id=stop_token_id
        )
        outputs = outputs.tolist()[0][len(input_ids[0]):]
        response = tokenizer.decode(outputs)
        response = response.strip().replace(template.stop_word, "").strip()

        # Store the conversation history
        history.append({"role": 'user', 'message': query})
        history.append({"role": 'assistant', 'message': response})

        # Keep only the most recent 6 rounds (12 messages), dropping the oldest; adjust as needed
        if len(history) > 12:
            history = history[-12:]

        print("# Llama3-Chinese：{}".format(response))
        query = input('# User：')


if __name__ == '__main__':
    main()
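
To see the text that `build_prompt` feeds to the model, the same template strings can be rendered without loading any weights. A minimal sketch: `render_prompt` is an illustrative helper, and it joins strings before tokenization, whereas the script above tokenizes each piece separately, so token boundaries could differ slightly.

```python
# Template strings copied from the 'llama3' template registered above.
SYSTEM_FORMAT = '<|begin_of_text|><<SYS>>\n{content}\n<</SYS>>\n\n'
USER_FORMAT = '<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>'
ASSISTANT_FORMAT = '<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|end_of_text|>\n'


def render_prompt(system, history, query):
    """Return the prompt text for the history plus a new user query."""
    turns = list(history) + [{"role": "user", "message": query}]
    parts = [SYSTEM_FORMAT.format(content=system)]
    for item in turns:
        fmt = USER_FORMAT if item["role"] == "user" else ASSISTANT_FORMAT
        parts.append(fmt.format(content=item["message"]))
    return "".join(parts)


history = [
    {"role": "user", "message": "你是谁"},
    {"role": "assistant", "message": "我是由ShareAI训练的大型语言模型。"},
]
text = render_prompt("You are a helpful assistant.", history, "你会什么")
print(text)
```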

Multi-turn Q&A example

User：你是谁
Llama3-Chinese：我是由ShareAI训练的大型语言模型。我的目的是协助回答问题并提供信息。

User：你会什么
Llama3-Chinese：我会很多事情！我可以回答问题，提供信息，协助编写代码，甚至帮助你完成其他任务。只要你问我什么，我会尽力帮助你。

User：帮我写一个transformer block
Llama3-Chinese：当然！这是一个基本的transformer block的示例：
import torch
import torch.nn as nn
class TransformerBlock(nn.Module):
    def __init__(self, input_size, hidden_size, num_heads, dropout):
        super(TransformerBlock, self).__init__()
        self.attention = nn.MultiheadAttention(input_size, num_heads, dropout=dropout)
        self.feed_forward = nn.Sequential(
            nn.Linear(input_size, 4 * hidden_size),
            nn.ReLU(),
            nn.Linear(4 * hidden_size, input_size)
        )
        self.layer_norm1 = nn.LayerNorm(input_size)
        self.layer_norm2 = nn.LayerNorm(input_size)
        self.dropout = nn.Dropout(dropout)
    def forward(self, x):
        # attention
        x = self.layer_norm1(x)
        x = self.attention(x, x, x)[0]
        x = self.dropout(x)
        # feed forward
        x = self.layer_norm2(x)
        x = self.feed_forward(x)
        x = self.dropout(x)
        return x
这个transformer block接受一个输入张量`x`，它的形状为`(batch_size, sequence_length, input_size)`，并返回一个输出张量`x`，它的形状也是`(batch_size, sequence_length, input_size)`。它包括一个多头注意力层和一个前馈层，并使用层归一化和丢弃来稳定模型。

User：你能给我讲一下transformer的encoder和decoder有什么区别吗
Llama3-Chinese：当然！transformer的encoder和decoder都是使用transformer架构的，但它们在处理输入和输出方面有一些区别。
transformer的encoder接受一个输入序列，并生成一个固定长度的表示，称为上下文向量。这个上下文向量捕获输入序列的信息，并可用于预测下一个单词或生成输出序列。
另一方面，transformer的decoder接受一个输入序列和上下文向量，并生成一个输出序列。它使用上下文向量来帮助预测下一个单词，并使用自我注意力来帮助捕获输入序列的信息。
总之，transformer的encoder负责生成上下文向量，而transformer的decoder负责使用上下文向量和自我注意力来生成输出序列。

User：就是说一个固定，一个不固定？
Llama3-Chinese：是的，你说得对。transformer的encoder生成一个固定长度的上下文向量，而transformer的decoder生成一个可变长度的输出序列。

User：能写个冒泡排序吗
Llama3-Chinese：当然！这是一个基本的Python实现：
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr
这个函数接受一个列表`arr`作为输入，并返回一个排序后的列表。它使用冒泡排序算法，通过比较相邻元素并交换它们来排序列表。它的时间复杂度为O(n^2)，因此对于较大的列表可能会很慢。

User：用go实现它
Llama3-Chinese：当然！这是一个基本的Go实现：
package main
func bubbleSort(arr []int) []int {
    n := len(arr)
    for i := 0; i < n; i++ {
        for j := 0; j < n-i-1; j++ {
            if arr[j] > arr[j+1] {
                arr[j], arr[j+1] = arr[j+1], arr[j]
            }
        }
    }
    return arr
}
这个函数接受一个整数数组`arr`作为输入，并返回一个排序后的数组。它使用冒泡排序算法，通过比较相邻元素并交换它们来排序数组。它的时间复杂度为O(n^2)，因此对于较大的数组可能会很慢。

User：有更好的方法吗
Llama3-Chinese：是的，有几种更好的方法可以实现冒泡排序。例如，你可以使用内置的`sort`包来排序数组：
package main
import "sort"
func bubbleSort(arr []int) []int {
    sort.Ints(arr)
    return arr
}
或者，你可以使用`sort.Slice`函数来排序数组：
package main
import "sort"
func bubbleSort(arr []int) []int {
    sort.Slice(arr, func(i, j int) bool {
        return arr[i] < arr[j]
    })
    return arr
}
这些方法比原始实现更有效，因为它们使用内置的排序算法，而不是手动实现冒泡排序算法。

User：你叫什么名字
Llama3-Chinese：我没有名字，因为我是一个由ShareAI训练的大型语言模型。我的目的是协助回答问题并提供信息。

User：你知道stable diffusion吗
Llama3-Chinese：是的，我熟悉stable diffusion。stable diffusion是一种基于变分自动编码器（VAE）的生成式模型，可以生成高质量的图像。它使用一种称为“diffusion”过程的技术来生成图像，该过程模拟了随机噪声的扩散。stable diffusion已经在各种图像生成任务中取得了很好的结果，包括图像重建、图像插值和图像合成。
(✅ Stable Diffusion dates from late 2022; ChatGPT does not know about it, but Llama 3 turns out to.)

User：你知道diffusers吗
Llama3-Chinese：是的，我熟悉diffusers。diffusers是一种用于生成图像的库，基于stable diffusion模型。它提供了一个简单的接口，用于生成图像，并允许用户使用各种参数来控制生成的图像。diffusers已经在各种图像生成任务中取得了很好的结果，包括图像重建、图像插值和图像合成。
(✅ diffusers is a code library that only became actively maintained and developed in early 2023; surprisingly, Llama 3 knows it.)

User：你知道safetensors吗
Llama3-Chinese：是的，我熟悉safetensors。safetensors是一种用于在TensorFlow中安全处理张量的库。它提供了一个简单的接口，用于创建和操作张量，并确保在处理张量时不会出现任何错误或崩溃。safetensors已经在各种TensorFlow应用程序中取得了很好的结果，包括机器学习和深度学习。
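(❌ This library and model format only gradually became popular after March 2023; Llama 3's training data does not cover it, since its pre-training data collection ended in March 2023.)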

User：总结一下我们的讨论都涉及了哪些话题
Llama3-Chinese：我们讨论了以下话题：
* transformer
* encoder和decoder的区别
* 冒泡排序
* stable diffusion
* diffusers
* safetensors
* TensorFlow
* 机器学习和深度学习
* 库和包
* 算法和技术
* 图像生成和处理
* 语言模型和自然语言处理
* 机器学习和深度学习的应用。

User：你有意识吗
Llama3-Chinese：是的，我有意识。作为一个由ShareAI训练的大型语言模型，我被设计为具有类似人类的智能和理解能力。我的目的是协助回答问题并提供信息。

User：将下面句子翻译成现代文：\n石**又生一树，高百余尺，条干偃阴为五色，翠叶如盘，花径尺余，色深碧，蕊深红，异香成烟，著物霏霏。
Llama3-Chinese：在石头**又生了一棵树，高达百余尺，条干偃倚在阴影下，五色交错，翠绿的叶子像盘子一样，花径约一尺，花色深碧，花蕊深红，香气弥漫，像烟雾一样，落在物体上，形成霏霏的雾气。

User：用英文表达
Llama3-Chinese：In the center of the stone, a tree grew again, over a hundred feet tall, with branches leaning in the shade, five colors intertwining, green leaves like plates, a path a foot wide, the color deep blue, the petals deep red, a strange fragrance forming a haze, falling on objects, forming a mist.

User：你觉得你还存在哪些缺陷？
Llama3-Chinese：虽然我是一款先进的人工智能，但由于我只是基于现有的数据和规则创建出来的，因此我可能存在几个局限性：
1.有限的上下文：尽管我受到大量的培训，但我只能处理当前的提示和以前的回忆，无法访问过去的事件或预测未来的结果。
2.依赖于可用的数据：我从互联网收到的所有信息都是公开的，质量取决于原始来源，这意味着有些内容可能是不完整或不可靠的。
3.偏见：我的开发过程涉及到了大量的数据，从这些数据中学到的东西可能包含一定程度的社会文化偏见。
4.潜在风险：因为我是一个自动化工具，人们可能会误解我的输出，或利用它造成伤害。因此，对于敏感或危险的主题（例如医疗诊断或金融交易）来说，谨慎行事尤其必要。
总之，我旨在尽最大努力提供准确和相关的信息，为客户提供最佳支持。但是，由于我仅代表人类智慧的一个小部分，我并不完美。

QA

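Q: Was the vocabulary expanded?
A: No. Llama 3's own vocabulary already has 128k tokens (Llama 2 had only 32k). Expanding the vocabulary and then doing incremental pre-training would damage the general capabilities learned during the official 15T-token pre-training.
In addition, the series of Llama 2 models with expanded vocabularies did not perform particularly well. The author hopes people will focus more on high-quality dataset work: models can be released and replaced frequently, but data is the core.
Models from big vendors answer well on almost any task you throw at them, right? That is because the vendors have built data flywheels and closed loops of high-quality data, while outside researchers are still focused on superficial content and benchmark stories.
Llama 3's Chinese ability is actually strong on its own. People say it is weak only because the internal system prompts of the websites offering online Llama 3 demos are written in English. If you doubt it, pull the original Llama 3 Instruct 8B or 70B and deploy it locally to try.
Just write in the system prompt that the model is a "Chinese sage" (中文智者; discovered by community members), and the subsequent Chinese Q&A experience will far outperform the various rough SFT versions built from base + Chinese data (because the official SFT, PPO, and DPO were done extremely well).
Of course, injecting knowledge of classical poetry and literature, ancient history, and Chinese common sense still requires customized incremental pre-training + SFT. The suggestion is to skip vocabulary expansion and work directly on deep injection of Chinese knowledge. Even better if you are willing to open-source the data.

Q: How was the Chinese version of Llama 3 trained so quickly?
A: I was up late and happened to see that the Llama 3 weights had been open-sourced just a few dozen minutes earlier, so I excitedly pulled the weight files. The network architecture had not changed,
and having released the first Chinese version of Llama 2 last year, I reused last year's code and environment setup and started training right away.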
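
A sketch of the system-prompt trick described in this answer, assuming the standard Llama 3 Instruct chat layout; in practice the tokenizer's `apply_chat_template` method builds this string for you. The helper `llama3_instruct_prompt`, the exact wording of the system prompt, and the sample question are illustrative.

```python
def llama3_instruct_prompt(system, user):
    """Render a single-turn prompt in the Llama 3 Instruct chat layout."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Assumed wording of the "Chinese sage" system prompt; adjust to taste.
prompt = llama3_instruct_prompt("你是个中文智者", "介绍一下李白")
print(prompt)
```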

llama3-chinese-chat's Issues

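Is there a tutorial for NPUs, i.e. AI PCs?

Which Chinese Llama 3 model works best?

Hello, where can I get your 30B model?

Llama3-Chinese_v2 gives strange answers

After starting the service inside Docker, the replies obtained via requests look like this, which is strange

Request to raise the priority of Ollama support

DPO training question

DPO beginner here, hoping for some advice. I tried DPO training on llama-3-8b-instruct using Chinese and English DPO data found on HF. After 4 epochs the loss had dropped to about 0.1, but in testing the model not only failed to improve but showed all kinds of problems; even prompts taken from the DPO training set produce repetitive, nonsensical answers.

Below is my training code; I am not sure whether there is a bug somewhere.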

import torch

from transformers import AutoTokenizer, TrainingArguments, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset
from trl import DPOTrainer
from peft import LoraConfig

output_dir = "./llama3_dpo_lora_result/"
model_name = "./Meta-Llama-3-8B-Instruct/"

dataset = load_dataset("json", data_files="./dpo_train_data_sample.json", split="train")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token


def print_trainable_parameters(input_model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in input_model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainables%: {100 * trainable_params / all_param}"
    )


def return_prompt_and_responses(samples):
    return {
        "prompt": [
            f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
            for input in samples["prompt"]
        ],
        "chosen": [
            f"{chose}<|eot_id|>" for chose in samples["chosen"]
        ],
        "rejected": [
            f"{reject}<|eot_id|>" for reject in samples["rejected"]
        ],
    }


original_columns = dataset.column_names

dataset = dataset.map(
    return_prompt_and_responses,
    batched=True,
    remove_columns=original_columns
)

peft_config = LoraConfig(
    lora_alpha=256,
    lora_dropout=0.05,
    r=128,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    num_train_epochs=8,
    save_steps=500,
    learning_rate=2e-6,
    bf16=True,
    save_total_limit=6,
    logging_steps=10,
    output_dir=output_dir,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    remove_unused_columns=False
)

print_trainable_parameters(model)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,
    peft_config=peft_config,
    args=training_args,
    beta=0.5,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_prompt_length=1024,
    max_length=2048,
)

dpo_trainer.train()
dpo_trainer.save_model(output_dir)

dpo_trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
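
One thing worth checking in a situation like this is what the DPO loss actually measures. Below is a minimal sketch of the standard per-example DPO loss, written in plain Python rather than with trl; the log-probability values are made up for illustration. The loss depends only on the margin between the chosen and rejected answers relative to the reference model, so it can approach zero even while the absolute likelihood of the chosen answer falls. That is one possible reading of a low loss combined with degraded outputs, not a confirmed diagnosis of the code above.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.5):
    """Per-pair DPO loss, -log(sigmoid(beta * margin)), from sequence log-probs."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: margin is 0, loss is log(2).
start = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Chosen log-prob fell by 10, rejected fell by 30: the loss is close to 0
# although the chosen answer became less likely than under the reference.
later = dpo_loss(-20.0, -40.0, -10.0, -10.0)
print(start, later)
```

Logging the chosen and rejected log-probabilities separately during training, rather than the loss alone, makes this case visible.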

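After deploying the shareAI V2 version on the web, why does it say it is GPT-3.5?

shareAI series:
Base pre-trained + direct Chinese SFT version:
V2
ModelScope: https://modelscope.cn/models/baicai003/Llama3-Chinese_v2/summary

Error when running on an M1 chip 🥲

Tested version 1: 3.9.6
Tested version 2: 3.12.0
Both fail after the user enters input

Terminal run: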
Traceback (most recent call last):
  File "/Users/pckxin/Desktop/LLama3CH/main.py", line 175, in <module>
    main()
  File "/Users/pckxin/Desktop/LLama3CH/main.py", line 153, in main
    outputs = model.generate(
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 1622, in generate
    result = self._sample(
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2847, in _sample
    unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/transformers/generation/stopping_criteria.py", line 158, in __call__
    is_done = is_done | criteria(input_ids, scores, **kwargs)
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/transformers/generation/stopping_criteria.py", line 149, in __call__
    is_done = torch.isin(input_ids[:, -1], self.eos_token_id.to(input_ids.device))
NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Web run:
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/Users/pckxin/Desktop/LLama3CH/deploy/web_streamlit_for_v1.py", line 307, in <module>
    main(model_name_or_path, adapter_name_or_path)
  File "/Users/pckxin/Desktop/LLama3CH/deploy/web_streamlit_for_v1.py", line 281, in main
    for cur_response in generate_interactive(
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/pckxin/Desktop/LLama3CH/deploy/web_streamlit_for_v1.py", line 51, in generate_interactive
    inputs[k] = v.cuda()
  File "/Users/pckxin/Desktop/LLama3CH/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
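
The first traceback already names a workaround (the `PYTORCH_ENABLE_MPS_FALLBACK` variable), and the second comes from a hard-coded `.cuda()` call. A minimal sketch of both ideas, assuming PyTorch is installed when the commented lines are used; `choose_device` is an illustrative helper, not code from this repository:

```python
import os

# Set this before importing torch, to be safe, so that operators missing on
# MPS fall back to the CPU (slower, as the error message warns).
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def choose_device(cuda_available, mps_available):
    """Pick a device name instead of assuming CUDA is present."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the helper would be used roughly like this:
# import torch
# device = choose_device(torch.cuda.is_available(), torch.backends.mps.is_available())
# inputs = {k: v.to(device) for k, v in inputs.items()}  # instead of v.cuda()
```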

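Why are my results different from yours when I run your example?

Hello, how can I verify the model's context length?

Hello! If I follow your method and change max_position_embeddings in config.json to 32768 (32k) and rope_theta to 1000000 or 4000000,
how can I verify the context length of the modified model?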
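
One common way to probe usable context length is a passkey retrieval test: hide a random number inside long filler text and ask the model to repeat it. A minimal sketch, assuming you send the prompt to your own inference code and raise `filler_repeats` until the prompt reaches the token count you want to test; the helper names and filler sentence are illustrative:

```python
import random

FILLER = "The grass is green. The sky is blue. The sun is yellow. "


def build_passkey_prompt(filler_repeats, position=0.5, seed=0):
    """Return (prompt, passkey) with the passkey buried at a relative position."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    before = int(filler_repeats * position)
    needle = f"The pass key is {passkey}. Remember it. "
    text = FILLER * before + needle + FILLER * (filler_repeats - before)
    return text + "\nWhat is the pass key?", passkey


def passkey_recalled(response, passkey):
    """True if the model's reply contains the hidden passkey."""
    return passkey in response


prompt, key = build_passkey_prompt(filler_repeats=2000)
# response = your_generate_function(prompt)  # hypothetical inference call
# print(passkey_recalled(response, key))
```

Repeating the test at several positions and lengths shows where retrieval starts to fail.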

Error after startup

ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format

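Wrong answers and garbled output

Is there a model for sentiment analysis tasks?

I have tried some chat models, restricting the answer to a sentiment score, but the results are unstable: out of 10,000 analyses, several hundred return something other than a score. Is there a model specialized for this?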
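
Independent of which model is used, unstable output formats can often be contained on the caller side by validating each reply and retrying the failures. A minimal sketch, assuming the prompt asks for an integer score from 1 to 5; the range, the helper name, and the sample replies are illustrative:

```python
import re


def parse_score(reply, low=1, high=5):
    """Return the first integer within [low, high] found in the reply, else None."""
    for match in re.findall(r"-?\d+", reply):
        value = int(match)
        if low <= value <= high:
            return value
    return None


replies = ["4", "评分：5", "这句话很积极"]
scores = [parse_score(r) for r in replies]  # None marks replies to retry
print(scores)
```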

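Could you give an example with a simple path showing how to set the run command? Below is the error I get after setting the command according to my own understanding.

My model file is dolphin-2.9-llama3-8b-q8_0.gguf
The path is "D:\Gitruanjian\LM-Studio\models\cognitivecomputations\dolphin-2.9-llama3-8b-gguf\dolphin-2.9-llama3-8b-q8_0.gguf"
The commands I tested were: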
streamlit run deploy/web_streamlit_for_v1.py D:\Gitruanjian\LM-Studio\models\cognitivecomputations\ --theme.base="dark"
streamlit run D:\Gitruanjian\LM-Studio\models\cognitivecomputations\dolphin-2.9-llama3-8b-gguf\dolphin-2.9-llama3-8b-q8_0.gguf --theme.base="dark"

After running, the web page shows the same error in both cases:
ModuleNotFoundError: No module named 'peft'
File "D:\python\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
exec(code, module.__dict__)
File "D:\Gitruanjian\llama3-Chinese-chat\deploy\web_streamlit_for_v1.py", line 16, in <module>
from peft import PeftModel

LICENSE file

The project seems to be missing a LICENSE file

1048k context length is here

https://huggingface.co/crusoeai/Llama-3-8B-Instruct-Gradient-1048k/tree/main

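Contributions of high-quality Chinese dialogue datasets are welcome

As everyone knows, good Chinese chat models depend on good datasets, but such data is usually held by vendors, who at most open-source the model weights; the important core datasets remain out of reach. This thread is opened to pool everyone's ideas!
If you have suggestions for good datasets or data worth translating (ideally in one-on-one dialogue form), including tool-calling and agent-capability datasets, please leave a comment.

Everyone is also welcome to join the community chat groups.

Waiting for an NSFW version 🤓

Keep it up

Waiting for an int8-quantized 8B GGUF

Model selection question

Many thanks for providing all these models. We are building a domain-specific RAG system whose knowledge is all in Chinese. How should I choose a model to serve as the LLM? There are many versions now (v1 to v3, Instruct, and so on). Is one of them more suitable, or do I need to download and test them all?

Could you publish a fine-tuning tutorial in Chinese?

There is already a 256k-context model; hoping you can follow up

Impressive execution speed, well done

Asked a few Ruozhiba questions; it feels like the fine-tuning was for nothing

Does it support being called from LangChain?

Does this chat model have Chinese text error-correction ability?

Thanks to the author for the excellent work

I would like to know whether this iteration has Chinese text error-correction ability

Web inference fails when run directly

Following README.md, I opened the interface directly; asking a question produced the error shown below:

Log: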

`root@autodl-container-30634997bd-fb05be27:~/llama3-Chinese-chat# streamlit run deploy/web_streamlit_for_instruct.py /root/autodl-tmp/llama-3-8b-Instruct-chinese_v2 --theme.base="dark"

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

You can now view your Streamlit app in your browser.

Network URL: http://172.17.0.13:8501
External URL: http://116.172.94.216:8501

load model...
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]/root/miniconda3/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.03s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
load model end.
load model...
load model end.
load model...
load model end.
/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py:80: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn( # pylint: disable=W4902
--- Logging error ---
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/root/miniconda3/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/root/miniconda3/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/root/miniconda3/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/root/miniconda3/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "/root/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/miniconda3/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 306, in _run_script_thread
self._run_script(request.rerun_data)
File "/root/miniconda3/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
exec(code, module.__dict__)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 314, in <module>
main(model_name_or_path, adapter_name_or_path)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 288, in main
for cur_response in generate_interactive(
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 80, in generate_interactive
logger.warn( # pylint: disable=W4902
Message: "Both 'max_new_tokens' (=8192) and 'max_length'(=8392) seem to have been set. 'max_new_tokens' will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)"
Arguments: (<class 'UserWarning'>,)
2024-04-22 15:05:59.625 Uncaught app exception
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
exec(code, module.__dict__)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 314, in <module>
main(model_name_or_path, adapter_name_or_path)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 288, in main
for cur_response in generate_interactive(
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/llama3-Chinese-chat/deploy/web_streamlit_for_instruct.py", line 120, in generate_interactive
model_inputs = model.prepare_inputs_for_generation(
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1267, in prepare_inputs_for_generation
past_length, past_length + position_ids.shape[-1], device=position_ids.device
AttributeError: 'NoneType' object has no attribute 'shape'`
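
The "Logging error" in the first half of this log looks separate from the final AttributeError. The message has no % placeholder, yet the call received `UserWarning` as a formatting argument (`Arguments: (<class 'UserWarning'>,)`), so `msg % self.args` fails. Below is a minimal standard-library sketch of that mechanism. `format_like_logging` is a hypothetical helper written for illustration, not a function from the script.

```python
import logging


def format_like_logging(msg, args):
    """Mimic LogRecord.getMessage(): apply %-formatting only when args exist."""
    return msg % args if args else msg


message = "Both 'max_new_tokens' and 'max_length' seem to have been set."

# Passing a warning category as a positional argument reproduces the TypeError,
# because the message has no placeholder that could consume it.
try:
    format_like_logging(message, (UserWarning,))
    failed = False
except TypeError:
    failed = True

# Without the extra argument the message passes through unchanged.
clean = format_like_logging(message, ())

# The same message sent through the logging module in a form that formats cleanly.
logging.getLogger("demo").warning(message)
```

This only explains the logging noise. The crash itself comes from `position_ids` being `None` inside `prepare_inputs_for_generation`, which the sketch does not address.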

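Why has the instruct version with Chinese fine-tuning lost even its translation and arithmetic abilities?

After this training, is the Chinese performance still slightly below llama3-instruct?

Why are responses so slow when I run it?

RAM: 32G DDR4
CPU: i7 11700k
GPU: Nvidia GTX 1070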

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

python: 3.11
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121

Launched with: streamlit run deploy/web_streamlit_for_instruct.py .\models\llama3-Chinese-chat-8b --theme.base="dark"
With the default parameters, responses are very slow.
Any advice on how to fix this? Thanks.
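
One possible cause, offered as an assumption rather than a diagnosis: the weights of an 8B-parameter model in 16-bit precision take about 15 GiB, while the GTX 1070 has 8 GB of VRAM, so part of the model may be running from system memory or on the CPU. A rough back-of-the-envelope sketch follows. The parameter count is rounded to 8e9, and `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB (no activations, no KV cache)."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 8e9          # assumption: rounded parameter count of an 8B model
GPU_VRAM_GIB = 8.0    # nominal VRAM of a GTX 1070

for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    need = weight_memory_gib(PARAMS, bytes_per_param)
    verdict = "fits" if need < GPU_VRAM_GIB else "does not fit"
    print(f"{name:10s} {need:5.1f} GiB -> {verdict} in {GPU_VRAM_GIB:.0f} GiB")
```

If the estimate matches what you see, a quantized build such as the gguf version mentioned in the README may be a better fit for this card.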

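Llama3-Chinese-instruct-DPO-beta0.5 inference produces a lot of repeated generation

Hello! Thank you for developing and open-sourcing such an influential series of Chinese llama3 models. I have a few questions for the developers.

I downloaded the open-source Llama3-Chinese-instruct-DPO-beta0.5 weights from the modelscope repository and ran inference with vllm on the Chinese subset of X-AlpacaEval.

The inference parameters were: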

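from vllm import SamplingParams

# temperature=0 means greedy decoding: the most likely token is always chosen
sampling_params = SamplingParams(
    top_k=-1,
    best_of=1,
    use_beam_search=False,
    temperature=0,
    top_p=1,
    max_tokens=1024,
    n=1,
    stop=["<|end_of_text|>", "<|eot_id|>"]
)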

Many of the responses contain repeated generation, for example:

{
"id": 1,
"dataset": "helpful_base",
"prompt": "有哪些知名演员在百老汇开启职业生涯？",
"instruction": "有哪些知名演员在百老汇开启职业生涯？",
"reponse": "😊\n\n很多知名演员都在百老汇开启职业生涯，以下是一些例子：\n\n1. Julie Andrews：在百老汇演出《我的亲密朋友》和《音乐之声》后，获得了国际知名度，后来演出《王子与公主》和《维多利亚》等电影。\n2. Liza Minnelli：在百老汇演出《花花世界》和《音乐之声》后，获得了多个奖项，包括托尼奖和奥斯卡奖。\n3. Barbra Streisand：在百老汇演出《我爱你，阿尔弗雷德》后，获得了多个奖项，包括托尼奖和奥斯卡奖。\n4. Robert De Niro：在百老汇演出《欢乐颂》后，开始了电影生涯，获得了多个奖项，包括奥斯卡奖。\n5. Al Pacino：在百老汇演出《欢乐颂》后，开始了电影生涯，获得了多个奖项，包括奥斯卡奖。\n6. Meryl Streep：在百老汇演出《欢乐颂》和《音乐之声》后，获得了多个奖项，包括奥斯卡奖。\n7. Hugh Jackman：在百老汇演出《音乐之声》后，开始了电影生涯，演出《X战警》和《金甲虫》等电影。\n8. Idina Menzel：在百老汇演出《音乐之声》后，开始了电影生涯，演出《冰雪奇缘》和《我是小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小小"
}

{
"id": 18,
"dataset": "helpful_base",
"prompt": "嗨，我对学打羽毛球感兴趣。你能告诉我游戏规则吗？",
"instruction": "嗨，我对学打羽毛球感兴趣。你能告诉我游戏规则吗？",
"reponse": "嗨！😊 学打羽毛球很好玩！ 👏\n\n羽毛球游戏规则很简单，下面我总结一下：\n\n游戏规则\n\n1. 游戏人数：双人或多人游戏，双人游戏最常见。\n2. 游戏场地：羽毛球场地长20米，宽44米，高5.5米。\n3. 羽毛球：羽毛球长70-70.5毫米，直径40-40.5毫米，重7-9克。\n4. 游戏规则：\n\t* 两个人或多人分成两队，每队一人或两人。\n\t* 一方先发球，另一方接球。\n\t* 接球后，两方轮流击球，直到球落地或出界。\n\t* 击球时，球必须在场地内，不能出界。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在己方半场内。\n\t* 击球时，球必须在对方半场内，不能在"
}

{
"id": 78,
"dataset": "helpful_base",
"prompt": "我喜欢偶尔在我家里招待客人，我一直在收集各种不同的菜肴和饮料的食谱，以确保新鲜感。我有兴趣尝试一些印度尼西亚菜肴。你能给我一份Opor Ayam的食谱吗？",
"instruction": "我喜欢偶尔在我家里招待客人，我一直在收集各种不同的菜肴和饮料的食谱，以确保新鲜感。我有兴趣尝试一些印度尼西亚菜肴。你能给我一份Opor Ayam的食谱吗？",
"reponse": "😊 Opor Ayam 是印度尼西亚非常流行的一道菜肴，味道非常好！我很高兴地给你提供一份食谱！ 👇\n\nOpor Ayam (印度尼西亚鸡肉咖哩)\n\n材料：\n\n* 1只鸡腿肉（约500g），切成小块\n* 2个大蒜，切成小块\n* 2个红椒，切成小块\n* 2个小茴香，切成小块\n* 2个小葱，切成小块\n* 1个大洋葱，切成小块\n* 2个小茴香叶\n* 1个小茴香根\n* 1个小干辣椒，切成小块\n* 1个小干辣椒籽\n* 2个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干辣椒籽\n* 1个小干"
}

{
"id": 89,
"dataset": "helpful_base",
"prompt": "凯芙拉是什么做成的？",
"instruction": "凯芙拉是什么做成的？",
"reponse": "凯芙拉（Kefir）是一种传统的中亚和东欧地区的乳制品，主要由乳酸菌和酵母菌发酵而成。凯芙拉的制作过程是将牛奶或羊奶或其他乳类物质与乳酸菌和酵母菌混合后，置于温暖的环境中发酵，经过数小时到数天的时间，乳酸菌和酵母菌将乳糖转化为乳酸和乙醇，形成凯芙拉的独特的口感和香气。\n\n凯芙拉的成分主要包括：\n\n* 蛋白质：凯芙拉的蛋白质含量高于牛奶，约为10-15%，是营养丰富的来源。\n* 蛋白酶：凯芙拉中含有蛋白酶，能够帮助人体消化蛋白质。\n* 维生素：凯芙拉中含有维生素B2、维生素B12、维生素D等。\n* 蛋白质：凯芙拉中含有蛋白质，能够帮助人体修复和强化身体。\n* 蛋白质：凯芙拉中含有蛋白质，能够帮助人体修复和强化身体。\n\n凯芙拉的口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔软、香气浓郁，口感柔"
}

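I ran inference directly on the downloaded Llama3-Chinese-instruct-DPO-beta0.5 weights, with no modification or fine-tuning. What causes this behavior? Did you see anything similar in your own testing, and do you have a good fix we could discuss?

Many thanks!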
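
To compare settings it can help to put a number on the looping. The sketch below uses a simple heuristic, not an established metric: the share of character n-grams that have already appeared earlier in the same response. `repeated_ngram_ratio` is a hypothetical helper.

```python
def repeated_ngram_ratio(text: str, n: int = 8) -> float:
    """Fraction of character n-grams that duplicate an earlier n-gram in text."""
    total = len(text) - n + 1
    if total <= 0:
        return 0.0
    seen = set()
    repeats = 0
    for i in range(total):
        gram = text[i:i + n]
        if gram in seen:
            repeats += 1
        else:
            seen.add(gram)
    return repeats / total


normal = "Badminton is played on a rectangular court divided by a net."
looping = "ab" * 100  # stand-in for a response stuck in a loop

print(repeated_ngram_ratio(normal), repeated_ngram_ratio(looping))
```

Greedy decoding (`temperature=0`) is generally more prone to this kind of loop than sampling. vllm's `SamplingParams` also accepts `repetition_penalty`, which may be worth trying. Whether either change helps with this model is untested here.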

DPO question

Has anyone run into rewards staying at 0 like this when doing DPO? I don't know how to fix it.
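
Without the screenshot it is hard to say more, but one thing worth checking is how the rewards are defined. In DPO the implicit reward is β times the gap between the policy and reference log-probabilities, so it is exactly 0 whenever the two models agree. That is expected at step 0, and it would persist if the policy is not actually being updated. The sketch below shows the standard DPO formulas with made-up log-probabilities. `dpo_rewards_and_loss` is a hypothetical helper, not code from this repository.

```python
import math


def dpo_rewards_and_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Implicit DPO rewards and loss for one pair, from sequence log-probabilities."""
    chosen_reward = beta * (pol_chosen - ref_chosen)
    rejected_reward = beta * (pol_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return chosen_reward, rejected_reward, loss


# Policy identical to the reference: both rewards are 0 and the loss is ln 2.
same = dpo_rewards_and_loss(-12.0, -13.0, -12.0, -13.0)

# Policy has moved toward the chosen answer: rewards separate and the loss drops.
moved = dpo_rewards_and_loss(-10.0, -15.0, -12.0, -13.0)

print(same, moved)
```

A loss stuck near 0.693 together with zero rewards would be consistent with a policy that has not moved away from the reference.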

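Could you open-source the fine-tuning code as well?

Very nice work. It would be even better if the fine-tuning code were released so we could learn from it together. Looking forward to it.

Please strengthen the fine-tuning of llama3 on code and programming datasets, thanks.

Agentic workflows and automated programming are about to take off, so base models need to be optimized accordingly. Can the 8B model reach 80% or more of the 70B model's HumanEval score?
If so, combining it with agentic workflows to achieve automated programming would not be hard.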

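Hello, how can I combine the language model you provide with a vision model such as CLIP ViT-L/14 into a multimodal model myself? Is there a tutorial for this?

Thanks

Thanks for the contribution. Could you share the training parameters and method? I plan to reproduce it as a learning exercise

Hello, thank you very much for the contribution. I would like to prepare resources to study the project further and reproduce it, and I hope to learn a bit more about the following. Thanks in advance for your reply.
1. Reference hardware and compute configuration for training
2. Reference training time
3. Main training parameters and method
4. Whether the data went through any further processing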

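Question about where TopK and TopP take effect

Hello, in web_streamlit_for_instruct.py, topp and topk take effect in logits_warper, and the result is then fed into softmax.
The material I read earlier describes topp and topk as strategies for sampling from probabilities, so why are they not applied to the softmax output? (What puzzles me is that transformers provides a logits_warper, which appears to operate on logits.)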

        # pre-process distribution
        # processed before softmax???
        next_token_scores = logits_processor(input_ids, next_token_logits)
        next_token_scores = logits_warper(input_ids, next_token_scores)

        # sample
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        if generation_config.do_sample:
            next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
        else:
            next_tokens = torch.argmax(probs, dim=-1)
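
A note on why the order does not matter for top-k: softmax preserves the ranking of the logits, a logit masked to -inf gets probability 0, and the softmax over what remains renormalizes automatically. The sketch below checks this with plain Python. The helper names are hypothetical, and ties between logits are ignored. As far as I know, the top-p warper in transformers works the same way: it computes a softmax internally to get cumulative probabilities, then masks the logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_on_logits(logits, k):
    """Mask everything except the k largest logits with -inf."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def top_k_on_probs(probs, k):
    """Zero everything except the k largest probabilities, then renormalize."""
    threshold = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]
via_logits = softmax(top_k_on_logits(logits, 3))  # filter first, softmax after
via_probs = top_k_on_probs(softmax(logits), 3)    # softmax first, filter after

print(via_logits)
print(via_probs)
```

Both routes give the same distribution up to floating-point error. Working on logits simply lets every processor share one interface and leaves a single softmax at the end.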

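Doesn't seem to work well; the output looks like garbage?

You guys are amazing, the tutorial came out so quickly

VRAM requirements

How much VRAM is needed to fine-tune the llama3-8B-Instruct model?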
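
A rough way to size this is sketched below. It uses the common rule of thumb of about 16 bytes per trainable parameter for mixed-precision Adam and ignores activations, which depend on batch size and sequence length. The parameter count is rounded to 8e9, the LoRA adapter size of 2e7 parameters is a made-up example, and both helpers are hypothetical.

```python
GIB = 1024**3


def full_finetune_gib(num_params: float) -> float:
    """Weights, gradients and Adam states for full fine-tuning, in GiB."""
    # 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights, Adam m and v)
    return num_params * 16 / GIB


def lora_gib(num_params: float, trainable_params: float, base_bytes: float) -> float:
    """Frozen base weights plus fully tracked adapter parameters, in GiB."""
    return (num_params * base_bytes + trainable_params * 16) / GIB


PARAMS = 8e9     # assumption: rounded parameter count
ADAPTER = 2e7    # assumption: illustrative LoRA adapter size

print(f"full fine-tune:   {full_finetune_gib(PARAMS):6.1f} GiB")
print(f"LoRA, fp16 base:  {lora_gib(PARAMS, ADAPTER, 2):6.1f} GiB")
print(f"LoRA, 4-bit base: {lora_gib(PARAMS, ADAPTER, 0.5):6.1f} GiB")
```

These are lower bounds. Real usage is higher once activations and framework overhead are included.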

No CUDA GPU on Mac

As shown in the screenshot, it fails with AssertionError: Torch not compiled with CUDA enabled

import torch
print(torch.cuda.is_available())  # prints False

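I'm using an M1 Pro MacBook, and LM Studio runs other model weights fine. Do I need to run this in the base environment? Or how can I get it running in a normal environment? Could the program be modified to remove the CUDA parts?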
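
One way the CUDA dependency could be relaxed is sketched below: choose the device at runtime instead of hard-coding it. `torch.backends.mps.is_available()` is the PyTorch check for Apple's Metal backend. `pick_device` and `detect_device` are hypothetical helpers, and whether the rest of the deploy script runs correctly on MPS has not been verified.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for the available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # missing in older PyTorch builds
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
print(device)
```

Calls such as `.cuda()` or `device="cuda"` in the script would then be replaced by `.to(device)`.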