llamafamily / llama-chinese

Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3. Building the best Chinese Llama LLM, fully open source and commercially usable.
Home Page: https://llama.family
Are there any CEVAL results yet?
"does not appear to have a file named config.json" — it seems the config.json file is missing.
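(For reference: this error usually means from_pretrained was pointed at the raw Meta-format checkpoints, consolidated.*.pth plus params.json, rather than a Hugging Face-format directory. The HF layout, which includes config.json, can be produced with transformers' convert_llama_weights_to_hf.py script, or by downloading an -hf variant of the model.)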
When will the 70B model be available at the domestic (China) download mirror?
Does anyone know what's wrong here? Deploying 70b-chat-hf on SageMaker fails with the error below.
RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0. Target sizes: [768, 8192]. Tensor sizes: [2048, 8192]
rank=3
2023-07-21T07:52:53.999740Z ERROR text_generation_launcher: Shard 2 failed to start:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 237, in get_model
    return FlashLlamaSharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 185, in __init__
    self.load_weights(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 289, in load_weights
    module._parameters[param_name][: tensor.shape[0]] = tensor
RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0. Target sizes: [768, 8192]. Tensor sizes: [2048, 8192]
2023-07-21T07:52:53.999793Z  INFO text_generation_launcher: Shutting down shards
2023-07-21T07:52:54.729163Z  INFO text_generation_launcher: Shard 3 terminated
Error: ShardCannotStart
How can this model be plugged into langchain-ChatGLM? I configured it in model_config.py and launched, but it is currently incompatible.
Running the following on Windows:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model in 8-bit (requires bitsandbytes) across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    'model/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('model/Llama-2-7b-chat-hf', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(['Human: 介绍一下**\nAssistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
produces this error:
Traceback (most recent call last):
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\test.py", line 3, in <module>
    model = AutoModelForCausalLM.from_pretrained('model/Llama-2-7b-chat-hf', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\modeling_utils.py", line 2749, in from_pretrained
    model = replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 212, in replace_with_bnb_linear
    model, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  [Previous line repeated 1 more time]
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 144, in _replace_with_bnb_linear
    model._modules[name] = bnb.nn.Linear8bitLt(
AttributeError: module 'bitsandbytes' has no attribute 'nn'
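(For reference: at the time, bitsandbytes shipped no official Windows build, and this AttributeError is a common symptom of a broken or CPU-only bitsandbytes install. A minimal workaround sketch, assuming the GPU has enough memory, is to skip 8-bit loading and run in fp16; the local path below is the same one used in the script above.)

import torch
from transformers import AutoModelForCausalLM

# Workaround sketch: avoid bitsandbytes entirely by loading in fp16.
# Assumes roughly 14 GB of free VRAM for the full fp16 7B model.
model = AutoModelForCausalLM.from_pretrained(
    'model/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,  # no load_in_8bit=True
).eval()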
I downloaded the model from https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat and tested its poetry writing locally; the output is completely different from the online demo at llama.family. What could be the reason?
Local setup:
Launch command: python chat_gradio.py --model_name_or_path /root/llama2
The listening address is:
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Trained it yesterday and the chat experience is quite good; feel free to note it down if convenient.
Is there an example of merging and running inference after fine-tuning?
For instance, if the prompt is "Please continue the following sentence",
should the input be the prompt plus the previous sentence, with the next sentence as the answer?
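(For reference, a minimal sketch of merging a LoRA adapter into its base model and running inference with peft; the adapter path and the continuation prompt are placeholders, not part of this repository.)

import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model, attach the trained adapter, then fold the LoRA
# deltas into the base weights so inference needs no peft at runtime.
base = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.float16,
    device_map='auto',
)
model = PeftModel.from_pretrained(base, 'output/lora-adapter')  # placeholder dir
model = model.merge_and_unload()
model.save_pretrained('merged-model')  # optional: persist the merged weights

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
input_ids = tokenizer(['<s>Human: 请续写以下句子:白日依山尽\n</s><s>Assistant: '],
                      return_tensors='pt', add_special_tokens=False).input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_new_tokens=128)[0]))

As for the prompt format: yes, for continuation-style fine-tuning the input is typically the instruction plus the preceding sentence, and the label is the next sentence.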
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(['<s>Human: 介绍一下**\n</s><s>Assistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
Following the example above, after changing 'meta-llama/Llama-2-7b-chat-hf' to my own local directory, I get this error:
(traceback excerpt, from transformers' modeling_utils.py, line 447 in load_state_dict)
  444     """
  445     if checkpoint_file.endswith(".safetensors") and is_safetensors_available():
  446         # Check format of the archive
❱ 447         with safe_open(checkpoint_file, framework="pt") as f:
  448             metadata = f.metadata()
  449         if metadata.get("format") not in ["pt", "tf", "flax"]:
  450             raise OSError(
OSError: No such device (os error 19)
What is the cause?
Have you evaluated the fine-tuned model's performance on the major Chinese benchmark datasets?
command: python chat_gradio.py --model_name_or_path meta-llama/Llama-2-7b-chat-hf --is_4bit
error msg:
Traceback (most recent call last):
  File "chat_gradio.py", line 90, in <module>
    model = AutoGPTQForCausalLM.from_quantized(args.model_name_or_path, low_cpu_mem_usage=True, device="cuda:0", use_triton=False, inject_fused_attention=False, inject_fused_mlp=False)
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/auto.py", line 105, in from_quantized
    return quant_func(
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 734, in from_quantized
    quantize_config = BaseQuantizeConfig.from_pretrained(model_name_or_path, **kwargs)
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 90, in from_pretrained
    with open(resolved_config_file, "r", encoding="utf-8") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
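(Likely cause, for reference: meta-llama/Llama-2-7b-chat-hf is not a GPTQ-quantized checkpoint, so from_quantized finds no quantize_config.json and resolved_config_file comes back as None. The --is_4bit path presumably expects a 4-bit GPTQ model such as FlagAlpha/Llama2-Chinese-13b-Chat-4bit.)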
https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat/tree/main
If I run it locally with llama.cpp, do I still need to quantize it myself?
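(For reference: as far as I know, yes. llama.cpp runs on its own converted file format, so the Hugging Face weights first need to be converted with llama.cpp's convert script and then quantized with its quantize tool, unless pre-quantized files have already been published for this model.)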
When running predictions with the 13B model, a single sample takes quite long. How can I batch the predictions?
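(For reference, a minimal sketch of batched generation under the same prompt format used elsewhere in this project; the model path and prompts are placeholders. The key points are left padding and the attention mask, so several prompts can go through model.generate together.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'FlagAlpha/Llama2-Chinese-13b-Chat'  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'  # left-pad so every prompt ends where generation starts

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map='auto', torch_dtype=torch.float16).eval()

prompts = [
    '<s>Human: 怎么登上火星\n</s><s>Assistant: ',
    '<s>Human: 介绍一下快速排序\n</s><s>Assistant: ',
]
# Tokenize all prompts at once; padding=True produces a rectangular batch
# plus an attention_mask that tells the model to ignore the pad positions.
inputs = tokenizer(prompts, return_tensors='pt', padding=True,
                   add_special_tokens=False).to(model.device)

with torch.no_grad():
    generate_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.3,
        repetition_penalty=1.3,
        pad_token_id=tokenizer.pad_token_id,
    )
for text in tokenizer.batch_decode(generate_ids, skip_special_tokens=True):
    print(text)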
Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
test.c
LINK : fatal error LNK1181: cannot open input file "aio.lib"
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
Traceback (most recent call last):
  File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 165, in <module>
    abort(f"Unable to pre-compile {op_name}")
  File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 51, in abort
    assert False, msg
AssertionError: Unable to pre-compile async_io
Error on line 155
Fail to install deepspeed
GPU: P40.
When I deploy the original 7b-chat model with https://github.com/facebookresearch/llama (downloading the three files checklist.chk, consolidated.00.pth, and params.json), inference is very fast, roughly 5 words per second. Why does this project generate Chinese at only about one character per second?
After downloading the 7B model from the domestic mirror, the latest llama.cpp convert step fails. I suspect the download was corrupted. Could you publish SHA-256 checksums for each model so we can verify?
CUDA 11.7: RuntimeError: Error building extension 'cpu_adam'. Any ideas?
Running pip install -r requirements.txt fails with: ERROR: Invalid requirement: 'torch torchvision torchaudio' (from line 13 of requirements.txt). What's going on?
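(For reference: pip reads one requirement per line, so 'torch torchvision torchaudio' on a single line is parsed as one invalid specifier. Splitting line 13 into three lines, one package per line, should fix it.)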
Incremental (continued) pre-training would also work. Looking forward to this landing soon so I can train my own base model.
As titled, thanks!
$ pip install -r requirements.txt
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, http://pypi.mirrors.ustc.edu.cn/simple/
Collecting git+https://github.com/PanQiWei/AutoGPTQ.git (from -r requirements.txt (line 4))
Cloning https://github.com/PanQiWei/AutoGPTQ.git to c:\users\admin\appdata\local\temp\pip-req-build-2gh473wc
Running command git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc'
fatal: unable to access 'https://github.com/PanQiWei/AutoGPTQ.git/': Recv failure: Connection was reset
fatal: could not fetch d047af6e8e361b71bb7a5b915a8c9cff4f00f1e9 from promisor remote
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
How do I do full pre-training on my own domain data?
Thanks to the developers for open-sourcing this. Which Chinese datasets were used to fine-tune the two models on Hugging Face, https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat and https://huggingface.co/FlagAlpha/Llama2-Chinese-7b-Chat?
As titled: roughly what resources are needed for LoRA fine-tuning of llama2-7b-chat?
Could you please check the Llama-2-13b-chat-hf model you shared? I found that the weight files of Llama-2-13b-chat-hf and Llama-2-13b-hf have identical SHA-256 checksums.
The domestic mirror seems to host the Llama 1 models, doesn't it? They don't look like Llama 2; the file sizes are exactly the same as the earlier Llama 1 releases.
Was a sentiment-analysis dataset used when training the model?
I tried running the example program for the quantized model on Colab, but it cannot start due to insufficient memory. Here is the example program I used:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', use_fast=False)
input_ids = tokenizer(['<s>Human: 怎么登上火星\n</s><s>Assistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
Colab allocated 12.7 GB of RAM and a T4 GPU.
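(For reference: loading the 13B GPTQ checkpoint can transiently need more CPU RAM than the 12.7 GB Colab provides. Passing low_cpu_mem_usage=True to from_quantized, as in the chat_gradio.py command shown earlier, may lower the peak; otherwise a higher-RAM runtime is likely required.)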