llamafamily / llama-chinese

Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3. Building the best Chinese Llama LLM, fully open source and commercially usable.
Home Page: https://llama.family
Are there any CEVAL results yet?
"does not appear to have a file named config.json" — it seems the config.json file is missing.
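(For reference: this error usually means from_pretrained was pointed at the raw Meta-format checkpoints, consolidated.*.pth plus params.json, rather than a Hugging Face-format directory. The HF layout, which includes config.json, can be produced with transformers' convert_llama_weights_to_hf.py script, or by downloading an -hf variant of the model.)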
When will the 70B model be available at the domestic (China) download mirror?
Does anyone know what's wrong here? Deploying 70b-chat-hf on SageMaker fails with the error below.
RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0. Target sizes: [768, 8192]. Tensor sizes: [2048, 8192]
rank=3
2023-07-21T07:52:53.999740Z ERROR text_generation_launcher: Shard 2 failed to start:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 237, in get_model
    return FlashLlamaSharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 185, in __init__
    self.load_weights(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 289, in load_weights
    module._parameters[param_name][: tensor.shape[0]] = tensor
RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0. Target sizes: [768, 8192]. Tensor sizes: [2048, 8192]
2023-07-21T07:52:53.999793Z  INFO text_generation_launcher: Shutting down shards
2023-07-21T07:52:54.729163Z  INFO text_generation_launcher: Shard 3 terminated
Error: ShardCannotStart
How can this model be plugged into langchain-ChatGLM? I configured it in model_config.py and launched, but it is currently incompatible.
Running the following on Windows:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model in 8-bit (requires bitsandbytes) across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    'model/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('model/Llama-2-7b-chat-hf', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(['Human: 介绍一下**\nAssistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
produces this error:
Traceback (most recent call last):
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\test.py", line 3, in <module>
    model = AutoModelForCausalLM.from_pretrained('model/Llama-2-7b-chat-hf', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\modeling_utils.py", line 2749, in from_pretrained
    model = replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 212, in replace_with_bnb_linear
    model, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
    _, has_been_replaced = _replace_with_bnb_linear(
  [Previous line repeated 1 more time]
  File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 144, in _replace_with_bnb_linear
    model._modules[name] = bnb.nn.Linear8bitLt(
AttributeError: module 'bitsandbytes' has no attribute 'nn'
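(For reference: at the time, bitsandbytes shipped no official Windows build, and this AttributeError is a common symptom of a broken or CPU-only bitsandbytes install. A minimal workaround sketch, assuming the GPU has enough memory, is to skip 8-bit loading and run in fp16; the local path below is the same one used in the script above.)

import torch
from transformers import AutoModelForCausalLM

# Workaround sketch: avoid bitsandbytes entirely by loading in fp16.
# Assumes roughly 14 GB of free VRAM for the full fp16 7B model.
model = AutoModelForCausalLM.from_pretrained(
    'model/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,  # no load_in_8bit=True
).eval()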
I downloaded the model from https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat and tested its poetry writing locally; the output is completely different from the online demo at llama.family. What could be the reason?
Local setup:
Launch command: python chat_gradio.py --model_name_or_path /root/llama2
The listening address is:
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Trained it yesterday and the chat experience is quite good; feel free to note it down if convenient.
Is there an example of merging and running inference after fine-tuning?
For instance, if the prompt is "Please continue the following sentence",
should the input be the prompt plus the previous sentence, with the next sentence as the answer?
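(For reference, a minimal sketch of merging a LoRA adapter into its base model and running inference with peft; the adapter path and the continuation prompt are placeholders, not part of this repository.)

import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model, attach the trained adapter, then fold the LoRA
# deltas into the base weights so inference needs no peft at runtime.
base = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.float16,
    device_map='auto',
)
model = PeftModel.from_pretrained(base, 'output/lora-adapter')  # placeholder dir
model = model.merge_and_unload()
model.save_pretrained('merged-model')  # optional: persist the merged weights

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
input_ids = tokenizer(['<s>Human: 请续写以下句子:白日依山尽\n</s><s>Assistant: '],
                      return_tensors='pt', add_special_tokens=False).input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_new_tokens=128)[0]))

As for the prompt format: yes, for continuation-style fine-tuning the input is typically the instruction plus the preceding sentence, and the label is the next sentence.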
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(['<s>Human: 介绍一下**\n</s><s>Assistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
Following the example above, after changing 'meta-llama/Llama-2-7b-chat-hf' to my own local directory, I get this error:
(traceback excerpt, from transformers' modeling_utils.py, line 447 in load_state_dict)
  444     """
  445     if checkpoint_file.endswith(".safetensors") and is_safetensors_available():
  446         # Check format of the archive
❱ 447         with safe_open(checkpoint_file, framework="pt") as f:
  448             metadata = f.metadata()
  449         if metadata.get("format") not in ["pt", "tf", "flax"]:
  450             raise OSError(
OSError: No such device (os error 19)
What is the cause?
Have you evaluated the fine-tuned model's performance on the major Chinese benchmark datasets?
command: python chat_gradio.py --model_name_or_path meta-llama/Llama-2-7b-chat-hf --is_4bit
error msg:
Traceback (most recent call last):
  File "chat_gradio.py", line 90, in <module>
    model = AutoGPTQForCausalLM.from_quantized(args.model_name_or_path, low_cpu_mem_usage=True, device="cuda:0", use_triton=False, inject_fused_attention=False, inject_fused_mlp=False)
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/auto.py", line 105, in from_quantized
    return quant_func(
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 734, in from_quantized
    quantize_config = BaseQuantizeConfig.from_pretrained(model_name_or_path, **kwargs)
  File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 90, in from_pretrained
    with open(resolved_config_file, "r", encoding="utf-8") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
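(Likely cause, for reference: meta-llama/Llama-2-7b-chat-hf is not a GPTQ-quantized checkpoint, so from_quantized finds no quantize_config.json and resolved_config_file comes back as None. The --is_4bit path presumably expects a 4-bit GPTQ model such as FlagAlpha/Llama2-Chinese-13b-Chat-4bit.)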
https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat/tree/main
If I run it locally with llama.cpp, do I still need to quantize it myself?
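(For reference: as far as I know, yes. llama.cpp runs on its own converted file format, so the Hugging Face weights first need to be converted with llama.cpp's convert script and then quantized with its quantize tool, unless pre-quantized files have already been published for this model.)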
When running predictions with the 13B model, a single sample takes quite long. How can I batch the predictions?
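(For reference, a minimal sketch of batched generation under the same prompt format used elsewhere in this project; the model path and prompts are placeholders. The key points are left padding and the attention mask, so several prompts can go through model.generate together.)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'FlagAlpha/Llama2-Chinese-13b-Chat'  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'  # left-pad so every prompt ends where generation starts

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map='auto', torch_dtype=torch.float16).eval()

prompts = [
    '<s>Human: 怎么登上火星\n</s><s>Assistant: ',
    '<s>Human: 介绍一下快速排序\n</s><s>Assistant: ',
]
# Tokenize all prompts at once; padding=True produces a rectangular batch
# plus an attention_mask that tells the model to ignore the pad positions.
inputs = tokenizer(prompts, return_tensors='pt', padding=True,
                   add_special_tokens=False).to(model.device)

with torch.no_grad():
    generate_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.3,
        repetition_penalty=1.3,
        pad_token_id=tokenizer.pad_token_id,
    )
for text in tokenizer.batch_decode(generate_ids, skip_special_tokens=True):
    print(text)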
Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
test.c
LINK : fatal error LNK1181: cannot open input file "aio.lib"
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
Traceback (most recent call last):
  File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 165, in <module>
    abort(f"Unable to pre-compile {op_name}")
  File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 51, in abort
    assert False, msg
AssertionError: Unable to pre-compile async_io
Error on line 155
Fail to install deepspeed
GPU: P40.
When I deploy the original 7b-chat model with https://github.com/facebookresearch/llama (downloading the three files checklist.chk, consolidated.00.pth, and params.json), inference is very fast, roughly 5 words per second. Why does this project generate Chinese at only about one character per second?
After downloading the 7B model from the domestic mirror, the latest llama.cpp convert step fails. I suspect the download was corrupted. Could you publish SHA-256 checksums for each model so we can verify?
CUDA 11.7: RuntimeError: Error building extension 'cpu_adam'. Any ideas?
Running pip install -r requirements.txt fails with: ERROR: Invalid requirement: 'torch torchvision torchaudio' (from line 13 of requirements.txt). What's going on?
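(For reference: pip reads one requirement per line, so 'torch torchvision torchaudio' on a single line is parsed as one invalid specifier. Splitting line 13 into three lines, one package per line, should fix it.)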
Incremental (continued) pre-training would also work. Looking forward to this landing soon so I can train my own base model.
As titled, thanks!
$ pip install -r requirements.txt
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, http://pypi.mirrors.ustc.edu.cn/simple/
Collecting git+https://github.com/PanQiWei/AutoGPTQ.git (from -r requirements.txt (line 4))
Cloning https://github.com/PanQiWei/AutoGPTQ.git to c:\users\admin\appdata\local\temp\pip-req-build-2gh473wc
Running command git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc'
fatal: unable to access 'https://github.com/PanQiWei/AutoGPTQ.git/': Recv failure: Connection was reset
fatal: could not fetch d047af6e8e361b71bb7a5b915a8c9cff4f00f1e9 from promisor remote
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
How do I do full pre-training on my own domain data?
Thanks to the developers for open-sourcing this. Which Chinese datasets were used to fine-tune the two models on Hugging Face, https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat and https://huggingface.co/FlagAlpha/Llama2-Chinese-7b-Chat?
As titled: roughly what resources are needed for LoRA fine-tuning of llama2-7b-chat?
Could you please check the Llama-2-13b-chat-hf model you shared? I found that the weight files of Llama-2-13b-chat-hf and Llama-2-13b-hf have identical SHA-256 checksums.
The domestic mirror seems to host the Llama 1 models, doesn't it? They don't look like Llama 2; the file sizes are exactly the same as the earlier Llama 1 releases.
Was a sentiment-analysis dataset used when training the model?
I tried running the example program for the quantized model on Colab, but it cannot start due to insufficient memory. Here is the example program I used:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', use_fast=False)
input_ids = tokenizer(['<s>Human: 怎么登上火星\n</s><s>Assistant: '],
                      return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
Colab allocated 12.7 GB of RAM and a T4 GPU.
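(For reference: loading the 13B GPTQ checkpoint can transiently need more CPU RAM than the 12.7 GB Colab provides. Passing low_cpu_mem_usage=True to from_quantized, as in the chat_gradio.py command shown earlier, may lower the peak; otherwise a higher-RAM runtime is likely required.)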