
soulteary / docker-llama2-chat


Play LLaMA2 (official / Chinese / INT4 / llama2.cpp) Together! ONLY 3 STEPS! (non-GPU / 5GB vRAM / 8~14GB vRAM)

Home Page: https://www.zhihu.com/people/soulteary/posts

License: Apache License 2.0

Languages: Python 96.73%, Roff 0.18%, Shell 3.09%
Topics: llama, llama2, llm, llama2-docker, llama2-playground

docker-llama2-chat's Issues

After deploying the small model locally in a CPU environment, how do I expose it through a RESTful API?

First, thanks to the author for making this so quick to try out.

# Download Chinese-Llama-2-7b-ggml-q4.bin yourself and put it in `pwd`/soulteary, then this just runs
docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/soulteary:/app/soulteary soulteary/llama2:runtime bash
# Now you can start chatting
./main -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt


Personally, I feel these small "sparrow" models will actually have more opportunities to land in real applications; they are already good enough for many scenarios.
The obvious next step is API access: there needs to be a RESTful API so the model can integrate easily with other systems and applications.

How should I go about this? Any advice would be much appreciated.
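
One low-lift option, sketched under assumptions: the llama.cpp code that the runtime image wraps ships a server example exposing an HTTP completion endpoint. Assuming that binary is present in soulteary/llama2:runtime (worth verifying) and that you publish the port with -p 8080:8080 on the docker run line, something like the following should give you a REST-style API; the endpoint and fields follow upstream llama.cpp and may differ in this image:

# Hedged sketch: assumes the llama.cpp `server` example binary exists in the runtime image
./server -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin --host 0.0.0.0 --port 8080
# from another shell: POST a prompt to the completion endpoint
curl -X POST http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "User: Hello\nAssistant:", "n_predict": 128}'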

HeaderTooLarge when testing

I tried deploying it. I had this error:

Traceback (most recent call last):
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 447, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Any idea?
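
HeaderTooLarge almost always means the .safetensors file on disk is not real weights. safetensors reads the first 8 bytes of the file as a little-endian header length, and a git-lfs pointer file (plain text beginning with "version https://git-lfs...") decodes to an absurdly large number. A quick check, with hypothetical paths you should adjust to your setup:

# Hypothetical paths: point these at your actual model directory
ls -lh /app/meta-llama/Llama-2-7b-chat-hf/*.safetensors   # lfs pointer files are only ~135 bytes
head -c 120 /app/meta-llama/Llama-2-7b-chat-hf/model-00001-of-00002.safetensors
# if this prints "version https://git-lfs.github.com/spec/v1", you only have pointer files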

safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I am trying to run the 7b-chat, but getting this error

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 447, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
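
This is the same root cause as the issue above: the model repository was cloned without the large files, so the .safetensors files are git-lfs pointers rather than weights. Re-fetching them from inside the cloned model directory should fix it (assuming the model was obtained via git clone):

# Run inside the cloned model repository
git lfs install
git lfs pull   # downloads the real weight files the pointers refer to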

scripts/run-13b.sh fails with http 401 from huggingface.co URLs

Environment: Google Cloud, Nvidia A100 40 GB, 12vCPU, 100 GB disk
Docker and CUDA 12.1 are installed.

This part is OK:

git clone https://github.com/soulteary/docker-llama2-chat
scripts/make-13b.sh

Access from Google VM to huggingface.co seems to be ok (ping 10-12ms)

This part FAILS.

scripts/run-13b.sh

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 417, in cached_file
resolved_file = hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
metadata = get_hf_file_metadata(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
hf_raise_for_status(r)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64c0a70b-7e218e8a7f87e86a5fbfb030;382d0c02-cba0-4312-b459-953c3d6951bb)

Repository Not Found for url: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
config = AutoConfig.from_pretrained(model_id)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 983, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 617, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 433, in cached_file
raise EnvironmentError(
OSError: meta-llama/Llama-2-13b-chat-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
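
The meta-llama repositories on Hugging Face are gated: you first have to request access on the model page (and accept Meta's license) under your HF account, then authenticate with a token, otherwise every download returns 401. A sketch follows; how the token gets into the container is an assumption, so adjust it to this repo's scripts:

# On the host: create a read token at https://huggingface.co/settings/tokens
export HUGGING_FACE_HUB_TOKEN=hf_xxx   # placeholder value
# Forward it into the container by adding this to the docker run line in scripts/run-13b.sh:
#   -e HUGGING_FACE_HUB_TOKEN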

How do I change the port the Docker app is shared on from 7860 to another port?

I changed port 7860 to 8000 in the docker script:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 7860:8000 soulteary/llama2:7b-cn

But after starting it, the screen still shows 7860:
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().

Can the port mapping for this site be changed?
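
Yes, but the -p flag is ordered host:container. Gradio listens on 7860 inside the container, so -p 7860:8000 maps host port 7860 to container port 8000, where nothing is listening. To serve on host port 8000, map it the other way; the startup log will still print 7860 because that is the port as seen from inside the container:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 8000:7860 soulteary/llama2:7b-cn
# then open http://localhost:8000 on the host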

Error when running bash make-7b-cn.sh

$ bash make-7b-cn.sh
ERROR: could not find docker: CreateFile docker: The system cannot find the file specified.
2023/08/01 17:02:02 http2: server: error reading preface from client //./pipe/docker_engine: file has already been closed
ERROR: could not find docker: CreateFile docker: The system cannot find the file specified.
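This error comes from the Docker CLI on Windows failing to reach the engine's named pipe (//./pipe/docker_engine), which usually means Docker Desktop is not running (or docker is not on PATH). A quick check before re-running the script:

docker info   # should print server details; if it errors, start Docker Desktop first and retry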

Hoping to get in touch

Dear docker-llama2-chat developers, I am Jianmi (尖米), an InternLM community developer and volunteer. Your work has been a great help to me, and I think it could also be put to good use with InternLM. My WeChat is mzm312; I hope we can get in touch.

OSError: You seem to have cloned a repository without having git-lfs installed


I followed the tutorial at:
https://soulteary.com/2023/07/21/use-docker-to-quickly-get-started-with-the-chinese-version-of-llama2-open-source-large-model.html

Running the container with sh scripts/run-7b-cn.sh gives this error:

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.1 driver version 530.30.02 with kernel driver version 525.105.17.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 883, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1101, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 465, in load_state_dict
raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run git lfs install followed by git lfs pull in the folder you cloned.
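
The fix is the one the error message itself suggests: install git-lfs, then pull the real weight files from inside the cloned model directory (the apt commands assume a Debian/Ubuntu host; adjust for your distribution):

sudo apt-get update && sudo apt-get install -y git-lfs   # assumes Debian/Ubuntu
cd Chinese-Llama-2-7b   # the cloned model repository
git lfs install
git lfs pull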

ValueError

Hello! I followed the article "Use Docker containers to quickly get started with Meta AI's LLaMA2 open-source large model".

But I got this error: ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list). What is going on? I can already access Gradio locally.
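
This is a known quirk of the LLaMA tokenizer in transformers 4.31: the fast tokenizer emits token_type_ids, which LlamaForCausalLM's generate() does not accept. A sketch of the usual workaround (variable names are illustrative, not necessarily the repo's actual model.py); passing return_token_type_ids=False to the tokenizer call works too:

# Illustrative fix: strip token_type_ids before calling generate()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)
output_ids = model.generate(**inputs, max_new_tokens=256)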

No such file or directory

The steps were as follows:

  1. docker pull soulteary/llama2:converter
  2. docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -v `pwd`/soulteary:/app/soulteary soulteary/llama2:converter bash
  3. python3 convert.py /app/LinkSoul/Chinese-Llama-2-7b/ --outfile /app/soulteary/Chinese-Llama-2-7b-ggml.bin

Error:
No such file or directory: '/app/LinkSoul/Chinese-Llama-2-7b'
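
The converter only mounts `pwd`/LinkSoul into the container; it does not download anything, so the LinkSoul model has to be on disk before step 2. A sketch of fetching it first (requires git-lfs; LinkSoul/Chinese-Llama-2-7b is the Hugging Face repo id the tutorial uses):

# Run on the host, in the same directory you run docker from, before step 2
git lfs install
git clone https://huggingface.co/LinkSoul/Chinese-Llama-2-7b LinkSoul/Chinese-Llama-2-7b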

soulteary/llama2:base not found

Hi, I got the following error when running the script via bash. I also checked the soulteary images on Docker Hub and there does not seem to be any image tagged base. What is causing this error?

Step 1/3 : FROM soulteary/llama2:base
manifest for soulteary/llama2:base not found: manifest unknown: manifest unknown
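
The base tag is built locally rather than published to Docker Hub, so it has to exist before the app image build runs. As a sketch, assuming the repository keeps a base Dockerfile (the file name below is a guess; check the repo for the actual path or a make-base style script):

# Hypothetical Dockerfile path: adjust to the base Dockerfile actually in the repo
docker build -t soulteary/llama2:base -f docker/Dockerfile.base .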

Error loading the quantized llama2 model

With llama2-7b-chat-hf, I followed the provided quantization steps to get a 4-bit model and filled in the remaining model files. When loading it with AutoModelForCausalLM.from_pretrained, I get NotImplementedError: Cannot copy out of meta tensor; no data!
Environment:
accelerate==0.21.0
bitsandbytes==0.40.2
gradio==3.37.0
protobuf==3.20.3
scipy==1.11.1
sentencepiece==0.1.99
transformers==4.31.0
torch==1.13.0a0+340c412
cuda==11.7
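
"Cannot copy out of meta tensor; no data" usually means some weights never got materialized: either shard files are missing or truncated (see the git-lfs issues above), or .to(device) was called on a model loaded with device_map/low_cpu_mem_usage. Note also that, as far as I know, transformers 4.31 cannot reliably save and then reload a bitsandbytes 4-bit checkpoint, so a sketch of the alternative is to quantize on the fly from the original fp16 weights (model_dir is a placeholder for your checkpoint directory):

# Illustrative sketch: let accelerate place the 4-bit weights; do not call model.to(device) afterwards
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,                      # placeholder: the original fp16 checkpoint directory
    quantization_config=bnb_config,
    device_map="auto",
)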
