
soulteary / docker-llama2-chat

Stars: 522 · Watchers: 6 · Forks: 80 · Size: 8.75 MB

Play LLaMA2 (official / Chinese / INT4 / llama.cpp) together! ONLY 3 STEPS! (no GPU / 5 GB vRAM / 8–14 GB vRAM)

Home Page: https://www.zhihu.com/people/soulteary/posts

License: Apache License 2.0

Python 96.73% Roff 0.18% Shell 3.09%
llama llama2 llm llama2-docker llama2-playground

docker-llama2-chat's Introduction

Docker LLaMA2 Chat / "Second-Generation Alpaca"

Chinese Documentation | English

Get LLaMA2 up and running in three steps and play with it! The companion blog tutorials have been updated as well; likes, stars, and follows 🌟🌟🌟 are always welcome.

Use Docker to get started quickly and run the official 7B or 13B models, or the 7B Chinese model, locally.

Blog Tutorials

| Variant | VRAM needed | Highlights | Tutorial | Published |
| --- | --- | --- | --- | --- |
| Official (English) | 8–14 GB | The original experience | Get started with the official LLaMA2 open-source model using Docker | 2023.07.21 |
| LinkSoul Chinese (bilingual) | 8–14 GB | Chinese support | Get started with the Chinese LLaMA2 open-source model using Docker | 2023.07.21 |
| Transformers quantization (Chinese/official) | 5 GB | Faster inference, less VRAM | Quantize the Meta AI LLaMA2 Chinese model with Transformers | 2023.07.22 |
| GGML (llama.cpp) quantization (Chinese/official) | No GPU needed | CPU inference | Build a Meta AI LLaMA2 Chinese model that runs on CPU | 2023.07.23 |

You can use the project code as a reference, adapt it, get the models running, and plug them into whatever you want to play with, including (but not limited to) the various open-source tools that support first-generation LLaMA.

Previews

Usage

  1. Build the official (7B or 13B) model image, or the Chinese image (7B or the INT4 quantized build), from the project with a single command:
# 7B
bash scripts/make-7b.sh

# or 13B
bash scripts/make-13b.sh

# or 7B Chinese
bash scripts/make-7b-cn.sh

# or 7B Chinese 4-bit
bash scripts/make-7b-cn-4bit.sh
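
Once a build finishes, you can sanity-check that the image was tagged (a quick check using the standard Docker CLI; the exact tag depends on which script you ran):

# List the locally built images for this project (tag names may vary):
docker images soulteary/llama2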
  2. Pick the command that suits you and download LLaMA2 or the Chinese model from Hugging Face:
# Meta AI LLaMA2 models (10–14 GB vRAM)
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
git clone https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

mkdir meta-llama
mv Llama-2-7b-chat-hf meta-llama/
mv Llama-2-13b-chat-hf meta-llama/

# or Chinese LLaMA2 (10–14 GB vRAM)
git clone https://huggingface.co/LinkSoul/Chinese-Llama-2-7b

mkdir LinkSoul
mv Chinese-Llama-2-7b LinkSoul/

# or Chinese LLaMA2 4-bit (5 GB vRAM)
git clone https://huggingface.co/soulteary/Chinese-Llama-2-7b-4bit

mkdir soulteary
mv Chinese-Llama-2-7b-4bit soulteary/
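
Note that these clones rely on Git LFS to fetch the multi-gigabyte weight files; several issues further down trace back to missing LFS data. A quick pre-flight check (assuming git-lfs is installed on your machine):

# Enable Git LFS before cloning (or run `git lfs pull` inside an
# already-cloned model directory):
git lfs install

# Real weight shards are gigabytes in size; ~130-byte files are
# unfetched LFS pointers:
ls -lh meta-llama/Llama-2-7b-chat-hf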

Keep the downloaded models in a directory layout like this:

tree -L 2 soulteary LinkSoul meta-llama
soulteary
└── ...
LinkSoul
└── ...
meta-llama
├── Llama-2-13b-chat-hf
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   ├── LICENSE.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── pytorch_model-00001-of-00003.bin
│   ├── pytorch_model-00002-of-00003.bin
│   ├── pytorch_model-00003-of-00003.bin
│   ├── pytorch_model.bin.index.json
│   ├── README.md
│   ├── Responsible-Use-Guide.pdf
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   └── USE_POLICY.md
└── Llama-2-7b-chat-hf
    ├── added_tokens.json
    ├── config.json
    ├── generation_config.json
    ├── LICENSE.txt
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── models--meta-llama--Llama-2-7b-chat-hf
    ├── pytorch_model-00001-of-00003.bin
    ├── pytorch_model-00002-of-00003.bin
    ├── pytorch_model-00003-of-00003.bin
    ├── pytorch_model.bin.index.json
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    └── USE_POLICY.md
  3. Pick the command below that suits you and launch the LLaMA2 model app with a single command:
# 7B
bash scripts/run-7b.sh
# or 13B
bash scripts/run-13b.sh
# or Chinese 7B
bash scripts/run-7b-cn.sh
# or Chinese 7B 4-bit
bash scripts/run-7b-cn-4bit.sh

Once the model is running, open http://localhost:7860 or http://<your-IP>:7860 in a browser and start playing.
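
Under the hood, each run script amounts to a docker run invocation along these lines (a sketch reconstructed from the commands quoted in the issues below; see the scripts/ directory for the exact flags):

# Roughly what run-7b-cn.sh does: mount the local weights and expose
# the Gradio app on port 7860 (a sketch, not the verbatim script):
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  --rm -it -v `pwd`/LinkSoul:/app/LinkSoul \
  -p 7860:7860 soulteary/llama2:7b-cn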

Related Projects

docker-llama2-chat's People

Contributors

soulteary


docker-llama2-chat's Issues

After deploying the small CPU model locally, how do I expose a RESTful API?

First, thanks to the author for making this so quick to try out.

# Download Chinese-Llama-2-7b-ggml-q4.bin yourself into `pwd`/soulteary, and this brings it up
docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/soulteary:/app/soulteary soulteary/llama2:runtime bash
# now you can start chatting
./main -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt


Personally, I feel these smaller models will see more real-world adoption, since they are already good enough for many scenarios.
The next step is API access: a RESTful API would make it easy to integrate with other systems.
How would I go about this? Advice much appreciated.
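
One possible route (a sketch, not something this project ships): llama.cpp includes a `server` example that exposes an HTTP completion endpoint. Assuming your llama.cpp build contains the `server` binary (the `soulteary/llama2:runtime` image may or may not include it), it would look roughly like this:

# Start the llama.cpp HTTP server against the quantized model
# (assumes the image ships llama.cpp's `server` example binary):
docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it \
  -v `pwd`/soulteary:/app/soulteary -p 8080:8080 \
  soulteary/llama2:runtime \
  ./server -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin --host 0.0.0.0 --port 8080

# Then call it from any HTTP client:
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "User: Hello\nAssistant:", "n_predict": 128}'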

ValueError

Hi! I followed the article "Get started quickly with Meta AI's LLaMA2 open-source model using Docker containers".

But I hit this error: ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list). What could be causing it? I can already reach the Gradio UI locally.

No such file or directory

Steps taken:

  1. docker pull soulteary/llama2:converter
  2. docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -v `pwd`/soulteary:/app/soulteary soulteary/llama2:converter bash
  3. python3 convert.py /app/LinkSoul/Chinese-Llama-2-7b/ --outfile /app/soulteary/Chinese-Llama-2-7b-ggml.bin

Error:
No such file or directory: '/app/LinkSoul/Chinese-Llama-2-7b'
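
A likely cause (an educated guess, not confirmed in the report): the host directory isn't where the -v flag expects it, so the mount is empty inside the container. Two quick checks:

# On the host: confirm the cloned model really is at ./LinkSoul
# relative to where you run docker:
ls -d "$(pwd)/LinkSoul/Chinese-Llama-2-7b"

# Inside the container: confirm the mount actually landed:
ls /app/LinkSoul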

OSError: You seem to have cloned a repository without having git-lfs installed

I followed this tutorial:
https://soulteary.com/2023/07/21/use-docker-to-quickly-get-started-with-the-chinese-version-of-llama2-open-source-large-model.html

Running the container with sh scripts/run-7b-cn.sh errors out:

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.1 driver version 530.30.02 with kernel driver version 525.105.17.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 883, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1101, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 465, in load_state_dict
raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run git lfs install followed by git lfs pull in the folder you cloned.
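
The error message itself points at the fix: the clone only fetched Git LFS pointer files, not the actual weights. Roughly:

# Fetch the real weight files that the initial clone skipped:
git lfs install
cd LinkSoul/Chinese-Llama-2-7b
git lfs pull

# Verify: real shards are gigabytes, LFS pointers are ~130 bytes.
ls -lh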

HeaderTooLarge when testing

I tried deploying it and hit this error:

Traceback (most recent call last):
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 447, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Any ideas?
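
HeaderTooLarge from safetensors almost always means the .safetensors files are Git LFS pointer stubs rather than real weights, the same root cause as the git-lfs issue above. Worth checking (adjust the path to whichever model you downloaded):

# Pointer stubs are ~130 bytes; real shards are gigabytes:
ls -lh meta-llama/Llama-2-7b-chat-hf/*.safetensors

# If they are stubs, pull the real files:
cd meta-llama/Llama-2-7b-chat-hf && git lfs install && git lfs pull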

Error when running bash make-7b-cn.sh

$ bash make-7b-cn.sh
ERROR: could not find docker: CreateFile docker: The system cannot find the file specified.
2023/08/01 17:02:02 http2: server: error reading preface from client //./pipe/docker_engine: file has already been closed
ERROR: could not find docker: CreateFile docker: The system cannot find the file specified.
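For what it's worth, this error means the Docker CLI cannot reach a running Docker daemon (on Windows, typically Docker Desktop isn't started or docker isn't on PATH). A quick check before rerunning the script:

# Fails with a similar error if the daemon is unreachable:
docker info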

Error loading the quantized LLaMA2 model

Starting from llama2-7b-chat-hf, I followed the provided quantization steps to produce the 4-bit model and filled in the remaining model files. Loading it via AutoModelForCausalLM.from_pretrained raises NotImplementedError: Cannot copy out of meta tensor; no data!
Environment:
accelerate==0.21.0
bitsandbytes==0.40.2
gradio==3.37.0
protobuf==3.20.3
scipy==1.11.1
sentencepiece==0.1.99
transformers==4.31.0
torch==1.13.0a0+340c412
cuda==11.7

scripts/run-13b.sh fails with http 401 from huggingface.co URLs

Environment: Google Cloud, NVIDIA A100 40 GB, 12 vCPUs, 100 GB disk
Docker and CUDA 12.1 are installed.

This part is OK:

git clone https://github.com/soulteary/docker-llama2-chat
scripts/make-13b.sh

Access from the Google VM to huggingface.co seems to be OK (ping 10–12 ms).

This part FAILS.

scripts/run-13b.sh

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 417, in cached_file
resolved_file = hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
metadata = get_hf_file_metadata(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
hf_raise_for_status(r)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64c0a70b-7e218e8a7f87e86a5fbfb030;382d0c02-cba0-4312-b459-953c3d6951bb)

Repository Not Found for url: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
config = AutoConfig.from_pretrained(model_id)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 983, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 617, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 433, in cached_file
raise EnvironmentError(
OSError: meta-llama/Llama-2-13b-chat-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
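
Two observations (educated guesses, not from the reporter): meta-llama repos are gated on Hugging Face, so anonymous downloads get 401; and the traceback shows transformers falling back to a network download, which usually means the run script didn't find the weights on disk. Worth checking:

# 1. Confirm the weights are where the script expects to mount them:
ls -d "$(pwd)/meta-llama/Llama-2-13b-chat-hf"

# 2. If you do want to download from the Hub instead, request access to
#    the gated repo on huggingface.co first, then authenticate:
huggingface-cli login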

How do I change the port the Docker app is served on from 7860 to something else?

I changed port 7860 to 8000 in the docker script:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 7860:8000 soulteary/llama2:7b-cn

But after startup it still reports 7860:
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().

Can this port mapping be changed?
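
The mapping itself is the issue: docker's -p flag is HOST_PORT:CONTAINER_PORT, and the Gradio app listens on 7860 inside the container. To serve it on host port 8000, map 8000 to 7860; the in-container log will still print 7860, which is expected:

# Serve the app on host port 8000 while it listens on 7860 internally:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  --rm -it -v `pwd`/LinkSoul:/app/LinkSoul \
  -p 8000:7860 soulteary/llama2:7b-cn
# Then browse to http://localhost:8000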

Hoping to get in touch

Hello, docker-llama2-chat developers. I'm 尖米, a developer and volunteer in the InternLM community. Your work has helped me a great deal, and I think it could also be put to good use with InternLM. My WeChat is mzm312; I hope we can get in touch.

soulteary/llama2:base not found

Hi, I hit the following error when running the build script via bash. I also checked soulteary's images on Docker Hub, and there doesn't seem to be one tagged base. What is causing this error?

Step 1/3 : FROM soulteary/llama2:base
manifest for soulteary/llama2:base not found: manifest unknown: manifest unknown

safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I am trying to run the 7b-chat, but I get this error:

Traceback (most recent call last):
File "/app/app.py", line 6, in
from model import run
File "/app/model.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 447, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
