
chatglm-webui's Introduction

ChatGLM-webui

A web UI for ChatGLM-6B, the chat model released by THUDM.


Features

  • Original chat interface like the chatglm-6b demo, but using Gradio's Chatbot component for a better user experience.
  • One-click install script (you still need to install Python yourself)
  • More generation parameters that can be freely adjusted
  • Convenient saving/loading of dialog history and presets
  • Custom maximum context length
  • Save to Markdown
  • Program arguments to select the model and calculation precision

Install

requirements

Python 3.10

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install --upgrade -r requirements.txt

or

bash install.sh

Run

python webui.py

Arguments

--model-path: path to the model. If not specified, it defaults to THUDM/chatglm-6b, and Transformers will automatically download the model from Hugging Face.

--listen: launch Gradio with 0.0.0.0 as the server name, allowing it to respond to network requests

--port: web UI port

--share: create a public link via Gradio's share feature

--precision: fp32 (CPU only), fp16, int4 (CUDA GPU only), int8 (CUDA GPU only)

--cpu: run on CPU

--path-prefix: URL root path. If not specified, it defaults to /. With a path prefix of /foo/bar, ChatGLM-webui serves from http://$ip:$port/foo/bar/ rather than http://$ip:$port/.
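
For example, a typical launch that serves a locally downloaded model in fp16 on the local network might look like this (the model path is illustrative):

python webui.py --model-path ./models/chatglm-6b --precision fp16 --listen --port 17860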

chatglm-webui's People

Contributors

akegarasu, be5invis, haofanurusai, kepler-16b, remiliacn, tangdou1


chatglm-webui's Issues

Loading chatglm-6b-int4-qe fails with an error

The directory layout is as follows:

lxy52@YSTYLE-PC MINGW64 /d/Code/Python/ChatGLM-6B (main)
$ tree -d -L 2
.
|-- ChatGLM-webui
|   |-- modules
|   |-- outputs
|   `-- scripts
|-- THUDM
|   |-- chatglm-6b
|   |-- chatglm-6b-int4
|   |-- chatglm-6b-int4-qe
|   `-- chatglm-6b-main
|-- examples
|-- limitations
|-- outputs
|   |-- markdown
|   `-- save
`-- resources

15 directories

Running python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\ from /d/Code/Python/ChatGLM-6B/ChatGLM-webui produces the error below, and loading chatglm-6b-int4 fails the same way. Is it that the model directory can't be loaded via a relative parent path, or is something else going on? It worked with a version from last week and broke after updating.

(ChatGLM) PS D:\Code\Python\ChatGLM-6B\ChatGLM-webui> python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
GPU memory: 8.59 GB
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Traceback (most recent call last):
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 52, in <module>
    init()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 24, in init
    load_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 61, in load_model
    prepare_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 42, in prepare_model
    model = model.half().quantize(4).cuda()
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\modeling_chatglm.py", line 1281, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 390, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 157, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\Application\Miniconda3\envs\ChatGLM\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.

Suggestion about the model path variable

The default value of model-path is THUDM/chatglm-6b, which is inconsistent with the model/chatglm-6b used in the guide, and the discrepancy is not documented anywhere, which invites confusion. Could the two be unified, or the default be noted in the README?

Also, running without a local model currently downloads the model into the system's .cache folder, and not under the source file names. This can eat up Windows users' system-drive space (when the webui isn't meant to live on the system drive), and cleanup tools such as Huorong may sweep all the model files away as temporary files, forcing a full re-download. Wouldn't it be better to git clone https://huggingface.co/THUDM/chatglm-6b directly into model-path?

Feature request: manually mark whether a dialog turn is added to the context

For example, in a continuous conversation, some turns don't really need to be added to the context.

By default every turn could be added to the context, with the option to unmark an individual turn; unmarked turns would then be excluded from the context of subsequent questions.

This would decouple the dialog history from the model context, reduce GPU memory pressure, and should allow more rounds of conversation. A sketch of the idea follows.
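
A minimal sketch of the idea in plain Python (the field names and helper are hypothetical, not taken from this repo):

history = [
    {"query": "Who are you?", "response": "I am ChatGLM...", "in_context": True},
    {"query": "small talk", "response": "...", "in_context": False},  # unmarked turn
]

def build_context(history):
    # Only turns still marked in_context are sent to the model;
    # every turn stays visible in the displayed dialog history.
    return [(h["query"], h["response"]) for h in history if h["in_context"]]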

webui feature suggestions

  1. A light/dark mode toggle button in the UI;
  2. Zoom controls for the chat-history pane;
  3. Collapsing runs of blank lines into a single line break, to reduce the page space they occupy;
  4. Avatars, with custom avatar support (for a more immersive cyber catgirl);
  5. Prompt templates, with a dropdown menu for quickly applying presets.

Something went wrong Expecting value: line 1 column 1 (char 0)

The Python console shows:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:14<00:00, 1.82s/it]
GPU memory: 12.88 GB
Choosing precision int8 according to your VRAM. If you want to decide precision yourself, please add argument --precision when launching the application.
Running on local URL: http://127.0.0.1:17860

To create a public link, set share=True in launch().


When chatting on the web page, sending a message fails and the following appears in the top-right corner.

Something went wrong
Expecting value: line 1 column 1 (char 0)

Any advice would be appreciated.

The share option doesn't work, and reverse proxying has problems

At first I used share=true, but accessing the link from an external IP gave "connection refused", and dialog response times grew. I then tried the BaoTa (宝塔) panel's reverse proxy: with caching disabled the page opens, but after I type a message and send it, nothing changes and the message just stays in the input box.

Request: improve the model loading flow

As the title says: with the current approach, users in mainland China basically must use a proxy to download the full pretrained model, yet once the app starts the proxy has to be turned off or it errors out. Even after the model has been fully downloaded, startup still performs an online self-check and fails without network access. This is awkward: you can't download without the proxy, and you must switch it off again before you can use the app. Please consider removing the redundant requirement that the startup self-check go online. Thanks.
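
For reference, transformers can be told to load strictly from local files, which would sidestep the online check. A minimal sketch, assuming the model repo has already been cloned locally (the path is illustrative):

from transformers import AutoModel, AutoTokenizer

model_path = "./THUDM/chatglm-6b"  # local clone of https://huggingface.co/THUDM/chatglm-6b
# local_files_only=True makes transformers read from disk only, so startup
# needs no network access once the files are present
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, local_files_only=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, local_files_only=True)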

Add ChatRWKV support; is there a way to contact the developers?

Hi everyone, I'm the author of RWKV. There are currently Chinese/English chat models and novel-writing models, at 7B and 14B:

https://zhuanlan.zhihu.com/p/618011122

RWKV now ships a pip package for direct inference; it supports INT8 quantization and a streaming mode (so it can run in very little VRAM), and it can be split across multiple GPUs:

https://pypi.org/project/rwkv/

Would you be interested in collaborating to add RWKV support? If so, feel free to join the RWKV QQ group. Does ChatGLM have a group as well? I'd like to join it too.
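
For reference, a minimal inference sketch following the rwkv package's documented usage (the model path and strategy string are illustrative; see the PyPI page above for current details):

from rwkv.model import RWKV

# the strategy string picks devices and precision, e.g. INT8 on CUDA
model = RWKV(model='/path/to/RWKV-4-model-file', strategy='cuda fp16i8')
out, state = model.forward([187, 510, 1563], None)  # token ids, initial state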

gradio reports a style deprecation warning

I'm getting the following warning:

/usr/local/lib/python3.10/site-packages/gradio/components/textbox.py:259: UserWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  warnings.warn(

It looks like style.css fails to load, but the path is already set correctly and the problem persists; I don't understand why.

venv setup script

It would be better if the repo included a setup script that uses venv.

Help: errors when running after updating to the latest version

H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\py310\lib\site-packages\gradio\deprecation.py:40: UserWarning: height is deprecated in Interface(), please use it within launch() instead.
warnings.warn(value)
H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\py310\lib\site-packages\gradio\deprecation.py:43: UserWarning: You have unused kwarg parameters in Textbox, please remove them: {'container': False}
warnings.warn(
Traceback (most recent call last):
  File "H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\webui.py", line 58, in <module>
    main()
  File "H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\webui.py", line 45, in main
    ui.queue(concurrency_count=5, max_size=64).launch(
TypeError: Blocks.launch() got an unexpected keyword argument 'root_path'

Errors with streaming output, and code gets translated when chatting in Chinese

This is the error when streaming output is enabled: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

And this is a case of code generation where the code itself was translated too (original output preserved):

获取二氧化碳交换机上的设备信息

设备和型号 =二氧化碳_bus.get_card_设备和(二氧化碳_car.get_card_型号)

Markdown formula rendering issue


The line spacing of formulas looks odd, and $ ... $ is treated as display math; it would look better rendered as inline math.

Code block syntax highlighting

Markdown code blocks are not syntax-highlighted for me.

(Screenshots: the chat window without highlighting, and the raw text printed to the console.)

Memory leak

I've found that when running chatglm-6b through the webui, the process's memory usage climbs steadily until the process finally exits.

The web UI frequently loses answers

Running on Windows, some requests take more than 7 s to respond, and the web UI then frequently loses answers; only after asking something simple and getting a reply do the earlier answers come back. After some digging I found that removing the queue() from ui.queue().launch(...) and calling ui.launch() directly fixes it.
Per the docs, queue() is designed for responses that take longer than 60 s to return; using it here actually causes interaction problems.
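
A minimal sketch of the change described above, assuming a Gradio 3.x Blocks app named ui as in this repo's webui.py:

# before: requests are routed through Gradio's websocket-based queue
# ui.queue(concurrency_count=5, max_size=64).launch()

# after: plain HTTP requests, which the reporter found reliable here
ui.launch()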

FileNotFoundError

FileNotFoundError: Could not find module 'nvcuda.dll' (or one of its dependencies). Try using the full path with constructor syntax.

How do I fix this, folks?

Implemented saving of the already-quantized model, greatly speeding up startup; please consider merging

Only the 4-bit case is implemented; 8-bit works the same way. In model.py, change the corresponding functions to the following. On the first run, set first_run to 1.
A config switch could be added, along with automatic detection of whether a saved file already exists.

def prepare_model():
    import pickle
    from transformers import AutoModel
    global model
    if cmd_opts.precision == "int4":
        first_run = 0  # set to 1 on the first run to quantize and save the model
        if first_run:
            model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
            model = model.half().quantize(4)
            print("Quantization finished")
            with open(cmd_opts.model_path + "int4", 'wb') as f:
                pickle.dump(model, f)
            print("Quantized model saved")
        else:
            # load the previously saved, already-quantized model
            with open(cmd_opts.model_path + "int4", 'rb') as f:
                model = pickle.load(f)
        model = model.cuda()
        model = model.eval()
        return

    model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
    if cmd_opts.cpu:
        model = model.float()
    else:
        if cmd_opts.precision == "fp16":
            model = model.half().cuda()
        elif cmd_opts.precision == "int8":
            model = model.half().quantize(8).cuda()
    model = model.eval()


def load_model():
    if cmd_opts.ui_dev:
        return

    global tokenizer, model
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
    prepare_model()
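
One caveat with the pickle approach: unpickling has to re-import the exact model classes that created the file, so the saved model is tied to the transformers / modeling_chatglm code version in use at save time, and a version mismatch will fail to load. Note also that the save path is cmd_opts.model_path + "int4"; if model_path ends with a path separator, the file lands inside the model directory.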

Can't install gradio; ffmpy error

In Colab, installing gradio under Python 3.10 produces the following error; possibly ffmpy only supports up to Python 3.9. I don't know how to resolve this.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ffmpy
Using cached ffmpy-0.3.0.tar.gz (4.8 kB)
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

Problems with multiple simultaneous conversations

If one client is in use, a newly opened client knows nothing about it; if the new client sends a message at that moment, strange things happen.
Concretely, as in the two screenshots: the client in Figure 1 first asks "简单介绍下自己" ("briefly introduce yourself"); while the AI is mid-reply at "我是ChatGLM，是清华大学KEG实验室和…", the client in Figure 2 interjects "你可以做什么" ("what can you do?"), even though it still sees an empty chat box. The result is the odd cross-talk shown in the two figures.

(Screenshots: Figure 1 and Figure 2, showing the two clients' interleaved conversations.)
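
The symptom suggests the chat context lives in process-global state shared by all clients. A minimal standalone sketch (hypothetical, not this repo's code) of keeping one history per browser session with Gradio's gr.State:

import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    history = gr.State([])  # gr.State holds an independent value per session

    def respond(message, history):
        reply = "echo: " + message  # placeholder for the actual model call
        history = history + [(message, reply)]
        return history, history

    msg.submit(respond, [msg, history], [chatbot, history])

demo.launch()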

Enabling streaming output fails: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Trying to enable streaming output raises: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Environment: Python 3.10.7
pip list:

Package Version
---------------
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 4.2.2
anyio 3.6.2
async-timeout 4.0.2
attrs 22.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
colorama 0.4.6
contourpy 1.0.7
cpm-kernels 1.0.11
cycler 0.11.0
entrypoints 0.4
fastapi 0.95.0
ffmpy 0.3.0
filelock 3.10.3
fonttools 4.39.2
frozenlist 1.3.3
fsspec 2023.3.0
gradio 3.23.0
h11 0.14.0
httpcore 0.16.3
httpx 0.23.3
huggingface-hub 0.13.3
icetk 0.0.4
idna 3.4
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.0
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
multidict 6.0.4
numpy 1.24.2
orjson 3.8.8
packaging 23.0
pandas 1.5.3
Pillow 9.4.0
pip 22.2.2
protobuf 3.20.0
pydantic 1.10.7
pydub 0.25.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2022.7.1
PyYAML 6.0
regex 2023.3.23
requests 2.28.2
rfc3986 1.5.0
semantic-version 2.10.0
sentencepiece 0.1.97
setuptools 63.2.0
six 1.16.0
sniffio 1.3.0
starlette 0.26.1
tokenizers 0.13.2
toolz 0.12.0
torch 1.13.1+cu117
torchvision 0.14.1+cu117
tqdm 4.65.0
transformers 4.27.3
typing_extensions 4.5.0
uc-micro-py 1.0.1
urllib3 1.26.15
uvicorn 0.21.1
websockets 10.4
yarl 1.8.2

After updating to the latest webui, every send errors out

After updating to the latest webui, every message send reports:
Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Traceback (most recent call last):
  File "L:\ChatGLM\py310\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "L:\ChatGLM\py310\lib\site-packages\gradio\blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "L:\ChatGLM\py310\lib\site-packages\gradio\blocks.py", line 882, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "L:\ChatGLM\py310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "L:\ChatGLM\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "L:\ChatGLM\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "L:\ChatGLM\py310\lib\site-packages\gradio\utils.py", line 549, in async_iteration
    return next(iterator)
  File "L:\ChatGLM\modules\ui.py", line 30, in predict
    ctx.refresh_last()
  File "L:\ChatGLM\modules\context.py", line 42, in refresh_last
    query, output = self.rh[-1]
IndexError: list index out of range

Could you make one for Apple Silicon Macs?

Thank you for providing the web UI! However, I am currently using a laptop with an RTX 3060, which means I am only able to use the CPU. Unfortunately, the CPU is too slow and performance is not ideal. On the other hand, I also have a MacBook Air with an M2 processor and 16 GB of RAM. Would it be possible to create a version of the UI compatible with Apple Silicon Macs? I would greatly appreciate being able to use the UI on my MacBook.
