
chatglm-webui's Introduction

ChatGLM-webui

A web UI for ChatGLM-6B, the chat model released by THUDM.


Features

  • Original chat interface like the chatglm-6b demo, but using Gradio's Chatbot component for a better user experience.
  • One-click install script (you still need to install Python yourself)
  • More generation parameters that can be freely adjusted
  • Convenient saving/loading of dialog history and presets
  • Custom maximum context length
  • Save to Markdown
  • Program arguments to select the model and calculation precision

Install

requirements

Python 3.10

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install --upgrade -r requirements.txt

or

bash install.sh

Run

python webui.py

Arguments

--model-path: path to the model. If not specified, it defaults to THUDM/chatglm-6b, and Transformers will automatically download the model from Hugging Face.

--listen: launch Gradio with 0.0.0.0 as the server name, allowing it to respond to network requests

--port: web UI port

--share: create a public link via Gradio's share feature

--precision: fp32 (CPU only), fp16, int4 (CUDA GPU only), int8 (CUDA GPU only)

--cpu: run on CPU

--path-prefix: URL root path. If not specified, it defaults to /. With a path prefix of /foo/bar, ChatGLM-webui serves from http://$ip:$port/foo/bar/ rather than http://$ip:$port/.
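
For example, a typical launch that serves a locally downloaded model in fp16 on the local network might look like this (the model path is illustrative):

python webui.py --model-path ./models/chatglm-6b --precision fp16 --listen --port 17860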

chatglm-webui's People

Contributors

akegarasu, be5invis, haofanurusai, kepler-16b, remiliacn, tangdou1


chatglm-webui's Issues

Loading chatglm-6b-int4-qe fails with an error

The directory layout is as follows:

lxy52@YSTYLE-PC MINGW64 /d/Code/Python/ChatGLM-6B (main)
$ tree -d -L 2
.
|-- ChatGLM-webui
|   |-- modules
|   |-- outputs
|   `-- scripts
|-- THUDM
|   |-- chatglm-6b
|   |-- chatglm-6b-int4
|   |-- chatglm-6b-int4-qe
|   `-- chatglm-6b-main
|-- examples
|-- limitations
|-- outputs
|   |-- markdown
|   `-- save
`-- resources

15 directories

Running python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\ from /d/Code/Python/ChatGLM-6B/ChatGLM-webui produces the error below, and loading chatglm-6b-int4 fails the same way. Is it that the model directory can't be loaded via a relative parent path, or is something else going on? It worked with a version from last week and broke after updating.

(ChatGLM) PS D:\Code\Python\ChatGLM-6B\ChatGLM-webui> python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
GPU memory: 8.59 GB
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Traceback (most recent call last):
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 52, in <module>
    init()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 24, in init
    load_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 61, in load_model
    prepare_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 42, in prepare_model
    model = model.half().quantize(4).cuda()
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\modeling_chatglm.py", line 1281, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 390, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 157, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\Application\Miniconda3\envs\ChatGLM\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.

Suggestion about the model path variable

The default value of model-path is THUDM/chatglm-6b, which is inconsistent with the model/chatglm-6b used in the guide, and the discrepancy is not documented anywhere, which invites confusion. Could the two be unified, or the default be noted in the README?

Also, running without a local model currently downloads the model into the system's .cache folder, and not under the source file names. This can eat up Windows users' system-drive space (when the webui isn't meant to live on the system drive), and cleanup tools such as Huorong may sweep all the model files away as temporary files, forcing a full re-download. Wouldn't it be better to git clone https://huggingface.co/THUDM/chatglm-6b directly into model-path?

Feature request: manually mark whether a dialog turn is added to the context

For example, in a continuous conversation, some turns don't really need to be added to the context.

By default every turn could be added to the context, with the option to unmark an individual turn; unmarked turns would then be excluded from the context of subsequent questions.

This would decouple the dialog history from the model context, reduce GPU memory pressure, and should allow more rounds of conversation. A sketch of the idea follows.
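
A minimal sketch of the idea in plain Python (the field names and helper are hypothetical, not taken from this repo):

history = [
    {"query": "Who are you?", "response": "I am ChatGLM...", "in_context": True},
    {"query": "small talk", "response": "...", "in_context": False},  # unmarked turn
]

def build_context(history):
    # Only turns still marked in_context are sent to the model;
    # every turn stays visible in the displayed dialog history.
    return [(h["query"], h["response"]) for h in history if h["in_context"]]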

webui feature suggestions

  1. A light/dark mode toggle button in the UI;
  2. Zoom controls for the chat-history pane;
  3. Collapsing runs of blank lines into a single line break, to reduce the page space they occupy;
  4. Avatars, with custom avatar support (for a more immersive cyber catgirl);
  5. Prompt templates, with a dropdown menu for quickly applying presets.

Something went wrong Expecting value: line 1 column 1 (char 0)

The Python console shows:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:14<00:00, 1.82s/it]
GPU memory: 12.88 GB
Choosing precision int8 according to your VRAM. If you want to decide precision yourself, please add argument --precision when launching the application.
Running on local URL: http://127.0.0.1:17860

To create a public link, set share=True in launch().


When chatting on the web page, sending a message fails and the following appears in the top-right corner.

Something went wrong
Expecting value: line 1 column 1 (char 0)

Any advice would be appreciated.

The share option doesn't work, and reverse proxying has problems

At first I used share=true, but accessing the link from an external IP gave "connection refused", and dialog response times grew. I then tried the BaoTa (宝塔) panel's reverse proxy: with caching disabled the page opens, but after I type a message and send it, nothing changes and the message just stays in the input box.

Request: improve the model loading flow

As the title says: with the current approach, users in mainland China basically must use a proxy to download the full pretrained model, yet once the app starts the proxy has to be turned off or it errors out. Even after the model has been fully downloaded, startup still performs an online self-check and fails without network access. This is awkward: you can't download without the proxy, and you must switch it off again before you can use the app. Please consider removing the redundant requirement that the startup self-check go online. Thanks.
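
For reference, transformers can be told to load strictly from local files, which would sidestep the online check. A minimal sketch, assuming the model repo has already been cloned locally (the path is illustrative):

from transformers import AutoModel, AutoTokenizer

model_path = "./THUDM/chatglm-6b"  # local clone of https://huggingface.co/THUDM/chatglm-6b
# local_files_only=True makes transformers read from disk only, so startup
# needs no network access once the files are present
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, local_files_only=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, local_files_only=True)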

Add ChatRWKV support; is there a way to contact the developers?

Hi everyone, I'm the author of RWKV. There are currently Chinese/English chat models and novel-writing models, at 7B and 14B:

https://zhuanlan.zhihu.com/p/618011122

RWKV now ships a pip package for direct inference; it supports INT8 quantization and a streaming mode (so it can run in very little VRAM), and it can be split across multiple GPUs:

https://pypi.org/project/rwkv/

Would you be interested in collaborating to add RWKV support? If so, feel free to join the RWKV QQ group. Does ChatGLM have a group as well? I'd like to join it too.
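
For reference, a minimal inference sketch following the rwkv package's documented usage (the model path and strategy string are illustrative; see the PyPI page above for current details):

from rwkv.model import RWKV

# the strategy string picks devices and precision, e.g. INT8 on CUDA
model = RWKV(model='/path/to/RWKV-4-model-file', strategy='cuda fp16i8')
out, state = model.forward([187, 510, 1563], None)  # token ids, initial state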

gradio reports a style deprecation warning

I'm getting the following warning:

/usr/local/lib/python3.10/site-packages/gradio/components/textbox.py:259: UserWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  warnings.warn(

It looks like style.css fails to load, but the path is already set correctly and the problem persists; I don't understand why.

venv setup script

It would be better if the repo included a setup script that uses venv.

Help: errors when running after updating to the latest version

H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\py310\lib\site-packages\gradio\deprecation.py:40: UserWarning: height is deprecated in Interface(), please use it within launch() instead.
warnings.warn(value)
H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\py310\lib\site-packages\gradio\deprecation.py:43: UserWarning: You have unused kwarg parameters in Textbox, please remove them: {'container': False}
warnings.warn(
Traceback (most recent call last):
  File "H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\webui.py", line 58, in <module>
    main()
  File "H:\软件\百度云网盘\BaiduNetdiskDownload\ChatGLM(1)\webui.py", line 45, in main
    ui.queue(concurrency_count=5, max_size=64).launch(
TypeError: Blocks.launch() got an unexpected keyword argument 'root_path'

Errors with streaming output, and code gets translated when chatting in Chinese

This is the error when streaming output is enabled: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

And this is a case of code generation where the code itself was translated too (original output preserved):

获取二氧化碳交换机上的设备信息

设备和型号 =二氧化碳_bus.get_card_设备和(二氧化碳_car.get_card_型号)

Markdown formula rendering issue


The line spacing of formulas looks odd, and $ ... $ is treated as display math; it would look better rendered as inline math.

Code block syntax highlighting

Markdown code blocks are not syntax-highlighted for me.

(Screenshots: the chat window without highlighting, and the raw text printed to the console.)

Memory leak

I've found that when running chatglm-6b through the webui, the process's memory usage climbs steadily until the process finally exits.

The web UI frequently loses answers

Running on Windows, some requests take more than 7 s to respond, and the web UI then frequently loses answers; only after asking something simple and getting a reply do the earlier answers come back. After some digging I found that removing the queue() from ui.queue().launch(...) and calling ui.launch() directly fixes it.
Per the docs, queue() is designed for responses that take longer than 60 s to return; using it here actually causes interaction problems.
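
A minimal sketch of the change described above, assuming a Gradio 3.x Blocks app named ui as in this repo's webui.py:

# before: requests are routed through Gradio's websocket-based queue
# ui.queue(concurrency_count=5, max_size=64).launch()

# after: plain HTTP requests, which the reporter found reliable here
ui.launch()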

FileNotFoundError

FileNotFoundError: Could not find module 'nvcuda.dll' (or one of its dependencies). Try using the full path with constructor syntax.

How do I fix this, folks?

Implemented saving of the already-quantized model, greatly speeding up startup; please consider merging

Only the 4-bit case is implemented; 8-bit works the same way. In model.py, change the corresponding functions to the following. On the first run, set first_run to 1.
A config switch could be added, along with automatic detection of whether a saved file already exists.

def prepare_model():
    import pickle
    from transformers import AutoModel
    global model
    if cmd_opts.precision == "int4":
        first_run = 0  # set to 1 on the first run to quantize and save the model
        if first_run:
            model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
            model = model.half().quantize(4)
            print("Quantization finished")
            with open(cmd_opts.model_path + "int4", 'wb') as f:
                pickle.dump(model, f)
            print("Quantized model saved")
        else:
            # load the previously saved, already-quantized model
            with open(cmd_opts.model_path + "int4", 'rb') as f:
                model = pickle.load(f)
        model = model.cuda()
        model = model.eval()
        return

    model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
    if cmd_opts.cpu:
        model = model.float()
    else:
        if cmd_opts.precision == "fp16":
            model = model.half().cuda()
        elif cmd_opts.precision == "int8":
            model = model.half().quantize(8).cuda()
    model = model.eval()


def load_model():
    if cmd_opts.ui_dev:
        return

    global tokenizer, model
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
    prepare_model()
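
One caveat with the pickle approach: unpickling has to re-import the exact model classes that created the file, so the saved model is tied to the transformers / modeling_chatglm code version in use at save time, and a version mismatch will fail to load. Note also that the save path is cmd_opts.model_path + "int4"; if model_path ends with a path separator, the file lands inside the model directory.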

Can't install gradio; ffmpy error

In Colab, installing gradio under Python 3.10 produces the following error; possibly ffmpy only supports up to Python 3.9. I don't know how to resolve this.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ffmpy
Using cached ffmpy-0.3.0.tar.gz (4.8 kB)
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

Problems with multiple simultaneous conversations

If one client is in use, a newly opened client knows nothing about it; if the new client sends a message at that moment, strange things happen.
Concretely, as in the two screenshots: the client in Figure 1 first asks "简单介绍下自己" ("briefly introduce yourself"); while the AI is mid-reply at "我是ChatGLM，是清华大学KEG实验室和…", the client in Figure 2 interjects "你可以做什么" ("what can you do?"), even though it still sees an empty chat box. The result is the odd cross-talk shown in the two figures.

(Screenshots: Figure 1 and Figure 2, showing the two clients' interleaved conversations.)
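
The symptom suggests the chat context lives in process-global state shared by all clients. A minimal standalone sketch (hypothetical, not this repo's code) of keeping one history per browser session with Gradio's gr.State:

import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    history = gr.State([])  # gr.State holds an independent value per session

    def respond(message, history):
        reply = "echo: " + message  # placeholder for the actual model call
        history = history + [(message, reply)]
        return history, history

    msg.submit(respond, [msg, history], [chatbot, history])

demo.launch()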

Enabling streaming output fails: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Trying to enable streaming output raises: Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Environment: Python 3.10.7
pip list:

Package Version
---------------
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 4.2.2
anyio 3.6.2
async-timeout 4.0.2
attrs 22.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
colorama 0.4.6
contourpy 1.0.7
cpm-kernels 1.0.11
cycler 0.11.0
entrypoints 0.4
fastapi 0.95.0
ffmpy 0.3.0
filelock 3.10.3
fonttools 4.39.2
frozenlist 1.3.3
fsspec 2023.3.0
gradio 3.23.0
h11 0.14.0
httpcore 0.16.3
httpx 0.23.3
huggingface-hub 0.13.3
icetk 0.0.4
idna 3.4
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.0
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
multidict 6.0.4
numpy 1.24.2
orjson 3.8.8
packaging 23.0
pandas 1.5.3
Pillow 9.4.0
pip 22.2.2
protobuf 3.20.0
pydantic 1.10.7
pydub 0.25.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2022.7.1
PyYAML 6.0
regex 2023.3.23
requests 2.28.2
rfc3986 1.5.0
semantic-version 2.10.0
sentencepiece 0.1.97
setuptools 63.2.0
six 1.16.0
sniffio 1.3.0
starlette 0.26.1
tokenizers 0.13.2
toolz 0.12.0
torch 1.13.1+cu117
torchvision 0.14.1+cu117
tqdm 4.65.0
transformers 4.27.3
typing_extensions 4.5.0
uc-micro-py 1.0.1
urllib3 1.26.15
uvicorn 0.21.1
websockets 10.4
yarl 1.8.2

After updating to the latest webui, every send errors out

After updating to the latest webui, every message send reports:
Generation failed: AttributeError("'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'")

Traceback (most recent call last):
  File "L:\ChatGLM\py310\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "L:\ChatGLM\py310\lib\site-packages\gradio\blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "L:\ChatGLM\py310\lib\site-packages\gradio\blocks.py", line 882, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "L:\ChatGLM\py310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "L:\ChatGLM\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "L:\ChatGLM\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "L:\ChatGLM\py310\lib\site-packages\gradio\utils.py", line 549, in async_iteration
    return next(iterator)
  File "L:\ChatGLM\modules\ui.py", line 30, in predict
    ctx.refresh_last()
  File "L:\ChatGLM\modules\context.py", line 42, in refresh_last
    query, output = self.rh[-1]
IndexError: list index out of range

Could you make one for Apple Silicon Macs?

Thank you for providing the web UI! However, I am currently using a laptop with an RTX 3060, which means I am only able to use the CPU. Unfortunately, the CPU is too slow and performance is not ideal. On the other hand, I also have a MacBook Air with an M2 processor and 16 GB of RAM. Would it be possible to create a version of the UI compatible with Apple Silicon Macs? I would greatly appreciate being able to use the UI on my MacBook.
