
camenduru / text-generation-webui-colab


A Colab Gradio web UI for running Large Language Models

License: The Unlicense

Jupyter Notebook 100.00%
alpaca colab colab-notebook colaboratory gradio koala lama llama llamas llm vicuna

text-generation-webui-colab's Introduction

๐Ÿฃ Please follow me for new updates https://twitter.com/camenduru
๐Ÿ”ฅ Please join our discord server https://discord.gg/k5BwmmvJJU
๐Ÿฅณ Please join my patreon community https://patreon.com/camenduru

🚦 WIP 🚦

🦒 Colab

Colab | Info | Model Page
Open In Colab | vicuna-13b-GPTQ-4bit-128g | https://vicuna.lmsys.org
Open In Colab | vicuna-13B-1.1-GPTQ-4bit-128g | https://vicuna.lmsys.org
Open In Colab | stable-vicuna-13B-GPTQ-4bit-128g | https://huggingface.co/CarperAI/stable-vicuna-13b-delta
Open In Colab | gpt4-x-alpaca-13b-native-4bit-128g | https://huggingface.co/chavinlo/gpt4-x-alpaca
Open In Colab | pyg-7b-GPTQ-4bit-128g | https://huggingface.co/Neko-Institute-of-Science/pygmalion-7b
Open In Colab | koala-13B-GPTQ-4bit-128g | https://bair.berkeley.edu/blog/2023/04/03/koala
Open In Colab | oasst-llama13b-GPTQ-4bit-128g | https://open-assistant.io
Open In Colab | wizard-lm-uncensored-7b-GPTQ-4bit-128g | https://github.com/nlpxucan/WizardLM
Open In Colab | mpt-storywriter-7b-GPTQ-4bit-128g | https://www.mosaicml.com
Open In Colab | wizard-lm-uncensored-13b-GPTQ-4bit-128g | https://github.com/nlpxucan/WizardLM
Open In Colab | pyg-13b-GPTQ-4bit-128g | https://huggingface.co/PygmalionAI/pygmalion-13b
Open In Colab | falcon-7b-instruct-GPTQ-4bit | https://falconllm.tii.ae/
Open In Colab | wizard-lm-13b-1.1-GPTQ-4bit-128g | https://github.com/nlpxucan/WizardLM
Open In Colab | llama-2-7b-chat-GPTQ-4bit (4bit) | https://ai.meta.com/llama/
Open In Colab | llama-2-13b-chat-GPTQ-4bit (4bit) | https://ai.meta.com/llama/

🚦 WIP 🚦 please try llama-2-13b-chat or llama-2-7b-chat or llama-2-7b-chat-GPTQ-4bit

Open In Colab | llama-2-7b-chat (16bit) | https://ai.meta.com/llama/
Open In Colab | llama-2-13b-chat (8bit) | https://ai.meta.com/llama/
Open In Colab | redmond-puffin-13b-GPTQ-4bit (4bit) | https://huggingface.co/NousResearch/Redmond-Puffin-13B
Open In Colab | stable-beluga-7b (16bit) | https://huggingface.co/stabilityai/StableBeluga-7B
Open In Colab | doctor-gpt-7b (16bit) | https://ai.meta.com/llama/ (https://github.com/llSourcell/DoctorGPT)
Open In Colab | code-llama-7b (16bit) | https://github.com/facebookresearch/codellama
Open In Colab | code-llama-instruct-7b (16bit) | https://github.com/facebookresearch/codellama
Open In Colab | code-llama-python-7b (16bit) | https://github.com/facebookresearch/codellama
Open In Colab | mistral-7b-Instruct-v0.1-8bit (8bit) | https://mistral.ai/
Open In Colab | mytho-max-l2-13b-GPTQ (4bit) | https://huggingface.co/Gryphe/MythoMax-L2-13b

🦒 Colab Pro

According to the Facebook Research LLaMA license (a non-commercial bespoke license), maybe we cannot use this model with a Colab Pro account. But Yann LeCun said "GPL v3" (https://twitter.com/ylecun/status/1629189925089296386), so I am a little confused. Is it possible to use this with a non-free (paid) Colab Pro account?

Tutorial

https://www.youtube.com/watch?v=kgA7eKU1XuA

⚠ If you encounter an IndexError: list index out of range error, please set the model's instruction template.


Text Generation Web UI

https://github.com/oobabooga/text-generation-webui (Thanks to @oobabooga ❤)

Models License

Model | License
vicuna-13b-GPTQ-4bit-128g | From https://vicuna.lmsys.org: The online demo is a research preview intended for non-commercial use only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violation. The code is released under the Apache License 2.0.
gpt4-x-alpaca-13b-native-4bit-128g | https://huggingface.co/chavinlo/alpaca-native -> https://huggingface.co/chavinlo/alpaca-13b -> https://huggingface.co/chavinlo/gpt4-x-alpaca
llama-2 | https://ai.meta.com/llama/ Llama 2 is available for free for research and commercial use. 🥳

Special Thanks

Thanks to facebookresearch ❤ for https://github.com/facebookresearch/llama
Thanks to lmsys ❤ for https://huggingface.co/lmsys/vicuna-13b-delta-v0
Thanks to anon8231489123 ❤ for https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/lmsys/vicuna-13b-delta-v0)
Thanks to tatsu-lab ❤ for https://github.com/tatsu-lab/stanford_alpaca
Thanks to chavinlo ❤ for https://huggingface.co/chavinlo/gpt4-x-alpaca
Thanks to qwopqwop200 ❤ for https://github.com/qwopqwop200/GPTQ-for-LLaMa
Thanks to tsumeone ❤ for https://huggingface.co/tsumeone/gpt4-x-alpaca-13b-native-4bit-128g-cuda (GPTQ 4bit quantization of: https://huggingface.co/chavinlo/gpt4-x-alpaca)
Thanks to transformers ❤ for https://github.com/huggingface/transformers
Thanks to gradio-app ❤ for https://github.com/gradio-app/gradio
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
Thanks to Neko-Institute-of-Science ❤ for https://huggingface.co/Neko-Institute-of-Science/pygmalion-7b
Thanks to gozfarb ❤ for https://huggingface.co/gozfarb/pygmalion-7b-4bit-128g-cuda (GPTQ 4bit quantization of: https://huggingface.co/Neko-Institute-of-Science/pygmalion-7b)
Thanks to young-geng ❤ for https://huggingface.co/young-geng/koala
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/young-geng/koala)
Thanks to dvruette ❤ for https://huggingface.co/dvruette/oasst-llama-13b-2-epochs
Thanks to gozfarb ❤ for https://huggingface.co/gozfarb/oasst-llama13b-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/dvruette/oasst-llama-13b-2-epochs)
Thanks to ehartford ❤ for https://huggingface.co/ehartford/WizardLM-7B-Uncensored
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/ehartford/WizardLM-7B-Uncensored)
Thanks to mosaicml ❤ for https://huggingface.co/mosaicml/mpt-7b-storywriter
Thanks to OccamRazor ❤ for https://huggingface.co/OccamRazor/mpt-7b-storywriter-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/mosaicml/mpt-7b-storywriter)
Thanks to ehartford ❤ for https://huggingface.co/ehartford/WizardLM-13B-Uncensored
Thanks to ausboss ❤ for https://huggingface.co/ausboss/WizardLM-13B-Uncensored-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/ehartford/WizardLM-13B-Uncensored)
Thanks to PygmalionAI ❤ for https://huggingface.co/PygmalionAI/pygmalion-13b
Thanks to notstoic ❤ for https://huggingface.co/notstoic/pygmalion-13b-4bit-128g (GPTQ 4bit quantization of: https://huggingface.co/PygmalionAI/pygmalion-13b)
Thanks to WizardLM ❤ for https://huggingface.co/WizardLM/WizardLM-13B-V1.1
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/WizardLM-13B-V1.1-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/WizardLM/WizardLM-13B-V1.1)
Thanks to meta-llama ❤ for https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
Thanks to meta-llama ❤ for https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
Thanks to localmodels ❤ for https://huggingface.co/localmodels/Llama-2-13B-Chat-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
Thanks to NousResearch ❤ for https://huggingface.co/NousResearch/Redmond-Puffin-13B
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/Redmond-Puffin-13B-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/NousResearch/Redmond-Puffin-13B)
Thanks to llSourcell ❤ for https://huggingface.co/llSourcell/medllama2_7b
Thanks to MetaAI ❤ for https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/CodeLlama-7B-fp16
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-fp16
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/CodeLlama-7B-Python-fp16
Thanks to MistralAI ❤ for https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
Thanks to Gryphe ❤ for https://huggingface.co/Gryphe/MythoMax-L2-13b
Thanks to TheBloke ❤ for https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ (GPTQ 4bit quantization of: https://huggingface.co/Gryphe/MythoMax-L2-13b)

Medical Advice Disclaimer

DISCLAIMER: THIS WEBSITE DOES NOT PROVIDE MEDICAL ADVICE. The information, including but not limited to text, graphics, images and other material contained on this website, is for informational purposes only. No material on this site is intended to be a substitute for professional medical advice, diagnosis or treatment. Always seek the advice of your physician or other qualified health care provider with any questions you may have regarding a medical condition or treatment and before undertaking a new health care regimen, and never disregard professional medical advice or delay in seeking it because of something you have read on this website.

text-generation-webui-colab's People

Contributors

camenduru, r3gm


text-generation-webui-colab's Issues

Trying to put an image on the character but it fails

Hello! I'm using the Colab version of this and I'm trying to import an image for the character, but nothing seems to work for some reason. I've been trying to do this for a while and I don't know why it's giving me so many errors! Can someone help me solve this?
(Screenshots attached.)

cannot import name 'is_npu_available' from 'accelerate.utils'

File "/usr/local/lib/python3.10/dist-packages/peft/utils/other.py", line 24
    from accelerate.utils import is_npu_available, is_xpu_available
ImportError: cannot import name 'is_npu_available' from 'accelerate.utils'
(/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py)

Accelerate is actually installed, though.
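Accelerate being installed is not quite enough here: `is_npu_available` only exists in newer accelerate releases, so the installed peft is ahead of the installed accelerate. A minimal workaround sketch for the Colab cell; treating "latest of both" as a matched pair is an assumption:

!pip install -U accelerate peft   # assumption: a matched pair where accelerate.utils exports is_npu_available
# then restart the runtime (or at least the server cell) so the new versions are imported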

GPTQ 4-bit does not produce any output on Colab; error: IndexError: list index out of range

GPTQ 4-bit does not produce any output on Colab.
Error: IndexError: list index out of range

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 427, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 336, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 329, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "/content/text-generation-webui/modules/chat.py", line 305, in generate_chat_reply_wrapper
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True)):
File "/content/text-generation-webui/modules/chat.py", line 290, in generate_chat_reply
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
File "/content/text-generation-webui/modules/chat.py", line 194, in chatbot_wrapper
stopping_strings = get_stopping_strings(state)
File "/content/text-generation-webui/modules/chat.py", line 161, in get_stopping_strings
state['turn_template'].split('<|user-message|>')[1].split('<|bot|>')[0] + '<|bot|>',
IndexError: list index out of range
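The failing line splits `state['turn_template']` on `<|user-message|>` and then on `<|bot|>`, so the IndexError appears whenever the selected instruction template's turn_template lacks those placeholders (for example, when no template is set, as the note near the top of this page says). A minimal sketch of writing a Vicuna-style template from the notebook before starting the UI; the folder and the exact values are assumptions:

# hypothetical sketch: write an instruction template whose turn_template
# contains the <|user-message|> and <|bot|> placeholders that modules/chat.py splits on
template = '''user: "USER:"
bot: "ASSISTANT:"
turn_template: "<|user|> <|user-message|>\\n<|bot|> <|bot-message|>\\n"
context: "A chat between a curious user and an artificial intelligence assistant.\\n\\n"
'''
path = "/content/text-generation-webui/characters/instruction-following/Vicuna-v1.1.yaml"  # path is an assumption for this version
with open(path, "w") as f:
    f.write(template)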

New to *free* Google Colab

Being on an NVIDIA T4, is it possible to utilize xformers and use ExLlamaV2 as the loader for a (Mistral flavor of your choice) GPTQ 4-bit 32g model? I have a feeling it would perform blazingly fast with minimal degradation and great context... but you've spent more time on this...

[Feature Request] Support InternLM

Dear text-generation-webui-colab developer,

Greetings! I am vansinhu, a community developer and volunteer at InternLM. Your work has been immensely beneficial to me, and I believe it can be effectively utilized in InternLM as well. Welcome to add Discord https://discord.gg/gF9ezcmtM3 . I hope to get in touch with you.

Best regards,
vansinhu

mpt-7b-chat

Hello, can you make an mpt-7b-chat Google Colab, please?
Or are you waiting on anything?

vicuna-13b-GPTQ-4bit-128g.ipynb seems to have a dependency conflict

Not sure what I'm doing wrong, but it seems transformers might have conflicting version numbers, or PIL.Image.Resampling isn't available for some reason.

Running https://colab.research.google.com/github/camenduru/text-generation-webui-colab/blob/main/vicuna-13b-GPTQ-4bit-128g.ipynb gave me output that ends with:

Status Legend:
(OK):download completed.
/content/text-generation-webui
Traceback (most recent call last):
  File "/content/text-generation-webui/server.py", line 18, in <module>
    from modules import api, chat, shared, training, ui
  File "/content/text-generation-webui/modules/api.py", line 6, in <module>
    from modules.text_generation import generate_reply
  File "/content/text-generation-webui/modules/text_generation.py", line 7, in <module>
    import transformers
ModuleNotFoundError: No module named 'transformers'

which I've traced to the last line !python server.py --share --chat --wbits 4 --groupsize 128

Running the following in a new code block (no version numbers) to address missing deps didn't seem to get me very far either:

!pip install transformers accelerate datasets peft safetensors SentencePiece
!python server.py --share --chat --wbits 4 --groupsize 128

still gave me this error:

AttributeError: module 'PIL.Image' has no attribute 'Resampling'

Other references:

https://github.com/camenduru/text-generation-webui/blob/main/requirements.txt

https://github.com/camenduru/text-generation-webui/blob/main/server.py
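The ModuleNotFoundError suggests the requirements install did not complete (or the runtime was restarted afterwards), and the later Resampling error points at an old Pillow: `Image.Resampling` only exists from Pillow 9.1.0. A minimal sketch of a cell that reinstalls the pinned requirements plus a new enough Pillow before launching; the exact pins are assumptions:

%cd /content/text-generation-webui
!pip install -r requirements.txt
!pip install -U "Pillow>=9.1.0"   # Image.Resampling was added in Pillow 9.1.0
!python server.py --share --chat --wbits 4 --groupsize 128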

Task exception was never retrieved

2023-08-21 06:16:14 ERROR:Task exception was never retrieved
future: <Task finished name='w0d6mwwkndk_173' coro=<Queue.process_events() done, defined at /usr/local/lib/python3.10/dist-packages/gradio/queueing.py:343> exception=1 validation error for PredictBody
event_id
Field required [type=missing, input_value={'fn_index': 173, 'data':...on_hash': 'w0d6mwwkndk'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.1/v/missing>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 347, in process_events
client_awake = await self.gather_event_data(event)
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 220, in gather_event_data
data, client_awake = await self.get_message(event, timeout=receive_timeout)
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 456, in get_message
return PredictBody(**data), True
File "/usr/local/lib/python3.10/dist-packages/pydantic/main.py", line 159, in init
pydantic_self.pydantic_validator.validate_python(data, self_instance=pydantic_self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PredictBody
event_id
Field required [type=missing, input_value={'fn_index': 173, 'data':...on_hash': 'w0d6mwwkndk'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.1/v/missing
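The validation error points at pydantic 2.1, while this gradio build still constructs PredictBody the pydantic-1 way, so a version mismatch is the most likely culprit. A minimal workaround sketch for the notebook, assuming pinning pydantic below 2 is acceptable (the pin itself is an assumption):

!pip install "pydantic<2"   # assumption: this gradio release predates pydantic v2 support
# then restart the server cell so gradio picks up the downgraded pydantic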

I use pyg-13b and here is the error.

[Bug]: OS Error: No file named pytorch_model.bin

I'm getting an OS error, "No file named pytorch_model.bin in directory models", when running the text generation web UI with stable-vicuna-13B-GPTQ-4bit-128g.

Notebook
%cd /content
!apt-get -y install -qq aria2

!git clone -b v1.7 https://github.com/camenduru/text-generation-webui
%cd /content/text-generation-webui
!pip install -r requirements.txt
!pip install -U gradio==3.28.3

!mkdir /content/text-generation-webui/repositories
%cd /content/text-generation-webui/repositories
!git clone -b v1.2 https://github.com/camenduru/GPTQ-for-LLaMa.git
%cd GPTQ-for-LLaMa
!python setup_cuda.py install

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/raw/main/config.json -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/raw/main/generation_config.json -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o generation_config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/raw/main/special_tokens_map.json -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o special_tokens_map.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/resolve/main/tokenizer.model -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o tokenizer.model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/raw/main/tokenizer_config.json -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o tokenizer_config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/stable-vicuna-13B-GPTQ/resolve/main/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors -d /content/text-generation-webui/models/stable-vicuna-13B-GPTQ -o stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors

%cd /content/text-generation-webui
!python server.py --share --chat --wbits 4 --groupsize 128

Output

/content/text-generation-webui
2023-07-12 06:36:29 INFO:Unwanted HTTP request redirected to localhost :)
2023-07-12 06:36:32 WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-07-12 06:36:35.091770: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
2023-07-12 06:36:38 INFO:Loading stable-vicuna-13B-GPTQ...
Traceback (most recent call last):
  /content/text-generation-webui/server.py:1154 in <module>
  /content/text-generation-webui/modules/models.py:74 in load_model
  /content/text-generation-webui/modules/models.py:144 in huggingface_loader
  /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:484 in from_pretrained
  /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2449 in from_pretrained
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/stable-vicuna-13B-GPTQ.
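The traceback shows the plain Transformers loader (huggingface_loader) being used, which looks for pytorch_model.bin, while the folder only contains the GPTQ .safetensors file. Making sure the model goes through the GPTQ path, for example by passing the model name together with the wbits/groupsize flags at launch (the flags are the ones this notebook already uses; adding --model here is the assumption), avoids that code path:

%cd /content/text-generation-webui
!python server.py --share --chat --wbits 4 --groupsize 128 --model stable-vicuna-13B-GPTQ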

Kinda slow

I know I don't know anything, but is there a way to make it faster?

API not working

How do I use the API to interact with the web UI from a Python terminal on my local PC?
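The web UI only answers /api/v1/generate when its api extension is running; from these notebooks that usually means launching server.py with the API flags and then pointing requests at the API URL the console prints, not at the Gradio share link. The flags and endpoint below are text-generation-webui's from roughly this period; treat their availability in a given pinned notebook as an assumption. A minimal sketch:

# in the notebook's last cell, launch with the API enabled (flag names are an assumption for this pinned version)
# !python server.py --share --chat --api --public-api --wbits 4 --groupsize 128

import requests

API_URL = "https://your-api-tunnel-url"  # hypothetical: the blocking-API URL printed in the Colab console
payload = {"prompt": "Hello, how are you?", "max_new_tokens": 64}
resp = requests.post(f"{API_URL}/api/v1/generate", json=payload, timeout=120)
print(resp.json()["results"][0]["text"])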

[Bug]: ERROR:Task exception was never retrieved

Why am I getting this error, though?
No model loads when I try to in the web UI.

WARNING:The gradio "share link" feature downloads a proprietary and unaudited blob to create a reverse tunnel. This is potentially dangerous.
bin /opt/conda/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
INFO:Loading stable-vicuna-13B-GPTQ...
INFO:Found the following quantized model: models/stable-vicuna-13B-GPTQ/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors
INFO:Using the following device map for the quantized model:
INFO:Loaded the model in 55.13 seconds.
INFO:Loading the extension "gallery"...
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://f1bdbcf1f947d12f33.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
INFO:HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
ERROR:Task exception was never retrieved
future: <Task finished name='64ppr2666p8_90' coro=<Queue.process_events() done, defined at /opt/conda/envs/textgen/lib/python3.10/site-packages/gradio/queueing.py:343> exception=1 validation error for PredictBody
event_id
Field required [type=missing, input_value={'fn_index': 90, 'data': ...on_hash': '64ppr2666p8'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.1.2/v/missing>
Traceback (most recent call last):
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/gradio/queueing.py", line 347, in process_events
client_awake = await self.gather_event_data(event)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/gradio/queueing.py", line 220, in gather_event_data
data, client_awake = await self.get_message(event, timeout=receive_timeout)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/gradio/queueing.py", line 456, in get_message
return PredictBody(**data), True
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/pydantic/main.py", line 150, in init
pydantic_self.pydantic_validator.validate_python(data, self_instance=pydantic_self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PredictBody
event_id
Field required [type=missing, input_value={'fn_index': 90, 'data': ...on_hash': '64ppr2666p8'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.1.2/v/missing

Load new characters?

When I upload new characters, it always gives me an error. How can we fix this? Also, there seems to be no local copy saved in my Google Drive.

Endless queue

After clicking the "Running on public URL" link on Pygmalion 13B and 7B, a queue starts that never ends.
(Screenshots attached.)

Colab generates error

Colab generates error:

ValueError: Loading models/falcon-7b-instruct-GPTQ requires you to execute the
configuration file in that repo on your local machine. Make sure you have read
the code there to avoid malicious use, then set the option
trust_remote_code=True to remove this error.

Details:

2023-08-14 09:42:41 INFO:Unwanted HTTP request redirected to localhost :)
2023-08-14 09:42:44 WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-08-14 09:42:46.457649: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
2023-08-14 09:42:49 INFO:Loading falcon-7b-instruct-GPTQ...
2023-08-14 09:42:49 INFO:The AutoGPTQ params are: {'model_basename': 'gptq_model-4bit-64g', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True}
Traceback (most recent call last):
  /content/text-generation-webui/server.py:1154 in <module>
  /content/text-generation-webui/modules/models.py:74 in load_model
  /content/text-generation-webui/modules/models.py:288 in AutoGPTQ_loader
  /content/text-generation-webui/modules/AutoGPTQ_loader.py:56 in load_quantized
  /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py:79 in from_quantized
  /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_utils.py:123 in check_and_get_model_type
  /usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py:947 in from_pretrained
  /usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py:553 in resolve_trust_remote_code
ValueError: Loading models/falcon-7b-instruct-GPTQ requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
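Falcon's config points at custom modelling code, so AutoGPTQ refuses to load it unless remote code is trusted. text-generation-webui exposes this as a launch flag; a minimal sketch, with the usual caveat that you should only trust code you have read (the flag spelling in this pinned version is an assumption):

%cd /content/text-generation-webui
!python server.py --share --chat --trust-remote-code --model falcon-7b-instruct-GPTQ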

Unable to use the google_translate_plus extension

Using camenduru/text-generation-webui-colab, I used the following code to download the extension:
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://github.com/Vasyanator/google_translate_plus/blob/main/requirements.txt -d /content/text-generation-webui/extensions/google_translate_plus -o requirements.txt
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://github.com/Vasyanator/google_translate_plus/blob/main/script.py -d /content/text-generation-webui/extensions/google_translate_plus -o script.py
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://github.com/Vasyanator/google_translate_plus/blob/main/settings.json -d /content/text-generation-webui/extensions/google_translate_plus -o settings.json

I used the following flag to load the extension:
--extensions google_translate_plus

When I try to run the model, it fails at this stage:
2023-10-15 14:59:48 INFO:Loading the extension "google_translate_plus"...
2023-10-15 14:59:48 ERROR:Failed to load the extension "google_translate_plus".

Traceback (most recent call last):
File "/content/text-generation-webui/modules/extensions.py", line 35, in load_extensions
exec(f"import extensions.{name}.script")
File "", line 1, in
File "/content/text-generation-webui/extensions/google_translate_plus/script.py", line 1, in
{"payload":{"allShortcutsEnabled":false,"fileTree":{"":{"items":[{"name":"README.md","path":"README.md","contentType":"file"},{"name":"requirements.txt","path":"requirements.txt","contentType":"file"},{"name":"script.py","path":"script.py","contentType":"file"},{"name":"settings.json","path":"settings.json","contentType":"file"}],"totalCount":4}},"fileTreeProcessingTime":1.890233,"foldersToFetch":[],"reducedMotionEnabled":null,"repo":........
NameError: name 'false' is not defined
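The NameError is a giveaway that script.py contains GitHub's HTML/JSON page payload rather than Python: the aria2c commands above download github.com/.../blob/... pages instead of the raw files. A sketch that fetches the raw files instead (same repo and branch as in the URLs above) and installs the extension's requirements:

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://raw.githubusercontent.com/Vasyanator/google_translate_plus/main/requirements.txt -d /content/text-generation-webui/extensions/google_translate_plus -o requirements.txt
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://raw.githubusercontent.com/Vasyanator/google_translate_plus/main/script.py -d /content/text-generation-webui/extensions/google_translate_plus -o script.py
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://raw.githubusercontent.com/Vasyanator/google_translate_plus/main/settings.json -d /content/text-generation-webui/extensions/google_translate_plus -o settings.json
!pip install -r /content/text-generation-webui/extensions/google_translate_plus/requirements.txt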

Simplify some instructions

Hi,

I've just tried to make it simpler by putting some values in variables:

USER="TheBloke"
MODEL="WizardCoder-Python-13B-V1.0-GPTQ"
FILES=("config.json" "generation_config.json" "special_tokens_map.json" "tokenizer.model" "tokenizer_config.json" "model.safetensors")

then a loop to download the required files.
If you don't want to distinguish between raw and resolve:

%%bash

for FILE in "${FILES[@]}"; do
  !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M "https://huggingface.co/$USER/$MODEL/resolve/main/$FILE" -d "/content/text-generation-webui/models/$MODEL" -o $FILE
done

Use this if you want to distinguish between raw and resolve:

%%bash

for FILE in "${FILES[@]}"; do
  if [[ $FILE == "tokenizer.model" || $FILE == "model.safetensors" ]]; then
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M "https://huggingface.co/$USER/$MODEL/resolve/main/$FILE" -d "/content/text-generation-webui/models/$MODEL" -o $FILE
  else
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M "https://huggingface.co/$USER/$MODEL/raw/main/$FILE" -d "/content/text-generation-webui/models/$MODEL" -o $FILE
  fi
done

didn't work 😢

import os

user_name = "anon8231489123" #@param {"type": "string"}

model_name = "gpt4-x-alpaca-13b-native-4bit-128g" #@param {"type": "string"}

!apt-get -y install -qq aria2
!git clone -b v1.0 https://github.com/camenduru/text-generation-webui
%cd /content/text-generation-webui
!pip install -r requirements.txt

models_path = "/content/text-generation-webui/models/"
model_path = os.path.join(models_path, model_name)
os.makedirs(model_path, exist_ok=True)

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/raw/main/config.json -d {model_path} -o config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/raw/main/generation_config.json -d {model_path} -o generation_config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/raw/main/pytorch_model.bin.index.json -d {model_path} -o pytorch_model.bin.index.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/raw/main/special_tokens_map.json -d {model_path} -o special_tokens_map.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/resolve/main/tokenizer.model -d {model_path} -o tokenizer.model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/raw/main/tokenizer_config.json -d {model_path} -o tokenizer_config.json
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/{user_name}/{model_name}/resolve/main/gpt-x-alpaca-13b-native-4bit-128g-cuda.pt -d {model_path} -o {model_name}.pt
%cd /content/text-generation-webui
!python server.py --share --chat --wbits 4 --groupsize 128 --model {model_name}

got:

Loading gpt4-x-alpaca-13b-native-4bit-128g...
Loading model ...
^C

I didn't try it on Colab Pro; is there a way to optimize this?

edit: I just found this:

tsumeone/gpt4-x-alpaca-13b-native-4bit-128g-cuda
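One likely reason the %%bash loops above fail is that the `!` prefix is IPython syntax, not bash, and separate %%bash cells do not share the variables defined earlier; a plain-Python loop avoids both problems. A minimal sketch along the lines of the second attempt, reusing the hypothetical user/model/file values from the issue:

import os, subprocess

# !apt-get -y install -qq aria2   # assumption: aria2c is installed, as in the notebooks above

user = "TheBloke"                                   # hypothetical values, as in the issue
model = "WizardCoder-Python-13B-V1.0-GPTQ"
files = ["config.json", "generation_config.json", "special_tokens_map.json",
         "tokenizer.model", "tokenizer_config.json", "model.safetensors"]

model_dir = f"/content/text-generation-webui/models/{model}"
os.makedirs(model_dir, exist_ok=True)

for name in files:
    # binary files need /resolve/ (follows the LFS redirect); small text files can use /raw/
    kind = "resolve" if name in ("tokenizer.model", "model.safetensors") else "raw"
    url = f"https://huggingface.co/{user}/{model}/{kind}/main/{name}"
    subprocess.run(["aria2c", "--console-log-level=error", "-c", "-x", "16", "-s", "16",
                    "-k", "1M", url, "-d", model_dir, "-o", name], check=True)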

falcon-7b-instruct-GPTQ-4bit.ipynb

INFO:Gradio HTTP request redirected to localhost :)
WARNING:trust_remote_code is enabled. This is dangerous.
WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-06-06 21:55:46.220247: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
INFO:Loading falcon-7b-instruct-GPTQ...
INFO:The AutoGPTQ params are: {'model_basename': 'gptq_model-4bit-64g', 'device': 'cuda:0', 'use_triton': False, 'use_safetensors': True, 'trust_remote_code': True, 'max_memory': None, 'quantize_config': None}
WARNING:CUDA extension not installed.
WARNING:The safetensors archive passed at models/falcon-7b-instruct-GPTQ/gptq_model-4bit-64g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
WARNING:can't get model's sequence length from model config, will set to 4096.
WARNING:RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
WARNING:RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.
INFO:Loaded the model in 36.17 seconds.

INFO:Loading the extension "gallery"...
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://ccd3202fc68d7be036.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
ERROR:Task exception was never retrieved
future: <Task finished name='hszag9ma4as_118' coro=<Queue.process_events() done, defined at /usr/local/lib/python3.10/dist-packages/gradio/queueing.py:343> exception=ValidationError(model='PredictBody', errors=[{'loc': ('data',), 'msg': 'field required', 'type': 'value_error.missing'}])>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 347, in process_events
    client_awake = await self.gather_event_data(event)
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 220, in gather_event_data
    data, client_awake = await self.get_message(event, timeout=receive_timeout)
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 456, in get_message
    return PredictBody(**data), True
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PredictBody
data
  field required (type=value_error.missing)
Output generated in 15.99 seconds (0.94 tokens/s, 15 tokens, context 67, seed 1207267814)

The problem didn't go away even after the fix

I checked Redmond Puffin 13B and Vicuna 13B; the problem remained that messages were repeated. I changed the instruction template to Llama-v2, but nothing helped. During a conversation with a character, after several messages, it just repeats the same messages each time.

SyntaxError: illegal target for annotation

Actually, I am trying to run it on Kaggle, but it's giving me this error. Can somebody help me out?

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
cab6f1|OK  |    14KiB/s|/content/text-generation-webui/models/pyg-13b-4bit-128g/tokenizer_config.json

Status Legend:
(OK):download completed.
[#846e7e 6.7GiB/6.9GiB(96%) CN:16 DL:233MiB]
Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
846e7e|OK  |   226MiB/s|/content/text-generation-webui/models/pyg-13b-4bit-128g/4bit-128g.safetensors

Status Legend:
(OK):download completed.
/content/text-generation-webui
Gradio HTTP request redirected to localhost :)
Traceback (most recent call last):
  File "/content/text-generation-webui/server.py", line 43, in <module>
    import modules.extensions as extensions_module
  File "/content/text-generation-webui/modules/extensions.py", line 6, in <module>
    import extensions
  File "/opt/conda/lib/python3.10/site-packages/extensions/__init__.py", line 7
    "bufferView": 5,
    ^^^^^^^^^^^^
SyntaxError: illegal target for annotation
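The traceback shows Python importing /opt/conda/lib/python3.10/site-packages/extensions/__init__.py, i.e. an unrelated preinstalled package on the Kaggle image named `extensions` that shadows the web UI's local extensions folder. A heavily hedged workaround sketch, assuming that conflicting site-packages directory is not needed by anything else in the notebook:

import shutil
# assumption: this preinstalled 'extensions' package is unrelated to the web UI and safe to remove
shutil.rmtree("/opt/conda/lib/python3.10/site-packages/extensions", ignore_errors=True)
%cd /content/text-generation-webui
!python server.py --share --chat --wbits 4 --groupsize 128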

Message repetition from characters

When talking with the bot, I get repeated messages; nothing helps, not even changing the temperature and other settings. After updating I got this error. I use Redmond Puffin 13B. Any help or advice, please?

pyg-7b and other models stopped working

On 5.5.2023 everything was okay; now I get this error with the pyg-7b model.

Traceback (most recent call last):
  /content/text-generation-webui/server.py:927 in <module>
  /content/text-generation-webui/server.py:514 in create_interface
  /usr/local/lib/python3.10/dist-packages/gradio/blocks.py:1285 in __exit__
  /usr/local/lib/python3.10/dist-packages/gradio/blocks.py:1261 in get_config_file
  /usr/local/lib/python3.10/dist-packages/gradio_client/serializing.py:40 in input_api_info
KeyError: 'serialized_input'

and here is vicuna-13B-GPTQ
Traceback (most recent call last):
  /content/text-generation-webui/server.py:591 in <module>
  /content/text-generation-webui/server.py:320 in create_interface
  /usr/local/lib/python3.10/dist-packages/gradio/blocks.py:1200 in __exit__
  /usr/local/lib/python3.10/dist-packages/gradio/blocks.py:1176 in get_config_file
  /usr/local/lib/python3.10/dist-packages/gradio_client/serializing.py:40 in input_api_info
KeyError: 'serialized_input'
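'serialized_input' belongs to an older gradio_client API, so this KeyError usually means the installed gradio and gradio_client did not ship together (one was upgraded underneath the other). Reinstalling the gradio version these notebooks pin elsewhere, and letting pip resolve a matching gradio_client, is a plausible fix; the exact pin is an assumption:

!pip install gradio==3.28.3   # assumption: the pin used elsewhere in these notebooks; pip should pick a compatible gradio_client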

Problem: the character doesn't say anything

Traceback (most recent call last):
File "/content/text-generation-webui/modules/callbacks.py", line 56, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
File "/content/text-generation-webui/modules/text_generation.py", line 311, in generate_with_callback
shared.model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py", line 443, in generate
return self.model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1642, in generate
return self.sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2724, in sample
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 697, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 426, in forward
hidden_states = self.mlp(hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 210, in forward
[F.linear(x, gate_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 210, in
[F.linear(x, gate_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1848x5120 and 13824x640)
Output generated in 4.45 seconds (0.00 tokens/s, 0 tokens, context 1848, seed 1194575447)
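The failing frame is the branch of modeling_llama.py that slices the MLP weights into `pretraining_tp` chunks, which does not line up with the packed GPTQ weight shapes (1848x5120 vs 13824x640). A commonly suggested workaround is to force `pretraining_tp` to 1 in the downloaded config.json before loading; the path below is a placeholder:

import json
from pathlib import Path

cfg = Path("/content/text-generation-webui/models/Redmond-Puffin-13B-GPTQ/config.json")  # hypothetical model folder
data = json.loads(cfg.read_text())
data["pretraining_tp"] = 1   # disable the tensor-parallel slicing branch in modeling_llama.py
cfg.write_text(json.dumps(data, indent=2))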

Unable to run the API extension

After checking the "api" option under the Session tab, I clicked the "Apply flags/extension and Restart" button as shown below:
(Screenshot attached.)

This generated the following logs in the colab console:

> 2023-09-06 16:30:28 WARNING:skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
2023-09-06 16:30:28 INFO:Loaded the model in 51.97 seconds.

2023-09-06 16:30:28 INFO:Loading the extension "gallery"...
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://<my_old_live_link>/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

---------------------------------<Below is the log after I restarted the the server with api option>---------------------------

ERROR:    Exception in ASGI application

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/websockets/websockets_impl.py", line 247, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 82, in app
    await func(session)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 289, in app
    await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 536, in join_queue
    session_info = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/dist-packages/starlette/websockets.py", line 133, in receive_json
    self._raise_on_disconnect(message)
  File "/usr/local/lib/python3.10/dist-packages/starlette/websockets.py", line 105, in _raise_on_disconnect
    raise WebSocketDisconnect(message["code"])
starlette.websockets.WebSocketDisconnect: 1012
Closing server running on port: 7860
2023-09-06 16:31:32 INFO:Loading the extension "gallery"...
2023-09-06 16:31:32 ERROR:Failed to load the extension "api".
Traceback (most recent call last):
  File "/content/text-generation-webui/modules/extensions.py", line 40, in load_extensions
    extension.setup()
  File "/content/text-generation-webui/extensions/api/script.py", line 10, in setup
    if shared.public_api:
AttributeError: module 'modules.shared' has no attribute 'public_api'
Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://<my_new_live_link>/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Output generated in 7.95 seconds (4.90 tokens/s, 39 tokens, context 45, seed 932200172)

I tried the following code to get the response after that. However, I am getting a 404 error.
Could you please tell me how to start the API correctly and get responses?


import requests

# For local streaming, the websockets are hosted without ssl - http://
HOST = '<my_new_live_link>'
URI = f'https://{HOST}/api/v1/generate'


# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/generate'


def run(prompt):
    request = {
        'prompt': prompt,
        'max_new_tokens': 250,
        'auto_max_new_tokens': False,
        'max_tokens_second': 0,

        # Generation params. If 'preset' is set to different than 'None', the values
        # in presets/preset-name.yaml are used instead of the individual numbers.
        'preset': 'None',
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.1,
        'typical_p': 1,
        'epsilon_cutoff': 0,  # In units of 1e-4
        'eta_cutoff': 0,  # In units of 1e-4
        'tfs': 1,
        'top_a': 0,
        'repetition_penalty': 1.18,
        'repetition_penalty_range': 0,
        'top_k': 40,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'mirostat_mode': 0,
        'mirostat_tau': 5,
        'mirostat_eta': 0.1,
        'guidance_scale': 1,
        'negative_prompt': '',

        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
    }
    print(URI)
    response = requests.post(URI, json=request)
    print(response)

    if response.status_code == 200:
        result = response.json()['results'][0]['text']
        print(prompt + result)


if __name__ == '__main__':
    prompt = "In order to make homemade bread, follow these steps:\n1)"
    run(prompt)
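For what it's worth, the AttributeError above (module 'modules.shared' has no attribute 'public_api') suggests the api extension in this checkout expects a newer shared module than the pinned one, so enabling it from the Session tab never starts cleanly. A hedged alternative is to enable the API at launch time in the notebook cell instead (flag names are an assumption for this pinned version) and to send requests to the separate API URL the console prints rather than the Gradio share link:

%cd /content/text-generation-webui
!python server.py --share --chat --api --public-api --wbits 4 --groupsize 128
# the console then prints a dedicated API URL (blocking API, default port 5000);
# use that URL as HOST/URI in the script above instead of <my_new_live_link>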
  

Will Llama 2 70B be supported in the future?

Hi there, many thanks for this wonderful sharing!
Just wondering, will there be a 70B running on Colab?
I have tried Petals' work, however the chat did not work quite right.
Best,

Something wrong with the Colab

Hi camenduru,
First of all, thanks for your work.
My problem is, every time I want to run any model in Colab I get the same issue.
Vicuna works fine, but pyg-7b and pyg-13b do not, and the WizardLM uncensored is not working either.

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
Traceback (most recent call last):
  /content/text-generation-webui/server.py:44 in <module>
  /content/text-generation-webui/modules/training.py:13 in <module>
  /usr/local/lib/python3.10/dist-packages/peft/__init__.py:22 in <module>
  /usr/local/lib/python3.10/dist-packages/peft/mapping.py:16 in <module>
  /usr/local/lib/python3.10/dist-packages/peft/peft_model.py:31 in <module>
  /usr/local/lib/python3.10/dist-packages/peft/tuners/__init__.py:21 in <module>
  /usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py:735 in <module>
    class Linear4bit(bnb.nn.Linear4bit, LoraLayer):
AttributeError: module 'bitsandbytes.nn' has no attribute 'Linear4bit'

I don't know if it's something with me or the code.

Thanks for your attention and work.
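`bnb.nn.Linear4bit` only exists in newer bitsandbytes releases, so the peft build pulled in here (0.4.0.dev0 in the traceback) is ahead of the installed bitsandbytes. Upgrading bitsandbytes before launching is the straightforward workaround; the minimum version is an assumption (Linear4bit appeared around 0.39):

!pip install -U "bitsandbytes>=0.39.0"   # assumption: first release that provides nn.Linear4bit
%cd /content/text-generation-webui
!python server.py --share --chat --wbits 4 --groupsize 128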
