Overview
The documentation states that any OpenAI-compatible API can be used. Since I have a working local installation of `text-generation-webui`, I attempted to use it with my already installed models via the OpenAI-compatible API it provides (the text-generation-webui OpenAI extension), but I ran into issues with both chat completion and file embeddings. I was only able to fix chat completion manually.
Currently there is no documentation at all on how to approach this, so I do not know whether my method is the correct one.
Changes made for deployment
As I am providing my own LLM API, the `llm-api` service is not needed, so I removed it from the docker-compose file.
After skimming through the code to see what I would potentially need to change, I identified the envoy configuration that proxies and combines the several services. To be able to use a different configuration, I replaced it with the following docker service, which mounts my own config file:
```yaml
# Handles routing between the application, barricade and the LLM API
envoy:
  image: ghcr.io/purton-tech/bionicgpt-envoy:1.0.3
  ports:
    - "7800:7700"
    - "7801:7701"
  volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
```
I kept the `envoy.yaml` file provided in the `.devcontainer` mostly unchanged, apart from manually running the `sed` commands defined in the `Earthfile`. Besides that, I only changed the last section, the one for the LLM API. My changed configuration is as follows:
```yaml
# The LLM API
- name: llm-api
  connect_timeout: 10s
  type: strict_dns
  lb_policy: round_robin
  dns_lookup_family: V4_ONLY
  load_assignment:
    cluster_name: llm-api
    endpoints:
      - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: host.docker.internal
                  port_value: 5001
```
I am using `host.docker.internal` because `text-generation-webui` is running on the host system, and `5001` is the default port of its OpenAI-compatible API.
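As a quick sanity check that the backend is reachable at this address, it can be queried directly. This is just my own smoke test, not something BionicGPT ships, and I am assuming the extension serves the standard OpenAI `/v1/models` route:

```ts
// Smoke test for the OpenAI-compatible API of text-generation-webui.
// Run this from inside a container on the compose network; from the host
// itself, use http://localhost:5001 instead of host.docker.internal.
const res = await fetch("http://host.docker.internal:5001/v1/models");
console.log(res.status, await res.json()); // expect 200 and a model list
```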
With these changes, the docker-compose stack boots correctly and all components appear to be accessible (using the default auth URL, I can reach the main UI).
Problems Occurring
- When using the Chat Console and sending a message, the UI is stuck at `Processing prompt...`, and it is not possible to cancel this process.
- In the Network view of the browser, I can see that a `completions` API request is made correctly.
- In the console log of `text-generation-webui`, I can see that the request is processed and a response is generated.
- When using Team Documents, files can be uploaded and embedding creation starts, but at the end of the progress it shows that all embeddings have failed.
Expectation
Both chat completion and embeddings should work.
The cause of the problem
I do not know why the embeddings do not work; when calling the API manually (also through the envoy proxy), a correct response is returned.
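For reference, this is roughly how I tested it by hand. This is only a sketch: which mapped port carries the LLM API routes and the model name are assumptions on my part, and `text-generation-webui` may ignore the `model` field entirely:

```ts
// Manual embeddings request through the envoy proxy (assuming port 7800
// from my mapping above forwards the OpenAI-style /v1/embeddings route).
const res = await fetch("http://localhost:7800/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: "hello world", model: "all-mpnet-base-v2" }),
});
console.log(res.status, await res.json()); // a valid embedding vector is returned
```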
But I did find the cause of the chat completion problem: the response contains `\r` characters.
In the file `crates/asset-pipeline/web-components/streaming-chat.ts`, lines are currently split by just `\n`:
https://github.com/purton-tech/bionicgpt/blob/91ba40467d011b0d7fc998e78c85f2a663812fae/crates/asset-pipeline/web-components/streaming-chat.ts#L39
Replacing this with:

```ts
const arr = value.split(/\r?\n/);
```

fixes the chat completion problem (which I verified locally by creating an override for the generated `index.js`).
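To illustrate why the trailing `\r` matters, here is a minimal sketch (not the actual BionicGPT parsing code): when the backend terminates lines with CRLF, splitting on `\n` alone leaves a `\r` on every element, so exact string comparisons against the stream's lines, such as a `[DONE]` sentinel check, no longer match:

```ts
// Sketch of the failure mode with CRLF line endings.
const value = 'data: {"content":"Hi"}\r\ndata: [DONE]\r\n';

const broken = value.split("\n");          // current behaviour
console.log(broken[1] === "data: [DONE]"); // false – the line is "data: [DONE]\r"

const fixed = value.split(/\r?\n/);        // with the proposed fix
console.log(fixed[1] === "data: [DONE]");  // true
```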
Conclusion
Chat completion with `text-generation-webui` as the LLM backend doesn't work (at least on Windows), because the chat responses include carriage returns [which might be an issue specific to `text-generation-webui`]. Embeddings also do not work, although I could not identify the cause, as neither `text-generation-webui` nor BionicGPT logs anything.