Overview
The documentation states that any OpenAI-compatible API can be used. Since I have a working local installation of `text-generation-webui`, I attempted to use it with my already installed models via the OpenAI-compatible API it provides (the text-generation-webui OpenAI extension), but I ran into issues with both chat completion and file embeddings. I was only able to fix chat completion manually.
Currently there is no documentation at all on how to approach this, so I do not know whether my method is the correct one.
Changes made for deployment
As I am providing my own LLM API, the `llm-api` service is not needed, so I removed it from the docker-compose file.
After skimming through the code to see what I would potentially need to change, I identified the envoy configuration that proxies and combines the several services. To be able to use a different configuration, I replaced it with the following docker service, which mounts my own config file:
```yaml
# Handles routing between the application, barricade and the LLM API
envoy:
  image: ghcr.io/purton-tech/bionicgpt-envoy:1.0.3
  ports:
    - "7800:7700"
    - "7801:7701"
  volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
```
I kept the `envoy.yaml` file provided in the `.devcontainer` mostly unchanged, apart from manually running the `sed` commands defined in the `Earthfile`. Besides that, I only changed the last section, the one for the LLM API. My changed configuration is as follows:
```yaml
# The LLM API
- name: llm-api
  connect_timeout: 10s
  type: strict_dns
  lb_policy: round_robin
  dns_lookup_family: V4_ONLY
  load_assignment:
    cluster_name: llm-api
    endpoints:
      - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: host.docker.internal
                  port_value: 5001
```
I am using `host.docker.internal` because `text-generation-webui` is running on the host system, and `5001` is the default port of its OpenAI-compatible API.
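As a quick sanity check that the backend is reachable at this address, it can be queried directly. This is just my own smoke test, not something BionicGPT ships, and I am assuming the extension serves the standard OpenAI `/v1/models` route:

```ts
// Smoke test for the OpenAI-compatible API of text-generation-webui.
// Run this from inside a container on the compose network; from the host
// itself, use http://localhost:5001 instead of host.docker.internal.
const res = await fetch("http://host.docker.internal:5001/v1/models");
console.log(res.status, await res.json()); // expect 200 and a model list
```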
With these changes, the docker-compose stack boots correctly and all components appear to be accessible (using the default auth URL, I can reach the main UI).
Problems Occurring
- When using the Chat Console and sending a message, the UI is stuck at `Processing prompt...`, and it is not possible to cancel this process.
- In the Network view of the browser, I can see that a `completions` API request is made correctly.
- In the console log of `text-generation-webui`, I can see that the request is processed and a response is generated.
- When using Team Documents, files can be uploaded and embedding creation starts, but at the end of the progress it shows that all embeddings have failed.
Expectation
Both chat completion and embeddings should work.
The cause of the problem
I do not know why the embeddings do not work; when calling the API manually (also through the envoy proxy), a correct response is returned.
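For reference, this is roughly how I tested it by hand. This is only a sketch: which mapped port carries the LLM API routes and the model name are assumptions on my part, and `text-generation-webui` may ignore the `model` field entirely:

```ts
// Manual embeddings request through the envoy proxy (assuming port 7800
// from my mapping above forwards the OpenAI-style /v1/embeddings route).
const res = await fetch("http://localhost:7800/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: "hello world", model: "all-mpnet-base-v2" }),
});
console.log(res.status, await res.json()); // a valid embedding vector is returned
```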
But I did find the cause of the chat completion problem: the response contains `\r` characters.
In the file `crates/asset-pipeline/web-components/streaming-chat.ts`, lines are currently split by just `\n`:
https://github.com/purton-tech/bionicgpt/blob/91ba40467d011b0d7fc998e78c85f2a663812fae/crates/asset-pipeline/web-components/streaming-chat.ts#L39
Replacing this with:

```ts
const arr = value.split(/\r?\n/);
```

fixes the chat completion problem (which I verified locally by creating an override for the generated `index.js`).
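To illustrate why the trailing `\r` matters, here is a minimal sketch (not the actual BionicGPT parsing code): when the backend terminates lines with CRLF, splitting on `\n` alone leaves a `\r` on every element, so exact string comparisons against the stream's lines, such as a `[DONE]` sentinel check, no longer match:

```ts
// Sketch of the failure mode with CRLF line endings.
const value = 'data: {"content":"Hi"}\r\ndata: [DONE]\r\n';

const broken = value.split("\n");          // current behaviour
console.log(broken[1] === "data: [DONE]"); // false – the line is "data: [DONE]\r"

const fixed = value.split(/\r?\n/);        // with the proposed fix
console.log(fixed[1] === "data: [DONE]");  // true
```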
Conclusion
Chat completion with `text-generation-webui` as the LLM backend doesn't work (at least on Windows), because the chat responses include carriage returns [which might be an issue specific to `text-generation-webui`]. Embeddings also do not work, although I could not identify the cause, as neither `text-generation-webui` nor BionicGPT logs anything.