
bionic-gpt / bionic-gpt


BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality

Home Page: https://bionic-gpt.com

License: Apache License 2.0

Shell 0.64% Dockerfile 0.07% Rust 78.97% HTML 7.10% CSS 3.32% TypeScript 3.74% SCSS 1.94% PLpgSQL 1.61% Earthly 1.86% JavaScript 0.48% Just 0.28%
architecture full-stack llmops llms rust


bionic-gpt's Issues

Storage and use of chat history to answer questions

Currently, no context or history of the chat is used for subsequent questions, so every question needs to contain the full context. Can we add the ability to maintain context throughout the chat session, and how? (A rough sketch follows the example below.)

Example.

When was the Lifetime Learning Act 2023 enacted?

RESPONSE

Which act did it amend?
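One minimal sketch of an approach: keep the (question, answer) pairs from the current session and prepend them to the next prompt before adding the retrieved context. The function and field names below are illustrative, not BionicGPT's actual implementation.

  // Illustrative sketch: prepend stored chat history to the next prompt.
  fn build_prompt(history: &[(String, String)], context: &str, question: &str) -> String {
      let mut prompt = String::new();
      for (q, a) in history {
          // Each earlier exchange gives the model the context it needs to
          // resolve follow-ups like "Which act did it amend?".
          prompt.push_str(&format!("User: {}\nAssistant: {}\n", q, a));
      }
      prompt.push_str(&format!("Context: {}\nUser: {}\nAnswer:", context, question));
      prompt
  }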

Improve the static site

  • Choose a template for the docs
  • Choose a template for the blog
  • Get something for the site pages, i.e. /, /pricing, etc.
  • Home page on mobile
  • Use dioxus to generate pages?
  • Hook into zola to parse front matter etc.
  • Fix the book menu issue
  • Fix the book side menu to keep scroll position.

Medium Samples

Book Themes

Todo

  • Split the website into zola and mdbooks
  • Generate docs with mdbooks
  • Somehow combine the two.

Allow users to add models available from LocalAI

The model gallery allows us to see available models. https://localai.io/models/

We could provide a UI for this, ideally showing download progress.

We can do this via the API.

  • Call it system prompt and make it non-mandatory
  • What does localai do with /chat/completions
  • Remove errors from history

Move to Llama 2

  • Get the container
  • Change llm-api to local-ai
  • Update the migrations
  • Try it

Implement Testing UI

Allow people to enter a series of prompts and run them.

Allow people to evaluate and score the results.

Keep track of model, prompt, chunking strategy etc. (see the sketch after this list).

  • Use Cases
  • Test Runs
  • Run unstructured process in the background
  • Detach prompt and dataset.
  • User can select dataset at the console.
  • Use case has 1 or more questions - Attach to a prompt and a dataset.
  • Test Runs capture all variables.
  • Run in background
  • After the run user can evaluate the results.
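As a rough illustration of the variables a test run would need to capture — the names here are hypothetical, not an actual schema:

  // Hypothetical shapes for use cases and test runs; not the actual schema.
  struct UseCase {
      id: i32,
      prompt_id: i32,          // attached to a prompt
      dataset_id: i32,         // and a dataset
      questions: Vec<String>,  // one or more questions
  }

  struct TestRun {
      use_case_id: i32,
      model: String,             // which model was used
      prompt_template: String,   // the prompt at the time of the run
      chunking_strategy: String, // e.g. "by_title"
      answers: Vec<String>,      // filled in by the background run
      scores: Vec<Option<u8>>,   // user evaluation after the run
  }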

Data Pipelines

Upload

  • API endpoint for upload?
  • Create DB tables
  • Create queries
  • Connect UI to queries
  • Add forms etc
  • Upload to documents table
  • Load file direct to database
  • Move code for chunking
  • Set up web testing

Store/retrieve document metadata

Store metadata with the chunks in the DB: document name, page number (see the sketch below).

When RAG questions are answered, allow the user to see the original source / provide details of the original source.
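A rough sketch of what storing that metadata per chunk could look like, assuming a tokio_postgres transaction like the one shown elsewhere in this document; the extra columns are illustrative, not the current schema.

  // Illustrative only: store document name and page number alongside each chunk.
  async fn insert_chunk(
      transaction: &tokio_postgres::Transaction<'_>,
      document_id: i32,
      document_name: &str,
      page_number: i32,
      text: &str,
  ) -> Result<(), tokio_postgres::Error> {
      transaction
          .execute(
              "INSERT INTO embeddings (document_id, document_name, page_number, text)
               VALUES ($1, $2, $3, $4)",
              &[&document_id, &document_name, &page_number, &text],
          )
          .await?;
      Ok(())
  }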

Add Oobabooga API to Models

How do I add my locally running Oobabooga textgen installation, with the API and OpenAI extensions enabled, to the Model Setup tab? Thanks.

Document uploading not working

Downloaded the latest config file.
Created a new embedding: https://api.openai.com/v1/embeddings

Tried to upload a PDF document; getting the following errors in the log (nothing on the front end):

fine-tuna-barricade-1 | [2023-11-01T11:38:55Z INFO actix_web::middleware::logger] 172.23.0.4 "POST /app/team/2/dataset/1/doc_upload HTTP/1.1" 200 0 "-" "-" 0.000627
fine-tuna-barricade-1 | [2023-11-01T11:38:55Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 94.555µs
fine-tuna-app-1 | 2023-11-01T11:38:55.147453Z INFO axum_server::documents::upload_doc: Sending document to unstructured
fine-tuna-unstructured-1 | 2023-11-01 11:38:56,767 unstructured_api DEBUG pipeline_api input params: {"filename": "MSFT_FY23Q4_10K.pdf", "response_type": "application/json", "m_coordinates": [], "m_encoding": [], "m_hi_res_model_name": [], "m_include_page_breaks": [], "m_ocr_languages": [], "m_pdf_infer_table_structure": [], "m_skip_infer_table_types": [], "m_strategy": [], "m_xml_keep_tags": [], "languages": ["eng"], "m_chunking_strategy": ["by_title"], "m_multipage_sections": ["true"], "m_combine_under_n_chars": ["500"], "new_after_n_chars": ["1000"]}
fine-tuna-unstructured-1 | 2023-11-01 11:38:56,768 unstructured_api DEBUG filetype: application/pdf
fine-tuna-unstructured-1 | 2023-11-01 11:38:56,835 unstructured_api DEBUG partition input data: {"content_type": "application/pdf", "strategy": "auto", "ocr_languages": null, "coordinates": false, "pdf_infer_table_structure": false, "include_page_breaks": false, "encoding": null, "model_name": null, "xml_keep_tags": false, "skip_infer_table_types": ["pdf", "jpg", "png"], "languages": ["eng"], "chunking_strategy": "by_title", "multipage_sections": true, "combine_under_n_chars": 500, "new_after_n_chars": 1000}
fine-tuna-unstructured-1 | 2023-11-01 11:38:57,253 172.23.0.8:58440 POST /general/v0/general HTTP/1.1 - 500 Internal Server Error
fine-tuna-app-1 | 2023-11-01T11:38:57.254773Z ERROR axum_server::errors: response="status = 422 Unprocessable Entity, message = error decoding response body: invalid type: map, expected a sequence at line 1 column 0"
MSFT_FY23Q4_10K.pdf

Documentation for using an alternative Open AI compatible API / not working with text-generation-webui

Overview

The documentation states that it is possible to use any OpenAI-compatible API.
As I have a local working installation of text-generation-webui I attempted to use this with my already installed models, using the OpenAI compatible API it provides (text-generation-webui OpenAI Extension), but encountered issues with chat completion and file embeddings. I was only able to manually fix the chat completion.

Currently there is no documentation at all on how to approach this, and I do not know whether my method is the correct one.

Changes made for deployment

As I do not need the llm-api service (I am providing my own), I removed it from the docker-compose file.

After skimming through the code to see what I might need to change, I identified the Envoy configuration that handles proxying / combining the several services.

To be able to have a different configuration, I used the following docker service instead, which maps my own config file:

  # Handles routing between the application, barricade and the LLM API
  envoy:
    image: ghcr.io/purton-tech/bionicgpt-envoy:1.0.3
    ports:
      - "7800:7700"
      - "7801:7701"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml

I kept the envoy.yaml file provided in the .devcontainer mostly unchanged, apart from manually running the sed commands defined in the Earthfile.

Besides that, I only changed the last section, for the LLM API. My changed configuration is as follows:

  # The LLM API
  - name: llm-api
    connect_timeout: 10s
    type: strict_dns
    lb_policy: round_robin
    dns_lookup_family: V4_ONLY
    load_assignment:
      cluster_name: llm-api
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: host.docker.internal
                port_value: 5001

I am using host.docker.internal as text-generation-webui is running on the host system, with 5001 being the default port for the OpenAI compatible API.

With these changes, the docker-compose stack correctly boots and all components are seemingly accessible (using the default auth URL, I can reach the main UI).

Problems Occurring

  • When using the Chat Console and sending a message, the UI is stuck at Processing prompt...; it is not possible to cancel this process
    • In the Network view of the browser, I can see that a completions API request is correctly done.
    • In the console log of text-generation-webui I see that the request is processed, and a response is generated
  • When using Team Documents, you can upload files and it will start creating embeddings, but at the end of the progress it will show that all embeddings have failed

Expectation

Both chat completion and embeddings should work.

The cause of the problem

I do not know why the embeddings do not work; when manually calling the API (also through the Envoy proxy) a correct response is returned.

I did, however, find the cause of the chat completion problem: the response contains \r characters.

In the file crates/asset-pipeline/web-components/streaming-chat.ts lines are currently split by just \n :

https://github.com/purton-tech/bionicgpt/blob/91ba40467d011b0d7fc998e78c85f2a663812fae/crates/asset-pipeline/web-components/streaming-chat.ts#L39

Replacing this with:

const arr = value.split(/\r?\n/);

fixes the chat completion problem (which I verified locally by creating an override for the generated index.js).

Conclusion

Chat completion with text-generation-webui as the LLM backend doesn't work (at least on Windows), as the chat responses include carriage returns [which might be an issue specific to text-generation-webui]. Embeddings also do not work, although I could not identify the cause, as neither text-generation-webui nor BionicGPT logs anything.

UI/UX Mixed Bag

  • Implement Company, Private, Team
  • Implement sys admin.
  • Those tabs are quirky on the prompt
  • Also for code
  • Test all fields and different combinations
  • We should parse the markdown on the fly
  • It doesn't remember the selection.
  • Show selected datasets.
  • We don't show mandatory fields
  • Field labels should be bold.
  • No errors in console, can we log every error?
  • Try syntax highlighting again.
  • Fix api key field
  • Show which model was used in the console
  • Don't call it prompt template and the example is wrong

Latest version (3/10) not working

Tried a query without loading any datasets and the chat returns an error; I cannot see any error thrown on the backend.

Added a largish file and then deleted it before it had been processed.
Added a very small dataset. It was stuck in processing and never changed status.
Looked at the backend 5 minutes later and it appears to be processing some large file.

fine-tuna-barricade-1 | [2023-10-03T17:22:36Z INFO actix_web::middleware::logger] 172.21.0.4 "GET /app/team/doc_status/3 HTTP/1.1" 200 0 "-" "-" 0.000807
fine-tuna-barricade-1 | [2023-10-03T17:22:36Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 413.872µs
fine-tuna-embeddings-job-1 | 2023-10-03T17:22:38.913998Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:22:38.914602Z INFO embeddings_job: Processing embedding id 39
fine-tuna-embeddings-job-1 | 2023-10-03T17:22:50.747231Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:22:50.747865Z INFO embeddings_job: Processing embedding id 40
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:03.201833Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:03.202704Z INFO embeddings_job: Processing embedding id 41
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:15.077959Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:15.078916Z INFO embeddings_job: Processing embedding id 42
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:27.573897Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:27.574657Z INFO embeddings_job: Processing embedding id 43
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:39.289973Z INFO open_api: Processing 384 bytes
fine-tuna-embeddings-job-1 | 2023-10-03T17:23:39.290602Z INFO embeddings_job: Processing embedding id 44

Screenshot from 2023-10-03 18-19-08
Screenshot from 2023-10-03 18-19-43

Utf8Error trying to upload file

On a Mac Mini (Intel)

Error msg

fine-tuna-barricade-1 | [2023-09-07T08:47:13Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 822.209µs
fine-tuna-barricade-1 | [2023-09-07T08:47:13Z INFO sqlx::query] SELECT id, user_id, session_verifier, …; rows affected: 0, rows returned: 1, elapsed: 1.085ms
fine-tuna-barricade-1 |
fine-tuna-barricade-1 | SELECT
fine-tuna-barricade-1 | id,
fine-tuna-barricade-1 | user_id,
fine-tuna-barricade-1 | session_verifier,
fine-tuna-barricade-1 | otp_code_confirmed,
fine-tuna-barricade-1 | otp_code_encrypted,
fine-tuna-barricade-1 | otp_code_attempts,
fine-tuna-barricade-1 | otp_code_sent
fine-tuna-barricade-1 | FROM
fine-tuna-barricade-1 | sessions
fine-tuna-barricade-1 | WHERE
fine-tuna-barricade-1 | id = $1
fine-tuna-barricade-1 |
fine-tuna-barricade-1 | [2023-09-07T08:47:13Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /app/team/1/dataset/1/doc_upload HTTP/1.1" 200 0 "-" "-" 0.003813
fine-tuna-barricade-1 | [2023-09-07T08:47:13Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 454.951µs
fine-tuna-app-1 | 2023-09-07T08:47:13.777191Z INFO axum_server::documents::upload_doc: Sending document to unstructured
fine-tuna-unstructured-1 | 2023-09-07 08:47:13,831 unstructured_api DEBUG pipeline_api input params: {"request": "<starlette.requests.Request object at 0x7fb7a1891550>", "filename": "test-text.txt", "file_content_type": "text/plain", "response_type": "application/json", "m_coordinates": [], "m_encoding": [], "m_hi_res_model_name": [], "m_include_page_breaks": [], "m_ocr_languages": [], "m_pdf_infer_table_structure": [], "m_skip_infer_table_types": [], "m_strategy": [], "m_xml_keep_tags": []}
fine-tuna-unstructured-1 | 2023-09-07 08:47:13,852 unstructured_api DEBUG partition input data: {"content_type": "text/plain", "strategy": "auto", "ocr_languages": "eng", "coordinates": false, "pdf_infer_table_structure": false, "include_page_breaks": false, "encoding": null, "model_name": null, "xml_keep_tags": false, "skip_infer_table_types": ["pdf", "jpg", "png"]}
fine-tuna-unstructured-1 | 2023-09-07 08:47:17,821 172.18.0.6:52640 POST /general/v0/general HTTP/1.1 - 200 OK
fine-tuna-app-1 | 2023-09-07T08:47:17.827533Z INFO axum_server::documents::upload_doc: Generating embeddings
fine-tuna-app-1 | thread 'tokio-runtime-worker' panicked at 'called Result::unwrap() on an Err value: Utf8Error { valid_up_to: 1022, error_len: None }', crates/axum-server/src/open_api.rs:51:10

test-text.txt

Llama2 broken

comdockerdevenvironmentscode-app-1 | 2023-10-11T12:08:19.262865Z ERROR axum_server::errors: response="status = 422 Unprocessable Entity, message = error sending request for url (http://local-ai:8080/v1/embeddings): error trying to connect: dns error: failed to lookup address information: Try again"

Multiplatform fast inference - Is it possible?

The Problem

We want users to be able to test the system on hardware they already have, giving them the ability to do a proof of concept on-premise. Users may have the following setups:

  • Windows x86
  • MacOs (Intel)
  • MacOs (Apple Silicon)
  • Linux

We want minimal impact on the user's machine, so ideally install via Docker or perhaps an executable.

Current Solution

We use a docker-compose.yml; the user copies it to their local machine and runs docker-compose up.

This is nice because we use the same containers for a PoC as we would for deployment to production.

This has been tested on Linux and works well.

The LocalAI API we use also recommends Docker (https://localai.io/basics/getting_started/), although for Apple Silicon they recommend building from source.

Steps to reproduce

Try this; we'll run LocalAI on its own:

docker run -it --rm -p 8080:8080 ghcr.io/purton-tech/fine-tuna-model-api

The following just prints out the models we have loaded, to check that it is running:

curl http://localhost:8080/v1/models

Here we do some text generation (the fans should spin up and it takes a while):

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'

Test that embeddings work:

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-ada-002"
  }'

Steps to reproduce (windows)

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d "{ \"model\": \"ggml-gpt4all-j\", \"prompt\": \"A long time ago in a galaxy far, far away\", \"temperature\": 0.7 }"

curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -d "{\"input\": \"Your text string goes here\",\"model\": \"text-embedding-ada-002\" }"

Minimum Hardware Requirements

Can we get a minimum hardware spec for each platform where inference is fast enough to give a good user experience?

Hardware we've tested

OS            | Architecture | Processor             | RAM  | Inference   | Embeddings
PopOs (Linux) | x86          | AMD 2700x 8 Core      | 16gb | Usable      | Working
MacOs         | x86          | 2.8GHz dual core i5   | 16gb | Very Slow   | Working
Windows 10    | x86          | i3-1005G1 @ 1.2GHz    | 8gb  | Not Working | Working but slow
Windows 11    | x86          | i5-2400 CPU @ 3.10GHz | 8gb  | Not Working | Not Working

Sanity Check

There are a few local GPT projects; how do they handle installation?

Areas of Investigation

  • Docker supports buildx where we build a container for each platform. So if we build a container for Apple Silicon does it then run native?
  • LocalAI has binaries for Darwin https://github.com/go-skynet/LocalAI/releases/tag/v1.25.0 not sure if these are for Intel or Apple Silicon, do they run fast?
  • GPT4All has binaries for Mac, what kind of performance do they get?

General UX issue/changes

  • On documents page can we have a timestamp of when document was uploaded and one for when finished processing (maybe just the latter will be good enough)
  • Do we still need the Action button on documents page, if so what could it be used for?
  • Cancel a chat request - needs this mudler/LocalAI#974
  • Give user feedback when making a request, i.e. a spinning wheel
  • When making a chat request can we get progress i.e. are we tokenizing or what?
  • Lock the console and don't allow users to make multiple requests
  • By default add all dataset data to the prompt.
  • Set the upload button on upload documents to disabled when uploading, and set the text to "Document uploading, this may take some time"
  • When creating a prompt, be able to say (No datasets, All datasets, Select Datasets)
  • Connect a prompt to a model

Add web integration tests to CI/CD pipeline

System Setup

  • First user is the sys admin; they can set up the model, i.e. edit the Llama one.
  • Set EXTERNAL API secret in github
  • Set embeddings with env var

Mock Unstructured API

General

  • Add to earthly
  • Need to make endpoint for unstructured configurable so we can set it to localhost
  • Need to make endpoint for llm-api configurable so we can set it to localhost
  • Add tests for file uploads
  • Add tests for prompts
  • Make sure video is created
  • Add video to artifacts.
  • Use a hosted API for testing
  • Use a hosted embeddings API for testing
  • Don't load unstructured

Problems

  • The model is huge
  • Unstructured is huge

Integrate with OIDC

  • Take working docker compose over
  • Update envoy
  • Use bionic as db not finetuna
  • When we get a new user, create a team etc. (DB trigger)
  • Don't try to save first and last name
  • Test all this.
  • How do we get logout working
  • Logout page not always available (Does it need to be a turbo frame?)
  • Pass through first and last name
  • Logout needs to be configurable. OIDC_ENDPOINT

How to manage end session endpoint - oauth2-proxy/oauth2-proxy#2372

On registration set up per team models and prompts.

When the user registers or creates a team set up the default model and templates.

Prompts should be connected to models.

  • A prompt can be connected to Zero, All or selected datasets.
  • Tenancy isolation; after the integration test, I see all prompts
  • A prompt is connected to a model
  • Update prompt screen
  • Can we do authorization on inserts?

Consider Pipeline Functionality

Haystack has the concept of ready made pipelines https://docs.haystack.deepset.ai/docs/ready_made_pipelines

A pipeline consists of the following:

  • The prompt template
  • The batching strategy
  • Retrieval strategy (i.e. embeddings)
  • Prompt strategy i.e. add history etc.
  • Parameters passed to the model i.e. temperature.

To start, we would only have one pipeline option, i.e. ExtractiveQAPipeline.
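As an illustration only, the options above could be captured in a single config shape; these names are hypothetical, not Haystack's or BionicGPT's types.

  // Hypothetical shape for one pipeline option.
  struct Pipeline {
      prompt_template: String, // e.g. an ExtractiveQA-style template
      batch_size: usize,       // batching strategy, e.g. sentence splitter with size = 1000
      retrieval_top_k: usize,  // retrieval strategy (embeddings): how many chunks to fetch
      include_history: bool,   // prompt strategy, i.e. add history etc.
      temperature: f32,        // parameters passed to the model
  }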

Areas of investigation

  • Can we use the haystack api?
  • What prompts does it use for ExtractiveQAPipeline
  • Find the batch strategy i.e. sentence splitter with size = 1000
  • Does it return everything or only REALLY relevant articles.
  • What do they do with parameters to the model

RAG - What's the best way to handle context?

LocalGPT

uses this - https://github.com/PromtEngineer/localGPT/blob/main/prompt_template_utils.py

Context: {history} \n {context}
User: {question}
Answer:

Llama 2

[INST]<<SYS>>You are a helpful assistant, you will use the provided context to answer user questions. Read the given context before answering questions and think step by step. If you can not answer a user question based on the provided context, inform the user. Do not use any other information for answering user<</SYS>>
Context: {history}
{context}
User: {question}
[/INST]

Local AI

It has a models repo with a YAML config for each model.

https://github.com/go-skynet/model-gallery/blob/main/gpt4all-j.yaml

The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
### Prompt:
{{.Input}}
### Response:

docker-compose.yml race condition

When the DB is ready, all services run; however, the database migrations may not have run at that point, so services will fail because the DB users don't exist yet.
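One possible mitigation, sketched below under the assumption that a service connects with tokio_postgres: retry the connection until the migrations have created the expected users, rather than failing on the first attempt. This is illustrative, not the project's startup code.

  use std::time::Duration;
  use tokio_postgres::NoTls;

  // Keep retrying until the migrations have created the DB user this service needs.
  async fn connect_with_retry(url: &str) -> tokio_postgres::Client {
      loop {
          match tokio_postgres::connect(url, NoTls).await {
              Ok((client, connection)) => {
                  // Drive the connection on its own task.
                  tokio::spawn(async move {
                      if let Err(e) = connection.await {
                          eprintln!("connection error: {}", e);
                      }
                  });
                  return client;
              }
              Err(e) => {
                  // The migrations container may not have created this user yet.
                  eprintln!("db not ready yet ({}), retrying in 2s", e);
                  tokio::time::sleep(Duration::from_secs(2)).await;
              }
          }
      }
  }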

Connection issue on port 41321

Running docker compose up on Ubuntu, I tried to upload a single-line text file and got the following:

fine-tuna-barricade-1 | [2023-09-07T10:49:23Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 204.650µs
fine-tuna-barricade-1 | [2023-09-07T10:49:23Z INFO sqlx::query] SELECT id, user_id, session_verifier, …; rows affected: 0, rows returned: 1, elapsed: 222.934µs
fine-tuna-barricade-1 |
fine-tuna-barricade-1 | SELECT
fine-tuna-barricade-1 | id,
fine-tuna-barricade-1 | user_id,
fine-tuna-barricade-1 | session_verifier,
fine-tuna-barricade-1 | otp_code_confirmed,
fine-tuna-barricade-1 | otp_code_encrypted,
fine-tuna-barricade-1 | otp_code_attempts,
fine-tuna-barricade-1 | otp_code_sent
fine-tuna-barricade-1 | FROM
fine-tuna-barricade-1 | sessions
fine-tuna-barricade-1 | WHERE
fine-tuna-barricade-1 | id = $1
fine-tuna-barricade-1 |
fine-tuna-barricade-1 | [2023-09-07T10:49:23Z INFO actix_web::middleware::logger] 172.19.0.2 "POST /app/team/1/dataset/1/doc_upload HTTP/1.1" 200 0 "-" "-" 0.000727
fine-tuna-barricade-1 | [2023-09-07T10:49:23Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 151.458µs
fine-tuna-app-1 | 2023-09-07T10:49:23.519647Z INFO axum_server::documents::upload_doc: Sending document to unstructured
fine-tuna-unstructured-1 | 2023-09-07 10:49:23,522 unstructured_api DEBUG pipeline_api input params: {"request": "<starlette.requests.Request object at 0x7f3d901e1400>", "filename": "test1.txt", "file_content_type": "text/plain", "response_type": "application/json", "m_coordinates": [], "m_encoding": [], "m_hi_res_model_name": [], "m_include_page_breaks": [], "m_ocr_languages": [], "m_pdf_infer_table_structure": [], "m_skip_infer_table_types": [], "m_strategy": [], "m_xml_keep_tags": []}
fine-tuna-unstructured-1 | 2023-09-07 10:49:23,522 unstructured_api DEBUG partition input data: {"content_type": "text/plain", "strategy": "auto", "ocr_languages": "eng", "coordinates": false, "pdf_infer_table_structure": false, "include_page_breaks": false, "encoding": null, "model_name": null, "xml_keep_tags": false, "skip_infer_table_types": ["pdf", "jpg", "png"]}
fine-tuna-unstructured-1 | 2023-09-07 10:49:23,524 172.19.0.6:48718 POST /general/v0/general HTTP/1.1 - 200 OK
fine-tuna-app-1 | 2023-09-07T10:49:23.525172Z INFO axum_server::documents::upload_doc: Generating embeddings
fine-tuna-llm-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41321: connect: connection refused"

I can't see any container listening on this port.

We're not batching into 1024 byte chunks

This seems to end up with batches > 1024 bytes, and then the embeddings API bombs out, as it seems to max out at 1100 bytes.

for text_bytes in text.as_bytes().chunks(1024) {
            let text_utf8 = String::from_utf8_lossy(text_bytes).to_string();
            transaction
                .execute(
                    "
                    INSERT INTO embeddings (
                        document_id,
                        text
                    ) 
                    VALUES 
                        ($1, $2)",
                    &[&document_id, &text_utf8],
                )
                .await?;
        }
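A minimal sketch (not the repository's current code) of chunking on char boundaries, so each batch stays at or under the byte limit without splitting UTF-8 sequences; slicing on char boundaries would also avoid the Utf8Error reported in another issue when slicing raw bytes.

  // Split text into chunks of at most `max_bytes` bytes, never splitting a UTF-8 char.
  fn chunk_text(text: &str, max_bytes: usize) -> Vec<String> {
      let mut chunks = Vec::new();
      let mut current = String::new();
      for ch in text.chars() {
          if !current.is_empty() && current.len() + ch.len_utf8() > max_bytes {
              chunks.push(std::mem::take(&mut current));
          }
          current.push(ch);
      }
      if !current.is_empty() {
          chunks.push(current);
      }
      chunks
  }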

Error setting up dev env containers - nvidia needed??

Cloned the repository and set up VS Code.
Upon starting, VS Code asked to start in containers; after several minutes of start-up it failed with the following output:

mote-containers/data/docker-compose/docker-compose.devcontainer.containerFeatures-1694508858577.yml up -d
[+] Running 8/8
✔ Network fine-tuna_devcontainer_default Created 0.1s
✔ Volume "fine-tuna_devcontainer_target" Created 0.0s
✔ Container fine-tuna_devcontainer-unstructured-1 Started 0.0s
✔ Container fine-tuna_devcontainer-envoy-1 Started 0.0s
✔ Container fine-tuna_devcontainer-db-1 Healthy 0.0s
✔ Container fine-tuna_devcontainer-llm-api-1 Started 0.0s
✔ Container fine-tuna_devcontainer-development-1 Created 0.0s
✔ Container fine-tuna_devcontainer-barricade-1 Started 0.0s
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
[233903 ms] Error: Command failed: docker compose --project-name fine-tuna_devcontainer -f /home/kdio/dev/fine-tuna/fine-tuna/.devcontainer/docker-compose.yml -f /home/kdio/.config/Code/User/globalStorage/ms-vscode-remote.remote-containers/data/docker-compose/docker-compose.devcontainer.build-1694508639397.yml -f /home/kdio/.config/Code/User/globalStorage/ms-vscode-remote.remote-containers/data/docker-compose/docker-compose.devcontainer.containerFeatures-1694508858577.yml up -d
[233903 ms] at tAA (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:427:3052)
[233903 ms] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[233903 ms] at async eAA (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:409:3167)
[233903 ms] at async FAA (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:479:3833)
[233903 ms] at async GC (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:479:4775)
[233904 ms] at async VeA (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:611:12240)
[233904 ms] at async WeA (/home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js:611:11981)
[233907 ms] Exit code 1
[233911 ms] Command failed: /usr/share/code/code --ms-enable-electron-run-as-node /home/kdio/.vscode/extensions/ms-vscode-remote.remote-containers-0.309.0/dist/spec-node/devContainersSpecCLI.js up --user-data-folder /home/kdio/.config/Code/User/globalStorage/ms-vscode-remote.remote-containers/data --container-session-data-folder /tmp/devcontainers-51feb12c-fcba-47b6-8b22-d434e6e3d6941694508635609 --workspace-folder /home/kdio/dev/fine-tuna/fine-tuna --workspace-mount-consistency cached --id-label devcontainer.local_folder=/home/kdio/dev/fine-tuna/fine-tuna --id-label devcontainer.config_file=/home/kdio/dev/fine-tuna/fine-tuna/.devcontainer/devcontainer.json --log-level debug --log-format json --config /home/kdio/dev/fine-tuna/fine-tuna/.devcontainer/devcontainer.json --default-user-env-probe loginInteractiveShell --mount type=volume,source=vscode,target=/vscode,external=true --skip-post-create --update-remote-user-uid-default on --mount-workspace-git-root true
[233911 ms] Exit code 1

Not working on Windows

Deployed with latest config file.
Deploys and runs OK; allows me to create a user, upload a document, and create a prompt.

When asked a question, it takes a very long time and then returns 'Error occurred while generating'.
The backend shows:
dev-barricade-1 |
dev-barricade-1 | [2023-10-18T21:52:01Z INFO actix_web::middleware::logger] 172.18.0.4 "GET /app/team/2/teams_popup HTTP/1.1" 200 0 "-" "-" 0.001444
dev-barricade-1 | [2023-10-18T21:52:01Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 391.193µs
dev-barricade-1 | [2023-10-18T21:52:01Z INFO sqlx::query] /* SQLx ping */; rows affected: 0, rows returned: 0, elapsed: 2.973ms
dev-app-1 | 2023-10-18T21:52:26.065883Z INFO open_api: Processing 384 bytes
dev-app-1 | 2023-10-18T21:52:26.069197Z INFO axum_server::prompt: About to call
dev-app-1 | 2023-10-18T21:52:26.073883Z INFO axum_server::prompt: Retrieved 1 chunks
dev-app-1 | 2023-10-18T21:52:26.075891Z INFO axum_server::prompt: Retrieved 0 history items
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45753: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44263: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35263: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40867: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40097: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36629: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45451: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36365: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46289: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46299: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39867: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41493: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46165: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38533: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45215: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39897: connect: connection refused"
dev-local-ai-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33209: connect: connection refused"
dev-db-1 | 2023-10-18 21:54:07.857 UTC [76] LOG: checkpoint starting: time
dev-db-1 | 2023-10-18 21:54:10.123 UTC [76] LOG: checkpoint complete: wrote 23 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.215 s, sync=0.020 s, total=2.267 s; sync files=19, longest=0.011 s, average=0.002 s; distance=6 kB, estimate=1094 kB

Deletion of documents facility

Could we also have a facility that highlights any newly uploaded document that is 'semantically' similar to existing documents? Rationale: there may be different versions of documents with updated contents, and the user could then decide whether they want to delete any similar documents.
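A hedged sketch of how such a similarity check could work at upload time, assuming the embeddings table has a pgvector column; the query, distance threshold and serialisation are illustrative, not the project's implementation.

  // Find existing documents whose chunks are close (in cosine distance) to the
  // new document's embedding.
  async fn similar_documents(
      client: &tokio_postgres::Client,
      new_embedding: &str, // the new document's embedding as a pgvector literal, e.g. "[0.1,0.2,...]"
  ) -> Result<Vec<i32>, tokio_postgres::Error> {
      let rows = client
          .query(
              "SELECT DISTINCT document_id
               FROM embeddings
               WHERE embedding <=> $1::vector < 0.2  -- cosine distance threshold (illustrative)
               LIMIT 10",
              &[&new_embedding],
          )
          .await?;
      Ok(rows.iter().map(|r| r.get(0)).collect())
  }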

Add an API

The API would basically give users inside an organisation an OpenAI-compatible REST API.

As each team has documents, the API would give users access to LLM output via RAG.
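For example, a team member might call the OpenAI-compatible endpoint with their API key roughly like this; a sketch only, as the URL, key handling and payload shape here are assumptions, not the finished API.

  // Requires reqwest (with the "json" feature), serde_json and tokio.
  use serde_json::json;

  #[tokio::main]
  async fn main() -> Result<(), reqwest::Error> {
      let client = reqwest::Client::new();
      let response = client
          .post("http://localhost:7800/v1/chat/completions") // hypothetical endpoint
          .bearer_auth("my-team-api-key")                     // hypothetical team API key
          .json(&json!({
              "model": "ggml-gpt4all-j",
              "messages": [{ "role": "user", "content": "Summarise our onboarding policy" }]
          }))
          .send()
          .await?;
      println!("{}", response.text().await?);
      Ok(())
  }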

  • Parse the incoming json and get the question only for /completions
  • Punch the prompt into the json (only for completions)
  • Create prompt based on API key.
  • Retrieve the base_url from the prompt
  • How can we get the user_id associated with the API key?
  • organisation_id from the API key
  • API - Connect to the backend specified by the prompt attached to the API key
  • Front end - should pass in the model_id
  • Front End - Connect to backend specified in prompt in the body?
  • Central place for prompt code
  • Currently prompt is created by send_message, can we do it all in reverse proxy?
  • Get the user id for /completions
  • For /completions, prompt has already been created.
  • Get the API key out of the message if path != /completions
  • Fix streaming
  • Configure envoy to pass through API calls (only for completions?)
  • No API key, then must be authenticated by barricade
  • How do we intercept /v1 routes as a proxy?
  • Try as only a reverse proxy
  • Test with curl
  • Add to the documentation

Format incoming LLM stream

The model seems to output code like this...

python import math fib_seq = [0, 1] n=5 # number of iterations to complete i=0 # current iteration number, starting at 0 print(f"The fibonacci sequence for {n} iterations is:") while i < n: # check if we have reached the end fib_seq[i] = math.pow(fib_seq[i-1], 2) + fib_seq[i-2] i = i + 1 print(f"{i} {', '.join([str(x) for x in fib_seq])}")

We need to format it in the console.

  • How are other people handling incoming streams
  • What code syntax highlight library to use

Configurability

Options we can configure

Datasets

  • Batch size (has to be less than 1kb token size or embeddings API doesn't process them)
  • Batch overlap.

Vector algorithm

We currently use text-embedding-ada-002.

It claims to have a context size of 8k, but if I pass more than 1k it blows up.

Note: if they change the vector algorithm, that may change the dimension of the vectors returned.

Prompt

  • suffix - The suffix that comes after a completion of inserted text.
  • max_tokens - The maximum number of tokens
  • temperature - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
  • top_p An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • n - How many completions to generate for each prompt.
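Illustratively, those options map onto a config shape like the following; a sketch using serde, not the project's actual type.

  use serde::Serialize;

  // Field names match the OpenAI-style request parameters listed above.
  #[derive(Serialize)]
  struct CompletionOptions {
      suffix: Option<String>, // text that comes after a completion of inserted text
      max_tokens: u32,        // maximum number of tokens to generate
      temperature: f32,       // 0.0 - 2.0; higher is more random
      top_p: f32,             // nucleus sampling; alter this or temperature, not both
      n: u32,                 // how many completions to generate per prompt
  }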

Embedding Model dropdown empty

With the existing yml file I ran docker compose down -v.
Downloaded the latest yml file using the new curl command.
Brought up the containers (required downloading new containers).
Created a new user and tried to add a new dataset.

The Embedding Model dropdown is empty.

OCR of Documents

Testing OCR

  • Process documents in the background
  • How does unstructured know when to OCR?
  • Set at the dataset level?
