
helixml / helix

Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source models

Home Page: https://docs.helix.ml

License: Other

Shell 0.65% Go 56.36% TypeScript 39.71% HTML 0.13% Dockerfile 0.14% Python 2.68% Mako 0.05% Smarty 0.28%
golang llama llm mistral openai self-hosted codellama mixtral qwen sdxl

helix's Introduction


SaaS · Private Deployment · Docs · Discord

HelixML


Private GenAI platform. Deploy the best of open AI in your own data center or VPC and retain complete data security & control.

Includes support for fine-tuning models that is as easy as drag-and-drop.

Looking for a private GenAI platform? From language models to image models and more, Helix brings the best of open source AI to your business in an ergonomic, scalable way, while optimizing the tradeoff between GPU memory and latency.

Docker

git clone https://github.com/helixml/helix.git
cd helix

Create a .env file based on the example values, then edit it:

cp .env.example-prod .env

Ensure the Keycloak realm settings are up to date with your .env file:

./update-realm-settings.sh

To start the services:

docker-compose up -d

The dashboard will be available at http://localhost.

Attach GPU runners: see runners docs

License

Helix is licensed under a license similar to Docker Desktop's. You can run the source code (in this repo) for free for:

  • Personal Use: individuals or people personally experimenting
  • Educational Use: schools/universities
  • Small Business Use: companies with under $10M annual revenue and fewer than 250 employees

If you fall outside of these terms, please contact us to discuss purchasing a license for large commercial use. If you are an individual at a large company interested in experimenting with Helix, that's fine under Personal Use until you deploy to more than one GPU on company-owned or paid-for infrastructure.

You are not allowed to use our code to build a product that competes with us.

Contributions to the source code are welcome, and by contributing you confirm that your changes will fall under the same license.

Why these clauses in your license?

  • We generate revenue to support the development of Helix. We are an independent software company.
  • We don't want cloud providers to take our open source code and build a rebranded service on top of it.

If you would like to use some part of this code under a more permissive license, please get in touch.

helix's People

Contributors

bigadamknight, binocarlos, chocobar, lukemarsden, philwinder, rusenask


helix's Issues

fine tuning hangs

Why does this happen? Should we automatically restart jobs that haven't made progress within a timeout?
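A minimal watchdog sketch in Go; the Job interface with LastProgress and Restart hooks is an assumption for illustration, not the actual runner API:

package runner

import (
    "context"
    "time"
)

// Job stands in for the runner's job handle; LastProgress and Restart
// are hypothetical hooks, not existing Helix code.
type Job interface {
    LastProgress() time.Time
    Restart()
}

// watchJob restarts a job that has reported no progress within timeout.
func watchJob(ctx context.Context, job Job, timeout time.Duration) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            if time.Since(job.LastProgress()) > timeout {
                job.Restart()
            }
        }
    }
}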

switch to isStale everywhere

The logic for whether a model instance is stale currently lives in three places (search for stale := and nonStale :=).

Move it to one place, as sketched below.
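One possible shape for the consolidated helper; the ModelInstance fields here are assumptions about the real type:

package scheduler

import "time"

// ModelInstance is a stand-in; the real field names may differ.
type ModelInstance struct {
    LastActivity time.Time
    TTL          time.Duration
}

// isStale is the single source of truth for staleness, replacing the
// three inline stale := / nonStale := computations.
func isStale(m *ModelInstance, now time.Time) bool {
    return now.Sub(m.LastActivity) > m.TTL
}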

Use huggingface tokenizer chat template for inference

In the LLM model Go code we build up a prompt that is a formatted string based on the chat template associated with the model.

We could instead store a generic json-ised version of the chat history in task.Prompt, like:

[{"role": "user", "content": "What's the capital of France'?"}, {"role": "assistant", "content": "It's Paris."}]

and then use the model's tokenizer to format the messages for us inside axolotl at inference time:

import json
from transformers import AutoTokenizer

messages = json.loads(json_messages)  # the JSON-ised chat history from task.Prompt
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoded_messages = tokenizer.apply_chat_template(messages, tokenize=False)

This will reduce the effort needed to add subsequent models with potentially different chat templates.
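For illustration, the Go side could marshal the history like this; the ChatMessage type and encodeChatHistory helper are hypothetical names, not existing Helix code:

package model

import "encoding/json"

// ChatMessage mirrors the role/content JSON shown above; the type
// name is an assumption.
type ChatMessage struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

// encodeChatHistory renders the chat history as the generic JSON that
// the Python side decodes and passes to apply_chat_template.
func encodeChatHistory(history []ChatMessage) (string, error) {
    b, err := json.Marshal(history)
    if err != nil {
        return "", err
    }
    return string(b), nil
}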

Old list "done"

Things we did whilst using the old "list"

  • when continuing a cloned session, the messages are missing
  • if there are no files - the "view files" button shows an error
  • "add new documents" button at bottom of text session (add more documents, dataprep new ones into jsonl qa-pairs, concatenate qa-pairs, retrain model)
  • retry button for errors
  • plugin sentry
  • share mode where original training data is not copied
  • auto-scroll broken
  • put the name of the session in topbar
  • rather than system as the username, put the name of the session
  • sessions are updating other sessions https://mlops-community.slack.com/archives/C0675EX9V2Q/p1702476943225859
  • add a restart button whilst doing a fine-tune so if things get stuck we can restart
    • possibly only show this if we've not seen any progress for > 30 seconds (fixed by throwing an error if the runner reports the job is still active)
  • dashboard not showing finetune interactions
  • performance of auto-save before login (image fine tune text is slow)
  • for session updates check we are on the same page
    • whilst we are on one page and another session is processing - it's updating the page we are on with the wrong session
  • react is rendering streaming updates to the sessions slowly
  • progress bars on text fine tuning
  • fork session (fork from an interaction)
  • add data after the model is trained
  • pdfs are broken in production
  • for HTML conversion, use Puppeteer to render the page into a PDF, then convert the PDF into plain text
  • reliable and fast, scale to 5 concurrent users (Luke)
    • Dockerize the runner & deploy some on vast.ai / runpod.io
  • finish and deploy dashboard
  • logged out state when trying to do things - show a message "please register"
  • fix bug with "create image" dropdown etc not working
  • fix bug with openAI responding with "GPT 4 Answer: Without providing a valid context, I am unable to generate 50 question and answer pairs as requested"
    • make it so user can see whole message from OpenAI
  • replace the thinking face with a spinning progress (small horizontal bouncing three dots)
  • there is a dashboard bug where runner model job history reverses itself
  • you lose keyboard focus when the chat box disables and re-enables
  • make the chatbox have keyboard focus the first time you load the page
  • pasting a long chunk of text into training text box makes the box go taller than the screen and you cannot scroll
  • create images says "chat with helix"; it should say "describe what you want to see in an image"
  • enforce min-width on left sidebar
  • the event cancel handler on drop downs is not letting you click the same mode
  • hide technical details behind a "technical details" button?
    • where it currently says "Session ...." - put the session title
    • put a link next to "View Files" called "Info" that will open a modal window with more session details
    • e.g. we put the text summary above in the modal along with the ID and other things we want to show
    • in the text box say "Chat with Helix" <- for txt models
    • in the text box say "Make images with Helix" <- for image models
  • edit session name (pencil icon to left of bin icon)
  • obvious buttons (on fine tuning)
    • in default starting state - make both buttons (add docs / text) - blue and outlined
    • in the default starting state - make the files button say "or choose files"
    • when you start typing in the box make the "Add Text" button pink and make the upload files not pink
    • once there are > 0 files - make the "choose more files" button outlined so the "upload docs" is the main button
  • performance on text fine tuning (add concurrency to openAI calls)
  • URL to fetch text for text fine tuning
  • homepage uncomment buttons
  • re-train, will add more interactions to add files to
  • we should keep previous LoRA files at the interaction level
  • we hoist lora_dir from the latest interaction to the session

new activity dot

show a dot next to sessions that are currently active or have new replies

scheduler not hitting spun up model

Quite often there's a model ready to serve and a new one gets spun up on the other node. Maybe the clocks are drifting between the machines, so the 2-second head start doesn't work? Or the Python processes aren't polling every 100ms?

the session page scrolls to the bottom randomly

There is a useMemo that is re-running (possibly triggered by Keycloak) and causing the "session has changed, scroll to the bottom" behaviour even when the session clearly has not changed. It's annoying because you are actively scrolling up and down, just reading, and then the page jumps to the bottom.

check URL type

Make it clear that URLs need to point to text content - for example, a YouTube URL will not work.

place in the queue indication

Show this if the wait is more than 5 seconds. We already have the "this is taking a while" window; this adds the place in the queue as well.

Model seems obsessed with more fine tuning of dataset

Having submitted a document (random doc, outline of a fictional story), then asking what a character should do in the story, I keep being met with "Character should continue fine-tuning the data to improve the accuracy of the model." This seemed to be an inescapable answer, no matter how I posed the question.

It also does not appear to learn from any further conversation I have after the dataset is submitted.

url box mime type detection

if you put a URL to a file in the URL box - detect the bloody MIME type so we don't split docs that are downloaded

the URL box should download files first
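A sketch of how the downloader could sniff the type, assuming we use Go's standard-library content sniffing rather than trusting the server's Content-Type header:

package dataprep

import (
    "io"
    "net/http"
)

// sniffURLContentType downloads the start of a URL and sniffs its MIME
// type from the first 512 bytes. Error handling is minimal; this is a
// sketch, not the real downloader.
func sniffURLContentType(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    buf := make([]byte, 512)
    n, err := io.ReadFull(resp.Body, buf)
    if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
        return "", err
    }
    return http.DetectContentType(buf[:n]), nil
}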

Multi GPU support

Support multiple GPUs on a single node. Initially we can work around this by running N runners with CUDA_VISIBLE_DEVICES passed through to the runner Python processes, as sketched below.
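A rough sketch of that workaround; the "helix-runner" binary name is illustrative, not the real one:

package runner

import (
    "fmt"
    "os"
    "os/exec"
)

// launchRunners starts one runner process per GPU, pinning each to a
// single device via CUDA_VISIBLE_DEVICES.
func launchRunners(numGPUs int) ([]*exec.Cmd, error) {
    var cmds []*exec.Cmd
    for i := 0; i < numGPUs; i++ {
        cmd := exec.Command("helix-runner")
        cmd.Env = append(os.Environ(), fmt.Sprintf("CUDA_VISIBLE_DEVICES=%d", i))
        if err := cmd.Start(); err != nil {
            return cmds, err
        }
        cmds = append(cmds, cmd)
    }
    return cmds, nil
}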

Jsonl input data

If the user uploads their own qapairs, skip the qapair generation phase
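A sketch of the detection step, with hypothetical question/answer field names since the real JSONL schema may differ:

package dataprep

import (
    "bufio"
    "encoding/json"
    "io"
)

// QAPair uses assumed field names for illustration.
type QAPair struct {
    Question string `json:"question"`
    Answer   string `json:"answer"`
}

// parseQAPairs reads user-supplied JSONL; if it parses cleanly, the
// qapair generation phase can be skipped entirely.
func parseQAPairs(r io.Reader) ([]QAPair, error) {
    var pairs []QAPair
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        line := scanner.Bytes()
        if len(line) == 0 {
            continue // tolerate blank lines
        }
        var p QAPair
        if err := json.Unmarshal(line, &p); err != nil {
            return nil, err
        }
        pairs = append(pairs, p)
    }
    return pairs, scanner.Err()
}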

too few questions in small dataset

If you put a small bit of text like:

Bob lives at 6 Crow Terrace

It will generate a single question/answer pair and then axolotl complains there are too few questions in the training dataset.
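A guard like the following could fail early with a clear message instead; the minQAPairs threshold is an assumed value:

package dataprep

import "fmt"

// minQAPairs is an assumed threshold; axolotl errors out on tiny datasets.
const minQAPairs = 10

// checkDatasetSize fails with a user-facing message before the
// fine-tune crashes deep inside axolotl.
func checkDatasetSize(numPairs int) error {
    if numPairs < minQAPairs {
        return fmt.Errorf("only %d question/answer pairs were generated; at least %d are needed, please provide more source text", numPairs, minQAPairs)
    }
    return nil
}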

analyse all sessions in the database

for each one:

  • are there errors? if so, add issues to github. calculate which issues caused the most errors
  • is there a trained model with no interactions? if so add to chris's spreadsheet and ping him. also, #43
  • were they successful at doing anything?

overall: what % of sessions were successful and what were the biggest pain points? categorise the use cases

non-english language qapairs

Currently the qapair generation seems to translate non-English input data to English; however, we have users who want to do it all in, say, French.

When this works, get back to the French user on Crisp.

show API calls to replicate many actions

(e.g. text & image inference to start with)

basically show the curl equivalent of the UI action - i.e. make it clear that you can use the API for each of these actions
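A sketch of rendering that curl string; the Authorization header and parameters here are illustrative, not the actual Helix API:

package api

import "fmt"

// curlEquivalent renders the curl command matching a UI action so it
// can be displayed alongside the result.
func curlEquivalent(method, url, token, body string) string {
    return fmt.Sprintf("curl -X %s %q -H 'Authorization: Bearer %s' -d %q", method, url, token, body)
}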

url error reporting

detect when we did not manage to extract any text and tell the user that is the error
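A minimal check, assuming extraction returns the plain text as a string:

package dataprep

import (
    "errors"
    "strings"
)

// ErrNoText is surfaced to the user instead of a silent empty dataset.
var ErrNoText = errors.New("could not extract any text from the URL")

func checkExtractedText(text string) error {
    if strings.TrimSpace(text) == "" {
        return ErrNoText
    }
    return nil
}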
