
Chat with llamas in your browser πŸ’¬

Home Page: https://aef.me/chat

License: MIT License

Topics: github-pages, react, tailwind, llama, web-llm

chat's Introduction

chat

Open in GitHub Codespaces

Important

No longer maintained. 😒 When I first made this, there was no UI for WebLLM. The official app at chat.webllm.ai is now the best UI for WebLLM and is actively maintained. Use that or one of Xenova's WebGPU spaces instead! πŸ¦™

React chat UI for Web LLM on GitHub Pages. Built with Tailwind and Jotai. Inspired by Perplexity Labs.

demo.mp4

Introduction

Web LLM is a project under the MLC (machine learning compilation) organization. It allows you to run large language models in the browser using WebGPU and WebAssembly. Check out the example and read the introduction to learn more.

In addition to @mlc-ai/web-llm, the app uses TypeScript, React, Jotai, and Tailwind. It's built with Vite and SWC.

Usage

# localhost:5173
npm install
npm start

Known issues

I'm currently using Windows with Edge stable on a Lenovo laptop with an RTX 2080 (6 GB).

Using the demo app at webllm.mlc.ai, I did not have to enable any flags to get the q4f32 quantized models to work (f16 requires a flag). Go to webgpureport.org to inspect your system's WebGPU capabilities.
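You can also inspect the same capabilities from the DevTools console. A minimal sketch (assuming WebGPU type definitions are available, e.g., via @webgpu/types):

// request the adapter and report the capabilities relevant to model quantization
const adapter = await navigator.gpu?.requestAdapter()
if (!adapter) {
  console.log('WebGPU not available (unsupported browser or missing flags)')
} else {
  // f16 models need the 'shader-f16' feature; q4f32 models do not
  console.log('shader-f16:', adapter.features.has('shader-f16'))
  console.log('maxStorageBufferBindingSize:', adapter.limits.maxStorageBufferBindingSize)
}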

Fetch errors

For whatever reason, I have to be behind a VPN to fetch the models from Hugging Face on Windows. πŸ€·β€β™‚οΈ

Cannot find global function

Usually a cache issue.

You can delete an individual cache:

await caches.delete('webllm/wasm')

Or all caches:

await caches.keys().then(keys => Promise.all(keys.map(key => caches.delete(key))))

Reference

There is only one class you need to know to get started: ChatModule

const chat = new ChatModule()

// callback that fires on progress updates during initialization (e.g., fetching chunks)
type ProgressReport = { progress: number; text: string; timeElapsed: number }
type Callback = (report: ProgressReport) => void
const onProgress: Callback = ({ text }) => console.log(text)
chat.setInitProgressCallback(onProgress)

// load/reload with new model
// customize `temperature`, `repetition_penalty`, `top_p`, etc. in `options`
// set system message in `options.conv_config.system`
// defaults are in conversation.ts and the model's mlc-chat-config.json
import type { ChatOptions } from '@mlc-ai/web-llm'
import config from './src/config'
const id = 'TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k'
const options: ChatOptions = { temperature: 0.9, conv_config: { system: 'You are a helpful assistant.' } }
await chat.reload(id, options, config)

// generate response from prompt
// callback fired on each generation step
// returns the complete response string when resolved
type GenerateCallback = (step: number, message: string) => void
const onGenerate: GenerateCallback = (_, message) => console.log(message)
const response = await chat.generate('What would you like to talk about?', onGenerate)

// get last response (sync)
const message: string = chat.getMessage()

// interrupt generation if in progress (sync)
// resolves the Promise returned by `generate`
chat.interruptGenerate()

// check if generation has stopped (sync)
// shorthand for `chat.getPipeline().stopped()`
const isStopped: boolean = chat.stopped()

// reset chat, optionally keep stats (defaults to false)
const keepStats = true
await chat.resetChat(keepStats)

// get stats
// shorthand for `await chat.getPipeline().getRuntimeStatsText()`
const statsText: string = await chat.runtimeStatsText()

// unload model from memory
await chat.unload()

// get GPU vendor
const vendor: string = await chat.getGPUVendor()

// get max storage buffer binding size
// used to determine the `low_resource_required` flag
const bufferBindingSize: number = await chat.getMaxStorageBufferBindingSize()

// getPipeline is private (useful for debugging in dev tools)
const pipeline = chat.getPipeline()

Cache management

The library uses the browser's CacheStorage API to store models and their configs.

There is an exported helper function to check if a model is in the cache.

import { hasModelInCache } from '@mlc-ai/web-llm'
import config from './config'
const inCache = await hasModelInCache('Phi2-q4f32_1', config) // throws if the model ID is not in the config
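To see how much space the cached models occupy, the standard StorageManager API gives a rough per-origin estimate (a sketch; cache names beyond 'webllm/wasm' are not guaranteed, so list them with caches.keys()):

// estimate total storage used by this origin (the models dominate it)
const { usage, quota } = await navigator.storage.estimate()
console.log(`~${((usage ?? 0) / 2 ** 30).toFixed(2)} GiB used of ~${((quota ?? 0) / 2 ** 30).toFixed(2)} GiB`)

// list the caches the library has created
console.log(await caches.keys())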

VRAM requirements

See utils/vram_requirements in the Web LLM repo.

TODO

  • Dark mode
  • Settings menu (temperature, system message, etc.)
  • Inference on web worker
  • Offline/PWA
  • Cache management
  • Image upload for multimodal like LLaVA
  • Tailwind class sorting by Biome 🀞


chat's Issues

Assistants

Assistants are just presets with a system prompt.

Need to redesign the layout first and think about how to implement them.
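A hypothetical shape for a preset (names here are illustrative; nothing like this exists in the app yet):

// hypothetical: an assistant is a named system prompt plus optional
// generation settings, applied via `options.conv_config.system` on reload
interface Assistant {
  id: string
  name: string
  system: string
  temperature?: number
  top_p?: number
}

const translator: Assistant = {
  id: 'translator',
  name: 'Translator',
  system: 'Translate everything the user says into French.',
  temperature: 0.3
}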

Adapters

Currently the app only works for people with Android phones or gaming laptops, which is pretty sad.

Implement adapters so different backends can be used. The first adapter will be for the Hugging Face Inference API.

To support streaming text, you’ll have to use the Hugging Face JS library or write something similar. Edit: TGI Messages API.

Also want to support OpenAI, Perplexity, Anthropic, and Goose.ai. (Mistral and Replicate don’t offer credit-based billing yet).
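One possible contract, so the UI stays backend-agnostic (a sketch only; none of these names exist in the codebase):

// hypothetical: every backend (Web LLM, HF Inference API, OpenAI, ...)
// implements the same streaming surface the UI talks to
interface ChatAdapter {
  load(modelId: string): Promise<void>
  // resolves with the full response; `onUpdate` receives partial text as tokens stream in
  generate(prompt: string, onUpdate: (partial: string) => void): Promise<string>
  interrupt(): void
  unload(): Promise<void>
}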

Remove Web LLM

Focus on web APIs until WebGPU is more widespread.

Can add an "offline" toggle in the future. Also need to run inference on a worker thread and possibly use million.
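Moving inference off the main thread could look roughly like this (a sketch assuming ChatModule works inside a Web Worker; the message protocol is made up):

// worker.ts (hypothetical): run the model off the main thread so the UI stays responsive
import { ChatModule } from '@mlc-ai/web-llm'

const chat = new ChatModule()

self.onmessage = async ({ data }) => {
  if (data.type === 'reload') {
    await chat.reload(data.modelId)
    self.postMessage({ type: 'ready' })
  }
  if (data.type === 'generate') {
    const response = await chat.generate(data.prompt, (_step, message) =>
      self.postMessage({ type: 'update', message })
    )
    self.postMessage({ type: 'done', response })
  }
}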

Import/export conversations

Allow importing an entire conversation object, complete with settings, system prompt, and messages. Likewise, allow exporting the current conversation with settings.
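A plausible export format, so a conversation round-trips with its settings (illustrative only):

// hypothetical: everything needed to restore a conversation
interface ConversationExport {
  version: number
  modelId: string
  system: string
  settings: { temperature: number; top_p: number; repetition_penalty: number }
  messages: Array<{ role: 'user' | 'assistant'; content: string }>
}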
