dearliuliu0522 / keras-llm-robot

This project forked from smalltong02/keras-llm-robot


A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.

License: Apache License 2.0

Python 99.95% Batchfile 0.05%

keras-llm-robot's Introduction

Keras-llm-robot Web UI


The project is derived from the Langchain-Chatchat project (https://github.com/chatchat-space/Langchain-Chatchat). The underlying architecture uses open-source frameworks such as Langchain and Fastchat, and the top layer is implemented in Streamlit. The project is completely open-source and aims to support offline deployment and testing of most open-source models from the Hugging Face website. It also allows multiple models to be combined through configuration to provide multimodal, RAG, Agent, and other functionality.


Quick Start

Please prepare the runtime environment first; see Environment Setup.

If deploying locally, start the Web UI with Python. It is served over HTTP at http://127.0.0.1:8818:

python __webgui_server__.py --webui

If deploying on a cloud server and accessing the Web UI from another machine, run a reverse proxy and serve the Web UI over HTTPS. Access it at https://127.0.0.1:4480 on the server itself, or at https://[server ip]:4480 remotely:

// By default, the batch file uses the virtual environment named keras-llm-robot,
// Modify the batch file if using a different virtual environment name.

// windows platform
webui-startup-windows.bat

// ubuntu(linux) platform
python __webgui_server__.py --webui
chmod +x ./tools/ssl-proxy-linux
./tools/ssl-proxy-linux -from 0.0.0.0:4480 -to 127.0.0.1:8818

// MacOS platform
python __webgui_server__.py --webui
chmod +x ./tools/ssl-proxy-darwin
./tools/ssl-proxy-darwin -from 0.0.0.0:4480 -to 127.0.0.1:8818
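
The per-platform commands above can also be selected programmatically. The following is a minimal sketch, not part of the project: the helper name `proxy_command` is hypothetical, and it only assumes the binary paths, flags, and ports shown in the commands above.

```python
import platform

# Proxy binaries named in the commands above, keyed by platform.system() value.
PROXY_BINARIES = {
    "Linux": "./tools/ssl-proxy-linux",
    "Darwin": "./tools/ssl-proxy-darwin",
}


def proxy_command(system=None, https_port=4480, ui_port=8818):
    """Return the HTTPS start-up command (as an argument list) for a platform."""
    system = system or platform.system()
    if system == "Windows":
        # On Windows the README lists only the batch file for this step.
        return ["webui-startup-windows.bat"]
    if system not in PROXY_BINARIES:
        raise ValueError(f"unsupported platform: {system}")
    return [
        PROXY_BINARIES[system],
        "-from", f"0.0.0.0:{https_port}",
        "-to", f"127.0.0.1:{ui_port}",
    ]
```

The returned list can be passed to `subprocess.run` after the Web UI itself has been started with `python __webgui_server__.py --webui`.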

Taking Ubuntu as an example, you can access the server from other PCs on the local network after starting the reverse proxy with ssl-proxy-linux:

Image1

Start Server on Ubuntu.

Image1

Start Reverse Proxy on Ubuntu.

Image1

Access Server on Windows PC by https service.

Feature Demonstration

  1. Demonstration using the multimodal online model GPT-4-vision-preview together with the Azure Speech to Text service:

Alt text

  2. Comparison of gpt-4-vision-preview and Gemini-pro-vision:

Alt text

  3. Demonstration of the Retrieval Augmented Generation (RAG) feature:

Alt text

  4. Demonstration of the image recognition and image generation features:

Text-to-image demonstration, translating natural language into CLIP prompts for image generation models:

Image | Image

Image | Image

Creating Handicrafts Based on Items in the Picture:

Image

Project Introduction

The project consists of three main interfaces: the chat interface for language models, the configuration interface for language models, and the tools and agent interface for auxiliary models.

Chat Interface: Image1 The language model is the foundation model and can be used in chat mode once loaded. It also serves as the brain for the multimodal features. Auxiliary models, such as voice, image, and retrieval models, rely on the language model to process their input or output text. The voice model acts as the ears and mouth, the image model as the eyes, and the retrieval model provides long-term memory. The project currently supports dozens of language models.

Configuration Interface: Image1 Models can be loaded as required, and are categorized into general, multimodal, special, and online models.

Tools & Agent Interface: Image1 Auxiliary models, such as retrieval, code execution, text-to-speech, speech-to-text, image recognition, and image generation, can be loaded as required. The tools section includes settings for function calls (requires a language model that supports function calling).

Environment Setup

  1. Install Anaconda or Miniconda, and Git. Windows users also need to install the CMake tool; Ubuntu users need to install the gcc tools.
// In a clean environment on Ubuntu, follow the steps below to pre-install the packages:
// install gcc
  sudo apt update
  sudo apt install build-essential

// install for ffmpeg
  sudo apt install ffmpeg

// install for pyaudio
  sudo apt-get install portaudio19-dev

// The requirements files install faiss-cpu by default. If you need faiss-gpu instead:
  pip3 install faiss-gpu
  2. Create a virtual environment named keras-llm-robot using conda and install Python 3.10 or 3.11:
conda create -n keras-llm-robot python==3.11.5
  3. Clone the repository:
git clone https://github.com/smalltong02/keras-llm-robot.git
cd keras-llm-robot
  4. Activate the virtual environment:
conda activate keras-llm-robot
  5. If you have an NVIDIA GPU, install the CUDA Toolkit from https://developer.nvidia.com/cuda-toolkit-archive, then install the PyTorch CUDA build that matches the CUDA Toolkit version (see https://pytorch.org/) in the virtual environment:
// for example, for CUDA 12.1
conda install pytorch=2.1.2 torchvision=0.16.2 torchaudio=2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
  6. Install the dependencies, choosing the requirements file for your platform. On Windows, if you encounter compilation errors for llama-cpp-python or tts during installation, remove these two packages from the requirements file:
// windows
pip install -r requirements-windows.txt
// Ubuntu
pip install -r requirements-ubuntu.txt
// MacOS
pip install -r requirements-macos.txt
  7. If the speech feature is required, you also need to install the ffmpeg tool.
// For Windows:
Download the Windows binary package of ffmpeg from https://www.gyan.dev/ffmpeg/builds/.
Add the bin directory to the system PATH environment variable.

// For Ubuntu, install ffmpeg and pyaudio
sudo apt install ffmpeg
sudo apt-get install portaudio19-dev

// For macOS
```bash
# Using libav
brew install libav

####    OR    #####

# Using ffmpeg
brew install ffmpeg
```
  8. If you need to run models from Hugging Face offline, download them yourself and place them in the "models" directory. If a model has not been downloaded in advance, the WebUI will automatically download it from the Hugging Face website to the local system cache.
// for example, the folder of the llama-2-7b-chat model:
models\llm\Llama-2-7b-chat-hf

// for example, the folder of the XTTS-v2 text-to-speech model:
models\voices\XTTS-v2

// for example, the folder of the faster-whisper-large-v3 speech-to-text model:
models\voices\faster-whisper-large-v3
  9. When using the OpenDalleV1.1 model to generate images with 16-bit precision, download the sdxl-vae-fp16-fix model from Hugging Face and place it in the models\imagegeneration folder. If you enable the Refiner, download the stable-diffusion-xl-refiner-1.0 model from Hugging Face and place it in the models\imagegeneration folder beforehand.

  10. When using the stable-video-diffusion-img2vid and stable-video-diffusion-img2vid-xt models, install ffmpeg and the corresponding dependencies first:

    1. Download generative-models from https://github.com/Stability-AI/generative-models into the project root folder.
    2. cd generative-models && pip install .
    3. pip install pytorch-lightning
       pip install kornia
       pip install open_clip_torch
  11. If running locally, start the Web UI using Python at http://127.0.0.1:8818:

python __webgui_server__.py --webui
  12. If deploying on a cloud server and accessing the Web UI from another machine, run a reverse proxy and serve the Web UI over HTTPS. Access it at https://127.0.0.1:4480 on the server itself, or at https://[server ip]:4480 remotely:
// By default, the batch file uses the virtual environment named keras-llm-robot.
// Modify the batch file if using a different virtual environment name.

// Windows platform
webui-startup-windows.bat

// Ubuntu (Linux) platform

python __webgui_server__.py --webui
chmod +x ./tools/ssl-proxy-linux
./tools/ssl-proxy-linux -from 0.0.0.0:4480 -to 127.0.0.1:8818

// macOS platform

python __webgui_server__.py --webui
chmod +x ./tools/ssl-proxy-darwin
./tools/ssl-proxy-darwin -from 0.0.0.0:4480 -to 127.0.0.1:8818
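
After starting the Web UI and the reverse proxy, it can be useful to confirm that something is actually listening on ports 8818 and 4480. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of the project.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: check the Web UI port and the HTTPS proxy port on this machine.
# for port in (8818, 4480):
#     print(port, is_port_open("127.0.0.1", port))
```

A successful TCP connection only shows that the port is open; it does not verify that the Web UI itself is responding correctly.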

Feature Overview

Interface Overview

  • Configuration Interface

    In the configuration interface, you can choose a suitable language model to load. Models are categorized as Foundation Models, Multimodal Models, Code Models, Special Models, and Online Models.

    1. Foundation Models: Unmodified models published on Hugging Face, including models with chat templates similar to OpenAI's.
    2. Multimodal Models (Not implemented): Models that natively support both voice and text, or both image and text.
    3. Code Models: Code generation models.
    4. Special Models: Quantized models (GGUF) published on Hugging Face, or models requiring special chat templates.
    5. Online Models: Online language models from OpenAI and Google, such as GPT4-Turbo, Gemini-Pro, GPT4-vision, and Gemini-Pro-vision. These require an OpenAI API Key and a Google API Key, which can be configured in the system environment variables or in the configuration interface.

Image1

  • Tools & Agent Interface

    In the tools & agent interface, you can load auxiliary models for retrieval, code execution, text-to-speech, speech-to-text, image recognition, image generation, or function calling.

    1. Retrieval: Supports both local and online vector databases, local and online embedding models, and various document types. It can provide long-term memory for the Foundation model.
    2. Code Interpreter (Not implemented)
    3. Text-to-Speech: Supports the local model XTTS-v2 and the Azure online text-to-speech service. The Azure service requires an Azure API Key, which can be configured in the system environment variables SPEECH_KEY and SPEECH_REGION, or in the configuration interface.
    4. Speech-to-Text: Supports the local models whisper and faster-whisper and the Azure online speech-to-text service. The Azure service requires an Azure API Key, which can be configured in the system environment variables SPEECH_KEY and SPEECH_REGION, or in the configuration interface.
    5. Image Recognition: Supports the local model blip-image-captioning-large.
    6. Image Generation: Supports the local model OpenDalleV1.1 for static image generation and the local model stable-video-diffusion-img2vid-xt for dynamic image generation.
    7. Function Calling (Not implemented)
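
The Azure speech services above read their credentials from the SPEECH_KEY and SPEECH_REGION environment variables. The following is a minimal sketch of checking for them before loading an Azure speech model; the helper name `azure_speech_settings` is hypothetical, and how the project itself reads these variables is not shown here.

```python
import os


def azure_speech_settings(env=None):
    """Return (key, region) from SPEECH_KEY / SPEECH_REGION, or None if either is missing."""
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        return None
    return key, region


# Example:
# if azure_speech_settings() is None:
#     print("Set SPEECH_KEY and SPEECH_REGION, or enter them in the configuration interface.")
```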

Image

Once the speech-to-text model is loaded, voice and video chat controls appear in the chat interface. Click the START button to record voice via the microphone and the STOP button to end the recording. The speech model automatically converts the speech to text and passes it to the language model as a chat message. When the text-to-speech model is loaded, the text output by the language model is automatically converted to speech and played through the speakers or headphones.

Image

Once a multimodal model (such as Gemini-Pro-Vision) is loaded, upload controls appear in the chat interface. The restrictions on uploaded files depend on the loaded model. After you send text in the chatbox, both the uploaded files and the text are forwarded to the multimodal model for processing.

Image

  • Language Model Features

    1. Load Model

      Foundation Models can be loaded on CPU or GPU, and with 8-bit loading (4-bit loading does not work). When running on CPU, set an appropriate number of CPU threads to improve token output speed. If you encounter the error 'Using Exllama backend requires all the modules to be on GPU' while loading a GPTQ model, add "disable_exllama": true to the 'quantization_config' section of the model's config.json.
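
The config.json change described above can be applied by hand or with a few lines of Python. The following is a minimal sketch using only the standard library; the helper name `disable_exllama` is hypothetical, and the model path in the usage comment is only an example.

```python
import json
from pathlib import Path


def disable_exllama(config_path):
    """Set quantization_config.disable_exllama to true in a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the section if it is missing, then set the flag.
    config.setdefault("quantization_config", {})["disable_exllama"] = True
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example (hypothetical model folder):
# disable_exllama("models/llm/some-gptq-model/config.json")
```

Keeping a backup copy of config.json before editing makes the change easy to revert.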

      Multimodal models can be loaded with CPU or GPU. For Vision models, users can upload images and text for model interaction. For Voice models, users can interact with the model using a microphone (without the need for auxiliary models). (Not implemented)

      Special models can be loaded on CPU or GPU. For GGUF models, prefer CPU loading.

      Online models do not require additional local resources and currently support online language models from OpenAI and Google.

      NOTE If the TTS library is not installed, the local XTTS-v2 speech model cannot be loaded, but the online speech services can still be used. If the llama-cpp-python library is not installed, GGUF models cannot be loaded. Without a GPU device, AWQ and GPTQ models cannot be loaded.
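
Whether these optional libraries are present can be checked before trying to load a model. The following is a minimal sketch, not the project's own check; the import names `TTS` and `llama_cpp` are assumptions about how the two packages are imported, and the helper names are hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names: the TTS package as "TTS", llama-cpp-python as "llama_cpp".
OPTIONAL_FEATURES = {
    "XTTS-v2 local speech model": "TTS",
    "GGUF model loading": "llama_cpp",
}


def missing_features():
    """List the features whose backing library is not installed."""
    return [name for name, module in OPTIONAL_FEATURES.items() if not is_installed(module)]
```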

      | Supported Models | Model Type | Size |
      | --- | --- | --- |
      | fastchat-t5-3b-v1.0 | LLM Model | 3B |
      | llama-2-7b-hf | LLM Model | 7B |
      | llama-2-7b-chat-hf | LLM Model | 7B |
      | chatglm2-6b | LLM Model | 7B |
      | chatglm2-6b-32k | LLM Model | 7B |
      | chatglm3-6b | LLM Model | 7B |
      | tigerbot-7b-chat | LLM Model | 7B |
      | openchat_3.5 | LLM Model | 7B |
      | Qwen-7B-Chat-Int4 | LLM Model | 7B |
      | fuyu-8b | LLM Model | 7B |
      | Yi-6B-Chat-4bits | LLM Model | 7B |
      | neural-chat-7b-v3-1 | LLM Model | 7B |
      | Mistral-7B-Instruct-v0.2 | LLM Model | 7B |
      | llama-2-13b-hf | LLM Model | 13B |
      | llama-2-13b-chat-hf | LLM Model | 13B |
      | tigerbot-13b-chat | LLM Model | 13B |
      | Qwen-14B-Chat | LLM Model | 13B |
      | Qwen-14B-Chat-Int4 | LLM Model | 13B |
      | Yi-34B-Chat-4bits | LLM Model | 34B |
      | llama-2-70b-hf | LLM Model | 70B |
      | llama-2-70b-chat-hf | LLM Model | 70B |
      | cogvlm-chat-hf | Multimodal Model (image) | 7B |
      | Qwen-VL-Chat | Multimodal Model (image) | 7B |
      | Qwen-VL-Chat-Int4 | Multimodal Model (image) | 7B |
      | stable-video-diffusion-img2vid | Multimodal Model (image) | 7B |
      | stable-video-diffusion-img2vid-xt | Multimodal Model (image) | 7B |
      | Qwen-Audio-Chat | Multimodal Model (image) | 7B |
      | phi-2-gguf | Special Model | 3B |
      | phi-2 | Special Model | 3B |
      | Yi-6B-Chat-gguf | Special Model | 7B |
      | OpenHermes-2.5-Mistral-7B | Special Model | 7B |
      | Yi-34B-Chat-gguf | Special Model | 34B |
      | Mixtral-8x7B-v0.1-gguf | Special Model | 8*7B |
      | gpt-3.5-turbo | Online Model | *B |
      | gpt-3.5-turbo-16k | Online Model | *B |
      | gpt-4 | Online Model | *B |
      | gpt-4-32k | Online Model | *B |
      | gpt-4-1106-preview | Online Model | *B |
      | gpt-4-vision-preview | Online Model | *B |
      | gemini-pro | Online Model | *B |
      | gemini-pro-vision | Online Model | *B |
      | chat-bison-001 | Online Model | *B |
      | text-bison-001 | Online Model | *B |
      | whisper-base | Voice Model | *B |
      | whisper-medium | Voice Model | *B |
      | whisper-large-v3 | Voice Model | *B |
      | faster-whisper-large-v3 | Voice Model | *B |
      | AzureVoiceService | Voice Model | *B |
      | XTTS-v2 | Speech Model | *B |
      | AzureSpeechService | Speech Model | *B |
      | OpenAISpeechService | Speech Model | *B |

      Notes for Multimodal Models

      • The cogvlm-chat-hf, Qwen-VL-Chat, and Qwen-VL-Chat-Int4 models accept a single image file together with text input. They can recognize the image content and answer natural-language questions about the image.

      • The stable-video-diffusion-img2vid and stable-video-diffusion-img2vid-xt models accept a single image file and generate a video based on the image.

        When using these two models, install ffmpeg and the corresponding dependencies first:

        1. Download generative-models from https://github.com/Stability-AI/generative-models into the project root folder.
        2. cd generative-models && pip install .
        3. pip install pytorch-lightning
           pip install kornia
           pip install open_clip_torch
      • The Qwen-Audio-Chat model accepts a single audio file together with text input and answers natural-language questions about the content of the audio file.

    2. Quantization

      Use open-source tools such as llama.cpp to create 2, 3, 4, 5, 6, and 8-bit quantized versions of general models. (Not implemented)

    3. Fine-tuning

      Fine-tune the language model using a private dataset. (Not implemented)

    4. Prompt Templates

      Set up a template for prompting the language model in specific scenarios. (Not implemented)

  • Auxiliary Model Features

    1. Retrieval

      RAG functionality requires a vector database and embedding models to provide long-term memory capabilities to the language model.
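
To illustrate what the vector database contributes, the following toy sketch ranks stored text chunks by cosine similarity to a query vector. It is not the project's implementation: the vectors are made up, the helper names are hypothetical, and a real setup would obtain vectors from one of the embedding models and store them in one of the vector databases listed in this section.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """store is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy two-dimensional "embeddings", for illustration only.
store = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0]), ("chunk C", [0.9, 0.1])]
print(top_k([1.0, 0.0], store, k=2))
```

The retrieved chunks are then placed into the prompt so that the language model can answer from them.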

      The following vector databases are supported:

      | Databases | Type |
      | --- | --- |
      | Faiss | Local |
      | Milvus | Local |
      | PGVector | Local |
      | ElasticsearchStore | Local |
      | ZILLIZ | Online |

      The following embedding models are supported:

      | Model | Type | Size |
      | --- | --- | --- |
      | bge-small-en-v1.5 | Local | 130MB |
      | bge-base-en-v1.5 | Local | 430MB |
      | bge-large-en-v1.5 | Local | 1.3GB |
      | bge-small-zh-v1.5 | Local | 93MB |
      | bge-base-zh-v1.5 | Local | 400MB |
      | bge-large-zh-v1.5 | Local | 1.3GB |
      | m3e-small | Local | 93MB |
      | m3e-base | Local | 400MB |
      | m3e-large | Local | 1.3GB |
      | text2vec-base-chinese | Local | 400MB |
      | text2vec-bge-large-chinese | Local | 1.3GB |
      | text-embedding-ada-002 | Online | *B |
      | embedding-gecko-001 | Online | *B |
      | embedding-001 | Online | *B |

      NOTE Download the embedding model in advance and place it in the specified folder. Otherwise documents cannot be vectorized, and uploading to the knowledge base will fail.

      NOTE When using the Milvus database, it is recommended to deploy it with Docker, either locally or on a Kubernetes (k8s) cluster. Refer to the official Milvus documentation and download the Docker Compose file from https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml .

        1. Once the download has finished, rename the file to docker-compose.yml.
      
        2. Create a local folder for Milvus and copy docker-compose.yml into it.
      
        3. Create the subfolders conf, db, logs, pic, volumes, and wal.
      
        4. Run the following command in that folder:
           docker-compose up -d
      
        5. Check in the Docker interface that the deployment succeeded, and make sure the container is running and listening on ports 19530 and 9091.
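
Steps 2 and 3 above can be scripted. The following is a minimal sketch using only the standard library; the helper name `prepare_milvus_dir` and the folder name in the usage comment are hypothetical, and docker-compose.yml still has to be copied into the folder separately.

```python
from pathlib import Path

# Subfolders listed in step 3 above.
MILVUS_SUBFOLDERS = ("conf", "db", "logs", "pic", "volumes", "wal")


def prepare_milvus_dir(root):
    """Create the Milvus working folder and its subfolders; safe to run repeatedly."""
    root = Path(root)
    for name in MILVUS_SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Example (hypothetical folder name):
# prepare_milvus_dir("milvus")
```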

      NOTE When using the PGVector database, it is recommended to deploy it locally using Docker.

        1. Download the image:
           docker pull ankane/pgvector
      
        2. Deploy the container using the following command, modifying the DB name, username, and password as needed. (Also update the 'connection_uri' setting under 'pg' in kbconfig.json.)
           docker run --name pgvector -e POSTGRES_DB=keras-llm-robot -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d ankane/pgvector
      
        3. Check in the Docker interface that the deployment succeeded, and make sure the container is running and listening on port 5432.
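
If you change the DB name, username, or password in the docker run command, the 'connection_uri' in kbconfig.json has to match. The following is a minimal sketch of building such a URI. The `postgresql+psycopg2://` scheme is an assumption (it is the SQLAlchemy-style form commonly used with PGVector), so compare it with the value already present in kbconfig.json; the helper name `pgvector_uri` is hypothetical.

```python
from urllib.parse import quote


def pgvector_uri(db, user, password, host="127.0.0.1", port=5432):
    """Build a connection URI, percent-encoding the user name and password."""
    # Assumed scheme: SQLAlchemy-style "postgresql+psycopg2".
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql+psycopg2://{user_enc}:{password_enc}@{host}:{port}/{db}"


# Values from the docker run command above.
print(pgvector_uri("keras-llm-robot", "postgres", "postgres"))
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as part of the URI structure.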

      Taking Ubuntu as an example, after successfully launching the Milvus and PGVector servers, you can check them in Docker Desktop. You can also install clients such as attu or pgAdmin to manage the vector databases:

      Image1

      The following document types are supported:

      html, mhtml, md, json, jsonl, csv, pdf, png, jpg, jpeg, bmp, eml, msg, epub, xlsx, xls, xlsd, ipynb, odt, py, rst, rtf, srt, toml, tsv, docx, doc, xml, ppt, pptx, enex, txt
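
Before uploading, a file name can be checked against the list above. The following is a minimal sketch, not the project's own validation; the helper name `is_supported` is hypothetical, and the extension list is copied from the line above.

```python
from pathlib import Path

# Extensions copied from the supported document list above.
SUPPORTED_EXTENSIONS = frozenset(
    "html mhtml md json jsonl csv pdf png jpg jpeg bmp eml msg epub xlsx xls xlsd "
    "ipynb odt py rst rtf srt toml tsv docx doc xml ppt pptx enex txt".split()
)


def is_supported(filename):
    """Return True if the file's extension (case-insensitive) is in the supported list."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


# Example:
# is_supported("notes.md")
```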

      Knowledge Base Interface: Image1 When creating a new knowledge base, enter its name and introduction, and select an appropriate vector database and embedding model. If the documents in the knowledge base are in English, the local model bge-large-en-v1.5 is recommended; if they are predominantly in Chinese with some English, bge-large-zh-v1.5 or m3e-large is recommended.

      Upload Documents Interface: Image1 You can upload one or multiple documents at a time. During upload, each document goes through content extraction, splitting, vectorization, and insertion into the vector database. This can take a considerable amount of time, so please be patient.
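
The splitting step can be pictured as cutting the extracted text into overlapping chunks. The following is a minimal character-based sketch for illustration only. It is not the splitter the project uses, and the function name and the default sizes are hypothetical.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters, overlapping by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


# Each chunk would then be embedded and stored in the vector database.
# chunks = split_text(document_text)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly present in both neighbouring chunks.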

      Documents Content Interface: Image1 You can inspect the content of document slices and export them.

      Knowledge Base Chat Interface: Image1 In the chat interface, you can select a knowledge base, and the Foundation model will answer user queries based on the content within the selected knowledge base.

    2. Code Interpreter

      Enables code execution for the language model, giving the mind the ability to act. (Not implemented)

    3. Speech Recognition and Generation

      Provides the language model with speech input and output, adding listening and speaking to the mind. Supports local models such as XTTS-v2 and Whisper, as well as the Azure online speech services.

    4. Image Recognition and Generation

      Provides the language model with image and video input and output, adding sight and drawing to the mind.

      The following image formats are supported:

      png, jpg, jpeg, bmp

      | Model | Type | Size |
      | --- | --- | --- |
      | blip-image-captioning-large | Image Recognition Model | *B |
      | OpenDalleV1.1 | Image Generation Model | *B |

      When using the OpenDalleV1.1 model to generate images with 16-bit precision, download the sdxl-vae-fp16-fix model from Hugging Face and place it in the models\imagegeneration folder. If you enable the Refiner, download the stable-diffusion-xl-refiner-1.0 model from Hugging Face and place it in the models\imagegeneration folder beforehand.

      Image Recognition:

      Image1

      Static image generation:

      Image1 Image1 Image1 Image1

      Dynamic image generation:

      Image1

    5. Network Search Engine

      Provides the language model with network retrieval, giving the brain the ability to fetch the latest knowledge from the internet.

      The following network search engines are supported:

      | Network Search Engine | Key |
      | --- | --- |
      | duckduckgo | No |
      | bing | Yes |
      | metaphor | Yes |

      When using the bing or metaphor search engine, apply for an API Key and configure it first.

      Please install the following packages before using the network search engine.

        1. pip install duckduckgo-search
        2. pip install exa-py
        3. pip install markdownify
        4. pip install strsimpy

      A smart mode is supported, which lets the model decide autonomously whether to use a search engine when answering a question.

    6. Function Calling

      Provides the language model with function calling, giving the mind the ability to use tools. Support is planned for automation platforms such as Zapier, n8n, and others. (Not implemented)

Note

Anaconda download:(https://www.anaconda.com/download)

Git download:(https://git-scm.com/downloads)

CMake download:(https://cmake.org/download/)

Langchain Project: (https://github.com/langchain-ai/langchain)

Fastchat Project: (https://github.com/lm-sys/FastChat)

keras-llm-robot's People

Contributors

smalltong02