GithubHelp home page GithubHelp logo

imultimedia / chatanything Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zhoudaquan/chatanything

0.0 0.0 0.0 28.46 MB

Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS

Shell 0.09% Python 99.84% Dockerfile 0.07%

chatanything's Introduction

title emoji colorFrom colorTo sdk sdk_version app_file python_version pinned
ChatAnything
👀
green
green
gradio
3.41.0
app.py
3.8.10
false

ChatAnything: Facetime Chat with LLM-Enhanced Personas

Yilin Zhao*, Xinbin Yuan*, Shanghua Gao*, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou*

What will it be like to Facetime any imaginary concepts? Do you want to see what the LLM-based agent you have been chatting with looks like? To animate anything, we integrated current open-source models at hand for an animation application for interactive AI-Agent chatting usage.

Assign any agents a visual appearance!

To start with, take a look at these incredible faces generated with open-source Civitai models that are to be animated.

drawing

Here we provide you with ChatAnything. A simple pipeline Enhanced with currently limitless Large Language Models, yielding imaginary Facetime chats with intented visual appearance!

Remember, the repo and application are totally based on pre-trained deep learning methods and haven't included any training yet. We give all the credit to the open-source community (shout out to you). For detail of the pipeline, see our technical report

Release & Features & Future Plans

  • Fine-tune face rendering module.
  • Better TTS module & voice render module.
  • Adding Open-source Language Models.
  • Initial release
    • Facetime Animation.
    • Multiple model choices for initial frame generation.
    • Multiple choices for voices.

Install & Run

Just follow the instructions. Every thing would be simple (hopefully). Reach out if you met with any problems!

Install

First, install the virtual environment.

conda env create -f environment.yaml

# then install
conda env update --name chatanything --file environment.yaml

The Pipeline integrated Open-Source Models. All Models are to be found online(see Acknowledgement). We put some important models together on huggingface remotes just to make life easier. Prepare them for the first run with this Python script prepare_models.py:

# prepare the local models
python python_scripts/prepare_models.py

Building Docker

Try build a docker if you find it easier. This part is not fully tested. If you find a anything wrong, feel free to contribute~

docker build --network=host -t chatanything .
# docker run -dp 127.0.0.1:8901:8901 chatanything
docker run -p 127.0.0.1:8901:8901 -it --gpus all chatanything 
docker run -it --gpus all chatanything bash

Run

Specify a port for the gradio application to run on and set off!

PORT=8809 python app.py $PORT

Configuring: From User Input Concept to Appearance & Voice

The first step of the pipeline is to generate a image for SadTalker and at the same time set up the Text to Sound Module for voice chat.

The pipeline would query a powerful LLM (ChatGPT) for the selection in a zero-shot multi-choice selection format. Three Questions are asked upon the initial of every conversation(init frame generation):

  1. Provide a imagen personality for the user input concept.
  2. Select a Generative model for the init frame generation.
  3. Select a Text To Sound Voice(Model) for the character base on the personality.

We have constructed the model selection to be extendable. Add your ideal model with just a few lines of Configuring! The rest of this section would breifly introduce the steps to add a init-frame generator/language voice.

Image Generator

Configure the models in the Model Config. This Config acts as the memory (or an image-generating tool pool) for the LLM.

The prompt sets up this selection process. Each sub field of the "models" would turn into an option in the multiple-choice question. the "desc" field of each element is what the Language Model would see. The key is not provided to the LM as it would sometimes mislead it. the others are used for the image generation as listed:

  1. model_dir: the repo-path for diffusers package. As the pretrained Face-landmark ControlNet is based on stable-diffusion-v1-5, we currently only supports the derivatives of it.
  2. lora_path: LoRA derivatives are powerful, try a LoRA model also for better stylization. Should directly point to the parameters binary file.
  3. prompt_template & negative_prompt: this is used for prompting the text-to-image diffusion model. Find a ideal prompt for your model and stick with it. A "{}" should be in the prompt template for inserting the user input concept.

Here are some Tips for configuring you own model.

  1. Provide the LLM with a simple description of the generative model. It is worth noting that the description needs to be concise and accurate for a correct selection.

  2. Set the model_dir to a local directory of diffusers stable-diffusion-v1-5 derivatives. Also, you can provide a repo-id on the huggingface hub model space. The model would be downloaded when first chosen, wait for it.

  3. To better utilize the resources from the community, we also add in support of the LoRA features. To add the LoRA module, you would need to give the path to the parameter files.

  4. Carefully write the prompt template and negative prompt. These which affect the initial face generation a lot. Be aware that the prompt template should contain only one pair of "{}" to insert the concept that users wrote on the application webpage. We support the Stable-Diffusion-Webui prompt style as implemented by diffusers, feel free to copy the prompt from Civitai for better prompting the generation and put in the "{}" to the original prompt for ChatAnything!

Again, this model's config acts as an extended tool pool for the LM, the application would drive the LM to choose from this config and use the chosen model to generate. Sometimes the LM fails to choose the correct model or choosing any available model, this would cause the Chatanything app to fail on a generation.

Notice we currently support ONLY stable-diffusion-v1.5 derivatives (Sdxl Pipelines are under consideration, however not yet implemented as we lack a face-landmark ControlNet for it. Reach out if you're interested in training one!)

Voice TTS

We are using the edge_tts package for text-to-speech support. The voice selection and voice configuration file is constructed similarly to the Image generation model selection, except now the LM is supposed to choose the voice base on the personality description given by itself earlier. "gender" and "language" field corresponds to edge_tts.

On-going tasks.

Customized Voice.

There is a Voice Changer TextToSpeach-SpeachVoiceConversion Pipeline app, which ensures a better customized voice. We are trying to leverage its TTS functionality.

Reach out if you want to add a voice of your own or your hero!

Here are the possible steps for You would need to change a little bit in the code first:

  1. Alter this code to import a TTSTalker from chat_anything/tts_talker/tts_voicechanger.py.
  2. switch the config to another one, change code "resources/voices_edge.yaml" -> "resources/voices_voicechanger.yaml"

The try running a Voice Changer on your local machine. Simply set up git-lfs and install the repo and run it for the TTS voice service. The TTS caller was set to port 7860.

Make sure the client class is set up with the same port in here

client = Client("http://127.0.0.1:7860/")

Acknowledgement

Again, the project hasn't yet included any training. The pipeline is totally based on these incredible awesome packages and pretrained models. Don't hesitate to take a look and explore the amazing open-source generative communities. We love you, guys.

Citation

If you like our pipeline and application, don't hesitate to reach out! Let's work on it and see how far it would go!

@misc{zhao2023ChatAnything,
      title={ChatAnything: Facetime Chat with LLM-Enhanced Personas}, 
      author={Yilin, Zhao and Xinbin, Yuan and Shanghua, Gao and Zhijie Lin and Qibin, Hou and Jiashi, Feng and Daquan, Zhou},
      publisher={arXiv:2311.06772},
      year={2023},
}

chatanything's People

Contributors

zhoudaquan avatar ermu2001 avatar gasvn avatar yxb-nku avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.