GithubHelp home page GithubHelp logo

spark-ai's Introduction

Do checkout master branch too!

๐ŸŽฅ Text-to-Video Flask Application ๐ŸŽ™๏ธ

This repository contains a Flask application that converts input text into an animated video of a celebrity speaking the input text. The animation uses a static image of the celebrity and manipulates the image to make it look like the celebrity is speaking the words.

๐Ÿ—๏ธ Application Architecture

The application leverages two key APIs:

ElevenLabs for Text-to-Speech (TTS) conversion: The input text is converted into speech in the voice of a specified celebrity. D-ID for Video Generation: The generated speech is then used to animate a static image of the chosen celebrity, making it appear as if the celebrity is speaking the text. The application also uses Google Cloud Storage to store the generated speech files.

๐Ÿš€ Live Demo

The application is hosted on Replit and can be accessed here.

๐ŸŽฏ How to Use

The application exposes a single endpoint: POST /t2vid. This endpoint expects a JSON payload with two parameters:

text: The text to be spoken in the video. actor: The name of the celebrity who should appear to speak the text. The 'actor' can be one of the following: Donald Trump, Shah Rukh Khan, Sachin Tendulkar, Narendra Modi, Aishwarya Rai, Kangana Ranaut, Amitabh Bachchan.

Example usage:

POST /t2vid
{
    "text": "Hello, world!",
    "actor": "Donald Trump"
}

The endpoint returns a URL to the generated video.

๐Ÿƒโ€โ™€๏ธ Running the Application

To run the application, use the following command:

python app.py

By default, the application will run on port 81.

๐Ÿ“ฆ Dependencies

The application requires the following Python packages:

Flask requests google-cloud-storage Install these packages with pip:

pip install flask requests google-cloud-storage

You also need to have the Google Cloud SDK installed and configured on your machine.

๐Ÿ”‘ Note

This application uses ElevenLabs and D-ID APIs, and Google Cloud Storage. You need to have accounts and valid API keys for these services to run the application.

For production deployment, consider using a production-ready WSGI server like Gunicorn or uWSGI, as Flask's built-in server is not intended for production use. Also, make sure to secure your API keys and other sensitive data using environment variables or a secure secret management system.

๐Ÿ”ฎ Future Work

This application can be extended in many ways, including adding support for more 'actors', improving the quality of the generated videos, and implementing additional features like video download, sharing, and more. Enjoy crafting ๐Ÿš€!

Bounties

LLamaIndex

๐Ÿšง Challenges Faced

During the development of this application, we faced several challenges. Here's an overview of how we tackled them:

๐ŸŽฏ Token Length Limitations

One of the issues we faced was related to the token length limitations of the GPT models. We had to carefully manage our input so that we did not exceed the maximum token limit. Through multiple iterations and testing, we devised a method to split the input text into chunks that could be processed separately, without losing coherence.

๐Ÿ—ฃ๏ธ Autonomous Chatbot with GPT-4

Creating an autonomous chatbot using GPT-4 was a challenge due to the model's complexity and resource requirements. We made multiple iterations on the chatbot's implementation, focusing on prompt engineering, to ensure the bot's responses were appropriate and coherent. JSONifying the response from GPT-3.5/4 was another task that required careful handling to ensure the data was correctly parsed and used.

โฑ๏ธ Latency Issues

We also faced latency issues with GPT-4 due to its high computational requirements. To mitigate this, we performed multiple iterations on the chatbot, focusing on reducing the latency. We streamlined the number of messages exchanged in each workflow to a minimal amount, thus improving response times.

๐ŸŽฅ Video Generation

For video generation, we tried multiple services including D-ID and SadTalker. While SadTalker provided valuable learnings and was used to generate a number of videos, we found D-ID to be faster for our use case and thus decided to proceed with it.

๐ŸŽ™๏ธ Voice Training

We also encountered challenges in generating voice training for celebrities using ElevenLabs. Despite multiple failures, we persisted and were eventually able to find high-quality voices that significantly enhanced the realism of our generated videos.

Despite these challenges, we believe that the end result was well worth the effort. We hope you find this application useful and look forward to seeing what you create with it!

spark-ai's People

Contributors

akshat-khare avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.