GithubHelp home page GithubHelp logo

catid / aiwebcam2 Goto Github PK

View Code? Open in Web Editor NEW
34.0 4.0 8.0 437 KB

Second attempt at AI webcam, this time with OpenAI API

License: BSD 3-Clause "New" or "Revised" License

Python 59.59% JavaScript 21.68% CSS 13.25% HTML 5.49%

aiwebcam2's Introduction

AI Webcam

Fully-interactive webcam AI assistant you can run at home!

Implemented with latest Whisper3 + GPT-4-Vision + OpenAI TTS and a WebRTC browser front-end for speed.

Logo

Demo here: https://www.youtube.com/watch?v=G_L8t3EQMcs

There is an accompanying blog post here: https://catid.io/posts/aiwebcam/

Future work:

  • Add a cancel button so the AI does not talk over you.
  • Improve the HTML render frame and UI in general to be more usable with resizeable frames and copy buttons for generated code.
  • Have a button to switch between desktop apps and user's webcam.
  • Support for other browsers and iPhone.
  • Use Unreal engine to generate a real-time lip-synced avatar for the AI running on the server.
  • Listen to audio and decide when to respond more intelligently.
  • Integrate with a Zoom client to allow the AI to join teleconferences and reply.

Prerequisites

Designed for Ubuntu server with an Nvidia GPU. It might work with other setups, but I have only tested this one.

You'll want to set up Conda first: https://docs.conda.io/en/latest/miniconda.html

Setup

Create a Conda environment, clone the repo, and install the requirements:

conda create -n aiwebcam python=3.10
conda activate aiwebcam

git clone https://github.com/catid/aiwebcam2
cd aiwebcam

pip install -U -r requirements.txt

openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 3650
# Press enter to accept defaults for all questions

Modify the api_key.py file to specify your OpenAI key, which you generate here: https://platform.openai.com/api-keys

Run

python app.py

Open a Chrome browser to https://localhost:8443 On my network I host it on a server at https://gpu3.lan:8443

When you get the Your connection is not private screen, click Advanced and then Proceed to localhost (unsafe).

When you see the "localhost:8443 wants to: Use your camera" permission popup, select [Allow].

You should see the webcam feed in the browser window. Click or press and hold the [space bar] on the keyboard to speak. The AI will respond to what you say, and it will be provided a picture from the webcam stream so that it can see you for context.

Include "look" in your query to use more tokens to improve its eyesight. Include "remember" to keep a high resolution image for the remainder of the session.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.