lucab85 / avg

Source of Docker image lucab85/avg https://quay.io/lucab85/avg

Home Page: https://hub.docker.com/r/lucab85/avg

License: MIT License

Dockerfile 44.35% R 41.59% Shell 14.06%

avg's Introduction

AVG

AVG is a Docker-based project for programmatically building educational content. For example, it can convert a Google Slides presentation (with speaker notes on each slide) into a video file containing a slide show with computer-generated voiceovers of those notes.

Requirements

This project requires the following items:

  1. A locally installed copy of Docker Desktop (requires a commercial license), connected to a valid Docker Hub account.
  2. Access to an Amazon Web Services account (for the purpose of converting the speaker notes to audio files).
  3. A valid GitHub account.
  4. A working directory where the project can be built and used (e.g. ~/repos).

Installation

Assuming Docker Desktop is installed and running (and connected to your Docker Hub account), the only other installation task is to fetch a copy of the AVG GitHub repo, as follows:

$ cd ~/repos
$ git clone https://github.com/lucab85/avg.git
$ cd avg

If you do not have an AWS Access Key and Secret Key, log into your AWS account and generate these from the Security Credentials screen therein.

Setup

The configuration settings that control how this project behaves are stored in the default.env file. Most of the settings therein are pre-configured to help you get started. However, some additional setup steps may be needed, as follows:

AWS API Keys

Enter your AWS Access Key, Secret Key, and region in the following variables in default.env:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION: example: "eu-west-1"
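
Before starting the container, it can help to confirm that these three variables are actually filled in. The following is a minimal sketch, assuming default.env uses plain KEY=value lines; the helper names env_value and check_aws_env are hypothetical and not part of AVG:

```shell
# Print the value assigned to KEY in an env file (empty if absent).
# Assumes plain KEY=value lines; surrounding double quotes are stripped.
env_value() {
  key="$1"
  file="$2"
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- | tr -d '"'
}

# List every required AWS setting that is missing or empty.
# Returns 0 if all three are set, 1 otherwise.
check_aws_env() {
  file="${1:-default.env}"
  missing=0
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    if [ -z "$(env_value "$key" "$file")" ]; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_aws_env default.env` from the repo folder prints one line per unset variable and returns a non-zero status if any are missing.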

Create Share Folder

You should also create a folder where the input and output files can be stored and ensure that this matches the folder referenced in the AVG_INPUT and AVG_OUTPUT variables.

$ mkdir share

You can choose any folder name you like here, so long as it matches the one referenced in default.env and you adjust the docker commands below to use the same value.

Fetch Slides

Finally, place a copy of the slides you wish to convert (in .pptx format) into the share folder (or equivalent) above, naming it input.pptx (or whatever value you specified in the AVG_INPUT variable in default.env).
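
A quick check that the slide deck is where the container will look for it can save a failed run. This is a minimal sketch, assuming the default folder name share and file name input.pptx; check_input is a hypothetical helper, not something AVG provides:

```shell
# Succeed only if the slide deck exists in the share folder and is non-empty.
# Both arguments are optional and default to the names used in this README.
check_input() {
  dir="${1:-share}"
  name="${2:-input.pptx}"
  if [ -s "$dir/$name" ]; then
    echo "found: $dir/$name"
  else
    echo "not found or empty: $dir/$name" >&2
    return 1
  fi
}
```

For example, `check_input share input.pptx` prints the path when the file is in place and returns a non-zero status otherwise.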

Fetch AVG Container

Fetch a copy of the AVG container image as follows:

$ docker pull lucab85/avg

You should now be ready to convert your slides to video/audio format.

Execution

Once all of the setup tasks have been completed, start your container as follows:

$ docker run --name="avg" -dit --mount type=bind,source=$(pwd)/share,target=/share --env-file=default.env lucab85/avg

Now log into the running container using the Terminal:

$ docker exec -it "avg" bash

Finally, convert your slides using the desired conversion script (pptx2ari.sh or gs2ari.sh):

# pptx2ari.sh

Once completed, your video file should be available on your local system in the share folder specified earlier, named in accordance with the AVG_OUTPUT variable in default.env.

If you need to change any of the settings in the default.env file, you should remove the existing container before restarting it with the updated configuration details, as follows:

$ docker stop avg
$ docker rm avg
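
The stop, remove, and restart steps can be wrapped into one function. The sketch below reuses the docker commands shown above; the function name avg_restart and the DOCKER override (useful for previewing the command with DOCKER=echo) are assumptions added for illustration:

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the run command
# instead of executing it.
DOCKER="${DOCKER:-docker}"

# Remove any existing "avg" container and start a fresh one so that
# changes to the env file take effect.
avg_restart() {
  share_dir="${1:-$(pwd)/share}"
  env_file="${2:-default.env}"
  # Ignore errors here: the container may not exist yet.
  $DOCKER stop avg >/dev/null 2>&1 || true
  $DOCKER rm avg >/dev/null 2>&1 || true
  $DOCKER run --name="avg" -dit \
    --mount type=bind,source="$share_dir",target=/share \
    --env-file="$env_file" lucab85/avg
}
```

Calling `avg_restart` with no arguments uses the share folder in the current directory and default.env, matching the commands in the Execution section.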

Configuration Settings

For reference, the full set of configuration settings in default.env (which are also available as environment variables) is:

AVG

  • AVG_INPUT = input filename
  • AVG_OUTPUT = output filename (default: input + .mp4)
  • AVG_SERVICE = tts service (default: "amazon", values: "amazon" / "google")
  • AVG_VOICE = voices (default: "Joey")
  • AVG_DPI = dpi of images (default: "300")
  • AVG_SUBTITLES = subtitle, output filename (default: "TRUE")
  • AVG_VERBOSE = (default: "TRUE")

AWS

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION: example: "eu-west-1"

GCP

  • GL_AUTH_FILE: path of service json file

FAQ

Text to Speech

How does the text to speech engine in AVG work?

Depending on which value you specify for the AVG_SERVICE variable, one of the following text-to-speech services will be used:

  • "amazon": Amazon Polly
  • "google": Google Cloud Text-to-Speech

Each of these services also allows you to listen to sample voices in different languages, most of which are tailored for different geographic regions or accents.

How long does the video generation process take?

The time it takes to convert your presentation depends on a number of factors, including:

  1. The number of slides in your presentation (each one is converted to a graphic image and then to a video clip, as described below).
  2. The amount of text in the speaker notes (per slide), where each must be converted to an audio file and then a video clip (with the slide content) of the same length.
  3. The value of the AVG_VOICE variable, which controls which voice will be used when converting the speaker notes to audio. Some voices speak more slowly than others and slower voices can result in longer videos.
  4. The speed of your local system's CPU (the final step of merging the above audio+video assets together to form the completed video is quite compute-intensive).

Early experiments suggest that the full video generation takes around 75% of the running time of the final video.
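
As a rough planning aid, the 75% figure can be turned into a quick estimate. This is only a sketch based on that early observation; estimate_render_seconds is a hypothetical helper and the percentage is an adjustable assumption, not a guarantee:

```shell
# Estimate rendering time in seconds as a percentage of the video length.
# The default of 75 comes from the early experiments mentioned above.
estimate_render_seconds() {
  video_seconds="$1"
  percent="${2:-75}"
  echo $(( video_seconds * percent / 100 ))
}
```

For a 10-minute video, `estimate_render_seconds 600` prints 450, i.e. about seven and a half minutes of processing.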

Costs

How much does it cost to convert a presentation?

The main costs involved in this process are:

  1. A recurring subscription cost for a Docker Desktop Pro license ($5 per month or $60 per year).
  2. A per-usage cost for the text-to-speech conversion, which is incurred each time a presentation is converted and which is charged by the number of characters in the speaker notes (e.g. $4 per million characters on AWS).
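
The per-usage cost can be approximated from the character count of the speaker notes. The sketch below uses the example rate of $4 per million characters quoted above; estimate_tts_cost is a hypothetical helper, and actual pricing depends on the provider and voice type:

```shell
# Estimate the text-to-speech cost in dollars for a given character count.
# The rate is in dollars per million characters (default: 4, as quoted above).
estimate_tts_cost() {
  chars="$1"
  rate="${2:-4}"
  awk -v c="$chars" -v r="$rate" 'BEGIN { printf "%.2f\n", c / 1000000 * r }'
}
```

If the speaker notes are exported to a text file, the character count can be obtained with `wc -m < notes.txt` (notes.txt is an illustrative name). A deck with 250,000 characters of notes would cost roughly `estimate_tts_cost 250000`, which prints 1.00.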

avg's People

Contributors

jmernin, lucab85

avg's Issues

Add language support

In addition to being able to specify different voices, perhaps we could allow different languages to be specified as well. I suggest this because I saw that Amazon Polly supports both voices and languages, which are technically different things.

That way, it could be possible to produce multiple different videos of the same presentation, in different languages but using an accent that is suited to each language.

Clarify which AWS service is doing the text-to-speech conversion

Provide some additional detail in the README about how the text-to-speech conversion is being done, so people can better understand the other values they could provide in the default.env file, for example.

In fact, perhaps we could add comments into default.env, pointing people to the relevant AWS page to see the available values for each variable.

Guidance on how long rendering takes

Add some details to the README to help people understand how long it takes to render each video. Clearly, it depends on the length of the audio files produced from the speaker notes, but is it exactly 100% of their combined length or somewhere in between?

Ability to control voice playback speed

A few people have commented on recent demos (that I've shared) saying that the voice we've used speaks too quickly. Personally, the voice in question (Joanna) is probably the clearest one I've heard but I'll admit that it could indeed be a little slower when talking.

It would be good if Amazon Polly allowed you to control the pitch (or speed) of the voice.
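
For context, Amazon Polly accepts SSML input, and the SSML prosody tag has a rate attribute (for example "slow" or "x-slow") that changes speaking speed. Whether AVG passes SSML through to Polly is not confirmed here, so the following is only a sketch of how the markup could be produced; ssml_with_rate is a hypothetical helper:

```shell
# Wrap plain text in SSML that asks the voice to speak at the given rate.
# The text is inserted as-is, so XML special characters (&, <, >) would
# need to be escaped by the caller.
ssml_with_rate() {
  rate="$1"
  shift
  printf '<speak><prosody rate="%s">%s</prosody></speak>\n' "$rate" "$*"
}
```

For example, `ssml_with_rate slow "Welcome to the course"` produces markup that could be sent to Polly as SSML text, if the conversion scripts were extended to support it.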

Custom font support

If your slide deck uses custom fonts (e.g. Red Hat Display) that are not available inside the Docker container, the resulting video may contain incorrectly formatted slides. It may be worth exploring whether there is a way to add the .ttf files for the fonts in question to the ARI ecosystem.

Support for Apple M1 Silicon chip architecture

I recently switched to a new MacBook Pro powered by the Apple M1 Silicon chipset and I ran into some problems while trying to set up the AVG environment there. Specifically, I got this warning when I ran the docker run command:

$ docker run --name="avg" -dit --mount type=bind,source=$(pwd)/share,target=/share --env-file=default.env  lucab85/avg
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Could it be that we need to build an alternative version of the base image that suits the M1 chipset?
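
Until a native arm64 image exists, one possible workaround is to request the amd64 image explicitly with the --platform option of docker run, which runs it under emulation (likely more slowly). This is a sketch of that idea rather than a tested fix for AVG; platform_flag is a hypothetical helper:

```shell
# Print the docker option needed to run the amd64 image on an ARM host.
# Prints nothing on other architectures.
platform_flag() {
  case "$1" in
    arm64|aarch64) echo "--platform linux/amd64" ;;
    *) echo "" ;;
  esac
}
```

It could be used as `docker run $(platform_flag "$(uname -m)") --name="avg" -dit --mount type=bind,source=$(pwd)/share,target=/share --env-file=default.env lucab85/avg`, leaving the command unchanged on Intel machines.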

Create empty "shared" folder for quicker setup

How about creating an empty "share" folder in the AVG repo so that the step to create this (during the setup phase) is not actually needed?

For sure, people can select/use a different folder later on, but for most use cases (and in the interest of increased adoption), maybe this would help?
