epistoteles / predicting-speaker-quality

This repository belongs to my Bachelor's thesis on predicting voice likability from pre-trained speech embeddings.

Predicting-Speaker-Quality

Welcome to Predicting-Speaker-Quality! This repository contains the code for my Bachelor's thesis, Predicting Speaker Quality Using Embeddings. All of it is research code written by an inexperienced undergraduate student, so please don't expect perfect documentation. If you run into trouble, or want to improve or extend the code base, don't hesitate to reach out to me. Found a mistake? Let me know as well.

Beyond this README, the thesis itself is a good way into the topic. It is included in this repository as Predicting Speaker Quality Using Embeddings.pdf.

Setup

To set up the project, follow these steps:

1. Getting Started

  • Clone this repository.
  • Install the requirements from requirements.txt using pip install -r requirements.txt if they are not already satisfied. If you like, you can do this in a virtual environment to keep things tidy.

2. Getting and Creating Data

  • Download the Spoken Wikipedia Corpus (German, with audio) from https://nats.gitlab.io/swc/ and replace the directory german with it.
  • Navigate into the main project directory and run the split.sh script using bash split.sh -m 10 -d 10 -p, which generates up to 10 samples of 10 seconds each from every audio file in the wavs directory and its subdirectories. This may take a while. To see all available options, type bash split.sh -h.
  • Generate the GE2E and TRILL embeddings by running the update_embeddings.py script once. If you want to create new embeddings, for example because you have new .wav files in your demo folder, just run it again. It will remember which embeddings have already been created and delete embeddings that are no longer needed.
  • Navigate into the feature-scripts directory and execute the update_audio_features.sh script using bash update_audio_features.sh. Just like the previous script, this one does all the bookkeeping for you and tracks new and deleted .wav files.
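
The bookkeeping described in the last two steps (create what is missing, delete what is no longer needed) can be sketched as a comparison of two directory trees. This is only an illustration of the idea, not the code in update_embeddings.py or update_audio_features.sh: the function name plan_sync, the separate embeddings directory, and the .npy suffix are all assumptions made for the example.

```python
from pathlib import Path


def plan_sync(wav_dir, emb_dir, emb_suffix=".npy"):
    """Return (to_create, to_delete) for keeping embeddings in step with .wav files.

    Hypothetical helper: the directory layout and the .npy suffix are
    assumptions for this sketch, not taken from the repository.
    """
    wav_dir, emb_dir = Path(wav_dir), Path(emb_dir)

    # Key every file by its path relative to its root, without the extension,
    # so that wavs/demo/a.wav and embeddings/demo/a.npy share the key demo/a.
    wavs = {p.relative_to(wav_dir).with_suffix(""): p
            for p in wav_dir.rglob("*.wav")}
    embs = {p.relative_to(emb_dir).with_suffix(""): p
            for p in emb_dir.rglob("*" + emb_suffix)}

    # New .wav files still need an embedding.
    to_create = sorted(wavs[key] for key in wavs.keys() - embs.keys())
    # Embeddings whose .wav file is gone can be deleted.
    to_delete = sorted(embs[key] for key in embs.keys() - wavs.keys())
    return to_create, to_delete
```

Calling plan_sync("wavs", "embeddings") would return the .wav files that still need processing and the stale embedding files, which a script could then compute and remove respectively.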

Training and Evaluating Models

  • To train and evaluate the neural network models (DNNs and LSTMs), run the keras_regressors.py script. All parameters, such as network architecture and learning rate, can be modified inside the file itself.
  • For the kNN and random forest regressor, use the sklearn_regressors.py file. Like before, all parameters can be set inside the script itself.
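
To make the kNN approach concrete, here is a minimal pure-Python sketch of k-nearest-neighbour regression on embedding vectors: the predicted score is the mean score of the k closest training embeddings. It is not the code from sklearn_regressors.py (whose file name suggests it uses scikit-learn, which provides this technique as KNeighborsRegressor); the function name knn_predict and the toy vectors and scores are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a score for `query` as the mean target of its k nearest neighbours.

    train_x: list of embedding vectors (equal-length lists of floats)
    train_y: list of scores, one per training vector
    """
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must be non-empty and equally long")
    k = min(k, len(train_x))

    # Rank training examples by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]

    # Uniformly weighted average of the neighbours' scores.
    return sum(train_y[i] for i in nearest) / k


# Toy data (hypothetical): 2-dimensional "embeddings" with made-up scores.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
scores = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(embeddings, scores, [0.1, 0.1], k=3)
```

With k=3 the far-away fourth point is ignored, so the prediction is the average of the first three scores. Real speech embeddings have far more dimensions, but the procedure is the same.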

(Re-)Creating Plots

If you want to create plots from the resulting predictions (just like the ones seen in the thesis), take a look at the individual plotting scripts inside plot-scripts.

Demo

To evaluate the audio recordings inside wavs/demo, run the script demo.py.

Acknowledgements

The code in the encoder directory, which generates the GE2E embeddings, is forked from Corentin Jemine's Real-Time-Voice-Cloning repository (https://github.com/CorentinJ/Real-Time-Voice-Cloning). A better-documented version of it is available under the name Resemblyzer.
