Topological Data Analysis for Speech Emotion Recognition

Course Project

By Akhmed Sakip, Noor Hussein, Nurdaulet Mukhituly

Downloading & extracting the dataset

The project uses the RAVDESS dataset. To download it, head here and download the dataset. Create a folder named 'RAVDESS' and extract the archive's contents there.
The resulting data paths should look like 'RAVDESS / Actor_** / *.wav'.

Downloading additional data before running the project

To avoid generating persistence images from scratch in generate_persims.py, download the contents (<2 GB) of this folder and place the 2 files (features_ts2pi.txt, labels_ts2pi.csv) directly to the project's folder (alongside with other project files).

Packages to install

The project was tested on Python 3.10.4, so to be on the safe side, it is preferable to use one of the releases of Python 3.10.
Once you have created an environment running Python 3.10, make sure to install the following packages using pip:

NumPy:
pip install numpy

pandas:
pip install pandas

scikit-learn:
pip install scikit-learn

librosa:
pip install librosa

gudhi:
pip install gudhi

giotto-tda:
pip install giotto-tda

tqdm:
pip install tqdm

tensorflow:
pip install tensorflow

Note: if you need GPU acceleration, follow the instructions here. Otherwise, you had better run the TensorFlow code (vgg16_ts2pi.py) on Kaggle, since setting up GPU acceleration on the university's lab computers is complicated and problematic.

Warning: ts2pi cannot be installed alongside the newest versions of the above packages and is thus incompatible with them. If you wish to regenerate persistence images without downloading them from the link provided above, create a new environment and install ONLY the package below, so that it will pull the specific versions by itself.
ts2pi:
pip install ts2pi

Repository contents

The project directory consists of the following Python files:

load_dataset.py - conveniently formats the audio paths and sorts them according to emotions, saving the results to audio_paths.csv. The code for processing the dataset folder is taken from here.

experiments_betti.py - runs classification experiments on Betti curves generated using the gudhi package.

experiments_giottobetti.py - runs classification experiments on Betti curves generated using manual preprocessing and the giotto-tda package.

experiments_landscapes.py - runs classification experiments on persistence landscapes generated using manual preprocessing and the giotto-tda package.

experiments_pairwise.py - runs classification experiments on pairwise distances between persistence diagrams generated using manual preprocessing and the giotto-tda package.

vgg16_ts2pi.py - runs classification experiments by training the VGG16 CNN with persistence images generated using the ts2pi package.

generate_betticurves.py - generates Betti curves for the audio files using the gudhi package by following the approach outlined here. The results are stored in features_gudhi.csv.

generate_giotto.py - generates Betti curves and persistence landscapes for the audio files using manual preprocessing and the giotto-tda package. The results are stored in features_giottobetti.csv and features_landscapes.csv.

generate_pairwise.py - calculates pairwise distances for the persistence diagrams of the audio files using manual preprocessing and the giotto-tda package. The results are stored in features_pairwise.csv.

generate_persims.py - generates persistence images for the audio files using the ts2pi package. The results are stored in features_ts2pi.txt and labels_ts2pi.csv. To download them, refer above.

Replication of experiments

To replicate the classification experiments, simply run the corresponding Python files named experiments_*.py and vgg16_ts2pi.py inside an environment with installed packages. All the sample preprocessed data is already provided in this repository (except for VGG16, which should be downloaded referring to the above instructions), so you do not have to rerun the generate_*.py files.

Warning: the Python files are set to run experiments for 100 train-test splits, which could take considerable time. To reduce the number of splits, simply change the NUM_SPLITS in the files' beginnings to a smaller number.

In case you wish to run the preprocessing files, first run load_dataset.py and proceed to running the generate_*.py files.

References

[1] M. Tlachac, A. Sargent, E. Toto, R. Paffenroth and E. Rundensteiner, "Topological Data Analysis to Engineer Features from Audio Signals for Depression Detection," 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020, pp. 302-307, doi: 10.1109/ICMLA51294.2020.00056.
[2] The GUDHI Project, GUDHI User and Reference Manual. GUDHI Editorial Board, 2015. [Online]. Available: http://gudhi.gforge.inria.fr/doc/latest/
[3] G. Tauzin et al., giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration. 2020. [Online]. Available: https://giotto-ai.github.io/gtda-docs/0.5.1/library.html
[4] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition.” arXiv, 2014. doi: 10.48550/ARXIV.1409.1556.

akhmedsakip / tda-emotion-recognition Goto Github PK

tda-emotion-recognition's Introduction

Topological Data Analysis for Speech Emotion Recognition

Course Project

By Akhmed Sakip, Noor Hussein, Nurdaulet Mukhituly

Downloading & extracting the dataset

Downloading additional data before running the project

Packages to install

Repository contents

Replication of experiments

References

tda-emotion-recognition's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs