GithubHelp home page GithubHelp logo

speech_emotion_recognition's Introduction

End-to-end Speech Emotion Recognition using BLSTMs with self-attention and Multi-domain training

This repository contains the source code for a Speech Emotion Recognition (SER) model built using Tensorflow 1.15 and a set of Python libraries that were used to create the different features of the project like: GUI, sound processing, recording and display of statistics.

This project represents my undergraduate thesis and is only a mere starting point, basis, to which many improvements can be applied.

projPic

The SER architecture uses machine learning models for both feature extraction and classification in an end-to-end setting:

  • For feature extraction I used two CNN layers combined with batch normalization. These were used to automatically detect relevant features from a visual representation of the sound, the Mel-spectrogram.

  • The classification model consists of two BLSTM cells, that were used to take advantage of the temporal relationship between emotions at different time stamps, as the emotional information flows throughout the speech utterance. The attention layer was concatenated to the RNN in order to force the model to focus on the highly emotional frames and, therefore, disregard the ones that lack emotions and would work as noise.

The user can train the model with different configurations and observe statistics (accuracy graph and confusion matrix) during training in the graphical user interface. For inference, the user can classify both pre-recorded speech utterances and ones that are recorded on the spot.

Project structure

The classification model can be found in model.py, while the feature extraction is separated in the feature_extractors/ directory. The feature extraction was implemented in two ways, hand-crafted and end-to-end, but only end-to-end is available for inference, as it gave better results with the current model.

The graphical user interface was created using the PyQt5 library and the source code can be found in graphics/ folder. The recording functionality was created using PyAudio and its implementation can be found in recording/.

Other libraries used were: Librosa (audio processing), qdarkgraystyle (GUI theme), webrtcvad (speech detection during recording).

In the directory best_current_model, one can find a pre-trained model that gave the best results in both testing and inference. However, this model is very sensitive to microphone and room settings and it still suffers from the obstacles of the SER field.

To run the project, one should have both Tensorflow 1.15 and the necessary libraries installed in a Python enviornement. By simply using the command "python SER.py" one can start the graphical user interface and start training or inferring using the current model architecture.

speech_emotion_recognition's People

Contributors

raulsteleac avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.