This project forked from bagustris/dimensional-ser

Repository for my paper: Dimensional Speech Emotion Recognition Using Acoustic Features and Word Embeddings using Multitask Learning

Home Page: https://www.cambridge.org/core/journals/apsipa-transactions-on-signal-and-information-processing/article/dimensional-speech-emotion-recognition-from-speech-features-and-word-embeddings-by-using-multitask-learning/BCBF69FBED76857F84090A2FB58B2498

Dimensional Speech Emotion Recognition from Speech Features and Word Embeddings by Using Multitask Learning

by Bagus Tris Atmaja, Masato Akagi

This paper has been published in APSIPA Transactions on Signal and Information Processing.

Abstract

The majority of research in speech emotion recognition (SER) is conducted to recognize emotion categories. Recognizing dimensional emotion attributes is also important, however, and it has several advantages over categorical emotion. For this research, we investigate dimensional SER using both speech features and word embeddings. The concatenation network joins acoustic networks and text networks from bimodal features. We demonstrate that those bimodal features, both of which are extracted from speech, improve the performance of dimensional SER over unimodal SER using either acoustic features or word embeddings. A significant improvement on the valence dimension is contributed by the addition of word embeddings to the SER system, while the arousal and dominance dimensions are also improved. We propose a multitask learning (MTL) approach for the prediction of all emotional attributes. This MTL maximizes the concordance correlation between predicted emotion degrees and true emotion labels simultaneously. The findings suggest that the use of MTL with two parameters is better than other evaluated methods in representing the interrelation of emotional attributes. In unimodal results, speech features attain higher performance on arousal and dominance, while word embeddings are better for predicting valence. The overall evaluation uses the concordance correlation coefficient score of the three emotional attributes. We also discuss some differences between categorical and dimensional emotion results from psychological and engineering perspectives.
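To make the evaluation metric and the multitask objective in the abstract more concrete, here is a minimal, dependency-free sketch of the concordance correlation coefficient (CCC) and of one plausible two-parameter multitask loss built from it. This is an illustration and not the code from this repository: the function names ccc and mtl_ccc_loss, the dictionary keys, the default values of alpha and beta, and the assignment of alpha to valence and beta to arousal are all assumptions.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    # Population (biased) variances and covariance.
    var_t = fmean((t - mean_t) ** 2 for t in y_true)
    var_p = fmean((p - mean_p) ** 2 for p in y_pred)
    cov = fmean((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # Assumption: return 0.0 when the CCC is undefined (both inputs constant and equal).
    return 2.0 * cov / denom if denom else 0.0


def mtl_ccc_loss(true, pred, alpha=0.5, beta=0.3):
    """Weighted sum of (1 - CCC) over valence, arousal, and dominance.

    alpha and beta are placeholder weights (hypothetical values, not the
    tuned values from the paper); dominance gets the remainder 1 - alpha - beta.
    """
    loss_v = 1.0 - ccc(true["valence"], pred["valence"])
    loss_a = 1.0 - ccc(true["arousal"], pred["arousal"])
    loss_d = 1.0 - ccc(true["dominance"], pred["dominance"])
    return alpha * loss_v + beta * loss_a + (1.0 - alpha - beta) * loss_d
```

Minimizing 1 - CCC for each attribute is equivalent to maximizing the CCC, and the two weights control how much each attribute contributes to the shared objective. In a Keras model the same arithmetic would have to be written with backend tensor operations so that it stays differentiable.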

Software implementation

The algorithm proposed in the paper was implemented using NumPy, Keras (v2.3), and TensorFlow (v1.15).

All source code used to generate the results and figures in the paper is in the code folder. The calculations and figure generation are all run inside Jupyter notebooks. The data used in this study is provided in data, and the sources for the manuscript text and figures are in latex. Results generated by the code are saved in results. See the README.md files in each directory for a full description.

Architecture of the proposed dimensional SER with the main results.

Getting the code

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://github.com/bagustris/dimensional-ser.git

or download a zip archive.

A copy of the paper is also archived at https://doi.org/10.1017/ATSIP.2020.14

Dependencies

You'll need a working Python environment to run the code. The recommended way to set up your environment is through the Anaconda Python distribution which provides the conda package manager. Anaconda can be installed in your user directory and does not interfere with the system Python installation. The required dependencies are specified in the file requirements.txt.

We use Python virtual environments together with pip to manage the project dependencies in isolation. Thus, you can install our dependencies without causing conflicts with your setup (even with different Python versions).

Run the following command in the repository folder (where requirements.txt is located) to create a separate environment:

python3.6 -m venv REPO_NAME

Then activate the environment and install all required dependencies in it with pip install -r requirements.txt.

Reproducing the results

Since the dataset is not included in this repository, the full results are difficult to reproduce. However, the plot in the paper can be reproduced from the CSV file in the data directory.
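As a rough starting point for regenerating a plot from a CSV file, the sketch below reads every numeric column into a list and draws one line per column. The layout of the CSV file in the data directory is not described here, so the function names load_scores and plot_scores, the output file name, and the one-series-per-column layout are hypothetical; matplotlib is assumed to be installed and is imported only when plotting.

```python
import csv


def load_scores(csv_path):
    """Read a CSV with a header row into {column_name: [float, ...]}.

    Cells that cannot be parsed as numbers are skipped.
    """
    columns = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, cell in row.items():
                try:
                    value = float(cell)
                except (TypeError, ValueError):
                    continue  # non-numeric or missing cell
                columns.setdefault(name, []).append(value)
    return columns


def plot_scores(columns, out_path="scores.png"):
    """Draw one line per column and save the figure (hypothetical layout)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, values in columns.items():
        ax.plot(values, marker="o", label=name)
    ax.set_xlabel("row index")
    ax.set_ylabel("score")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

For example, plot_scores(load_scores("data/FILE.csv")) would write scores.png to the current directory, where FILE.csv stands for whichever CSV file the data directory actually contains. The real figure in the paper may use a different chart type and axis labels.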

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE.md for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently published in the journal APSIPA Transactions on Signal and Information Processing.

Citation

B. T. Atmaja and M. Akagi, “Dimensional speech emotion recognition from speech 
features and word embeddings by using multitask learning,” APSIPA Transactions 
on Signal and Information Processing, vol. 9, p. e17, 2020.
