
This project forked from david-yoon/multimodal-speech-emotion


TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text", IEEE SLT-18

Home Page: https://arxiv.org/abs/1810.04635

License: MIT License


multimodal-speech-emotion

This repository contains the source code used in the following paper,

Multimodal Speech Emotion Recognition using Audio and Text, IEEE SLT-18, [paper]


[requirements]

tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
python==2.7
scikit-learn==0.20.0
nltk==3.3

[download data corpus]

  • IEMOCAP [link] [paper]
  • download the IEMOCAP data from its original webpage (a license agreement is required)

[preprocessed-data schema (our approach)]

  • for preprocessing, refer to the code in "./preprocessing"
  • If you want to download the preprocessed corpus from us directly, please send us an email after obtaining the license from the IEMOCAP team.
  • We cannot publish the ASR-processed transcriptions due to a licensing issue (a commercial API was used); however, it should be moderately easy to extract ASR transcripts from the audio signal yourself (we used the Google Cloud Speech API).
  • Examples

    MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy)
    MFCC-SEQN : valid length of each audio sequence (ex. train_seqN.npy)
    PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy)
    LABEL : target label of the audio signal (ex. train_label.npy)
    TRANS : sequences of transcription (indexed) of the data (ex. train_nlp_trans.npy)
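As a hedged illustration of the schema above, the preprocessed arrays can be loaded and sanity-checked with NumPy. The file names come from the schema; all array shapes here are hypothetical and depend on the actual preprocessing configuration:

```python
import os
import tempfile

import numpy as np

# Hypothetical shapes, for illustration only -- the real dimensions depend
# on the preprocessing configuration (e.g. number of MFCC coefficients).
n, t_audio, n_mfcc = 4, 750, 39        # utterances, max frames, MFCC dims

mfcc = np.zeros((n, t_audio, n_mfcc), dtype=np.float32)   # train_audio_mfcc.npy
seq_n = np.array([750, 512, 640, 300])                    # train_seqN.npy
label = np.array([0, 1, 2, 3])                            # train_label.npy

# Round-trip through .npy, the format the training scripts consume.
path = os.path.join(tempfile.mkdtemp(), "train_audio_mfcc.npy")
np.save(path, mfcc)
loaded = np.load(path)

assert loaded.shape == (n, t_audio, n_mfcc)
assert seq_n.max() <= t_audio and len(label) == n
```

The SEQN array lets the recurrent encoders ignore the zero-padding beyond each utterance's valid length.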

[source code]

  • the repository contains code for the following models:

    Audio Recurrent Encoder (ARE)
    Text Recurrent Encoder (TRE)
    Multimodal Dual Recurrent Encoder (MDRE)
    Multimodal Dual Recurrent Encoder with Attention (MDREA)
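As a minimal NumPy sketch of the MDRE idea (not the paper's TensorFlow implementation; all dimensions and weights below are illustrative), each modality is run through its own recurrent encoder, and the two final hidden states are concatenated before a softmax classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
h_dim, n_classes = 16, 4

def rnn_encode(seq, w_x, w_h):
    """Vanilla tanh RNN; returns the final hidden state as the encoding."""
    h = np.zeros(h_dim)
    for x in seq:
        h = np.tanh(x @ w_x + h @ w_h)
    return h

# Illustrative random inputs: 10 audio frames (39-d MFCC), 5 word vectors (50-d).
audio = rng.standard_normal((10, 39))
text = rng.standard_normal((5, 50))

h_audio = rnn_encode(audio, 0.1 * rng.standard_normal((39, h_dim)),
                     0.1 * rng.standard_normal((h_dim, h_dim)))
h_text = rnn_encode(text, 0.1 * rng.standard_normal((50, h_dim)),
                    0.1 * rng.standard_normal((h_dim, h_dim)))

# MDRE: concatenate the two modality encodings, then classify.
h = np.concatenate([h_audio, h_text])                        # shape (2 * h_dim,)
logits = h @ (0.1 * rng.standard_normal((2 * h_dim, n_classes)))
probs = np.exp(logits) / np.exp(logits).sum()

assert probs.shape == (n_classes,)
assert abs(probs.sum() - 1.0) < 1e-9
```

ARE and TRE correspond to classifying from `h_audio` or `h_text` alone; MDREA additionally uses the audio encoding to attend over the text encoder's per-step states rather than taking only its final state.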


[training]

  • refer to "reference_script.sh"
  • the final result will be stored in "./TEST_run_result.txt"

[cite]

  • Please cite our paper when you use our code, model, or dataset:

    @inproceedings{yoon2018multimodal,
      title={Multimodal Speech Emotion Recognition Using Audio and Text},
      author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin},
      booktitle={2018 IEEE Spoken Language Technology Workshop (SLT)},
      pages={112--118},
      year={2018},
      organization={IEEE}
    }
