MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations (ACL 2023)

Overview

This repository is the PyTorch implementation of the ACL 2023 paper "MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations". In this work, we propose a novel attention-based correlation-aware multimodal fusion framework named MultiEMO, which effectively integrates multimodal cues by capturing cross-modal mapping relationships across textual, audio and visual modalities based on bidirectional multi-head cross-attention layers.
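
To make the idea of bidirectional cross-attention concrete, here is a minimal, dependency-free sketch. It is not the code in this repository: it uses a single head, no learned projections, and plain Python lists, whereas the paper describes multi-head layers implemented in PyTorch. The function names (`cross_attention`, `bidirectional_cross_attention`) and the toy feature values are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per utterance, one column per feature


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (illustrative only).

    Each query row attends over the rows of `context`, which serve as
    both keys and values. Assumes both matrices share one feature size.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * row[j] for w, row in zip(weights, context))
            for j in range(len(context[0]))
        ])
    return attended


def bidirectional_cross_attention(a: Matrix, b: Matrix) -> Tuple[Matrix, Matrix]:
    """Run attention in both directions: a attends to b, and b attends to a."""
    return cross_attention(a, b), cross_attention(b, a)


# Toy features (hypothetical values): 3 textual and 2 audio utterance vectors.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0]]
text_from_audio, audio_from_text = bidirectional_cross_attention(text, audio)
print(len(text_from_audio), len(audio_from_text))
```

Each output keeps the sequence length of its query modality, so the two directions can be combined with the original features downstream. How MultiEMO combines them is described in the paper and is not reproduced here.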

Quick Start

Clone the repository

git clone https://github.com/TaoShi1998/MultiEMO-ACL2023.git

Environment setup

# Environment: Python 3.6.8 + Torch 1.10.0 + CUDA 11.3
# Hardware: single RTX 3090 GPU, 256GB RAM
conda create --name MultiEMOEnv python=3.6
conda activate MultiEMOEnv

Install dependencies

cd MultiEMO-ACL2023
pip install -r requirements.txt

Run the model

# IEMOCAP Dataset
bash Train/TrainMultiEMO_IEMOCAP.sh

# MELD Dataset
bash Train/TrainMultiEMO_MELD.sh

Citation

If you find our work helpful to your research, please cite our paper as follows.

@inproceedings{shi-huang-2023-multiemo,
    title = "{M}ulti{EMO}: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations",
    author = "Shi, Tao  and
      Huang, Shao-Lun",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.824",
    doi = "10.18653/v1/2023.acl-long.824",
    pages = "14752--14766",
    abstract = "Emotion Recognition in Conversations (ERC) is an increasingly popular task in the Natural Language Processing community, which seeks to achieve accurate emotion classifications of utterances expressed by speakers during a conversation. Most existing approaches focus on modeling speaker and contextual information based on the textual modality, while the complementarity of multimodal information has not been well leveraged, few current methods have sufficiently captured the complex correlations and mapping relationships across different modalities. Furthermore, existing state-of-the-art ERC models have difficulty classifying minority and semantically similar emotion categories. To address these challenges, we propose a novel attention-based correlation-aware multimodal fusion framework named MultiEMO, which effectively integrates multimodal cues by capturing cross-modal mapping relationships across textual, audio and visual modalities based on bidirectional multi-head cross-attention layers. The difficulty of recognizing minority and semantically hard-to-distinguish emotion classes is alleviated by our proposed Sample-Weighted Focal Contrastive (SWFC) loss. Extensive experiments on two benchmark ERC datasets demonstrate that our MultiEMO framework consistently outperforms existing state-of-the-art approaches in all emotion categories on both datasets, the improvements in minority and semantically similar emotions are especially significant.",
}

multiemo's Issues

Dataset processing

Hello, would you be able to provide the code for the data processing part? How are these ".pkl" files generated?

visual feature process

I would like to utilize your VisExtNet for reproducing the extraction of visual features, but I've encountered a few issues. I noticed that the resnet_weight_path file path is not present in your code. Could you please guide me on where to locate it? Additionally, I would like to confirm whether video_path refers to the raw video data.

About audio feature extraction.

Hi, dear author
In your paper, you mentioned that the proposed VisExtNet is made up of an MTCNN and a ResNet-101 pre-trained on VGGFace2. I wonder where I can find the ResNet-101 weights pre-trained on VGGFace2. Could you please provide a link to the pretrained model? It seems that the ResNet-101 in your repo is self-defined.

data processing

Thanks for releasing this good work.
I wonder how to preprocess the multimodal data.
Could you please provide the corresponding code?
