joannahong / av-relscore

Audio-visual corruption modeling code for our CVPR 2023 paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring"



AV-RelScore

This code is part of the paper: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring accepted at CVPR 2023.

Overview

This repository provides the audio-visual corruption modeling code for testing audio-visual speech recognition on the LRS2 and LRS3 datasets. A video demo is available here.

Prerequisites

  1. Python >= 3.6
  2. Clone this repository.
  3. Install the Python requirements:
pip install -r requirements.txt
  4. Download the LRS2-BBC and LRS3-TED datasets.
  5. Download the LRS2 and LRS3 landmarks from the Visual Speech Recognition for Multiple Languages repository.
  6. Download coco_object.7z from here, extract it, and place object_image_sr and object_mask_x4 in the occlusion_patch folder.
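A quick check of the setup steps above can catch a misplaced folder before a long generation run. This is a sketch under assumptions: `check_setup` and `REQUIRED_SUBDIRS` are hypothetical names, and it assumes the occlusion_patch folder sits at the repository root. It only checks the Python version and the two occlusion folders, not the datasets or landmarks.

```python
import sys
from pathlib import Path
from typing import List

# Hypothetical helper, not part of this repository. It assumes the
# occlusion_patch folder sits at the repository root.
REQUIRED_SUBDIRS = ("object_image_sr", "object_mask_x4")


def check_setup(repo_root: str = ".") -> List[str]:
    """Return a list of setup problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    patch_dir = Path(repo_root) / "occlusion_patch"
    for name in REQUIRED_SUBDIRS:
        if not (patch_dir / name).is_dir():
            problems.append("missing folder: {}".format(patch_dir / name))
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("setup problem:", issue)
    if not issues:
        print("occlusion_patch layout looks OK")
```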

Audio-Visual corruption modeling

  • We use babble noise from NOISEX-92 for audio corruption modeling.
  • The occlusion patches for visual corruption modeling are taken from this paper.
  • Extract the audio of the LRS2 and LRS3 videos into separate .wav files.
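One common way to extract the .wav files is with ffmpeg. The sketch below rests on assumptions this README does not state: ffmpeg is on PATH, the videos are .mp4 files, 16 kHz mono 16-bit PCM is an acceptable format, and each .wav should sit next to its video. Check these against LRS2_audio_gen.py and LRS3_audio_gen.py before relying on the output. `ffmpeg_cmd` and `extract_all` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumptions (not from this repository): ffmpeg is on PATH, inputs are .mp4,
# and 16 kHz mono 16-bit PCM WAV is the desired output.
SAMPLE_RATE = 16000


def ffmpeg_cmd(video: Path, wav: Path, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(video),
        "-vn",                     # ignore the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample
        "-acodec", "pcm_s16le",    # 16-bit PCM
        str(wav),
    ]


def extract_all(data_dir: str) -> None:
    """Write a .wav next to every .mp4 found under data_dir (hypothetical layout)."""
    for video in sorted(Path(data_dir).rglob("*.mp4")):
        subprocess.run(ffmpeg_cmd(video, video.with_suffix(".wav")), check=True)
```

For example, `extract_all("<DATA-DIRECTORY-PATH>")` walks the dataset directory recursively and stops at the first ffmpeg failure because of `check=True`.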

Audio corruption modeling

  • LRS2
python LRS2_audio_gen.py --split_file <SPLIT-FILENAME-PATH> \
                         --LRS2_main_dir <DATA-DIRECTORY-PATH> \
                         --LRS2_save_loc <OUTPUT-DIRECTORY-PATH> \
                         --babble_noise <BABBLE-NOISE-LOCATION>
  • LRS3
python LRS3_audio_gen.py --split_file <SPLIT-FILENAME-PATH> \
                         --LRS3_test_dir <DATA-DIRECTORY-PATH> \
                         --LRS3_save_loc <OUTPUT-DIRECTORY-PATH> \
                         --babble_noise <BABBLE-NOISE-LOCATION>
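To make "audio corruption with babble noise" concrete, here is a generic sketch of mixing a noise recording into clean speech at a target signal-to-noise ratio (SNR). It is not the procedure in LRS2_audio_gen.py or LRS3_audio_gen.py: the SNR levels, cropping, and scaling those scripts use are not described here. `mix_at_snr` is a hypothetical name, and both signals are assumed to be 1-D arrays at the same sample rate.

```python
import numpy as np

# Generic additive-noise mixing for illustration only; it does NOT reproduce
# the SNR levels or cropping used by LRS2_audio_gen.py / LRS3_audio_gen.py.


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: np.random.Generator) -> np.ndarray:
    """Add a random crop of `noise` to `speech` so the mixture has the given SNR in dB."""
    speech = speech.astype(np.float64)
    noise = noise.astype(np.float64)
    if len(noise) < len(speech):
        # Tile short noise recordings so they cover the whole utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    # Random start so different utterances get different noise segments.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Choose the scale so that 10*log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

The scale factor follows directly from the SNR definition: requiring 10·log10(P_speech / (scale² · P_noise)) = SNR gives scale = sqrt(P_speech / (P_noise · 10^(SNR/10))).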

Visual corruption modeling

  • LRS2
python LRS2_gen.py --split_file <SPLIT-FILENAME-PATH> \
                   --LRS2_main_dir <DATA-DIRECTORY-PATH> \
                   --LRS2_landmark_dir <LANDMARK-DIRECTORY-PATH> \
                   --LRS2_save_loc <OUTPUT-DIRECTORY-PATH> \
                   --occlusion <OCCLUSION-LOCATION> \
                   --occlusion_mask <OCCLUSION-MASK-LOCATION>
  • LRS3
python LRS3_gen.py --split_file <SPLIT-FILENAME-PATH> \
                   --LRS3_test_dir <DATA-DIRECTORY-PATH> \
                   --LRS3_landmark_dir <LANDMARK-DIRECTORY-PATH> \
                   --LRS3_save_loc <OUTPUT-DIRECTORY-PATH> \
                   --occlusion <OCCLUSION-LOCATION> \
                   --occlusion_mask <OCCLUSION-MASK-LOCATION>
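Visual corruption overlays occlusion patches (object_image_sr) using their masks (object_mask_x4). The sketch below shows generic mask-based alpha blending of one patch onto one frame with NumPy. It is an illustration, not the logic of LRS2_gen.py or LRS3_gen.py: those scripts also take landmark files, presumably to position and size the patches, whereas the hypothetical `paste_occlusion` simply receives a top-left corner. It assumes an H x W x 3 uint8 frame and a single-channel mask with values from 0 to 255.

```python
import numpy as np

# Illustrative only: patch placement is a given (top, left) corner here,
# not the landmark-based placement the repository's scripts may use.


def paste_occlusion(frame: np.ndarray, patch: np.ndarray, mask: np.ndarray,
                    top: int, left: int) -> np.ndarray:
    """Alpha-blend `patch` onto `frame` using `mask`, clipping at the frame borders.

    frame: H x W x 3 uint8, patch: h x w x 3 uint8, mask: h x w uint8 (0..255).
    """
    out = frame.astype(np.float32)
    h, w = mask.shape[:2]
    H, W = frame.shape[:2]
    # Intersection of the patch rectangle with the frame.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + h, H), min(left + w, W)
    if y0 >= y1 or x0 >= x1:
        return frame.copy()  # patch lies completely outside the frame
    # Matching offsets inside the patch and mask.
    py0, px0 = y0 - top, x0 - left
    ph, pw = y1 - y0, x1 - x0
    alpha = mask[py0:py0 + ph, px0:px0 + pw].astype(np.float32)[..., None] / 255.0
    src = patch[py0:py0 + ph, px0:px0 + pw].astype(np.float32)
    out[y0:y1, x0:x1] = alpha * src + (1.0 - alpha) * out[y0:y1, x0:x1]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Soft mask values blend the patch edges into the face instead of leaving a hard rectangular border.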

Test datasets

Note that the corrupted data you generate may differ from the corrupted test datasets used in our experiments. Because the audio-visual corruption modeling relies on random functions, the results may not be identical across devices.
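If you need to regenerate the same corrupted data repeatedly on one machine, seeding the global random generators before running a script may help. This sketch assumes the scripts draw randomness from Python's `random` module and/or NumPy's global generator and do not reseed them; `seed_everything` and `run_seeded` are hypothetical helpers. It runs a script in the current process with `runpy` so the seeded state is shared. It does not remove the cross-device differences described above and will not reproduce the actual test sets used in the paper.

```python
import random
import runpy
import sys
from typing import Any, Dict, List

import numpy as np

# Assumption: the generation scripts use Python's `random` module and/or
# NumPy's global RNG and do not reseed them. Other randomness sources are
# not controlled by this helper.


def seed_everything(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


def run_seeded(script: str, args: List[str], seed: int) -> Dict[str, Any]:
    """Seed the global RNGs, then run `script` as __main__ in this process.

    Returns the script's module globals, as runpy.run_path does.
    """
    seed_everything(seed)
    old_argv = sys.argv
    sys.argv = [script] + list(args)  # the script parses these as its CLI args
    try:
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = old_argv  # restore the caller's arguments
```

For example, `run_seeded("LRS3_audio_gen.py", ["--split_file", "<SPLIT-FILENAME-PATH>", "--LRS3_test_dir", "<DATA-DIRECTORY-PATH>", "--LRS3_save_loc", "<OUTPUT-DIRECTORY-PATH>", "--babble_noise", "<BABBLE-NOISE-LOCATION>"], seed=0)` passes the same arguments as the command line above. Running it from the repository root may be needed so that the script's local imports resolve.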

For fair comparisons, please request the actual test datasets from us ([email protected]).

Acknowledgement

We use the dataset landmarks from Visual Speech Recognition for Multiple Languages and the visual occlusion patches from Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets. We thank the authors for their excellent work.

Citation

If you find AV-RelScore useful in your research, please cite our paper.

@article{hong2023watch,
  title={Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring},
  author={Hong, Joanna and Kim, Minsu and Choi, Jeongsoo and Ro, Yong Man},
  journal={arXiv preprint arXiv:2303.08536},
  year={2023}
}


