
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Home Page: https://v2a-mapper.github.io/

License: Other

audio audio-generation image-to-audio video-to-audio vision-to-audio aaai2024


For benchmarking purposes, this repo hosts the generated test samples from "V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models", AAAI 2024. ([arXiv] [project])

Authors: Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, and Weidong Cai from the University of Sydney and Dolby Laboratories.


Main Results

Compared to the previous methods Im2Wav and CLIPSonic, our V2A-Mapper is trained with 86% fewer parameters yet achieves 53% and 19% improvements in Fréchet Distance (FD, fidelity) and CLIP-score (CS, relevance), respectively.
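For readers unfamiliar with the fidelity metric, FD compares the Gaussian statistics (mean and covariance) of embedding sets from real and generated audio. A minimal sketch, assuming the audio clips have already been mapped to fixed-size feature vectors by some audio embedding network (the encoder itself is not shown here):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet Distance between Gaussians fitted to two embedding sets.

    Each input is an (n_samples, dim) array of audio embeddings.
    FD = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrtm(S1 @ S2))
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; numerical noise can
    # introduce a tiny imaginary component, which we discard.
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```

Lower is better: identical embedding sets give an FD near zero, and any shift in mean or covariance increases it.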

VGGSound

VGGSound contains 199,176 10-second video clips extracted from YouTube videos with audio-visual correspondence. Following the original train/test split, we evaluate performance on the 15,446 test samples. Our generated test samples (~5 GB) for VGGSound can be downloaded from here.

ImageHear

To test the generalization ability of our V2A-Mapper, we also evaluate on the out-of-distribution dataset ImageHear, which contains 101 images from 30 visual classes (2-8 images per class). Our generated test samples (~33 MB) for ImageHear can be downloaded from here.
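The relevance metric (CS) reported above reduces to a cosine similarity between paired vision and audio embeddings in a shared space. A minimal sketch, assuming the image and audio embeddings have already been produced by contrastively aligned encoders (e.g. a CLIP-style image encoder and a matching audio encoder; the encoder calls are omitted as they are not part of this repo):

```python
import numpy as np

def clip_score(image_embs: np.ndarray, audio_embs: np.ndarray) -> float:
    """Mean cosine similarity over paired (image, audio) embeddings.

    Both inputs are (n_pairs, dim) arrays; row i of each array is
    assumed to describe the same test sample.
    """
    a = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    b = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors = cosine similarity per pair.
    return float((a * b).sum(axis=1).mean())
```

Higher is better: perfectly aligned pairs score 1.0, unrelated pairs hover near 0.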

Custom Datasets

If you need sample results from V2A-Mapper for your own datasets, we are happy to generate them for you. Please send a request to [email protected] and [email protected].

Citation

If you find our work helpful in your research, please cite our paper:

@inproceedings{v2a-mapper,
  title     = {V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models},
  author    = {Wang, Heng and Ma, Jianbo and Pascual, Santiago and Cartwright, Richard and Cai, Weidong},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2024},
}

Contact

If you have any questions or suggestions about this repo, please feel free to contact me at [email protected].

