Binding Text, Images, Graphs, and Audio for Music Representation Learning

This repo is a work in progress. It contains the code for inference and evaluation for the paper "Binding Text, Images, Graphs, and Audio for Music Representation Learning".

The current state of this repo is not ideal. To help you navigate checkpoints and inference, please refer to the following sheet temporarily while we prepare the repo. The code for embedding text and images is available in the scripts folder. For audio embeddings, code is available here; for graph embeddings, code is available here.

N.B. Fairouz is the codename given to the model we envisioned. This is one iteration, hopefully one of many; it covers part of our vision, but nowhere near the full scope of what we aim to do with Fairouz.

Abstract

In the fields of Information Retrieval and Natural Language Processing, text embeddings play a significant role in tasks such as classification, clustering, and topic modeling. However, extending these embeddings to abstract concepts such as music, which involves multiple modalities, presents a unique challenge. Our work addresses this challenge by integrating rich multi-modal data into a unified joint embedding space. This space includes textual, visual, acoustic, and graph-based modality features. By doing so, we mirror cognitive processes associated with music interaction and overcome the disjoint nature of individual modalities. The resulting joint low-dimensional vector space facilitates retrieval, clustering, embedding space arithmetic, and cross-modal retrieval tasks. Importantly, our approach carries implications for music information retrieval and recommendation systems. Furthermore, we propose a novel multi-modal model that integrates various data types (text, images, graphs, and audio) for music representation learning. Our model aims to capture the complex relationships between different modalities, enhancing the overall understanding of music. By combining textual descriptions, visual imagery, graph-based structures, and audio signals, we create a comprehensive representation that can be leveraged for a wide range of music-related tasks. Notably, our model demonstrates promising results in music classification and recommendation systems.
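To make the idea of cross-modal retrieval and embedding space arithmetic in a joint space more concrete, here is a minimal sketch in plain Python. It is not the code from this repo: the function names `cosine` and `retrieve`, the track names, and the 3-dimensional vectors are all hypothetical stand-ins for real joint-space embeddings, and cosine similarity is assumed as the ranking measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, index, k=1):
    """Return the k keys of `index` whose vectors are most similar to `query`."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


# Toy vectors standing in for audio embeddings in the joint space (made-up values).
audio_index = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.1, 0.9, 0.1],
    "track_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text description, assumed to live in the same space.
text_query = [0.8, 0.2, 0.1]

# Cross-modal retrieval: rank audio items against a text query.
top = retrieve(text_query, audio_index, k=2)

# Embedding arithmetic: move the query along the direction track_a -> track_b,
# then look up the nearest audio item to the shifted point.
shifted = [
    q - a + b
    for q, a, b in zip(text_query, audio_index["track_a"], audio_index["track_b"])
]
nearest_after_shift = retrieve(shifted, audio_index, k=1)

print(top, nearest_after_shift)
```

With these toy values the text query ranks `track_a` first and `track_b` second, and the shifted query lands nearest to `track_b`. In practice the vectors would come from the model's encoders, and a vector index would replace the linear scan.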

Nomic Maps

Text Embedding Maps

Image Embedding Maps

Graph Embedding Maps

Audio Embedding Maps

Multimodal Embedding Maps
