
a-tabaza / binding_music


Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning

Python 100.00%
joint-embedding multimodal-deep-learning music-information-retrieval

binding_music's Introduction

Binding Text, Images, Graphs, and Audio for Music Representation Learning

This repo is a work in progress. It contains the inference and evaluation code for the paper Binding Text, Images, Graphs, and Audio for Music Representation Learning.

The current state of this repo is not ideal. To help you navigate the checkpoints and inference code, please refer to the following sheet temporarily while we prepare this repo. The code for embedding Text and Images is available in the scripts folder. For Audio Embeddings, code is available here; for Graph Embeddings, code is available here.

N.B. Fairouz is the codename of the model we envisioned. This is one iteration, hopefully of many; it covers part of our vision, but nowhere near the full scope of what we aim to do with Fairouz.

Abstract

In the field of Information Retrieval and Natural Language Processing, text embeddings play a significant role in tasks such as classification, clustering, and topic modeling. However, extending these embeddings to abstract concepts such as music, which involves multiple modalities, presents a unique challenge. Our work addresses this challenge by integrating rich multi-modal data into a unified joint embedding space. This space includes textual, visual, acoustic, and graph-based modality features. By doing so, we mirror cognitive processes associated with music interaction and overcome the disjoint nature of individual modalities. The resulting joint low-dimensional vector space facilitates retrieval, clustering, embedding space arithmetic, and cross-modal retrieval tasks. Importantly, our approach carries implications for music information retrieval and recommendation systems. Furthermore, we propose a novel multi-modal model that integrates various data types (text, images, graphs, and audio) for music representation learning. Our model aims to capture the complex relationships between different modalities, enhancing the overall understanding of music. By combining textual descriptions, visual imagery, graph-based structures, and audio signals, we create a comprehensive representation that can be leveraged for a wide range of music-related tasks. Notably, our model demonstrates promising results in music classification and recommendation systems.
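
To make the retrieval and embedding-arithmetic use cases from the abstract concrete, here is a minimal sketch of how vectors in a joint space could be queried. It is not the code from this repo: the function names (`l2_normalize`, `retrieve`), the 8-dimensional random vectors standing in for real embeddings, and the choice of cosine similarity as the ranking metric are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the row indices of the k vectors in `index` closest to `query`."""
    scores = l2_normalize(index) @ l2_normalize(query)
    return np.argsort(-scores)[:k]


# Toy data: random stand-ins, NOT outputs of the actual model.
rng = np.random.default_rng(0)
dim = 8
audio_index = rng.normal(size=(5, dim))  # hypothetical audio embeddings of 5 tracks

# Cross-modal retrieval: a hypothetical text embedding that lies close to
# track 2 in the joint space is used to search the audio index.
text_query = audio_index[2] + 0.01 * rng.normal(size=dim)
top = retrieve(text_query, audio_index, k=3)
print("closest audio tracks to the text query:", top)

# Embedding arithmetic: add two unit-length track vectors and look up
# which tracks sit nearest to the combined direction.
combined = l2_normalize(audio_index[0]) + l2_normalize(audio_index[1])
neighbours = retrieve(combined, audio_index, k=2)
print("closest tracks to the combined vector:", neighbours)
```

Because every modality maps into the same space, the same `retrieve` call would work whether the query comes from text, an image, a graph node, or audio; only the encoder producing the query vector changes.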

Nomic Maps

Text Embedding Maps

Image Embedding Maps

Graph Embedding Maps

Audio Embedding Maps

Multimodal Embedding Maps

