hdparmar / tradifusion

Tradi-fusion Refined: Evaluating and Fine-tuning the Riffusion Model for Irish Traditional Music.

Home Page: https://hdparmar.github.io/Tradifusion/

License: Apache License 2.0

Languages: Jupyter Notebook 99.54%, Python 0.44%, HTML 0.01%
Topics: diffusion-models, machine-learning, music, riffusion


Tradifusion

Tradifusion Refined: Evaluating and Fine-tuning the Riffusion Model for Irish Traditional Music.

Check out the samples generated by the model at hdparmar.github.io/Tradifusion

Focus of the project

The project investigates the following research questions, the first being the main one:

  • Can the Riffusion model produce good results when generating Irish traditional music?
    • Yeaapp!!
  • How close can we get with the Riffusion model?
    • Pretty close!
  • What challenges are involved in fine-tuning the model for Irish traditional music?
    • Dataset creation, the training time needed for good results, and compute resources.
  • Can the fine-tuned model generate Irish traditional music that is comparable in quality to human-composed Irish traditional music? If not, why not?
    • It can produce music similar to Irish traditional music: comparable, yes, but not of the same quality as human-composed music.

The project is relevant to the fields of music technology, culture, and generative AI, in particular to researchers, practitioners, and enthusiasts exploring the possibilities, applications, and limitations of AI-generated music.

Figure: Visualization of the diffusion process on Irish traditional tune spectrograms, from step 0 to step 50 (top row: 'Forward Process'; bottom row: 'Reverse Process').
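The forward (noising) process shown in the figure can be sketched with the standard DDPM closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is an illustration, not the project's code: it assumes Stable Diffusion v1's scaled-linear beta schedule (beta_start=0.00085, beta_end=0.012, 1000 training steps) and adds noise directly to a spectrogram-sized array for clarity, whereas Stable Diffusion (and therefore Riffusion) adds noise to VAE latents. The function names are hypothetical.

```python
import numpy as np

def scaled_linear_alpha_bars(num_train_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products alpha_bar_t for a scaled-linear beta schedule.

    Assumed: Stable Diffusion v1 defaults; illustration only, not the project's code.
    """
    # Betas are spaced linearly in sqrt space, then squared.
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_steps) ** 2
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
# Stand-in for a 512 x 512 spectrogram, rescaled from [0, 1] to [-1, 1].
x0 = rng.random((512, 512)) * 2.0 - 1.0
alpha_bars = scaled_linear_alpha_bars()

for t in (0, 250, 500, 999):
    xt = add_noise(x0, t, alpha_bars, rng)
    # Correlation with the clean input drops as more noise is mixed in.
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(f"t={t:4d}  alpha_bar={alpha_bars[t]:.4f}  corr(x0, x_t)={corr:.3f}")
```

The reverse process is what the fine-tuned model learns: starting from noise, it predicts and removes ε step by step.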

Testing the Inference pipeline ▶️

from inference import TradifusionPipeline

# Load the fine-tuned checkpoint (hosted on Hugging Face)
pipeline = TradifusionPipeline.load_checkpoint("hdparmar/tradfusion-v2")

# Set your start and end prompts
start_prompt = "An Irish traditional tune"
end_prompt = "An Irish traditional tune with acoustic fiddle lead"

# Generate a single mel-spectrogram image based on the start and end prompts
generated_image = pipeline.tradfuse(start_prompt,
                                    end_prompt,
                                    num_inference_steps=50,
                                    alpha=0.5)

# Generate audio from the prompts, with num_steps interpolation steps between them.
# NOTE: Inference on CPU can take a long time, on average about 8 minutes per image and audio clip.
generated_audio = pipeline.txt2audio_tradfusion(start_prompt,
                                                end_prompt,
                                                num_steps=2)

# All generated images, audio, and the combined audio from the interpolation are saved in the local directory.
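The alpha argument blends the start and end prompts. In the upstream Riffusion pipeline, text embeddings are interpolated linearly and the initial noise latents with spherical linear interpolation (slerp); assuming tradfuse follows the same pattern, a minimal NumPy sketch of both is shown below. The names and shapes are hypothetical stand-ins: random arrays replace real CLIP embeddings (77 x 768 in Stable Diffusion v1) and latents (4 x 64 x 64 for a 512 x 512 image).

```python
import numpy as np

def lerp(alpha, a, b):
    """Linear interpolation: returns a at alpha=0 and b at alpha=1."""
    return a + alpha * (b - a)

def slerp(alpha, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two arrays treated as flat vectors."""
    cos_theta = np.sum(v0 * v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    if abs(cos_theta) > dot_threshold:
        # Nearly parallel vectors: fall back to lerp to avoid dividing by sin(theta) ~ 0.
        return lerp(alpha, v0, v1)
    theta = np.arccos(cos_theta)
    s0 = np.sin((1.0 - alpha) * theta) / np.sin(theta)
    s1 = np.sin(alpha * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

rng = np.random.default_rng(42)
# Hypothetical stand-ins for prompt embeddings and initial noise latents.
embed_start, embed_end = rng.standard_normal((2, 77, 768))
noise_start, noise_end = rng.standard_normal((2, 4, 64, 64))

alpha = 0.5  # assumed to play the same role as alpha in pipeline.tradfuse
text_embedding = lerp(alpha, embed_start, embed_end)
init_latents = slerp(alpha, noise_start, noise_end)

# slerp keeps the noise at roughly its original norm; lerp shrinks it.
print(np.linalg.norm(noise_start),
      np.linalg.norm(lerp(alpha, noise_start, noise_end)),
      np.linalg.norm(init_latents))
```

Slerp is the usual choice for Gaussian noise because high-dimensional noise vectors concentrate near a fixed norm; a plain linear blend of two of them produces a lower-variance latent that the model rarely saw during training.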

Training 🏋🏽

The model is trained on 512 x 512 spectrograms of recordings of Irish traditional music.

Main dataset card (Hugging Face): hdparmar/irish-traditional-tunes.
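Riffusion-style models treat a spectrogram as an ordinary image. The sketch below shows a round trip between a magnitude spectrogram and an 8-bit 512 x 512 image. It is not the project's code: it assumes settings loosely following Riffusion's image conversion (a 10 ms hop, power-law compression with exponent 0.25, and a vertical flip so low frequencies sit at the bottom), and the function names are hypothetical. The actual dataset was built with riffusion-manilab, whose exact settings may differ. Under the 10 ms assumption, a 512-pixel-wide image covers 512 × 10 ms = 5.12 s of audio.

```python
import numpy as np

HOP_MS = 10.0           # assumed STFT hop (one image column per hop)
POWER_FOR_IMAGE = 0.25  # assumed power-law compression exponent

def clip_seconds(width_px, hop_ms=HOP_MS):
    """Audio duration covered by a spectrogram image with one column per hop."""
    return width_px * hop_ms / 1000.0

def spectrogram_to_image(spec, power=POWER_FOR_IMAGE):
    """Map a non-negative magnitude spectrogram (freq x time) to uint8 pixels.

    Returns the image and the max value needed to undo the normalisation.
    """
    max_value = float(spec.max())
    data = (spec / max_value) ** power  # compress the dynamic range into [0, 1]
    data = np.round(data * 255).astype(np.uint8)
    return np.flipud(data), max_value   # flip so low frequencies are at the bottom

def image_to_spectrogram(image, max_value, power=POWER_FOR_IMAGE):
    """Approximate inverse of spectrogram_to_image (8-bit quantisation is lossy)."""
    data = np.flipud(image).astype(np.float64) / 255.0
    return (data ** (1.0 / power)) * max_value

rng = np.random.default_rng(0)
# Stand-in for a spectrogram with 512 frequency bins and 512 time frames.
spec = rng.gamma(shape=2.0, scale=1.0, size=(512, 512))
img, max_value = spectrogram_to_image(spec)
recovered = image_to_spectrogram(img, max_value)
print(img.shape, img.dtype, clip_seconds(img.shape[1]))
```

The power curve matters because raw magnitudes span several orders of magnitude; without compression, most pixels of an 8-bit image would be black. Turning an image back into audio also needs phase reconstruction (e.g. Griffin-Lim), which this sketch leaves out.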

Fine-tuning was done on multiple GPUs (an NVIDIA GeForce RTX 3090 for inference and RTX 6000 Ada GPUs for training) using the NVIDIA NGC TensorFlow container.

Using the container avoids errors with the cuDNN library and errors about not finding the libdevice library under /usr/local/cuda. It also ensures that the TensorFlow version is matched with compatible CUDA and cuDNN versions.
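As a quick sanity check inside the container (not part of the repo), the snippet below reports which CUDA and cuDNN versions the installed TensorFlow was built against and which GPUs it can see. It assumes TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available, and reports `None` for the version when TensorFlow is not installed. The function name is hypothetical.

```python
def tensorflow_build_report():
    """Summarise TensorFlow's CUDA/cuDNN build info and visible GPUs, if TF is installed.

    Hypothetical helper; assumes TensorFlow >= 2.3 for tf.sysconfig.get_build_info().
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None}
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),    # absent on CPU-only builds
        "cudnn_version": info.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

print(tensorflow_build_report())
```

If `visible_gpus` is empty on a GPU machine, the container was most likely started without GPU access rather than with mismatched libraries.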

Running on Jupyter Lab 📓

The notebook finetune_itt.ipynb can be used to experiment with the model, visualise the results, and tweak the parameters in the config file spectrogram.yaml to see their effect. Once you are satisfied with the results, you can move on to writing a training script.
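When tweaking the config, it can help to keep the base file untouched and apply per-experiment overrides. A minimal sketch of a dotted-key override helper follows; in practice the base dict would come from something like `yaml.safe_load` (PyYAML) on spectrogram.yaml. The helper name and the config keys below are hypothetical, and the real spectrogram.yaml may be structured differently.

```python
import copy

def apply_overrides(config, overrides):
    """Return a copy of a nested config dict with dotted-key overrides applied.

    e.g. {"model.base_learning_rate": 5e-6} sets config["model"]["base_learning_rate"].
    """
    updated = copy.deepcopy(config)  # leave the base config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for key in parents:
            node = node.setdefault(key, {})  # create missing intermediate sections
        node[leaf] = value
    return updated

# Hypothetical structure; the real spectrogram.yaml may use different keys.
base = {
    "model": {"base_learning_rate": 1e-4},
    "data": {"params": {"batch_size": 4}},
}
experiment = apply_overrides(base, {"model.base_learning_rate": 5e-6,
                                    "data.params.batch_size": 2})
print(experiment)
```

Keeping overrides as a small dict per run also makes it easy to log exactly what changed between notebook experiments before turning them into a training script.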

Checkpoints ⛳︎

The various checkpoints, training files, and metrics are available on Hugging Face.

Main model: tradfusion-v2.

Acknowledgments

This project uses the following resources:

  • LambdaLabsML's stable-diffusion-finetuning to train the model.
  • RunPod for GPUs: 2 x RTX 6000 Ada
  • Dataset obtained using riffusion-manilab, adapted for this project (see the dataset folder).
  • Inference pipeline adapted, with modifications, from courses, code resources, and documentation from Hugging Face's Diffusers library and Riffusion.
    • For specifics, see the Diffusers documentation on diffusion models and the StableDiffusionImg2Img pipeline, and the Riffusion repo.

Massive thanks to all the original authors and contributors.
