
License: GNU General Public License v3.0

JavaScript 0.27% HTML 0.23% TypeScript 77.81% Dockerfile 0.49% Python 20.84% Shell 0.29% CSS 0.08%
artificial-intelligence audio bullmq demucs openai-whisper prisma react redis typescript vite

distributed-source-separation's Introduction

NeuraLib - Intelligent Sample Management and Processing

GUI Screenshot

What is it?

  • NeuraLib is a distributed sample management and processing platform
  • Leverages multiple state-of-the-art neural networks for audio processing
    • Source Separation (extract vocals, bass, drums and other stems using Demucs)
    • Vocal Transcription (using OpenAI Whisper)
    • Audio to Midi conversion (convert audio to midi for further use in your DAW, using Basic Pitch)
  • Audio file library management (build up your library with music, stems and samples, then easily export it for further use)
  • Sample slicing (slice extracted stems further into individual samples e.g. drum hits, vocal chops, synth hits etc.)
  • Streamed playback (stream large audio files without needing to download the whole file)
  • Built in a scalable way using task-queues and workers

Why does it exist?

For music professionals, audio engineers and hobbyists:

  • Sampling has been a big part of music production for decades
  • To explore and deconstruct musical pieces
    • Remix tracks easily by separating music into its individual parts
    • Expand your sample library by slicing stems further into samples
    • Understand the meaning of a song by transcribing vocals
  • A central place for managing music and samples

For developers:

  • A playground for all things audio
    • web-audio
    • music information retrieval tasks
    • neural networks for audio
    • neural networks deployment / usage in an actual application
    • distributed systems dealing with audio processing
  • Offers an easy-to-extend platform for experimenting with neural networks
    • create an endpoint, a task queue and a worker to easily add additional processing tasks
  • Learn to deal with long-running tasks
  • Experiment with Server-Sent Events
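The endpoint → task queue → worker pattern described above can be sketched with a minimal in-memory queue. The real platform uses BullMQ backed by Redis; the class and names below are purely illustrative, not the project's code:

```typescript
// Minimal in-memory sketch of the endpoint -> queue -> worker pattern.
// The actual platform uses BullMQ + Redis; everything here is illustrative.
type Job = { userId: string; audioFileId: string };

class TaskQueue {
  private jobs: Job[] = [];
  private handlers: Array<(job: Job) => Promise<void>> = [];

  // "Endpoint" side: enqueue a job and return to the client immediately.
  enqueue(job: Job): void {
    this.jobs.push(job);
    void this.drain();
  }

  // "Worker" side: register a processor for incoming jobs.
  process(handler: (job: Job) => Promise<void>): void {
    this.handlers.push(handler);
  }

  private async drain(): Promise<void> {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      for (const handler of this.handlers) await handler(job);
    }
  }
}

const separationQueue = new TaskQueue();
const processed: string[] = [];
separationQueue.process(async (job) => {
  processed.push(job.audioFileId); // a real worker would run Demucs here
});
separationQueue.enqueue({ userId: "u1", audioFileId: "a1" });
```

Adding a new processing task then amounts to creating another queue instance, a worker that processes its jobs, and an endpoint that enqueues them.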

Features

  • Music / Sample Collection
    • Per user music/sample management
    • Upload/Download music
    • Manage uploaded library, extracted stems and samples
    • Export library or individual samples for local usage, e.g. in a DAW
    • Stream audio, extracted stems and samples directly from object storage
  • Sample Slicing
    • Users can slice extracted stems further into individual samples (drum hits, vocal chops, synth hits etc.)
    • Samples are automatically added to library and attached to parent audio
  • Source Separation
    • Separate uploaded music into individual stems (vocals, drums, bass, other)
    • Separation is done asynchronously
    • Using Demucs v4
  • Audio to Midi Conversion
    • Convert any audio file to midi
    • Conversion is done asynchronously
    • Using Basic Pitch (Spotify)
  • Audio to Text (Vocals)
    • Extract lyrics / text from vocals
    • Extraction is done asynchronously
    • Using OpenAI Whisper or one of several other open-source models

System Components and Architecture

Components

Architecture Overview


Authentication/Authorization

This application uses Auth0 as an identity provider and general authentication/authorization platform. See AUTHENTICATION for further information about the authorization code flow used.

How to get started

Check out the USAGE section for everything you need to get started.

Update History

As this isn't yet a production-ready product, there are no official changelogs. See the commit history for recent changes.

Links

  • Demo Video TBD.

distributed-source-separation's People

Contributors

dependabot[bot], p-hlp


distributed-source-separation's Issues

Introduce Library concept

Currently, audio files exist on their own; they may have child audio files attached to them, or a parent audio file. However, there is no way to bundle multiple audio files together for later export.

Backend:

  • Add Library Schema to DB
  • Add endpoints for creating/updating/deleting library
  • Add endpoint for querying audio files within a library

Frontend:

  • Library Select / Create component in MenuBar
  • Uploaded audio files should be added to the currently selected library

Refactor routes in expressjs api gateway

Currently everything is in the index.ts file.
Routes should be modularized, as should queue handling and registration. A way is needed to make SSE connections available in every module.

Audio to Midi Worker

  • Use different conda environment
  • Uses Spotify Basic Pitch
  • Use Separation Worker as Template
  • Job should include: userId: string and audioFileId: string
  • Saves midi in object storage
  • Updates MidiFile in AudioFile
  • Add endpoint /to-midi to trigger audio-to-midi conversion, accepting the audioFileId
  • SSE Event:
    • eventName audio_to_midi
    • data
      • audioFileId: string
      • status: "done" | "inProgress" | "error"
      • progress?: number
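The SSE payload described above can be typed and validated on the client. The field names come from the issue text; the type guard itself is an illustrative sketch, not the project's actual code:

```typescript
// Sketch of the SSE event payload from the issue (audio_to_midi / audio_to_text).
// The type guard is illustrative validation, not the project's implementation.
type JobStatus = "done" | "inProgress" | "error";

interface AudioJobEvent {
  audioFileId: string;
  status: JobStatus;
  progress?: number; // optional progress while inProgress
}

function isAudioJobEvent(data: unknown): data is AudioJobEvent {
  if (typeof data !== "object" || data === null) return false;
  const e = data as Record<string, unknown>;
  return (
    typeof e.audioFileId === "string" &&
    (e.status === "done" || e.status === "inProgress" || e.status === "error") &&
    (e.progress === undefined || typeof e.progress === "number")
  );
}

// Parsing a raw SSE `data:` line for the audio_to_midi event:
const parsed: unknown = JSON.parse('{"audioFileId":"a1","status":"inProgress","progress":42}');
const valid = isAudioJobEvent(parsed);
```

The same shape serves the audio-to-text worker, since its event data carries identical fields.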

Refactor web-client components

Currently everything is in App.tsx component. This should be modularized / refactored to make code more readable / maintainable.

Export files

Should be able to download everything that was created / uploaded as a compressed .zip file.

Folder structure should be like:

  • Folder (name of full track), containing:
    • Stem files
    • Midi files
    • Transcription as .json file
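The layout above can be expressed as a small path-mapping helper; a zip library would then write each entry at the returned path. The type and function names are illustrative assumptions, not existing project code:

```typescript
// Sketch of the export zip layout: one folder per track, holding its stems,
// midi files, and a transcription.json. Names here are illustrative only.
type ExportFile =
  | { kind: "stem"; fileName: string }
  | { kind: "midi"; fileName: string }
  | { kind: "transcription" };

function zipEntryPath(trackName: string, file: ExportFile): string {
  switch (file.kind) {
    case "stem":
    case "midi":
      return `${trackName}/${file.fileName}`; // keep original file name
    case "transcription":
      return `${trackName}/transcription.json`; // fixed name per track
  }
}

const stemPath = zipEntryPath("My Track", { kind: "stem", fileName: "vocals.wav" });
const textPath = zipEntryPath("My Track", { kind: "transcription" });
```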

Main FileList

NeuraLib

The main (left) file list should show all files in the library which don't have a parent. The first item should be for uploading purposes (for now only a single item).

When a file is uploaded the current main-file query is invalidated, which should refetch all current files from the backend.
The backend endpoint should not overfetch, i.e. waveform data and children of the main audio files aren't needed, they will be fetched separately.

Dockerize Transcription Worker

The Audio to Text Worker needs to be dockerized.
The Docker image being built should include all needed dependencies/packages and installations for utilizing GPU processing.

Make SSE available everywhere in application

Every component should be able to receive the events it needs; e.g. if a component wants to 'listen' to the separate events and react to incoming events, it should be able to do so.
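One common way to achieve this is a small event bus: a single SSE connection feeds the bus, and any component subscribes by event name. This is a hedged sketch of the idea, not the project's implementation; in the browser, `EventSource` listeners would call `dispatch`:

```typescript
// Illustrative event bus for distributing SSE events app-wide.
// A single EventSource feeds dispatch(); components subscribe by event name.
type Listener = (data: unknown) => void;

class SseBus {
  private listeners = new Map<string, Set<Listener>>();

  // Subscribe to a named event; returns an unsubscribe function.
  subscribe(eventName: string, listener: Listener): () => void {
    if (!this.listeners.has(eventName)) this.listeners.set(eventName, new Set());
    this.listeners.get(eventName)!.add(listener);
    return () => { this.listeners.get(eventName)?.delete(listener); };
  }

  // Called once per incoming SSE message (e.g. from an EventSource handler).
  dispatch(eventName: string, data: unknown): void {
    this.listeners.get(eventName)?.forEach((l) => l(data));
  }
}

const bus = new SseBus();
const received: unknown[] = [];
const unsubscribe = bus.subscribe("separate", (d) => received.push(d));
bus.dispatch("separate", { audioFileId: "a1", status: "done" });
unsubscribe(); // component unmounts -> stop listening
bus.dispatch("separate", { audioFileId: "a2", status: "done" });
```

In a React client this bus would typically live in a context provider, with a hook wrapping `subscribe` in `useEffect` so listeners are cleaned up on unmount.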

Improve AudioPlayer UI

Add the following functionalities to the audio player ui:

  • Volume Slider
  • Forward / Backwards 5 sec
  • Add Region (stops play head, opens dialog for name input and create region)
  • Add Marker (stops play head, opens dialog for name input and create marker)
  • Show file name
  • Player should fill the UI

AppBar with Logout functionality

The application should have an AppBar which shows the application name, along with an avatar and user name which opens a menu on press. Currently the only menu item should be Logout.

Pre-compute waveform peaks in backend

To be able to stream with WavesurferJs we need pre-computed peaks, so a waveform can be rendered before the whole audio file is available on the client.

  • https://github.com/bbc/audiowaveform
  • When uploading or stemming, the waveform data should be computed
  • TBD - How many samples per second are enough to display? Initial sample rate of uploaded file?
    • Compute length of audio file
    • Everything should be in-memory / no disk writes
  • Waveform data needs to be saved as either bytestream in minio or array in postgres
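Whatever tool computes them (the issue links bbc/audiowaveform), peaks boil down to min/max downsampling of the PCM signal. A simplified in-memory sketch of that computation, with an illustrative function name:

```typescript
// In-memory peak computation sketch: downsample PCM samples (assumed to be
// normalized to [-1, 1]) into interleaved [min, max] pairs per display bucket,
// the kind of data a waveform renderer can draw before the full file streams in.
// Simplified illustration; bbc/audiowaveform does the real work more robustly.
function computePeaks(samples: Float32Array, peakCount: number): number[] {
  const peaks: number[] = [];
  const bucketSize = Math.ceil(samples.length / peakCount);
  for (let i = 0; i < samples.length; i += bucketSize) {
    let min = 1;
    let max = -1;
    for (let j = i; j < Math.min(i + bucketSize, samples.length); j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    peaks.push(min, max); // interleaved [min, max] per bucket
  }
  return peaks;
}

const signal = new Float32Array([0, 0.5, -0.5, 1, -1, 0.25, -0.25, 0]);
const peaks = computePeaks(signal, 2); // 2 buckets of 4 samples each
```

The resulting array is small enough to store either as a bytestream in minio or as an array column in postgres, as the bullet above suggests.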

Audio to Text Worker

  • Use different conda environment
  • Job should include: userId: string and audioFileId: string
  • Uses OpenAI Whisper Model
  • Updates Text in AudioFile
  • SSE Event:
    • eventName audio_to_text
    • data
      • audioFileId: string
      • status: "done" | "inProgress" | "error"
      • progress?: number

Add slicing / tagging of audio files

Use WavesurferJs regions to tag/mark slices.
Slices aren't processed immediately; the audio component seeks to the start point when playing from a slice/section.

Needs an endpoint which accepts the tags as following format:

{ start: number; end?: number; name: string }

When only start is specified, it's treated as a marker; when both start and end are specified, it's a slice/region.

A field needs to be added to the AudioFile table which holds these marks: a simple array with the above format.

Might need to add wavesurfer zooming plugin as well.
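The mark format and the marker-vs-region rule above can be captured in a couple of lines. The interface mirrors the issue's format; the helper function is an illustrative name, not existing code:

```typescript
// Mark format from the issue: only `start` -> marker, `start` + `end` -> region.
// `classifyMark` is an illustrative helper, not the project's actual code.
interface Mark {
  start: number; // seconds into the audio file
  end?: number;  // absent for markers
  name: string;
}

function classifyMark(mark: Mark): "marker" | "region" {
  return mark.end === undefined ? "marker" : "region";
}

const drop = classifyMark({ start: 12.5, name: "drop" });
const vocalChop = classifyMark({ start: 3.2, end: 4.8, name: "vocal chop" });
```

An endpoint accepting an array of these marks could validate each one and persist the array on the AudioFile row as described.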

Stories:

  • As a user I want to be able to select certain regions of the whole audio file.
  • As a user I want to be able to save the selected regions as separate audio files to my library.
  • As a user I want to mark a certain point of the audio file.

File Action Section

The following actions should be possible to do on files:

  • Separate
  • To Midi
  • Transcribe (only vocals)

Each action should register its own listener for the server-sent events.
Each action should have its own inProgress state and should be disabled while the action is in progress.
When an action finishes, the query cache should be invalidated so the data is re-fetched:

  • separate: invalidate the ["childFiles", fileId] key for the separated file
  • transcribe: invalidate the ["transcription", fileId] key for the file transcription

To Midi for now should only download the midi file, no support for uploading midi yet.

To be able to tell which stem is what, an enum needs to be introduced on AudioFile which holds whether a file isVocal.
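The action-to-cache-key mapping above can be centralized in one helper, so each listener invalidates the right query when its SSE event reports "done". The keys come from the issue text; the function name is illustrative:

```typescript
// Illustrative mapping from a finished file action to the query cache key
// that should be invalidated (keys taken from the issue text above).
type FileAction = "separate" | "transcribe";

function invalidationKey(action: FileAction, fileId: string): [string, string] {
  switch (action) {
    case "separate":
      return ["childFiles", fileId]; // re-fetch the separated stems
    case "transcribe":
      return ["transcription", fileId]; // re-fetch the transcription
  }
}

const key = invalidationKey("separate", "a1");
```

With react-query, the returned key would be passed to `queryClient.invalidateQueries` inside the SSE "done" handler.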

Dockerize Audio to Midi Worker

The Audio to Midi Worker needs to be dockerized.
The Docker image being built should include all needed dependencies/packages and installations for utilizing GPU processing.
