
License: GNU General Public License v3.0

JavaScript 0.27% HTML 0.23% TypeScript 77.81% Dockerfile 0.49% Python 20.84% Shell 0.29% CSS 0.08%
artificial-intelligence audio bullmq demucs openai-whisper prisma react redis typescript vite

distributed-source-separation's Introduction

NeuraLib - Intelligent Sample Management and Processing

GUI Screenshot

What is it?

  • NeuraLib is a distributed sample management and processing platform
  • Leverages multiple state-of-the-art neural networks for audio processing
    • Source Separation (extract vocals, bass, drums and other stems using Demucs)
    • Vocal Transcription (using OpenAI Whisper)
    • Audio to Midi conversion (convert audio to midi for further use in your DAW, using Basic Pitch)
  • Audio file library management (build up your library with music, stems and samples, then easily export it for further use)
  • Sample slicing (slice extracted stems further into individual samples e.g. drum hits, vocal chops, synth hits etc.)
  • Streamed playback (stream large audio files without needing to download the whole file)
  • Built in a scalable way using task-queues and workers

Why does it exist?

For music professionals, audio engineers and hobbyists:

  • Sampling has been a big part of music production for decades
  • To explore and deconstruct musical pieces
    • Remix tracks easily by separating music into its individual parts
    • Expand your sample library by slicing stems further into samples
    • Understand the meaning of a song by transcribing vocals
  • A central place for managing music and samples

For developers:

  • A playground for all things audio
    • web-audio
    • music information retrieval tasks
    • neural networks for audio
    • neural networks deployment / usage in an actual application
    • distributed systems dealing with audio processing
  • Offers an easy-to-extend platform for experimenting with neural networks
    • create an endpoint, a task queue and a worker to easily add additional processing tasks
  • Learn to deal with long-running tasks
  • Experiment with Server-Sent Events
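The endpoint → task queue → worker pattern described above can be sketched with a minimal in-memory queue. The real platform uses BullMQ backed by Redis; the class and names below are purely illustrative, not the project's code:

```typescript
// Minimal in-memory sketch of the endpoint -> queue -> worker pattern.
// The actual platform uses BullMQ + Redis; everything here is illustrative.
type Job = { userId: string; audioFileId: string };

class TaskQueue {
  private jobs: Job[] = [];
  private handlers: Array<(job: Job) => Promise<void>> = [];

  // "Endpoint" side: enqueue a job and return to the client immediately.
  enqueue(job: Job): void {
    this.jobs.push(job);
    void this.drain();
  }

  // "Worker" side: register a processor for incoming jobs.
  process(handler: (job: Job) => Promise<void>): void {
    this.handlers.push(handler);
  }

  private async drain(): Promise<void> {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      for (const handler of this.handlers) await handler(job);
    }
  }
}

const separationQueue = new TaskQueue();
const processed: string[] = [];
separationQueue.process(async (job) => {
  processed.push(job.audioFileId); // a real worker would run Demucs here
});
separationQueue.enqueue({ userId: "u1", audioFileId: "a1" });
```

Adding a new processing task then amounts to creating another queue instance, a worker that processes its jobs, and an endpoint that enqueues them.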

Features

  • Music / Sample Collection
    • Per user music/sample management
    • Upload/Download music
    • Manage uploaded library, extracted stems and samples
    • Export library or individual samples for local usage, e.g. in a DAW
    • Stream audio, extracted stems and samples directly from object storage
  • Sample Slicing
    • Users can slice extracted stems further into individual samples (drum hits, vocal chops, synth hits etc.)
    • Samples are automatically added to library and attached to parent audio
  • Source Separation
    • Separate uploaded music into individual stems (vocals, drums, bass, other)
    • Separation is done asynchronously
    • Using Demucs v4
  • Audio to Midi Conversion
    • Convert any audio file to midi
    • Conversion is done asynchronously
    • Using Basic Pitch (Spotify)
  • Audio to Text (Vocals)
    • Extract lyrics / text from vocals
    • Extraction is done asynchronously
    • Using OpenAI Whisper or one of several other open-source models

System Components and Architecture

Components

Architecture Overview


Authentication/Authorization

This application uses Auth0 as an identity provider and general authentication/authorization platform. See AUTHENTICATION for further information about the authorization code flow used.

How to get started

Check out the USAGE section for everything you need to get started.

Update History

As this isn't yet a production-ready product, there are no official changelogs. See the commit history for recent changes.

Links

  • Demo Video TBD.

distributed-source-separation's People

Contributors

dependabot[bot], p-hlp


distributed-source-separation's Issues

Introduce Library concept

Currently, audio files exist on their own; they may have child audio files attached to them, or a parent audio file. However, there is no way to bundle multiple audio files together for later export.

Backend:

  • Add Library Schema to DB
  • Add endpoints for creating/updating/deleting library
  • Add endpoint for querying audio files within a library

Frontend:

  • Library Select / Create component in MenuBar
  • Uploaded audio files should be added to the currently selected library

Refactor routes in expressjs api gateway

Currently everything is in the index.ts file.
Routes should be modularized, as should queue handling and registration. A way is needed to make SSE connections available in every module.

Audio to Midi Worker

  • Use different conda environment
  • Uses Spotify Basic Pitch
  • Use Separation Worker as Template
  • Job should include: userId: string and audioFileId: string
  • Saves midi in object storage
  • Updates MidiFile in AudioFile
  • Add endpoint /to-midi to trigger audio-to-midi conversion, accepting the audioFileId
  • SSE Event:
    • eventName audio_to_midi
    • data
      • audioFileId: string
      • status: "done" | "inProgress" | "error"
      • progress?: number
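The SSE payload described above can be typed and validated on the client. The field names come from the issue text; the type guard itself is an illustrative sketch, not the project's actual code:

```typescript
// Sketch of the SSE event payload from the issue (audio_to_midi / audio_to_text).
// The type guard is illustrative validation, not the project's implementation.
type JobStatus = "done" | "inProgress" | "error";

interface AudioJobEvent {
  audioFileId: string;
  status: JobStatus;
  progress?: number; // optional progress while inProgress
}

function isAudioJobEvent(data: unknown): data is AudioJobEvent {
  if (typeof data !== "object" || data === null) return false;
  const e = data as Record<string, unknown>;
  return (
    typeof e.audioFileId === "string" &&
    (e.status === "done" || e.status === "inProgress" || e.status === "error") &&
    (e.progress === undefined || typeof e.progress === "number")
  );
}

// Parsing a raw SSE `data:` line for the audio_to_midi event:
const parsed: unknown = JSON.parse('{"audioFileId":"a1","status":"inProgress","progress":42}');
const valid = isAudioJobEvent(parsed);
```

The same shape serves the audio-to-text worker, since its event data carries identical fields.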

Refactor web-client components

Currently everything is in App.tsx component. This should be modularized / refactored to make code more readable / maintainable.

Export files

Should be able to download everything that was created / uploaded as a compressed .zip file.

Folder structure should be like:

  • Folder (name of full track), containing:
    • Stem files
    • Midi files
    • Transcription as .json file
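The layout above can be expressed as a small path-mapping helper; a zip library would then write each entry at the returned path. The type and function names are illustrative assumptions, not existing project code:

```typescript
// Sketch of the export zip layout: one folder per track, holding its stems,
// midi files, and a transcription.json. Names here are illustrative only.
type ExportFile =
  | { kind: "stem"; fileName: string }
  | { kind: "midi"; fileName: string }
  | { kind: "transcription" };

function zipEntryPath(trackName: string, file: ExportFile): string {
  switch (file.kind) {
    case "stem":
    case "midi":
      return `${trackName}/${file.fileName}`; // keep original file name
    case "transcription":
      return `${trackName}/transcription.json`; // fixed name per track
  }
}

const stemPath = zipEntryPath("My Track", { kind: "stem", fileName: "vocals.wav" });
const textPath = zipEntryPath("My Track", { kind: "transcription" });
```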

Main FileList

NeuraLib

The main (left) file list should show all files in the library which don't have a parent. The first item should be for uploading purposes (for now only a single item).

When a file is uploaded the current main-file query is invalidated, which should refetch all current files from the backend.
The backend endpoint should not overfetch, i.e. waveform data and children of the main audio files aren't needed, they will be fetched separately.

Dockerize Transcription Worker

The Audio to Text Worker needs to be dockerized.
The Docker image being built should include all needed dependencies/packages and installations for utilizing GPU processing.

Make SSE available everywhere in application

Every component should be able to receive the events it needs; e.g. if a component wants to 'listen' to the separate events and react to incoming events, it should be able to do so.
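One common way to achieve this is a small event bus: a single SSE connection feeds the bus, and any component subscribes by event name. This is a hedged sketch of the idea, not the project's implementation; in the browser, `EventSource` listeners would call `dispatch`:

```typescript
// Illustrative event bus for distributing SSE events app-wide.
// A single EventSource feeds dispatch(); components subscribe by event name.
type Listener = (data: unknown) => void;

class SseBus {
  private listeners = new Map<string, Set<Listener>>();

  // Subscribe to a named event; returns an unsubscribe function.
  subscribe(eventName: string, listener: Listener): () => void {
    if (!this.listeners.has(eventName)) this.listeners.set(eventName, new Set());
    this.listeners.get(eventName)!.add(listener);
    return () => { this.listeners.get(eventName)?.delete(listener); };
  }

  // Called once per incoming SSE message (e.g. from an EventSource handler).
  dispatch(eventName: string, data: unknown): void {
    this.listeners.get(eventName)?.forEach((l) => l(data));
  }
}

const bus = new SseBus();
const received: unknown[] = [];
const unsubscribe = bus.subscribe("separate", (d) => received.push(d));
bus.dispatch("separate", { audioFileId: "a1", status: "done" });
unsubscribe(); // component unmounts -> stop listening
bus.dispatch("separate", { audioFileId: "a2", status: "done" });
```

In a React client this bus would typically live in a context provider, with a hook wrapping `subscribe` in `useEffect` so listeners are cleaned up on unmount.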

Improve AudioPlayer UI

Add the following functionalities to the audio player ui:

  • Volume Slider
  • Forward / Backwards 5 sec
  • Add Region (stops play head, opens dialog for name input and create region)
  • Add Marker (stops play head, opens dialog for name input and create marker)
  • Show file name
  • Player should fill the UI

AppBar with Logout functionality

The application should have an AppBar which shows the application name, along with an avatar and user name which opens a menu on press. Currently the only menu item should be Logout.

Pre-compute waveform peaks in backend

To be able to stream with WavesurferJs we need pre-computed peaks, so a waveform can be rendered before the whole audio file is available on the client.

  • https://github.com/bbc/audiowaveform
  • When uploading or stemming, the waveform data should be computed
  • TBD - How many samples per second are enough to display? Initial sample rate of uploaded file?
    • Compute length of audio file
    • Everything should be in-memory / no disk writes
  • Waveform data needs to be saved as either bytestream in minio or array in postgres
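Whatever tool computes them (the issue links bbc/audiowaveform), peaks boil down to min/max downsampling of the PCM signal. A simplified in-memory sketch of that computation, with an illustrative function name:

```typescript
// In-memory peak computation sketch: downsample PCM samples (assumed to be
// normalized to [-1, 1]) into interleaved [min, max] pairs per display bucket,
// the kind of data a waveform renderer can draw before the full file streams in.
// Simplified illustration; bbc/audiowaveform does the real work more robustly.
function computePeaks(samples: Float32Array, peakCount: number): number[] {
  const peaks: number[] = [];
  const bucketSize = Math.ceil(samples.length / peakCount);
  for (let i = 0; i < samples.length; i += bucketSize) {
    let min = 1;
    let max = -1;
    for (let j = i; j < Math.min(i + bucketSize, samples.length); j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    peaks.push(min, max); // interleaved [min, max] per bucket
  }
  return peaks;
}

const signal = new Float32Array([0, 0.5, -0.5, 1, -1, 0.25, -0.25, 0]);
const peaks = computePeaks(signal, 2); // 2 buckets of 4 samples each
```

The resulting array is small enough to store either as a bytestream in minio or as an array column in postgres, as the bullet above suggests.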

Audio to Text Worker

  • Use different conda environment
  • Job should include: userId: string and audioFileId: string
  • Uses OpenAI Whisper Model
  • Updates Text in AudioFile
  • SSE Event:
    • eventName audio_to_text
    • data
      • audioFileId: string
      • status: "done" | "inProgress" | "error"
      • progress?: number

Add slicing / tagging of audio files

Use WavesurferJs regions to tag/mark slices.
Slices aren't processed immediately; the audio component seeks to the start point when playing from a slice/section.

Needs an endpoint which accepts the tags as following format:

{ start: number; end?: number; name: string }

When only start is specified, it's treated as a marker; when both start and end are specified, it's a slice/region.

A field needs to be added to the AudioFile table which holds these marks: a simple array with the above format.

Might need to add wavesurfer zooming plugin as well.
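The mark format and the marker-vs-region rule above can be captured in a couple of lines. The interface mirrors the issue's format; the helper function is an illustrative name, not existing code:

```typescript
// Mark format from the issue: only `start` -> marker, `start` + `end` -> region.
// `classifyMark` is an illustrative helper, not the project's actual code.
interface Mark {
  start: number; // seconds into the audio file
  end?: number;  // absent for markers
  name: string;
}

function classifyMark(mark: Mark): "marker" | "region" {
  return mark.end === undefined ? "marker" : "region";
}

const drop = classifyMark({ start: 12.5, name: "drop" });
const vocalChop = classifyMark({ start: 3.2, end: 4.8, name: "vocal chop" });
```

An endpoint accepting an array of these marks could validate each one and persist the array on the AudioFile row as described.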

Stories:

  • As a user I want to be able to select certain regions of the whole audio file.
  • As a user I want to be able to save the selected regions as separate audio files to my library.
  • As a user I want to mark a certain point of the audio file.

File Action Section

The following actions should be possible to do on files:

  • Separate
  • To Midi
  • Transcribe (only vocals)

Each action should register its own listener for the server-sent events.
Each action should have its own inProgress state and should be disabled while the action is in progress.
When an action finishes, the query cache should be invalidated so the data is re-fetched:

  • separate: invalidate the ["childFiles", fileId] key for the separated file
  • transcribe: invalidate the ["transcription", fileId] key for the file transcription

To Midi for now should only download the midi file, no support for uploading midi yet.

To be able to tell which stem is what, an enum needs to be introduced on AudioFile which holds whether a file isVocal.
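The action-to-cache-key mapping above can be centralized in one helper, so each listener invalidates the right query when its SSE event reports "done". The keys come from the issue text; the function name is illustrative:

```typescript
// Illustrative mapping from a finished file action to the query cache key
// that should be invalidated (keys taken from the issue text above).
type FileAction = "separate" | "transcribe";

function invalidationKey(action: FileAction, fileId: string): [string, string] {
  switch (action) {
    case "separate":
      return ["childFiles", fileId]; // re-fetch the separated stems
    case "transcribe":
      return ["transcription", fileId]; // re-fetch the transcription
  }
}

const key = invalidationKey("separate", "a1");
```

With react-query, the returned key would be passed to `queryClient.invalidateQueries` inside the SSE "done" handler.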

Dockerize Audio to Midi Worker

The Audio to Midi Worker needs to be dockerized.
The Docker image being built should include all needed dependencies/packages and installations for utilizing GPU processing.
