seanwood / gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

License: MIT License

speech-separation speech-enhancement gcc-nmf nmf real-time real-time-processing speech speech-processing cross-correlation generalized-cross-correlation

gcc-nmf's Introduction

GCC-NMF

GCC-NMF is a blind source separation and denoising algorithm that combines the GCC spatial localization method with the NMF unsupervised dictionary learning algorithm. GCC-NMF has been used for stereo speech separation and enhancement in both offline and real-time settings. Though we have focused on speech applications so far, GCC-NMF is a generic source separation and denoising algorithm and may well be applicable to other types of signals.

This GitHub repository provides:

  1. A standalone Python application to run and visualize GCC-NMF in real time.

  2. A series of iPython notebooks presenting GCC-NMF in tutorial style, building towards the low-latency, real-time context described in the sections below.

Journal Papers

Conference Papers

Real-time Speech Enhancement: RT-GCC-NMF

The Real-time Speech Enhancement standalone Python application implements the RT-GCC-NMF real-time speech enhancement algorithm. Users may interactively modify system parameters, including the NMF dictionary size and the GCC-NMF masking function parameters, and hear their effect on speech enhancement quality in real time.


Offline Speech Separation

The Offline Speech Separation iPython notebook shows how GCC-NMF can be used to separate multiple concurrent speakers in an offline fashion. The NMF dictionary is first learned directly from the mixture signal, and sources are subsequently separated by attributing each atom at each time to a single source based on the dictionary atoms' estimated time delay of arrival (TDOA). Source localization is achieved with GCC-PHAT.

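As a rough illustration of the localization step, here is a minimal GCC-PHAT sketch in plain NumPy (an illustration of the general technique, not the code used in the notebooks); roughly speaking, GCC-NMF applies this correlation with each NMF atom's spectrum as an additional frequency weighting.

    import numpy as np

    def estimateTDOA(x1, x2, sampleRate):
        # Estimate the time delay of arrival between two channels via GCC-PHAT.
        n = 2 * len(x1)                                         # zero-pad against circular wrap
        X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
        crossSpectrum = X1 * np.conj(X2)
        phat = crossSpectrum / (np.abs(crossSpectrum) + 1e-16)  # PHAT: keep phase only
        gcc = np.fft.irfft(phat, n)
        gcc = np.concatenate((gcc[-(n // 2):], gcc[:n // 2]))   # center zero lag
        return (np.argmax(gcc) - n // 2) / float(sampleRate)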

Offline Speech Enhancement

The Offline Speech Enhancement iPython notebook demonstrates how GCC-NMF can be used for offline speech enhancement, where instead of multiple speakers we have a single speaker plus noise. In this case, individual atoms are attributed either to the speaker or to the noise at each point in time based on the atom TDOAs, as above. The target speaker is again localized with GCC-PHAT.

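The attribution step can be sketched as a simple mask over atoms (names here are illustrative, not the repository's API): atoms whose estimated TDOA falls near the target TDOA are kept, and the target spectrogram is reconstructed Wiener-style from the masked decomposition.

    import numpy as np

    def getAtomMask(atomTDOAs, targetTDOA, tdoaEpsilon):
        # atomTDOAs: (numAtoms, numFrames) TDOA estimate per atom per frame
        return (np.abs(atomTDOAs - targetTDOA) < tdoaEpsilon).astype(float)

    def getTargetSpectrogram(W, H, mask, mixtureSpectrogram, epsilon=1e-12):
        # W: (numFrequencies, numAtoms), H: (numAtoms, numFrames)
        V = W @ H + epsilon                          # full NMF reconstruction
        targetV = W @ (H * mask)                     # target atoms only
        return mixtureSpectrogram * (targetV / V)    # soft time-frequency mask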

Online Speech Enhancement

The Online Speech Enhancement iPython notebook demonstrates an online variant of GCC-NMF that works frame by frame to perform speech enhancement in real time. Here, the NMF dictionary is pre-learned from a different dataset than the one used at test time, NMF coefficients are inferred frame by frame, and speaker localization is performed with an accumulated GCC-PHAT method.

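Frame-by-frame coefficient inference can be sketched with standard KL-divergence multiplicative updates, holding the pre-learned dictionary W fixed (a generic NMF sketch, not the notebooks' code):

    import numpy as np

    def inferCoefficients(W, v, numUpdates=20, epsilon=1e-12):
        # W: (numFrequencies, dictionarySize), v: magnitude spectrum of one frame
        h = np.ones(W.shape[1])
        for _ in range(numUpdates):
            wh = W @ h + epsilon
            h *= (W.T @ (v / wh)) / (W.sum(axis=0) + epsilon)  # KL multiplicative update
        return h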

Low Latency Speech Enhancement

In the Low Latency Speech Enhancement iPython notebook we extend the online GCC-NMF approach to reduce algorithmic latency via an asymmetric STFT windowing strategy. Long analysis windows maintain the high spectral resolution required by GCC-NMF, while short synthesis windows drastically reduce algorithmic latency with little effect on speech enhancement quality. Algorithmic latency can be reduced from over 64 ms with traditional symmetric STFT windowing to below 2 ms with the proposed asymmetric STFT windowing, provided sufficient computational power is available.

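The latency figures follow directly from the window lengths. For example, assuming a 16 kHz sample rate and that algorithmic latency is dominated by the synthesis window length:

    sampleRate = 16000                   # assumed; the notebooks may use a different rate
    symmetricWindowSize = 1024           # symmetric analysis/synthesis window
    asymmetricSynthesisSize = 32         # short synthesis window in the asymmetric scheme
    print(1000.0 * symmetricWindowSize / sampleRate)       # 64.0 ms
    print(1000.0 * asymmetricSynthesisSize / sampleRate)   # 2.0 ms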


gcc-nmf's Issues

real-time gcc-nmf error?

I am trying to make the real-time gcc-nmf work. I have the correct data paths, etc., and the script created the pretrained files. The following error appears after starting the script:

C:\gccNMF>python demo5.py
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Python27\lib\multiprocessing\forking.py", line 509, in prepare
'parents_main', file, path_name, etc
File "C:\gccNMF\demo5.py", line 5, in
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in init
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 258, in init
cmd = get_command_line() + [rhandle]
File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_li
ne
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.

Traceback (most recent call last):
File "demo5.py", line 5, in
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in init
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 277, in init
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 568, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 492, in save_string
self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 22] Invalid argument

what is wrong? thanks!
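
The RuntimeError text itself points at the standard fix: on Windows, multiprocessing spawns child processes by re-importing the main module, which is also why the configuration log above appears twice. Guarding the entry point stops the child from starting processes again. A minimal sketch, assuming demo5.py constructs RealtimeGCCNMF at module level:

    # Hypothetical demo5.py with the Windows-safe idiom from the error message:
    from multiprocessing import freeze_support
    from runRealtimeGCCNMF import RealtimeGCCNMF   # import path is an assumption

    if __name__ == '__main__':
        freeze_support()    # harmless unless frozen into a Windows executable
        RealtimeGCCNMF()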

gccNMF model

hi
I tried to run the file "runGccNMF.py", and it shows "No module named 'gccNMF'".
I'm not sure how to fix this problem; if you can, please tell me how to solve it.

best
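
A likely cause (a guess, since the working directory isn't shown): the script is launched from outside the repository root, so the gccNMF package directory is not on Python's module search path. Running from the repository root usually resolves this; alternatively, the path can be added explicitly:

    import sys
    sys.path.insert(0, r'/path/to/gcc-nmf')   # hypothetical checkout location
    import gccNMF                             # should now resolve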

Is there a logical error?

In the processFrames method of the GCCNMFProcessor class, targetTDOAIndex is set after the call to getTFMask, which means the mask is computed using the target TDOA of the previous 6 frames, rather than the latest 6 frames including the current one.
My English is not very good, please forgive me.

about offline speech separation

”Due to differences in TDOA estimation for the 1m and 5cm microphone separation settings, including increased spatial aliasing and lower spatial resolution in the 5cm case,we also compared results averaged over these two settings separately. While we found somewhat decreased scores and increased variance in the 5cm case, the results were generally comparable.“
In the offline speech separation task, does the code need changes for the 5 cm versus 1 m recordings? When I ran your code, the 1 m data was separated correctly, but none of the 5 cm recordings gave correct results: only one or two peaks can be obtained, when in truth there should be three sources.
My English is not very good; this text is from Google Translate, please forgive me.

Why gain factor when reconstructing the signal?

Hi seanwood @seanwood, I'm a junior in BSS, and thank you for your useful and effective open-source GCC-NMF. I happened to come across an unexplained factor when applying the ISTFT and reconstructing the waveform:

def getTargetSignalEstimates(targetSpectrogramEstimates, windowSize, hopSize, windowFunction, numSamples):
    numTargets, numChannels, numFreq, numTime = targetSpectrogramEstimates.shape
    stftGainFactor = hopSize / float(windowSize) * 2
......
    return array(targetSignalEstimates) * stftGainFactor

Of course the outcome is good, but I really don't know why it multiplies by 2*hop_size/n_fft. Could you please give an explanation? Thank you.
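
One plausible reading of the factor (my interpretation, not confirmed by the author): overlap-adding a Hann analysis window at hop H produces a nearly constant gain of sum(window)/H ~= windowSize/(2*H), since a Hann window has mean ~0.5; multiplying by 2*H/windowSize simply undoes that gain. A quick numerical check:

    import numpy as np

    windowSize, hopSize = 1024, 256
    window = np.hanning(windowSize)
    ola = np.zeros(windowSize * 8)
    for start in range(0, len(ola) - windowSize + 1, hopSize):
        ola[start:start + windowSize] += window     # overlap-add the bare window

    print(ola[windowSize:-windowSize].mean())   # ~= windowSize / (2*hopSize) = 2.0
    print(2.0 * hopSize / windowSize)           # stftGainFactor = 0.5, its reciprocal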

Synthesis window in lowLatencySpeechEnhancement.ipynb

Hi,

is there a reason why the synthesis window is not applied?

See also the attached sketch based on the lowLatencySpeechEnhancement.ipynb example.

fix = False

  • your version
  • clearly visible modulation in the OLA output
  • output gain != 1

fix = True

  • adequate hop size
  • unity gain
  • synthesis window after irfft

[Plots: "orig" shows visible modulation in the OLA output; "fixed" shows unity gain.]

from matplotlib.pyplot import *
from numpy import *
from numpy.fft import rfft, irfft

# Apply the fix
fix = False

# Preprocessing params
fftSize = 1024

# Asymmetric windowing params
analysisWindowSize = fftSize
synthesisWindowSize = 128

asymmetricHopSize = synthesisWindowSize // 4 if fix else (synthesisWindowSize * 3) // 4
m = synthesisWindowSize // 2
k = analysisWindowSize
d = 0

# Symmetric windowing params
symmetricWindowSize = fftSize
symmetricHopSize = asymmetricHopSize # to better compare results

# Generate test signal
stereoSamples = ones((1, fftSize*10))
numChannels, numSamples = stereoSamples.shape

def getAsymmetricAnalysisWindow(k, m, d):
    risingSqrtHann = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:d] = 0
    window[d:k-m] = risingSqrtHann[:k-m-d]
    window[k-m:] = fallingSqrtHann[-m:]

    return window

def getAsymmetricSynthesisWindow(k, m, d):
    risingSqrtHannAnalysis = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    risingNormalizedHann = hanning(2*m+1)[:m] / risingSqrtHannAnalysis[k-2*m-d:k-m-d]
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:-2*m] = 0
    window[-2*m:-m] = risingNormalizedHann
    window[-m:] = fallingSqrtHann[-m:]

    return window

def performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, hopSize):
    # Setup variables to save speech enhancement results
    numFrequencies = len(rfft(zeros(len(analysisWindow))))
    numFrames = (numSamples-len(synthesisWindow)) // hopSize

    if fix:
        gainFactor = hopSize / sum(analysisWindow * synthesisWindow)
    else:
        gainFactor = hopSize / float(len(synthesisWindow)) * 2

    targetEstimateSamplesOLA = zeros_like(stereoSamples)
    inputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')
    outputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')

    for frameIndex in range(numFrames):
        # compute FFT
        frameStart = frameIndex * hopSize
        frameEnd = frameStart + analysisWindowSize
        stereoSTFTFrame = rfft( stereoSamples[:, frameStart:frameEnd] * analysisWindow )
        inputSpectrogram[..., frameIndex] = stereoSTFTFrame
        outputSpectrogram[..., frameIndex] = stereoSTFTFrame

        # reconstruct time domain samples
        recStereoSTFTFrame = irfft(stereoSTFTFrame)

        if fix:
            # apply synthesis window as well
            recStereoSTFTFrame *= synthesisWindow

        # overlap-add to output samples
        targetEstimateSamplesOLA[:, frameStart:frameEnd] += recStereoSTFTFrame

    targetEstimateSamplesOLA *= gainFactor

    return inputSpectrogram, outputSpectrogram, targetEstimateSamplesOLA

analysisWindow = getAsymmetricAnalysisWindow(k, m, d)
synthesisWindow = getAsymmetricSynthesisWindow(k, m, d)

symmetricWindow = sqrt(hanning(symmetricWindowSize))

symmetricResults = performOnlineSpeechEnhancement(symmetricWindow, symmetricWindow, symmetricHopSize)
asymmetricResults = performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, asymmetricHopSize)

title('fixed' if fix else 'orig')
plot(symmetricResults[-1][-1], label='symmetric', color='b', alpha=0.5)
plot(asymmetricResults[-1][-1], label='asymmetric', color='r', alpha=0.5)
legend()
show()

two small problems?

The first problem is that the low-latency algorithm runs, but the output for both the symmetric and asymmetric cases is silence, no sound. I copied the exact code from the Python notebooks.

The second problem is with the low-latency and online speech enhancement algorithms. Compared to the first two algorithms, which output the correct WAV format, these last two output everything correctly except that the bit depth is doubled for some reason: instead of the signed 16-bit WAV input, I get 32-bit float WAV output. How can I fix this?

thanks!
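
On the second problem, a likely explanation (an assumption, since the file-writing code isn't shown) is that the notebooks hand float arrays directly to scipy.io.wavfile, which stores them as 32-bit float WAV. Rescaling and casting to int16 before writing yields a standard 16-bit PCM file:

    import numpy as np
    from scipy.io import wavfile

    def writeInt16Wav(path, sampleRate, samples):
        samples = np.clip(samples, -1.0, 1.0)    # assume float samples in [-1, 1]
        wavfile.write(path, sampleRate, (samples * 32767).astype(np.int16))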

Any Audio demo?

Hi,
I'm new to audio enhancement, and there seems to be a lot to learn. Is there any audio sample or pre-trained model available for quick evaluation?
Thanks

argmax error

I am running python 2.7 64-bit with latest scipy and numpy.

The first demo, with multiple speakers, works, but for the speech enhancement task I get an error:
argMaxGCCNMF = argmax(gccNMF, axis=1)
NameError: name 'argmax' is not defined

how to fix this? Thank you!
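
A likely fix (a guess: the failing cell expects an earlier "from numpy import *" to have run) is to import argmax explicitly:

    from numpy import argmax, random

    gccNMF = random.rand(4, 64, 100)        # stand-in for the real GCC-NMF array
    argMaxGCCNMF = argmax(gccNMF, axis=1)   # the line from the traceback now runs
    print(argMaxGCCNMF.shape)               # (4, 100)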

Preprocessing for chimeTrainSet.npy

I am interested in making trainSet.npy for 'onlineSpeechEnhancement'.

What is the way to make a training set for pre-learning the dictionary?

How can I make a training set from other wav files?

Thank you.
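
One plausible approach (an assumption: trainSet.npy holds magnitude STFT frames, frequencies by frames, matching the STFT parameters above) is to window and transform your own wav files and stack the magnitude spectra:

    import numpy as np
    from scipy.io import wavfile

    def buildTrainSet(wavPaths, windowSize=1024, hopSize=512):
        window = np.hanning(windowSize)
        frames = []
        for path in wavPaths:
            sampleRate, samples = wavfile.read(path)
            samples = samples.astype(np.float64)
            if samples.ndim > 1:                 # mix down to mono
                samples = samples.mean(axis=1)
            for start in range(0, len(samples) - windowSize, hopSize):
                frame = samples[start:start + windowSize] * window
                frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames).T                # numFrequencies x numFrames

    # np.save('trainSet.npy', buildTrainSet(['speech1.wav', 'speech2.wav']))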

runRealtimeGCCNMF

Hello,

I have a question about a problem I'd like to fix. When I run 'runRealtimeGCCNMF.py' and hit the play button, the variable 'realGCC' in 'gccNMFProcessor.py' at 'processFrames' contains only NaN values for the default input 'dev_Sq1_Co_A_mix.wav'.

How can I fix it?

Thank you
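
A common source of NaNs in PHAT-weighted cross-correlation (an assumption about this particular case, since realGCC's computation isn't shown) is dividing the cross-power spectrum by a zero magnitude, e.g. on silent frames. The usual guard is a small epsilon in the denominator:

    import numpy as np

    def phatWeightedCrossSpectrum(X1, X2, epsilon=1e-16):
        # PHAT weighting with a guard against 0/0 = NaN on silent frames
        crossSpectrum = X1 * np.conj(X2)
        return crossSpectrum / (np.abs(crossSpectrum) + epsilon)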
