Hello! I'm trying to get the same array of data in NWaves as in Libr

How to get NWaves MFCC data similar to librosa (nwaves, 11 comments, closed)

OrangeOlko commented on July 18, 2024

How to get NWaves MFCC data similar to librosa

from nwaves.

Comments (11)

ar1st0crat commented on July 18, 2024

Hi!

Check this page

Also, this seems strange:

Audio length is 371499 in librosa (len(audio)), 134784 in NWaves. File is mono.

What is the sampling rate and duration (in seconds) of the signal?

OrangeOlko commented on July 18, 2024

Thanks! I started from this page but no success.

print(librosa.get_duration(y=audio))
16.848027210884354

sampling rate is 8000

ar1st0crat commented on July 18, 2024

So the number of samples should be, indeed, int(16.848027210884354 * 8000) = 134784.

mfcc = librosa.feature.mfcc(y=audio, sr=8000, n_mfcc=13, n_fft=1024, hop_length=int(np.floor(len(audio)/20)),
                            dct_type=2, norm='ortho', htk=False, fmin=0, center=False, n_mels=128, window='hann')

is equivalent to

int sr = 8000;           // sampling rate
int fftSize = 1024;
int filterbankSize = 128;

var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);


int hopLength = /* the concrete value of int(np.floor(len(audio)/20)) computed in Python */;


var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 13,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0
};

var extractor = new MfccExtractor(opts);

Note. Set center=False in librosa (as I explained in wiki).

PS. Your hop_length depends on len(audio), so specify its concrete value to avoid confusion.

OrangeOlko commented on July 18, 2024

Thank you very much for the example!

I took this value from python debug code:
print(int(np.floor(len(audio)/20)))

int hopLength = 18574;

Also I set center=False

After all of this I got:
(13, 20) in Python librosa
(13, 8) in NWaves

What else could be wrong?


var left = waveFile[Channels.Left];
int sr = 8000;           // sampling rate
int fftSize = 1024;
int filterbankSize = 128;

var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);

int hopLength = 18574;

var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 13,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0
};

var mfccExtractor = new MfccExtractor(opts);
var mfccVectors = mfccExtractor.ComputeFrom(left);

ar1st0crat commented on July 18, 2024

You need to find out why librosa returns 371499 samples. Because

the number of samples should be, indeed, int(16.848027210884354 * 8000) = 134784.

Also, do you understand what hop_length is (both in librosa and in NWaves)? Currently you're trying to extract 20 short frames from a relatively long signal, and the distance between two adjacent frames is quite large as well (it's a very unusual scenario).
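To make the frame arithmetic concrete, here is a minimal sketch of how many full frames fit into a signal when no padding is applied (center=False). The helper name `frame_count` is hypothetical; the formula is the usual one for non-centered framing, and it is assumed (not verified against the NWaves source) that NWaves counts frames the same way. With the numbers from this thread it reproduces both reported shapes.

```python
def frame_count(n_samples: int, frame_size: int, hop_length: int) -> int:
    """Number of full frames that fit when the signal is not padded."""
    if n_samples < frame_size:
        return 0
    return 1 + (n_samples - frame_size) // hop_length


# The hop was derived from the resampled librosa array (371499 samples).
hop_length = 371499 // 20
print(hop_length)                              # 18574

# Same frame size and hop, two different signal lengths.
print(frame_count(371499, 1024, hop_length))   # 20
print(frame_count(134784, 1024, hop_length))   # 8
```

So a hop computed from one array length and applied to a signal of a different length is enough to explain (13, 20) versus (13, 8).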

UPD.
It seems the signal is resampled to 22050 Hz during loading.

According to librosa docs

Audio will be automatically resampled to the given rate (default sr=22050). To preserve the native sampling rate of the file, use sr=None.

OrangeOlko commented on July 18, 2024

Thanks, I will try to find out

ar1st0crat commented on July 18, 2024

I've already found out (see my previous comment):

Audio will be automatically resampled to the given rate (default sr=22050). To preserve the native sampling rate of the file, use sr=None.

Simply set: librosa.load(..., sr=None)
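As a quick sanity check of the resampling explanation, the sketch below recomputes both sample counts from the duration printed earlier in the thread. The helper name `samples_for` is hypothetical, and the code uses only plain arithmetic (it does not call librosa).

```python
def samples_for(duration_s: float, sr: int) -> int:
    """Approximate number of samples in a signal of the given duration."""
    return round(duration_s * sr)


duration = 16.848027210884354   # value printed by librosa.get_duration

native_sr = 8000      # sampling rate of the file itself
default_sr = 22050    # rate librosa.load resamples to unless sr=None is passed

print(samples_for(duration, native_sr))    # 134784
print(samples_for(duration, default_sr))   # 371499
```

The two results match the lengths reported for NWaves and librosa respectively, which is consistent with the file having been resampled on load.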

OrangeOlko commented on July 18, 2024

Thanks again for help!
You are right about sr=None, so now I have arrays of the same size in librosa and in NWaves, but the data inside is different.
I tried changing all the parameters, but none of them gave a better result.

mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, n_fft=1024, hop_length=int(np.floor(len(audio)/20)),
                            dct_type=2, norm='ortho', htk=False, fmin=0, center=False, n_mels=128, window='hann')

ar1st0crat commented on July 18, 2024

You need to analyze the results more carefully. Compare them frame by frame.
For example, here are the results of my experiments:

The values differ very slightly, and this is because of round-off errors. As we can see, the algorithm is implemented correctly. In the first frame of your signal (and in many others as well) the first coefficient looks very different because the corresponding frame contains silence (sample values are very close to 0). In this case you essentially get one large value in mfcc_0 and zeros in the other coefficients (NWaves shows you 10e-5 ... 10e-7, but they are effectively zero). In any case, frames containing silence will most likely be discarded during feature analysis.
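The silence behaviour described above can be reproduced with a small worked example. This is only a sketch under stated assumptions: every one of the 128 mel bands of a silent frame is taken to be clamped to the floor 1e-10 and converted with 10*log10, and the DCT-II uses orthonormal scaling (as with norm='ortho'). The function `dct2_ortho` is written out by hand for illustration and is not taken from either library.

```python
import math


def dct2_ortho(x):
    """Orthonormal DCT-II of a sequence (same scaling idea as norm='ortho')."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


n_mels = 128
log_floor = 1e-10

# Assumption: a silent frame leaves every mel band at the floor, i.e. -100 dB.
silent_db = [10.0 * math.log10(log_floor)] * n_mels

coeffs = dct2_ortho(silent_db)[:13]
print(round(coeffs[0], 2))                      # -1131.37
print(max(abs(c) for c in coeffs[1:]) < 1e-9)   # True
```

A constant log-mel vector has all of its energy in the zeroth DCT coefficient, which is why a silent frame shows one large mfcc_0 and near-zero values everywhere else.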

Also, read more about:

the first MFCC coeff; what to do with it;
filter banks and their settings (usually, 24 - 40 bands are enough; I don't know why librosa sets 128 by default);
window analysis and what is frame size / hop size
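The frame-by-frame comparison suggested in this comment could be sketched roughly as follows. The function name `compare_frames`, the tolerance, and the sample numbers are all made up for illustration. Both inputs are expected as lists of frames; librosa returns an array of shape (n_mfcc, n_frames), so it would need to be transposed first (for example with `mfcc.T`).

```python
def compare_frames(a, b, atol=1e-3):
    """Return (frame_index, max_abs_diff) for every frame differing by more than atol.

    a and b are sequences of frames; each frame is a sequence of coefficients.
    """
    if len(a) != len(b):
        raise ValueError(f"frame counts differ: {len(a)} vs {len(b)}")
    mismatches = []
    for idx, (fa, fb) in enumerate(zip(a, b)):
        if len(fa) != len(fb):
            raise ValueError(f"frame {idx}: coefficient counts differ")
        diff = max(abs(x - y) for x, y in zip(fa, fb))
        if diff > atol:
            mismatches.append((idx, diff))
    return mismatches


# Illustrative numbers only, not taken from the thread.
reference = [[-310.2, 45.1, 3.3], [-295.7, 51.0, -2.8]]
candidate = [[-310.2001, 45.1002, 3.2999], [-290.0, 51.0, -2.8]]

# Frame 0 differs only by round-off; frame 1 is reported as a real mismatch.
print(compare_frames(reference, candidate))
```

Looking at which frames are flagged (and which coefficient dominates the difference) is usually more informative than comparing the two arrays as a whole.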

OrangeOlko commented on July 18, 2024

Thank you very much for the details, I will investigate this!

OrangeOlko commented on July 18, 2024

I wanted to post the final solution, along with the errors I found in my code, in case it helps others.

The audio file was 8-bit, so librosa and the Windows libraries read the samples differently. The file was converted to 16-bit.
Librosa by default loads data as float64, though for this file float32 is needed to get the same results. So the call that loads the file was changed to use soundfile directly, with an explicit dtype parameter:

import soundfile as sf
audio, sr = sf.read(filename, dtype='float32')

As @ar1st0crat mentioned, librosa resamples to 22050 Hz by default, so sr=None can be passed when loading. But in my case we used soundfile, which doesn't resample audio, so this parameter is not needed.
As the first mfcc frame doesn't contain relevant information, it can be omitted, though even it now contains similar results.

Librosa code:

mfcc = librosa.feature.mfcc(y=audio, sr=sr, center=False, hop_length=int(np.floor(len(audio) / 20)),
                            n_mels=128, n_fft=1024, n_mfcc=20, fmax=4000, fmin=0, norm='ortho',
                            window='hann', htk=False, power=1, dct_type=2)

NWaves code:

int fftSize = 1024;
int filterbankSize = 128;
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);
var hopCount = 20;
var hopLength = chunkData.Count / hopCount;
var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 20,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0,
    FftSize = fftSize,
    HighFrequency = 4000,
    SpectrumType = SpectrumType.Magnitude
};

var mfccExtractor = new MfccExtractor(opts);
var mfccVectors = mfccExtractor.ComputeFrom(chunkData.ToArray());
var mfccFlatten = new List<float>();

// skip the first coefficient (index 0) and flatten coefficient by coefficient
for (int m = 1; m < 20; m++)
    for (int frame = 0; frame < hopCount; frame++)
        mfccFlatten.Add(mfccVectors[frame][m]);
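For comparing the flattened output with librosa, it may help to see the ordering that the loop above produces. The sketch below uses plain Python lists and made-up numbers; `flatten_coefficient_major` is a hypothetical helper that mirrors the C# loop. It assumes the NWaves result is a list of frames (each holding its coefficients), while librosa's array is laid out as rows of coefficients, so dropping the first row and flattening row by row (for example `mfcc[1:].ravel()` on a NumPy array) should give the same order.

```python
def flatten_coefficient_major(frames, skip=1):
    """Flatten frames-major data coefficient by coefficient, dropping the first `skip` coefficients."""
    n_coeffs = len(frames[0])
    return [frame[m] for m in range(skip, n_coeffs) for frame in frames]


# Two frames with three coefficients each (illustrative values only).
nwaves_style = [[10, 11, 12],
                [20, 21, 22]]

# The same data laid out as (n_coeffs, n_frames), i.e. one row per coefficient.
librosa_style = [list(row) for row in zip(*nwaves_style)]

flat_from_frames = flatten_coefficient_major(nwaves_style)
flat_from_rows = [value for row in librosa_style[1:] for value in row]

print(flat_from_frames)                     # [11, 21, 12, 22]
print(flat_from_frames == flat_from_rows)   # True
```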

Results using NWaves:

Results using Librosa:
