Comments (11)
Hi!
Also, this seems strange:
Audio length is 371499 in librosa (len(audio)), 134784 in NWaves. File is mono.
What is the sampling rate and duration (in seconds) of the signal?
from nwaves.
Thanks! I started from this page but had no success.
print(librosa.get_duration(audio))
16.848027210884354
sampling rate is 8000
from nwaves.
So the number of samples should indeed be int(16.848027210884354 * 8000) = 134784.
mfcc = librosa.feature.mfcc(y = audio, sr = 8000, n_mfcc=13, n_fft=1024, hop_length=int(np.floor(len(audio)/20)),
dct_type=2, norm='ortho', htk=False, fmin=0, center=False, n_mels=128, window='hanning')
is equivalent to
int sr = 8000; // sampling rate
int fftSize = 1024;
int filterbankSize = 128;
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);
int hopLength = <just specify here the value of int(np.floor(len(audio)/20))>;
var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 13,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0
};
var extractor = new MfccExtractor(opts);
Note. Set center=False in librosa (as I explained in wiki).
PS. Your hop_length depends on len(audio), so specify its concrete value to avoid confusion.
from nwaves.
Thank you very much for the example!
I took this value from python debug code:
print(int(np.floor(len(audio)/20)))
int hopLength = 18574;
Also I set center=False
After all this, I got
(13, 20) in Python Librosa
(13, 8) in NWaves
What else could be wrong?
var left = waveFile[Channels.Left];
int sr = 8000; // sampling rate
int fftSize = 1024;
int filterbankSize = 128;
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);
int hopLength = 18574;
var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 13,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0
};
var mfccExtractor = new MfccExtractor(opts);
var mfccVectors = mfccExtractor.ComputeFrom(left);
from nwaves.
You need to find out why librosa returns 371499 samples, because the number of samples should indeed be int(16.848027210884354 * 8000) = 134784.
Also, do you understand what hop_length is (both in librosa and in NWaves)? Currently you're trying to extract 20 short frames from a relatively long signal, and the distance between two adjacent frames is quite big as well (a very unusual scenario).
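To make the role of the hop size concrete, here is a minimal sketch of the frame count for non-centered analysis. It assumes both libraries keep only full frames without padding, and the helper name frame_count is hypothetical (it is not part of librosa or NWaves). With the two signal lengths reported above it reproduces the 20 vs 8 frames.

```python
def frame_count(n_samples: int, frame_size: int, hop_size: int) -> int:
    """Number of full frames when no padding is applied (center=False)."""
    if n_samples < frame_size:
        return 0
    return 1 + (n_samples - frame_size) // hop_size


frame_size = 1024   # n_fft / fftSize from the thread
hop_size = 18574    # int(np.floor(371499 / 20)), as printed in the thread

print(frame_count(371499, frame_size, hop_size))  # 20 (length seen by librosa)
print(frame_count(134784, frame_size, hop_size))  # 8  (length seen by NWaves)
```

So the same hop size applied to signals of different length is enough to explain the different shapes.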
UPD.
Seems like the signal is resampled at 22050 Hz during loading.
According to librosa docs
Audio will be automatically resampled to the given rate (default sr=22050). To preserve the native sampling rate of the file, use sr=None.
from nwaves.
Thanks, I will try to find out
from nwaves.
I've already found out (see my previous comment):
Audio will be automatically resampled to the given rate (default sr=22050). To preserve the native sampling rate of the file, use sr=None.
Simply set: librosa.load(..., sr=None)
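The two sample counts in this thread are consistent with that resampling. A small arithmetic check, assuming the resampled length is the native length scaled by the rate ratio and rounded up (variable names are only for illustration):

```python
import math

native_sr = 8000      # native sampling rate of the file
default_sr = 22050    # rate librosa.load resamples to when sr is not given
n_native = 134784     # sample count reported by NWaves

# Resampling scales the number of samples by the ratio of the two rates.
n_resampled = math.ceil(n_native * default_sr / native_sr)
print(n_resampled)  # 371499, the length reported by librosa

# Going back: the duration in seconds times the native rate.
duration = n_resampled / default_sr
print(int(duration * native_sr))  # 134784
```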
from nwaves.
Thanks again for help!
You are right about sr=None, so now I have arrays of the same size in librosa and in NWaves, but the data inside is different.
I tried changing all the parameters, but none of them gave me a better result.
mfcc = librosa.feature.mfcc(y = audio, sr = sr, n_mfcc=13, n_fft=1024, hop_length=int(np.floor(len(audio)/20)), dct_type=2, norm='ortho', htk=False,fmin=0,center=False, n_mels=128, window='hanning')
from nwaves.
You need to analyze the results more carefully. Compare them frame by frame.
For example, here are the results of my experiments:
The values differ very slightly, and this is because of round-off errors. As we can see, the algorithm is implemented correctly. In the first frame of your signal (and many others as well) the first coefficient seems very different because the corresponding frame contains silence (sample values are very close to 0). Essentially, in this case you have some big value in mfcc_0 and zeros in the other coefficients (NWaves shows you 10e-5 ... 10e-7, but basically they are zeros). Anyway, frames containing silence will most likely be discarded during feature analysis.
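A frame-by-frame comparison can be sketched like this. It assumes the librosa result is indexed [coefficient][frame] and the NWaves result, exported to Python, is indexed [frame][coefficient]; the function name and the toy numbers are made up for illustration and are not real MFCC values.

```python
def per_frame_max_diff(librosa_mfcc, nwaves_mfcc):
    """Largest absolute coefficient difference in each frame.

    librosa_mfcc is indexed [coefficient][frame],
    nwaves_mfcc is indexed [frame][coefficient].
    """
    n_coeffs = len(librosa_mfcc)
    n_frames = len(nwaves_mfcc)
    if any(len(row) != n_frames for row in librosa_mfcc):
        raise ValueError("frame counts do not match")
    if any(len(frame) != n_coeffs for frame in nwaves_mfcc):
        raise ValueError("coefficient counts do not match")
    return [
        max(abs(librosa_mfcc[c][t] - nwaves_mfcc[t][c]) for c in range(n_coeffs))
        for t in range(n_frames)
    ]


# Toy data: 2 coefficients, 3 frames.
librosa_like = [[10.0, 20.0, 30.0],
                [1.0, 2.0, 3.0]]
nwaves_like = [[10.0, 1.0],
               [20.0, 2.5],
               [30.0, 3.0]]

diffs = per_frame_max_diff(librosa_like, nwaves_like)
print(diffs)  # [0.0, 0.5, 0.0]
```

Frames with a large difference can then be inspected one at a time, for example to check whether they contain silence.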
Also, read more about:
- the first MFCC coeff; what to do with it;
- filter banks and their settings (usually, 24 - 40 bands are enough; I don't know why librosa sets 128 by default);
- window analysis and what is frame size / hop size
from nwaves.
Thank you very much for the details, I will investigate this!
from nwaves.
I wanted to post the final solution, along with the errors I found in my code, in case it helps others.
- The audio file was 8-bit, so librosa and the Windows libraries read the samples differently. The file was converted to 16-bit.
- The samples had to be read as float32 to get the same results, so the call that loads the file was changed to use soundfile directly with an explicit dtype parameter:
import soundfile as sf
audio, sr = sf.read(filename, dtype='float32')
- As @ar1st0crat mentioned, librosa resamples to 22050 Hz by default, so sr=None can be passed when loading. But in my case we used soundfile, which doesn't resample audio, so this parameter isn't needed.
- As the first MFCC coefficient doesn't contain relevant information, it can be omitted. But even it now gives similar results.
Librosa code:
mfcc = librosa.feature.mfcc(y = audio, sr = sr, center=False, hop_length=int(np.floor(len(audio)/ 20)),
n_mels=128, n_fft = 1024, n_mfcc = 20, fmax = 4000, fmin = 0,norm = 'ortho',
window = 'hann', htk = False, power=1, dct_type=2)
NWaves code:
int fftSize = 1024;
int filterbankSize = 128;
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);
var hopCount = 20;
var hopLength = chunkData.Count / hopCount;
var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 20,
    FilterBank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0,
    FftSize = fftSize,
    HighFrequency = 4000,
    SpectrumType = SpectrumType.Magnitude
};
var mfccExtractor = new MfccExtractor(opts);
var mfccVectors = mfccExtractor.ComputeFrom(chunkData.ToArray());
var mfccFlatten = new List<float>();
// skip coefficient 0 and flatten coefficient by coefficient, frame by frame
for (int m = 1; m < 20; m++)
    for (int frame = 0; frame < hopCount; frame++)
        mfccFlatten.Add(mfccVectors[frame][m]);
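For checking against that flattened list on the Python side, the same ordering can be reproduced roughly as follows. This is a sketch that assumes the MFCC data is indexed [coefficient][frame], as librosa returns it; the helper name is hypothetical and the numbers are toy values.

```python
def flatten_without_c0(mfcc):
    """Flatten [coefficient][frame] data coefficient by coefficient,
    skipping coefficient 0."""
    return [value for row in mfcc[1:] for value in row]


# Toy data: 3 coefficients, 2 frames.
toy = [
    [100.0, 101.0],  # coefficient 0, dropped
    [1.0, 2.0],
    [3.0, 4.0],
]
print(flatten_without_c0(toy))  # [1.0, 2.0, 3.0, 4.0]
```

With a NumPy array, mfcc[1:].ravel() should give the same order.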
from nwaves.