Comments (7)
I have the same problem here.
I use the following code to read an MP3 file (44100 Hz, 192 kb/s):
audio, sr = librosa.load(path, sr=None, mono=False)
audio = audio[0]
and
sound, sr = audio.load(line)
sound = sound:select(2,1):clone()
sound:mul(2^-31) -- keep it in [-1, 1]
And that's what I got:
librosa.max=0.916290283203, lua.max=0.916290223598,
librosa.min=-0.888732910156, lua.min=-0.888751626015
librosa.shape=(14154624,), lua.shape=(14153472,)
The only way I can keep them (nearly) aligned is to drop the last slice of the librosa version (14154624 - 14153472 = 1152 samples), which I think does no harm as a workaround.
As you can see, the values are nearly the same; the difference between them is on the order of 1e-5. But I still cannot find a proper solution or the reason why.
Any thoughts about the reason?
Thanks!
from lua---audio.
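The trimming workaround described above can be sketched in Python as follows. This is a minimal sketch, not the poster's actual script: `compare_decodes` is a hypothetical helper, the arrays are synthetic stand-ins for the two decoders' outputs, and summing absolute differences is only an assumption about how a total difference might be computed.

```python
import numpy as np


def compare_decodes(a, b):
    """Trim two 1-D sample arrays to their common length and summarize the gap."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = min(a.shape[0], b.shape[0])  # drop the extra tail of the longer array
    diff = np.abs(a[:n] - b[:n])
    return {
        "dropped": abs(a.shape[0] - b.shape[0]),  # samples discarded by trimming
        "max_abs_diff": float(diff.max()) if n else 0.0,
        "total_abs_diff": float(diff.sum()),
    }


# Synthetic stand-ins for the librosa and lua---audio outputs (not real data).
rosa = np.array([0.0, 0.5, -0.5, 0.25, 0.1, 0.2])
th = np.array([0.0, 0.5, -0.5, 0.25])
stats = compare_decodes(rosa, th)
print(stats)
```

Trimming only the tail assumes the extra samples sit at the end of the longer array. If one decoder adds samples at the start instead, the two arrays would be offset and the sample-wise difference would be large even when the decoded audio is the same.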
As for the values being similar, I guess there might be version or settings differences between librosa and Torch audio :( I use librosa 0.5.0 and audio-0.1-0, and my code is the same as above.
@eborboihuc Yes, I can confirm this on my side for the voice.mp3 file. However, when I try other .mp3 files, such as 02 - "Canon" (in D-Major), Pachelbel from http://www.stephaniequinn.com/samples.htm, I get librosa.shape=(4010496,1) and lua.shape=(4013568,1), and the values are different. Can you check this on your side?
@eborboihuc Surprisingly, you get very similar values. In my case the values differ a lot, even though I used the same file with the same settings... But it seems your workaround pretty much solves the issue. I'm also curious about the reason, though.
Thanks.
@eborboihuc When I test both libraries with voice.mp3 (the example sound file in torch-audio), I get very similar values, and the length difference between the two libraries is 576 samples for sr=22050.
However, when I try different sound files (training data from SoundNet), the values are not that similar, and this time Torch-audio gives a longer output than librosa. Also note that the length difference varies from file to file.
So is your example above based on only one file, or did you get similar results for different files too?
@ardasnck
I actually got similar results for different files. But yes, the difference varies from one file to another.
And for voice.mp3, I only got 4.27 for the difference between them when sr=22050:
rosa.max=0.496704101562, rosa.min=-0.511627197266,
th.max=0.496706217527, th.min=-0.51163572073
librosa.shape=(417600,), lua.shape=(417024,)
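As a quick check on the shapes reported in this thread, the sketch below converts the two length gaps into milliseconds. The `gap` helper and the dictionary labels are hypothetical names; the lengths and sample rates are the ones quoted above.

```python
# (librosa length, lua---audio length, sample rate in Hz) as reported in this thread.
reports = {
    "first file, 44100 Hz": (14154624, 14153472, 44100),
    "voice.mp3, 22050 Hz": (417600, 417024, 22050),
}


def gap(librosa_len, lua_len, sr):
    """Return the length gap in samples and in milliseconds."""
    samples = librosa_len - lua_len
    return samples, 1000.0 * samples / sr


for name, (r, t, sr) in reports.items():
    samples, ms = gap(r, t, sr)
    print(f"{name}: {samples} samples = {ms:.2f} ms")
```

Both gaps come out to the same duration, about 26.12 ms, which matches the length of one 1152-sample MPEG-1 Layer III frame at 44.1 kHz. That the two decoders differ by one frame of padding is only an assumption here; nothing in the thread confirms it, and the Canon file shows gaps that do not follow this pattern.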
I tried that one and got a big difference with the originally downloaded version. After converting the file, the difference is considerably smaller.
I have tried several combinations and found a rule of thumb: convert the file first.
Here is what I do; it takes a single command:
sox input.mp3 output.mp3 trim 0
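If the conversion needs to be applied to many files from Python, the command above could be wrapped as in the sketch below. The function names `sox_reencode_cmd` and `reencode` are hypothetical, and the sketch assumes the `sox` binary is installed with MP3 support. Judging by the file details reported in this thread, the conversion re-encodes the audio (192 kb/s before, 128 kb/s after), so it is not lossless.

```python
import shutil
import subprocess


def sox_reencode_cmd(src, dst):
    """Build the argument list for `sox <src> <dst> trim 0`."""
    return ["sox", src, dst, "trim", "0"]


def reencode(src, dst):
    """Run sox; raises CalledProcessError if sox exits with a non-zero status."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox not found on PATH")
    subprocess.run(sox_reencode_cmd(src, dst), check=True)


# Only print the command here; call reencode(...) to actually run it.
print(" ".join(sox_reencode_cmd("input.mp3", "output.mp3")))
```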
Below is the original Canon.mp3:
Input #0, mp3, from 'data/canon.mp3':
Metadata:
title : Canon
Duration: 00:01:30.98, start: 0.025057, bitrate: 192 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
Metadata:
encoder : LAME3.96r
--------------------------------------------------------
librosa.max=0.999969482422, librosa.min=-1.0,
lua_audio.max=1.0, lua_audio.min=-1.0
librosa.shape=(4012416,), lua_audio.shape=(4013568,)
The Total Diff, 15436.9, is quite large.
After conversion:
Input #0, mp3, from 'data/canon2.mp3':
Metadata:
encoder : LAME 64bits version 3.99.5 (http://lame.sf.net)
title : Canon
TLEN : 91010
Duration: 00:01:31.04, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
--------------------------------------------------------
librosa.max=0.999969482422, librosa.min=-1.0,
lua_audio.max=1.0, lua_audio.min=-1.0
librosa.shape=(4014720,), lua_audio.shape=(4013541,)
The Total Diff, 40.9921, is reasonably small now.
Hope this can answer your question.