soumith / lua---audio
Module for torch to support audio i/o as well as do common operations like dFFT, generate spectrograms etc.
License: Other
my code:
require 'audio'
aud, sample_rate = audio.load('what_a_wonderful_world.mp3')
print(#aud)
print(sample_rate)
audio.save('test_out.mp3', aud, sample_rate)
output:
6094080
2
[torch.LongStorage of size 2]
test_out.mp3 ends up being an empty MP3 file: the input file is 6.2 MB, but test_out.mp3 is 417 bytes with a duration of 00:00.
After installing the first two packages, running "luarocks install https://raw.githubusercontent.com/soumith/lua---audio/master/audio-0.1-0.rockspec" fails with an error like:
Scanning dependencies of target audio
[ 50%] Building C object CMakeFiles/audio.dir/audio.c.o
Linking C shared module libaudio.so
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [libaudio.so] Error 1
make[1]: *** [CMakeFiles/audio.dir/all] Error 2
make: *** [all] Error 2
Error: Build error: Failed building.
I tried to figure this out but failed.
I was following the steps in the example on readme when I encountered this error.
require 'audio'
require 'image'
voice = audio.samplevoice()
formats: no handler for file extension `mp3'
[read_audio_file] Failure to read file
Aborted (core dumped)
I'm getting the following error when installing with luarocks install https://raw.githubusercontent.com/soumith/lua---audio/master/audio-0.1-0.rockspec
Using https://raw.githubusercontent.com/soumith/lua---audio/master/audio-0.1-0.rockspec... switching to 'build' mode
Cloning into 'lua---audio'...
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 16 (delta 1), reused 8 (delta 0), pack-reused 0
Receiving objects: 100% (16/16), 156.86 KiB | 0 bytes/s, done.
Resolving deltas: 100% (1/1), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/greg/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/greg/torch/install/lib/luarocks/rocks/audio/0.1-0" && make
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/greg/torch/install
SOX_INCLUDE_DIR: /usr/include
SOX_LIBRARIES: /usr/lib/x86_64-linux-gnu/libsox.so
FFTW_INCLUDE_DIR: /usr/local/include
FFTW_LIBRARIES: /usr/local/lib/libfftw3.a
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_audio-0.1-0-9939/lua---audio/build
Scanning dependencies of target audio
[ 25%] Building C object CMakeFiles/audio.dir/audio.c.o
Linking C shared module libaudio.so
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [libaudio.so] Error 1
make[1]: *** [CMakeFiles/audio.dir/all] Error 2
make: *** [all] Error 2
Error: Build error: Failed building.
Hi,
For 16-bit PCM audio sampled at 11,025 Hz, I would expect the numeric range of the audio samples to be [-32768, 32767] (i.e. the range of a 16-bit signed integer). This is true when I open the audio file using any other library (such as the tuneR package in R), as shown below:
However, when I load the same audio file using lua---audio's audio.load() function, the numeric range is extremely large: a DoubleTensor with range (-2147483648, 1719271424), as shown below:
Can someone tell me why this is the case? Furthermore, how can I convert the DoubleTensor values to signed 16-bit integers?
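One possible explanation, offered as an assumption and not a confirmed diagnosis: libsox represents samples internally as 32-bit signed integers, so a 16-bit file may come back scaled up by 2^16. The reported minimum, -2147483648, is exactly -2^31, which is consistent with that. Under this assumption, dividing by 65536 recovers the 16-bit values. The sketch below uses plain Lua numbers, and `to_int16` is a hypothetical helper, not part of lua---audio.

```lua
-- Assumption: samples are 16-bit values scaled up to the 32-bit signed range.
local SCALE = 65536  -- 2^16

-- Hypothetical helper: map a 32-bit-scaled sample back to the 16-bit range.
local function to_int16(sample)
  return math.floor(sample / SCALE)
end

-- The two extremes reported in the question.
local lo = to_int16(-2147483648)
local hi = to_int16(1719271424)
print(lo, hi)
```

Both reported extremes divide evenly by 65536, giving -32768 and 26234. On a tensor, the same scaling could presumably be applied with an element-wise division such as `aud:div(65536)` before converting to a ShortTensor.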
I'm running on Ubuntu 14.04. Is there a way to add support for the .m4a files that come from iTunes?
Here is the sox output:
AUDIO FILE FORMATS: 8svx aif aifc aiff aiffc al amb amr-nb amr-wb anb au avr awb caf cdda cdr cvs cvsd cvu dat dvms f32 f4 f64 f8 fap flac fssd gsm gsrt hcom htk ima ircam la lpc lpc10 lu mat mat4 mat5 maud mp2 mp3 nist ogg paf prc pvf raw s1 s16 s2 s24 s3 s32 s4 s8 sb sd2 sds sf sl sln smp snd sndfile sndr sndt sou sox sph sw txw u1 u16 u2 u24 u3 u32 u4 u8 ub ul uw vms voc vorbis vox w64 wav wavpcm wv wve xa xi
PLAYLIST FORMATS: m3u pls
AUDIO DEVICE DRIVERS: alsa ao oss ossdsp pulseaudio
I am having an issue with loading a large WAV file.
What I did was load an MP3 file with more than 20 minutes of music and then save it as WAV.
But when I load the WAV file that I just generated, it gives the error below:
/home/achang/torch/install/share/lua/5.1/audio/init.lua:56: [read_audio] Unknown length at /tmp/luarocks_audio-0.1-0-130/lua---audio/generic/sox.c:45
stack traceback:
[C]: in function 'load'
/home/achang/torch/install/share/lua/5.1/audio/init.lua:56: in function 'load'
mp3_play.lua:16: in function 'load_file'
[string "x = load_file('music.wav')"]:1: in main chunk
[C]: in function 'xpcall'
/home/achang/torch/install/share/lua/5.1/trepl/init.lua:670: in function 'repl'
...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x00405d50
I use the code below to convert:
function convert_mp3_2_wav(file_path)
local x, sample_rate = audio.load(file_path)
local outpath = file_path:gsub(".mp3", ".wav")
audio.save(outpath, x, sample_rate)
end
When played in an audio player, the WAV file sounds fine.
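A side note on the conversion helper above, separate from the reported load error: in Lua patterns an unescaped `.` matches any character, so `gsub(".mp3", ".wav")` can also rewrite text elsewhere in the path. A minimal sketch of an escaped, anchored pattern follows; `wav_path` is a hypothetical helper name.

```lua
-- Hypothetical helper: swap a trailing ".mp3" extension for ".wav".
-- "%." matches a literal dot and "$" anchors the match at the end of the string.
local function wav_path(file_path)
  -- The outer parentheses drop gsub's second return value (the substitution count).
  return (file_path:gsub("%.mp3$", ".wav"))
end

-- The unescaped pattern also matches "_mp3" inside the directory name.
local loose = ("my_mp3s/song.mp3"):gsub(".mp3", ".wav")
local strict = wav_path("my_mp3s/song.mp3")
print(loose)   -- my.wavs/song.wav
print(strict)  -- my_mp3s/song.wav
```

This does not change how the file is written, so it is unlikely to explain the "Unknown length" error by itself.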
Following your spectrogram example code, I saved the 'spect' image, but it is just a white-banded image.
What am I doing wrong?
After finishing a computation on the GPU, I tried to save the audio (a torch.CudaTensor) with
audio.save('path', music, 44100)
and got this error:
/home/exp/torch/install/bin/luajit: /home/exp/torch/install/share/lua/5.1/audio/init.lua:74: attempt to index field 'libsox' (a nil value)
stack traceback:
/home/exp/torch/install/share/lua/5.1/audio/init.lua:74: in function 'save'
sample_music.lua:155: in main chunk
[C]: in function 'dofile'
.../exp/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405ea0
So I must convert the torch.CudaTensor to a torch.DoubleTensor first.
By the way, I don't know why I can't save music to MP3 correctly. When I save music to MP3, I get a very small file that can't be played.
I've been trying to debug exactly where it's leaking, but valgrind and similar tools don't work well with LuaJIT. However, my tests show that audio.load() does leak memory. Reproduce with:
require 'audio'
while true do
data = audio.load( 'voice.mp3' )
collectgarbage()
end
It will eventually get killed by the OOM-killer if you let it run long enough.
Builds, but doesn't import properly.
+ lua -e 'require '\''audio'\'''
lua: error loading module 'libaudio' from file '/Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libaudio.so':
dlopen(/Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libaudio.so, 6): Symbol not found: _luaL_checkint
Referenced from: /Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libaudio.so
Expected in: flat namespace
in /Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libaudio.so
stack traceback:
[C]: in ?
[C]: in function 'require'
...ltschko/anaconda/envs/_test/share/lua/5.3/audio/init.lua:37: in main chunk
[C]: in function 'require'
(command line):1: in main chunk
[C]: in ?
Is there any way to do the following?
require 'audio'
file = io.open("foo.wav", "r")
-- reading all to get the RIFF header
-- could offset to get the raw audio if sample rate is known
contents = file:read("*all")
file:close()
chars = torch.CharTensor(#contents)
chars:storage():string(contents)
signal = audio.decompress(chars, 'wav')
I tried playing around with it a little bit, but your audio.compress seems to produce a larger CharTensor than I expected. For example, suppose that foo.wav is 1 second long with a sample rate of 16 kHz, so 16000 samples. Each sample is a short, so I expect to have 2 * 16000 = 32000 bytes (chars) when I compress it to a CharTensor. However, audio.compress produces a CharTensor of length 3 * 16000 = 48000.
I assume this has something to do with embedding the sample rate.
Thanks!
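The arithmetic in the question can be checked directly. The sketch below only restates the numbers given above and does not inspect what audio.compress actually writes. It suggests the extra size scales with the number of samples (3 bytes each), whereas embedded metadata such as a sample rate would normally add a fixed number of bytes.

```lua
local samples = 16000                -- 1 second at 16 kHz, from the question
local expected_bytes = samples * 2   -- 16-bit samples: 2 bytes each
local observed_bytes = 48000         -- CharTensor length reported above

-- Size per sample and the overhead beyond raw 16-bit PCM.
local bytes_per_sample = observed_bytes / samples
local extra_bytes = observed_bytes - expected_bytes
print(expected_bytes, bytes_per_sample, extra_bytes)
```

Comparing the CharTensor length for clips of different durations would show whether the overhead really is proportional to the sample count.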
Thank you very much for your contribution Soumith.
When I read the same audio file (.mp3) with your library and with 'librosa' in Python, I get outputs of different sizes.
Mono channel, sampling rate: 22050
lua---audio returns (417024x1)
librosa returns (417600x1)
Any idea what would be the reason?
Thank you very much.
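One observation that may help narrow this down: the two lengths differ by 576 samples, which is the size of one MP3 (Layer III) granule, and both lengths are whole multiples of 576. That would be consistent with the two decoders treating encoder delay or padding differently, though this is a guess and has not been verified against either library.

```lua
local GRANULE = 576            -- samples per MP3 (Layer III) granule
local lua_audio_len = 417024   -- length reported for lua---audio
local librosa_len = 417600     -- length reported for librosa

local diff = librosa_len - lua_audio_len
print(diff)                      -- 576
print(lua_audio_len % GRANULE)   -- 0
print(librosa_len % GRANULE)     -- 0
```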
I'm getting this error when building against Lua 5.2 (LuaJIT 2.0 works fine).
Does this make any sense?
+ lua -e 'require '\''audio'\'''
lua: error loading module 'libaudio' from file '/Users/Alex/anaconda/envs/_test/lib/lua/5.2/libaudio.so':
dlopen(/Users/Alex/anaconda/envs/_test/lib/lua/5.2/libaudio.so, 6): Symbol not found: _luaL_register
Referenced from: /Users/Alex/anaconda/envs/_test/lib/lua/5.2/libaudio.so
Expected in: flat namespace