Comments (11)
@justinsalamon you know i don't know how to read
from pysox.
However in this kind of library, if you allow the input file to be overwritten, if you run the same process twice you get different results. That's bad for reproducibility, which is why I'm in favor of the "put input and output files in different directories" approach.
Well, you could argue that reproducibility means being able to repeat an experiment as performed and obtain the same results. If you start with the same dataset, then as long as the codebase is the same, you will obtain the same results, even if part of the process means overwriting some of the data. So I don't necessarily see this as an issue for reproducibility.
The bigger point though, I think, is that you're enforcing a certain behavior on the user when I think that decision should be left to the user. Of course you don't want to support the user doing something that's blatantly wrong, but I'm not sure overwriting files falls into that bucket. As a concrete example, for scaper I'm generating new audio files, but because pysox always involves writing to disk, I end up performing multiple steps of file io all to produce a single final result. There's no real reason for me to save each intermediate step to a separate file nor keep it on the file system once the process is over. So basically if pysox doesn't support using the same file for input/output, I'll just have to wrap it in code that implements this logic.
Anyway, it's your call and I won't insist further, just wanted to put my 2c out there :)
from pysox.
I see your point, but my personal preference is to avoid having input_file = output_file, because it's harder to find and reproduce bugs. Why not just do:
audio_infile = '/Users/justin/Downloads/trimtest.wav'
audio_outfile = '/Users/justin/Downloads/trimtest_out.wav'
and if you no longer care about audio_infile after processing, you can os.remove(audio_infile).
from pysox.
The other option is to put all your input files and output files in separate directories to allow them to have the same basename. This is what I usually do, fwiw.
from pysox.
I see your point, but my personal preference is to avoid having input_file = output_file, because it's harder to find and reproduce bugs. Why not just do:
audio_infile = '/Users/justin/Downloads/trimtest.wav'
audio_outfile = '/Users/justin/Downloads/trimtest_out.wav'
and if you no longer care about audio_infile after processing, you can os.remove(audio_infile).
Ehm, well, I'm of course aware that I can do this, but that's not my use case ;) My use case requires overwriting the same file. Saving the output file into a separate directory is exactly what the snippet I included in my first post does (and then copies it over and overwrites the input file). But I feel like that's beside the point - regardless of whether one prefers to keep their input/output files separate or not, right now the user can provide the same filepath for input/output and pysox will spit out an invalid audio file without any warning, which is undocumented behavior and, I think, undesirable.
from pysox.
this is an easy enough thing to do with tempfiles: https://github.com/ejhumphrey/claudio/blob/master/claudio/sox.py#L131
just make sure that if you use mkstemp, you explicitly close the file descriptor it returns. this is one of those cases where python3 + context managers is way shinier than py2: https://docs.python.org/3/library/tempfile.html#examples
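For reference, a minimal sketch of the mkstemp route described above, not the code behind the link. The names process_in_place and process are hypothetical; process stands in for any callable that reads an input path and writes an output path (for example a thin wrapper around a pysox call). The temp file is created next to the target so the final os.replace stays on one filesystem.

```python
import os
import tempfile


def process_in_place(path, process):
    """Overwrite `path` with the result of process(input_path, output_path).

    `process` is a hypothetical callable, not part of pysox.
    """
    suffix = os.path.splitext(path)[1]
    target_dir = os.path.dirname(os.path.abspath(path))
    # mkstemp returns an already-open OS-level file descriptor plus the path.
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=target_dir)
    os.close(fd)  # close it explicitly, otherwise the descriptor leaks
    try:
        process(path, tmp_path)
        # Swap the finished result into place only after processing succeeded.
        os.replace(tmp_path, path)
    finally:
        # If processing failed, the temp file is still there: clean it up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

One caveat with this sketch: mkstemp creates the file with owner-only permissions, so the replaced file may end up with different permissions than the original.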
from pysox.
forgot to mention ... would be an easy enough PR 😉
from pysox.
@justinsalamon totally agree that pysox should throw an error if input_filepath = output_filepath, rather than creating an empty file. Didn't realize that was the main issue you were bringing up - I thought you wanted to be able to allow the paths to be the same.
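A rough sketch of what such a check could look like, assuming the comparison is done on resolved paths so that './a.wav' and 'a.wav' count as the same file. The function name check_distinct_paths is made up for illustration and is not an existing pysox function.

```python
import os


def check_distinct_paths(input_filepath, output_filepath):
    """Raise ValueError if both arguments resolve to the same file.

    Hypothetical helper, not part of pysox.
    """
    # realpath normalizes relative segments and follows symlinks, and unlike
    # os.path.samefile it does not require the output file to exist yet.
    if os.path.realpath(input_filepath) == os.path.realpath(output_filepath):
        raise ValueError(
            "input_filepath and output_filepath must be different files, "
            "got {!r} for both".format(input_filepath)
        )
```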
from pysox.
fwiw, I am in favor of a call signature like do_a_thing(input_file="my_file.wav", output_file=None) for over-writing a file in place.
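To make that signature concrete, here is one possible sketch under stated assumptions: output_file=None means "overwrite in place via a temporary directory", and an explicit output path equal to the input is rejected. The transform parameter is a hypothetical stand-in for the real processing step and defaults to a plain copy so the example runs without sox installed.

```python
import os
import shutil
import tempfile


def do_a_thing(input_file, output_file=None, transform=shutil.copyfile):
    """Apply transform(src, dst); overwrite input_file when output_file is None.

    `transform` is a hypothetical placeholder for the actual processing call.
    """
    if output_file is None:
        # The context manager removes the directory (and temp file) on exit.
        with tempfile.TemporaryDirectory() as tmp_dir:
            tmp_file = os.path.join(tmp_dir, os.path.basename(input_file))
            transform(input_file, tmp_file)
            # Copy the finished result back over the original.
            shutil.copyfile(tmp_file, input_file)
        return input_file
    if os.path.realpath(input_file) == os.path.realpath(output_file):
        raise ValueError("pass output_file=None to overwrite the input in place")
    transform(input_file, output_file)
    return output_file
```

With this shape, overwriting is always an explicit choice by the caller, and passing the same path twice by accident still fails loudly.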
from pysox.
@ejhumphrey I know I can use tempfiles, it's in my first comment on this issue ;)
@justinsalamon totally agree that pysox should throw an error if input_filepath = output_filepath, rather than creating an empty file. Didn't realize that was the main issue you were bringing up - I thought you wanted to be able to allow the paths to be the same.
@rabitt yes, the main issue I was bringing up is that the creation of an empty file is bad. However, my preferred solution would be for pysox to use tempfiles (as per my example) to basically yes support using the same path for input/output. An alternative solution is what you propose (raise an error), which would be better than the current behavior, though I don't see why you wouldn't want to support overwriting the input file - there are plenty of tools out there (e.g. shutil, librosa) that happily overwrite files and it's a fairly common use case.
from pysox.
the creation of an empty file is bad
agreed
However, my preferred solution would be for pysox to use tempfiles (as per my example) to basically yes support using the same path for input/output. An alternative solution is what you propose (raise an error), which would be better than the current behavior, though I don't see why you wouldn't want to support overwriting the input file - there are plenty of tools out there (e.g. shutil, librosa) that happily overwrite files and it's a fairly common use case.
In a library like this the main paradigm is creating outputs by modifying inputs. Other types of libraries might do something like "run process, create a new text file" and allow you to overwrite said text file. In this case, you can still "reproduce" the results because the original information is preserved. However in this kind of library, if you allow the input file to be overwritten, if you run the same process twice you get different results. That's bad for reproducibility, which is why I'm in favor of the "put input and output files in different directories" approach.
Because of ^^rant, I prefer to throw an error, rather than use the tempfile approach.
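A toy illustration of the "run it twice, get different results" point, using a made-up trim_head function on raw bytes as a stand-in for something like an audio trim (no pysox involved):

```python
import os
import tempfile


def trim_head(src, dst, n=2):
    """Hypothetical stand-in for a trim effect: drop the first n bytes."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data[n:])


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "in.bin")
out_path = os.path.join(work_dir, "out.bin")
with open(in_path, "wb") as f:
    f.write(b"abcdef")

# Separate output: running the same step twice gives the same result.
trim_head(in_path, out_path)
first_run = read_bytes(out_path)
trim_head(in_path, out_path)
second_run = read_bytes(out_path)
print(first_run == second_run)  # True, both are b"cdef"

# In place: the second run starts from already-trimmed data.
trim_head(in_path, in_path)
after_one = read_bytes(in_path)  # b"cdef"
trim_head(in_path, in_path)
after_two = read_bytes(in_path)  # b"ef"
print(after_one == after_two)  # False
```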
from pysox.