Comments (11)

ejhumphrey commented on August 28, 2024

@justinsalamon you know i don't know how to read

from pysox.

justinsalamon commented on August 28, 2024

> However in this kind of library, if you allow the input file to be overwritten, if you run the same process twice you get different results. That's bad for reproducibility, which is why I'm in favor of the "put input and output files in different directories" approach.

Well, you could argue that reproducibility means being able to repeat an experiment as performed and obtain the same results. If you start with the same dataset, then as long as the codebase is the same, you will obtain the same results, even if part of the process means overwriting some of the data. So I don't necessarily see this as an issue for reproducibility.

The bigger point though, I think, is that you're enforcing a certain behavior on the user when I think that decision should be left to the user. Of course you don't want to support the user doing something that's blatantly wrong, but I'm not sure overwriting files falls into that bucket. As a concrete example, for scaper I'm generating new audio files, but because pysox always involves writing to disk, I end up performing multiple steps of file io all to produce a single final result. There's no real reason for me to save each intermediate step to a separate file nor keep it on the file system once the process is over. So basically if pysox doesn't support using the same file for input/output, I'll just have to wrap it in code that implements this logic.

Anyway, it's your call and I won't insist further, just wanted to put my 2c out there :)

rabitt commented on August 28, 2024

I see your point, but my personal preference is to avoid having input_file = output_file, because it's harder to find and reproduce bugs. Why not just do:

audio_infile = '/Users/justin/Downloads/trimtest.wav'
audio_outfile = '/Users/justin/Downloads/trimtest_out.wav'

and if you no longer care about audio_infile after processing, you can os.remove(audio_infile).

rabitt commented on August 28, 2024

The other option is to put all your input files and output files in separate directories to allow them to have the same basename. This is what I usually do, fwiw.

justinsalamon commented on August 28, 2024

> I see your point, but my personal preference is to avoid having input_file = output_file, because it's harder to find and reproduce bugs. Why not just do:
>
> audio_infile = '/Users/justin/Downloads/trimtest.wav'
> audio_outfile = '/Users/justin/Downloads/trimtest_out.wav'
>
> and if you no longer care about audio_infile after processing, you can os.remove(audio_infile).

Ehm, well, I'm of course aware that I can do this, but that's not my use case ;) My use case requires overwriting the same file. Saving the output file into a separate directory is exactly what the snippet I included in my first post does (and then copies it over and overwrites the input file). But I feel like that's beside the point - regardless of whether one prefers to keep their input/output files separate or not, right now the user can provide the same filepath for input/output and pysox will spit out an invalid audio file without any warning, which is undocumented behavior and, I think, undesirable.

ejhumphrey commented on August 28, 2024

this is an easy enough thing to do with tempfiles: https://github.com/ejhumphrey/claudio/blob/master/claudio/sox.py#L131

just make sure if you use mkstemp, you explicitly close the file descriptor it returns. this is one of those cases where python3 + context managers is way shinier than py2: https://docs.python.org/3/library/tempfile.html#examples
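
For reference, a minimal sketch of the two tempfile patterns mentioned here, using only the standard library. The `.wav` suffix, the `make_temp_path` helper, and the placeholder bytes are illustrative assumptions, not code taken from claudio or pysox.

```python
import os
import tempfile


def make_temp_path(suffix=".wav"):
    """Return the path of a fresh temp file (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp returns an open OS-level descriptor; close it explicitly
    return path


# Pattern 1: mkstemp with manual cleanup (works on py2 and py3).
tmp_path = make_temp_path()
try:
    with open(tmp_path, "wb") as handle:
        handle.write(b"placeholder audio bytes")
    size_written = os.path.getsize(tmp_path)
finally:
    os.remove(tmp_path)

# Pattern 2 (py3 only): the context manager deletes the directory and its contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    scratch_path = os.path.join(tmp_dir, "scratch.wav")
    with open(scratch_path, "wb") as handle:
        handle.write(b"placeholder audio bytes")
    existed_inside = os.path.exists(scratch_path)
existed_after = os.path.exists(scratch_path)
```

With pattern 1 the caller owns cleanup; with pattern 2 the scratch file is gone as soon as the `with` block exits.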

ejhumphrey commented on August 28, 2024

forgot to mention ... would be an easy enough PR 😉

rabitt commented on August 28, 2024

@justinsalamon totally agree that pysox should throw an error if input_filepath = output_filepath, rather than creating an empty file. Didn't realize that was the main issue you were bringing up - I thought you wanted to be able to allow the paths to be the same.
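
A rough sketch of what such a check could look like. `same_file` and `check_distinct_paths` are hypothetical names for illustration only and are not part of the pysox API; the sketch assumes that comparing resolved paths is good enough when the output file does not exist yet.

```python
import os


def same_file(input_filepath, output_filepath):
    """True if both paths refer to the same file, even when spelled differently."""
    if os.path.exists(input_filepath) and os.path.exists(output_filepath):
        return os.path.samefile(input_filepath, output_filepath)
    # The output usually does not exist yet, so compare the resolved paths instead.
    return os.path.realpath(input_filepath) == os.path.realpath(output_filepath)


def check_distinct_paths(input_filepath, output_filepath):
    """Hypothetical guard: raise instead of silently producing an invalid file."""
    if same_file(input_filepath, output_filepath):
        raise ValueError(
            "input_filepath and output_filepath must be different: "
            f"{input_filepath!r}"
        )
```

Comparing resolved paths rather than raw strings also catches cases like `trimtest.wav` versus `./trimtest.wav`.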

ejhumphrey commented on August 28, 2024

fwiw, I am in favor of a call signature like do_a_thing(input_file="my_file.wav", output_file=None) for over-writing a file in place.
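
One way such a signature might be implemented, as a sketch only. `fake_effect` is a stand-in for the real audio processing, and writing to a temp file in the same directory before swapping it in with `os.replace` is an assumption about how the overwrite would be done, not how pysox behaves.

```python
import os
import tempfile


def fake_effect(src, dst):
    """Stand-in for the real processing: copy src to dst, uppercased."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read().upper())


def do_a_thing(input_file, output_file=None):
    """Process input_file; if output_file is None, overwrite input_file in place."""
    if output_file is not None:
        fake_effect(input_file, output_file)
        return output_file

    # Use a temp file in the same directory so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(input_file))
    suffix = os.path.splitext(input_file)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # close the descriptor mkstemp opened
    try:
        fake_effect(input_file, tmp_path)
        os.replace(tmp_path, input_file)  # swap the finished result over the input
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # only reached if processing failed before the swap
    return input_file
```

The input is only replaced once processing has finished, so a failure partway through leaves the original file untouched.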

justinsalamon commented on August 28, 2024

@ejhumphrey I know I can use tempfiles, it's in my first comment on this issue ;)

> @justinsalamon totally agree that pysox should throw an error if input_filepath = output_filepath, rather than creating an empty file. Didn't realize that was the main issue you were bringing up - I thought you wanted to be able to allow the paths to be the same.

@rabitt yes, the main issue I was bringing up is that the creation of an empty file is bad. However, my preferred solution would be for pysox to use tempfiles (as per my example) to basically yes support using the same path for input/output. An alternative solution is what you propose (raise an error), which would be better than the current behavior, though I don't see why you wouldn't want to support overwriting the input file - there are plenty of tools out there (e.g. shutil, librosa) that happily overwrite files and it's a fairly common use case.

rabitt commented on August 28, 2024

@justinsalamon

> the creation of an empty file is bad

agreed

> However, my preferred solution would be for pysox to use tempfiles (as per my example) to basically yes support using the same path for input/output. An alternative solution is what you propose (raise an error), which would be better than the current behavior, though I don't see why you wouldn't want to support overwriting the input file - there are plenty of tools out there (e.g. shutil, librosa) that happily overwrite files and it's a fairly common use case.

In a library like this the main paradigm is creating outputs by modifying inputs. Other types of libraries might do something like "run process, create a new text file" and allow you to overwrite said text file. In this case, you can still "reproduce" the results because the original information is preserved. However in this kind of library, if you allow the input file to be overwritten, if you run the same process twice you get different results. That's bad for reproducibility, which is why I'm in favor of the "put input and output files in different directories" approach.

Because of ^^rant, I prefer to throw an error, rather than use the tempfile approach.
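
The reproducibility concern can be illustrated with a toy example. `trim_first_byte` is a made-up stand-in for a trim effect operating on raw bytes, not a pysox call; the point is only that repeating a run with a separate output gives the same result, while repeating an in-place run starts from already-modified data.

```python
import os
import tempfile


def trim_first_byte(src, dst):
    """Stand-in for a trim effect: drop the first byte of src and write to dst."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(data[1:])


def read_bytes(path):
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "in.bin")
    dst = os.path.join(workdir, "out.bin")
    with open(src, "wb") as handle:
        handle.write(b"abcdef")

    # Separate output path: both runs read the untouched input.
    trim_first_byte(src, dst)
    first_separate = read_bytes(dst)
    trim_first_byte(src, dst)
    second_separate = read_bytes(dst)

    # Same path for input and output: the second run reads the trimmed file.
    trim_first_byte(src, src)
    first_in_place = read_bytes(src)
    trim_first_byte(src, src)
    second_in_place = read_bytes(src)
```

Whether this counts as a reproducibility problem depends on whether "the same process" means rerunning from the original dataset or rerunning on whatever is currently on disk, which is the disagreement in this thread.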
