
Comments (7)

justinsalamon commented on July 23, 2024

At the moment the code specifically loads both columns as floats:
https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L229

Can you use np.loadtxt to load columns of different datatypes? It seems as though you have to specify a type and all columns are expected to be of that type.

I wrote this loader with melody in mind - it acts as a first check to ensure the data is of the correct type before proceeding. If you make this function return values that are not necessarily floats, you'd have to add a new check for the melody eval code or it could break...
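As a rough sketch of the "first check" idea, assuming a whitespace-delimited two-column file: `np.loadtxt` with its default float dtype raises `ValueError` as soon as a field cannot be parsed as a float, so the load itself doubles as a type check. The function name `load_two_float_columns` is made up for illustration and is not the actual mir_eval loader.

```python
from io import StringIO

import numpy as np


def load_two_float_columns(handle):
    """Hypothetical loader: read exactly two float columns from a file-like object."""
    # The default dtype is float, so non-numeric fields raise ValueError here.
    data = np.loadtxt(handle, ndmin=2)
    if data.shape[1] != 2:
        raise ValueError("expected 2 columns, got {}".format(data.shape[1]))
    return data[:, 0], data[:, 1]


# Usage: a well-formed time/frequency file loads cleanly.
times, freqs = load_two_float_columns(StringIO("0.0 220.0\n0.5 221.5\n"))

# A string label in the second column is rejected at load time.
try:
    load_two_float_columns(StringIO("0.0 N\n"))
except ValueError as exc:
    print("rejected:", exc)
```

If the loader started returning non-float values, this implicit check would disappear and the melody code would need its own explicit validation.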

from mir_eval.

ejhumphrey commented on July 23, 2024

I believe that yes, you can load different datatypes using that function. The docstring has this example:

>>> from io import StringIO   # StringIO behaves like a file object
>>> import numpy as np
>>> d = StringIO("M 21 72\nF 35 58")
>>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
...                      'formats': ('S1', 'i4', 'f4')})
array([('M', 21, 72.0), ('F', 35, 58.0)],
      dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])

One, it's overkill for what I'm asking, but I think you could massage it to behave as intended. Two, while convenient, there's nothing that strictly says it's necessary to use this function (np.loadtxt) versus iterating over a file handle; it's what the rest of the loaders do anyways.

Sure, format assertions are always good, and we'd need a different one if the implementation changes.


craffel commented on July 23, 2024

I think lists of strings are saner than np.arrays of strings, so I'd prefer that it just dealt with list-like objects of any type. Feel free to change at will.


craffel commented on July 23, 2024

So, currently there are @bmcfee's functions for loading annotations (ranges)/events: mir_eval.io.load_events and mir_eval.io.load_intervals. load_events loads one- or two-column files; the first column always gets read in as floats (times), and the second (if it exists) gets read in as a list of strings (labels). Similarly, load_intervals loads two- or three-column files; the first two columns are floats specifying the intervals (ranges), and the last column, if it exists, is a list of string labels. @justinsalamon's load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Similarly, mir_eval.util.adjust_intervals is a useful function across many tasks but currently is only really suitable for segments. It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix. Similarly, it'd be useful for time series, but the padding label is always a string. It seems like for both of these cases it'd be nice if it could handle labels as a string or float list-like, and that the padding label could be set arbitrarily.
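To make the request concrete, here is a minimal sketch of padding with an arbitrary label, assuming sorted, non-overlapping intervals. `pad_intervals` and its parameters are hypothetical names, not the existing adjust_intervals signature, and the sketch only pads (it does not trim intervals that extend past the bounds).

```python
def pad_intervals(intervals, labels, t_min, t_max, pad_label):
    """Hypothetical helper: pad labeled intervals out to [t_min, t_max].

    intervals is a list of (start, end) tuples, assumed sorted and
    non-overlapping; pad_label may be any type (str, float, ...).
    """
    intervals = list(intervals)
    labels = list(labels)
    # Fill the gap before the first interval, if any.
    if intervals and intervals[0][0] > t_min:
        intervals.insert(0, (t_min, intervals[0][0]))
        labels.insert(0, pad_label)
    # Fill the gap after the last interval, if any.
    if intervals and intervals[-1][1] < t_max:
        intervals.append((intervals[-1][1], t_max))
        labels.append(pad_label)
    return intervals, labels


# Chords: pad with 'N' (no chord).
chord_ivs, chord_labels = pad_intervals([(1.0, 2.0)], ['C:maj'], 0.0, 3.0, 'N')

# Time series: pad with a float value instead of a string.
ts_ivs, ts_values = pad_intervals([(1.0, 2.0)], [220.0], 0.0, 3.0, 0.0)
```

Because the padding label is passed straight through, the same function covers both the chord case and the float-valued time series case.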

Is everyone comfortable with me trying to merge all of this functionality?


bmcfee commented on July 23, 2024

> load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Sure. load_events allows one argument (converter) to be passed in to specify the event index type (defaults to float). Seems like the quick fix here is to replicate this functionality for label parsing, so we have event_converter and label_converter (defaults to str) in https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L60-L61 .
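A rough sketch of what the two-converter idea could look like, assuming whitespace-delimited lines with an optional second column. The function name `load_events_sketch` is hypothetical; only the parameter names event_converter and label_converter come from the proposal above, and this is not the code in input_output.py.

```python
def load_events_sketch(lines, event_converter=float, label_converter=str):
    """Hypothetical loader: parse 'time [label]' lines with per-column converters."""
    events, labels = [], []
    for line in lines:
        # Split off the first field; the rest of the line (if any) is the label.
        fields = line.strip().split(None, 1)
        if not fields:
            continue  # skip blank lines
        events.append(event_converter(fields[0]))
        # Labels are only collected for lines that actually have a second column.
        if len(fields) == 2:
            labels.append(label_converter(fields[1]))
    return events, labels


# Default behaviour: float times, string labels.
times, names = load_events_sketch(["0.5 verse", "1.0 chorus"])

# load_time_series-style behaviour: the second column is parsed as floats.
times2, freqs = load_events_sketch(["0.0 220.0", "0.5 221.5"], label_converter=float)
```

With label_converter=float the same function covers the load_time_series case, which is the point of merging the loaders.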

> It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix.

Agreed, but I'm not sure of an elegant way to do this right now. The __%s notation is a crutch following the dummy label generation in the loader, where you want synthetic labels to be unique (numbered). I figured we should have consistent keying for synthetic labels. But, in adjust_*, we only ever use __T_MIN and __T_MAX. I really wish python supported the %*s formatter right about now...


craffel commented on July 23, 2024

We still need to merge the loaders. load_time_series loads two float columns. load_events loads one float column and optionally one string column. load_intervals loads two float columns and optionally one string column.


craffel commented on July 23, 2024

OK, the changes proposed above (#35 (comment)) have been implemented and all code/example usage has been updated to reflect this change.

@bmcfee @ejhumphrey @urinieto Important - io.load_intervals now does not return labels; if you want to load labeled intervals you need to use io.load_labeled_intervals. Same is true for load_events vs load_labeled_events but AFAIK no one was using load_events for loading labels. Also, none of these functions will generate synthetic labels themselves. If you want synthetic labels like these functions used to return when no labels were present, use util.generate_labels. Take a look at the evaluators/example usage for updated usage examples.
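For anyone unsure what "synthetic labels" means here, the sketch below is a stand-in that numbers each item with a fixed prefix, in the spirit of the `__%s` notation mentioned earlier. It is an illustration under that assumption, not the actual util.generate_labels code; the name `generate_labels_sketch` and the default prefix are hypothetical.

```python
def generate_labels_sketch(items, prefix='__'):
    """Hypothetical stand-in: one unique, numbered label per item."""
    return ['{}{}'.format(prefix, n) for n, _ in enumerate(items)]


# Unlabeled intervals, e.g. as returned by a loader that no longer invents labels.
intervals = [(0.0, 1.0), (1.0, 2.5), (2.5, 4.0)]
labels = generate_labels_sketch(intervals)
```

The design point is that label generation is now an explicit, separate step after loading rather than something the loaders do implicitly.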

