achabotl / pambox
Python auditory modeling toolbox.
Home Page: http://pambox.org
License: BSD 3-Clause "New" or "Revised" License
I often run into the issue that the speech and noise levels should be set before the distortion, for example when the SNR is set at the source, rather than at the ears, in an experiment with signals in space. Right now I have to do very inelegant overrides of the preprocessing function. An extra argument to the Experiment class, such as adjust_levels_before_processing, would really simplify things.
Additionally, that would solve the problem of adjusting the levels of binaural signals when HRTFs are applied: if the levels are adjusted before the distortion processing, then the signals should, in principle, still be binaural.
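A minimal sketch of what the proposed flag could look like; the class layout and the method names (`adjust_levels`, `preprocessing`, `run_trial`) are assumptions for illustration, not pambox's actual API:

```python
class Experiment:
    # Stand-in sketch: the flag decides whether levels are set before
    # or after the distortion stage.
    def __init__(self, adjust_levels_before_processing=False):
        self.adjust_levels_before_processing = adjust_levels_before_processing
        self.order = []  # records the step order, for illustration

    def adjust_levels(self, target, masker):
        self.order.append("levels")
        return target, masker

    def preprocessing(self, target, masker):
        self.order.append("distortion")
        return target, masker

    def run_trial(self, target, masker):
        steps = [self.adjust_levels, self.preprocessing]
        if not self.adjust_levels_before_processing:
            steps.reverse()
        for step in steps:
            target, masker = step(target, masker)
        return target, masker
```

With the flag set, the level adjustment runs first and the distortion sees correctly leveled (and still binaural) signals.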
Experiment.srts_from_df (which is a bad name) should allow the srt_at argument to take model-specific values. For example, if a model outputs SII values, then the SRT shouldn't be at 50% but at some value between 0 and 1. The current API is
srts_from_df(df, col='Intelligibility', srt_at=50)
We should allow for model-specific criteria, like {('Model', 'Output'): criterion}. This has the downside that if the "default" srt_at shouldn't be 50, then all models must be part of the dictionary. Therefore, adding another keyword argument might be a better idea:
srts_from_df(df, col='Intelligibility', srt_at=50, model_srts=None)
See AppVeyor.
Should be able to:
Look into subclassing DataFrame for experiment results:
Since it's possible to save a Pandas DataFrame directly to HDF5, it would be a good idea to offer that option when running an experiment. I think the default should be "off", because the resulting files will be too big, but it would certainly be useful for debugging, making plots of the internal representations, etc.
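A sketch of the opt-in save, using pandas' built-in `DataFrame.to_hdf` (which requires PyTables); the `save_results` wrapper, the keyword, and the key name are made up for illustration:

```python
import pandas as pd

def save_results(df, path, save_to_hdf=False):
    # Off by default, since the files can get big; opt in for debugging
    # or for plotting internal representations later.
    if save_to_hdf:
        df.to_hdf(path, key="results", mode="w")

df = pd.DataFrame({"Model": ["sepsm"], "Intelligibility": [42.0]})
save_results(df, "results.h5", save_to_hdf=False)  # default: nothing written
```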
Probably a simple addition after the STI model (issue #6) has been implemented.
In Experiment.processing, the application of the distortion is done independently for the target and masker. It's a problem for non-linear distortions, like spectral subtraction, which require the noisy signal and the noise alone.
I see two approaches for "fixing" that:
1. Subclass Experiment and replace the preprocessing method with a method that applies the distortion whichever way they want.
2. Add an option to Experiment to define the behavior inside the preprocessing method. That would require changing the level adjustment behavior as well; actually, the level adjustment would have to be done before the application of the distortion.
Right now, I'd say we should stick to option 1.
The test setup for fftfilt depends on nose, which is not part of the regular dependencies. It would be a bit stupid to depend on both nose and pytest, so I should find a way to convert the tests to use pytest.
It's a bit of a mess right now to require an Experiment whenever one needs to transform results. The results should be independent of the experiment. We should consider creating a class based on DataFrames.
Here are some examples of how to subclass data frames (from GeoPandas) or use composition (from UrbanSim).
I think the issue is that the load_balanced_view is non-blocking and that I don't check if it's ready before iterating through the results. Should try either making the view blocking, or checking for readiness, before iterating.
If the signal is of shape 2xN, for example, utils.rms and utils.setdbspl spit out a ValueError because of incompatible shapes. The issue is that both functions have to do a division by, or a subtraction of, the mean, and that does not fit the broadcasting rules.
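One possible fix, assuming channels along the first axis of a (2, N) signal: compute the mean with `keepdims=True` so the subtraction broadcasts per channel. The axis handling and the mean subtraction shown here are assumptions about how utils.rms should behave, not its current code:

```python
import numpy as np

def rms(x, axis=-1):
    # keepdims=True keeps the reduced axis, so the per-channel mean
    # broadcasts correctly against both 1-D and (2, N) inputs.
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x, axis=axis, keepdims=True)
    return np.sqrt(np.mean(x ** 2, axis=axis))
```

For a (2, N) input this returns one RMS value per channel instead of raising a ValueError.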
Lavandier, M. and Culling, J. F. (2010). Prediction of binaural speech intelligibility against noise in rooms, J. Acoust. Soc. Am., 127(1), 387--399
DOI: 10.1121/1.3268612
Noticed that the time output of the modulation filterbank is different from the time output if we just use the sp.signal.butter function to create the coefficients and then filter the signal with sp.signal.filtfilt.
Apparently, there is an extra "-1" factor when creating the frequency vector that should not be there. If we remove it, then the output of the mod_filterbank function is the same as when using butter.
It would probably make sense to use butter, since we're using Butterworth filters anyway, instead of using our own implementation. Additionally, because of the way the modulation filtering is currently done, the shape of the filter is dependent on the length of the input signal, because the length affects the resolution of the frequency vector.
Ideally, the mr-sEPSM should just call code from the sEPSM, and use the output of the modulation filtering stage to calculate the multi-resolution process. This way we could get the sEPSM and the mr-sEPSM predictions in one model.
Beutelmann, R., Brand, T., and Kollmeier, B. (2010). Revision, extension, and evaluation of a binaural speech intelligibility model, J. Acoust. Soc. Am., 127(4), 2479--2497
Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2010). A short-time objective intelligibility measure for time-frequency weighted noisy speech, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 4214--4217
Should pick a "reference level" for signals. For example, should a signal with an RMS value of 1 correspond to 0 dB, 100 dB, or something else?
We could use a physical standard too, where an RMS of 20e-6 corresponds to 0 dB, i.e.
level = 20 * log10(rms / 20e-6)
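A minimal sketch of the physical-standard option, with an RMS of 20e-6 (20 µPa) mapping to 0 dB SPL; the function name is an assumption:

```python
import numpy as np

def dbspl(x):
    # Level re 20 uPa: a signal whose RMS is 20e-6 comes out at 0 dB SPL.
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(rms / 20e-6)
```

Under this convention, a signal with an RMS of 1 sits at about 94 dB SPL, which matches the usual calibration-tone reference.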
It's probably a good idea to convert the modulation filterbank code into a class, just like the other filter banks. It would be more consistent, and additionally it would allow caching the filter coefficients, instead of calculating them every time.
/Users/chabot/Dropbox/PhD/pambox/pambox/speech/experiment.py:604: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
It happens in pred_to_pc. Should use .loc instead of double indexing.
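A small illustration of the fix; the column and model names are made up, but the pattern is the standard pandas one: assign through a single .loc call instead of chained indexing:

```python
import pandas as pd

df = pd.DataFrame({"Model": ["sepsm", "stec"], "Intelligibility": [40.0, 60.0]})

# Chained indexing like df[df["Model"] == "sepsm"]["Intelligibility"] = 50.0
# assigns into a possibly temporary copy and raises SettingWithCopyWarning.
# A single .loc call writes directly into the original frame:
df.loc[df["Model"] == "sepsm", "Intelligibility"] = 50.0
```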
The idea is to have (possibly) a single class, or function, that can loop through multiple distortions, speech materials, models, etc. and run all the experiments, with a minimum of boilerplate code.
When loading the speech-shaped noise in speech.Material, a mono signal should be returned if force_mono is True.
I could send an email or use local notification mechanisms on each platform:
The current implementation minimizes the squared error for each sentence compared to the reference. What should instead be done is to minimize the error between the average intelligibility (intelligibility across sentences) and the reference intelligibility.
The boundaries of each filter should be calculated independently, and not assume that the input frequencies are spaced properly.
Right now, if we input two frequencies that are not spaced according to the width parameter, e.g. [63, 1000], the boundaries are: [56.12661924, 70.71510904, 1122.46204831]. But there should be 4 boundaries.
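A sketch of the independent calculation, assuming width is a fraction of an octave (1/3 by default) and that each band's edges sit half a width below and above its center; function and parameter names are made up:

```python
import numpy as np

def band_edges(center_freqs, width=1/3):
    # Compute the lower and upper edge of each band from its own center
    # frequency, so unevenly spaced inputs still get two edges apiece.
    cf = np.asarray(center_freqs, dtype=float)
    lower = cf * 2 ** (-width / 2)
    upper = cf * 2 ** (width / 2)
    return np.column_stack([lower, upper])
```

For [63, 1000] this yields four boundaries, the first three of which match the numbers quoted above (56.13, 70.72, and 1122.46), plus the missing lower edge of the 1000 Hz band.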
Jelfs, S., Culling, J. F., and Lavandier, M. (2011). Revision and validation of a binaural model for speech intelligibility in noise, Hear. Res., 275(127), 96-104
In speech.Experiment.adjust_levels, it should be possible to adjust the levels correctly even if the signals are binaural. A way to do this is simply:
average_level = np.mean(utils.dbspl(signal))
The average level would therefore always be a single number, independently of whether signal has one, two, or more channels.
Each intelligibility model returns a different type of prediction value. Sometimes it is an intelligibility percentage directly, but more often than not, it is some particular value that has to be transformed to intelligibility. A model can also return internal intermediate values, such as envelope powers, level spectra, etc. It would be great if the output of the models was standardized such that the models can be used interchangeably.
If the path_to_sentences doesn't end with '/', then the Material class cannot find the sentences. It concatenates path with *.ext, which, of course, is wrong.
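A sketch of the usual fix: build the pattern with os.path.join instead of string concatenation, so the trailing separator becomes irrelevant. The helper name `list_sentences` and the demo are made up for illustration:

```python
import glob
import os
import tempfile

def list_sentences(path_to_sentences, ext=".wav"):
    # os.path.join inserts the separator itself, so a path without a
    # trailing '/' no longer produces a broken glob pattern.
    return sorted(glob.glob(os.path.join(path_to_sentences, "*" + ext)))

# Quick check with a temporary directory (no trailing slash in the path):
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "sentence001.wav"), "w").close()
    open(os.path.join(d, "notes.txt"), "w").close()
    found = [os.path.basename(p) for p in list_sentences(d)]
```

Filtering on the extension in the glob pattern also only returns the wav files, which touches on the file-listing issue below.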
International Electrotechnical Commission (2003). IEC 60268-16:2003, Sound system equipment -- Part 16: Objective rating of speech intelligibility by speech transmission index, 1--28
The predict function should be broken down for more modularity, such that there's no need to duplicate it for the mr-sEPSM. The abstraction level of all the calls in predict should be the same.
It happens in the conversion to float, if the signal is integers. The signal is divided by the largest value possible, which is pretty much always -2^(Nbits - 1). The signal should be divided by the absolute value instead.
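A sketch of the corrected conversion; the function name is an assumption, but the scaling follows the description above:

```python
import numpy as np

def to_float(signal):
    # Scale integer PCM into [-1, 1). For signed integers the most
    # negative value is -2^(Nbits - 1); dividing by its *absolute*
    # value keeps the sign of the samples intact.
    if np.issubdtype(signal.dtype, np.integer):
        info = np.iinfo(signal.dtype)
        return signal.astype(np.float64) / abs(info.min)
    return signal
```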
When finding the bands above threshold in the sEPSM, there is a factor of 0.231 for the compensation of the filter bandwidth. This factor is unnecessary because the diffuse hearing thresholds used for the comparison are already adjusted for the filter bandwidths.
The factor should be removed. Hopefully, that would not affect the predictions too much.
The srts_from_df method crashes if model_srts is defined but the models are not in the dataframe. The issue is that the aggregate method ends up with functions for columns that don't exist.
This should be fixed by adding the model_srts values only if the model--output pair exists in the data frame.
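A sketch of that fix: keep only the model_srts entries whose (model, output) pair is actually a column of the data frame before handing anything to aggregate. The helper name and the model/output labels are made up:

```python
import pandas as pd

def filter_model_srts(df, model_srts):
    # Drop criteria for model--output pairs that aren't columns, so
    # aggregate never sees a function for a nonexistent column.
    return {pair: crit for pair, crit in model_srts.items()
            if pair in df.columns}

df = pd.DataFrame(columns=pd.MultiIndex.from_tuples([("Sepsm", "snr_env")]))
kept = filter_model_srts(df, {("Sepsm", "snr_env"): 0.5,
                              ("Stec", "index"): 0.7})
```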
So far, there are a sound function and a soundsc function. The naming is inherited from Matlab. Having a single play function would be much more obvious.
When listing the files in the speech.Material class, all files in the path specified are listed. Only the wav files should be listed.