achabotl / pambox
Python auditory modeling toolbox.
Home Page: http://pambox.org
License: BSD 3-Clause "New" or "Revised" License
I often run into the issue that the speech and noise levels should be set before the distortion, for example when the SNR is set at the source, rather than at the ears, in an experiment with signals in space. Right now I have to do very inelegant overrides of the preprocessing function. An extra argument to the Experiment class, such as adjust_levels_before_processing, would really simplify things.
Additionally, that would solve the problem of adjusting the levels of binaural signals when HRTFs are applied: if the levels are adjusted before the distortion processing, then the signals should, in principle, still be binaural.
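A minimal sketch of what the proposed flag could look like; the class layout and the method names (`adjust_levels`, `preprocessing`, `run_trial`) are assumptions for illustration, not pambox's actual API:

```python
class Experiment:
    # Stand-in sketch: the flag decides whether levels are set before
    # or after the distortion stage.
    def __init__(self, adjust_levels_before_processing=False):
        self.adjust_levels_before_processing = adjust_levels_before_processing
        self.order = []  # records the step order, for illustration

    def adjust_levels(self, target, masker):
        self.order.append("levels")
        return target, masker

    def preprocessing(self, target, masker):
        self.order.append("distortion")
        return target, masker

    def run_trial(self, target, masker):
        steps = [self.adjust_levels, self.preprocessing]
        if not self.adjust_levels_before_processing:
            steps.reverse()
        for step in steps:
            target, masker = step(target, masker)
        return target, masker
```

With the flag set, the level adjustment runs first and the distortion sees correctly leveled (and still binaural) signals.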
Experiment.srts_from_df (which is a bad name) should allow the srt_at argument to take model-specific values. For example, if a model outputs SII values, then the SRT shouldn't be at 50% but at some value between 0 and 1. The current API is
srts_from_df(df, col='Intelligibility', srt_at=50)
We should allow for model-specific criteria, like {('Model', 'Output'): criterion}. This has the downside that if the "default" srt_at shouldn't be 50, then all models must be part of the dictionary. Therefore, adding another keyword argument might be a better idea:
srts_from_df(df, col='Intelligibility', srt_at=50, model_srts=None)
See AppVeyor.
Should be able to:
Look into subclassing DataFrame for experiment results:
Since it's possible to save a Pandas DataFrame directly to HDF5, it would be a good idea to offer that option when running an experiment. I think the default should be "off", because the resulting files will be too big, but it would certainly be useful for debugging, making plots of the internal representations, etc.
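A sketch of the opt-in save, using pandas' built-in `DataFrame.to_hdf` (which requires PyTables); the `save_results` wrapper, the keyword, and the key name are made up for illustration:

```python
import pandas as pd

def save_results(df, path, save_to_hdf=False):
    # Off by default, since the files can get big; opt in for debugging
    # or for plotting internal representations later.
    if save_to_hdf:
        df.to_hdf(path, key="results", mode="w")

df = pd.DataFrame({"Model": ["sepsm"], "Intelligibility": [42.0]})
save_results(df, "results.h5", save_to_hdf=False)  # default: nothing written
```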
Probably a simple addition after the STI model (issue #6) has been implemented.
In Experiment.processing, the application of the distortion is done independently for the target and masker. It's a problem for non-linear distortions, like spectral subtraction, which require the noisy signal and the noise alone.
I see two approaches for "fixing" that:
1. Subclass Experiment and replace the preprocessing method with a method that applies the distortion whichever way they want.
2. Add an option to Experiment to define the behavior inside the preprocessing method. That would require changing the level adjustment behavior as well; actually, the level adjustment would have to be done before the application of the distortion.
Right now, I'd say we should stick to option 1.
The test setup for fftfilt depends on nose, which is not part of the regular dependencies. It would be a bit stupid to depend on both nose and pytest, so I should find a way to convert the tests to use pytest.
It's a bit of a mess right now to require an Experiment whenever one needs to transform results. The results should be independent of the experiment. We should consider creating a class based on DataFrames.
Here are some examples of how to subclass data frames (from GeoPandas) or use composition (from UrbanSim).
I think the issue is that the load_balanced_view is non-blocking and that I don't check if it's ready before iterating through the results. Should try either making the view blocking, or checking for readiness, before iterating.
If the signal is of shape 2xN, for example, utils.rms and utils.setdbspl spit out a ValueError because of incompatible shapes. The issue is that both functions have to do a division by, or a subtraction of, the mean, and that does not fit the broadcasting rules.
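One possible fix, assuming channels along the first axis of a (2, N) signal: compute the mean with `keepdims=True` so the subtraction broadcasts per channel. The axis handling and the mean subtraction shown here are assumptions about how utils.rms should behave, not its current code:

```python
import numpy as np

def rms(x, axis=-1):
    # keepdims=True keeps the reduced axis, so the per-channel mean
    # broadcasts correctly against both 1-D and (2, N) inputs.
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x, axis=axis, keepdims=True)
    return np.sqrt(np.mean(x ** 2, axis=axis))
```

For a (2, N) input this returns one RMS value per channel instead of raising a ValueError.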
Lavandier, M. and Culling, J. F. (2010). Prediction of binaural speech intelligibility against noise in rooms, J. Acoust. Soc. Am., 127(1), 387--399
DOI: 10.1121/1.3268612
Noticed that the time output of the modulation filterbank is different from the time output if we just use the sp.signal.butter function to create the coefficients and then filter the signal with sp.signal.filtfilt.
Apparently, there is an extra "-1" factor when creating the frequency vector that should not be there. If we remove it, then the output of the mod_filterbank function is the same as when using butter.
It would probably make sense to use butter, since we're using Butterworth filters anyway, instead of using our own implementation. Additionally, because of the way the modulation filtering is currently done, the shape of the filter is dependent on the length of the input signal, because the length affects the resolution of the frequency vector.
Ideally, the mr-sEPSM should just call code from the sEPSM, and use the output of the modulation filtering stage to calculate the multi-resolution process. This way we could get the sEPSM and the mr-sEPSM predictions in one model.
Beutelmann, R., Brand, T., and Kollmeier, B. (2010). Revision, extension, and evaluation of a binaural speech intelligibility model, J. Acoust. Soc. Am., 127(4), 2479--2497
Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2010). A short-time objective intelligibility measure for time-frequency weighted noisy speech, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 4214--4217
Should pick a "reference level" for signals. For example, should a signal with an RMS value of 1 correspond to 0 dB, 100 dB, or something else?
We could use a physical standard too, where an RMS of 20e-6 corresponds to 0 dB, i.e.
level = 20 * log10(rms / 20e-6)
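A minimal sketch of the physical-standard option, with an RMS of 20e-6 (20 µPa) mapping to 0 dB SPL; the function name is an assumption:

```python
import numpy as np

def dbspl(x):
    # Level re 20 uPa: a signal whose RMS is 20e-6 comes out at 0 dB SPL.
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(rms / 20e-6)
```

Under this convention, a signal with an RMS of 1 sits at about 94 dB SPL, which matches the usual calibration-tone reference.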
It's probably a good idea to convert the modulation filterbank code into a class, just like the other filter banks. It would be more consistent, and additionally it would allow caching the filter coefficients, instead of calculating them every time.
/Users/chabot/Dropbox/PhD/pambox/pambox/speech/experiment.py:604: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
It happens in pred_to_pc. Should use .loc instead of double indexing.
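A small illustration of the fix; the column and model names are made up, but the pattern is the standard pandas one: assign through a single .loc call instead of chained indexing:

```python
import pandas as pd

df = pd.DataFrame({"Model": ["sepsm", "stec"], "Intelligibility": [40.0, 60.0]})

# Chained indexing like df[df["Model"] == "sepsm"]["Intelligibility"] = 50.0
# assigns into a possibly temporary copy and raises SettingWithCopyWarning.
# A single .loc call writes directly into the original frame:
df.loc[df["Model"] == "sepsm", "Intelligibility"] = 50.0
```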
The idea is to have (possibly) a single class, or function, that can loop through multiple distortions, speech materials, models, etc. and run all the experiments, with a minimum of boilerplate code.
When loading the speech-shaped noise in speech.Material, a mono signal should be returned if force_mono is True.
I could send an email or use local notification mechanisms on each platform:
The current implementation minimizes the squared error for each sentence compared to the reference. What should instead be done is to minimize the error between the average intelligibility (intelligibility across sentences) and the reference intelligibility.
The boundaries of each filter should be calculated independently, and not assume that the input frequencies are spaced properly.
Right now, if we input two frequencies that are not spaced according to the width parameter, e.g. [63, 1000], the boundaries are: [56.12661924, 70.71510904, 1122.46204831]. But there should be 4 boundaries.
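A sketch of the independent calculation, assuming width is a fraction of an octave (1/3 by default) and that each band's edges sit half a width below and above its center; function and parameter names are made up:

```python
import numpy as np

def band_edges(center_freqs, width=1/3):
    # Compute the lower and upper edge of each band from its own center
    # frequency, so unevenly spaced inputs still get two edges apiece.
    cf = np.asarray(center_freqs, dtype=float)
    lower = cf * 2 ** (-width / 2)
    upper = cf * 2 ** (width / 2)
    return np.column_stack([lower, upper])
```

For [63, 1000] this yields four boundaries, the first three of which match the numbers quoted above (56.13, 70.72, and 1122.46), plus the missing lower edge of the 1000 Hz band.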
Jelfs, S., Culling, J. F., and Lavandier, M. (2011). Revision and validation of a binaural model for speech intelligibility in noise, Hear. Res., 275(127), 96-104
In speech.Experiment.adjust_levels, it should be possible to adjust the levels correctly even if the signals are binaural. A way to do this is simply:
average_level = np.mean(utils.dbspl(signal))
The average level would therefore always be a single number, independently of whether signal has one, two, or more channels.
Each intelligibility model returns a different type of prediction value. Sometimes it is an intelligibility percentage directly, but more often than not, it is some particular value that has to be transformed to intelligibility. A model can also return internal intermediate values, such as envelope powers, level spectra, etc. It would be great if the output of the models was standardized such that the models can be used interchangeably.
If the path_to_sentences doesn't end with '/', then the Material class cannot find the sentences. It concatenates path with *.ext, which, of course, is wrong.
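A sketch of the usual fix: build the pattern with os.path.join instead of string concatenation, so the trailing separator becomes irrelevant. The helper name `list_sentences` and the demo are made up for illustration:

```python
import glob
import os
import tempfile

def list_sentences(path_to_sentences, ext=".wav"):
    # os.path.join inserts the separator itself, so a path without a
    # trailing '/' no longer produces a broken glob pattern.
    return sorted(glob.glob(os.path.join(path_to_sentences, "*" + ext)))

# Quick check with a temporary directory (no trailing slash in the path):
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "sentence001.wav"), "w").close()
    open(os.path.join(d, "notes.txt"), "w").close()
    found = [os.path.basename(p) for p in list_sentences(d)]
```

Filtering on the extension in the glob pattern also only returns the wav files, which touches on the file-listing issue below.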
International Electrotechnical Commission (2003). IEC 60268-16:2003, Sound system equipment -- Part 16: Objective rating of speech intelligibility by speech transmission index, 1--28
The predict function should be broken down for more modularity, such that there's no need to duplicate it for the mr-sEPSM. The abstraction level of all the calls in predict should be the same.
It happens in the conversion to float, if the signal is integers. The signal is divided by the largest value possible, which is pretty much always -2^(Nbits - 1). The signal should be divided by the absolute value instead.
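A sketch of the corrected conversion; the function name is an assumption, but the scaling follows the description above:

```python
import numpy as np

def to_float(signal):
    # Scale integer PCM into [-1, 1). For signed integers the most
    # negative value is -2^(Nbits - 1); dividing by its *absolute*
    # value keeps the sign of the samples intact.
    if np.issubdtype(signal.dtype, np.integer):
        info = np.iinfo(signal.dtype)
        return signal.astype(np.float64) / abs(info.min)
    return signal
```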
When finding the bands above threshold in the sEPSM, there is a factor of 0.231 for the compensation of the filter bandwidth. This factor is unnecessary because the diffuse hearing thresholds used for the comparison are already adjusted for the filter bandwidths.
The factor should be removed. Hopefully, that would not affect the predictions too much.
The srts_from_df method crashes if model_srts is defined but the models are not in the dataframe. The issue is that the aggregate method ends up with functions for columns that don't exist.
This should be fixed by adding the model_srts values only if the model--output pair exists in the data frame.
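A sketch of that fix: keep only the model_srts entries whose (model, output) pair is actually a column of the data frame before handing anything to aggregate. The helper name and the model/output labels are made up:

```python
import pandas as pd

def filter_model_srts(df, model_srts):
    # Drop criteria for model--output pairs that aren't columns, so
    # aggregate never sees a function for a nonexistent column.
    return {pair: crit for pair, crit in model_srts.items()
            if pair in df.columns}

df = pd.DataFrame(columns=pd.MultiIndex.from_tuples([("Sepsm", "snr_env")]))
kept = filter_model_srts(df, {("Sepsm", "snr_env"): 0.5,
                              ("Stec", "index"): 0.7})
```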
So far, there are a sound function and a soundsc function. The naming is inherited from Matlab. Having a single play function would be much more obvious.
When listing the files in the speech.Material class, all files in the path specified are listed. Only the wav files should be listed.