fooof-tools / fooof

Stars: 346 | Watchers: 27 | Forks: 93 | Size: 207.45 MB

Parameterizing neural power spectra into periodic & aperiodic components.

Home Page: https://fooof-tools.github.io/

License: Apache License 2.0

Languages: Python 99.50%, Makefile 0.50%
Topics: electrophysiology, oscillations, power-spectral-density, neuroscience, local-field-potential, eeg, meg, ecog, lfp

fooof's People

Contributors: charlesbmi, jdominguez0005, luyandamdanda, mwprestonjr, rdgao, ryanhammonds, sm-figueroa, srcole, tomdonoghue, torbenator, voytek


fooof's Issues

Saving FOOOF results

There could probably be some user-friendly saving options. Building this into FOOOF itself would also give us a chance to have some level of 'FOOOF data' standardization, making it easier to potentially share FOOOF results.

Options to add to FOOOF:

  • Add method to save FOOOF object to pickle.
  • Add method to save out to standard format (JSON or csv).

However, given the current organization of FOOOF, these would be implemented at a PSD-by-PSD level.

Options for dealing with multiple PSDs (probably the most common use case):

  • Add separate functions (without a new object) that allow for / support running FOOOF across PSDs (basically, that implement the loop we currently typically write out).
  • Change the current FOOOF object to support multiple PSDs.
  • Add a 'MultiplePSDFOOOF' object. This would be initialized with the same settings as FOOOF, but take in a list of PSDs (with the same frequency vector and range), and internally run FOOOF across the dataset. It would include methods to save out group results. (I imagine this at a subject level. There could even be a higher level 'GroupOfSubjects' object.)

Some of the above ideas are not mutually exclusive.

Perhaps the first decision points are:

  • How much do we add / support on our end in terms of support for multiple PSDs (across locations, and also potentially across subjects), as opposed to letting users loop as they need.
  • How much do we add / support on our end in terms of IO operations, as opposed to letting users return parameters and save/load as they need.
  • Assuming we add at least some saving: which save format to use, and what exactly should be saved out.
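
For the JSON option, a minimal user-side sketch of what saving could look like (assuming a fitted FOOOF object fm, whose get_results() returns the FOOOFResult namedtuple shown in a later issue):

import json
import numpy as np

# fm is assumed to be an already-fit FOOOF object
res = fm.get_results()

# convert namedtuple fields (numpy arrays / floats) to JSON-serializable types
serializable = {field: np.asarray(value).tolist()
                for field, value in res._asdict().items()}

with open('fooof_results.json', 'w') as f_obj:
    json.dump(serializable, f_obj)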

Another note:
The options above presume that the current organization (a base object designed to run on a single PSD) is reasonable; we should keep in mind that a larger refactor might be more sensible.

Extension:

  • Note that something like the GroupObject, or similar solutions, could also support extra useful functionality: for example, optional support for running in parallel, and options, given standardized location files, to plot FOOOF results across channels (presumably adding new viz dependencies).
  • Some of this would start collapsing into, and could pull from code already used / implemented in the MEG project, with minor tweaks for more generalizability for others applying this stuff to their own datasets.

Also: these points are not particularly linked to v0.1, but are rather more general, and can mostly be addressed much later.

`x0` is infeasible

For data 11-JS.set, when f_range = [1, 35], fitting fails and throws the above error. I ran into this before, but I forget what the solution is...

Error disappears and fit is successful for f_range = [1, 40]

Turn the jupyter notebooks into docs and serve 'em with sphinx gallery!

This looks cool! @voytek it's basically the method you were working on with Matar, right?

I noticed you've got a bunch of demo-style notebooks in the root level of the repo. I think it'd be helpful if these were python files that were served with sphinx. What do you all think about this? (happy to help set that up if there's interest)

as an example, here's the MNE examples folder: https://github.com/mne-tools/mne-python/tree/master/examples

and this gets turned into this gallery:

http://martinos.org/mne/stable/auto_examples/index.html

with each example getting notebook-ified:

http://martinos.org/mne/dev/auto_examples/decoding/plot_receptive_field.html#sphx-glr-auto-examples-decoding-plot-receptive-field-py
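
For concreteness, a sketch of what the sphinx-gallery hookup could look like in the docs conf.py (directory names here are placeholders, not the repo's actual layout):

# in doc/conf.py - enable sphinx-gallery and point it at the example scripts
extensions = ['sphinx_gallery.gen_gallery']

sphinx_gallery_conf = {
    'examples_dirs': '../examples',   # where the example .py files live
    'gallery_dirs': 'auto_examples',  # where the generated gallery is written
}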

include a folder for "test" data

would it be useful to have a collection of test PSDs from different recording modalities (LFP, ECoG, EEG, MEG, etc), not for any quantitative analysis of fooof, but for quick eyeballing that it still fits something reasonable for all the possible test cases when trying new algorithms?

detail peak_params in FOOOFResult

Currently, the FOOOFResult object has a peak_params field with one row per peak and 3 columns. These columns are center frequency, amplitude, and bandwidth. However, this is not evident from the object itself or the docstrings.

For example,

FOOOFResult(background_params=array([-21.57921891,   0.88217696]),
peak_params=array([[  4.79002936,   0.10790063,   1.        ],
       [ 10.77517628,   0.8725322 ,   2.48344685]]),
r_squared=0.9556209115836026, error=0.06597575096746296,
gaussian_params=array([[  4.79002936,   0.10976303,   0.5       ],
       [ 10.77517628,   0.87284017,   1.24172342]]))

It would be nice if the user was more explicitly informed of this structure. Like, instead of having peak_params, we could separate that into center_frequencies, amplitudes, and bandwidths.
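
In the meantime, a workaround is to unpack the columns by position (assuming a FOOOFResult named result, as above):

# columns of peak_params: center frequency, amplitude, bandwidth
center_freqs, amplitudes, bandwidths = result.peak_params.T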

trim_psd behaviour isn't obvious (perhaps undesirable).

There are notes in the file: trim_psd rounds bi-directionally to find the closest values, and also does not include the last frequency. We should specify the desired behaviour, update, and document.
It also expects the PSD to be 2d. Change this?

TODO:

  • pick clear conventions, update functions to reflect, and document.
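
As one candidate convention, a sketch of a trim that is inclusive of both endpoints and accepts 1d or 2d input:

import numpy as np

def trim_psd(freqs, psd, f_range):
    """Keep frequencies within [f_low, f_high], inclusive of both endpoints."""
    mask = np.logical_and(freqs >= f_range[0], freqs <= f_range[1])
    return freqs[mask], psd[..., mask]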

Add more information to README

The README should have more info:

  • image showing PSD fit
  • travisci testing
  • installation instructions (once on pypi)
  • note which versions of python are supported

FOOOFGroup.load() automatically appends json

I'm using the glob library to find all the FOOOFGroup .json files I want to load. glob returns full file paths, e.g. '/gh/data/megshape/fooof/102816_fooof_vertex.json'

However, FOOOFGroup.load('/gh/data/megshape/fooof/102816_fooof_vertex.json') returns an error because it is trying to load '/gh/data/megshape/fooof/102816_fooof_vertex.json.json' (extra .json).

So, ideally FOOOF would be able to recognize that the filename that I am passing into the load function already ends in .json.
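
A sketch of the check that load could do before appending the extension (the helper name here is hypothetical):

def fname_with_ext(fname, ext='.json'):
    """Append the file extension only if it is not already present."""
    return fname if fname.endswith(ext) else fname + ext

# fname_with_ext('102816_fooof_vertex')      -> '102816_fooof_vertex.json'
# fname_with_ext('102816_fooof_vertex.json') -> '102816_fooof_vertex.json'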

Oscillation fitting stopping too early?

Per note pulled out of fooof.py:

NOTE: depending on how good our osc_gauss guesses are, it can 'induce' residuals. As in - making negative error. So - with a bad fit, we could add a lot of error, increase the STD, and make us stop early.

ri_ind needs a catch

    435
    436             # This is in index values - convert to frequency
--> 437             shortest_side = min(abs(le_ind - max_index), abs(ri_ind - max_index))
    438             guess_bw = shortest_side * 2 * self.freq_res
    439

UnboundLocalError: local variable 'ri_ind' referenced before assignment
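
One sketch of a guard: default both side indices to the spectrum edges before searching, so they are always bound (the surrounding context here is assumed from the traceback; the search logic is a stand-in):

import numpy as np

# assumed context: a flattened spectrum, the index of the peak,
# and the half-max threshold being searched for
flat_spectrum = np.array([0.1, 0.4, 1.0, 0.8, 0.7])
max_index = 2
half_max = flat_spectrum[max_index] / 2.

# guard: default the side indices to the spectrum edges, so they are
# always defined even if no half-max crossing is found on one side
le_ind, ri_ind = 0, len(flat_spectrum) - 1

drops = np.where(flat_spectrum <= half_max)[0]
if np.any(drops < max_index):
    le_ind = drops[drops < max_index].max()
if np.any(drops > max_index):
    ri_ind = drops[drops > max_index].min()

shortest_side = min(abs(le_ind - max_index), abs(ri_ind - max_index))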

Can KDE solve the overlapping oscillations issue?

Okay, the overlapping oscillations issue is really annoying. As much as I want to avoid an iterative procedure that requires more than one call to the gaussian multi-fitting function, we may need to in some cases.

My proposed solution is to take the gaussian output from FOOOF--which may have many noisy overlapping gaussian fits--and do kernel density estimation on that output. It looks like this:

[image: kernel density estimation example, from Wikipedia]

This integrates those overlapping gaussians, and then we can try running FOOOF on that again.

seaborn has some nice KDE code that only has a scipy dependency, so it doesn't change our dependencies at all.

from scipy import stats, integrate
import numpy as np

# assumed inputs (not defined in the original snippet):
#   x                - array of fitted gaussian center frequencies
#   frequency_vector - the PSD's frequency axis

support = np.linspace(np.min(frequency_vector), np.max(frequency_vector),
                      np.size(frequency_vector))

# rule-of-thumb bandwidth (note: Silverman's rule uses x.size ** (-1 / 5.))
bandwidth = 1.06 * x.std() * x.size ** (-1 / 2.)

# sum a normal kernel centered on each fitted peak
kernels = []
for i in x:
    kernel = stats.norm(i, bandwidth).pdf(support)
    kernels.append(kernel)

# normalize the summed density to integrate to one
density = np.sum(kernels, axis=0)
density /= integrate.trapz(density, support)

Clarify advice for PSDs without a consistent slope across the region of interest

I'm fitting a PSD of motor cortical ECoG data (same data as issue #20), and it may be a bit unique because it does not have a consistent slope across the frequency range I'm interested in fitting.

If I fit 1-100 Hz, then I end up with a lot of "peaks" to fill in the power above the beta range, since the curve fit is at a much lower power. If I fit 1-40 Hz, the peak seems reasonable - but I would have to know in advance to do this, and the slope is pretty meaningless.

It would be really great if fooof was able to recognize if the slope fit is really bad and notify the user that the fit is not good. But I acknowledge that's a really tough problem. I'll try to think of possible ways to handle this.


inf/NaN array while trying to curve_fit

After successfully running the tutorials, I'm trying to pass some computed spectra from experimental data through the FOOOFGroup methods, and got the following error:

  File "run_fooof.py", line 18, in <module>
    fg.fit(freqs, spectra_pre)
  File "/data/purdongp/users/dwzhou/code/fooof/fooof/group.py", line 141, in fit
    self._fit(power_spectrum=power_spectrum)
  File "/data/purdongp/users/dwzhou/code/fooof/fooof/group.py", line 287, in _fit
    super().fit(*args, **kwargs)
  File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 316, in fit
    self.background_params_ = self._robust_bg_fit(self.freqs, self.power_spectrum)
  File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 527, in _robust_bg_fit
    popt = self._simple_bg_fit(freqs, power_spectrum)
  File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 505, in _simple_bg_fit
    maxfev=5000, bounds=self._bg_bounds)
  File "/PHShome/dwz0/.conda/envs/py36/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 701, in curve_fit
    ydata = np.asarray_chkfinite(ydata)
  File "/PHShome/dwz0/.conda/envs/py36/lib/python3.6/site-packages/numpy/lib/function_base.py", line 1233, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

I checked my input arrays with assertions for infs and NaNs, but they all passed. I noticed the comment about negative values of b in the source code where curve_fit is called. (Maybe not the same issue.) My input spectra have some negative values in them, so I added a constant to the data to remove the negative values, and no errors appeared. Since spectra in log10 scale (mine are in dB) should be able to take on negative values, perhaps something should be done about curve_fit.

EDIT: There are other issues that come up with spectra containing 0 or 1 values that don't have to do with curve_fit.

EDIT2: I may have misunderstood the specifications for the input spectra. I thought they were supposed to be in log10 scale, so I precomputed my spectra in dB and then passed them. Looking at the plotted results, it seems like FOOOF took another log10 of them. So instead I input raw power spectra, and it's fine. Is this right?
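
A quick user-side sanity check, given that FOOOF expects linear power values and takes the log10 internally (so inputs should be strictly positive and finite):

import numpy as np

# spectra is assumed to be the array passed to fit()
assert np.all(np.isfinite(spectra)), "spectra contain infs or NaNs"
assert np.all(spectra > 0), "spectra must be linear power; log / dB values can be negative"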

ImportError: cannot import name JSONDecodeError

Hi all, long-time watcher, first time issue submitter:

Encountered an error while importing FOOOF:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fooof/__init__.py", line 5, in <module>
    from .fit import FOOOF
  File "fooof/fit.py", line 42, in <module>
    from fooof.core.io import save_fm, load_json
  File "fooof/core/io.py", line 6, in <module>
    from json import JSONDecodeError
ImportError: cannot import name JSONDecodeError

Is there a required version of the json library that I am not using?

Much thanks!
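
For context: json.JSONDecodeError was only added in Python 3.5 (it subclasses ValueError, which older versions raise instead), so one compatibility option would be:

try:
    from json import JSONDecodeError
except ImportError:
    # older Pythons raise a plain ValueError on JSON decode failures
    JSONDecodeError = ValueError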

Returned amplitude isn't *quite* right

Per note in fooof.py:

NOTE: Currently, calculates based on nearest actual point. Should we instead estimate based on the actual CF?

We should base this on the fitted gaussian, not the nearest point.

verbose FOOOFGroup

I'm running FOOOFGroup on 7500 power spectra, and it'd be nice to have some periodic output to get a sense of how many are done / how much time is remaining.

If I'm not mistaken, the current 'verbose' option only alerts the user of warnings, but I think it'd be useful to have some other options for more updates. e.g. if verbose = 2, then print an update after every 100 PSDs.
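
As a sketch of that kind of output, the user-side equivalent for now:

from fooof import FOOOF

# freqs and spectra are assumed to be already defined
fm = FOOOF()
results = []
for ind, spectrum in enumerate(spectra):
    fm.fit(freqs, spectrum)
    results.append(fm.get_results())
    if ind % 100 == 0:
        print('Fit power spectrum {} of {}'.format(ind, len(spectra)))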

FAAAAAAAT FOOOF.

Cloning the repo and it's 89.32 MB! Is the release version that big? This thing fits a line(s) and finds some peaks. I mean, I'm glad we've got tests/notebooks, but there has to be a way to trim it down.

TODO & NOTEs

There is a list of TODOs & NOTEs in the code itself (dev branch), which, once all addressed and cleaned up, should (hopefully) leave us with a clean v0.1.

Support for Matlab users

Many people who might be interested in this tool are primarily Matlab users. We should make it as easy as possible for them to integrate FOOOF into their pipelines.

Options (non-exclusive):

  • We could have a demo notebook / script that serves as a template for the minimum required to load PSDs (from a mat file), run FOOOF, and save the results back out, for further analysis back in Matlab.
    Edit: done, in fooof_mat
  • To assist with the above, we could add a FOOOF save option to save out to .mat? (This would be useful if loading the default json files is not so good / straightforward from Matlab.)
    Edit: not really needed, since Matlab can load the json files.
  • Is there some simple Matlab code we can write for loading and parsing FOOOF result files (jsonlines)?
    Edit: yep. Added in fooof_mat.
  • Can we add a Matlab wrapper? How easy is it to call python from Matlab? Can we avoid people having to move to python entirely? (Didn't pacmat do this? cc @srcole).
    Edit: done, in fooof_mat.

Website hosting / Binder for tutorials

Add ways for people to easily get / view the tutorials:

  • Create a simple website for FOOOF and link/host html versions directly, and give instructions for cloning/getting to the tutorials in a runable way (suggestion from Erik)
  • Binder link to launch people right into the tutorials (linked from README)

inconsistent convention between clean_background_fit and quick_background_fit

we've already talked about this on slack, i'm just opening an issue for reminder.

in fit.py:
line 191: _clean_background_fit takes in linear frequency and log PSD, and performs the log on frequency inside _clean_background_fit.
line 208: _quick_background_fit takes in log frequency and oscillation-removed log PSD.

it's confusing to keep track of whether frequency is logged or not inside self.fit. I recommend keeping frequency linear in the scope of self.fit, passing linear frequency into the 2 background_fit functions, and logging it within those.
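
A sketch of the suggested convention - both fits take linear frequencies and log them internally (np.polyfit stands in for the actual fitting):

import numpy as np

def quick_background_fit(freqs, log_psd):
    """Fit the background. Takes linear freqs; the log happens inside."""
    log_freqs = np.log10(freqs)
    slope, offset = np.polyfit(log_freqs, log_psd, 1)
    return offset, slope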

Potentially get worse fits when fitting knee.

In theory, asking to fit a knee should reduce to a linear fit, if that is indeed the better fit. In practice, that is not the case - there are PSDs for which not fitting the knee (a linear fit) leads to a better fit (in the R^2 sense) than also fitting a knee parameter.

It's weird that, with an extra parameter to play with, it can perform worse. It might have to do with the interaction between fitting the slope & oscillations. It may also simply be due to the fact that fitting a knee effectively adds a new constraint - that the slope of the background goes to zero below the knee - and this, in at least some cases, may be unhelpful.

Something to perhaps look into, play around with a bit. If nothing else, supports a strong suggestion to only try to fit a knee if there really is one (although that's potentially hard to evaluate).
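
For reference, a sketch of the two background forms (assuming the Lorentzian-style parameterization FOOOF uses, in log10 power); with knee = 0, the knee form reduces to the fixed form with slope = -exponent:

import numpy as np

def bg_fixed(freqs, offset, slope):
    # a straight line in log-log space
    return offset + slope * np.log10(freqs)

def bg_knee(freqs, offset, knee, exponent):
    # flattens below the knee frequency; knee = 0 recovers a pure power law
    return offset - np.log10(knee + freqs ** exponent)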

Fit can fail (optimal parameters not found)

Curve-fitting has a max number of iterations set, and (very rarely) FOOOF can fail if optimal parameters are not found within this bound. I think this comes from very messed up PSDs - but it's to be explored.

Also - perhaps alter the behaviour so it doesn't fail out if this occurs (instead, return a Null model or similar), so that this issue doesn't scrap a process running FOOOF across many PSDs. (Alternately, suggest users use a try/except clause.)
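
Until then, a user-side guard works (scipy's curve_fit raises RuntimeError when the max number of iterations is exceeded):

from fooof import FOOOF

# freqs and spectra are assumed to be already defined
results = []
for spectrum in spectra:
    fm = FOOOF()
    try:
        fm.fit(freqs, spectrum)
        results.append(fm.get_results())
    except RuntimeError:
        # fit failed to converge - store a null placeholder instead of dying
        results.append(None)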

API overhaul

Suggested API overhaul, responding to point 1 in #53 (following input from Erik):

I've gone back and forth on this a little in my mind, and I think it comes down to the appropriate level of abstraction to use. Part of it comes down to how we recommend thinking about 'oscillations', and I think we've sort of broken that abstraction, in that at times we do sort of need people to acknowledge that fitting gaussians != (separate) oscillations.

That said, I think I've come around to 'peaks'. We have two components to the fit: 'background' and 'peaks'. We don't need to imply that 'peaks' == 'oscillations', that's a layer of interpretation left to the user (and again, the main issue is that overlapping gaussians may be used to constitute what we would otherwise want to call '1 oscillation'), but we can talk about 'peaks' consistently, as a middle ground, without having to talk about 'below' (gaussians) or 'above' (oscillations). This does abstract away the gaussian aspect, helpfully, in so far as there is nothing fundamental about gaussians being used here (and this could even change in the future).

Now this is a pretty total API overhaul, slightly annoying to anyone who has started on our current version but, oh well, still better to do this now.

Note: following sklearn, trailing underscores are reserved for results from fitting the model.

Everything below is the API for the FOOOF object (public & private), except for the FOOOFGroup particulars at the end. I don't think this overhaul affects any external functions within the fooof module, but after we settle this here, I'll do a run through of the rest to make sure it's all consistent.

This is like, legitimately, the last thing - so please tear it apart, and then I'll push through the last things, and update everything with that.

So, full API description, and planned overhaul:

Settings (public attributes)

  • amp_std_thresh -> min_peak_threshold
    • This is an amplitude threshold for peaks, defined in terms of standard deviation of the flattened spectrum. Any clearer suggestions for name are welcome.
  • bandwidth_limits -> peak_width_limits
  • bg_use_knee -> background_mode
    • Turn into a string. Options: 'knee', 'fixed'.
  • max_n_gauss -> max_n_peaks
  • min_amp -> min_peak_amplitude
  • verbose -> same

Data (public attributes)

  • freqs -> same
  • freq_range -> same
    • Note: This is used to select a frequency range (FOOOF can trim PSDs), so useful / important, I think.
  • freq_res -> same
    • Note: Only used internally, but (I think) useful to have, as it's quite relevant to fit / settings.
  • psd -> power_spectrum
    • Well, you maybe shouldn't FOOOF raw FFTs (they probably aren't smooth enough), but sure, let's go a little more general / verbose.

Model fit results (public attributes)

  • background_params_ -> same
  • error_ -> rmse_error_
    • Be a bit more descriptive.
  • oscillation_params_ -> peak_params_
  • psd_fit_ -> fooofed_psd_
    • My only nitpicky hesitation here is that I think of a FOOOFed PSD as representing the model, as in the parameter results, whereas this is more a representation/instantiation of a FOOOFed PSD, but sure, we can run with it.
  • r2_ -> r_squared_
    • Be a bit more descriptive. (It's the R^2 of the fit, not r^2 error.)

Public Methods

  • add_data -> same
  • add_results -> same
  • create_report -> save_report
    • I sorta don't like how this makes it similar, and un-cleanly differentiated from 'save', but it is more specific...
  • fit -> same
  • load -> same
  • get_results -> same
  • plot -> same
  • print_report_issue -> same
  • print_results -> same
  • print_settings -> same
  • report -> same
  • save -> same

Private Methods:

  • _add_from_dict -> same
  • _bg_fit_func -> same
  • _check_bw -> same
  • _check_loaded_results -> same
  • _check_loaded_settings -> same
  • _clear_settings -> same
  • _clean_background_fit -> _robust_bg_fit
    • Quick & Clean are not super descriptive labels for bg-fitting.
      • Robust, as this deals with outliers.
    • Update to consistently use 'bg' in private methods/attributes.
  • _create_bg_fit -> same
  • _create_osc_fit -> _create_peak_fit
  • _create_osc_params -> _create_peak_params
  • _drop_osc_cf -> _drop_peak_cf
  • _drop_osc_overlap -> _drop_peak_overlap
  • _fit_osc_guess -> _fit_peak_guess
  • _fit_oscs -> _fit_peaks
  • _infer_knee -> _infer_bg_mode
  • _prepare_data -> same
  • _quick_background_fit -> _simple_bg_fit
    • Quick & Clean are not super descriptive labels for bg-fitting.
      • Simple, as in 'non-robust'.
    • Update to consistently use 'bg' in private methods/attributes.
  • _r_squared -> _calc_r_squared
    • Make more action-oriented, to emphasize it's a method, not an attribute.
  • _regenerate_model -> same
  • _reset_data -> same
  • _reset_settings -> same
  • _rmse_error -> _calc_rmse_error
    • Make more action-oriented, to emphasize it's a method, not an attribute.

Private Attributes:

Note: mixture of private settings and intermediate model stuff.

  • _background_fit -> _bg_fit
    • For consistency, use 'bg' for all private methods/attributes.
  • _bg_amp_thresh -> same
  • _bg_bounds -> same
  • _bg_guess -> same
  • _bw_std_edge -> same
  • _cf_bound -> same
  • _gaussian_params -> same
  • _oscillation_fit -> _peak_fit
  • _psd_flat -> _spectrum_flat
  • _psd_osc_rm -> _spectrum_peak_rm
  • _std_limits -> same

FOOOFGroup Particulars:

  • _fit -> same
  • _get_results -> same
  • _reset_group_results -> same
  • get_all_data -> same
  • get_fooof -> same
  • group_results -> same
  • psds -> power_spectra

Some comments after a static review of the code/design.

FOOOF notes

  1. Variable/kwarg naming is inconsistent in style. Sometimes verbose names are used (’bandwidth_limits’); other times highly abbr. var. nm. are used (‘bg_use_knee’). The same holds true for FOOOF.attr_ names, etc. I’d opt for a more verbose style.
  2. I appreciate the intended convenience of FOOOF.model() but its auto-plot functionality has two negative effects. a. It may lead to confusion: ‘model’ seems like a top-level generic command. It’s the first thing I would try if approaching the library cold. The user may however be running on a remote connection, where auto-plotting can be both unexpected and SLOW as the plot is sent over SSH/X11. That is, the name ’model’ seems like the generic top-level way to invoke FOOOF, but it is not. b. It adds a hard, unnecessary dependency of FOOOF on matplotlib. Consider renaming to ‘report’? I realize you address this in the examples; I don’t think that is good enough.
  3. The plotting methods do not test for a matplotlib install. This will lead to non-graceful failures. Wrap the matplotlib import in a try/except and flag the failure to import throughout the package so things go smoothly? (See the sketch below.)
  4. The fit.FOOOF class has code intended for group.FOOOF (e.g., https://github.com/voytekresearch/fooof/blob/master/fooof/fit.py#L278). This is a pretty serious abstraction leak, even if it is ‘private’. Is this really the best solution?
  5. It seems odd/inconsistent to generate a ‘FOOOFResult’ on request but not use that internally on FOOOF itself. What’s the use case here? Why use a named tuple instead of collecting/returning the key attrs_ as they are? …There should be one right way to do things.
  6. While I really like the reporting-as-text-thing (‘print_results’) it uses a lot of whitespace. The report could be made more compact/succinct. If printing several results on the CL, the info density would be very low; there’d be more scrolling than necessary, which is a pain in some terminal environments, and even in the notebook.
  7. What is the intended use case for being able to load past FOOOFs back into FOOOF - https://github.com/voytekresearch/fooof/blob/master/fooof/fit.py#L365. Who would do this? When? Why?
  8. Docs on private attrs (https://github.com/voytekresearch/fooof/blob/master/fooof/fit.py#L66). If they are for users, remove the leading ‘_’. If they are private, don’t doc them in the top-level docs? These are very useful to debug, but they are only useful if you know FOOOF internals. What level of competency do you expect from users?
  9. Overall comment about 7/8: it is tempting to want to add things, “that might be nice to have” or “someone might want” but that comes with a real burden on maintenance and code complexity. I’d make sure the features/info you expose cover only the core expected uses and then let people’s actual use determine other additions. Maybe all that has passed that test; I don’t know. Just sayin’ :)
  10. In FOOF (note the lack of one ‘O’) I had a notebook that stepped through all the internal processing steps, showing intermediate results. Might be good to do the same for FOOOF.

(I'm going to do some runtime/performance analysis later as well...)
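
Regarding point 3, a sketch of the import-guard pattern being suggested (this is the general idea, not fooof's actual implementation):

try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None

def check_plot_dependency():
    """Raise a clear error if plotting is requested without matplotlib."""
    if plt is None:
        raise ImportError("Plotting requires matplotlib - "
                          "install it to use the plot methods.")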

Phase spectrum

Right now FOOOF operates only on the power spectrum.

For v0.2 we should explore whether, and to what degree, adding information from the phase spectrum improves FOOOF performance.

Update Overlap Drop Parameter (small algorithmic update)

I've been investigating the couple PSDs that fail fitting (it's not a big problem - literally a couple out of close to a million PSDs in the MEG dataset) - but it's slightly annoying in that they fail on very much edge-case, suboptimal gaussian guess parameters. In particular, I think it's when it tries to fit gaussians a bit close to the edge / each other. Small jitters can fix this (for example, moving f_range by 1 point).

This got me playing a little with one of our 'magic' numbers - the degree of overlap with another gaussian at which we drop an oscillation guess before the final fit (done in _drop_osc_overlap). Right now we have it set at 1 (and implicitly so; either way, this should be noted as an internal parameter).

There's no totally obvious way to choose this parameter in a purely data-driven approach (and it is currently 1.0 arbitrarily). Updating from 1.0 -> 1.5 or 2.0 (in units of bandwidth estimate of the gaussian guess) for example, marginally increases average error (on synthetic or real PSDs), but does reduce average number of gaussians fit (arguably reducing overfitting), so it's not entirely obvious what's better. I could do more explicit synthetic testing, but I reckon the answer becomes very dependent on the specific peak properties of the generated power spectra.

My quick impression from anecdotal fits is that updating this number does reduce its overzealousness in uselessly adding more gaussians, so that's good (as this is one of our weakest points). It also does so in a manner not easily available for the user to control from the public settings.

(Updating this upwards also seems to reduce likelihood of failing to fit, but from like 0.001% to something lower, so not really a relevant point).

tl-dr: I think updating this number (increasing it from 1.0 to 1.5 or 2.0) is better, but exactly where to is unclear. My only hesitation is making a change to the algorithm so late (at worst, I think it'll be benign, but there is always a non-zero chance it has some negative effects somewhere).

Thoughts?

Edit: I'm basically just thinking aloud here. I recommend updating this to 2.0.
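
For reference, a sketch of what this drop rule amounts to (the threshold is in units of the guess bandwidth; the actual logic lives in _drop_osc_overlap and may differ in detail):

def drop_overlapping_guesses(guesses, thresh=2.0):
    """Drop peak guesses whose centers fall within thresh bandwidths of a
    stronger guess. Each guess is a (cf, amp, bw) tuple."""
    keep = []
    for guess in sorted(guesses, key=lambda g: g[1], reverse=True):
        if all(abs(guess[0] - kept[0]) > thresh * kept[2] for kept in keep):
            keep.append(guess)
    return keep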

0.1.0 Quirks

  • EDIT: FIXED. It would appear we don't support python 3.5 before 3.5.4 (fails at initialization in 'safe_import').
    • Investigate why, see if it's worth tweaking approach to broaden python support.
      • Behaviour changed in 3.5.4, changing the type of error in some situations.
  • pip install fooof[plot] doesn't seem to actually install a working install of matplotlib (on MacOS - fails with backend error). This is also true with pip install fooof[all]
    • So, for some reason, specifying matplotlib as a dependency that gets installed with it doesn't lead to a working install of mpl out of the box.
    • Figure out why / how to fix.

Exclude frequency range of line noise when fitting slope

I'm fitting a PSD (see below) for a signal after I removed line noise (data available in data.mat here). This biases the slope fit down. It might be a good idea to have a kwarg that allows the user to define frequency range(s) to ignore during the fit.

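Until such a kwarg exists, one user-side workaround is to mask out the line-noise band before fitting (whether the fitting internals are happy with a gap in the frequency axis is untested):

import numpy as np

# freqs and psd are assumed to be already defined
mask = ~((freqs >= 58) & (freqs <= 62))  # drop the band around 60 Hz line noise
freqs_masked, psd_masked = freqs[mask], psd[mask]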

FOOOFGroup results as dataframe

Right now, FOOOFGroup saves to a .json file where each line is a FOOOFResult object. However, the group-wide oscillation information would be more easily accessible as a dataframe, in which each row is a peak, and the columns are: psd_index, center frequency, amp, bandwidth, etc.

So, there could be a function that converts the list of FOOOFResults (FOOOFGroup.group_results) into this dataframe.

As Tom mentioned IRL, this will add a dependency for pandas. But, that's common and why we safe_import.
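
A sketch of that conversion (assuming the FOOOFResult fields shown in earlier issues, and a fit FOOOFGroup fg):

import pandas as pd

rows = []
for psd_index, result in enumerate(fg.group_results):
    for cf, amp, bw in result.peak_params:
        rows.append({'psd_index': psd_index, 'center_frequency': cf,
                     'amplitude': amp, 'bandwidth': bw})
peaks_df = pd.DataFrame(rows)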

Returned oscillation amplitude is not intuitive

Currently the returned amplitudes are from the multi-gaussian fitting, which takes into account all the gaussians at once. If there are overlapping gaussians, the amplitude for any given fitted gaussian is contingent upon the others. This defies our expectation of what amplitude is, which is the distance between the 1/f background and the amplitude at the center frequency.

Solution: after multifitting, overwrite the amplitude as the distance between the final background and the amplitude at each center frequency.
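
A sketch of that post-fit step (the array names here are placeholders for the model components):

import numpy as np

# freqs, fooofed_psd, background_fit and peak_params are assumed model arrays
for peak in peak_params:
    ind = np.argmin(np.abs(freqs - peak[0]))  # point nearest the center frequency
    peak[1] = fooofed_psd[ind] - background_fit[ind]  # height above background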

Potentially: Drop overlapping oscillation guesses

The current procedure can include oscillation guesses that are largely overlapping with, or entirely within, another oscillation. Sometimes the actual fit moves them, sometimes not.

Potentially: Drop oscillation guesses based on overlap and/or proximity.

Guess-BW using FWHM

@voytek
Okay, I think I figured it out:

tl-dr:
The formula for gaussians (gaussian_function) had a mistake (or, an oddity in its definition that I didn't notice). It was missing the '2*' in front of the std. dev. param, and so was expecting BW as defined across both sides, as opposed to across one side (half), as normally defined.

When I did the quick data-driven BW hack, I accidentally fulfilled this by passing in the double-BW, as opposed to one-sided (sorta wrong - but it worked given our gauss definition).

When I updated to the proper FWHM formula, that formula returns BW as one-sided, as usually defined, and passing this into our gaussian definition, which expects double BW, led to wrong / halved BW guesses, ultimately messing things up (small BW guess -> many more oscillations found). (Basically: after updating to FWHM estimation, we were using halved BW guesses.)

SOLUTION:
What I suggest:
Let's keep the math clean, and update the Gaussian function to use STD as usually defined (the 'half BW'). This means passing in the FWHM estimation works properly (and FOOOF works properly again).

Now: to return to the user, I think oscillation_params (as distinct from gaussian_params) should return, as FOOOF has done so far, the two-sided bandwidth. This is easy - we already have the separation of gaussian_params from oscillation_params, so just double the STD param to return the two-sided BW to the user.

^This is suggested so that we don't have a weird / custom definition of gaussians.
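
For reference, the standard gaussian (with the '2*' in the denominator), and the FWHM relation:

import numpy as np

def gaussian_function(xs, ctr, amp, std):
    """Standard gaussian: note the 2 * std**2 in the denominator."""
    return amp * np.exp(-(xs - ctr) ** 2 / (2 * std ** 2))

# FWHM of a gaussian: 2 * sqrt(2 * ln(2)) * std ~= 2.355 * std
# the two-sided bandwidth returned to the user would then be 2 * std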

equidistant frequency fitting in log-space

PSDs are currently fit in log-log, which weighs higher frequencies much more than low frequencies, because evenly spaced frequencies in linear scale congregate at high frequencies in log scale.

Practically, this is not a huge concern. It might even give us the desired effect of ignoring non-1/f stuff in the low frequencies. But it's not immediately obvious that this is the way it currently works to the user. For 1.0, perhaps a note stating the above is sufficient. For future versions, the user would ideally get an option to fit equidistant frequencies in log space, or something that de-weighs higher frequencies (weighted regression).
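
A sketch of the log-equidistant option (interpolating the PSD onto frequencies evenly spaced in log10; interpolation error is a caveat):

import numpy as np

# freqs and psd are assumed to be already defined
log_spaced_freqs = np.logspace(np.log10(freqs.min()), np.log10(freqs.max()),
                               freqs.size)
psd_log_spaced = np.interp(log_spaced_freqs, freqs, psd)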

Add parallel support in FOOOFGroup

FOOOFGroup runs through a list of PSDs linearly - but every fit is independent. We should add parallel support for running across the matrix - we'd probably get a big speed-up when running large numbers of PSDs.

As of right now: the user could set up parallel outside the FOOOFGroup object (like I have set up for the MEG data) - but no reason not to have it offered internally.
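
A user-side sketch with the standard library (needs a __main__ guard on platforms that spawn processes; freqs and spectra are assumed to be already defined):

from functools import partial
from multiprocessing import Pool

from fooof import FOOOF

def fit_one(spectrum, freqs):
    """Fit a single power spectrum and return its results."""
    fm = FOOOF()
    fm.fit(freqs, spectrum)
    return fm.get_results()

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(partial(fit_one, freqs=freqs), spectra)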

Magic Numbers!

There are still a fair few magic numbers - now pulled out and documented in the code itself. These need cleaning up:
a) can any be data-driven (become non-magic)?
b) which, if any, should be settable parameters?
c) if neither a nor b, document them, and sanity check that they are reasonable values.

elegant fail if scipy < 0.19

If the scipy version isn't 0.19+, fooof fails in... non-obvious... ways. Just spent 10 minutes troubleshooting before I had an "ooooh wait" moment.
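
A sketch of an explicit check at import time (taking 0.19 as the minimum, per the title):

import scipy

# fail fast, with a clear message, if scipy is too old
major, minor = (int(part) for part in scipy.__version__.split('.')[:2])
if (major, minor) < (0, 19):
    raise ImportError("fooof requires scipy >= 0.19, found " + scipy.__version__)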

Add copy method

If you want to use the same object to run over different groups of data, collecting objects, right now you need to re-initialize or explicitly copy. A copy method would probably be useful.
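
In the meantime, copy.deepcopy should give an independent object with the same settings:

from copy import deepcopy

from fooof import FOOOF

fm_template = FOOOF()
fm_copy = deepcopy(fm_template)  # independent object, same settings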

ToDo: Add test coverage

Probably the obvious way to do this - test on synthetic data created from our funcs (gaussian + background). At minimum, tests can check that the algorithm can accurately decompose these synthetic test cases.
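
A sketch of one such test, building a synthetic spectrum from the same functional forms with known ground-truth values (tolerances and parameter ordering are assumptions):

import numpy as np
from fooof import FOOOF

# ground truth: 1/f background plus one gaussian peak, in log10 power
freqs = np.arange(3, 40, 0.5)
offset, exponent = 1.0, 1.5
cf, amp, std = 10.0, 0.5, 1.0
log_psd = (offset - exponent * np.log10(freqs)
           + amp * np.exp(-(freqs - cf) ** 2 / (2 * std ** 2)))
psd = 10 ** log_psd  # FOOOF takes linear power

fm = FOOOF()
fm.fit(freqs, psd)
# assumes background_params_ ends with the exponent/slope term
assert abs(abs(fm.background_params_[-1]) - exponent) < 0.1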

ToDo:

To Add:

  • Implement post-fit, pre-multifit (optimize step) merging of overlapping oscillations.

To figure out:

  • Fit failures and what to do. Why is it so sensitive to f_low? Do we jitter f_low?
