fooof-tools / fooof
Parameterizing neural power spectra into periodic & aperiodic components.
Home Page: https://fooof-tools.github.io/
License: Apache License 2.0
There could probably be some user-friendly saving options. Building this into FOOOF itself would also give us a chance to have some level of 'FOOOF data' standardization, making it easier to potentially share FOOOF results.
Options to add to FOOOF:
However, given the current organization of FOOOF, these would be implemented at a PSD-by-PSD level.
Options for dealing with multiple PSDs (probably the most common use case):
Some of the above ideas are not mutually exclusive.
Perhaps the first decision points are:
Another note:
The options above basically presume that the current organization (a base object designed to run on a single PSD) is reasonable; we should keep in mind that a larger refactor might be more sensible.
Extension:
Also: these points are not particularly linked to v0.1, but are rather more general, and can mostly be addressed much later.
For data 11-JS.set, when f_range = [1, 35], fitting fails and throws the above error. I ran into this before, but I forget what the solution is...
Error disappears and fit is successful for f_range = [1, 40]
This looks cool! @voytek it's basically the method you were working on with Matar, right?
I noticed you've got a bunch of demo-style notebooks in the root level of the repo. I think it'd be helpful if these were python files that were served with sphinx. What do you all think about this? (happy to help set that up if there's interest)
as an example, here's the MNE examples folder: https://github.com/mne-tools/mne-python/tree/master/examples
and this gets turned into this gallery:
http://martinos.org/mne/stable/auto_examples/index.html
with each example getting notebook-ified:
Add option to fit linear (equivalent).
Would it be useful to have a collection of test PSDs from different recording modalities (LFP, ECoG, EEG, MEG, etc.)? Not for any quantitative analysis of FOOOF, but for quickly eyeballing that it still fits something reasonable across all the possible test cases when trying new algorithms.
Currently, the FOOOFResult object has a peak_params field with one row per peak and three columns. These columns are center frequency, amplitude, and bandwidth. However, this is not evident from the object itself or its docstrings.
For example,
FOOOFResult(background_params=array([-21.57921891, 0.88217696]),
peak_params=array([[ 4.79002936, 0.10790063, 1. ],
[ 10.77517628, 0.8725322 , 2.48344685]]),
r_squared=0.9556209115836026, error=0.06597575096746296,
gaussian_params=array([[ 4.79002936, 0.10976303, 0.5 ],
[ 10.77517628, 0.87284017, 1.24172342]]))
It would be nice if the user were more explicitly informed of this structure. For instance, instead of having peak_params, we could separate it into center_frequencies, amplitudes, and bandwidths.
Options:
There are notes in the file: trim_psd rounds bi-directionally to find the closest values, and also does not include the last frequency. We should specify the desired behaviour, update the code, and document it.
It also expects the PSD to be 2d. To change?
TODO:
The README should have more info:
I'm using the glob library to find all the FOOOFGroup .json files I want to load. glob returns full file paths, e.g. '/gh/data/megshape/fooof/102816_fooof_vertex.json'.
However, FOOOFGroup.load('/gh/data/megshape/fooof/102816_fooof_vertex.json') returns an error because it is trying to load '/gh/data/megshape/fooof/102816_fooof_vertex.json.json' (extra .json).
So, ideally FOOOF would be able to recognize that the filename that I am passing into the load function already ends in .json.
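A minimal sketch of a fix, assuming a small helper (fname_with_ext is a hypothetical name, not actual fooof internals) that only appends the extension when it is missing:

```python
def fname_with_ext(file_name, extension='.json'):
    """Return the file name, appending the extension only if it is
    not already present (avoids the double '.json.json' problem)."""
    if file_name.endswith(extension):
        return file_name
    return file_name + extension
```

The load functions could then run every incoming filename through this helper before opening the file.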
Per note pulled out of fooof.py:
NOTE: depending on how good our osc_gauss guesses are, they can 'induce' residuals - as in, create negative error. So, with a bad fit, we could add a lot of error, increase the STD, and make us stop early.
435
436 # This is in index values - convert to frequency
--> 437 shortest_side = min(abs(le_ind - max_index), abs(ri_ind - max_index))
438 guess_bw = shortest_side * 2 * self.freq_res
439
UnboundLocalError: local variable 'ri_ind' referenced before assignment
Okay, the overlapping oscillations issue is really annoying. As much as I want to avoid an iterative procedure that requires more than one call to the gaussian multi-fitting function, we may need to, in some cases.
My proposed solution is to take the gaussian output from FOOOF--which may have many noisy overlapping gaussian fits--and do kernel density estimation on that output. It looks like this on the right:
This integrates those overlapping gaussians, and then we can try running FOOOF on that, again.
seaborn has some nice KDE code that only has a scipy dependency, so it doesn't change our dependencies at all.
from scipy import stats, integrate
import numpy as np

# x: the fitted gaussian center frequencies (FOOOF's oscillation output)
# frequency_vector: the frequencies of the original power spectrum
support = np.linspace(np.min(frequency_vector), np.max(frequency_vector), np.size(frequency_vector))

# Rule-of-thumb kernel bandwidth (Silverman's rule: 1.06 * std * n^(-1/5))
bandwidth = 1.06 * x.std() * x.size ** (-1 / 5.)

# Evaluate one gaussian kernel per center frequency, then sum & normalize
kernels = []
for i in x:
    kernel = stats.norm(i, bandwidth).pdf(support)
    kernels.append(kernel)
density = np.sum(kernels, axis=0)
density /= integrate.trapz(density, support)
I'm fitting a PSD of motor cortical ECoG data (same data as issue #20), and it may be a bit unique because it does not have a consistent slope across the frequency range I'm interested in fitting.
If I fit 1-100 Hz, then I end up with a lot of "peaks" filling in the power above the beta range, since the curve fit sits at a much lower power. If I fit 1-40 Hz, the peak seems reasonable - but I would have to know to do this, and the slope is pretty meaningless.
It would be really great if fooof was able to recognize if the slope fit is really bad and notify the user that the fit is not good. But I acknowledge that's a really tough problem. I'll try to think of possible ways to handle this.
After successfully running the tutorials, I'm trying to pass some computed spectra from experimental data through the FOOOFGroup methods, and got the following error:
File "run_fooof.py", line 18, in <module>
fg.fit(freqs, spectra_pre)
File "/data/purdongp/users/dwzhou/code/fooof/fooof/group.py", line 141, in fit
self._fit(power_spectrum=power_spectrum)
File "/data/purdongp/users/dwzhou/code/fooof/fooof/group.py", line 287, in _fit
super().fit(*args, **kwargs)
File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 316, in fit
self.background_params_ = self._robust_bg_fit(self.freqs, self.power_spectrum)
File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 527, in _robust_bg_fit
popt = self._simple_bg_fit(freqs, power_spectrum)
File "/data/purdongp/users/dwzhou/code/fooof/fooof/fit.py", line 505, in _simple_bg_fit
maxfev=5000, bounds=self._bg_bounds)
File "/PHShome/dwz0/.conda/envs/py36/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 701, in curve_fit
ydata = np.asarray_chkfinite(ydata)
File "/PHShome/dwz0/.conda/envs/py36/lib/python3.6/site-packages/numpy/lib/function_base.py", line 1233, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
I checked my input arrays with assertions for infs and NaNs, and they all passed. I noticed the comment about negative values of b in the source code where curve_fit is called (maybe not the same issue). My input spectra have some negative values in them, so I added a constant to the entire data to remove the negative values, and no errors appeared. Since spectra in log10 scale (mine are in dB) should be able to take on negative values, perhaps something should be done about curve_fit.
EDIT: There are other issues that come up with spectra containing 0 or 1 values that don't have to do with curve_fit.
EDIT2: I may have misunderstood the specifications for the input spectra. I thought they were supposed to be in log10 scale, so I precomputed my spectra in dB and then passed them. Looking at the plotted results, it seems like FOOOF took another log10 of them. So instead I input raw power spectra, and it's fine. Is this right?
Hi all, long-time watcher, first time issue submitter:
Encountered an error while importing FOOOF:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fooof/__init__.py", line 5, in <module>
from .fit import FOOOF
File "fooof/fit.py", line 42, in <module>
from fooof.core.io import save_fm, load_json
File "fooof/core/io.py", line 6, in <module>
from json import JSONDecodeError
ImportError: cannot import name JSONDecodeError
Is there a required version of the json library that I am not using?
Much thanks!
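A common fix for this error (assuming it comes from running on Python 2 or a pre-3.5 Python 3, where json.JSONDecodeError does not exist and json raises plain ValueError) is a guarded import. This is a sketch of the standard compatibility pattern, not necessarily what the maintainers would choose:

```python
# json.JSONDecodeError was only added in Python 3.5; on older
# interpreters, json raises ValueError instead (where JSONDecodeError
# exists, it is a subclass of ValueError). A guarded import keeps the
# package importable on both:
try:
    from json import JSONDecodeError
except ImportError:
    JSONDecodeError = ValueError
```

Code that catches JSONDecodeError then behaves identically on both interpreter versions.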
Per note in fooof.py:
NOTE: Currently, this calculates based on the nearest actual point. Should we instead estimate based on the actual CF?
We should base on fitted gaussian, not nearest point.
I'm running FOOOFGroup on 7500 power spectra, and it'd be nice to have some periodic output to get a sense of how many are done / how much time is remaining.
If I'm not mistaken, the current 'verbose' option only alerts the user of warnings, but I think it'd be useful to have some other options for more updates. e.g. if verbose = 2, then print an update after every 100 PSDs.
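A rough sketch of what a tiered verbose option could look like (fit_group, fit_func, and report_every here are hypothetical names for illustration, not the actual FOOOFGroup API):

```python
def fit_group(power_spectra, fit_func, verbose=1, report_every=100):
    """Fit each power spectrum in turn, printing a progress update
    every `report_every` fits when verbose is set to 2 or higher."""
    results = []
    n_spectra = len(power_spectra)
    for ind, spectrum in enumerate(power_spectra, start=1):
        results.append(fit_func(spectrum))
        if verbose >= 2 and ind % report_every == 0:
            print('Fit {} of {} power spectra'.format(ind, n_spectra))
    return results
```

With verbose=1, this keeps the current warnings-only behaviour, while verbose=2 gives the periodic updates described above.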
Cloning the repo comes to 89.32 MB! Is the release version that big? This thing fits a line (or lines) and finds some peaks. I mean, I'm glad we've got tests/notebooks, but there has to be a way to trim it down.
There is a list of TODOs & NOTEs in the code itself (dev branch), which, once all addressed and cleaned up, should (hopefully) leave us with a clean v0.1.
Scikit-learn has a convention that class attributes estimated from the data have a trailing underscore. This helps delineate not only attributes from methods, but also results from parameters.
We should add this. Makes it clearer what to grab from the model after fitting.
Brief overview here:
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/underscore-convention.md
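A toy illustration of the convention (ToyModel is purely hypothetical, not part of fooof):

```python
class ToyModel:
    """Minimal illustration of the scikit-learn naming convention:
    attributes estimated from the data get a trailing underscore,
    while settings passed in by the user do not."""

    def __init__(self, n_terms=2):
        self.n_terms = n_terms   # a setting: no underscore

    def fit(self, data):
        # a result estimated from the data: trailing underscore
        self.mean_ = sum(data) / len(data)
        return self
```

After fitting, a user immediately knows that mean_ is a model result, while n_terms is a parameter they chose.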
Many people who might be interested in this tool are primarily Matlab users. We should make it as easy as possible for them to integrate FOOOF into their pipelines.
Options (non-exclusive):
Add ways for people to easily get / view the tutorials:
We've already talked about this on Slack; I'm just opening an issue as a reminder.
in fit.py:
line 191: _clean_background_fit takes in linear frequency and log PSD, and performs the log on frequency inside _clean_background_fit.
line 208: _quick_background_fit takes in log frequency and an oscillation-removed log PSD.
It's confusing to keep track of whether frequency is logged or not inside self.fit. I recommend keeping frequency linear in the scope of self.fit, passing linear frequency into the two background_fit functions, and logging it within them.
In theory, asking to fit a knee should reduce to a linear fit, if that is indeed the better fit. In practice, that is not the case - there are PSDs for which not fitting the knee (a linear fit) leads to a better fit (in the R^2 sense) than also fitting a knee parameter.
It's weird that with an extra parameter to play with, it can perform worse. It might have to do with the interaction between fitting the slope & oscillations. It may also simply be because fitting a knee effectively adds a new constraint - that the slope of the background goes to zero below the knee - which, in at least some cases, may be unhelpful.
Something to perhaps look into, play around with a bit. If nothing else, supports a strong suggestion to only try to fit a knee if there really is one (although that's potentially hard to evaluate).
Curve-fitting has a max number of iterations set, and (very rarely) FOOOF can fail if optimal parameters are not found within this bound. I think this comes from very messed-up PSDs - but it remains to be explored.
Also - perhaps alter the behaviour so it doesn't fail out if this occurs (instead, returning a null model or similar), so that this issue doesn't scrap a process running FOOOF across many PSDs. (Alternatively, suggest that users wrap calls in a try/except clause.)
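A sketch of the suggested guard. It relies on scipy's actual behaviour (curve_fit raises RuntimeError when the optimizer does not converge within maxfev); safe_fit and the linear model are illustrative assumptions, not fooof code:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(xs, offset, slope):
    """A simple linear model, standing in for the background fit."""
    return offset + slope * xs

def safe_fit(xs, ys, maxfev=5000):
    """Try a curve fit; on failure (e.g. maxfev exceeded), return NaN
    parameters as a 'null model' instead of raising, so one bad PSD
    does not abort a batch run across many PSDs."""
    try:
        params, _ = curve_fit(linear, xs, ys, maxfev=maxfev)
        return params
    except RuntimeError:
        return np.array([np.nan, np.nan])
```

Downstream code can then check for NaN parameters to flag the failed fits, rather than catching exceptions mid-batch.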
Suggested API overhaul, responding to point 1 in #53 (following input from Erik):
I've gone back and forth on this a little in my mind, and I think it comes down to the appropriate level of abstraction to use. Part of it comes down to how we recommend thinking about 'oscillations', and I think we've sort of broken that abstraction, in that at times we do sort of need people to acknowledge that fitting gaussians != (separate) oscillations.
That said, I think I've come around to 'peaks'. We have two components to the fit: 'background' and 'peaks'. We don't need to imply that 'peaks' == 'oscillations', that's a layer of interpretation left to the user (and again, the main issue is that overlapping gaussians may be used to constitute what we would otherwise want to call '1 oscillation'), but we can talk about 'peaks' consistently, as a middle ground, without having to talk about 'below' (gaussians) or 'above' (oscillations). This does abstract away the gaussian aspect, helpfully, in so far as there is nothing fundamental about gaussians being used here (and this could even change in the future).
Now this is a pretty total API overhaul, slightly annoying to anyone who has started on our current version but, oh well, still better to do this now.
Note: following sklearn, trailing underscores are reserved for results from fitting the model.
Everything below is the API for the FOOOF object (public & private), except for the FOOOFGroup particulars at the end. I don't think this overhaul affects any external functions within the fooof module, but after we settle this here, I'll do a run-through of the rest to make sure it's all consistent.
This is like, legitimately, the last thing - so please tear it apart, and then I'll push through the last things, and update everything with that.
So, full API description, and planned overhaul:
Note: mixture of private settings and intermediate model stuff.
FOOOF notes
plt: we currently do not test for a matplotlib install, which will lead to non-graceful failures. Wrap the matplotlib import in a try/except and flag the failed import throughout the package, so things go smoothly? https://github.com/voytekresearch/fooof/blob/master/fooof/fit.py#L365
Who would do this? When? Why? (I'm going to do some runtime/performance analysis later as well...)
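The suggested guarded import could look roughly like this (a sketch of the standard pattern; HAS_MPL and plot_spectrum are hypothetical names, not fooof's actual implementation):

```python
# Guarded matplotlib import: record availability so plotting functions
# can fail with a clear message instead of a raw ImportError at import time.
try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    plt = None
    HAS_MPL = False

def plot_spectrum(freqs, powers):
    """Plot a power spectrum, if matplotlib is available."""
    if not HAS_MPL:
        raise RuntimeError("matplotlib is required for plotting, "
                           "but failed to import.")
    plt.plot(freqs, powers)
```

Fitting and saving would then keep working on installs without matplotlib; only the plotting entry points would complain.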
Right now FOOOF operates only on the power spectrum.
For v0.2 we should explore whether, and to what degree, adding information from the phase spectrum improves FOOOF performance.
I've been investigating the couple of PSDs that fail fitting (it's not a big problem - literally a couple out of close to a million PSDs in the MEG dataset) - but it's slightly annoying in that they fail on very much edge-case suboptimal gaussian parameter guesses. In particular, I think it happens when it tries to fit gaussians a bit too close to the edge / each other. Small jitters can fix this (for example, moving f_range by 1 point).
This got me playing a little with one of our 'magic' numbers - the degree of overlap with another gaussian at which we drop an oscillation guess before the final fit (done in _drop_osc_overlap). Right now we have it set at 1 (and implicitly so; either way, this should be noted as an internal parameter).
There's no totally obvious way to choose this parameter in a purely data-driven approach (and it is currently 1.0 arbitrarily). Updating from 1.0 -> 1.5 or 2.0 (in units of bandwidth estimate of the gaussian guess) for example, marginally increases average error (on synthetic or real PSDs), but does reduce average number of gaussians fit (arguably reducing overfitting), so it's not entirely obvious what's better. I could do more explicit synthetic testing, but I reckon the answer becomes very dependent on the specific peak properties of the generated power spectra.
My quick impression from anecdotal fits, is that updating this number does reduce it being overzealous in uselessly adding more gaussians, so that's good (as this is one of our weakest points). It also does so in a manner not easily available to the user to control from public settings.
(Updating this upwards also seems to reduce likelihood of failing to fit, but from like 0.001% to something lower, so not really a relevant point).
tl-dr: I think updating this number (increasing it from 1.0 to 1.5 or 2.0) is better, but exactly where to is unclear. My only hesitation is making a change to the algorithm so late (at worst, I think it'll be benign, but there is always a non-zero chance it has some negative effects somewhere).
Thoughts?
Edit: I'm basically just thinking aloud here. I recommend updating this to 2.0.
I'm fitting a PSD (see below) for a signal after I removed line noise (data available in data.mat here). This biases the slope fit down. It might be a good idea to have a kwarg that allows the user to define frequency range(s) to ignore during the fit.
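One way such a kwarg could work under the hood is simple boolean masking of the input arrays before fitting; this sketch (drop_freq_ranges is a hypothetical helper, not part of fooof) is only meant to show the idea:

```python
import numpy as np

def drop_freq_ranges(freqs, power_spectrum, ignore_ranges):
    """Mask out frequency ranges (e.g. around line-noise notches)
    before fitting. ignore_ranges: list of (low, high) tuples,
    inclusive on both ends."""
    mask = np.ones_like(freqs, dtype=bool)
    for low, high in ignore_ranges:
        mask &= ~((freqs >= low) & (freqs <= high))
    return freqs[mask], power_spectrum[mask]
```

For example, passing ignore_ranges=[(58, 62)] would exclude a 60 Hz line-noise notch from the background fit.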
Per fooof.py note:
WARNING: INPUT IS LOGGED PSD & LINEAR FREQS (TO FIX, THEN UPDATE WARNING)
Get it ready to share!
Right now, FOOOFGroup saves to a .json file where each line is a FOOOFResult object. However, the group-wide oscillation information would be more easily accessible as a dataframe, in which each row is a peak, and the columns are: psd_index, center frequency, amp, bandwidth, etc.
So, there could be a function that converts the list of FOOOFResults (FOOOFGroup.group_results) into this dataframe.
As Tom mentioned IRL, this will add a dependency for pandas. But, that's common and why we safe_import.
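A sketch of what that conversion function might look like (peaks_to_dataframe is a hypothetical name; it assumes each result exposes a peak_params array of (CF, amp, BW) rows, per the FOOOFResult structure discussed above):

```python
import pandas as pd

def peaks_to_dataframe(group_results):
    """Flatten per-PSD peak parameters into a long-format dataframe:
    one row per peak, tagged with the index of the PSD it came from."""
    rows = []
    for psd_index, result in enumerate(group_results):
        for cf, amp, bw in result.peak_params:
            rows.append({'psd_index': psd_index,
                         'center_frequency': cf,
                         'amplitude': amp,
                         'bandwidth': bw})
    return pd.DataFrame(rows)
```

The resulting dataframe makes group-wide queries trivial, e.g. selecting all alpha-range peaks with df[df['center_frequency'].between(8, 12)].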
Currently the returned amplitudes come from the multi-gaussian fitting, which takes into account all the gaussians at once. If there are overlapping gaussians, the amplitude for any given fitted gaussian is contingent upon the others. This defies our expectation of what amplitude means: the distance between the 1/f background and the power at the center frequency.
Solution: after multi-fitting, overwrite the amplitude as the distance between the final background fit and the power at each center frequency.
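The proposed overwrite could be sketched like this (empirical_amplitudes is a hypothetical helper; it assumes the background fit is available evaluated on the same frequency vector as the power spectrum):

```python
import numpy as np

def empirical_amplitudes(freqs, power_spectrum, background_fit, center_freqs):
    """For each center frequency, take amplitude as the distance between
    the full power spectrum and the fitted background at the nearest
    frequency point, rather than the gaussian's own height parameter."""
    amps = []
    for cf in center_freqs:
        ind = np.argmin(np.abs(freqs - cf))
        amps.append(power_spectrum[ind] - background_fit[ind])
    return np.array(amps)
```

This makes the reported amplitude match the intuitive definition even when overlapping gaussians share the peak's power between them.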
The current procedure can include oscillation guesses that are largely overlapping, or entirely within, another oscillation. Sometimes the actual fit moves them, sometimes not.
Potentially: drop oscillation guesses based on overlap and/or proximity.
@voytek
Okay, I think I figured it out:
tl-dr:
The formula for gaussians (gaussian_function) had a mistake (or an oddity in its definition I didn't notice). It was missing the '2*' in front of the std. dev. param, and so was expecting BW as defined across both sides, as opposed to across one side (half), as normally defined.
When I did the quick data-driven BW hack, I accidentally fulfilled this by passing in the double BW, as opposed to the one-sided value (sorta wrong - but it worked, given our gaussian definition).
When I updated to the proper FWHM formula, that formula returns BW as one-sided, as usually defined, and passing this into our gaussian definition, which expects double BW, led to wrong / halved BW guesses, ultimately messing things up (small BW guess -> many more oscillations found). (Basically: after updating to FWHM estimation, we were using halved BW guesses.)
SOLUTION:
What I suggest:
Let's keep the math clean, and update the gaussian function to use STD as usually defined (the 'half BW'). This means passing in the FWHM estimation works properly (and FOOOF works properly again).
Now, to return to the user: I think oscillation_params (as distinct from gaussian_params) should return, as FOOOF has done so far, the two-sided bandwidth. This is easy - we already have the separation of gaussian_params from oscillation definitions, so just double the STD param to return the two-sided BW to the user.
^This is suggested so that we don't have a weird / custom definition of gaussians.
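For reference, the relationships involved can be written out directly. The sketch below shows a standard gaussian with STD as usually defined, plus the FWHM-to-STD conversion (function names are illustrative, though gaussian_function matches the name mentioned above):

```python
import numpy as np

def gaussian_function(xs, cf, amp, std):
    """Standard gaussian, with std defined as usual (the one-sided 'half BW')."""
    return amp * np.exp(-(xs - cf) ** 2 / (2 * std ** 2))

def fwhm_to_std(fwhm):
    """For a standard gaussian: FWHM = 2 * sqrt(2 * ln 2) * std ~= 2.355 * std."""
    return fwhm / (2 * np.sqrt(2 * np.log(2)))
```

With the standard definition, the curve sits at exactly half its peak amplitude at cf +/- FWHM/2, and reporting the two-sided BW to the user is just 2 * std.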
PSDs are currently fit in log-log, which means high frequencies are weighted much more heavily than low frequencies, because evenly spaced frequencies on a linear scale congregate at high frequencies on a log scale.
Practically, this is not a huge concern. It might even give us the desired effect of ignoring non-1/f stuff in the low frequencies. But it's not immediately obvious to the user that this is how it currently works. For 1.0, perhaps a note stating the above is sufficient. For future versions, the user would ideally get an option to fit equidistant frequencies in log space, or something that down-weights higher frequencies (weighted regression).
FOOOFGroup runs through a list of PSDs linearly - but every fit is independent. We should add parallel support for running across the matrix - probably a big speed-up when running large numbers of PSDs.
As of right now: the user could set up parallel outside the FOOOFGroup object (like I have set up for the MEG data) - but no reason not to have it offered internally.
There are still a fair few magic numbers - now pulled out and documented in the code itself. These need cleaning up:
a) can any be data-driven (become non-magic)
b) which, if any, should be settable parameters
c) If neither a or b, document them, and sanity check they are reasonable values.
If the scipy version isn't 0.19+, fooof fails in... non-obvious... ways. Just spent 10 minutes troubleshooting before I had an "ooooh wait" moment.
Right now the multi-fitting parameter bounds for CF and BW are the same for all estimated gaussians. This isn't ideal. For example, we shouldn't let the fitted CF drift too far (BW * X?) from the identified peak.
If you want to use the same object to run over different groups of data, collecting objects, right now you need to re-initialize, or explicitly copy. A copy method would probably be useful.
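A copy method could be as simple as wrapping copy.deepcopy (sketch only; copy_model is a hypothetical name):

```python
from copy import deepcopy

def copy_model(model):
    """Return an independent copy of a fitted model object, so the
    original can be reused on new data without overwriting results."""
    return deepcopy(model)
```

Mutating the copy (or refitting the original) then leaves the other object untouched, which is exactly the behaviour needed when collecting objects across groups of data.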
Probably the obvious way to do this - test on synthetic data created from our funcs (gaussian + background). At minimum, tests can check that the algorithm can accurately decompose these synthetic test cases.
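A sketch of generating such synthetic test spectra from known ground truth, in log power (gen_synthetic_psd is a hypothetical name; the linear-in-log-log background assumes the no-knee form):

```python
import numpy as np

def gen_synthetic_psd(freqs, offset, slope, gauss_params):
    """Build a synthetic log-power spectrum from known parameters:
    a linear (in log-log) background plus gaussian peaks.
    gauss_params: list of (center_freq, amp, std) tuples."""
    psd = offset - slope * np.log10(freqs)
    for cf, amp, std in gauss_params:
        psd += amp * np.exp(-(freqs - cf) ** 2 / (2 * std ** 2))
    return psd
```

Tests can then fit the generated spectrum and assert that the recovered offset, slope, and peak parameters match the known inputs to within tolerance.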
import fooof as ff
fms = ff.FOOOFGroup(verbose=False, bg_use_knee=True)
fms.fit_group(freqs, PSDs, freq_range=(2, 100))
fms.plot()
this will take the knee parameters and plot them as slope in the report.
Have an option to shut up any warnings.
To Add:
To figure out: