Fitting user-specified PDF, e.g. power spectral density (powerlaw issue #62, open)

Comments (7)

jeffalstott commented on July 22, 2024

Thanks, Ondrej! For your needs, this is the relevant paper: https://projecteuclid.org/euclid.aoas/1396966280 I haven't implemented it, but there may be an implementation somewhere here: http://tuvalu.santafe.edu/~aaronc/powerlaws/ If a good implementation were created for powerlaw, I'd happily bring it on board.

On Tue, Oct 2, 2018 at 10:20 AM Ondrej Grover wrote:

I really like your software; it makes it easier to judge the hype around power laws in datasets. However, right now it focuses on fitting full datasets, creating their PDF and CDF on the fly. I'd like to use it in situations where I already have a PDF (defined at several points), or more generally a distribution function of some sort, and want to fit its shape over some range. An example is the power spectral density of turbulent plasmas, where there is an ongoing discussion (https://dx.doi.org/10.1103/PhysRevLett.107.185003) about whether they are power laws or exponentials. I'd be willing to contribute modifications to powerlaw that would make this optional use case possible, and I would greatly appreciate it if you could point out how best to approach this issue.

smartass101 commented on July 22, 2024

Thank you for the reply.
My naive hope was that it would suffice to let the user specify the CDF and bins directly, i.e. set self.fitting_cdf_bins and self.fitting_cdf without the actual data, as done here. Then I would probably have to change later operations to work on the CDF instead of the data itself.
Perhaps a reasonable approach would be to wrap the data in some object that exposes methods such as cdf. This would separate the source of the information about the data distribution from the actual calculations with the distribution.
But perhaps I have missed some part where access to actual data is necessary.
What do you think about this approach?
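To make the wrapper idea concrete, here is a minimal sketch. All names in it (`DistributionSource`, `SampleSource`, `TabulatedSource`) are hypothetical and do not exist in powerlaw; it only shows how a raw-sample source and a user-supplied (bins, cdf) table could sit behind the same `cdf` method.

```python
import bisect
from typing import Protocol, Sequence


class DistributionSource(Protocol):
    """Hypothetical interface: anything that can report an empirical CDF."""

    def cdf(self, x: float) -> float: ...


class SampleSource:
    """CDF computed from raw data points."""

    def __init__(self, data: Sequence[float]) -> None:
        self._sorted = sorted(data)

    def cdf(self, x: float) -> float:
        # Fraction of samples less than or equal to x.
        return bisect.bisect_right(self._sorted, x) / len(self._sorted)


class TabulatedSource:
    """CDF supplied directly as (bins, cdf) values, with no underlying samples."""

    def __init__(self, bins: Sequence[float], cdf_values: Sequence[float]) -> None:
        if len(bins) != len(cdf_values):
            raise ValueError("bins and cdf_values must have the same length")
        self._bins = list(bins)
        self._cdf = list(cdf_values)

    def cdf(self, x: float) -> float:
        # Step-function lookup: value at the last tabulated point <= x.
        i = bisect.bisect_right(self._bins, x)
        return 0.0 if i == 0 else self._cdf[i - 1]


from_samples = SampleSource([1.0, 2.0, 3.0, 4.0])
from_table = TabulatedSource([1.0, 2.0, 3.0], [0.2, 0.7, 1.0])
```

Code that only calls `cdf` would work with either object. Anything that needs the individual data points (such as a likelihood) would still require the sample-based source.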

smartass101 commented on July 22, 2024

I also found out that their implementation of the operations on binned data is available at http://tuvalu.santafe.edu/~aaronc/powerlaws/bins/

jeffalstott commented on July 22, 2024

The methods currently in powerlaw do not do fitting based on the binned data; they work directly on the data points themselves. Binning is done only for visualizing PDFs (in a sense there is no binning for CDFs, which is actually a major reason to use them for visualization, as they do less damage to the data in presentation).
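To illustrate what fitting "directly on the data points" means, below is a minimal sketch of the standard continuous power-law maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / xmin)). It is not powerlaw's actual code; the function names are made up for this example, and xmin is assumed known rather than estimated.

```python
import math
import random


def powerlaw_mle_alpha(data, xmin):
    """Continuous power-law MLE over the points x >= xmin (no binning involved)."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, xmin, n, rng):
    """Inverse-transform sampling from p(x) ~ x^-alpha for x >= xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(0)
data = sample_powerlaw(alpha=2.5, xmin=1.0, n=20000, rng=rng)
alpha_hat = powerlaw_mle_alpha(data, xmin=1.0)  # should land close to 2.5
```

The estimator needs every individual x_i, which is why a tabulated PDF or CDF cannot simply be swapped in for the data.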

smartass101 commented on July 22, 2024

I've been reading that article and I began to realize that it may not be directly applicable to the PSD case. The reason is that most algorithms (FFT or wavelet) do not give the PSD as a histogram, but rather actual point-wise estimates, i.e. PSD(f_k) for all f_k. The f_k can be spaced either linearly (usually the case with FFT-based algorithms) or logarithmically (often the case in continuous wavelet analysis).

A dirty (probably not completely wrong, but neither right) workaround would be to generate surrogate datasets based on the pdf given by the PSD. I've seen it done e.g. here.
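A rough sketch of that surrogate workaround, under several assumptions: the PSD is strictly positive, it is treated as an unnormalized PDF over frequency, and it is linearly interpolated between the f_k. The function name `surrogate_from_psd` is hypothetical. Samples are drawn by inverse-transform sampling on the trapezoidal cumulative integral.

```python
import bisect
import random


def surrogate_from_psd(freqs, psd, n, rng):
    """Draw n surrogate 'frequencies' whose density follows the tabulated PSD."""
    # Trapezoidal cumulative integral of the PSD over frequency.
    cum = [0.0]
    for i in range(1, len(freqs)):
        cum.append(cum[-1] + 0.5 * (psd[i] + psd[i - 1]) * (freqs[i] - freqs[i - 1]))
    total = cum[-1]

    samples = []
    for _ in range(n):
        u = rng.random() * total
        # Locate the segment [f_i, f_{i+1}] whose cumulative range contains u.
        i = min(bisect.bisect_right(cum, u) - 1, len(freqs) - 2)
        # Linearly interpolate the cumulative integral inside the segment
        # (an approximation; the exact inverse would be quadratic).
        frac = (u - cum[i]) / (cum[i + 1] - cum[i])
        samples.append(freqs[i] + frac * (freqs[i + 1] - freqs[i]))
    return samples


rng = random.Random(1)
freqs = [10 ** (2 * k / 999) for k in range(1000)]  # 1 to 100, log-spaced
psd = [f ** -2 for f in freqs]                      # synthetic power-law PSD
surrogate = surrogate_from_psd(freqs, psd, 5000, rng)
```

The surrogate list could then be handed to a sample-based fitter. Note that the number of surrogate points is arbitrary, so any uncertainty or p-value derived from it would not reflect the real statistics of the PSD estimate.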

Perhaps I should get in touch with Clauset and ask him for guidance in this.

smartass101 commented on July 22, 2024

Clauset seems to be on sabbatical. I had another idea: perhaps I could simply use the Kolmogorov-Smirnov test to determine the "distance" between the PSD and a given distribution. Chi^2 might be an alternative. But that would mean determining the fitted parameters and f_k_min at the same time, and I'm not sure whether that would be a problem.
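A sketch of how such a KS-style scan might look, purely to show the mechanics and not as a validated procedure. Assumptions: the PSD is normalized over [f_min, f_max] and compared with a power law truncated to the same range, alpha != 1, and alpha and f_min are picked jointly by brute-force grid search. The names `ks_distance_powerlaw` and `scan_fit` are hypothetical.

```python
def ks_distance_powerlaw(freqs, psd, alpha, i_min):
    """Max |empirical CDF - model CDF| over the points f_k >= freqs[i_min]."""
    f = freqs[i_min:]
    p = psd[i_min:]
    # Empirical CDF: trapezoidal cumulative integral of the PSD, normalized below.
    emp = [0.0]
    for k in range(1, len(f)):
        emp.append(emp[-1] + 0.5 * (p[k] + p[k - 1]) * (f[k] - f[k - 1]))
    total = emp[-1]
    # Model CDF: power law f^-alpha truncated to [f[0], f[-1]] (requires alpha != 1).
    a = 1.0 - alpha
    lo, hi = f[0] ** a, f[-1] ** a
    return max(abs(e / total - (x ** a - lo) / (hi - lo)) for e, x in zip(emp, f))


def scan_fit(freqs, psd, alphas, min_points=10):
    """Grid search over (alpha, f_min); returns (distance, alpha, f_min)."""
    best = None
    for i_min in range(len(freqs) - min_points + 1):
        for alpha in alphas:
            d = ks_distance_powerlaw(freqs, psd, alpha, i_min)
            if best is None or d < best[0]:
                best = (d, alpha, freqs[i_min])
    return best


# Synthetic PSD: flat below f = 10, power law with exponent 2 above it.
freqs = [10 ** (3 * k / 199) for k in range(200)]
psd = [1.0 if f < 10 else (f / 10) ** -2 for f in freqs]
best_d, best_alpha, best_fmin = scan_fit(freqs, psd, alphas=[1.5, 2.0, 2.5])
```

On noisy real data the smallest distance may well favor short ranges with few points, so the joint choice of f_min would likely need a penalty or a lower bound on the range width. That is part of the open question here.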

smartass101 commented on July 22, 2024

Mentioning directly @aaronclauset in case you have time (and interest) to comment.
