Comments (8)
Thanks for using powerlaw, David!
Your intuitions are good. However, try plotting the two degree distributions so you can see the data you're asking the statistical test to deal with. In particular, try plotting both the PDF and the CCDF, then overlay powerlaw's best-fit lines (examples of how to do this are in the paper).
from powerlaw.
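As a minimal sketch of what goes into those two plots, the empirical PDF and CCDF of a degree sequence can be computed by hand. The function name `empirical_pdf_ccdf` and the toy degree list are made up for illustration, and the CCDF is defined here as P(degree >= k); this is not the package's own plotting code.

```python
from collections import Counter

def empirical_pdf_ccdf(degrees):
    """Return (ks, pdf, ccdf) for a list of positive integer degrees.

    pdf[i]  = fraction of nodes with degree == ks[i]
    ccdf[i] = fraction of nodes with degree >= ks[i]
    """
    n = len(degrees)
    counts = Counter(degrees)
    ks = sorted(counts)
    pdf = [counts[k] / n for k in ks]
    ccdf = []
    remaining = n  # number of nodes with degree >= the current k
    for k in ks:
        ccdf.append(remaining / n)
        remaining -= counts[k]
    return ks, pdf, ccdf

# Toy degree sequence, purely illustrative.
ks, pdf, ccdf = empirical_pdf_ccdf([1, 1, 1, 2, 2, 3, 5, 8])
for k, p, c in zip(ks, pdf, ccdf):
    print(k, round(p, 3), round(c, 3))
```

On log-log axes a power law shows up as an approximately straight line in both curves, which is what the overlaid fit lines are compared against.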
Here are the plotted ccdf (1) and pdf (2) functions of the sf_graph (blue) and non_sf_graph (red).
They don't really seem that similar. Are you saying that detecting a binomial degree distribution is hard/impossible just from the pdf or ccdf?
Also, when looking at graphs I plotted myself, both the pdf and ccdf of scale-free and random networks seem to be easily discernible:
ccdf_loglog.pdf
ccdf.pdf
degdist_loglog.pdf
degdist.pdf
Great! Now plot the fitted power law for fitnpl. I.e. fitnpl.power_law.plot_pdf(), IIRC (which I may not)
I don't think this is necessary. I can see that a statistical test could fit a power law to this graph.
My question, however, is why we don't use a binomial distribution for comparison, which would unarguably fit the graph better than a power law. It would be close to a perfect fit on all data points. Look, e.g., at ccdf_loglog.pdf. For a power law to fit that curve, it would have to choose a pretty high xmin and would still not match the data points any better than the actual original distribution (binomial) would.
Is it hard to implement? Is the binomial distribution not well defined for the ccdf? Does the exponential / stretched exponential distribution already cover it? Or was it simply not deemed useful to implement it?
Visualizing the fitted power law for fitnpl should show that the power law is only actually fitted for the extreme tail of the distribution. This is the important insight: by default, powerlaw finds the optimal value at which to cut off the tail of the distribution, where "optimal" is the tail that is best described by a power law. You will observe that fitnpl.xmin is different from fitpl.xmin (also callable with fitnpl.power_law.xmin and fitpl.power_law.xmin).
So, essentially, you're taking the tail of an exponential distribution, chopping off the tail that's near vertical, fitting a power law to that, and then asking if that near-vertical tail is better described by a power law or an exponential (or the other functional forms you tested). This is not what you want. The ability to notice undesirable values for xmin is why printing xmin is in the Basic Usage example.
tl;dr: Set xmin=1 when you call Fit. Does that yield the behavior that was expected?
As for no binomial distribution being implemented: It would be very welcome! powerlaw has a happy history of accepting pull requests from the community that implemented other distributions, like the stretched exponential.
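To make the xmin selection described above concrete, here is a hand-rolled sketch in the spirit of that procedure: for each candidate xmin, fit a power law to the tail and keep the candidate whose tail has the smallest Kolmogorov-Smirnov distance to the fit. It is not the package's code. It assumes continuous data and the continuous maximum-likelihood formula (degree data is discrete), and the helper names, the candidate grid, the synthetic sample, and the `min_tail` cutoff are all choices made for this example.

```python
import math
import random

def fit_alpha(tail, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin))."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

def scan_xmin(data, candidates, min_tail=50):
    """Return (ks, xmin, alpha, tail_size) for the candidate with the smallest KS distance."""
    best = None
    for xmin in candidates:
        tail = [x for x in data if x >= xmin]
        if len(tail) < min_tail:  # assumption of this sketch: skip tiny tails
            continue
        alpha = fit_alpha(tail, xmin)
        result = (ks_distance(tail, xmin, alpha), xmin, alpha, len(tail))
        if best is None or result < best:
            best = result
    return best

random.seed(0)
# Exponential sample shifted to start at 1, standing in for a distribution
# without a heavy tail (hypothetical data, not the graphs from this thread).
data = [1.0 + random.expovariate(1.0) for _ in range(5000)]
candidates = [1.0 + 0.5 * i for i in range(12)]  # 1.0, 1.5, ..., 6.5

d, xmin, alpha, n_tail = scan_xmin(data, candidates)
print(f"chosen xmin={xmin}, alpha={alpha:.2f}, tail={n_tail} of {len(data)}, KS={d:.3f}")
```

For a sample like this the scan tends to settle on a cutoff well above the minimum, so only a fraction of the data is left for the later distribution comparison. That is the situation described above, and forcing xmin=1 avoids it.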
Thanks for the elaborate response.
Unfortunately, setting xmin=1 does not solve the problem, as now the power-law tail of the scale-free network won't be detected.
But apparently the binomial distribution is not needed, as the exponential does fit very well for most small xmin.
I guess the issue can then be closed.
For the example I gave, I observe the following behavior when comparing 'powerlaw' vs 'exponential' distribution for different xmin:
xmin 1 to 7: power-law misclassified as exponential (p-val < 0.0001)
xmin 8 to 22: both graphs classified correctly (p-val < 0.001) <= sweet spot!
xmin 22 to 26: gnp graph unsure
xmin over 26: gnp graph misclassified as scale-free (p-val < 0.01)
So in the end it comes down to picking a good xmin. Are there best practices on how to pick the xmin in that case? Surely there must be some measure of what constitutes a "power law distribution"? If any tail can be fitted to a power law, that defeats the purpose. But I guess this is not within the realm of the powerlaw package, which simply provides the tools.
Thanks for all the help!
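A sweep like the one above can be sketched without the package by computing, for each xmin, a normalized log-likelihood ratio R between a fitted power law and a fitted exponential on the tail x >= xmin, together with a Vuong-style p-value. Everything here is an assumption made for illustration: the continuous-data formulas, the function name `loglik_ratio`, and the synthetic samples, which are not the graphs from this thread and will not reproduce the numbers listed above.

```python
import math
import random

def loglik_ratio(data, xmin):
    """Normalized log-likelihood ratio (power law vs exponential) for x >= xmin.

    Returns (R, p, n). R > 0 favors the power law, R < 0 the exponential;
    p is the two-sided significance of the sign of R.
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    if n < 2:
        raise ValueError("not enough data above xmin")
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)  # power-law MLE
    lam = 1.0 / (sum(tail) / n - xmin)                       # shifted-exponential MLE
    # Pointwise log-likelihoods under each fitted model.
    ll_pl = [math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin) for x in tail]
    ll_ex = [math.log(lam) - lam * (x - xmin) for x in tail]
    diffs = [a - b for a, b in zip(ll_pl, ll_ex)]
    mean = sum(diffs) / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    r = sum(diffs) / (sigma * math.sqrt(n))
    p = math.erfc(abs(r) / math.sqrt(2.0))
    return r, p, n

random.seed(1)
# Hypothetical samples: a shifted exponential and a Pareto with alpha = 2.5.
exp_data = [1.0 + random.expovariate(0.5) for _ in range(5000)]
pl_data = [random.paretovariate(1.5) for _ in range(5000)]

for xmin in (1, 2, 4, 8):
    for name, data in (("exponential", exp_data), ("power law", pl_data)):
        r, p, n = loglik_ratio(data, xmin)
        print(f"{name:12s} xmin={xmin}: R={r:+.2f}, p={p:.3g}, n={n}")
```

The printed tail size n is worth watching alongside R and p: as xmin grows, fewer points remain, and the comparison has less to work with.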
Glad to help!
A few things to consider:
- At xmin of 26, the GNP graph has virtually no data left.
- Getting good sampling from a power law distribution is hard. You generated 10,000 nodes, but the scale-free network still has a very ragged PDF above k=20 or so.
- If you consider the entirety of a distribution, the tail is by definition not much of the PDF, and so that data will not contribute as much to the likelihood function when fitting an equation to the full data. Thus, at small xmin we would expect that the tail of a power law would not contribute much. This is particularly true if we don't have good sampling of the power law, which will be particularly bad if the power law is steep (has a large negative alpha). BA networks in the limit yield a degree distribution of alpha of -3. The "scale free" properties of power laws only start at alpha=-3. If |alpha|<3, the distribution's variance is undefined. If |alpha|<2, the distribution's mean is undefined. This is to say, we only get the wacky "scale free" properties of power laws if alpha is small enough, which is when the tail is heavy enough, which is when we'll have good sampling of the tail, which is when we'll be most readily able to identify a power law. If |alpha|>3, it's indeed hard to identify a power law, but also there's little point. Steep power laws aren't cool power laws.
- Identifying tails is philosophically hard. Ideally, you have some semantic understanding of the system that generated the data, which will inform where xmin "ought" to be (or where an xmax ought to be, which is described in the paper). Outside of that, there are no right answers (or at least there weren't when I was in this game several years ago). The original Clauset et al. paper that came up with powerlaw's procedure to identify an xmin was trying to give the strongest possible support for a power law in a given dataset, in part so that they could then show that even then power laws weren't supported in a bunch of empirical datasets.