Comments (25)
@rhjohnstone Thanks for the edits; it saved me typing! :)
To your question: xmin is chosen as the value that minimizes the KS distance between the empirical distribution and a perfect power law fitted to the data above that value. Take a look at the papers by Clauset et al. or Alstott et al. for more info.
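A minimal sketch of that xmin search, assuming continuous data and the standard continuous maximum-likelihood exponent. The function names (`fit_alpha`, `ks_distance`, `find_xmin`) and the `min_tail` cutoff are made up for illustration; this is not the code powerlaw itself runs.

```python
import numpy as np

def fit_alpha(tail, xmin):
    """Continuous maximum-likelihood exponent for the tail x >= xmin."""
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(sorted_tail, xmin, alpha):
    """Largest gap between the empirical CDF and the power-law CDF on the tail."""
    n = len(sorted_tail)
    model_cdf = 1.0 - (sorted_tail / xmin) ** (1.0 - alpha)
    ecdf_after = np.arange(1, n + 1) / n   # empirical CDF just after each point
    ecdf_before = np.arange(0, n) / n      # empirical CDF just before each point
    return max(np.max(np.abs(ecdf_after - model_cdf)),
               np.max(np.abs(model_cdf - ecdf_before)))

def find_xmin(data, min_tail=10):
    """Try every distinct value as xmin and keep the one with the smallest KS distance."""
    data = np.sort(np.asarray(data, dtype=float))
    best = None
    for xmin in np.unique(data):
        tail = data[data >= xmin]          # still sorted, because data is sorted
        if len(tail) < min_tail:           # hypothetical cutoff: skip tiny tails
            break
        if np.sum(np.log(tail / xmin)) == 0.0:
            continue                       # all tail values equal xmin, no fit possible
        alpha = fit_alpha(tail, xmin)
        distance = ks_distance(tail, xmin, alpha)
        if best is None or distance < best[2]:
            best = (xmin, alpha, distance)
    return best

# Synthetic check: a pure power law with alpha = 2.5 starting at x = 1.
rng = np.random.default_rng(0)
samples = (1.0 - rng.random(2000)) ** (-1.0 / (2.5 - 1.0))
xmin_hat, alpha_hat, ks_hat = find_xmin(samples, min_tail=50)
print(xmin_hat, alpha_hat, ks_hat)
```

Because the fit is redone for every candidate xmin, a few points can move xmin (and therefore alpha) quite a bit on small datasets.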
The larger issue is that the data is almost certainly not a power law, and even if it is, it doesn't have the interesting properties we look for in a power law. The data shows p(x) dropping ~6 orders of magnitude (OOM) over ~1.5 OOM of x. That would be an alpha of ~4, which is very steep. The interesting properties of power laws appear at smaller (shallower) alphas: the mean is undefined for alpha <= 2 and the standard deviation is undefined for alpha <= 3. Also, the data spans only 1.5 OOM, so there's likely not enough range to distinguish it from an exponential. If you use 'distribution_compare' you'll probably find the power law fit is less likely than the exponential.
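The back-of-the-envelope numbers above can be written out as follows. This is only a sketch: the helper names are hypothetical, and the slope estimate assumes the log-log plot is a straight line over the whole range.

```python
def alpha_from_decades(decades_of_p, decades_of_x):
    """Slope of a straight line on a log-log plot: decades dropped in p(x) per decade of x."""
    return decades_of_p / decades_of_x

def moment_is_finite(alpha, k):
    """For p(x) ~ x**(-alpha), the k-th moment is finite only when alpha > k + 1."""
    return alpha > k + 1

alpha_estimate = alpha_from_decades(6, 1.5)
print(alpha_estimate)                        # 4.0
print(moment_is_finite(alpha_estimate, 1))   # mean is finite
print(moment_is_finite(alpha_estimate, 2))   # variance is finite
```

With alpha around 4, both the mean and the variance exist, so the heavy-tail behaviour people usually care about is absent.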
from powerlaw.
Counter is just for the demo; usually I pass a flat array of integers from 1 to 18. I also tried your suggestion with xmin, but I'm still getting alpha=4.87.
Could it be because I'm using Python 3.6? I had to update the code provided by Clauset to work with my environment.
I downloaded the Manuscript_Code.ipynb file and reran all cells; the output is the same as yours.
Is the algorithm in powerlaw different from http://tuvalu.santafe.edu/~aaronc/powerlaws/plfit.py?
Hmm, Jupyter displays 1.4.3 while pip says version 1.4.4 is installed. Am I using the older version? It seems the problem is with conda...
After some refactoring: the version string in powerlaw.py was not updated to 1.4.4, which is why I'm seeing 1.4.3 :p Now I'm sure it's the latest version; floats make no difference.
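For anyone hitting the same mismatch, here is a small sketch that prints both numbers side by side. It assumes Python 3.8+ (for `importlib.metadata`, so it would not run on the 3.6 setup mentioned above), and `report_versions` is a hypothetical helper, not part of powerlaw.

```python
import importlib
from importlib import metadata

def report_versions(package_name):
    """Return (version recorded by the installer, version the module reports)."""
    try:
        installed = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        installed = None                      # no installed distribution by that name
    try:
        module = importlib.import_module(package_name)
        reported = getattr(module, "__version__", None)
    except ImportError:
        reported = None                       # module cannot be imported
    return installed, reported

print(report_versions("powerlaw"))
```

If the two values differ, either two copies are installed or the module's own version string is simply stale, as turned out to be the case here.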
I've created a pdf with some simple tests.
Tried, output the same
Edit: not the same but similar
I'll go grab some sleep, it's 4 am here :) Please keep me updated on your progress.
powerlaw.Fit([500,150,90,81,75,75,70,65,60,58,49,47,40]).alpha
3.1151851915645903
powerlaw.Fit([500,150,90,81,75,75,70,65,60,58,49,47,40], discrete=True).alpha
3.0771455810069
plfit.plfit([500,150,90,81,75,75,70,65,60,58,49,47,40])[0]
2.71
The corresponding xmin values are 58.0, 58.0, and 47, respectively.
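As a cross-check, the first alpha above can be reproduced by hand. The sketch below assumes the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x / xmin)), applied to the points at or above xmin = 58; `continuous_mle_alpha` is a name chosen here for illustration. It says nothing about how plfit.py arrives at 2.71.

```python
import numpy as np

# The 13-point example from the comment above.
data = np.array([500, 150, 90, 81, 75, 75, 70, 65, 60, 58, 49, 47, 40], dtype=float)

def continuous_mle_alpha(values, xmin):
    """Continuous maximum-likelihood exponent using only the points >= xmin."""
    tail = values[values >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

alpha_58 = continuous_mle_alpha(data, 58.0)
print(alpha_58)  # should land close to the 3.115 reported for the continuous fit
```

So at least part of the gap between the two tools comes from which xmin each one settles on, since only 10 of the 13 points lie at or above 58.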
My data has length of ~300
plfit.py gives xmin=1, alpha=2.59
powerlaw.py with discrete=True gives xmin=2, alpha=2.38
The same applies to data which is 10x larger
For floats the results are the same. With discrete=True and integer data, powerlaw.py produces a slightly lower alpha but the same xmin.
If I multiply each element in the dataset by 100, the results are the same.
If I add 100 to each element, the results vary greatly.
There seems to be some issue with the scale.
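One way to see why multiplying and adding behave differently is to look at the continuous maximum-likelihood exponent, which depends only on the ratios x / xmin. The sketch below assumes that estimator and holds xmin fixed at 58 (scaled or shifted along with the data), which is a simplification: a real fit would also re-select xmin. The data is the 13-point example posted earlier in the thread, and `mle_alpha` is a name chosen for illustration.

```python
import numpy as np

def mle_alpha(values, xmin):
    """Continuous maximum-likelihood exponent using only the points >= xmin."""
    values = np.asarray(values, dtype=float)
    tail = values[values >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

data = np.array([500, 150, 90, 81, 75, 75, 70, 65, 60, 58, 49, 47, 40], dtype=float)
xmin = 58.0

base = mle_alpha(data, xmin)
scaled = mle_alpha(100 * data, 100 * xmin)    # ratios x / xmin are unchanged
shifted = mle_alpha(data + 100, xmin + 100)   # ratios shrink toward 1, so alpha grows

print(base, scaled, shifted)
```

A power law is scale-free, so rescaling leaves the exponent alone, whereas adding a constant turns the data into a shifted distribution that is no longer a pure power law.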
I guess I'm having similar problems to @polakowo. I get a very poor fit just using the most basic example in the readme (the result is very similar for both discrete=True and discrete=False, as well as when recording the data as int or float):
import matplotlib.pyplot as plt
import numpy as np
import powerlaw

# Number of occurrences, starting from 1
counts = np.array("534602 192300 61903 25675 12380 6472 3727 2082 1441 908 650"
                  " 488 341 239 182 123 87 76 46 48 43 27 24 16 11 13 14 7 8 9"
                  " 7 7 4 3 3 1 0 2 0 1 1 0 0 0 0 1".split()).astype(int)
# Expand the counts into a flat array: the value i+1 appears counts[i] times
data = []
for i, count in enumerate(counts):
    data += [i+1]*count
data = np.array(data)
powerlaw.plot_pdf(data, label="Data as PDF")
fit = powerlaw.Fit(data, discrete=True)
fit.plot_pdf(label="Fitted PDF")
plt.legend(loc=3, fontsize=14);
Am I just plotting the results incorrectly, or is something deeper going on?
Edit: I see now that the "fitted" plot is just using data points >= fit.power_law.xmin, so it's just plotting the data but re-normalized? In which case, is there a simple function to just plot the fitted power curve?
Edit 2: I should be using fit.power_law.plot_pdf(label="Fitted PDF"), and not fit.plot_pdf(label="Fitted PDF"). This is the updated version:
fit = powerlaw.Fit(data, discrete=True)
powerlaw.plot_pdf(data[data>=fit.power_law.xmin], label="Data as PDF")
print(fit.power_law.alpha)
print(fit.power_law.xmin)
R, p = fit.distribution_compare('power_law', 'lognormal')
fit.power_law.plot_pdf(label="Fitted PDF", ls=":")
plt.legend(loc=3, fontsize=14);
which outputs
which I guess is OK, but I'm surprised at how large fit.power_law.xmin is. Just inspecting the original data plot, the line appears straight enough down to 2 or 3. Is there an obvious reason (that I can fix) why it's choosing such a high value?