Comments (8)
This is the Truncated_Power_Law class trying to set some initial parameters
(alpha and lambda), from which it will numerically search for a better fit.
We get an initial estimate on alpha analytically from the data using this
formula:
alpha = 1 + len(data)/sum( log( data / (self.xmin) ))
The problem is when sum( log( data / (self.xmin) )) == 0. This could occur
if the only data points are equal to xmin (so data/xmin == 1, and log(1) ==
0). It could also occur if it just happens that sum of the logs is 0
(vanishingly unlikely).
The result of such a situation is that alpha = inf. So that's the starting
guess for the numerical search, and it should very quickly get off that and
go find a real value for alpha. Do you still get a real number for the
alpha on the truncated power law fit?
fit = powerlaw.Fit(data)
fit.truncated_power_law.alpha
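For illustration, here is a minimal sketch of that initial estimate. It assumes NumPy, and `initial_alpha` is a hypothetical helper name, not the library's own function. It applies the formula quoted above and shows the infinite starting value when every data point equals xmin:

```python
import numpy as np

def initial_alpha(data, xmin):
    """Hypothetical helper: alpha = 1 + n / sum(log(data / xmin))."""
    data = np.asarray(data, dtype=float)
    log_sum = np.sum(np.log(data / float(xmin)))
    # Silence the divide-by-zero RuntimeWarning so the inf result is returned quietly.
    with np.errstate(divide="ignore"):
        return 1.0 + len(data) / log_sum

# Made-up sample values, for illustration only.
degenerate = initial_alpha([2.0, 2.0, 2.0], xmin=2.0)   # every point equals xmin
ordinary = initial_alpha([2.0, 3.0, 5.0, 8.0], xmin=2.0)
print(degenerate, ordinary)
```

In the degenerate case the sum of logs is exactly 0, so the estimate is infinite; with spread-out data it is a finite value above 1.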
I am interested in how/why this situation could arise to begin with. As I
said, this seems only likely to occur if all data is exactly equal to xmin.
If that were the case all the fits should look terrible, as we effectively
have one data point. What do the data and fits look like?
I suppose another possibility is if data and xmin were both ints (instead
of floats). As ints, 3/2 == 1, and log(1) == 0, etc. So data and xmin could
just be within the same order of magnitude, and not exactly equal. However,
xmin should at least be a float. Is it in your case? Maybe we should
forcibly cast all the data as float at the initialization of the Fit object.
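The integer scenario can be sketched as follows. Under Python 2, `/` between ints floors the result; the sketch uses `//` to emulate that on Python 3, which is an assumption about how the original code behaved. The sample values are made up.

```python
import numpy as np

data = np.array([2, 3, 3])   # int data within the same order of magnitude as xmin
xmin = 2                     # int xmin

# Emulate Python 2 integer division: 3 / 2 floors to 1.
floored = data // xmin
log_sum_int = np.sum(np.log(floored))

# Casting to float first keeps the fractional part, so the logs are non-zero.
log_sum_float = np.sum(np.log(data.astype(float) / float(xmin)))

print(floored, log_sum_int, log_sum_float)
```

With ints, every ratio collapses to 1 and the sum of logs is 0 even though the data are not all equal to xmin; with floats the sum is positive and the estimate for alpha stays finite.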
On Thu, Feb 27, 2014 at 4:54 AM, Philipp Singer wrote:
Hi Jeff,
Just stumbled upon a runtime error when fitting the truncated power law
function. Here is the corresponding message:
Assuming nested distributions
/usr/local/lib/python2.7/dist-packages/powerlaw.py:1351: RuntimeWarning:
divide by zero encountered in double_scalars
alpha = 1 + len(data)/sum( log( data / (self.xmin) ))
Traceback (most recent call last):
[...]
R, p = fit.distribution_compare('power_law', 'truncated_power_law',
normalized_ratio=True)
File "/usr/local/lib/python2.7/dist-packages/powerlaw.py", line 315, in
distribution_compare
[...]
Thanks,
Philipp
from powerlaw.
Sorry for the late response. It seems it is indeed an int problem. My data consisted of ints instead of floats. After forcing floats, the error no longer occurs.
Thanks, Philipp. Perhaps we should cast all data as floats, then, so this
doesn't happen again.
On Wed, Mar 5, 2014 at 5:38 AM, Philipp Singer wrote:
Sorry for the late response. It seems it is indeed an int problem. My data
consisted of ints instead of floats. After forcing floats, the error no
longer occurs.
All data is now cast to floats.
Hi Jeff,
I'm getting a similar error. I cast the data to float, but the error is still there:
fit.distribution_compare('power_law', 'truncated_power_law'
[...]
/home/myuser/anaconda/lib/python2.7/site-packages/powerlaw.pyc in _pdf_discrete_normalizer(self)
1383 from mpmath import exp # faster /here/ than numpy.exp
1384 C = ( float(exp(self.xmin * self.Lambda) /
-> 1385 lerchphi(exp(-self.Lambda), self.alpha, self.xmin)) )
1386 if self.xmax:
1387 Cxmax = ( float(exp(self.xmax * self.Lambda) /
/home/myuser/anaconda/lib/python2.7/site-packages/mpmath/ctx_mp_python.pyc in div(self, other)
/home/myuser/anaconda/lib/python2.7/site-packages/mpmath/libmp/libmpf.pyc in mpf_div(s, t, prec, rnd)
932 return fzero
933 if t == fzero:
--> 934 raise ZeroDivisionError
935 s_special = (not sman) and sexp
936 t_special = (not tman) and texp
ZeroDivisionError:
Is there any possible solution?
It looks like lerchphi is returning 0. Can you do one of the following?
- Provide the data you used that led to this error.
- Provide the minimal data needed to create this error.
- Report the values of self.Lambda, self.alpha, and self.xmin right before the error.
Thanks!
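One way to collect those values is sketched below. `report_parameters` is a hypothetical helper, not part of powerlaw, and the stand-in object only mimics the attribute names seen in the traceback (`Lambda`, `alpha`, `xmin`); the numbers in it are placeholders, not values from this report.

```python
from types import SimpleNamespace

def report_parameters(dist, names=("Lambda", "alpha", "xmin")):
    """Hypothetical helper: read the named attributes, using None if one is missing."""
    return {name: getattr(dist, name, None) for name in names}

# Stand-in for a fitted distribution object; placeholder values only.
example_dist = SimpleNamespace(Lambda=0.01, alpha=2.5, xmin=1.0)
params = report_parameters(example_dist)
print(params)
```

Calling it on the fitted distribution just before the failing comparison, or from an `except ZeroDivisionError` block, would give the numbers requested above.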
Hi Jeff,
the data is too big, about 150M points. I don't know how to share it with you (the CSV file is about 900 MB).
I need some hours to load the data and fit it again to give you the values of these variables.
Thanks for your prompt response!!
How about trying on a smaller, random sample?
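A minimal sketch of drawing such a sample, assuming NumPy; `random_subsample`, the sample size, and the seed are all illustrative choices, not anything prescribed by powerlaw:

```python
import numpy as np

def random_subsample(data, size, seed=0):
    """Hypothetical helper: draw `size` points without replacement, as floats."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    size = min(size, data.size)   # never ask for more points than exist
    return rng.choice(data, size=size, replace=False)

# Stand-in data; in practice this would be the full loaded data set.
full_data = np.arange(1, 1001)
sample = random_subsample(full_data, size=100)
print(sample.shape, sample.dtype)
```

The sample can then be passed to powerlaw.Fit in place of the full data to check whether the error still appears.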
On Friday, October 31, 2014, robegs wrote:
Hi Jeff,
the data is too big, about 150M points. I don't know how to share it with
you (the CSV file is about 900 MB).
I need some hours to load the data and fit it again to give you the values
of these variables.
Thanks for your prompt response!!