Comments (25)

jeffalstott avatar jeffalstott commented on July 3, 2024 1

@rhjohnstone Thanks for the edits; it saved me typing! :)

To your question: xmin is chosen as the value that minimizes the Kolmogorov-Smirnov (KS) distance between the empirical distribution of the data at or above xmin and the best-fit power law for that xmin. Take a look at the papers by Clauset et al. or Alstott et al. for more info.
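
As a rough illustration of the KS-based selection described above, here is a minimal sketch for continuous data only. It is not the code used by powerlaw or plfit.py; the function names (`mle_alpha`, `ks_distance`, `scan_xmin`) and the `min_tail` cutoff are assumptions made for this example, and the synthetic sample is generated purely for demonstration.

```python
import math
import random


def mle_alpha(tail, xmin):
    """Continuous-case MLE for the exponent: alpha = 1 + n / sum(ln(x / xmin))."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x, so check both sides.
        worst = max(worst, abs((i + 1) / n - model), abs(model - i / n))
    return worst


def scan_xmin(data, min_tail=50):
    """Try each distinct value as xmin; keep the one with the smallest KS distance.

    min_tail is a hypothetical safeguard against fitting very short tails.
    """
    data = sorted(data)
    best = None
    for start, xmin in enumerate(data):
        if start > 0 and data[start - 1] == xmin:
            continue  # this candidate was already tried
        tail = data[start:]
        if len(tail) < min_tail or tail[-1] == xmin:
            break  # too few points, or no spread left to fit
        alpha = mle_alpha(tail, xmin)
        dist = ks_distance(tail, xmin, alpha)
        if best is None or dist < best[2]:
            best = (xmin, alpha, dist)
    return best


# Synthetic continuous power-law sample (alpha = 2.5, xmin = 1) via inverse-CDF sampling.
rng = random.Random(0)
true_alpha = 2.5
sample = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(1000)]

xmin_hat, alpha_hat, ks = scan_xmin(sample)
print(xmin_hat, alpha_hat, ks)
```

Because every candidate xmin gets its own alpha, the chosen xmin can land well above the point where a log-log plot starts to look straight by eye.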

The larger issue is that the data is almost certainly not a power law, and if it is, it doesn't have the interesting properties we look for in a power law. That data shows p(x) dropping ~6 orders of magnitude (OOM) over ~1.5 OOM of x. That would be an alpha of ~4 (6/1.5), which is very steep. The interesting properties of power laws appear at smaller (shallower) alphas: the mean is undefined for alpha <= 2 and the standard deviation is undefined for alpha <= 3. Also, the data spans only 1.5 OOM, so there's likely not enough range to differentiate it from an exponential. If you use 'distribution_compare' you'll probably find the power law fit is less likely than the exponential.
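
To make the power-law versus exponential comparison concrete, here is a minimal sketch of a log-likelihood ratio for continuous data above a fixed xmin. It only illustrates the idea behind such a comparison: it is not the implementation of `distribution_compare`, it omits the p-value, and the function names and synthetic samples are assumptions made for this example.

```python
import math
import random


def loglik_power_law(tail, xmin):
    """Log-likelihood of the tail under a continuous power law at its MLE alpha."""
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    return n * math.log((alpha - 1.0) / xmin) - alpha * log_sum


def loglik_exponential(tail, xmin):
    """Log-likelihood of the tail under an exponential shifted to start at xmin."""
    n = len(tail)
    excess = sum(x - xmin for x in tail)
    rate = n / excess  # MLE of the exponential rate
    return n * math.log(rate) - rate * excess


def log_likelihood_ratio(tail, xmin):
    """Positive values favour the power law, negative values the exponential."""
    return loglik_power_law(tail, xmin) - loglik_exponential(tail, xmin)


rng = random.Random(1)
xmin = 1.0

# Synthetic exponential tail: the ratio should come out negative.
exp_sample = [xmin + rng.expovariate(1.0) for _ in range(2000)]
ratio_exp = log_likelihood_ratio(exp_sample, xmin)

# Synthetic power-law tail (alpha = 2.5): the ratio should come out positive.
pl_sample = [xmin * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]
ratio_pl = log_likelihood_ratio(pl_sample, xmin)

print(ratio_exp, ratio_pl)
```

The sign of the ratio says which model fits better; whether the difference is statistically meaningful needs the accompanying p-value, which this sketch does not compute.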

from powerlaw.

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

The Counter is just for the demo; usually I pass a flat array of integers from 1 to 18. I also tried your suggestion with xmin, but I'm getting alpha=4.87.
[image: unknown-2]

Could it be because I'm using Python 3.6? I had to update the code provided by Clauset to work in my environment.

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

I downloaded the Manuscript_code.ipynb file and reran all cells; the output is the same as yours.

Is the algorithm in powerlaw different from http://tuvalu.santafe.edu/~aaronc/powerlaws/plfit.py?

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

Hmm, Jupyter displays 1.4.3 while pip says that version 1.4.4 is installed. Am I using the older version? It seems the problem is with conda...

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

After some refactoring: the version string in powerlaw.py was not updated to 1.4.4, which is why I'm seeing 1.4.3 :p Now I'm sure it's the latest version; floats make no difference.

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

I've created a pdf with some simple tests.

tests.pdf

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

Tried it; the output is the same.
Edit: not the same, but similar.

polakowo avatar polakowo commented on July 3, 2024

I'll go grab some sleep; it's 4 am here :) Please keep me updated on your progress.

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

powerlaw.Fit([500,150,90,81,75,75,70,65,60,58,49,47,40]).alpha
3.1151851915645903

powerlaw.Fit([500,150,90,81,75,75,70,65,60,58,49,47,40], discrete=True).alpha
3.0771455810069

plfit.plfit([500,150,90,81,75,75,70,65,60,58,49,47,40])[0]
2.71

polakowo avatar polakowo commented on July 3, 2024

xmin = 58.0, xmin = 58.0, and xmin = 47, respectively
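
One way to read these numbers is that much of the gap in alpha may come from the different xmin rather than from the estimator itself. The sketch below evaluates the closed-form continuous MLE, and the xmin - 0.5 approximation for discrete data given in Clauset et al., at both reported xmin values. The helper names are made up for this example, and plfit.py's own discrete fit may use a different procedure, so treat the agreement as suggestive only.

```python
import math

data = [500, 150, 90, 81, 75, 75, 70, 65, 60, 58, 49, 47, 40]


def continuous_alpha(values, xmin):
    """Continuous MLE using only values >= xmin: 1 + n / sum(ln(x / xmin))."""
    tail = [x for x in values if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def approx_discrete_alpha(values, xmin):
    """Approximate discrete MLE: the same formula with xmin replaced by xmin - 0.5."""
    tail = [x for x in values if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


alpha_58 = continuous_alpha(data, 58)                # about 3.115, as reported for powerlaw.Fit
alpha_47 = continuous_alpha(data, 47)                # about 2.75
alpha_47_discrete = approx_discrete_alpha(data, 47)  # about 2.71

print(alpha_58, alpha_47, alpha_47_discrete)
```

With only 10 to 12 points in the tail, moving xmin from 58 to 47 shifts alpha by roughly 0.4, so the two tools can disagree on alpha mostly because they settle on different xmin values.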

polakowo avatar polakowo commented on July 3, 2024

[images: unknown-3, unknown-4, unknown-5]

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

My data has a length of ~300.

plfit.py gives xmin=1, alpha=2.59
unknown-6

powerlaw.py with discrete=True gives xmin=2, alpha=2.38
unknown-7

polakowo avatar polakowo commented on July 3, 2024

The same applies to data that is 10x larger.

jeffalstott avatar jeffalstott commented on July 3, 2024

polakowo avatar polakowo commented on July 3, 2024

For floats, the results are the same. With discrete=True and integer data, powerlaw.py produces a slightly lower alpha but the same xmin.

If I multiply each element in the dataset by 100, the results are the same.
If I add 100 to each element, the results vary greatly.

There seems to be some issue with the scale.
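
For what it's worth, the continuous maximum-likelihood estimate depends on the data only through the ratios x / xmin, so multiplying every value by a constant leaves alpha unchanged while adding a constant does not. The sketch below shows this on the small sample posted earlier in the thread. It fixes xmin at the smallest observation, which is an assumption made for this example, whereas powerlaw and plfit.py search over xmin.

```python
import math


def continuous_alpha(values):
    """Continuous MLE with xmin fixed at the smallest observation (an assumption)."""
    xmin = min(values)
    return 1.0 + len(values) / sum(math.log(x / xmin) for x in values)


data = [500, 150, 90, 81, 75, 75, 70, 65, 60, 58, 49, 47, 40]

original = continuous_alpha(data)
scaled = continuous_alpha([100 * x for x in data])   # every ratio x / xmin is unchanged
shifted = continuous_alpha([x + 100 for x in data])  # every ratio moves closer to 1

print(original, scaled, shifted)
```

The shifted data gives a noticeably larger alpha because adding a constant compresses the ratios, which is what the formula predicts for any sample.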

jeffalstott avatar jeffalstott commented on July 3, 2024

rhjohnstone avatar rhjohnstone commented on July 3, 2024

I guess I'm having similar problems to @polakowo. I get a very poor fit just using the most basic example in the readme (the result is very similar for both discrete=True and discrete=False, and whether the data is stored as int or float):

import numpy as np
import matplotlib.pyplot as plt
import powerlaw

# Number of occurrences of each value, starting from 1
counts = np.array("534602 192300 61903 25675 12380 6472 3727 2082 1441 908 650"
                " 488 341 239 182 123 87 76 46 48 43 27 24 16 11 13 14 7 8 9"
                " 7 7 4 3 3 1 0 2 0 1 1 0 0 0 0 1".split()).astype(int)
# Expand the counts into a flat array: value i+1 repeated counts[i] times
data = np.repeat(np.arange(1, len(counts) + 1), counts)

# Empirical PDF of the full data set
powerlaw.plot_pdf(data, label="Data as PDF")

fit = powerlaw.Fit(data, discrete=True)
fit.plot_pdf(label="Fitted PDF")
plt.legend(loc=3, fontsize=14);

which outputs
image

Am I just plotting the results incorrectly, or is something deeper going on?

Edit: I see now that the "fitted" plot is just using data points >= fit.power_law.xmin, so it's just plotting the data but re-normalized? In which case, is there a simple function to just plot the fitted power curve?

Edit 2: I should be using fit.power_law.plot_pdf(label="Fitted PDF"), and not fit.plot_pdf(label="Fitted PDF"). This is the updated version:

fit = powerlaw.Fit(data, discrete=True)
powerlaw.plot_pdf(data[data>=fit.power_law.xmin], label="Data as PDF")
print(fit.power_law.alpha)
print(fit.power_law.xmin)
R, p = fit.distribution_compare('power_law', 'lognormal')
fit.power_law.plot_pdf(label="Fitted PDF", ls=":")
plt.legend(loc=3, fontsize=14);

which outputs
image
which I guess is OK, but I'm surprised at how large fit.power_law.xmin is. Just inspecting the original data plot, the line appears straight enough down to 2 or 3. Is there an obvious reason (that I can fix) why it's choosing such a high value?
