Comments (7)
I do not know what's causing the maximum recursion error. However, the Fit object contains all your data*, so it's huge. Maybe that's the origin of the problem. Either way, you probably don't want to actually store all your data; you already have that stored somewhere! What you probably want are the following values:
fit.xmin
fit.power_law.alpha
fit.power_law.sigma
fit.distribution_compare('power_law', 'exponential')
Those are some of the essentials, and they're only a few numbers. You may also want things like:
fit.distribution_compare('power_law', 'truncated_power_law')
fit.distribution_compare('power_law', 'lognormal')
fit.exponential.Lambda
fit.xmax (if you have one)
and so on. Taken together, all of these values are very small, so storing them in a pickle or whatnot should be easy.
There are other values that you might also want to store, which are nearly the same size as your data:
fit.Ds
fit.alphas
fit.sigmas
These are the values of D, alpha, and sigma for the power law fit at every possible value of xmin. So they're as long as the number of unique values in your dataset, which is probably huge! You probably don't want to store these, but you might if you have a tricky situation like that described in the "Multiple Possible Fits" section of the PLoS ONE paper describing powerlaw. If you're not in that situation, don't save these.
So, what describes your use case? Which features do you actually need to save? The simplest route is to identify the features you need, put them in a dictionary and pickle that. But maybe you need a less simple route.
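The dictionary route could look roughly like the sketch below. `summarize_fit` and `save_summary` are hypothetical helper names, not part of powerlaw. The sketch assumes the fitted distribution is exposed as `fit.power_law` (the same key used in `distribution_compare`) and that `distribution_compare` returns an `(R, p)` pair.

```python
import pickle


def summarize_fit(fit, alternatives=("exponential", "lognormal", "truncated_power_law")):
    """Collect the handful of numbers that describe a power-law fit.

    `fit` is assumed to behave like a powerlaw.Fit: it exposes `xmin`,
    `power_law.alpha`, `power_law.sigma` and `distribution_compare`.
    """
    summary = {
        "xmin": float(fit.xmin),
        "alpha": float(fit.power_law.alpha),
        "sigma": float(fit.power_law.sigma),
        "comparisons": {},
    }
    for other in alternatives:
        # Assumed return value: (loglikelihood ratio R, p-value).
        R, p = fit.distribution_compare("power_law", other)
        summary["comparisons"][other] = {"R": float(R), "p": float(p)}
    return summary


def save_summary(summary, path):
    """Pickle the plain-Python summary; it holds no reference to the Fit or the data."""
    with open(path, "wb") as handle:
        pickle.dump(summary, handle)
```

With a real fit this would be called as `save_summary(summarize_fit(fit), 'fit_summary.pck')`. Because the dictionary holds only floats and strings, it should pickle without touching the Fit object at all.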
*It actually contains two copies of it: all of the original data, and then a copy of the data with just the points within xmin and xmax. To better handle very large datasets such as yours, perhaps we should come up with a system that doesn't store the data twice. Just have one copy of the data, and two views on it. Actually, this might be how numpy is handling the data, anyway. I'd have to check.
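Whether NumPy hands back a view or a copy depends on how the subset is selected, and that can be checked directly. The sketch below is generic NumPy and makes no claim about how powerlaw trims its data internally; the array values and variable names are made up for illustration.

```python
import numpy as np

data = np.sort(np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]))
xmin, xmax = 2.0, 13.0

# Boolean-mask indexing allocates a new array (a second copy of the points).
masked = data[(data >= xmin) & (data <= xmax)]

# On sorted data the same range is a contiguous slice, and basic slicing returns a view.
start = np.searchsorted(data, xmin, side="left")
stop = np.searchsorted(data, xmax, side="right")
window = data[start:stop]

copies_data = not np.shares_memory(data, masked)  # mask selection does not share memory
is_view = np.shares_memory(data, window)          # slice shares memory with `data`
```

So a "one copy, two views" design would likely need the data sorted once and the xmin/xmax window expressed as a slice rather than a mask.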
from powerlaw.
Thanks, Jeff!
I've tested with a small object as well and got the same issue.
data_fit = pl.Fit(data['min_rank'][:1000])
pickle.dump(data_fit,open('test.pck','wb'))
pickle.load(open('test.pck','rb'))
RuntimeError                              Traceback (most recent call last)
<ipython-input-140-a0aef4fe1530> in <module>()
----> 1 pickle.load(open('test.pck','rb'))
...
    135
    136     def __getattr__(self, name):
--> 137         if name in self.supported_distributions.keys():
    138             #from string import capwords
    139             #dist = capwords(name, '_')
RuntimeError: maximum recursion depth exceeded while calling a Python object
It might be related to some circular reference, like the one in fit.power_law.parent_Fit, I don't know.
This pickle issue seems to be similar to what happens with BeautifulSoup objects.
Anyway, I was trying to store it because it took several hours to run, I was afraid of a Python kernel crash, and I might want to reproduce the results later (or on another machine). I'm gonna store the necessary information to recreate the Fit and Distribution objects!!
Thank you, again!
Huh! Thanks for pointing this out; I've replicated the issue. For now I am not going to alter anything, since nobody else has brought up the idea of pickling Fit objects. However, I am going to leave this issue open in case someone wants to try to fix it.
It might be nice to have some standard procedure for grabbing all the "important" properties of a Fit and storing them in some way. That is more of a UX question than anything.
"I'm gonna store the necessary information to recreate the Fit and Distribution objects!!"
On that note: The data points I suggested are not going to allow you to easily recreate a Fit or Distribution object. But if you know even just the optimal xmin, then you can create a new Fit with a manually defined xmin (fit = powerlaw.Fit(data, xmin=my_optimal_xmin_that_I_stored)). That Fit will be very easy to calculate, and from there you can calculate loglikelihood ratios fairly easily. What that Fit will NOT have is things like Ds or alphas, which are desired in some edge cases but almost certainly not useful to you.
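Recreating a cheap Fit from a stored xmin might look like the following sketch. `rebuild_fit` is a hypothetical helper; its `fit_cls` parameter exists only so the sketch can be exercised without powerlaw installed, and by default it falls back to the `powerlaw.Fit(data, xmin=...)` call quoted in the comment.

```python
def rebuild_fit(data, stored_xmin, fit_cls=None):
    """Re-create a fit with a fixed xmin instead of searching for the optimal one."""
    if fit_cls is None:
        # Imported lazily so the helper itself has no hard dependency on powerlaw.
        import powerlaw
        fit_cls = powerlaw.Fit
    # With xmin supplied up front, the fit should be quick to compute.
    return fit_cls(data, xmin=stored_xmin)
```

A call such as `fit = rebuild_fit(data, summary['xmin'])` would then give an object on which `distribution_compare` can be rerun, assuming `summary` is whatever small dictionary was stored earlier.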
Hi guys,
Thanks for the useful thread. I've been searching around most of the afternoon for a solution to this. It's good to know it's not just a problem with my code anyway. I'd be interested to hear of any developments with this. Thanks for the great module!
Cheers
Hi @rnpgeo,
No advances so far. If you want to take a stab at it, you're very welcome to try. If you get a solution we can merge your update into powerlaw.
Guys,
I think I've got a solution for this issue and it's much simpler than I thought it would be.
I haven't tested it thoroughly for all PL classes but at least for the Fit objects, it works.
The solution is simply calling the base implementation of the __getattr__ method for magic names in the Fit class, changing it (line 144) from
    def __getattr__(self, name):
        if name in self.supported_distributions.keys():
            ...
to
    def __getattr__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(Fit, self).__getattr__(name)
        if name in self.supported_distributions.keys():
            ...
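The mechanism can be reproduced without powerlaw. The toy classes below are hypothetical stand-ins, assuming (as the traceback above suggests) that `__getattr__` consults `self.supported_distributions` and that this attribute is assigned in `__init__`. While unpickling, pickle looks up `__setstate__` on an instance whose `__dict__` is still empty, so the lookup of `supported_distributions` inside `__getattr__` calls `__getattr__` again, without end. Note that `object` defines no `__getattr__`, so the `super(...)` call in the patch ends in an AttributeError anyway; the sketch raises it explicitly.

```python
import pickle


class NaiveFit:
    """Toy stand-in: __getattr__ reads an instance attribute unconditionally."""

    def __init__(self):
        self.supported_distributions = {"power_law": "PL", "exponential": "EXP"}

    def __getattr__(self, name):
        # Called only when normal lookup fails. On a half-built instance,
        # self.supported_distributions is missing too, so this re-enters itself.
        if name in self.supported_distributions.keys():
            return self.supported_distributions[name]
        raise AttributeError(name)


class GuardedFit:
    """Same toy class with the dunder guard from the patch."""

    def __init__(self):
        self.supported_distributions = {"power_law": "PL", "exponential": "EXP"}

    def __getattr__(self, name):
        if name.startswith("__") and name.endswith("__"):
            # Explicit equivalent of delegating to a base __getattr__ that does not exist.
            raise AttributeError(name)
        if name in self.supported_distributions.keys():
            return self.supported_distributions[name]
        raise AttributeError(name)


def survives_pickle(obj):
    """Return True if obj round-trips through pickle, False on runaway recursion."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except RecursionError:
        return False
```

In this toy setting `survives_pickle(NaiveFit())` should come back False and `survives_pickle(GuardedFit())` True, which matches the behavior reported in the thread.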
Excellent! I can see how what's currently written would create undesired behaviors.
Question: Currently the __getattr__ is in a kind of stupid if/else statement, where the else raises an AttributeError (in line 166). Would your correction be better placed by integrating with that if/else? For example:
    def __getattr__(self, name):
        if name in self.supported_distributions.keys():
            ...
        else:
            return super(Fit, self).__getattr__(name)
Would that accomplish the goal of fixing the pickling problem while also being more robust and elegant? If you test this out and determine that everything works (including the Distribution objects), then I can either make the edit to the code or you can submit a pull request and I'll merge it in.
Thanks!
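One way to probe this question without touching the library is a toy class shaped like the else-branch variant above. The class below is a hypothetical stand-in, assuming `supported_distributions` is an instance attribute assigned in `__init__`. Under that assumption the membership test still runs first on a half-built instance during unpickling, so the toy version keeps recursing. Whether the real Fit behaves the same would need to be tested against powerlaw itself.

```python
import pickle


class ElseBranchFit:
    """Toy stand-in for the if/else variant; not the real powerlaw.Fit."""

    def __init__(self):
        self.supported_distributions = {"power_law": "PL"}

    def __getattr__(self, name):
        # This membership test needs self.supported_distributions, which is
        # absent while pickle is still rebuilding the instance.
        if name in self.supported_distributions.keys():
            return self.supported_distributions[name]
        else:
            # object has no __getattr__, so this line raises AttributeError.
            return super().__getattr__(name)


blob = pickle.dumps(ElseBranchFit())  # dumping works: the instance is fully built
try:
    pickle.loads(blob)
    else_variant_loads = True
except RecursionError:
    else_variant_loads = False
```

In the toy case `else_variant_loads` ends up False, which suggests the check for magic names has to come before any access to `self.supported_distributions`.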