Comments (6)
[ LP comment 1 by: joep, on 2010-10-13 17:48:56.032290+00:00 ]
See also the new thread of Oct 13, "Logit predict".
>>> logit_res.mle_retvals['converged']
True
We could check the return value of the optimization at the end of fit() and do further inspection if converged is not true.
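A minimal sketch of that idea, using a generic scipy optimizer rather than the actual fit() internals (the helper name `fit_and_check` and the toy objective are illustrative, not statsmodels API):

```python
import numpy as np
from scipy import optimize

def fit_and_check(neg_loglike, start_params):
    # fmin's warnflag is 0 only when the optimizer terminated successfully;
    # a fit() method could inspect this flag and warn before returning results.
    xopt, fopt, niters, funcalls, warnflag = optimize.fmin(
        neg_loglike, start_params, full_output=True, disp=False)
    converged = (warnflag == 0)
    if not converged:
        print("Warning: optimization did not converge; inspect the results")
    return xopt, converged

# Toy objective: a smooth quadratic with its minimum at 2.0, so the
# convergence check passes cleanly.
params, ok = fit_and_check(lambda p: (p[0] - 2.0) ** 2, np.array([0.0]))
```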
[ LP comment 2 by: Skipper Seabold, on 2010-12-15 00:18:27.728435+00:00 ]
Code to replicate:
import scikits.statsmodels as sm
import scikits.statsmodels.discretemod as dm
import numpy as np
Endog = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0])
Exog = np.array([[ 10. , -2.7, 12.3, 1. ],
[ -2.7, 8.1, -5.7, 1. ],
[ 0.6, -5.8, -7.7, 1. ],
[ 5.5, 0.6, 2. , 1. ],
[ -2.3, 10.6, -3.7, 1. ],
[ 0.3, -2.3, 0.1, 1. ],
[ -0.8, 0.3, -1.3, 1. ],
[ -4. , 1.3, -1.4, 1. ],
[ 9.4, -4. , 6.9, 1. ]])
GLM_Model = sm.GLM(Endog, Exog, family = sm.families.Binomial())
GLM_results = GLM_Model.fit()
print GLM_results.params
Logit_Model = dm.Logit(Endog, Exog)
Logit_results = Logit_Model.fit()
print Logit_results.params
A possible solution is something like (not sure what the correct default tolerance should be):
from scipy import optimize

def callback(params):
    if np.allclose(Logit_Model.cdf(np.dot(Logit_Model.exog, params))
                   - Logit_Model.endog, 0, atol=1e-4):
        raise ValueError("Perfect or Quasi-Perfect separation detected")

func = lambda params: -Logit_Model.loglike(params)

In [93]: ret = optimize.fmin_bfgs(func, np.zeros(4) + 1, callback=callback)
ValueError Traceback (most recent call last)
/home/skipper/ in ()
/usr/local/lib/python2.6/dist-packages/scipy/optimize/optimize.pyc in fmin_bfgs(f, x0, fprime, args, gtol, norm, epsilon, maxiter, full_output, disp, retall, callback)
505 gfk = gfkp1
506 if callback is not None:
--> 507 callback(xk)
508 k += 1
509 gnorm = vecnorm(gfk,ord=norm)
/home/skipper/ in callback(params)
ValueError: Perfect or Quasi-Perfect separation detected
[ LP comment 3 by: Skipper Seabold, on 2010-12-15 15:26:45.168352+00:00 ]
It has been proposed to do something like:
def callback(params):
    if np.allclose(Logit_Model.cdf(np.dot(Logit_Model.exog, params))
                   - Logit_Model.endog, 0, atol=1e-4):
        print "_Perfect or Quasi-Perfect separation detected_"
        print "Results are most likely not useful"
        raise ValueError

func = lambda params: -Logit_Model.loglike(params)

try:
    ret = optimize.fmin_bfgs(func, np.zeros(4) + 1, callback=callback)
except ValueError:
    ret = optimize.fmin_bfgs(func, np.zeros(4) + 1, maxiter=1)
This is OK, but it does not give xopt values that actually demonstrate perfect separation. Perhaps if, in the callback, we attach params to the model and then use these as starting values for the optimization in the except case, this will work.
[ LP comment 4 by: joep, on 2010-12-15 15:43:06.050398+00:00 ]
The callback function needs to hold on to the current state of the optimizer, params. When fitting a model this will be relatively easy, because we can attach it to the model instance:

self.callback_params = params

and restart the second optimization, in the except branch, with start values self.callback_params.
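The proposal above can be sketched generically with scipy's BFGS; the perfectly separated toy data are made up for illustration, and the `state` dict stands in for attaching `callback_params` to the model instance:

```python
import numpy as np
from scipy import optimize
from scipy.special import expit  # numerically stable logistic cdf

# Toy perfectly separated data: y == 1 exactly when x > 0.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = (x > 0).astype(float)
exog = np.column_stack([x, np.ones_like(x)])

def neg_loglike(params):
    p = expit(exog @ params)
    eps = 1e-12  # guard log(0) as fitted probabilities approach 0 or 1
    return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

state = {}  # stands in for the model instance holding self.callback_params

def callback(params):
    state["callback_params"] = params  # hold on to the optimizer's state
    if np.allclose(expit(exog @ params) - y, 0, atol=1e-4):
        raise ValueError("Perfect or Quasi-Perfect separation detected")

try:
    ret = optimize.fmin_bfgs(neg_loglike, np.zeros(2),
                             callback=callback, disp=False)
except ValueError:
    # Restart from the params saved in the callback, limited to one
    # iteration, so the returned xopt reflects the separated fit.
    ret = optimize.fmin_bfgs(neg_loglike, state["callback_params"],
                             maxiter=1, disp=False)
```

Whether the ValueError fires depends on how far BFGS pushes the diverging coefficients before its gradient tolerance stops it, but in either case the callback has recorded usable start values.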
Discussion and an example are also in #66.
The summary method for Logit and Probit adds warning text about complete (quasi-)separation.
More work is in https://github.com/statsmodels/statsmodels/tree/perfect-pred
Committed raising an exception in PR #100.
Added an option to turn off the exception in PR #184.