Comments (4)
This one just bit me :)
For reference, please see my question and response in this thread: http://stats.stackexchange.com/questions/36064/calculating-r-squared-coefficient-of-determination-with-centered-vs-un-center
The common wisdom seems to be that the correct definition of R^2 is the one you mentioned, using the centered total sum of squares. This is also how R does it. That isn't a reason (in and of itself) to change things, but the community does seem to have a certain expectation in this regard, at least.
I would say this is an issue that deserves some bandwidth. Linear regression is a pretty basic thing for people to want to do with a statistical package...
from statsmodels.
Thanks for the links. Will have a look and sort this out.
A quick comment. Linear regression is indeed a pretty basic thing, and we provide an OLS class that is fully correct. But as soon as you fit the model without an intercept, you're not doing ordinary OLS AFAIK. Without an intercept, you're fitting a regression-through-origin (RTO) model, which is a strong substantive claim. The reason I've kicked this down the road so far is that it's still not really clear to me what a "correct" R^2 is in this case. The existence of the R^2 measure depends on the model being fit with an intercept. This (and other) R_0^2 measures [1] may be (somewhat) analogous to R^2 in that they're forced to lie in (0, 1), like pseudo-R^2 measures for non-linear models, but R_0^2 is not the same as R^2. I.e., you can't compare R^2 and R_0^2 values. For the most part, I've seen it recommended not to rely on R^2 for the RTO model, since it can be wildly overinflated (the uncentered total sum of squares is larger), and in the models I deal with it's almost never the case that you want to force the predicted value to be zero when the regressors are zero. I always assumed this is why it's almost never discussed in textbooks.
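To make the overinflation concrete, here is a small illustrative sketch (plain NumPy, not statsmodels code): the data truly have an intercept, we fit through the origin anyway, and the uncentered R^2 looks excellent while the centered one reveals a poor fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 20, size=50)            # regressor far from zero
y = 5 + 0.1 * x + rng.normal(0, 0.5, 50)    # true model HAS an intercept

# Least-squares slope for regression through the origin: b = sum(xy) / sum(x^2)
b = (x @ y) / (x @ x)
resid = y - b * x
ssr = resid @ resid

tss_centered = ((y - y.mean()) ** 2).sum()  # usual (with-intercept) definition
tss_uncentered = (y ** 2).sum()             # definition used without a constant

r2_centered = 1 - ssr / tss_centered        # can even go negative here
r2_uncentered = 1 - ssr / tss_uncentered    # close to 1, deceptively good
```

Because sum(y^2) exceeds sum((y - mean(y))^2) whenever mean(y) != 0, the uncentered R^2 is always the larger of the two for the same fit.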
In any event, I'll sit down with this before 0.5 and see if I can sort out the theory and the implications for the other inferential statistics. I definitely do not want to silently use a different definition for the no-constant model, as R does. For the record, SAS includes a big warning that R^2 is redefined, and it also uses the uncentered TSS.
See also #60.
I've been thinking about your comments a bit. I'd like to offer some feedback, but I want to be clear that I'm sensitive to the fact that you (or y'all) are the one(s) actually doing the work :)
In my mind, R^2 is a property of two data sets, not of the ordinary least squares algorithm for dealing with residuals. One must choose a model first, then use OLS (or GLS, or NNLS, or...) to estimate the regression coefficients. Whether one includes an intercept amounts to a modeling choice: either I believe the data is best represented by a model with an intercept, or I believe it is best represented by a model without one. The algorithm doesn't care whether you include a constant or not: it is perfectly happy to deal with the extra 1's.
The real reason for the change in definition (as I've come to understand it) is a consequence of a null hypothesis (our old friend). In either instance, the null hypothesis is "no relationship exists", which means "set the slopes to 0". With an intercept, that null model predicts the mean of y, which yields the centered total sum of squares; without one, it predicts zero, which yields the uncentered version. I hate to reference myself here, but I've seen no other concise explanation of the subject online (see link above). Couched in this way, it makes sense to me that there is a "right" definition of R^2 in the case without an intercept (however bad a statistic).
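A quick sketch of that framing (illustrative NumPy, not library code): R^2 compares the fitted model's residual sum of squares against the null model's. For a no-intercept model the null prediction is zero, so the null SSR is exactly the uncentered TSS.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 1.5 * x + rng.normal(scale=0.3, size=40)  # true model through the origin

# Fitted RTO model vs. its null model (slope set to 0, so yhat = 0)
b = (x @ y) / (x @ x)
ssr_fit = ((y - b * x) ** 2).sum()
ssr_null = (y ** 2).sum()       # residuals of the null model are y itself

r2 = 1 - ssr_fit / ssr_null     # identical to 1 - SSR / uncentered TSS
```

With an intercept the null model would instead predict mean(y), and the same formula recovers the usual centered definition.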
Given this, it's not clear (to me, at least) why silently doing the right thing is a bad idea. The definition of R^2 implemented in 0.4 is only correct for a model with an intercept; we both agree on that. The same definition cannot be applied to a model without an intercept; we both agree on that too. Given that you provide a function called "add_constant", I would (naively) expect the OLS class to figure out the relevant calculations and return the correct version of R^2 when queried.
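The behavior being suggested could be sketched like this (hypothetical helper, not the statsmodels API; detecting a constant column is simplified to "any column with no variation"):

```python
import numpy as np

def r_squared(y, X, params):
    """Pick the R^2 definition based on whether the design matrix X
    contains a constant column (a simplification: a zero column would
    also match, and a constant hidden in a linear combination would not).
    """
    resid = y - X @ params
    ssr = resid @ resid
    has_const = np.any(np.ptp(X, axis=0) == 0)   # any column with no variation?
    if has_const:
        tss = ((y - y.mean()) ** 2).sum()        # centered (usual) definition
    else:
        tss = (y ** 2).sum()                     # uncentered definition
    return 1 - ssr / tss
```

The caller never has to say which definition applies; the design matrix itself decides.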
Anyway, I definitely want to say thanks for a great piece of software!
Closing this as a duplicate of #423 so there's less to keep track of. I fixed the RTO linear case as per your suggestions in this branch:
https://github.com/jseabold/statsmodels/tree/handle-constant
but I still need to look at how this will affect the rest of the code base.