Comments (4)

BenDundee commented on May 22, 2024

This one just bit me :)

For reference, please see my question and response in this thread: http://stats.stackexchange.com/questions/36064/calculating-r-squared-coefficient-of-determination-with-centered-vs-un-center

The common wisdom seems to be that the correct definition of R^2 is the one you mentioned, using the total sum of squares. Also, this is how R does it. That isn't a reason (in and of itself) to change things, but the community seems to have a certain expectation in this regard, at least.

I would say, this is an issue that deserves some bandwidth. Linear regression is a pretty basic thing for people to want to do with a statistical package...

from statsmodels.

jseabold commented on May 22, 2024

Thanks for the links. Will have a look and sort this out.

A quick comment. Linear regression is indeed a pretty basic thing. We provide an OLS class that is fully correct. As soon as you fit the model without an intercept, you're not doing OLS AFAIK. Without an intercept, you're fitting a regression-through-the-origin (RTO) model (which is a strong substantive claim). The reason I've kicked this down the road so far is that it's still not really clear to me what a "correct" R^2 is in this case. The existence of the R^2 measure depends on the model being fit with an intercept. This (and other) R_0^2 measures [1] may be (somewhat) analogous to R^2 in that they're forced to be in (0,1), like pseudo-R^2 measures for non-linear models, but R_0^2 is not the same as R^2. I.e., you can't compare R^2 and R_0^2 measures. For the most part, I've seen it recommended not to rely on R^2 for the RTO model since it can be wildly overinflated (with the higher uncentered sum of squares), and it's almost never the case in the models I deal with that you want to force the prediction to be zero when the regressors are zero. I always assumed this is why it's almost never discussed in textbooks.
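To make the inflation concrete, here is a minimal sketch in plain Python. It is not statsmodels code; the helper names and the toy data are made up for illustration. It fits a single-regressor model through the origin and computes R^2 with both the centered and the uncentered total sum of squares.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r_squared(y, fitted, centered):
    """1 - SSR / TSS, with TSS taken about the mean (centered) or about zero."""
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    center = sum(y) / len(y) if centered else 0.0
    tss = sum((yi - center) ** 2 for yi in y)
    return 1.0 - ssr / tss


# Hypothetical data whose true intercept is far from zero (roughly y = 9 + x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.2, 10.9, 12.1, 12.8, 14.1]

slope = fit_through_origin(x, y)
fitted = [slope * xi for xi in x]

r2_centered = r_squared(y, fitted, centered=True)
r2_uncentered = r_squared(y, fitted, centered=False)

print("centered R^2:  ", r2_centered)
print("uncentered R^2:", r2_uncentered)
```

On data like this the uncentered version stays high while the centered version goes negative, which illustrates why the two numbers should not be compared with each other.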

In any event, I'll sit down with this before 0.5 and see if I can sort out the theory and the implications for the other inferential statistics. I definitely do not want to silently use a different definition for the no-constant model the way R does. For the record, SAS includes a big warning that R^2 is redefined and also uses the uncentered TSS.

[1] http://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-boosts-r2-in-linear-model#26205

See also #60.

BenDundee commented on May 22, 2024

I've been thinking about your comments a bit. I'd like to offer a bit of feedback, but I want to be clear that I'm sensitive to the fact that you (or y'all) are the one(s) actually doing the work :)

In my mind, R^2 is a property of two data sets, not of the ordinary least squares algorithm for dealing with residuals. One must choose a model first, then use OLS (or GLS, or NNLS or...) to derive the regression constants. Whether one chooses to include an intercept in the model amounts to a modeling choice: either I believe the data is best represented by a model with an intercept, or I believe the data is best represented by a model without an intercept. The algorithm doesn't care whether you include a constant or not: it is perfectly happy to deal with the extra 1's.

The real reason for the change in definition (as I've come to understand it) is the null hypothesis (our old friend). In either instance, the null hypothesis is "no relationship exists", which means "set the slope to 0". I hate to reference myself here, but I've seen no other concise explanation of the subject online (see link above). Couched in this way, it makes sense to me that there is a "right" definition of R^2 in the case without an intercept (however bad a statistic).
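Spelled out as a formula, the argument reads roughly as follows. This is a sketch of the reasoning as stated here, not a quotation from any reference, and the symbol \hat{y}_i^{0} for the null-model prediction is notation introduced only for this illustration. R^2 compares the residuals of the fitted model with the residuals of the null model, and the null model is whatever remains once the slope is set to 0.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \hat{y}_i^{0})^2},
\qquad
\hat{y}_i^{0} =
\begin{cases}
\bar{y} & \text{with an intercept (null model } y_i = \beta_0 + \varepsilon_i\text{)} \\
0       & \text{without an intercept (null model } y_i = \varepsilon_i\text{)}
\end{cases}
```

With an intercept the denominator is the centered total sum of squares; without one it is the uncentered sum \sum_i y_i^2.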

Given this, it's not clear (to me, at least) why silently doing the right thing is a bad idea. The definition of R^2 that is implemented in 0.4 is only correct for a model with an intercept; we both agree on that. The same definition of R^2 cannot be applied to a model without an intercept; we both agree on that too. Given that you provide a function called "add_constant", I would (naively) expect the OLS class to figure out the relevant calculations, and return the correct version of R^2 when queried.
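As a rough illustration of what "figuring it out" could look like, here is a plain-Python sketch. It is an assumption about one possible approach, not how statsmodels is implemented, and `has_constant` and `r_squared_auto` are hypothetical names. It looks for a nonzero constant column in the design matrix and picks the centered or uncentered total sum of squares accordingly.

```python
def has_constant(columns, tol=1e-12):
    """Return True if any design-matrix column is a nonzero constant."""
    for col in columns:
        first = col[0]
        if abs(first) > tol and all(abs(v - first) <= tol for v in col):
            return True
    return False


def r_squared_auto(columns, y, fitted):
    """R^2 using centered TSS if the design has a constant, else uncentered."""
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    center = sum(y) / len(y) if has_constant(columns) else 0.0
    tss = sum((yi - center) ** 2 for yi in y)
    return 1.0 - ssr / tss


# Hypothetical example: one regressor, no constant column.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
fitted = [slope * a for a in x]

print(has_constant([x]))                 # no intercept column present
print(has_constant([[1.0] * 4, x]))      # explicit column of ones
print(r_squared_auto([x], y, fitted))    # uses the uncentered TSS
```

A check like this only catches an explicit constant column. It would miss an implicit constant, such as a full set of dummy variables that sum to one, so a real implementation would need something more careful.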

Anyway, I definitely want to say thanks for a great piece of software!

jseabold commented on May 22, 2024

Closing this as a duplicate of #423 so there's less to keep track of. I fixed the RTO linear case as per your suggestions in this branch:

https://github.com/jseabold/statsmodels/tree/handle-constant

but I need to look at how this will affect the rest of the code base.
