mod1-section10-linreg-lesson's Introduction

Questions

Objectives

YWBAT (you will be able to)

  • define linear regression
  • create an example of linear regression
  • describe what the various parts of the linreg equation do
  • calculate the error of a linear regression equation

What is linear regression?

When do we use it?

Let's make an example with some data!

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.xkcd()
x_vals = np.linspace(0, 100, 50)
slope = np.random.randint(20, 50)
errors = np.random.normal(100, 200, 50)
bias = np.random.randint(20, 200)
y_vals = slope*x_vals + bias + errors
plt.scatter(x_vals, y_vals)
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("data points")
plt.show()

[Figure: scatter plot of the data points, x values vs. y values]
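
In equation form, the data above follow a straight line plus noise. A sketch of the notation, where the symbols m, b and ε are labels chosen here for the code's `slope`, `bias` and `errors`, and the hats mark values estimated from the data:

```latex
y_i = m\,x_i + b + \varepsilon_i,
\qquad
\hat{y}_i = \hat{m}\,x_i + \hat{b}
```

Here m is the slope (the change in y per unit change in x), b is the bias or intercept (the value of y at x = 0), and ε_i is the error on point i. Note that the code draws the errors with `np.random.normal(100, 200, 50)`, that is, mean 100 and standard deviation 200, so on average the points sit 100 above the line slope*x + bias.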

Let's check the correlation coefficient

np.corrcoef(x_vals, y_vals)
array([[1.        , 0.98366979],
       [0.98366979, 1.        ]])
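
The off-diagonal entry is the Pearson correlation between x and y. As a rough sketch of what that number is, it can be computed by hand as the covariance scaled by both spreads. The helper `pearson_r` and the seeded demo data below are hypothetical, not part of the lesson:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by both spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return np.sum(x_centered * y_centered) / np.sqrt(
        np.sum(x_centered ** 2) * np.sum(y_centered ** 2)
    )


# Hypothetical seeded data in the same shape as the lesson's example
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 100, 50)
y_demo = 30 * x_demo + 50 + rng.normal(100, 200, 50)

r_by_hand = pearson_r(x_demo, y_demo)
r_numpy = np.corrcoef(x_demo, y_demo)[0, 1]
print(r_by_hand, r_numpy)  # the two values should agree
```
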
### let's just guess a slope
slope_guess = 21
bias_guess = 0
### this yields a y_hat array of
y_hat = slope_guess*x_vals + bias_guess
plt.scatter(x_vals, y_vals)
plt.plot(x_vals, y_hat, c='r', label='y-hat')
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("data points")
plt.legend()
plt.show()

[Figure: the data points with the guessed line y-hat drawn in red]

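R-squared equation

### Yikes! How bad is this?

### Let's score the guess. Note: both functions below compute R-squared
### (the coefficient of determination), not RMSE

def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    num = np.sum((y_true - y_pred)**2)
    den = np.sum((y_true - y_true.mean())**2)
    return 1 - num / den


def r_squared_explained(y_true, y_pred):
    # explained sum of squares / total sum of squares
    num = np.sum((y_pred - y_true.mean())**2)
    den = np.sum((y_true - y_true.mean())**2)
    return num / den
r_squared(y_vals, y_hat)
0.349614137436005
r_squared_explained(y_vals, y_hat)
0.846406783242641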
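
The two functions above return a unitless ratio of sums of squares. For comparison, here is a minimal sketch of the root mean squared error itself, which is expressed in the units of y. The name `rmse` is hypothetical, not from the lesson:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


# Hand-checkable case: residuals are -3 and 4, so MSE = (9 + 16) / 2 = 12.5
demo_error = rmse([0, 0], [3, -4])
print(demo_error)  # square root of 12.5
```

A perfect prediction gives 0, and larger values mean the line sits further from the points on average.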

Which definition does scikit-learn use? Let's import r2_score from sklearn.metrics

from sklearn.metrics import r2_score
r2_score(y_vals, y_hat)
0.349614137436005
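
The match with the first function is expected: `r2_score` is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The second form (explained over total) agrees with it only when the prediction comes from a least-squares fit with an intercept. Here is a sketch on hypothetical seeded data; the function names are illustrative, and `np.polyfit` stands in for the fit:

```python
import numpy as np


def r2_from_residuals(y_true, y_pred):
    """1 - SS_res / SS_tot (the definition r2_score uses)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


def r2_from_explained(y_true, y_pred):
    """SS_explained / SS_tot (only valid for a least-squares fit with intercept)."""
    ss_exp = np.sum((y_pred - y_true.mean()) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return ss_exp / ss_tot


# Hypothetical seeded data in the same shape as the lesson's example
rng = np.random.default_rng(1)
x_demo = np.linspace(0, 100, 50)
y_demo = 30 * x_demo + 50 + rng.normal(100, 200, 50)

# An arbitrary guessed line: the two ratios disagree
y_guess = 21 * x_demo
print(r2_from_residuals(y_demo, y_guess), r2_from_explained(y_demo, y_guess))

# A least-squares line with an intercept: the two ratios agree
slope_fit, intercept_fit = np.polyfit(x_demo, y_demo, deg=1)
y_fit = slope_fit * x_demo + intercept_fit
print(r2_from_residuals(y_demo, y_fit), r2_from_explained(y_demo, y_fit))
```
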

Now, how can we do this using statsmodels?

import statsmodels.api as sm
x = sm.add_constant(x_vals)
linreg = sm.OLS(y_vals, x).fit()
summary = linreg.summary()
summary
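                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.968
Model:                            OLS   Adj. R-squared:                  0.967
Method:                 Least Squares   F-statistic:                     1434.
Date:                Tue, 07 May 2019   Prob (F-statistic):           2.07e-37
Time:                        21:19:13   Log-Likelihood:                -331.45
No. Observations:                  50   AIC:                             666.9
Df Residuals:                      48   BIC:                             670.7
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         53.5332     52.073      1.028      0.309     -51.167     158.233
x1            33.9789      0.897     37.865      0.000      32.175      35.783
==============================================================================
Omnibus:                        2.110   Durbin-Watson:                   1.886
Prob(Omnibus):                  0.348   Jarque-Bera (JB):                1.445
Skew:                           0.172   Prob(JB):                        0.486
Kurtosis:                       2.242   Cond. No.                         114.
==============================================================================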

Let's interpret this!
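
In the summary, `const` is the fitted intercept and `x1` is the fitted slope. For a single predictor these least-squares estimates have a closed form, sketched below with numpy only. The helper `ols_line` is hypothetical, and it returns its values in the same (intercept, slope) order as the rows of the table:

```python
import numpy as np


def ols_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # slope = co-variation of x and y divided by the variation of x
    slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    # the fitted line always passes through the point (mean of x, mean of y)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


# Hypothetical seeded data in the same shape as the lesson's example
rng = np.random.default_rng(2)
x_demo = np.linspace(0, 100, 50)
y_demo = 30 * x_demo + 50 + rng.normal(100, 200, 50)

intercept_hat, slope_hat = ols_line(x_demo, y_demo)
print(intercept_hat, slope_hat)
```
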


# how close were we?
slope, bias
(32, 51)
# linreg.params is ordered like the summary table: const (intercept) first, then x1 (slope)
ols_bias, ols_slope = linreg.params
ols_y_hat = ols_slope*x_vals + ols_bias
ols_slope, ols_bias
(33.97888956550166, 53.53323559323411)
plt.scatter(x_vals, y_vals)
plt.plot(x_vals, ols_y_hat, c='r', label='ols y-hat')
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("data points")
plt.legend()
plt.show()

[Figure: the data points with the OLS line drawn in red]

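# the line built from the generating slope and bias
# (the errors were drawn with mean 100, so the points sit about 100 above it on average)
best_y_hat = slope*x_vals + bias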
plt.scatter(x_vals, y_vals)
plt.plot(x_vals, best_y_hat, c='r', label='best y-hat')
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("data points")
plt.legend()
plt.show()

[Figure: the data points with the generating line ("best y-hat") drawn in red]

What did we learn?