
This project forked from learn-co-students/dsc-2-24-04-interactions-lab-nyc-ds-career-012819


Interactions - Lab

Introduction

In this lab, you'll explore interactions in the Boston Housing data set.

Objectives

You will be able to:

  • Understand what interactions are
  • Understand how to account for interactions in regression
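
As a reminder of what the lab is after, a linear model with two predictors and one interaction can be written as below. The symbols $x_1$, $x_2$, $\beta_j$ and $\varepsilon$ are generic notation chosen for this sketch, not names used elsewhere in the lab.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
$$

In this form, a one-unit change in $x_1$ shifts the prediction by $\beta_1 + \beta_3 x_2$, so the effect of $x_1$ depends on the value of $x_2$. When $\beta_3 = 0$, the model reduces to the usual additive regression.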

Build a baseline model

You'll use a couple of built-in functions, which we imported for you below.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import the Boston data set using load_boston(). Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell below needs an older scikit-learn version. We won't preprocess the data in this lab. If you still want to build a model in the end, you can, but this lab focuses on finding meaningful insights in interactions and on how they can improve $R^2$ values.

regression = LinearRegression()
boston = load_boston()

Create a baseline model that includes all the variables in the Boston housing data set to predict house prices. Then use 10-fold cross-validation and report the mean $R^2$ value as the baseline $R^2$.

## code here
baseline
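
One possible way to compute such a baseline is sketched below. It is not the lab's reference solution: the helper name mean_cv_r2, the shuffled folds, and the fixed seed are assumptions made for illustration, and the data is synthetic so the snippet runs without load_boston.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def mean_cv_r2(X, y, n_splits=10, seed=1):
    """Return the mean R^2 of a linear regression over shuffled k-fold CV."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=folds)
    return float(np.mean(scores))


# Synthetic stand-in data (assumption); in the lab, X would be boston.data
# and y would be boston.target.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
baseline_demo = mean_cv_r2(X_demo, y_demo)
print(f"baseline R^2: {baseline_demo:.3f}")
```

Shuffling the folds is a design choice rather than a requirement. Without it, KFold splits the rows in their stored order, which can give misleading scores when the data set is sorted in some way.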

See how interactions improve your baseline

Next, create all possible combinations of interactions, loop over them and add them to the baseline model one by one to see how they affect the $R^2$. We'll look at the 3 interactions which have the biggest effect on our $R^2$, so print out the top 3 combinations.

Write a for loop over all combinations of two predictors. You can use combinations from itertools to create a list of all the pairwise combinations. To find more info on how this is done, have a look here.

from itertools import combinations
# use a new name so the imported combinations function is not overwritten
feature_pairs = list(combinations(boston.feature_names, 2))
## code to find top 3 interactions by R^2 value here
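
The search described above could look roughly like the sketch below. The function rank_interactions, the temporary column name "interaction", and the synthetic data frame are all hypothetical. The lab itself would pass in a DataFrame of the Boston predictors and the house prices.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def rank_interactions(features, target, n_top=3, seed=1):
    """Add each pairwise product to the full model and rank pairs by mean CV R^2."""
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    results = []
    for first, second in combinations(features.columns, 2):
        candidate = features.copy()
        # One interaction at a time: the product of the two predictors.
        candidate["interaction"] = candidate[first] * candidate[second]
        scores = cross_val_score(LinearRegression(), candidate, target, scoring="r2", cv=folds)
        results.append((first, second, float(np.mean(scores))))
    # Highest mean R^2 first.
    return sorted(results, key=lambda item: item[2], reverse=True)[:n_top]


# Synthetic stand-in (assumption): the target depends on the product of "a" and "b".
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(300, 4)), columns=["a", "b", "c", "d"])
demo_target = demo["a"] + demo["b"] + 3 * demo["a"] * demo["b"] + rng.normal(scale=0.1, size=300)
top_three = rank_interactions(demo, demo_target)
print(top_three)
```

Each candidate model keeps every original predictor and adds exactly one product column, so differences in $R^2$ between candidates can be attributed to that single interaction.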

Look at the top 3 interactions: "RM" as a confounding factor

The top three interactions all seem to involve "RM", the number of rooms, as a confounding variable. Let's look at interaction plots for all three of them. This exercise involves:

  • Splitting the data into 3 groups: one for houses with few rooms, one for houses with a "medium" number of rooms, and one for houses with many rooms.
  • Creating a function build_interaction_rm. This function takes an argument varname (the column name as a string) and an argument description (a description of the variable varname, to be used as the x-axis label of the plot). The function outputs a plot that uses "RM" as a confounding factor.

We split the data set into groups with a high, medium and low number of rooms for you.

# assumes df holds the predictors and all_data holds the predictors plus the target
rm = df["RM"].to_numpy()
low_cut, high_cut = np.percentile(rm, [33, 67])
high_rm = all_data[rm > high_cut]
med_rm = all_data[(rm > low_cut) & (rm <= high_cut)]
low_rm = all_data[rm <= low_cut]

Create build_interaction_rm.

def build_interaction_rm(varname, description):
    pass
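
A function of this kind might be built along the lines of the sketch below. To stay self-contained it takes the groups as an explicit argument instead of reading high_rm, med_rm and low_rm from the notebook, so its name and signature (plot_interaction_by_group, groups, target) are hypothetical and differ from build_interaction_rm. The column name "target" and the data are made up as well.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def plot_interaction_by_group(groups, varname, description, target="target"):
    """Scatter target against varname per group and overlay one fitted line each.

    groups maps a label to a DataFrame holding the varname and target columns.
    Returns the axes and the fitted slope for every group.
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    slopes = {}
    for label, frame in groups.items():
        x = frame[[varname]].to_numpy()
        y = frame[target].to_numpy()
        model = LinearRegression().fit(x, y)
        slopes[label] = float(model.coef_[0])
        order = np.argsort(x[:, 0])  # sort so the fitted line is drawn left to right
        ax.scatter(x[:, 0], y, alpha=0.3, label=label)
        ax.plot(x[order, 0], model.predict(x[order]), linewidth=2)
    ax.set_xlabel(description)
    ax.set_ylabel(target)
    ax.legend()
    return ax, slopes


# Synthetic stand-in for the three room-count groups, with true slopes -2, 0 and 2.
rng = np.random.default_rng(0)
demo_groups = {}
for label, slope in [("low RM", -2.0), ("medium RM", 0.0), ("high RM", 2.0)]:
    x_values = rng.uniform(0, 10, size=100)
    noise = rng.normal(size=100)
    demo_groups[label] = pd.DataFrame({"LSTAT": x_values, "target": slope * x_values + noise})
demo_ax, demo_slopes = plot_interaction_by_group(demo_groups, "LSTAT", "% lower status (synthetic)")
```

Clearly different slopes across the three groups suggest that the relationship between the plotted variable and the target changes with the number of rooms, which is what an interaction term is meant to capture.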

Next, use build_interaction_rm with the three variables that had the largest effect on $R^2$.

# first plot
# second plot
# third plot

Build a final model including all three interactions at once

Use 10-fold cross-validation.

# code here
# code here
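
Adding several interactions at once and re-scoring the model could be done as in this sketch. The helper with_interactions, the "_x_" column naming, and the synthetic data are assumptions. In the lab, the pairs would be the top three combinations found earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def with_interactions(features, pairs):
    """Return a copy of features with one product column per (first, second) pair."""
    extended = features.copy()
    for first, second in pairs:
        extended[f"{first}_x_{second}"] = features[first] * features[second]
    return extended


# Synthetic stand-in (assumption): two of the three candidate interactions matter.
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(300, 3)), columns=["a", "b", "c"])
demo_target = (
    demo["a"] + 2 * demo["a"] * demo["b"] - demo["b"] * demo["c"]
    + rng.normal(scale=0.1, size=300)
)

folds = KFold(n_splits=10, shuffle=True, random_state=1)
model = LinearRegression()
r2_plain = float(np.mean(cross_val_score(model, demo, demo_target, scoring="r2", cv=folds)))
demo_final = with_interactions(demo, [("a", "b"), ("b", "c"), ("a", "c")])
r2_final = float(np.mean(cross_val_score(model, demo_final, demo_target, scoring="r2", cv=folds)))
print(f"without interactions: {r2_plain:.3f}, with interactions: {r2_final:.3f}")
```

Using the same KFold settings for both scores keeps the comparison fair, since both models are evaluated on identical splits.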

Our $R^2$ has increased considerably! Let's have a look in statsmodels to see if all these interactions are significant.

# code here

What is your conclusion here?

# formulate your conclusion

Summary

You now understand how to include interaction effects in your model!
