In this lab, you'll practice your feature scaling and normalization skills!
You will be able to:
- Identify if it is necessary to perform log transformations on a set of features
- Perform log transformations on different features of a dataset
- Determine if it is necessary to perform normalization/standardization for a specific model or set of data
- Compare the different standardization and normalization techniques
- Use standardization/normalization on features of a dataset
Let's import our Ames Housing data.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
ames = pd.read_csv('ames.csv')
Since there are so many features, it is helpful to filter the columns by datatype and number of unique values. One heuristic for selecting continuous variables is to keep features that are not object datatypes and have at least a certain number of unique values.
# Your code here
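As a rough illustration of the heuristic described above, here is a minimal sketch on a small made-up DataFrame rather than the real Ames data. The helper name select_continuous and the threshold of 20 unique values are assumptions chosen for the example, not part of the lab; pick a cutoff that suits your own data.

```python
import pandas as pd

def select_continuous(df, min_unique=20):
    """Return names of non-object columns with at least `min_unique` distinct values.

    Hypothetical helper: the default threshold is an arbitrary choice for this sketch.
    """
    non_object = df.select_dtypes(exclude="object")
    return [col for col in non_object.columns
            if non_object[col].nunique() >= min_unique]

# Toy stand-in for the Ames data (values are made up for illustration)
toy = pd.DataFrame({
    "LotArea": range(1000, 1030),      # numeric, 30 distinct values
    "OverallQual": [1, 2, 3] * 10,     # numeric, but only 3 distinct values
    "Street": ["Pave", "Grvl"] * 15,   # text column
})

continuous_cols = select_continuous(toy)
print(continuous_cols)
```

Only LotArea passes both conditions here. On the real dataset you could call the same helper on ames and then plot histograms of the selected columns with ames[continuous_cols].hist().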
We can see from our histogram of the continuous features that many of them contain a large number of zeros. For example, WoodDeckSF (square footage of a wood deck) gives a positive number indicating the size of the deck, and zero if no deck exists. It might have made sense to recode this variable as "deck exists or not" (a binary 1/0 variable). As it stands, it is a zero-inflated variable, which is cumbersome to work with.
Let's drop these zero-inflated variables for now and select the features which don't have this characteristic.
# Select non zero-inflated continuous features as ames_cont
ames_cont = None
# Your code here
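One possible way to spot zero-inflated columns programmatically is to measure the share of zeros in each column and drop those above a cutoff. The sketch below uses made-up values; the helper name drop_zero_inflated and the 20% cutoff are assumptions for illustration, and choosing columns by eye from the histograms is just as valid. It also shows the binary "deck exists or not" recoding mentioned earlier.

```python
import pandas as pd

def drop_zero_inflated(df, max_zero_frac=0.2):
    """Keep only columns whose fraction of zeros is at most `max_zero_frac`.

    Hypothetical helper: the 0.2 cutoff is an arbitrary choice for this sketch.
    """
    zero_frac = (df == 0).mean()                      # share of zeros per column
    keep = zero_frac[zero_frac <= max_zero_frac].index
    return df[keep]

# Toy data (values are made up for illustration)
toy = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "WoodDeckSF": [0, 298, 0, 0, 192],                # 3 of 5 values are zero
})

toy_cont = drop_zero_inflated(toy)
print(list(toy_cont.columns))

# Alternative to dropping: recode the column as a 1/0 "deck exists" indicator
has_deck = (toy["WoodDeckSF"] > 0).astype(int)
print(has_deck.tolist())
```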
Store your final features in a DataFrame features_final:
# Your code here
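For reference, here is a minimal sketch of a log transform followed by two common scaling options, standardization and min-max scaling, on a made-up price column. The values and the name features_example are assumptions for illustration; which columns to transform and which scaling to use on the Ames data is left to you. Note that np.log requires strictly positive values.

```python
import numpy as np
import pandas as pd

# Toy right-skewed column (values are made up for illustration)
toy = pd.DataFrame({"SalePrice": [100000.0, 150000.0, 200000.0, 400000.0, 900000.0]})

# Log transform: compresses the long right tail
logged = np.log(toy)

# Standardization: subtract the mean, divide by the sample standard deviation
standardized = (logged - logged.mean()) / logged.std()

# Min-max scaling: rescale to the interval [0, 1]
min_max = (logged - logged.min()) / (logged.max() - logged.min())

# Collect the chosen version under a descriptive column name
features_example = standardized.add_suffix("_log_std")
print(features_example.columns.tolist())
```

Standardized columns have mean 0 and standard deviation 1, while min-max scaled columns are bounded between 0 and 1; the better choice depends on the model you plan to fit.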
Great! You've now got some hands-on practice transforming data using log transforms, feature scaling, and normalization!