Expanding Bayes' theorem to account for multiple observations and conditional probabilities drastically increases predictive power. In essence, it allows you to develop a belief network that takes into account all of the available information regarding the scenario. In this lesson, you'll take a look at one particular implementation of the Naive Bayes algorithm: Gaussian Naive Bayes.
You will be able to:
- Explain the Gaussian Naive Bayes algorithm
- Implement the Gaussian Naive Bayes (GNB) algorithm using SciPy and NumPy
Naive Bayes expands upon Bayes' Theorem to handle multiple features of a single observation.

Recall that Bayes' Theorem is:

$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$

Expanding to multiple features, the Naive Bayes formula is:

$$ P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, x_2, \ldots, x_n)} $$

Here $y$ is an observation class and $x_1$ through $x_n$ are the feature values of the observation. The "naive" assumption is that the features are conditionally independent given the class, which is what lets you write the likelihood as a simple product.
With that, let's dig into the formula a little more to get a deeper understanding. In the numerator, you multiply the product of the conditional probabilities $P(x_i \mid y)$ by the prior probability of the class, $P(y)$.

To calculate each of the conditional probabilities in the numerator, $P(x_i \mid y)$, Gaussian Naive Bayes assumes that each feature is normally distributed within a given class. You can then use the probability density function of the normal distribution, parameterized by the mean and standard deviation of that feature for that class.

With that, you have:

$$ P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \, e^{-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}} $$

where $\mu_{iy}$ is the mean and $\sigma_{iy}$ is the standard deviation of feature $x_i$ among the observations belonging to class $y$.

From there, the relative posterior probability (the numerator above) is calculated for each of the classes. The class with the largest of these values is the most probable class for the given observation.
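To make the mechanics concrete, here is a minimal sketch of the calculation with two hypothetical classes and two features. All of the priors, means, and standard deviations below are invented purely for illustration:

```python
import numpy as np
from scipy import stats

observation = [5.0, 1.5]   # two made-up feature values for a single observation

# Hypothetical per-class parameters: (prior P(y), feature means, feature standard deviations)
classes = {
    'A': (0.5, [5.1, 1.4], [0.4, 0.2]),
    'B': (0.5, [6.5, 5.5], [0.6, 0.5]),
}

numerators = {}
for label, (prior, means, stds) in classes.items():
    # One Gaussian likelihood per feature, multiplied together, then scaled by the prior
    likelihoods = [stats.norm.pdf(x, loc=m, scale=s)
                   for x, m, s in zip(observation, means, stds)]
    numerators[label] = prior * np.prod(likelihoods)

print(numerators)                           # relative (unnormalized) posteriors for each class
print(max(numerators, key=numerators.get))  # 'A' is the more probable class here
```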
With that, let's take a look at this process in practice on a real dataset to make it a little clearer.
First, let's load the Iris dataset, which you'll use to demonstrate Gaussian Naive Bayes.
from sklearn import datasets
import pandas as pd
import numpy as np
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = iris.feature_names
y = pd.DataFrame(iris.target)
y.columns = ['Target']
df = pd.concat([X,y], axis=1)
df.head()
|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
Before going further, it's always a good idea to briefly examine the data. In this case, let's check how many observations there are for each flower species:
df.Target.value_counts()
2 50
1 50
0 50
Name: Target, dtype: int64
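Since the dataset is perfectly balanced, each class prior $P(y)$ works out to 50/150 = 1/3. As a quick sketch (reusing the same `df`), you can compute these priors directly; they'll serve as the starting value of each class's numerator later on:

```python
# Class priors P(y): the relative frequency of each species in the dataset
priors = df['Target'].value_counts(normalize=True).sort_index()
print(priors)   # approximately 0.3333 for each of the three classes
```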
Next, you calculate the mean and standard deviation of each feature within each class. You'll then use these values to calculate the conditional probability of a particular feature value under each of the classes.
aggs = df.groupby('Target').agg(['mean', 'std'])
aggs
| Target | sepal length (cm) mean | sepal length (cm) std | sepal width (cm) mean | sepal width (cm) std | petal length (cm) mean | petal length (cm) std | petal width (cm) mean | petal width (cm) std |
|---|---|---|---|---|---|---|---|---|
| 0 | 5.006 | 0.352490 | 3.418 | 0.381024 | 1.464 | 0.173511 | 0.244 | 0.107210 |
| 1 | 5.936 | 0.516171 | 2.770 | 0.313798 | 4.260 | 0.469911 | 1.326 | 0.197753 |
| 2 | 6.588 | 0.635880 | 2.974 | 0.322497 | 5.552 | 0.551895 | 2.026 | 0.274650 |
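Note that `aggs` has a two-level column index (feature, then statistic), so pulling out a class-specific statistic takes two selections. A small sketch using the table above:

```python
# Look up the class-0 mean and standard deviation of petal length
print(aggs['petal length (cm)']['mean'][0])   # 1.464
print(aggs['petal length (cm)']['std'][0])    # approximately 0.1735
```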
Take another look at how to implement point estimates for the conditional probability of a feature value given a class. To do this, you'll simply use the PDF of the normal distribution. (Again, there can be some objection to this method, since the probability of any specific point under a continuous distribution is 0. Some statisticians bin the continuous distribution into a discrete approximation to remedy this, but doing so requires additional work, and the bin width is an arbitrary choice that can itself impact results.)
from scipy import stats
def p_x_given_class(obs_row, feature, class_):
    """Point estimate of P(feature value | class) using the normal PDF."""
    mu = aggs[feature]['mean'][class_]   # class mean for this feature
    std = aggs[feature]['std'][class_]   # class standard deviation for this feature
    obs = df.iloc[obs_row][feature]      # the observed feature value for this row
    p_x_given_y = stats.norm.pdf(obs, loc=mu, scale=std)
    return p_x_given_y
p_x_given_class(0, 'petal length (cm)', 0) #Notice how this is not a true probability; you can get values >1.
2.1480249640403133
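To see where this number comes from, you can plug the class-0 statistics for petal length into the normal PDF formula by hand. This is just a sanity check (a minimal sketch reusing the `aggs` and `df` objects defined above); it should closely match the SciPy value:

```python
# Compute the normal PDF by hand for row 0's petal length under class 0
mu = aggs['petal length (cm)']['mean'][0]     # class-0 mean petal length
std = aggs['petal length (cm)']['std'][0]     # class-0 standard deviation
obs = df.iloc[0]['petal length (cm)']         # observed value (1.4 cm)

manual = (1 / (np.sqrt(2 * np.pi) * std)) * np.exp(-(obs - mu) ** 2 / (2 * std ** 2))
print(manual)   # approximately 2.148, matching stats.norm.pdf(obs, loc=mu, scale=std)
```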
row = 100
c_probs = []
for c in range(3):
p = len(df[df.Target==c])/len(df) #Initialize probability to relative probability of class
for feature in X.columns:
p *= p_x_given_class(row, feature, c) #Update the probability using the point estimate for each feature
c_probs.append(p)
c_probs
[1.0567655687463235e-247, 2.460149009916488e-12, 0.023861042537402642]
While you haven't even attempted to calculate the denominator of the original equation, $P(x_1, x_2, \ldots, x_n)$, you don't need it in order to make a prediction. That is, the probability of the feature values themselves is the same for every class, so dividing by it rescales each class's numerator equally and never changes which one is largest. You can therefore classify an observation by simply choosing the class with the largest relative probability.
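If you do want values that behave like probabilities, you can divide each numerator by the sum of all three; under the naive model that sum is exactly the shared denominator, and the ranking of the classes is unchanged. A small sketch reusing the `c_probs` list computed above:

```python
# Normalize the relative values so they sum to 1; the winning class stays the same
relative = np.array(c_probs)
posteriors = relative / relative.sum()
print(posteriors)                                    # roughly [0, 0, 1] for row 100
print(np.argmax(relative), np.argmax(posteriors))    # same index (class 2) either way
```

With that, here's a function that wraps the whole procedure into a single prediction step: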
def predict_class(row):
c_probs = []
for c in range(3):
p = len(df[df.Target==c])/len(df) #Initialize probability to relative probability of class
for feature in X.columns:
p *= p_x_given_class(row, feature, c)
c_probs.append(p)
return np.argmax(c_probs)
Let's also take an example row to test this new function.
row = 0
df.iloc[row]
sepal length (cm) 5.1
sepal width (cm) 3.5
petal length (cm) 1.4
petal width (cm) 0.2
Target 0.0
Name: 0, dtype: float64
predict_class(row)
0
Nice! It appears that this `predict_class()` function has correctly predicted the class for this first row! Now it's time to take a look at how accurate the function is across the entire dataset!
In order to determine the overall accuracy of your newly minted Gaussian Naive Bayes classifier, you'll need to generate predictions for all of the rows in the dataset. From there, you can then compare these predictions to the actual class values stored in the 'Target' column. Take a look:
df['Predictions'] = [predict_class(row) for row in df.index]
df['Correct?'] = df['Target'] == df['Predictions']
df['Correct?'].value_counts(normalize=True)
True 0.96
False 0.04
Name: Correct?, dtype: float64
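As an optional cross-check (a minimal sketch, assuming scikit-learn is available), you can compare this hand-rolled classifier against scikit-learn's built-in `GaussianNB`, which implements the same algorithm; its training accuracy on this dataset should land very close to the 96% above:

```python
# Optional cross-check against scikit-learn's Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb.fit(X, df['Target'])             # fit on the same features and labels
print(gnb.score(X, df['Target']))    # training accuracy; expect roughly 0.96
```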
Welcome, Bayesian! You're well on your way to using Bayesian statistics in the context of machine learning! In this lesson, you saw how to adapt Bayes' theorem, along with your knowledge of the normal distribution, to create a machine learning classifier known as Gaussian Naive Bayes.