In this lesson, you'll learn about negative binomial trials, and the negative binomial distribution!
You will be able to:
- Understand and explain the Negative Binomial Distribution and its uses
To understand the Negative Binomial Distribution, we first need to have a clear understanding of what it describes--Negative Binomial Trials. This sounds more intimidating than it actually is--the idea is pretty straightforward.
Consider the following question:
I have a fair coin. Let's consider heads a success, and tails a failure. How many times can I flip the coin before I fail 3 times?
The first thought you'll probably have is that there's no single answer to this--instead, the answer falls across a distribution of probabilities. It's possible that our first three flips are all tails. It's also possible (but exceedingly unlikely) that we flip the coin 100 times (or 1,000, or 1,000,000 times) and still have tails show up fewer than 3 times in total.
The Negative Binomial Distribution allows us to easily describe the probability distribution of the different ways a Negative Binomial Trial could work out.
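To get a feel for this spread of outcomes, here is a minimal simulation sketch that uses only Python's standard library. The function name `flips_until_failures`, the seed, and the number of repetitions are illustrative assumptions, not part of the original lesson.

```python
import random

def flips_until_failures(r=3, p_success=0.5, rng=random):
    """Flip until r failures (tails) have occurred; return the total number of flips."""
    failures = 0
    flips = 0
    while failures < r:
        flips += 1
        if rng.random() >= p_success:  # this flip is a failure (tails)
            failures += 1
    return flips

rng = random.Random(42)  # fixed seed so the run is repeatable
total_flips = [flips_until_failures(3, 0.5, rng) for _ in range(10_000)]

print("shortest run:", min(total_flips))
print("longest run:", max(total_flips))
print("average number of flips:", sum(total_flips) / len(total_flips))
```

No run can be shorter than 3 flips, while the longest run changes with the seed. That variability is the reason the answer is a distribution rather than a single number.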
In more formal terms, the Negative Binomial Distribution requires the following parameters:
The Negative Binomial Distribution describes the number of successes $ k $ until observing a pre-determined number of failures $ r $ where the probability of success for each independent trial is $ p $.
Note that since there's no such thing as half a tails or half a trial, this means that the Negative Binomial Distribution is a Discrete Distribution, since it's concerned with multiple discrete, independent events.
Note that depending on where you look, you'll see some sources that define
For the sake of simplicity, we'll define
You may recall the Binomial Distribution that we learned about previously. Comparing and contrasting it with the Negative Binomial Distribution helps us better understand what each is used for.
The Binomial Distribution describes the number of successes $ k $ achieved in $ n $ trials, where the probability of success is $ p $.
The Negative Binomial Distribution describes the number of successes $ k $ until observing $ r $ failures (or successes--this is arbitrary, and depends on how you phrase the question; it doesn't particularly matter if we define heads or tails as a failure, as long as we pick one). Note that these failures do not need to be consecutive, just cumulative!
Let's work through an example of phrasing a problem that would be described by the binomial distribution, and phrasing another problem that would be described by the negative binomial distribution.
Binomial Distribution: "I flip a fair coin 5 times. What are the chances that I get heads 0 times? 1 time? 2 times? Etc..."
Negative Binomial Distribution: "I flip a fair coin 5 times. What are the chances it takes me 2 flips to get heads twice? How about 3 flips to get heads twice? 4 flips? Etc..."
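The two phrasings can be compared by brute force. The sketch below lists every possible sequence of 5 fair flips and tallies it both ways. The helper `flip_of_second_heads` is a hypothetical name introduced only for this illustration.

```python
from collections import Counter
from itertools import product

# All 2**5 equally likely outcomes of flipping a fair coin 5 times
sequences = list(product("HT", repeat=5))

# Binomial view: the number of flips is fixed, the number of heads varies
heads_counts = Counter(seq.count("H") for seq in sequences)

def flip_of_second_heads(seq):
    """Return the 1-based flip on which the 2nd heads lands, or None if it never does."""
    heads = 0
    for flip, face in enumerate(seq, start=1):
        heads += face == "H"
        if heads == 2:
            return flip
    return None

# Negative binomial view: the target (2 heads) is fixed, the flip count varies
second_heads = Counter(flip_of_second_heads(seq) for seq in sequences)

for k in range(6):
    print(f"{k} heads in 5 flips: {heads_counts[k] / len(sequences):.4f}")
for x in range(2, 6):
    print(f"2nd heads on flip {x}: {second_heads[x] / len(sequences):.4f}")
```

The same 32 sequences feed both tallies. The only difference is which quantity is held fixed and which is allowed to vary.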
Now that we know what we know about the Negative Binomial Distribution, let's set some parameters for the coin-flipping experiment we described above and take a look at the corresponding Negative Binomial Distribution that describes it.
Let's define our problem statement as:
"I'm going to flip a fair coin 10 times. I want to see how long it takes for the coin to land on heads 2 times. What is the probability that this happens after 2 coin flips? After 3? ... After 10?"
The statement above describes a Negative Binomial Trial. Let's examine it and see if we can find the parameters that we can use to describe the corresponding Negative Binomial Distribution!
$ r = Number\ of\ Failures = 2 $, since we're interested in seeing how long it takes to land on heads a total of two times.
$ x = Number\ of\ Trials $--this can be any number greater than or equal to 2. It cannot be smaller than 2, because it is mathematically impossible to satisfy our pre-set condition if the number of trials is smaller than our target (it's impossible to get 2 heads out of a single coin flip).
$ p = Fair\ Coin = 0.5$, since a fair coin has a 50/50 chance of landing on either heads or tails.
The easiest way to think of this is that the distribution has a Fixed Number r and a Random Variable X. When we perform Negative Binomial Trials, we know how many failures we're looking for. This number is denoted as the parameter $ r $.
If we know the parameters, we can calculate our Negative Binomial Probability by pulling them into the following formula:
$b(x, r, P) =\ {x-1}C{\ r-1} * P^{\ r} * (1-P)^{\ x-r} $
Don't worry if this looks pretty overwhelming. We'll break it down.
Let's start by recalling our parameters:
$ r = 2 $
$ x = 10 $
$ P = 0.5 $
You may also be wondering what ${x-1}C{\ r-1}$ is in the equation. This is mathematical notation for a combination: the number of ways to choose $r-1$ items out of $x-1$.
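The formula translates almost directly into Python. In this sketch, `math.comb(n, k)` from the standard library (Python 3.8+) computes the ${x-1}C{\ r-1}$ term. The function name `negative_binomial_probability` is an assumption chosen for this illustration.

```python
from math import comb

def negative_binomial_probability(x, r, p):
    """b(x, r, P): probability that the r-th counted outcome lands exactly on trial x."""
    if x < r:
        return 0.0  # r outcomes cannot happen in fewer than r trials
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, x, p = 2, 10, 0.5
print(comb(x - 1, r - 1))                      # ways to place 1 heads among the first 9 flips
print(negative_binomial_probability(x, r, p))  # b(10, 2, 0.5)
```

With these parameters the combination term is 9 and the probability comes out just under 0.9%. Needing all 10 flips to reach a second heads is unlikely.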
A Note on This Equation: This equation is used to calculate the Negative Binomial Probability, which is just the probability for a given value of
When working with discrete probabilities, it sometimes helps to think of the corresponding trials as a tree diagram. Let's examine all the possible ways that three coin flips can work out:
We could use our parameters and describe our problem as: What is the negative binomial probability of
However, we could also phrase it in much simpler terms--what are the odds that we get our 2nd heads on the 3rd coin flip?
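With only three flips, the question can be answered by listing every leaf of the tree. The following sketch does so with `itertools.product`. The variable names are illustrative.

```python
from itertools import product

# Every leaf of the three-flip tree: 2**3 = 8 equally likely sequences
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# The 2nd heads lands on the 3rd flip when the sequence has exactly two heads
# and the last flip is one of them
second_heads_on_third = [s for s in outcomes if s.count("H") == 2 and s[2] == "H"]

print(second_heads_on_third)                       # ['HTH', 'THH']
print(len(second_heads_on_third) / len(outcomes))  # 0.25
```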
Logically, it follows that in order for heads to appear for the 2nd time on the 3rd coin flip, that means that heads must have appeared exactly once by the second trial. We can generalize this statement further to say that in order for us to hit
This brings us to that potentially scary equation we saw above. As we mentioned before, the Probability Mass Function for the Binomial Distribution is hiding inside of that equation. That's one half of the equation above. The other half of the equation is just calculating the odds that we reach
This means that we can break the equation down into two separate parts:
- The probability that we have $r-1$ failures by trial $x-1$. This is the binomial part of the negative binomial probability equation: ${x-1}C{\ r-1} * P^{\ r-1} * (1-P)^{\ x-r}$.
- The probability that we get failure $r$ on trial $x$. This is simply $P$.
Since these trials are all independent, we can simply calculate our Negative Binomial Probability by just multiplying the two, giving us our original equation of:
$$b(x, r, P) =\ {x-1}C{\ r-1} * P^{\ r} * (1-P)^{\ x-r} $$
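One way to check this decomposition numerically is sketched below. The probability of exactly $r-1$ counted outcomes in the first $x-1$ trials is a binomial probability, and the final trial contributes one more factor of $P$. Both function names are assumptions made for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k counted outcomes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def negative_binomial_probability(x, r, p):
    """b(x, r, P) from the formula above."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 2, 0.5
for x in range(2, 7):
    first_part = binomial_pmf(r - 1, x - 1, p)  # r-1 outcomes within the first x-1 trials
    second_part = p                             # the r-th outcome arrives on trial x
    print(x, first_part * second_part, negative_binomial_probability(x, r, p))
```

For each value of `x`, the last two printed numbers should agree. This is the independence argument expressed as arithmetic.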
If we use this formula and plug in the parameter values for our sample problem above, we get the following distribution:
Number of Coin Flips | Probability |
---|---|
2 | 0.25 |
3 | 0.25 |
4 | 0.1875 |
5 | 0.125 |
6 | 0.0781 |
>= 7 | 0.1094 |
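The table can be reproduced with a few lines of Python. This sketch uses `fractions.Fraction` so that each probability is also shown as an exact fraction. Using exact fractions is a choice made here for illustration and is not something the lesson prescribes.

```python
from fractions import Fraction
from math import comb

r = 2               # target number of heads
p = Fraction(1, 2)  # fair coin, kept as an exact fraction

# b(x, r, P) for each row of the table that names a specific flip count
table = {x: comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) for x in range(2, 7)}
# Everything not covered by flips 2 through 6 falls into the ">= 7" row
table[">= 7"] = 1 - sum(table.values())

for flips, prob in table.items():
    print(f"{flips} | {prob} | {float(prob):.4f}")
```

The last row is computed as the complement of the others, so the probabilities sum to exactly 1.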
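The mean of the Negative Binomial Distribution (the expected number of successes $ k $ before $ r $ failures, where each trial succeeds with probability $ p $) is:
$$ \mu = \frac{pr}{1-p} $$
The variance of the Negative Binomial Distribution is:
$$ \sigma^2 = \frac{pr}{(1-p)^2} $$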
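Under the definition used earlier in this lesson (the number of successes $k$ before $r$ failures, with success probability $p$), the standard closed forms are $\frac{pr}{1-p}$ for the mean and $\frac{pr}{(1-p)^2}$ for the variance. The sketch below checks them by summing over the probability mass function directly. The function name, the example values $r = 3$, $p = 0.5$, and the truncation point of 500 terms are all assumptions made for illustration.

```python
from math import comb

def pmf_successes_before_failures(k, r, p):
    """P(exactly k successes occur before the r-th failure), with success probability p."""
    return comb(k + r - 1, k) * p**k * (1 - p)**r

r, p = 3, 0.5
ks = range(0, 500)  # truncate the infinite sum; the tail is negligible for these values
probs = [pmf_successes_before_failures(k, r, p) for k in ks]

mean = sum(k * q for k, q in zip(ks, probs))
variance = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))

print(mean, r * p / (1 - p))           # numerical vs closed-form mean
print(variance, r * p / (1 - p) ** 2)  # numerical vs closed-form variance
```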
Thanks to the wonders of numpy, we can avoid scary functions and calculate Negative Binomial Probabilities with a single line of code. Consider the following example from the numpy documentation for the negative binomial sampling function:
A company drills wild-cat oil exploration wells, each with an estimated probability of success of 0.1. What is the probability of having one success for each successive well, that is what is the probability of a single success after drilling 5 wells, after 6 wells, etc.?
The following sample code is provided in the documentation to demonstrate how to solve this problem using the `negative_binomial()` function from the `numpy.random` module:
```python
import numpy as np

n_samples = 100000
# Each sample is the number of failed wells before the first success (n=1, p=0.1)
s = np.random.negative_binomial(1, 0.1, n_samples)
for i in range(1, 11):
    # Fraction of samples in which the first success came within i wells
    probability = np.sum(s < i) / n_samples
    print("{} wells drilled, probability of success: {:.4f}%".format(i, probability * 100))
```

```
1 wells drilled, probability of success: 9.9340%
2 wells drilled, probability of success: 19.0070%
3 wells drilled, probability of success: 27.1310%
4 wells drilled, probability of success: 34.4760%
5 wells drilled, probability of success: 41.0280%
6 wells drilled, probability of success: 47.0320%
7 wells drilled, probability of success: 52.2790%
8 wells drilled, probability of success: 57.1350%
9 wells drilled, probability of success: 61.3980%
10 wells drilled, probability of success: 65.3180%
```
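Because only a single success is required here, the chance of at least one success within $i$ wells also has a simple closed form, $1 - (1 - p)^i$. This gives a reference value that a simulation should approximate. The sketch below is a hand-rolled check and is not part of the numpy documentation example.

```python
p = 0.1  # estimated probability that any single well succeeds

# Success within i wells is the complement of i failures in a row
exact_probabilities = [1 - (1 - p) ** i for i in range(1, 11)]

for i, exact in enumerate(exact_probabilities, start=1):
    print("{} wells drilled, exact probability of success: {:.4f}%".format(i, exact * 100))
```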
In this lesson, we learned all about the Negative Binomial Distribution, as well as related concepts such as Negative Binomial Trials and Negative Binomial Probability.