The Probability Mass Function (PMF)

Introduction

In this lesson we'll look at another way to represent a distribution: a probability mass function (pmf), which maps each value to its probability. Recall that pmfs apply only to discrete data. We'll investigate a similar concept for continuous data, the probability density function (PDF), in future lessons.

Objectives

You will be able to:

  • Develop a mathematical understanding of a discrete variable pmf
  • Develop an intuition for pmf while working with a toy example
  • Visualize the probability mass function and describe its relationship with histograms

What is a pmf?

A probability mass function (pmf), sometimes also called a frequency function, gives the probabilities for a discrete random variable. We have already met discrete random variables in examples such as coin flips and dice rolls. The "discrete" part means that the possible outcomes form a countable set that can be listed one by one. For example, you can only roll a 1, 2, 3, 4, 5, or 6 on a die. Based on how often each value from 1 to 6 occurs in a number of dice rolls, we can develop a pmf for the die showing the probability of each possible value occurring.

Here is a more formal understanding:

The probability that a discrete random variable X takes on a particular value x, written P(X = x), is denoted f(x). The function f(x) is typically called the probability mass function, or pmf.

If X is a discrete random variable, then its range $R_X$ is the countable set of all possible values of X. We can list the elements of $R_X$ as:

$R_X = \{ x_1, x_2, x_3, \ldots \}$

where $x_1, x_2, x_3, \ldots$ are the possible values of $X$. We are interested in quantifying the probability that $X$ is equal to some given value $k$. That is, we want to know $P(X = k)$. In the case of our die, we might be interested in the probability of getting a 3, which would be $P(X = 3) = 1/6$.
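
As a minimal sketch of the die example above, assuming a fair six-sided die (the names `die_faces` and `die_pmf` are illustrative and not part of the lesson), the theoretical pmf assigns the same probability to every face:

```python
from fractions import Fraction

# Possible outcomes of a single roll (the range of the random variable X)
die_faces = [1, 2, 3, 4, 5, 6]

# Assumption: the die is fair, so every face is equally likely
die_pmf = {face: Fraction(1, len(die_faces)) for face in die_faces}

print(die_pmf[3])             # probability of rolling a 3 -> 1/6
print(sum(die_pmf.values()))  # total probability over all faces -> 1
```

Exact fractions are used here only to keep the arithmetic free of floating-point rounding.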

Think of the event $A = \{ X = x_k \}$, defined as the set of outcomes $s$ in the sample space $S$ for which the corresponding value of $X$ is equal to $x_k$. This can be written as:

$A = \{ s \in S \mid X(s) = x_k \}$

The probabilities of the events $\{ X = x_k \}$ are formally given by the probability mass function (pmf) of $X$.

Thus, the pmf is a probability measure that gives us probabilities of the possible values for a random variable.
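
Written out in the notation used above, together with the two standard properties that any pmf satisfies (each probability lies between 0 and 1, and the probabilities sum to 1), this reads:

```latex
f(x_k) = P(X = x_k), \qquad x_k \in R_X

0 \le f(x_k) \le 1 \quad \text{for every } x_k \in R_X,
\qquad
\sum_{x_k \in R_X} f(x_k) = 1
```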

PMF Intuition

At this stage, some of us may not be able to translate the mathematics above into an intuition. So let's work through a brief example calculating the pmf for a variable.

We have seen previously that a probability is a number in the range [0,1], calculated as a frequency expressed as a fraction of the sample size. So in order to convert a random variable's frequencies into probabilities, we need to perform the following steps:

  • Get the frequency of each possible value in the dataset
  • Divide the frequency of each value by the total number of values (length of dataset)
  • Get the probability for each value

Let's show this using a simple toy example:

# Count the frequency of values in a given dataset

import collections
x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]
counter = collections.Counter(x)
print(counter)
print(len(x))
Counter({1: 4, 2: 4, 3: 2, 5: 2, 4: 1})
13

So now we have a dictionary whose keys are the distinct values in the dataset and whose values are the frequencies of those values. We can calculate the pmf by applying step 2 above.

# Convert frequency to probability - divide each frequency value by total number of values
pmf = []
for key,val in counter.items():
    pmf.append(round(val/len(x), 2))
    
print(counter.keys(), pmf)
dict_keys([1, 2, 3, 4, 5]) [0.31, 0.31, 0.15, 0.08, 0.15]
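
The list above relies on the order of `counter.keys()` to pair each probability with its value. As an alternative sketch (the helper name `compute_pmf` is hypothetical, not part of the lesson), the pmf can be stored as a dictionary so that each value is tied directly to its probability:

```python
import collections

def compute_pmf(data):
    """Return a dict mapping each distinct value in data to its relative frequency."""
    counts = collections.Counter(data)
    total = len(data)
    # Sort by value so the result reads in increasing order of the outcomes
    return {value: count / total for value, count in sorted(counts.items())}

x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]
pmf_dict = compute_pmf(x)

for value, prob in pmf_dict.items():
    print(value, round(prob, 2))
```

Rounding is applied only when printing, so the stored probabilities keep their full precision.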

The pmf is normalized so that the total probability is 1, as we can check by adding all the values in the pmf list above.

import numpy as np
np.array(pmf).sum()
1.0
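
Note that the values in `pmf` were rounded to two decimals, so in general a rounded list can sum to slightly more or less than 1. A sketch of an exact check, using `fractions.Fraction` from the standard library (the name `exact_pmf` is illustrative):

```python
import collections
from fractions import Fraction

x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]
counter = collections.Counter(x)

# Keep each probability as an exact fraction count / n instead of a rounded float
exact_pmf = {value: Fraction(count, len(x)) for value, count in counter.items()}

print(exact_pmf[1])             # 4/13
print(sum(exact_pmf.values()))  # 1
```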

Visualizing pmf

We can inspect the pmf of a discrete variable by visualizing it in matplotlib. For this, we can use a simple stem plot to show the probabilities calculated above. Here's the code:

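import matplotlib.pyplot as plt
# Notebook magic: render plots inline (Jupyter/IPython only)
%matplotlib inline
plt.style.use('ggplot')

# Convert the keys view to a list so matplotlib receives a plain sequence
plt.stem(list(counter.keys()), pmf);
plt.title("A Probability Mass Function");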

[Figure: stem plot titled "A Probability Mass Function"]

This looks pretty familiar. It's essentially just a normalized histogram.

plt.hist(x);
plt.title('Histogram');

[Figure: histogram titled "Histogram"]

If you look carefully, the only difference is in the y-axis. A histogram shows the frequency count of each value in a dataset, whereas a pmf, being a probability function, shows the data as probabilities. And this is where we started off. A pmf takes the frequencies of values and converts them into the probabilities of those values occurring.
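
To make the relationship concrete, the sketch below (standard library only; the variable names are illustrative) prints the frequency and the probability of each value side by side. The two columns differ only by the constant factor `len(x)`, which is why the two plots have the same shape:

```python
import collections

x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]
n = len(x)
counts = collections.Counter(x)

print("value  frequency  probability")
for value in sorted(counts):
    frequency = counts[value]    # height of the histogram bar
    probability = frequency / n  # height of the pmf stem
    print(f"{value:>5}  {frequency:>9}  {probability:>11.2f}")
```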

NOTE: In some literature, the pmf is also called just the probability distribution. The phrase "distribution function" is usually reserved exclusively for the cumulative distribution function (CDF).

Summary

In this lesson we looked at the probability mass function and how to get the probability of each possible value of a discrete random variable from its frequency of occurrence. So we basically moved from a purely frequency-based view to a more probabilistic one with these pmfs. Moving on, we'll see a couple more of these probabilistic functions for discrete and continuous variables.
