This short lesson summarizes the topics we covered in section 08 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
In this section, we dug into a number of foundational concepts, from NumPy to the basics of probability:
- Under the hood, Pandas relies on NumPy for computationally efficient processing of large data sets
- In addition to providing a base for Pandas, NumPy has many useful features built right in - including the ability to perform random sampling
- A scalar is a quantity that can be fully described by a magnitude (a single number). A vector needs more than one number to be fully described, e.g. a magnitude and a direction
- NumPy supports a range of powerful scalar and vector mathematical operations
- Probability quantifies how likely it is that an event will happen, as a number between 0 (impossible) and 1 (certain)
- Sets in Python are unordered collections of unique elements
- The inclusion-exclusion principle is a counting technique for finding the number of elements in the union of overlapping sets without double-counting the elements they share
- The "sum rule" of probability states that $P(A\cup B) = P(A) + P(B) - P(A \cap B)$
- Factorials provide the basis for calculating permutations
- The difference between permutations and combinations is that with combinations, order is not important
- The Bernoulli distribution can be used to describe a single, binary event
- The number of successes in n independent Bernoulli trials, each with the same success probability p, follows a binomial distribution
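As a rough illustration of the NumPy bullets above, the sketch below applies a few scalar and vector operations to small arrays and then draws random samples. The array values and the seed are arbitrary choices made up for this example, and it assumes a NumPy version that includes the `np.random.default_rng` Generator API (1.17 or later).

```python
import numpy as np

# A scalar is a single number; a vector is an ordered array of numbers.
scalar = 3.0
v = np.array([1.0, 2.0, 2.0])
w = np.array([4.0, 0.0, -1.0])

scaled = scalar * v             # scalar multiplication: [3., 6., 6.]
total = v + w                   # element-wise vector addition: [5., 2., 1.]
dot = np.dot(v, w)              # dot product: 1*4 + 2*0 + 2*(-1) = 2.0
magnitude = np.linalg.norm(v)   # Euclidean length: sqrt(1 + 4 + 4) = 3.0

# Random sampling with NumPy's Generator API; the seed is fixed for reproducibility.
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=10)             # ten rolls of a fair die (upper bound is exclusive)
sample = rng.choice(v, size=2, replace=False)   # draw 2 entries of v without replacement
```

Because every operation works on whole arrays at once, there are no explicit Python loops, which is the same property Pandas relies on for efficient processing.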
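The set, inclusion-exclusion, and sum-rule bullets fit together in one small example. This is a minimal sketch that assumes a single roll of a fair six-sided die, so every outcome is equally likely; the events `A` and `B` and the helper `prob` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is greater than 3"

# Inclusion-exclusion for counting: |A union B| = |A| + |B| - |A intersect B|
count_union = len(A) + len(B) - len(A & B)
assert count_union == len(A | B)   # 3 + 3 - 2 = 4

def prob(event):
    """Probability of an event when all outcomes are equally likely (hypothetical helper)."""
    return Fraction(len(event), len(sample_space))

# Sum rule: P(A union B) = P(A) + P(B) - P(A intersect B)
p_union = prob(A) + prob(B) - prob(A & B)
assert p_union == prob(A | B) == Fraction(2, 3)
```

The outcomes 4 and 6 belong to both events; subtracting the intersection once is what stops them from being counted twice.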
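Factorials, permutations, and combinations can be checked against each other with Python's standard `math` module. The sketch below, with arbitrarily chosen `n = 5` and `k = 3`, builds both counts from factorials and compares them with `math.perm` and `math.comb`, which assume Python 3.8 or later.

```python
import math

n, k = 5, 3

# Permutations: ordered arrangements of k items chosen from n.
# P(n, k) = n! / (n - k)!
perms = math.factorial(n) // math.factorial(n - k)

# Combinations: unordered selections, so divide out the k! orderings of each selection.
# C(n, k) = n! / (k! * (n - k)!)
combs = perms // math.factorial(k)

# The standard library provides both counts directly.
assert perms == math.perm(n, k) == 60
assert combs == math.comb(n, k) == 10
```

Dividing by k! is exactly where "order is not important" enters: each unordered selection of 3 items corresponds to 3! = 6 ordered arrangements, so 60 / 6 = 10.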
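To connect the Bernoulli and binomial bullets, the sketch below computes the binomial probability mass function from its formula and then checks it by simulating repeated groups of independent Bernoulli trials. The helper `binomial_pmf`, the coin-flip parameters, and the seed are assumptions made for this example, not part of the lesson.

```python
import math
import numpy as np

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 3 fair coin flips: C(3, 2) * 0.5**2 * 0.5 = 3/8
print(binomial_pmf(2, 3, 0.5))   # 0.375

# A single Bernoulli trial is the n = 1 case: binomial_pmf(1, 1, p) == p.
# Simulation check: count successes in n Bernoulli(p) draws, repeated many times.
rng = np.random.default_rng(0)
n, p, trials = 3, 0.5, 100_000
bernoulli_draws = rng.random((trials, n)) < p   # each entry is one Bernoulli(p) trial
successes = bernoulli_draws.sum(axis=1)         # number of successes per experiment
empirical = np.mean(successes == 2)             # close to 0.375, but not exact
```

The simulated frequency only approximates the formula; it gets closer as `trials` grows.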
In this section, we introduced the binomial distribution. In the next section, we'll look at a number of other types of distributions and how they relate to data science.