Stats for Data Science Flashcards
Probability
Probability quantifies uncertainty, letting us describe and measure random events and outcomes.
Set Theory
Set theory is a branch of mathematics based around the concept of sets.
A set is a collection of distinct objects, called elements.
Notationally, mathematicians often represent sets with curly braces.
A = {1,2,3,4,5}
Sets follow two key rules:
- Each element in a set is distinct.
- The elements in a set are in no particular order.
A = {1,2,3,4,5}
B = {5,4,3,2,1}
A == B -> True
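Both rules are easy to see with Python's built-in set type (a minimal sketch):

# Duplicate elements are dropped automatically: each element is distinct
A = {1, 2, 3, 4, 5, 5, 5}
print(A)          # {1, 2, 3, 4, 5}

# Order does not matter when comparing sets
B = {5, 4, 3, 2, 1}
print(A == B)     # True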
A set can also be a subset of another set: set A is a subset of set B if all the elements in A exist within B.
A = {1,2,3}
B = {1,2,3,4,5,6}
A is a subset of B
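Python's set type can check this directly (a minimal sketch):

A = {1, 2, 3}
B = {1, 2, 3, 4, 5, 6}
print(A.issubset(B))   # True; equivalently, A <= B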
Law of large numbers
As the number of observations goes up, the proportion of times an event is observed will converge on its true probability.
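A minimal Python simulation of this idea, flipping a fair coin with the standard library's random module:

import random

# Track the proportion of heads as the number of flips grows
for n in [100, 10_000, 1_000_000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)   # converges toward the true probability, 0.5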
Experiments and Sample Spaces
In probability, an experiment is something that produces observation(s) with some level of uncertainty.
A sample point is a single possible outcome of an experiment.
A sample space is the set of all possible sample points for an experiment.
Consider an experiment where we flip a coin twice and record whether each flip results in heads or tails.
There are four sample points in this experiment:
two heads (HH),
tails and then heads (TH),
heads and then tails (HT),
two tails (TT).
We can write the full sample space for this experiment as follows:
S = {HH, TT, HT, TH}
A specific outcome (or set of outcomes) is known as an event and is a subset of the sample space.
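One way to enumerate this sample space and an event in Python (a sketch using the standard library's itertools):

from itertools import product

# All ordered outcomes of two coin flips
sample_space = set(product("HT", repeat=2))
print(sample_space)   # {('H','H'), ('H','T'), ('T','H'), ('T','T')}

# The event "at least one head" is a subset of the sample space
event = {outcome for outcome in sample_space if "H" in outcome}
print(event <= sample_space)   # True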
The frequentist definition of probability is as follows:
If we run an experiment an infinite number of times, the probability of each event is the proportion of times it occurs. Infinitely many trials are impossible in practice, but a large number of trials gives a good approximation.
P(Event) = (total # of occurrences of the event) / (total # of trials)
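In code, this estimate is just counting. A sketch with a hypothetical record of ten coin-flip trials:

# Hypothetical recorded outcomes of 10 trials
trials = ["H", "T", "H", "H", "T", "T", "H", "T", "T", "H"]

# P(heads) ≈ total occurrences / total trials
print(trials.count("H") / len(trials))   # 0.5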
Set Theory - Union
The union of two sets encompasses any element that exists in either one or both of them.
Given sets A and B:
A represents rolling an odd number with a six-sided die (the set {1, 3, 5}).
B represents rolling a number greater than two (the set {3, 4, 5, 6}).
The union of these two sets would be everything in either set A, set B, or both: {1, 3, 4, 5, 6}.
We can write the union of two events mathematically as (A or B), also denoted A ∪ B.
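In Python, the union of two sets is the | operator (a minimal sketch):

A = {1, 3, 5}        # odd rolls
B = {3, 4, 5, 6}     # rolls greater than two
print(A | B)         # {1, 3, 4, 5, 6}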
Set Theory - Intersection
The intersection of two sets encompasses any element that exists in both of the sets.
Given sets A and B:
A represents rolling an odd number with a six-sided die (the set {1, 3, 5}).
B represents rolling a number greater than two (the set {3, 4, 5, 6}).
The intersection includes any value that appears in both sets: {3, 5}.
We can write the intersection of two events mathematically as (A and B), also denoted A ∩ B.
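In Python, the intersection of two sets is the & operator (a minimal sketch):

A = {1, 3, 5}        # odd rolls
B = {3, 4, 5, 6}     # rolls greater than two
print(A & B)         # {3, 5}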
Set Theory - Complement
The complement of a set consists of all possible outcomes outside of the set.
Given set A from the above example (rolling an odd number on a 6-sided die), the complement would be rolling an even number: {2, 4, 6}.
We can write the complement of set A as A^C.
One key feature of complements is that a set and its complement together cover the entire sample space. In this die-roll example, the sets of odd and even numbers together cover all possible rolls: {1, 2, 3, 4, 5, 6}.
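Python has no complement operator, but subtracting a set from the sample space gives the same result (a minimal sketch):

sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}                              # odd rolls
complement = sample_space - A
print(complement)                          # {2, 4, 6}
print((A | complement) == sample_space)    # True: together they cover the sample space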
Independent vs. Dependent events
Two events are independent if the occurrence of one event does not affect the probability of the other (e.g., successive coin flips).
Two events are dependent if the occurrence of one event does affect the probability of the other (picking marbles out of a bag without replacement).
Dependent events are dealt with using conditional probability.
Two events are considered mutually exclusive if they cannot occur at the same time (e.g., a single die roll cannot be both odd and even).
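A small worked example of dependence, using exact fractions and a hypothetical bag of 3 red and 7 blue marbles drawn without replacement:

from fractions import Fraction

p_first_red = Fraction(3, 10)               # 3 of 10 marbles are red
p_second_red_given_first = Fraction(2, 9)   # one red is gone, so 2 of 9 remain

# P(both red) = P(first red) * P(second red | first red)
print(p_first_red * p_second_red_given_first)   # 1/15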
Probability Mass Function - PMF
A probability mass function (PMF) is a probability distribution that defines the probability of observing each particular value of a discrete random variable.
There are certain kinds of random variables (and associated probability distributions) that are relevant for many different kinds of problems. These commonly used probability distributions have names and parameters that make them adaptable for different situations.
For example, suppose that we flip a fair coin some number of times and count the number of heads. The probability mass function that describes the likelihood of each possible outcome (e.g., 0 heads, 1 head, 2 heads, etc.) is called the binomial distribution.
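We can compute this PMF directly from the binomial formula (a sketch using only the standard library; math.comb needs Python 3.8+):

from math import comb

def binomial_pmf(k, n, p):
    # P(exactly k successes in n independent trials, each with success probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of each possible head count in 2 fair coin flips
for k in range(3):
    print(k, binomial_pmf(k, 2, 0.5))   # 0.25, 0.5, 0.25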
Probability Density Function
Similar to how discrete random variables relate to probability mass functions, continuous random variables relate to probability density functions.
They define the probability distributions of continuous random variables and span across all possible values that the given random variable can take on.
When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1.
In a probability density function, we cannot calculate the probability at a single point. This is because the area under the curve at any single point is always zero.
We can calculate the area under the curve using the cumulative distribution function (CDF) for the given probability distribution.
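For example, the area under a standard normal curve between two points is the difference of CDF values (a sketch assuming SciPy is installed):

from scipy.stats import norm

# P(-1 < X < 1) for a standard normal: area under the PDF between -1 and 1
print(norm.cdf(1) - norm.cdf(-1))   # ~0.6827

# The area at a single point is zero, so P(X == 1) = 0
print(norm.cdf(1) - norm.cdf(1))    # 0.0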
Properties of Expectation
- The expected value of the sum of two random variables is the sum of the individual expected values:
E(X+Y)=E(X)+E(Y)
- Multiplying a random variable by a constant a changes the expected value to be a times the expected value of the random variable:
E(aX)=aE(X)
For example, the expected number of heads from 10 fair coin flips is 5. If we repeated this experiment 4 times (40 total coin flips), the expected number of heads would be 4 times the original expected value, or 20.
- Adding a constant a to a random variable changes the expected value by a:
E(X+a)=E(X)+a
Example: grading on a curve by adding 2 points to every grade raises the expected grade by 2.
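A quick numerical check of these properties, assuming NumPy is available (sample means approximate expected values):

import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(n=10, p=0.5, size=100_000)   # heads in 10 fair flips; E(X) = 5

print(x.mean())          # ~5
print((4 * x).mean())    # E(4X) = 4E(X), so ~20
print((x + 2).mean())    # E(X+2) = E(X) + 2, so ~7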
Properties of Variance
- Increasing the values in a distribution by a constant a does not change the variance:
Var(X+a)=Var(X)
This is because the variance of a constant is 0 (there is no range for a single number).
- Scaling the values of a random variable by a constant a scales the variance by the constant squared:
Var(aX) = a^2 Var(X)
- The variance of the sum of two random variables is the sum of the individual variances:
Var(X+Y)=Var(X)+Var(Y)
This principle ONLY holds if X and Y are independent random variables.
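A numerical sketch of these rules, assuming NumPy is available:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 2, size=100_000)   # Var(X) ≈ 4
y = rng.normal(0, 3, size=100_000)   # independent of x; Var(Y) ≈ 9

print(np.var(x + 10))   # Var(X+a) = Var(X), so ~4
print(np.var(3 * x))    # Var(aX) = a^2 Var(X), so ~36
print(np.var(x + y))    # independent sum: Var(X) + Var(Y), so ~13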
Biased and Unbiased Estimators
Because the mean of the sampling distribution of the mean is equal to the mean of the population, we call the sample mean an unbiased estimator. In general, a statistic is called an unbiased estimator of a population parameter if the mean of the sampling distribution of the statistic is equal to the value of that parameter for the population.
The sample maximum is one example of a biased estimator, meaning that the mean of the sampling distribution of the maximum is not centered at the population maximum.
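A simulation sketch contrasting the two, assuming NumPy and a hypothetical population of the numbers 1 through 100:

import numpy as np

rng = np.random.default_rng(0)
population = np.arange(1, 101)   # hypothetical population; mean 50.5, max 100

# Build the sampling distributions of the mean and the maximum
means = [rng.choice(population, size=10).mean() for _ in range(10_000)]
maxes = [rng.choice(population, size=10).max() for _ in range(10_000)]

print(np.mean(means))   # ~50.5: matches the population mean (unbiased)
print(np.mean(maxes))   # ~91: well below the population max of 100 (biased)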
Ordinal Variables and Mean Value
The mean is not interpretable for ordinal categorical variables because the mean relies on the assumption of equal spacing between categories.
While translating categories to numbers is often necessary to store and use the order of the categories (for calculating a statistic like the median, which only relies on ordering, not spacing), we should not use those numbers to calculate statistics — such as the mean — for which the distance between values matters.
Example: if we rate happiness from 1 to 5, is the distance between 1 and 2 the same as the distance between 4 and 5? We can't assume it is.
Many other statistics we might normally use for numerical data rely on the mean, so these statistics aren't appropriate for ordinal data either. Remember that the standard deviation and variance both depend on the mean; without a reliable mean, we can't have a reliable standard deviation or variance!
Instead, we can rely on other summary statistics, like the proportion of the data within a range, or percentiles/quantiles.
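For instance, with hypothetical happiness ratings on an ordered 1-5 scale (a NumPy sketch):

import numpy as np

# Hypothetical ratings; the codes carry order but not equal spacing
ratings = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])

print(np.median(ratings))       # 3.0: the median relies only on ordering
print(np.mean(ratings >= 4))    # 0.4: proportion who answered 4 or higher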