Stats for Data Science Flashcards

1
Q

Probability

A

This is useful to quantify uncertainty and describe random events and outcomes.

Probability allows us to measure uncertainty.

2
Q

Set Theory

A

Set theory is a branch of mathematics based around the concept of sets.
A set is a collection of things.

Notationally, mathematicians often represent sets with curly braces.

A = {1,2,3,4,5}

Sets follow two key rules:

  • Each element in a set is distinct.
  • The elements in a set are in no particular order.

A = {1,2,3,4,5}
B = {5,4,3,2,1}
A == B –> True

Sets can also contain subsets. Set A is a subset of set B if all the elements in A exist within B.
A = {1,2,3}
B = {1,2,3,4,5,6}
A is a subset of B
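
Python's built-in set type follows the same two rules, so the examples above can be checked directly. This is a minimal sketch; the names C, small, and large are illustrative and not part of the original card.

```python
# Order does not matter: two sets with the same elements are equal.
A = {1, 2, 3, 4, 5}
B = {5, 4, 3, 2, 1}
print(A == B)

# Elements are distinct: repeated values collapse into one.
C = {1, 2, 2, 3, 3, 3}
print(len(C))

# Subset check: every element of small also exists in large.
small = {1, 2, 3}
large = {1, 2, 3, 4, 5, 6}
print(small.issubset(large))
```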

3
Q

Law of large numbers

A

As the number of observations goes up, the proportion of times an event is observed will converge on its true probability.
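
A small simulation can illustrate this. The sketch below assumes a fair coin (true probability of heads 0.5); the function name and the fixed seed are illustrative choices, not part of the original card.

```python
import random

def proportion_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed proportion should drift toward 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```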

4
Q

Experiments and Sample Spaces

A

In probability, an experiment is something that produces observation(s) with some level of uncertainty.

A sample point is a single possible outcome of an experiment.

A sample space is the set of all possible sample points for an experiment.

Consider an experiment where we flip a coin twice and record whether each flip results in heads or tails.
There are four sample points in this experiment:
two heads (HH),
tails and then heads (TH),
heads and then tails (HT),
two tails (TT).

We can write the full sample space for this experiment as follows:
S={HH,TT,HT,TH}

A specific outcome (or set of outcomes) is known as an event and is a subset of the sample space.
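
The two-flip sample space can also be enumerated in code. This is a sketch only; the event "at least one head" is an illustrative example that does not appear in the original card.

```python
from itertools import product

# Every ordered pair of H/T outcomes for two flips.
sample_space = {"".join(flips) for flips in product("HT", repeat=2)}
print(sorted(sample_space))

# An event is a subset of the sample space, e.g. "at least one head".
at_least_one_head = {point for point in sample_space if "H" in point}
print(sorted(at_least_one_head))
print(at_least_one_head.issubset(sample_space))
```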

The frequentist definition of probability is as follows:
If we could run an experiment an infinite number of times, the probability of each event would be the proportion of times it occurs. An infinite number of runs is not possible, but a large number of runs gives a good approximation.

5
Q

P(Event)

A

P(Event) = number of times the event occurs / total number of trials

6
Q

Set Theory - Union

A

The union of two sets encompasses any element that exists in either one or both of them.

Given sets A and B.
A represents rolling an odd number with a six-sided die (the set {1, 3, 5}).
B represents rolling a number greater than two (the set {3, 4, 5, 6}).

The union of these two sets would be everything in either set A, set B, or both: {1, 3, 4, 5, 6}.

We can write the union of two events mathematically as (A or B).

7
Q

Set Theory - Intersection

A

The intersection of two sets encompasses any element that exists in both of the sets.

Given sets A and B
A represents rolling an odd number with a six-sided die (the set {1, 3, 5}).
B represents rolling a number greater than two (the set {3, 4, 5, 6}).

The intersection includes any value that appears in both sets: {3, 5}.

We can write the intersection of two events mathematically as (A and B).

8
Q

Set Theory - Complement

A

The complement of a set consists of all possible outcomes outside of the set.

Consider set A from the example above (rolling an odd number on a six-sided die). The complement of this set is rolling an even number: {2, 4, 6}. We can write the complement of set A as A^C.

One key feature of complements is that a set and its complement together cover the entire sample space. In this die roll example, the odd and even numbers together cover all possible rolls: {1, 2, 3, 4, 5, 6}.
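
The union, intersection, and complement from the die-roll examples map directly onto Python's set operators. This is a minimal sketch; the variable names are illustrative.

```python
sample_space = {1, 2, 3, 4, 5, 6}   # all outcomes of one six-sided die roll
A = {1, 3, 5}                       # rolling an odd number
B = {3, 4, 5, 6}                    # rolling a number greater than two

union = A | B                       # in A, in B, or in both
intersection = A & B                # in both A and B
complement_of_A = sample_space - A  # every outcome outside A

print(sorted(union))
print(sorted(intersection))
print(sorted(complement_of_A))

# A set and its complement together cover the whole sample space.
print(A | complement_of_A == sample_space)
```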

9
Q

Independent vs. Dependent events

A

Two events are independent if the occurrence of one event does not affect the probability of the other (for example, two separate coin flips).

Two events are dependent if the occurrence of one event does affect the probability of the other (picking marbles out of a bag without replacement).

Dependent events are dealt with using conditional probability.

Two events are considered mutually exclusive if they cannot occur at the same time.
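
The marble example can be worked through with exact fractions. The bag contents below (3 red, 2 blue) are a hypothetical choice made for illustration; the original card does not specify them.

```python
from fractions import Fraction

red, blue = 3, 2          # hypothetical bag: 3 red marbles, 2 blue marbles
total = red + blue

p_first_red = Fraction(red, total)

# Dependent events: without replacement, the first draw changes the bag,
# so the second probability is conditional on the first outcome.
p_second_red_given_first_red = Fraction(red - 1, total - 1)
p_both_red_without_replacement = p_first_red * p_second_red_given_first_red

# Independent events: with replacement, the second draw sees the same bag.
p_both_red_with_replacement = p_first_red * p_first_red

print(p_both_red_without_replacement)
print(p_both_red_with_replacement)
```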

10
Q

Probability Mass Function - PMF

A

Probability distribution that defines the probability of observing a particular value of a discrete random variable.

There are certain kinds of random variables (and associated probability distributions) that are relevant for many different kinds of problems. These commonly used probability distributions have names and parameters that make them adaptable for different situations.

For example, suppose that we flip a fair coin some number of times and count the number of heads. The probability mass function that describes the likelihood of each possible outcome (e.g., 0 heads, 1 head, 2 heads, etc.) is called the binomial distribution.
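
The binomial PMF for the coin-flip example can be computed from its standard formula, P(k heads) = C(n, k) * p^k * (1 - p)^(n - k). The sketch below assumes 10 flips of a fair coin; the function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips = 10
pmf = [binomial_pmf(k, n_flips, 0.5) for k in range(n_flips + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))

# The probabilities of all possible outcomes add up to 1.
print(sum(pmf))
```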

11
Q

Probability Density Function

A

Similar to how discrete random variables relate to probability mass functions, continuous random variables relate to probability density functions.

They define the probability distributions of continuous random variables and span across all possible values that the given random variable can take on.

When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1.

For a continuous random variable, the probability of any single exact value is zero, because the area under the curve above a single point is zero. Instead, we calculate the probability that the variable falls within a range.

We can calculate the area under the curve over a range using the cumulative distribution function (CDF) of the given probability distribution: P(a <= X <= b) = CDF(b) - CDF(a).
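
As an illustration, the probability of a range is the difference between two CDF values. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1), which the original card does not mention, and uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def probability_between(dist, low, high):
    """P(low <= X <= high) as the difference of two CDF values."""
    return dist.cdf(high) - dist.cdf(low)

# Area under the curve within one standard deviation of the mean.
print(probability_between(standard_normal, -1, 1))   # about 0.68

# A range of zero width has zero area, so a single point has probability 0.
print(probability_between(standard_normal, 1, 1))
```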

12
Q

Properties of Expectation

A
  1. The expected value of the sum of two random variables is the sum of their individual expected values. This holds whether or not the variables are independent:

E(X+Y)=E(X)+E(Y)

  2. Multiplying a random variable by a constant a changes the expected value to be a times the expected value of the random variable:

E(aX)=aE(X)

For example, the expected number of heads from 10 fair coin flips is 5. If we wanted the expected number of heads when this experiment is run 4 times (40 total coin flips), it would be 4 times the original expected value, or 20.

  3. Adding a constant a to a random variable shifts the expected value by a:

E(X+a)=E(X)+a

For example, grading on a curve by adding 2 points to every grade raises the average grade by 2 points.

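
These properties can be checked exactly on a small distribution. The sketch below uses a fair six-sided die as the random variable, which is an illustrative choice; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Distribution of one fair six-sided die: value -> probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(dist):
    return sum(value * prob for value, prob in dist.items())

def transform(dist, func):
    """Distribution of func(X) when X follows dist."""
    out = {}
    for value, prob in dist.items():
        new_value = func(value)
        out[new_value] = out.get(new_value, 0) + prob
    return out

# Distribution of X + Y for two dice rolled separately.
two_dice = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    two_dice[x + y] = two_dice.get(x + y, 0) + px * py

e_x = expected_value(die)
print(e_x)                                              # E(X)
print(expected_value(two_dice))                         # E(X + Y)
print(expected_value(transform(die, lambda v: 4 * v)))  # E(4X)
print(expected_value(transform(die, lambda v: v + 2)))  # E(X + 2)
```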
13
Q

Properties of Variance

A
  1. Increasing the values in a distribution by a constant a does not change the variance:

Var(X+a)=Var(X)

This is because the variance of a constant is 0 (there is no spread in a single number), so shifting every value moves the distribution without changing its spread.

  2. Scaling the values of a random variable by a constant a scales the variance by the constant squared:

Var(aX)=a^2 Var(X)

  3. The variance of the sum of two random variables is the sum of the individual variances:

Var(X+Y)=Var(X)+Var(Y)

This principle holds when X and Y are independent random variables. It does not hold in general for dependent variables.
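
The three properties can be verified exactly on a fair six-sided die, including a case where the sum rule fails. This is a sketch under that assumption; the fully dependent case Y = X is an illustrative counterexample, not part of the original card.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

def variance(values_and_probs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    pairs = list(values_and_probs)
    mean = sum(v * p for v, p in pairs)
    return sum(p * (v - mean) ** 2 for v, p in pairs)

var_x = variance(die.items())
var_shifted = variance((v + 2, p) for v, p in die.items())   # Var(X + 2)
var_scaled = variance((3 * v, p) for v, p in die.items())    # Var(3X)

# Independent X and Y: each pair of faces has probability 1/6 * 1/6.
var_sum_independent = variance(
    (x + y, px * py) for x, px in die.items() for y, py in die.items()
)

# Fully dependent case, Y = X: the sum is just 2X.
var_sum_dependent = variance((v + v, p) for v, p in die.items())

print(var_x, var_shifted, var_scaled)
print(var_sum_independent, var_sum_dependent)
```

In the dependent case the variance of the sum is 4 Var(X) rather than 2 Var(X), which is why the sum rule needs the independence condition.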

14
Q

Biased and Unbiased Estimators

A

Because the mean of the sampling distribution of the sample mean is equal to the mean of the population, we call the sample mean an unbiased estimator. A statistic is called an unbiased estimator of a population parameter if the mean of the sampling distribution of the statistic is equal to the value of that parameter in the population.

The maximum is one example of a biased estimator, meaning that the mean of the sampling distribution of the maximum is not centered at the population maximum.
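
A simulation makes the difference visible. The population below (the integers 1 to 100), the sample size of 10, and the number of repetitions are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)             # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical population: integers 1..100

sample_means, sample_maxes = [], []
for _ in range(5_000):
    sample = rng.sample(population, 10)   # random sample of size 10
    sample_means.append(mean(sample))
    sample_maxes.append(max(sample))

# Unbiased: the sampling distribution of the mean centers on the population mean.
print(mean(population), mean(sample_means))

# Biased: the sampling distribution of the maximum sits below the population maximum.
print(max(population), mean(sample_maxes))
```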

15
Q

Ordinal Variables and Mean Value

A

The mean is not interpretable for ordinal categorical variables because the mean relies on the assumption of equal spacing between categories.

While translating categories to numbers is often necessary to store and use the order of the categories (for calculating a statistic like the median, which only relies on ordering, not spacing), we should not use those numbers to calculate statistics — such as the mean — for which the distance between values matters.

Example: if people rate their happiness from 1 to 5, is the distance between 1 and 2 the same as the distance between 4 and 5?

Many other statistics we might normally use for numerical data rely on the mean. Because of this, these statistics aren’t appropriate for ordinal data. Remember that the standard deviation and variance both depend on the mean. Without a mean, we can’t have a reliable standard deviation or variance either!

Instead, we can rely on other summary statistics, like the proportion of the data within a range, or percentiles/quantiles.
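
As a sketch of these alternatives, the code below summarizes a set of 1-5 happiness ratings using only their order. The responses are hypothetical data invented for illustration. `statistics.median_low` is used because it always returns a value that actually appears in the data rather than averaging two categories.

```python
from statistics import median_low

# Hypothetical survey responses on a 1-5 happiness scale.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5]

# The median relies only on ordering, not on spacing between categories.
median_response = median_low(responses)

# Proportion of the data within a range of categories (here: 4 or 5).
share_4_or_5 = sum(r >= 4 for r in responses) / len(responses)

print(median_response)
print(share_4_or_5)
```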

16
Q

Hypothesis Testing

A

Hypothesis testing is a framework for asking questions about a dataset and answering them with probabilistic statements. There are many different kinds of hypothesis tests.

Step 1 - Ask a question
Step 2 - Define the null and alternative hypotheses
Step 3 - Determine the null distribution
Step 4 - Calculate a p-value (or confidence interval)
Step 5 - Interpret the results

Researchers collect data and conduct statistical analysis to either reject or fail to reject the null hypothesis based on the evidence obtained. If the data provides strong evidence against the null hypothesis, it suggests that there is a significant effect or difference and supports the alternative hypothesis. However, if the data does not provide enough evidence to reject the null hypothesis, the observed differences or effects are consistent with random chance. This does not prove that the null hypothesis is true.

17
Q

Hypothesis Testing - Ask A question

A

You’re asking what the relationship is between two or more variables.

Ex:
If 100 students take a prep course and then score higher on an exam, what is the relationship between the score and the prep course? Could the higher score be due to random chance?

18
Q

Hypothesis Testing - Define Null and Alternative

A

Create two competing hypotheses about the population that your samples are coming from and that you’re trying to learn something about.

Null (H0)
The null hypothesis should state that the results of your sampling are due to random chance. Even if you get a mean in your sample that is different from the mean in the population, you can still just be looking at randomness and chance.

Alternative (H1 or Ha)
The alternative hypothesis is a statement that contradicts the null hypothesis. It is typically formulated to reflect the research question or the anticipated outcome. It states the specific relationship that researchers expect to find and suggests that there is a meaningful and statistically significant relationship or difference between variables.

We are dealing with 2 imaginary populations. One where there is no relationship (H0) and one where there is a relationship (H1).

Which is the sample data reflecting?

19
Q

Hypothesis Testing - Determine Null Distribution

A

Now that we have our null hypothesis, we can generate a null distribution: the distribution (across repeated samples) of the statistic we are interested in if the null hypothesis is true. Statistical theory allows us to estimate the shape of this distribution using a single sample.

Simulate the null distribution as follows:

  1. Take many random samples of equal size n from the population.
  2. Calculate and store the average score for each sample.
  3. Plot a histogram of the average scores. Because of the CLT, the histogram should be approximately normal, and it should be centered on the mean of the population.

If the null hypothesis is true, the sample values (even divergent ones) are just values coming from different parts of this null distribution.

So what are the chances we would see these values?
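
The simulation steps can be sketched as follows. Everything about the population is an assumption made for illustration: it is generated at random with a center of 29.92 (the null population mean used in these flashcards) and a spread of 7 points, which does not come from the source. Plotting the histogram is left out to keep the sketch dependency-free.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores; the spread of 7 is an assumption.
population = [rng.gauss(29.92, 7) for _ in range(10_000)]

def average(values):
    return sum(values) / len(values)

def simulate_null_distribution(population, sample_size, n_samples, rng):
    """Draw many equal-sized random samples and store each sample's average."""
    return [
        average(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

null_means = simulate_null_distribution(
    population, sample_size=100, n_samples=1_000, rng=rng
)

# The center of the null distribution should sit close to the population mean.
print(round(average(population), 2), round(average(null_means), 2))
```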

20
Q

Hypothesis Testing - Calculate P-value

A

Given that the null hypothesis is true, how likely is it that random chance alone would produce sample scores at least as extreme as the ones we’re seeing?

The probability of any exact average score is very small, so we really want to estimate the probability of a range of scores.

Given a sample of 100 with a mean of 31.16
A null population mean of 29.92
The alternative hypothesis is that the population mean is greater than the null population mean (one tail on the right).

We can also ask whether the population mean is different from the null population mean (two tails), or less than the null population mean (one tail on the left).
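
Given a simulated null distribution, each kind of p-value is the proportion of simulated statistics at least as extreme as the observed one. The sketch below uses a tiny hand-made list of null statistics so the arithmetic is easy to follow; the numbers and function names are hypothetical and unrelated to the test-score example.

```python
def p_value_right(null_stats, observed):
    """One tail on the right: share of null statistics >= observed."""
    return sum(s >= observed for s in null_stats) / len(null_stats)

def p_value_left(null_stats, observed):
    """One tail on the left: share of null statistics <= observed."""
    return sum(s <= observed for s in null_stats) / len(null_stats)

def p_value_two_sided(null_stats, observed, null_center):
    """Two tails: share of null statistics at least as far from the center."""
    distance = abs(observed - null_center)
    return sum(abs(s - null_center) >= distance for s in null_stats) / len(null_stats)

# Toy null distribution centered on 30 (hypothetical values).
null_stats = [27, 28, 29, 30, 30, 30, 30, 31, 32, 33]
observed = 32

print(p_value_right(null_stats, observed))
print(p_value_left(null_stats, observed))
print(p_value_two_sided(null_stats, observed, null_center=30))
```

With a realistic null distribution the list would hold thousands of simulated sample means instead of ten values.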

21
Q

Hypothesis Testing - Interpret Results

A

Suppose we get a p-value of 0.031. The interpretation of this number is as follows:

If the 100 students at Statistics Academy were randomly selected from the full population (which had an average score of 29.92), there is a 3.1% chance of their average score being 31.16 points or higher.

This means that it is relatively unlikely, but not impossible, that the Statistics Academy students scored higher (on average) than their peers by random chance, despite no real difference at the population level.

The observed data is unlikely if the null hypothesis is true.

We have directly tested the null hypothesis, but not the alternative hypothesis!

We therefore need to be a little careful about how we interpret this test. We cannot say that we’ve proved that the alternative hypothesis is the truth, only that the data we collected would be unlikely under the null hypothesis, and therefore we believe that the alternative hypothesis is more consistent with our observations.

22
Q

Error Types

A

Whenever we run a hypothesis test using a significance threshold, we expose ourselves to making two different kinds of mistakes:

Type I errors (false positives):
Getting a significant p-value when the null hypothesis is actually true. The p-value is falsely significant.

When we run a hypothesis test with a significance threshold, the significance threshold is equal to the type I error (false positive) rate for the test.

Type II errors (false negatives):
Getting a non-significant p-value when the null hypothesis is actually false. The p-value is falsely non-significant.

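Running multiple significance tests increases the chance of at least one false positive. With a significance threshold of 0.05 and independent tests, the probability of getting no false positives shrinks with each additional test:
1 test = 0.95
2 tests = 0.95*0.95 = 0.9025
and so on.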
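
The same arithmetic can be written as a small function. This sketch assumes the tests are independent, that every null hypothesis is true, and that the threshold is 0.05 (implied by the 0.95 figure above); the function name is illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives across n independent tests."""
    prob_no_false_positive = (1 - alpha) ** n_tests
    return 1 - prob_no_false_positive

for n in (1, 2, 5, 10, 20):
    print(n, round(prob_at_least_one_false_positive(n), 4))
```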