Basics of Stats Flashcards
What are the steps to calculate the standard deviation?
Characteristics of Normal Distributions
- Symmetric around the mean
- The mean, median, and mode are identical
- Denser in the center, less dense in the tails
- Defined by the mean and the standard deviation
- 68% of the distribution falls within ≈ 1 standard deviation of the mean
- 95% of the distribution falls within ≈ 2 standard deviations of the mean
- Many distributions in the real world approximate normal distributions
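The 68% and 95% figures can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name share_within is an illustrative choice, not something from the cards.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = share_within(1)  # roughly 0.68
within_2 = share_within(2)  # roughly 0.95
print(f"within 1 SD: {within_1:.4f}, within 2 SD: {within_2:.4f}")
```

Because every normal distribution is a shifted and rescaled standard normal, the same shares hold for any mean and standard deviation.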
Formula for Standardisation
Law of Large Numbers
The Law of Large Numbers states that the average of the results
obtained from a large number of independent random samples (x̄)
converges to the true value (µ).
The observations in each sample have to be independently and
identically distributed (i.i.d.), meaning each observation is equally likely
to be sampled and sampling one observation does not make it more
likely that another specific observation will be sampled → random
sample
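A small simulation can make the Law of Large Numbers concrete. This sketch assumes a fair six-sided die as the population (true mean µ = 3.5); the die, the seed, and the sample sizes are illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

TRUE_MEAN = 3.5  # expected value of a fair six-sided die

def mean_of_die_rolls(n: int) -> float:
    """Average of n independent rolls of a fair die (an i.i.d. sample)."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

small_sample_mean = mean_of_die_rolls(10)
large_sample_mean = mean_of_die_rolls(100_000)

print(f"n = 10:      x̄ = {small_sample_mean:.3f}")
print(f"n = 100000:  x̄ = {large_sample_mean:.3f}")
```

With only 10 rolls the average can land well away from 3.5, while with 100,000 rolls it should sit very close to it.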
Central Limit Theorem
The Central Limit Theorem states that given a population with mean µ
and variance σ², the sampling distribution of the mean (the distribution
of the means of many random samples) approaches a normal distribution
with a mean of µ and a variance of σ²/n, as the sample size n increases.
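The claim about the mean and variance of the sampling distribution can be illustrated by simulation. This sketch assumes an exponential population with rate 1 (so µ = 1 and σ² = 1), which is strongly skewed; the sample size and number of samples are arbitrary illustrative values.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the run is reproducible

SAMPLE_SIZE = 50     # n, observations per sample
NUM_SAMPLES = 4000   # number of random samples drawn

POP_MEAN = 1.0       # µ of an exponential population with rate 1
POP_VARIANCE = 1.0   # σ² of the same population

# Draw many samples and record the mean of each one.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(NUM_SAMPLES)
]

mean_of_means = mean(sample_means)          # should be close to µ
variance_of_means = variance(sample_means)  # should be close to σ² / n
expected_variance = POP_VARIANCE / SAMPLE_SIZE

print(f"mean of sample means:     {mean_of_means:.3f}")
print(f"variance of sample means: {variance_of_means:.4f}")
print(f"sigma^2 / n:              {expected_variance:.4f}")
```

Plotting a histogram of sample_means would also show the bell shape, even though the population itself is far from normal.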
The Central Limit Theorem implies:
Confidence Intervals
We can construct a set of values that contains the true population
mean µ with a certain probability (the percentage of times that the set
of values would contain the true population mean if we repeated the
sampling process indefinitely).
We specify the probability that this set of values contains the true
population mean in advance (e.g. 80% or 95%). This prespecified
probability is called the confidence level.
The set of values we construct based on the confidence level is called a
confidence interval.
How to Calculate a Confidence Interval
To calculate a confidence interval, add and subtract a multiple of the standard error (SE) from the sample mean ȳ. The multiplier depends on the desired confidence level:
95% confidence interval: [ȳ − 1.96 × SE(ȳ), ȳ + 1.96 × SE(ȳ)], i.e. the sample mean plus or minus 1.96 times the standard error.
68% confidence interval: use a multiplier of approximately 1.
99% confidence interval: use a multiplier of approximately 2.58.
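As a rough sketch of this calculation in Python: the function name z_confidence_interval and the sample values are hypothetical, and the multiplier is taken from the standard normal distribution, which is a simplification for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) = sample mean ± multiplier × standard error."""
    n = len(sample)
    y_bar = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the mean
    # Multiplier from the standard normal, e.g. about 1.96 for 95%.
    multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return y_bar - multiplier * se, y_bar + multiplier * se

# Hypothetical measurements, for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

low_95, high_95 = z_confidence_interval(sample, 0.95)
low_99, high_99 = z_confidence_interval(sample, 0.99)
print(f"95% CI: [{low_95:.3f}, {high_95:.3f}]")
print(f"99% CI: [{low_99:.3f}, {high_99:.3f}]")
```

The 99% interval comes out wider than the 95% interval because a higher confidence level requires a larger multiplier.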
Finding Critical Values
Once you’ve chosen what confidence level you want to use, you
determine the corresponding critical value (zα/2) — the number by which
you multiply the standard error to get the appropriate margin of error.
α refers to the significance level, which is just 1 − the confidence level
(e.g. a 90% confidence level corresponds to a significance level of
α = 0.1)
We often use the critical values corresponding to α/2 instead of α to
account for the two extremes or tails of the distribution
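The link between confidence level, α, and the critical value can be sketched with the standard library's inverse normal CDF. The function name critical_value is an illustrative choice.

```python
from statistics import NormalDist

def critical_value(confidence_level: float) -> float:
    """Two-sided critical value z_(alpha/2) from the standard normal distribution."""
    alpha = 1 - confidence_level  # significance level
    # Put alpha/2 in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

z_90 = critical_value(0.90)  # about 1.645
z_95 = critical_value(0.95)  # about 1.960
z_99 = critical_value(0.99)  # about 2.576
print(f"90%: {z_90:.3f}, 95%: {z_95:.3f}, 99%: {z_99:.3f}")
```

Using 1 − α instead of 1 − α/2 in the lookup would give a one-sided critical value, which is smaller for the same α.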
What is a t distribution
When the population standard deviation is unknown and has to be
estimated from the sample, it is more accurate to use the t distribution,
a slight variant on the standard normal distribution with heavier tails.
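One way to see why the normal multiplier is too small when the standard deviation is estimated is to simulate coverage. This sketch assumes a standard normal population with true mean 0 and builds "95%" intervals with the multiplier 1.96 and the sample standard deviation; the sample sizes and trial counts are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is reproducible

def z_interval_coverage(n: int, trials: int) -> float:
    """Fraction of 1.96-based intervals (with estimated SD) containing the true mean 0."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        y_bar = mean(sample)
        se = stdev(sample) / sqrt(n)  # SD is estimated from the sample
        if y_bar - 1.96 * se <= 0.0 <= y_bar + 1.96 * se:
            hits += 1
    return hits / trials

coverage_small_n = z_interval_coverage(n=5, trials=10_000)
coverage_large_n = z_interval_coverage(n=200, trials=4_000)
print(f"n = 5:   coverage = {coverage_small_n:.3f}")
print(f"n = 200: coverage = {coverage_large_n:.3f}")
```

For n = 5 the intervals contain the true mean noticeably less often than 95% of the time, while for n = 200 the coverage is close to 95%. The t distribution corrects for this by using larger critical values at small sample sizes.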
Confidence Intervals using t distribution
Hypotheses
Testing the Null Hypothesis
Using inferential statistics, we calculate a p-value (or probability value).
The p-value indicates the probability of observing an outcome or
statistic at least as extreme as the one in the data, assuming that the
null hypothesis is true.
If the p-value is low, that means there is not much chance we would
observe the outcome or statistic we see in the data if the null
hypothesis were true. That suggests to us that the null hypothesis
might not be true.
→ if the p-value is low enough, we reject the null hypothesis
Steps for Testing the Null Hypothesis
* Specify the null hypothesis
* Specify the significance level α (usually 0.05 or 0.01)
* Compute the p-value
* Compare the p-value with α. Reject the null hypothesis if the p-value
is lower than α
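The steps above can be sketched as a two-sided z-test. This is a minimal illustration that assumes the population standard deviation sigma is known; the function name two_sided_z_test, the sample values, mu_0, and sigma are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu_0, sigma, alpha=0.05):
    """Test H0: population mean == mu_0, assuming a known population SD sigma."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu_0 in standard errors.
    z = (mean(sample) - mu_0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null

# Hypothetical data, for illustration only.
sample = [5.4, 5.6, 5.2, 5.8, 5.5, 5.3, 5.7, 5.6, 5.4, 5.5]

# Step 1: null hypothesis mu = 5.0. Step 2: alpha = 0.05.
z, p_value, reject_null = two_sided_z_test(sample, mu_0=5.0, sigma=0.5, alpha=0.05)
# Steps 3 and 4: compute the p-value and compare it with alpha.
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Here the sample mean of 5.5 lies more than three standard errors from 5.0, so the p-value falls below α and the null hypothesis is rejected. When sigma is unknown and estimated from the sample, a t statistic would be the more appropriate choice.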
Z score vs T stat for p value