Week 2: Normal Distribution, Inference, and Confidence Intervals Flashcards
What is a normal distribution?
A normal (or Gaussian) distribution is a continuous, bell-shaped, symmetric distribution where the mean, median, and mode coincide at the centre.
What are the main properties of a normal curve?
- Bell-shaped and symmetric
- Mean = median = mode
- Continuous: the variable can take any real value, not only discrete values
- Asymptotic: the tails approach but never touch the x-axis
- Total area under the curve = 1 (area in each half of the distribution is 0.5)
Describe the Empirical Rule for a normal distribution.
- About 68% of data lie within 1 standard deviation (SD) of the mean
- About 95% lie within 2 SDs
- About 99.7% lie within 3 SDs
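The percentages in the Empirical Rule are rounded. A minimal Python sketch, using only the standard library's `statistics.NormalDist` (Python 3.8+), computes the exact areas within 1, 2 and 3 SDs of the mean on the standard normal curve. The helper name `area_within` is illustrative, not from the source.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
z_dist = NormalDist(mu=0, sigma=1)

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k SDs."""
    return z_dist.cdf(k) - z_dist.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

Running it prints 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% of the rule.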
What is a standard normal distribution?
It’s a normal distribution (also called the z distribution) with a mean of 0 and a standard deviation of 1. Its values are expressed as z-scores (also called standard units or standard scores).
Define z-score and its purpose.
A z-score measures how many standard deviations a data point is from the mean. It allows comparison across different distributions by converting values to a standard scale, and it can be negative or positive. In real-world applications, data may be normally distributed but are unlikely to have mean = 0 and SD = 1, so we standardise them to the standard normal (z) distribution. Tables of z-score probabilities can then be used to find areas (i.e., probabilities) under the standard normal curve.
What is the z-score formula?
z = (x − μ) / σ
where x is a value from the dataset, μ is the mean, and σ is the standard deviation.
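A minimal sketch of the z-score formula in Python. The function name `z_score` and the two test scores, means and SDs are made-up illustration values (not from the source); they show how z-scores let you compare results from distributions with different scales.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical example: the same student sits two tests with different scales
z_a = z_score(75, mu=65, sigma=5)   # Test A
z_b = z_score(80, mu=70, sigma=10)  # Test B
print(z_a, z_b)  # 2.0 1.0
```

Although the raw score on Test B is higher, the Test A result is further above its own mean (2 SDs versus 1 SD). A score below the mean, such as `z_score(60, 65, 5)`, gives a negative z-score of -1.0.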
What do positive and negative z-scores represent?
A positive z-score means the data point is above the mean, while a negative z-score indicates it is below the mean.
What does a z-score table provide?
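It gives the cumulative probability for a given z-score, usually the area under the standard normal curve to the left of z (some tables instead give the area between the mean and z). This is useful for finding the proportion of data above, below, or between values in a normal distribution.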
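In code, the cumulative distribution function (CDF) plays the role of a z table. A minimal sketch using Python's standard-library `statistics.NormalDist`; the variable N(100, 15) and the cut-off values 85, 115 and 130 are hypothetical examples, and the helper `z` is illustrative only.

```python
from statistics import NormalDist

# Hypothetical normally distributed variable: mean 100, SD 15
mu, sigma = 100, 15
std_normal = NormalDist()  # mean 0, SD 1, like a z table

def z(x: float) -> float:
    """Standardise a raw value to a z-score."""
    return (x - mu) / sigma

# Proportion below 130: standardise (z = 2.0), then take the left-tail area
below = std_normal.cdf(z(130))
# Proportion above 130: the complement of the left-tail area
above = 1 - below
# Proportion between 85 and 115 (z = -1 and +1): difference of two left-tail areas
between = std_normal.cdf(z(115)) - std_normal.cdf(z(85))

print(f"{below:.4f} {above:.4f} {between:.4f}")  # 0.9772 0.0228 0.6827
```

The "above" area is 1 minus the table value because the total area under the curve is 1.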
Explain inferential statistics.
Inferential statistics use sample statistics to draw conclusions about population parameters (for example, by estimating a parameter or testing a hypothesis about it), allowing conclusions that go beyond the data actually collected.
Differentiate between parameter and statistic.
- Parameter: A characteristic of a population (e.g., population mean μ).
- Statistic: A characteristic of a sample, used to estimate a parameter (e.g., sample mean x̅).
What is sampling error?
Sampling error is the difference between a sample statistic and the actual population parameter, arising from chance variation in which members of the population happen to be selected.
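A small simulation makes the idea concrete. This sketch assumes a made-up population (the whole numbers 1 to 100, so μ = 50.5) and draws five random samples of size 10; each sample mean misses μ by a different amount purely by chance.

```python
import random
from statistics import fmean

population = list(range(1, 101))      # hypothetical population
mu = fmean(population)                # population parameter: 50.5

rng = random.Random(42)               # fixed seed so the run is repeatable
errors = []
for _ in range(5):
    sample = rng.sample(population, 10)   # random sample without replacement
    x_bar = fmean(sample)                 # sample statistic
    errors.append(x_bar - mu)             # sampling error for this sample
    print(f"sample mean = {x_bar:5.1f}, sampling error = {x_bar - mu:+5.1f}")
```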
Define confidence interval (CI) and confidence level.
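A CI is a range of values, calculated from sample data, that is likely to contain the population parameter, reported with a specific level of confidence (e.g., 95%).
The confidence level is the long-run proportion of such intervals, constructed the same way from repeated random samples, that would contain the true parameter. Any single calculated interval either contains the parameter or it does not; the 95% describes the reliability of the method.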
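The long-run meaning of a confidence level can be checked by simulation. This sketch assumes a hypothetical normal population with known μ = 50 and σ = 10, repeatedly draws samples of n = 25, builds a 95% interval x̅ ± z × σ/√n from each, and counts how often the interval captures μ. Roughly 95% of the intervals should.

```python
import math
import random
from statistics import NormalDist, fmean

mu, sigma, n = 50.0, 10.0, 25          # hypothetical population and sample size
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
se = sigma / math.sqrt(n)              # standard error of the mean (sigma known)

rng = random.Random(1)                 # fixed seed so the run is repeatable
trials = 2000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
    if lower <= mu <= upper:           # did this interval capture the true mean?
        hits += 1

coverage = hits / trials
print(f"proportion of intervals containing mu: {coverage:.3f}")
```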
How do you calculate a confidence interval?
For a mean, use x̅ ± z × SE, where x̅ is the sample mean, z is the z-score for the chosen confidence level (e.g., about 1.96 for 95%), and SE is the standard error of the mean, σ/√n (estimated by s/√n when σ is unknown).
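A minimal sketch of the calculation. The function name `mean_ci` and the summary numbers (x̅ = 50, s = 10, n = 100) are hypothetical; the sample SD stands in for σ, which is a reasonable approximation for a large sample, and z is taken from the standard normal rather than hard-coded.

```python
import math
from statistics import NormalDist

def mean_ci(x_bar: float, sd: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """z-based confidence interval for a mean: x_bar ± z × SE."""
    se = sd / math.sqrt(n)                               # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    margin = z * se
    return x_bar - margin, x_bar + margin

low, high = mean_ci(50, 10, 100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (48.04, 51.96)
```

Raising `confidence` to 0.99 uses z ≈ 2.58 and gives a wider interval, roughly (47.42, 52.58).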
What is the Central Limit Theorem (CLT)?
The CLT states that, for independent random samples from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. As a rule of thumb, the approximation is usually adequate once n is about 30 or more, so the sample mean x̅ can then be treated as approximately normally distributed.
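A quick simulation illustrates the theorem. This sketch assumes a strongly right-skewed population, the exponential distribution with mean 1 and SD 1 (chosen only for illustration), draws 2,000 samples of size n = 30, and summarises the sample means. Their average should sit close to 1 and their SD close to σ/√n ≈ 0.18, and a histogram of them would look roughly bell-shaped even though the population is not.

```python
import math
import random
from statistics import fmean, stdev

# Assumed skewed population: exponential with rate 1 (mean 1, SD 1)
pop_mean, pop_sd = 1.0, 1.0
n, reps = 30, 2000                 # sample size and number of simulated samples

rng = random.Random(0)             # fixed seed so the run is repeatable

# Sampling distribution of the mean: one sample mean per simulated sample
means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

print(f"mean of sample means: {fmean(means):.3f} (population mean {pop_mean})")
print(f"SD of sample means:   {stdev(means):.3f} "
      f"(sigma/sqrt(n) = {pop_sd / math.sqrt(n):.3f})")
```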
Why is the Central Limit Theorem important?
It allows normal probabilities to be used for sample means even when the population is not normally distributed, which underpins confidence intervals and hypothesis testing. In addition, the mean of the sampling distribution of the mean equals the population mean μ, so the sample mean is an unbiased estimator of μ.