Unit 8: Central Limit Theorem Flashcards
T score
mean is 50, sd is 10
z score (standard score)
mean is 0, sd is 1
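A minimal sketch of how the two scales relate, assuming a hypothetical raw score of 130 on a test with mean 100 and sd 15 (invented numbers; the helper names `z_score` and `t_score` are illustrative only):

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def t_score(z):
    """Rescale a z score onto the T-score scale (mean 50, sd 10)."""
    return 50 + 10 * z


# Hypothetical raw score, mean, and sd (not from the flashcards)
z = z_score(130, mean=100, sd=15)
t = t_score(z)
print(z, t)
```

A raw score two standard deviations above the mean has z = 2, which sits 20 points above 50 on the T-score scale.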
types of Population
Target population
Study population
Target population
a large set of people/all the individuals in the group you are trying to draw conclusions (inferences) about
Study population
sampling frame (or sample pool)
Sample
- A set of individuals selected from the population
- Descriptive statistics
- Tells us about a set of data
- Example: sample mean
- Inferential statistics
- Calculated from the sample data; tells us about the underlying population
statistic
is a number that describes some characteristic of a sample.
The value of a statistic can be computed directly from the sample data.
We often use a statistic to estimate an unknown parameter.
parameter
is a number that describes some characteristic of the
population. In statistical practice, the value of a p
statistics come from
samples
parameters come from
populations
Statistical inference
use samples to say something about an underlying population
- How many M&Ms are in a fun size bag?
- How many red M&Ms (proportion wise) in a bag?
Random sampling
the process of obtaining a random sample such that:
- All members have same probability of being chosen
- All chosen members are independent of one another
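One way to picture this, as a sketch only: Python's `random.sample` draws a simple random sample without replacement, so every member of the list has the same chance of being chosen. The population of 1,000 ID numbers and the sample size of 10 are assumptions made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: ID numbers for 1,000 people
population = list(range(1000))

# Simple random sample of 10 IDs; every ID has the same chance of selection
srs = rng.sample(population, k=10)
print(srs)
```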
Sampling error
degree to which the sample differs from the population; due to chance
- How much does my sample of M&M fun size bags differ from the population’s?
- How much does my sample of red M&Ms differ from that of the population? (by chance)
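The red M&M question can be sketched in code. The population below is invented (10,000 candies, 13% red, not a real figure), and the sampling error is taken as the sample proportion minus the population proportion:

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 M&Ms, 13% red (an invented figure)
population = [1] * 1300 + [0] * 8700           # 1 = red, 0 = not red
parameter = sum(population) / len(population)  # population proportion

sample = rng.sample(population, k=50)          # one random sample of 50 candies
statistic = sum(sample) / len(sample)          # sample proportion

sampling_error = statistic - parameter         # differs from 0 only by chance
print(parameter, statistic, sampling_error)
```

Changing the seed gives a different sample and therefore a different sampling error, even though the population never changes.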
Sampling distribution
Probability distribution of a given statistic (see below for examples) based on random sampling
* Mean
* Median
* Standard deviation
* Correlation, etc.
We need to be able to describe the sampling distribution of the possible values of the mean in order to perform statistical inference.
The Distribution of Sample Means
- Draw multiple random samples from a larger population
- Draw with replacement
- For each sample, we calculate a mean value
- We plot the mean on a frequency distribution
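The four steps above can be sketched as a small simulation. The population (a fair six-sided die), the 1,000 samples, and the sample size of 30 are all assumptions chosen for illustration; `random.choices` draws with replacement.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: the faces of a fair six-sided die
population = [1, 2, 3, 4, 5, 6]

sample_means = []
for _ in range(1000):                       # number of samples
    sample = rng.choices(population, k=30)  # sample size n = 30, with replacement
    sample_means.append(mean(sample))       # one mean per sample

# Frequency distribution of the means, rounded to one decimal place
freq = Counter(round(m, 1) for m in sample_means)
for value in sorted(freq):
    print(f"{value:.1f} {'#' * (freq[value] // 5)}")
```

The printed rows form a rough text histogram; the bars should pile up near the population mean of 3.5.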
SRS
simple random sample
sample size
number of observations in the sample
* n = 100 means you have 100 observations
number of samples
not the same as the number of observations
- 10 samples, each with a sample size of 100
- How many observations are there in total? 10 × 100 = 1,000
The concept of sampling “with replacement” suggests that a person’s selection in one sample does NOT affect their probability of being selected in another sample
TRUE OR FALSE
true
Central Limit Theorem
Distribution of sample means will be approximately normal when the sample size is large enough, even if the underlying distribution (of the population) is not normal
* This happens because of the process of random sampling
helps us go from sample → population
* The sampling distribution of the sample mean will be approximately normal
* As sample size increases, the sampling error decreases (you become more precise with your estimate)
* You randomly select participants
* Theoretically, everyone should have an equal chance of being selected as a participant
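A rough simulation of the theorem, under assumptions chosen for illustration: the population is exponential with mean 1 (strongly right-skewed), and each of 2,000 samples has size 50. The helper `skewness` is defined here, not taken from a library; a value near 0 indicates a symmetric shape.

```python
import random
from statistics import mean, pstdev

rng = random.Random(3)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Average cubed z score; close to 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Assumed population: exponential with mean 1, strongly right-skewed
raw_values = [rng.expovariate(1.0) for _ in range(5000)]

# 2,000 samples of size n = 50, each reduced to its mean
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

print("skewness of raw values:  ", round(skewness(raw_values), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The raw values stay heavily skewed, while the sample means come out much closer to symmetric, which is the pattern the theorem describes.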
Law of large numbers
as the sample size increases, the sampling error will decrease
- A larger sample drawn from the population will better emulate the characteristics of that population, thus reducing sampling error
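A sketch of this shrinking error, assuming a fair die as the population and averaging the absolute sampling error over 200 repeated samples per sample size (both choices are illustrative, and `average_error` is a hypothetical helper):

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the sketch is repeatable
population = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
mu = mean(population)            # population mean


def average_error(n, repeats=200):
    """Average absolute gap between the sample mean and mu for samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = rng.choices(population, k=n)
        errors.append(abs(mean(sample) - mu))
    return mean(errors)


errors_by_n = {n: average_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(n, round(err, 3))
```

Averaging over repeats matters here: a single large sample can still land further from the population mean than a single small one by chance.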
Sampling Bias
- More systematic sampling error
- Has implications for your generalizability
Even with random sampling, you can still have sampling error
- Sampling error → quantified by the standard error of the mean
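The standard error of the mean is usually estimated from one sample as the sample standard deviation divided by the square root of n. A minimal sketch, using invented counts of M&Ms in 8 fun size bags:

```python
import math
from statistics import mean, stdev

# Hypothetical sample: M&Ms counted in 8 fun size bags (invented numbers)
counts = [17, 15, 18, 16, 17, 19, 16, 18]

n = len(counts)
s = stdev(counts)                  # sample standard deviation (n - 1 denominator)
standard_error = s / math.sqrt(n)  # estimated standard error of the mean

print(mean(counts), round(standard_error, 3))
```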
- Population parameter:
what you want to estimate
* Example: mean
Kurtosis
how extreme the tails are
* Often conflated with the “peak”:
* Distributions with a sharper peak tend to have more outliers and heavier tails: a higher degree of kurtosis
* But kurtosis does not measure the “peak”
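To make the tail idea concrete, here is a sketch that computes excess kurtosis with the simple uncorrected formula (average fourth-power z score minus 3, so a normal distribution scores about 0). Both data sets are invented, and `excess_kurtosis` is a helper written for this example:

```python
from statistics import mean, pstdev


def excess_kurtosis(values):
    """Average fourth-power z score minus 3 (uncorrected, population-style formula)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 4 for v in values) - 3


# Two invented data sets with the same centre
light_tails = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
heavy_tails = [-6, -1, 0, 0, 0, 0, 0, 0, 1, 6]

print(round(excess_kurtosis(light_tails), 2))
print(round(excess_kurtosis(heavy_tails), 2))
```

The second data set scores higher because its two extreme values dominate the fourth-power average, not because of how its middle looks.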
When I plot a distribution of sample means, the Central Limit Theorem says that my distribution of sample means will only be normal if the underlying population is also normal.
A. True
B. False
False (with a large enough sample size, it doesn’t matter what the population looks like)
what does n=100 mean
you have 100 observations; the number of samples and the number of observations are not the same thing
3 main distributions
population probability distribution (typically unknown)
sample frequency distribution (what we observe)
sampling distribution of the statistic (theoretical concept; we typically have only one sample, but the full curve would require infinitely many samples)
can we collect enough samples to approximate the population parameter?
no, but as n gets bigger the sampling distribution becomes more nearly normal and its dispersion decreases
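The drop in dispersion can be checked against the standard result that the sample mean has standard deviation σ/√n. This sketch assumes a fair die as the population and 2,000 samples per sample size; the sample sizes 5, 20, and 80 are arbitrary:

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(5)  # fixed seed so the sketch is repeatable
population = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
sigma = pstdev(population)       # population standard deviation

results = {}
for n in (5, 20, 80):
    means = [mean(rng.choices(population, k=n)) for _ in range(2000)]
    # (observed spread of the sample means, theoretical sigma / sqrt(n))
    results[n] = (pstdev(means), sigma / math.sqrt(n))

for n, (observed, theoretical) in results.items():
    print(n, round(observed, 3), round(theoretical, 3))
```

Each fourfold increase in n should roughly halve the spread of the sample means.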