week 5 & 6 -- Sampling distributions, Standard Error, & CLT Flashcards
Ultimate goal of our quest?
To find IMPROBABLE data that supports our hypothesis
Population distribution
The distribution of a value in our population of interest
Sample distribution
The distribution of that value within the sample we took
Sample statistic
A statistic we calculate from our sample
Sampling distribution
The theoretical distribution of a sample statistic if we take many more random samples of the same size
A simulation of the distribution of the proportions from all possible samples
NOT distribution of sample (display of actual data collected) but a display of theoretical summary statistics (like p-hat) for many different samples
IMAGINE the results from all the random samples we didn’t take
We SEE only the sample that we actually drew, but by simulating or modeling, we can IMAGINE what we might have seen had we drawn other possible random samples
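A minimal sketch of "imagining the samples we didn't take", assuming a made-up population proportion p = 0.3 and samples of size n = 100 (both values, and the function name, are illustrative only). Each simulated sample yields one p-hat; the collection of p-hats approximates the sampling distribution.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # One sample: n independent yes/no observations, each "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Assumed values for illustration: p = 0.3, n = 100, 2000 simulated samples.
p_hats = simulate_sample_proportions(p=0.3, n=100, num_samples=2000)
center = statistics.mean(p_hats)
spread = statistics.pstdev(p_hats)
print(f"mean of the p-hats: {center:.3f}")
print(f"SD of the p-hats:   {spread:.3f}")
```

The p-hats should pile up near the assumed p, and their spread is what the sampling distribution describes.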
Proportion, p
p-hat = observed proportion in a sample
PARAMETER!
(not Greek, because the symbol would be pi, and pi is already taken by the constant 3.14159...)
the fraction of the total that possesses a certain attribute
proportion x 100 = percentage
q
q-hat
the fraction of the total that DOESN’T possess a certain attribute
q = 1 - p
Proportions come with a freebie!
Once we know the mean p, we automatically know the standard deviation
–> as long as n is “large enough”, we can model the distribution of the sample proportions with a Normal model centered at p with a standard deviation of √(pq/n)
(ch 18, page 414)
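A small worked example of the "freebie", assuming illustrative values p = 0.3 and n = 100 (not from the text). The function name is hypothetical; it just evaluates √(pq/n) with q = 1 - p.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of p-hat under the Normal model: sqrt(p * q / n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Assumed values for illustration.
sd = sd_of_sample_proportion(p=0.3, n=100)
sd_bigger_sample = sd_of_sample_proportion(p=0.3, n=400)
print(round(sd, 4))                # roughly 0.0458
print(round(sd_bigger_sample, 4))  # roughly 0.0229
```

Because n sits under the square root, quadrupling the sample size only halves the standard deviation.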
letter-hat
indicates that the hatted letter – the observed proportion in our data – is our ESTIMATE of the parameter letter (no hat is the probability of having the attribute according to our model)
theoretical gist of the Normal model
If we draw repeated random samples of the same size, n, from some population and measure the proportion, p-hat, we see in each sample, then the collection of these proportions will pile up around the underlying population proportion, p, and a histogram of the sample proportions can be modeled well by a Normal model
Central Limit Theorem
If we take repeated samples of a certain size and plot the distribution of their means, then for larger and larger samples
- extreme means become rare
- middling means become more common (means close to the true population mean)
- the distribution of the means becomes more like the Normal distribution
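A rough simulation of the three points above, assuming a deliberately skewed population (Exponential with mean 1, chosen only for illustration). The skewness helper is a simple stand-in for "how far from Normal-shaped": values near 0 suggest a symmetric pile.

```python
import random
import statistics

def skewness(values):
    """Rough measure of lopsidedness: close to 0 for a symmetric, Normal-like shape."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from a skewed (Exponential) population."""
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]

rng = random.Random(0)
results = {}
for n in (2, 10, 50):  # assumed sample sizes, for illustration
    means = simulate_sample_means(n, num_samples=3000, rng=rng)
    results[n] = {
        "mean": statistics.mean(means),
        "sd": statistics.pstdev(means),
        "skew": skewness(means),
    }
    print(n, results[n])
```

As n grows, the spread of the means should shrink (extreme means get rarer) and the skewness should drift toward 0 (the pile looks more Normal), even though the population itself is far from Normal.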
Central Limit theorem assumptions (Laplace)
The sampling distribution of ANY mean becomes more nearly Normal as the sample size grows
We don’t even care about the shape of the population distribution
(this is unintuitive, surprising and weird)
- samples must be large
- observations in sample must be independent
- population distribution must have a well-defined center and spread
standard error
Sample –> standard deviation
sampling distribution –> standard error
SE(ȳ) = SD(y) / √n
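A worked example of the formula, assuming a small made-up sample and assuming the sample's own standard deviation stands in for SD(y). The data and function name are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """SE of the sample mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical data: 9 observations with mean 6.
sample = [4.0, 7.0, 6.0, 5.0, 8.0, 6.0, 5.0, 7.0, 6.0]
sd = statistics.stdev(sample)        # sqrt(12 / 8), about 1.2247
se = standard_error_of_mean(sample)  # about 1.2247 / 3
print(f"SD = {sd:.4f}, SE = {se:.4f}")
```

The SD describes the spread of individual observations in the sample; the SE describes how much the sample mean itself would vary from sample to sample.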
Normal, log-Normal, Exponential
week 6 ????
mean is only a good summary for Normal (not for the other two); parameters are different
Power Law
When most observations have small impact but some rare, high-impact observations occur (mean is NOT a good summary)
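A quick illustration of why the mean misleads here, assuming heavy-tailed data simulated as Pareto draws with shape 1.1 (an arbitrary choice for this sketch, not a value from the notes).

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical heavy-tailed data: Pareto draws with an assumed shape of 1.1.
values = [rng.paretovariate(1.1) for _ in range(10_000)]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# Share of the grand total contributed by the largest 1% of observations.
top = sorted(values, reverse=True)[: len(values) // 100]
top_share = sum(top) / sum(values)

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value:.2f}")
print(f"share of total from the top 1% = {top_share:.0%}")
```

A handful of huge observations drags the mean far above the typical (median) value, so the mean says little about most observations.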
Sampling Distribution model for a proportion – goal
an attempt to show the distribution from ALL the random samples
think of the sample proportion as a random variable taking on a different value in each random sample
then we can say something about the distribution of those values – this is the fundamental insight about statistics! Sampling models are what make Statistics work! They inform us about the amount of variation we should expect when we sample
The sampling model quantifies the variability, telling us how surprising any sample proportion is.
Sampling distribution models act as a bridge from the real world of data to the imaginary model of the statistic
This is the huge leap of statistics: these models allow us to say something about the ENTIRE population when all we have is data from the REAL WORLD SAMPLE
Our data is just a variable – any given value is just one of many we might have seen had we chosen a different random sample
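A sketch of "how surprising is this sample proportion?", assuming a model with p = 0.5, a sample of n = 100, and an observed p-hat of 0.6 (all made-up numbers). It uses the Normal model centered at p with standard deviation √(pq/n); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def surprise_of_sample_proportion(p_hat, p, n):
    """Return (z, upper_tail): how many SDs p_hat sits above p, and the
    Normal-model probability of seeing a sample proportion at least that large."""
    sd = math.sqrt(p * (1 - p) / n)
    z = (p_hat - p) / sd
    upper_tail = 1 - NormalDist().cdf(z)
    return z, upper_tail

# Assumed values for illustration.
z, upper_tail = surprise_of_sample_proportion(p_hat=0.6, p=0.5, n=100)
print(f"z = {z:.2f}")                     # about 2 SDs above p
print(f"P(p-hat >= 0.6) = {upper_tail:.4f}")  # roughly 0.0228
```

Under these assumptions a sample proportion of 0.6 would turn up in only about 2% of random samples, which is the sense in which the model tells us how surprising the data are.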
centering
for proportions: sampling distribution is centered at the population proportion
for means: centered at population mean
remember the difference between the real world of data and a magical mathematical model world
real: we draw random samples of data (HISTOGRAM)
magic: we describe how the sample means and proportions behave as random variables in all the random samples we might have drawn. (SAMP. DISTRIB MODEL, Normal based on CLT)