week 5 & 6 -- Sampling distributions, Standard Error, & CLT Flashcards
Ultimate goal of our quest?
To find IMPROBABLE data that supports our hypothesis
Population distribution
The distribution of a value in our population of interest
Sample distribution
The distribution of that value within the sample we took
Sample statistic
A statistic we calculate from our sample
Sampling distribution
The theoretical distribution of a sample statistic if we take many more random samples of the same size
the distribution of the proportions (p-hat) from all possible samples, which we can approximate by simulation
NOT the distribution of the sample (a display of the actual data collected) but a display of theoretical summary statistics (like p-hat) for many different samples
IMAGINE the results from all the random samples we didn’t take
We SEE only the sample that we actually drew, but by simulating or modeling, we can IMAGINE what we might have seen had we drawn other possible random samples
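A minimal sketch of "imagining the samples we didn't take", assuming a made-up population proportion p = 0.3 and samples of size n = 100 (both values, and the helper name simulate_p_hats, are hypothetical and chosen only for illustration). Each simulated sample gives one p-hat; the collection of p-hats approximates the sampling distribution.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # One sample: n independent yes/no draws, each "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Assumed values for illustration only.
p_hats = simulate_p_hats(p=0.3, n=100, num_samples=2000)

print("number of simulated samples:", len(p_hats))
print("mean of the p-hats:", round(statistics.mean(p_hats), 3))
print("smallest and largest p-hat:", min(p_hats), max(p_hats))
```

The p-hats should pile up around the assumed p, with a few samples landing noticeably above or below it.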
Proportion, p
p-hat = observed proportion in a sample
p (no hat) is a PARAMETER!
(not Greek because the symbol would be pi, and pi already stands for the constant 3.14159...)
the fraction of the total that possesses a certain attribute
proportion x 100 = percentage
q
q-hat
the fraction of the total that DOESN’T possess a certain attribute
q = 1 - p
Proportions come with a freebie!
Once we know the mean p, we automatically know the standard deviation
–> as long as n is “large enough”, we can model the distribution of the sample proportions with a Normal model centered at p with a standard deviation of the square root of pq/n, i.e. SD(p-hat) = √(pq/n)
(ch 18, page 414)
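A small sketch of the "freebie", assuming made-up values p = 0.3 and n = 100 (hypothetical, not from the course). It computes the standard deviation from the formula √(pq/n) and compares it with the spread of simulated sample proportions; the helper name sd_of_p_hat is only for this example.

```python
import math
import random
import statistics

def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p*q/n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Assumed values for illustration only.
p, n = 0.3, 100
formula_sd = sd_of_p_hat(p, n)  # sqrt(0.3 * 0.7 / 100), about 0.0458

# Check the formula against 5000 simulated samples of size n.
rng = random.Random(1)
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(5000)]
simulated_sd = statistics.pstdev(p_hats)

print("SD from the formula:   ", round(formula_sd, 4))
print("SD from the simulation:", round(simulated_sd, 4))
```

The two numbers should agree closely, which is the point of the card: knowing p (and n) is enough to know the spread.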
letter-hat
indicates that the hatted letter – the observed proportion in our data – is our ESTIMATE of the parameter letter (no hat is the probability of having the attribute according to our model)
theoretical gist of the Normal model
If we draw repeated random samples of the same size, n, from some population and measure the proportion, p-hat, that we see in each sample, then the collection of these proportions will pile up around the underlying population proportion, p, and a histogram of the sample proportions can be modeled well by a Normal model
Central Limit Theorem
If we take repeated samples of a certain size and plot the distribution of their means, then for larger and larger samples:
- extreme means become rare
- middling means become more common (means close to the true population mean)
- the distribution of the means becomes more like the Normal distribution
Central Limit theorem assumptions (Laplace)
The sampling distribution of ANY mean becomes more nearly Normal as the sample size grows
We don’t even care about the shape of the population distribution
(this is unintuitive, surprising and weird)
- samples must be large
- observations in sample must be independent
- population distribution must have a well-defined center and spread
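A rough simulation of the Central Limit Theorem cards above, assuming an Exponential population with mean 1 (a deliberately skewed, non-Normal choice made for this example). For each sample size it draws 4000 independent samples, takes their means, and reports the spread and the skewness of those means; skewness near 0 is used here as a simple stand-in for "looks more Normal". The helper names are hypothetical.

```python
import random
import statistics

def sample_means(n, num_samples, rng):
    """Return the means of num_samples independent random samples of size n."""
    means = []
    for _ in range(num_samples):
        # Exponential(rate=1) population: strongly right-skewed, true mean 1.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, Normal-like shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(42)
results = {}
for n in (1, 10, 100):
    means = sample_means(n, 4000, rng)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    center, spread, skew = results[n]
    print(f"n={n:>3}  mean of means={center:.3f}  SD of means={spread:.3f}  skewness={skew:.3f}")
```

As n grows, the means should stay centered near 1, their spread should shrink (extreme means become rare), and the skewness should move toward 0 even though the population itself is far from Normal.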
standard error
Sample –> standard deviation
sampling distribution –> standard error
SE(ȳ) = SD(y) / √n
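A worked example of the formula above, assuming a small made-up sample of nine measurements (the data and the helper name standard_error are hypothetical). The sample standard deviation stands in for SD(y).

```python
import math
import statistics

def standard_error(sample):
    """Estimate SE(y-bar) as SD(y) / sqrt(n), using the sample standard deviation."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

# Made-up data for illustration: n = 9, mean = 6.
sample = [4.0, 7.0, 6.0, 5.0, 8.0, 6.0, 5.0, 7.0, 6.0]

sd = statistics.stdev(sample)   # sqrt(12 / 8), about 1.2247
se = standard_error(sample)     # 1.2247 / sqrt(9), about 0.4082

print("sample mean:", statistics.mean(sample))
print("SD(y):", round(sd, 4))
print("SE(y-bar):", round(se, 4))
```

The SD describes how much individual observations vary; the SE is smaller by a factor of √n because it describes how much the sample mean would vary from sample to sample.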
Normal, log-Normal, Exponential
week 6 ????
the mean is only a good summary for the Normal (not for the other two); their parameters are different