CHAPTER 7 and 8 VOCAB Flashcards
What is the difference between the distribution of a sample and a sampling distribution?
A distribution of a sample is just a histogram of the DATA in a sample. A sampling distribution is made from a bunch of sample STATISTICS. It is the distribution of the statistic calculated from each of many, many samples.
Are models what really happens?
No. A model train is not a real train. We use models to describe roughly what happens.
What does CLT say about the distribution of the population?
Just that it doesn’t matter what it is. With large samples, the SAMPLING distribution (the distribution of stats, NOT DATA) will be approximately normal for ANY SHAPED population distribution.
What are the mean and standard deviation of a sampling distribution for a proportion?
Mean is p and standard deviation is sqrt(pq/n), where q = 1 - p (look at formula sheet). The model is N(p, sqrt(pq/n)).
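A quick way to convince yourself of the formula above is to simulate it. This is only a sketch: the values p = 0.3 and n = 100 are made up for illustration, and the helper name `p_hat` is hypothetical. It takes lots of samples, computes a p-hat from each, and compares the pile of p-hats to the model.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

p, n = 0.3, 100          # assumed (made-up) true proportion and sample size
q = 1 - p
model_sd = math.sqrt(p * q / n)   # the formula-sheet standard deviation

def p_hat():
    """Take one random sample of size n and return its sample proportion."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

# The sampling distribution: many p-hats, one per sample (STATISTICS, not data)
p_hats = [p_hat() for _ in range(5000)]

print("model mean:", p, " simulated mean:", round(statistics.fmean(p_hats), 3))
print("model SD:  ", round(model_sd, 4), " simulated SD:", round(statistics.stdev(p_hats), 4))
```

The simulated mean and SD should land close to p and sqrt(pq/n), but not exactly on them, because the normal model is only a model.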
What does Central Limit Theorem Say?
It basically says: NO MATTER WHAT SHAPE THE POPULATION IS (normal, bimodal, uniform, skewed, crazy...), if you make a histogram of a bunch of means taken from a bunch of samples, that histogram will be unimodal and symmetric, close to normal, WITH LARGE ENOUGH SAMPLES. A nerdy way to say it: the sampling distribution of means is approximately normal no matter what the population is shaped like. The larger the sample size, the closer to normal. (The normal curve is just a model. The sampling distribution is close to it, but not it! We use the model anyway!)
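Here is a rough simulation of that idea, assuming a strongly right-skewed population (an exponential distribution with mean 1, picked just for illustration). The helper names `sample_means` and `skewness` are hypothetical. Skewness near 0 means the pile of means is close to symmetric.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(n, num_samples=2000):
    """Draw num_samples samples of size n from a right-skewed population
    (exponential, mean 1) and return the mean of each sample."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

def skewness(values):
    """Rough skewness: the average cubed z-score (0 means symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

small = sample_means(n=2)    # tiny samples: means are still skewed
large = sample_means(n=50)   # bigger samples: means look much more symmetric

print("skewness of means, n=2: ", round(skewness(small), 2))
print("skewness of means, n=50:", round(skewness(large), 2))
```

The population never changes shape here. Only the sample size changes, and the histogram of MEANS gets closer to symmetric as n grows.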
What is difference between population of interest and parameter of interest?
Population is the WHO (the subjects you measure: beads, trees, people). Parameter is the actual number you want (like % of red beads, avg height of trees, or % with brown eyes).
What does the CLT say about the distribution of actual sample data?
Nothing. The sample will be distributed similar to the population. The CLT only talks about samplING distributions, the distributions (histograms) of sample statistics such as means, NOT OF INDIVIDUALS!!!! NOT DATA
N ( ?1 , ?2 ) what does this mean?
It means a NORMAL model centered at ?1 with a standard deviation of ?2.
Describe the distribution of a sample
It will look like the population. The distribution of a sample is a histogram made from the sample DATA, which will look kind of like the population. If the population is bimodal, then the distribution of the sample is bimodal. The larger the sample, the more it will look like the population. The SAMPLING distribution of a bunch of means, however, will look normalish.
Why is randomization a condition?
Because we understand randomness. We study it. We call it probability.
How do statistics from big samples compare to small? (notice this doesn’t ask about DATA)
Statistics from larger samples have less variability, so they are closer to the parameter and to each other. Statistics from smaller samples are more variable and more likely to be far away from the true parameter.
Do parameters vary?
NO!!! There is only one. Statistics vary. they vary from sample to sample. PARAMETERS DO NOT VARY!
What are the conditions that have to be met in order to use a normal model for the distribution of sample proportions? (sampling distribution of proportions).. (the distribution of p-hats)..
- RANDOMIZATION: this helps with the assumption of independence.
- SMALL ENOUGH SAMPLE: 10% condition. This is the upper limit of our sample size. Above this, sampling without replacement means the draws are no longer close to independent, and the sampling distribution is narrower than sqrt(pq/n) says it should be.
- LARGE ENOUGH SAMPLE: success/failure. np and nq both at least 10. This is the lower limit of our sample size. This is when the sampling distribution starts looking normal. FOR 2 SAMPLES YOU NEED BOTH SAMPLES TO MEET THESE REQUIREMENTS!
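The three conditions above can be sketched as a little checklist function. The function name `normal_model_ok` and its arguments are hypothetical, and the cutoff "at least 10" is an assumption (use whatever your formula sheet says). Randomization can't be checked with arithmetic, so it is just passed in as True or False.

```python
def normal_model_ok(n, p, population_size, randomized=True):
    """Return a dict saying which conditions pass for a sample of size n,
    an (assumed) proportion p, and a population of the given size."""
    q = 1 - p
    return {
        "randomization": randomized,                       # you have to judge this yourself
        "10% condition": n <= 0.10 * population_size,      # sample not too big
        "success/failure": n * p >= 10 and n * q >= 10,    # sample not too small
    }

# Example: 100 beads sampled from a jar of 10,000, about 30% red
checks = normal_model_ok(n=100, p=0.3, population_size=10_000)
print(checks)
print("OK to use the normal model:", all(checks.values()))
```

For two samples, you would run the check once per sample and need both to pass.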
What is a statistic?
Some numerical summary of a sample. Could be the mean of a sample, the standard deviation of a sample, the proportion of successes in a sample, the slope calculated from a sample, a difference of 2 means from 2 samples, a difference of 2 proportions from 2 samples, a difference of 2 slopes from 2 samples. You can make sampling distributions for any of these, and for unbiased statistics they will be centered on the parameter.
What is a parameter?
Some numerical summary of a population. Often called “the parameter of interest.” It is what we are often trying to find. It doesn’t vary. It is out there and STUCK at some value, it is the truth, and you’ll probably never know it! We try to catch them in our confidence intervals, but sometimes we don’t (and we don’t know it!). It could be the mean of a population, the standard deviation of a population, the proportion of successes in a population, the slope calculated from a population, a difference of 2 means from 2 populations, a difference of 2 proportions from 2 populations.
What is the Fundamental Theorem of Statistics? (just the name of it)
The CLT.
The Central Limit Theorem!
What is sampling variability?
same as sampling error.
The natural variation of sample statistics.. NOT DATA.. Samples vary and so do their statistics.. Parameters do not vary.
What is sampling error?
same as sampling variability..
The natural variability between STATISTICS.. NOT DATA!!! . We call it error EVEN THOUGH YOU MADE NO MISTAKES!!!
What is an unbiased estimator?
When the sampling distribution (pile of sample stats) is centered on the true population parameter.
What is a biased estimator?
When the sampling distribution (pile of sample stats) is NOT centered on the true population parameter. Like if you only weighed students in the men’s room to find average weight of all students. That would be a biased estimator. Or if you use the population SD (divide by n) formula when you have a sample. It will underestimate the true parameter. That’s why we divide by n-1.
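The divide-by-n versus divide-by-(n-1) point can be seen in a simulation. This sketch assumes a made-up normal population with mean 50 and SD 10, and it compares VARIANCES (SD squared), since that is where dividing by n-1 makes the estimator exactly unbiased. `pvariance` divides by n and `variance` divides by n-1.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

sigma = 10      # assumed population SD, so the true variance is 100
n = 5           # small samples make the bias easy to see

div_by_n = []           # "population formula" used on a sample (biased)
div_by_n_minus_1 = []   # "sample formula" (unbiased for the variance)

for _ in range(20000):
    sample = [random.gauss(50, sigma) for _ in range(n)]
    div_by_n.append(statistics.pvariance(sample))
    div_by_n_minus_1.append(statistics.variance(sample))

print("true variance:           ", sigma ** 2)
print("average when dividing by n:  ", round(statistics.fmean(div_by_n), 1))
print("average when dividing by n-1:", round(statistics.fmean(div_by_n_minus_1), 1))
```

The divide-by-n pile is centered too low (around 80 here, not 100), which is what "biased" means. The divide-by-(n-1) pile is centered on the true value.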
What are the mean and standard deviation of a sampling distribution for a mean?
Mean is mu and standard deviation is sigma/sqrt(n) (look at formula sheet). The model is N(mu, sigma/sqrt(n)).
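A simulation sketch of sigma/sqrt(n), assuming a made-up population with mu = 100 and sigma = 15 (the helper name `sd_of_sample_means` is hypothetical). It also shows why statistics from big samples vary less than statistics from small ones.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

mu, sigma = 100, 15   # assumed (made-up) population mean and SD

def sd_of_sample_means(n, num_samples=4000):
    """Take num_samples samples of size n, compute each sample's mean,
    and return the standard deviation of that pile of means."""
    means = [
        statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

for n in (9, 36, 144):
    simulated = sd_of_sample_means(n)
    model = sigma / math.sqrt(n)
    print(f"n={n:3d}  simulated SD of means={simulated:.2f}  sigma/sqrt(n)={model:.2f}")
```

Notice that quadrupling n only cuts the spread of the means in half, because of the square root.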
What if you want more confidence in your interval?
Get a bigger net. Increasing your confidence makes the interval wider (or you could increase the sample size and keep the same net).
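To see the "bigger net" trade-off in numbers, here is a sketch for a one-proportion interval. The function name `margin_of_error` and the example values (p-hat = 0.5, n = 400) are made up for illustration. The margin of error is the critical value z* times sqrt(p-hat times q-hat / n).

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Half-width of a one-proportion z-interval (the size of the 'net')."""
    # z* is the critical value that leaves (1 - confidence)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

for conf in (0.90, 0.95, 0.99):
    print(f"n=400,  {conf:.0%} confidence: +/- {margin_of_error(0.5, 400, conf):.3f}")

# Same confidence, 4 times the sample size: the net shrinks by half
print(f"n=1600, 95% confidence: +/- {margin_of_error(0.5, 1600, 0.95):.3f}")
```

More confidence with the same n gives a wider interval. A larger n lets you buy back the width.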
What is “statistically significant?”
When our observed statistic was so far from what we were expecting that we think something weird is going on. Wow factor. When you are like “WOW, that’s strange.” When the p-value is below alpha, we say “statistically significant.” Low p-values are statistically significant. When a result like ours would be very unlikely to happen by random chance alone, that is statistically significant.
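A small worked sketch of "p-value below alpha", using a one-proportion z-test. The scenario (60 heads in 100 coin flips, testing against p = 0.5, alpha = 0.05) and the function name `one_prop_z_test` are made up for illustration.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Two-sided one-proportion z-test. Returns (z, p_value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # SD of p-hat if the null is true
    z = (p_hat - p0) / se                    # how many SDs from what we expected
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05
z, p_value = one_prop_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("statistically significant" if p_value < alpha else "not statistically significant")
```

Here z comes out to about 2 and the p-value to about 0.0455, just under alpha = 0.05, so this result would count as statistically significant at that level.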
When do you need crits?
In confidence intervals (and old-fashioned hyp tests, where we check whether z is beyond the crit).