UNIT 5, 6, 7 part A Flashcards
What is a test statistic?
It is a z-score. It is the number of SEs your statistic is away from the null parameter.
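A minimal sketch of that calculation for a one-proportion test. The function name and the numbers (p-hat = 0.56, null p = 0.50, n = 100) are made up for illustration, and the SE is computed from the null value.

```python
import math

def z_test_statistic(p_hat, p0, n):
    """Number of standard errors p_hat lies from the null proportion p0."""
    se = math.sqrt(p0 * (1 - p0) / n)  # SE computed assuming the null is true
    return (p_hat - p0) / se

# Hypothetical numbers: 56 successes out of 100, null proportion 0.50.
z = z_test_statistic(p_hat=0.56, p0=0.50, n=100)
print(round(z, 2))
```

A positive z means the statistic landed above the null parameter, a negative z means it landed below.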
What is a p-value?
At the end of a hypothesis test, it is the probability of getting results at least as extreme as yours if the null were true.
normcdf ( test stat, 999 )
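Read as a lower bound and an upper bound in z units, normcdf(test stat, 999) is effectively the area to the right of the test statistic. Here is a rough Python equivalent, assuming an upper-tail test and using the standard library's NormalDist. The function name and the test statistic of 1.2 are hypothetical.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)

# Hypothetical test statistic of 1.2 standard errors above the null.
p_value = upper_tail_p_value(1.2)
print(round(p_value, 4))
```

For a lower-tail test you would use the area to the left instead, and for a two-sided test you would double the smaller tail.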
notation: what is z*
Critical z: how many SEs you reach up and down from your statistic in a confidence interval for proportions.
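A small sketch of where z* comes from and how it sets the reach of a one-proportion interval. The function name, the 95% confidence level, and the sample numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Return z* and the interval p_hat +/- z* * SE."""
    # z* cuts off the middle `confidence` area of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE estimated from the sample
    margin = z_star * se
    return z_star, (p_hat - margin, p_hat + margin)

# Hypothetical sample: p-hat = 0.56 from n = 100.
z_star, (low, high) = proportion_interval(p_hat=0.56, n=100)
print(round(z_star, 2), round(low, 3), round(high, 3))
```

A higher confidence level gives a larger z*, so the interval reaches further in both directions.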
notation: what is Ho
The NULL, the dull, the “things haven’t changed” hypothesis
notation: What is Ha
The alternative. This is what you are trying to find evidence for.
What is the difference between the distribution of a sample and a sampling distribution?
A distribution of a sample is just a histogram of the DATA in a sample. A sampling distribution is made from a bunch of sample STATISTICS. It is the distribution of the statistic that was calculated from those many, many samples.
What is a sampLING distribution?
a pile of statistics. A pile of p-hats or x-bars.
Are models what really happens?
No. A model train is not a real train. We use models to say what kind of happens.
If the null parameter is in your confidence interval, can you reject it?
No. It is still plausible.
What is “statistically significant?”
When our sample statistic is so far away from what we were expecting that we don’t think it was due to random sampling error, then it is statistically significant. When the p-value is below alpha, we say “statistically significant.” Low p-values are statistically significant.
What is the difference between standard error and standard deviation?
Standard error is the typical distance a STATISTIC is from the mean in a sampling distribution (a pile of a bunch of samples’ statistics). Standard DEVIATION is the typical distance a DATUM is from the mean in a pile of raw data.
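A quick sketch of the two numbers side by side for a sample mean, assuming the usual estimate SE = s / sqrt(n). The data values are made up.

```python
import math
from statistics import stdev

# Hypothetical raw data from one sample.
sample = [62, 71, 55, 80, 66, 74, 59, 68, 77, 63]

sd = stdev(sample)                 # typical distance of a DATUM from the sample mean
se = sd / math.sqrt(len(sample))   # estimated typical distance of an X-BAR from mu

print(round(sd, 2), round(se, 2))
```

The SE is always smaller than the SD here because averages bounce around less than individual data values do.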
What does CLT say about the distribution of the population?
Not much… just that it doesn’t matter what it is. With large samples, the SAMPLING dist will be approx normal (dist of stats, NOT DATA).
What are the mean and standard deviation of a sampling distribution for a proportion?
The mean is p and the standard deviation is sqrt(pq/n), where q = 1 - p (look at the formula sheet). The model is N(p, sqrt(pq/n)).
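One way to check that model is to build a pile of p-hats by simulation and compare its center and spread to p and sqrt(pq/n). This is only a sketch: the values p = 0.3, n = 200, the number of samples, and the helper name are all made up.

```python
import math
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

p, n, num_samples = 0.3, 200, 2000  # hypothetical population proportion and sizes

def sample_proportion(p, n):
    """Draw one sample of size n and return its p-hat."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

p_hats = [sample_proportion(p, n) for _ in range(num_samples)]

simulated_mean = mean(p_hats)          # center of the pile of p-hats
simulated_sd = pstdev(p_hats)          # spread of the pile of p-hats
model_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n) from the formula sheet

print(round(simulated_mean, 3), round(simulated_sd, 4), round(model_sd, 4))
```

The simulated center and spread should land close to the model values, but not exactly on them, because the pile is finite.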
What does Central Limit Theorem Say?
It basically says.. NO MATTER WHAT SHAPE THE POPULATION IS (normal, bimodal, uniform, skewed, crazy.. ) If you make a histogram of a bunch of means taken from a bunch of samples, that histogram will be unimodal and symmetric WITH LARGE ENOUGH SAMPLES.. Close to normal. So.. A nerdy way to say it is: The sampling distribution of means is approximately normal no matter what the population is shaped like. The larger the sample size, the closer to normal. (the normal curve is just a model.. the sampling distribution is close to it, but not it! we use the model anyway!)
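A rough simulation of that idea, assuming a strongly right-skewed (exponential) population and samples of size 50. As a simple normality check it counts how much of each pile falls within one standard deviation of its mean, which is about 68% for a normal model. All names and sizes are hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

def fraction_within_one_sd(values):
    """Share of values within one SD of their own mean (about 0.68 if normal)."""
    center, spread = mean(values), pstdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)

n, num_samples = 50, 2000
# Each sample is drawn from a right-skewed exponential population.
samples = [[random.expovariate(1.0) for _ in range(n)] for _ in range(num_samples)]

raw_data = [x for sample in samples for x in sample]  # pile of DATA
x_bars = [mean(sample) for sample in samples]         # pile of STATISTICS

frac_data = fraction_within_one_sd(raw_data)
frac_means = fraction_within_one_sd(x_bars)
print(round(frac_data, 3), round(frac_means, 3))
```

The raw data stay skewed, so their fraction sits well above 68%, while the pile of x-bars comes out close to it.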
What is difference between population of interest and parameter of interest?
Population is the WHO (the subjects you measure: beads, people). Parameter is the actual number you want (like a % or an AVG).
What happens to a pile of statistics if you take larger samples?
All of the x-bars or all of the p-hats will get closer to each other, and closer to the parameter ( mu or p). There is less variability in the sampling distribution (in the pile of stats).
What does the CLT say about the distribution of actual sample data?
Nothing. The sample will be distributed similarly to the population. Bimodal populations have bimodal samples. The CLT only talks about distributions (histograms) of sample statistics, of summaries such as groups of means, NOT OF INDIVIDUALS!!!! NOT DATA
N (12, 22 )
What does this mean?
It means a NORMAL model centered at 12 with a standard deviation of 22.
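For illustration, the same model can be written with Python's standard library, reading the second number as the standard deviation as the card does. The variable names are made up.

```python
from statistics import NormalDist

# N(12, 22): normal model with mean 12 and standard deviation 22.
model = NormalDist(mu=12, sigma=22)

# Area within one standard deviation of the center (from -10 to 34).
within_one_sd = model.cdf(12 + 22) - model.cdf(12 - 22)
print(round(within_one_sd, 3))
```

That area is the familiar "about 68%" from the 68-95-99.7 rule.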
Consider the distribution of a sample compared to the distribution of the population.
It will look like the population. The distribution of a sample is a histogram made from the sample, which will look kind of like the population. If the population is bimodal, then the distribution of the sample is bimodal. The SAMPLING distribution of a bunch of means, however, will look normalish.
What is a standard error?
The typical, or expected, error. It is how far off you are expecting your statistic to be from the parameter. It is calculated like the standard deviation, but we are using sample statistics. We don’t know the true parameters, so we estimate them with statistics, which adds error to our calculation.
How do statistics from a bunch of big samples compare to statistics from a bunch of small samples? (notice this doesn’t ask about DATA)
Statistics from larger samples have less variability, so they are closer to each other and to the parameter. Statistics from smaller samples are more spread out, further away from the true parameter.
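A small simulation sketch of that comparison, assuming a uniform population on 0 to 100 and sample sizes of 10 and 100. The helper name and all numbers are made up.

```python
import random
from statistics import mean, pstdev

random.seed(3)  # fixed seed so the run is repeatable

def spread_of_x_bars(n, num_samples=1000):
    """Standard deviation of a pile of x-bars from samples of size n."""
    x_bars = [
        mean([random.uniform(0, 100) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return pstdev(x_bars)

small_spread = spread_of_x_bars(10)   # pile of x-bars from small samples
large_spread = spread_of_x_bars(100)  # pile of x-bars from big samples
print(round(small_spread, 2), round(large_spread, 2))
```

With ten times the sample size, the spread shrinks by a factor of roughly sqrt(10), about 3.2.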
What is statistical inference?
Using a statistic to infer something about a parameter.. Basically, using a sample to say something about a population.
what is a statistic?
some numerical summary of a sample.. Could be the mean of a sample, the standard deviation of a sample, the proportion of successes in a sample, the slope calculated from a sample, a difference of 2 means from 2 samples, a difference of 2 proportions from 2 samples, a difference of 2 slopes from 2 samples.. you can make sampling distributions for any of these, and they will all be centered around the parameter…
What is a parameter?
Some numerical summary of a population. Often called “the parameter of interest.” It is what we are often trying to find. It doesn’t vary. It is out there and STUCK at some value, it is the truth, and you’ll probably never know it! We try to catch them in our confidence intervals, but sometimes we don’t (and we don’t know it!). It could be the mean of a population, the standard deviation of a population, the proportion of successes in a population, the slope calculated from a population, a difference of 2 means from 2 populations, or a difference of 2 proportions from 2 populations.
What is the Fundamental Theorem of Statistics?
The CLT!! The Central Limit Theorem!
Piles of x-bars are approximately normal, even if the population is skewed or bimodal, when the samples are large enough (rule of thumb: n > 30)!
What is sampling variability?
same as sampling error. The natural variation of sample statistics.. NOT DATA.. Samples vary. so do their statistics.. Parameters do not vary!
What is sampling error?
same as sampling variability.. The natural variability between STATISTICS.. NOT DATA!!! . We call it error EVEN THOUGH YOU MADE NO MISTAKES!!!
What is a point estimate?
Your statistic. It is the best “one-number” guess at the parameter.
What is a biased estimator? Give an example
When the pile of statistics is not centered on the true parameter (p-hats not centered at p, or x-bars not centered at mu).
If you are looking for the average time high school students can hold their breath, and you only take samples with kids from swim teams, your pile of x-bars will be centered higher than the true mu because swimmers can hold their breath longer. Biased sampling methods give biased estimators.
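The breath-holding example can be sketched as a simulation. Every number here is invented (200 swimmers averaging about 70 seconds, 1800 non-swimmers averaging about 40 seconds), as are the helper and variable names.

```python
import random
from statistics import mean

random.seed(4)  # fixed seed so the run is repeatable

# Hypothetical population: breath-hold times in seconds (all numbers made up).
swimmers = [random.gauss(70, 10) for _ in range(200)]
non_swimmers = [random.gauss(40, 10) for _ in range(1800)]
population = swimmers + non_swimmers
true_mu = mean(population)

def pile_of_x_bars(source, n=30, num_samples=1000):
    """Means of many random samples of size n drawn from `source`."""
    return [mean(random.sample(source, n)) for _ in range(num_samples)]

random_center = mean(pile_of_x_bars(population))  # sampling from everyone
swim_center = mean(pile_of_x_bars(swimmers))      # sampling only swim-team kids

print(round(true_mu, 1), round(random_center, 1), round(swim_center, 1))
```

The pile built from random samples centers near the true mu, while the swimmers-only pile centers far above it, no matter how many samples you take.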