WINTER BREAK REVIEW VOCAB Flashcards
What points have leverage in regression?
Those far to the left and right from x-bar
What points are outliers in regression?
Those that don’t follow the flow.
What points have influence in regression?
Those that would change the slope if removed (they are outliers that have leverage)
Interpret r^2 ?
The percent of variability in Y explained by the model with X
Interpret y-intercept?
When X=0, the model predicts this much Y.
Interpret SLOPE
For every 1 unit increase in x, the model predicts a change of SLOPE units in y
Find P( Z > 1.5) ?
normcdf( 1.5, 9999)
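The calculator call above can be double-checked off the calculator. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for `normcdf`; the helper name `upper_tail` is made up for illustration.

```python
from statistics import NormalDist

def upper_tail(z):
    """P(Z > z) for a standard normal Z (mean 0, SD 1)."""
    return 1 - NormalDist(mu=0, sigma=1).cdf(z)

print(round(upper_tail(1.5), 4))  # 0.0668
```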
Describe independence and association with quantitative examples.
Height and IQ are independent. Height and weight are associated.
Describe independence and association with categorical examples.
Grade and pizza preference are independent. Gender and gaming status are associated.
function to find area under normal curve?
normcdf
function to find a percentile in normal model?
invNorm
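The percentile lookup can be tried outside the calculator too. This is a rough sketch, assuming `statistics.NormalDist.inv_cdf` plays the role of the calculator's inverse-normal function; `percentile_cutoff` is a hypothetical helper name.

```python
from statistics import NormalDist

def percentile_cutoff(area_to_left, mu=0.0, sigma=1.0):
    """Value with the given area to its left in a N(mu, sigma) model."""
    return NormalDist(mu, sigma).inv_cdf(area_to_left)

# 90th percentile of the standard normal model
print(round(percentile_cutoff(0.90), 3))  # 1.282
```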
What are the measures of spread we use?
standard deviation, variance, range, interquartile range, standard error
What are the measures of center we use?
mean, median, mode
How do you describe the distribution of a single data set (a histogram)?
SHAPE (#modes, skewness), CENTER (measure of center), SPREAD (measure of spread), STRANGE (outliers or gaps)
How do you describe an association between two quantitative variables? (scatter plot)
DIRECTION (pos/neg) FORM (linear,curved) STRENGTH (strong, moderate, report “r” value)
What does rSy/Sx mean?
slope formula. For each SD in X, you go r SD in y
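To see the slope formula in action, here is a small sketch on made-up data (the `x` and `y` lists are invented for illustration). It computes r from z-scores, then builds the slope as r*Sy/Sx and the intercept from the fact that the line passes through (x-bar, y-bar).

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """r = sum of (z_x * z_y) divided by (n - 1)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

r = correlation(x, y)
slope = r * stdev(y) / stdev(x)          # r * Sy / Sx
intercept = mean(y) - slope * mean(x)    # line goes through (x-bar, y-bar)
print(round(slope, 3), round(intercept, 3))  # 0.6 2.2
```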
What does SD of residuals tell us?
Average distance to the model. About how far off we expect model to be.
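A quick sketch of the SD of the residuals, using made-up data and its least-squares line y-hat = 2.2 + 0.6x. The n - 2 divisor is the usual convention for a regression line; the data and the helper name `residual_sd` are assumptions for illustration.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6  # least-squares line for this data

def residual_sd(xs, ys, b0, b1):
    """Typical size of a residual: sqrt(sum of squared residuals / (n - 2))."""
    residuals = [b - (b0 + b1 * a) for a, b in zip(xs, ys)]
    return sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

print(round(residual_sd(x, y, intercept, slope), 3))  # 0.894
```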
What graphs for QUANTITATIVE data?
histogram, box/whisker, stemplot, dot plot, ogive, time plot, line graph
What graphs for CATEGORICAL data?
segmented bar, bar, pie, mosaic
Diff between standard deviation and standard error?
Standard deviation is the typical distance to the mean for a data point. Standard error is the typical distance to the parameter for a statistic in a sampling distribution.
What is variance?
A measure of spread- the average squared distance to the mean. SD^2
What is a Z score?
the number of SD a data value is away from the mean
What is a test statistic?
The number of SE a statistic is away from the hypothesized parameter.
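As an illustration, here is a sketch of a test statistic for one proportion. The sample numbers (p-hat = 0.56, hypothesized p = 0.50, n = 100) are made up, and `z_test_statistic` is a hypothetical helper name.

```python
from math import sqrt

def z_test_statistic(p_hat, p0, n):
    """How many SE the sample proportion is from the hypothesized p0."""
    se = sqrt(p0 * (1 - p0) / n)  # SE uses the hypothesized proportion
    return (p_hat - p0) / se

print(round(z_test_statistic(0.56, 0.50, 100), 2))  # 1.2
```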
What is formula for nCr ?
n! / ( r! (n-r)! )
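The nCr formula is easy to check by hand or in code. This sketch writes the factorial formula out directly and compares it with Python's built-in `math.comb`; `n_choose_r` is a made-up name.

```python
from math import comb, factorial

def n_choose_r(n, r):
    """nCr = n! / (r! * (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_choose_r(5, 2))  # 10
print(comb(5, 2))        # 10, same answer from the built-in
```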
What is margin of error?
Distance you reach up and down when making CI. It is CRIT * SE
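Here is a sketch of CRIT * SE for a one-proportion interval. The sample results (p-hat = 0.60, n = 150) are invented, and the critical value comes from the normal model (about 1.96 for 95%, close to the rule-of-thumb 2).

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, conf=0.95):
    """Margin of error = critical value * standard error."""
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)
    return crit * se

p_hat, n = 0.60, 150  # made-up sample results
me = margin_of_error(p_hat, n)
# Reach down and up from the statistic by the margin of error
print(round(p_hat - me, 3), round(p_hat + me, 3))  # 0.522 0.678
```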
What is error?
Distance from a statistic to the parameter. How far off your stat is from the truth.
What is a confidence interval?
A parameter catcher. It tries to catch the truth.
What does “95% confident” mean?
If you took 100 samples and made 100 confidence intervals, about 95 would contain the parameter and about 5 would not.
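The "many samples, many intervals" idea can be simulated. This is a sketch under assumptions: a normal population with made-up mean 50 and SD 10, samples of size 25, and z-intervals that treat the population SD as known. The exact fraction printed depends on the random seed, but it should land near 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(reps=2000, n=25, mu=50.0, sigma=10.0, conf=0.95, seed=1):
    """Fraction of confidence intervals that catch the true mean mu."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - crit * se <= mu <= xbar + crit * se:
            hits += 1  # this interval caught the parameter
    return hits / reps

print(coverage())  # close to 0.95
```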
What is alpha?
It is the rejection threshold. Reject Ho when p-value is below alpha.
What is a p-value?
The probability of obtaining your statistic, or one more extreme, by chance alone if the Null were actually true.
Suppose p value = 0.003. How would you interpret?
With a p-value this low (0.003 < 0.05), I reject Ho. There is enough evidence to say [Ha in context].
What are the sample size requirements for inference for both means and proportions?
1. You need a random sample. 2. (Not too big) Less than 10% of the population: 10n < N. 3. (Big enough) For means, n > 30 unless the population is normalish. For props, np>10 and nq>10.
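The decision rule can be sketched in a few lines. The test statistic z = 2.75 is a made-up value chosen because its one-sided p-value comes out near 0.003; alpha = 0.05 is the usual threshold.

```python
from statistics import NormalDist

def one_sided_p_value(z):
    """P(Z >= z) if the null hypothesis is true."""
    return 1 - NormalDist().cdf(z)

alpha = 0.05
p = one_sided_p_value(2.75)  # hypothetical test statistic
print(round(p, 3))           # 0.003
print("reject Ho" if p < alpha else "fail to reject Ho")  # reject Ho
```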
Minimum sample size for means?
If population is normalish, then there is no minimum sample size. If it is skewed or bimodal or any other non-normal distribution, then n>30.
Minimum sample size for proportions?
You need at least 10 successes, np>10, and 10 failures, nq > 10
What is the golden sentence?
I was curious about a population parameter, but a census was too costly, so instead I took a sample, used the data to calculate a statistic, and then made an inference about the parameter with that statistic.
What is probability?
Long run relative frequency. (the long run percent)
What is the Law of Large Numbers?
In the long run, after many many trials, the % of successes approaches the true probability. Think: if you flip a coin twice, you may get 0% heads, 50% heads or 100% heads. If you flip 10,000 times, you probably will have about 50% heads (def not 0 or 100)
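The coin-flip example can be simulated. This is a sketch with a fair coin and a fixed random seed (both assumptions); the exact fractions depend on the seed, but the long runs should settle near 0.5.

```python
import random

def heads_fraction(flips, seed=0):
    """Flip a fair coin `flips` times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (2, 10, 100, 10_000):
    print(flips, heads_fraction(flips))
```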
What is a critical value?
1 for 68% confidence, 2 for 95 and 3 for 99.7. It is the number of SE you want to reach out in a confidence interval.
Where are outliers located in a data set ?
Outside the fences: below the lower fence, Q1 - 1.5*IQR, or above the upper fence, Q3 + 1.5*IQR
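A sketch of the fence rule on made-up data. Quartiles here are found as the medians of the lower and upper halves (leaving out the middle value when n is odd), which is an assumption; other software may use a slightly different quartile rule. The data need at least two values.

```python
from statistics import median

def fences(data):
    """Return (lower fence, upper fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    s = sorted(data)
    half = len(s) // 2
    q1 = median(s[:half])    # median of the lower half
    q3 = median(s[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    lo, hi = fences(data)
    return [v for v in data if v < lo or v > hi]

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data
print(fences(data))    # (-3.0, 17.0)
print(outliers(data))  # [30]
```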
What is a sampling distribution?
A pile of statistics taken from many many many samples
What are the two sampling distributions we have discussed?
MEANS: N ( mu, sigma/root n) and PROPORTIONS: N ( p, root (pq/n) )
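The two spread formulas above can be written as small helpers. The function names and the example numbers are made up for illustration.

```python
from math import sqrt

def sd_of_sample_means(sigma, n):
    """SD of the sampling distribution of x-bar: sigma / root n."""
    return sigma / sqrt(n)

def sd_of_sample_proportions(p, n):
    """SD of the sampling distribution of p-hat: root(pq / n)."""
    return sqrt(p * (1 - p) / n)

print(round(sd_of_sample_means(15, 25), 4))          # 3.0
print(round(sd_of_sample_proportions(0.5, 100), 4))  # 0.05
```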
When we combine random variables, what do we add?
Add means and add variances (variances add only when the random variables are independent). DO NOT ADD ST DEV. You add variances and take the square root of the sum to find combined SD.
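A small sketch of the rule, with made-up means and SDs and assuming the two variables are independent. Notice the combined SD is 5, not 3 + 4 = 7.

```python
from math import sqrt

def combine(mean_x, sd_x, mean_y, sd_y):
    """Mean and SD of X + Y, assuming X and Y are independent."""
    total_mean = mean_x + mean_y              # means add
    total_sd = sqrt(sd_x ** 2 + sd_y ** 2)    # variances add, then square root
    return total_mean, total_sd

print(combine(10, 3, 20, 4))  # (30, 5.0)
```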