Statistics with SAS on Coursera: Week 4 Flashcards
Study materials associated with Week 4 of the Statistics with SAS course offered on Coursera.
What are the assumptions when performing a regression analysis?
1) A linear relationship fits the data adequately
2) The errors are normally distributed with a mean of 0
3) The errors have equal variance at each value of the predictor variable
4) The errors are independent
What should the scatter plot look like when testing that a linear model fits the data adequately?
The data should hover around the regression line.
Check for outliers that may be influencing the slope of the line, and for non-linear patterns such as curvilinear relationships or the autocorrelation that is common in time series data.
What is a residual?
A residual is the difference between an observed value of Y and its predicted value.
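As a minimal sketch of this definition, the following plain Python (not SAS) fits a simple least-squares line and subtracts each predicted value from its observed value. The data and the helper name fit_line are hypothetical, made up only for illustration.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_line(x, y)

# Residual = observed Y minus predicted Y, one per observation.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ei in zip(x, y, predicted, residuals):
    print(f"x={xi:.0f}  observed={yi:.2f}  predicted={pi:.2f}  residual={ei:+.2f}")
```

When the model includes an intercept, the least-squares residuals sum to zero, which is why the residual plot is read against a reference line at 0.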
How can you check for violations of equal variances, linearity, and independence?
Examine the shape of the scatter in the residuals versus predicted values chart.
You want to see a random scatter of the residual values above and below the reference line at 0.
No patterns should be visible in the residuals.
This indicates that the model assumptions are valid.
What does it mean when you see patterns or trends in the residual values?
One or more of the model assumptions is probably violated, so the linear regression model may not be valid.
What does the Q-Q plot look like when the residuals are normally distributed?
The Q-Q plot should appear to be a straight, diagonal line if the residuals are normally distributed.
Given the properties of the standard normal distribution, between which two values would approximately 95% of the studentized residuals fall?
-2 and 2
If we treat the STUDENT residuals as following the standard normal distribution and apply the 68/95/99.7 rule, we would expect about 5% of them to fall beyond the -2 and +2 limits by chance alone.
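The 95% figure can be checked directly from the standard normal distribution. This is a small Python sketch using only the standard library; it assumes the residuals really are standard normal, which is an approximation for studentized residuals.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist()

# Probability of falling between -2 and +2, and the remainder outside.
inside = z.cdf(2) - z.cdf(-2)
outside = 1 - inside

print(f"inside  [-2, 2]: {inside:.4f}")
print(f"outside [-2, 2]: {outside:.4f}")
```

The exact share inside the limits is slightly above 95%, so "about 5%" outside is a convenient rounding rather than an exact value.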
What’s the difference between an outlier and an influential observation?
An outlier is an unusual data point, whereas an influential observation is an unusual data point that singlehandedly exerts influence on the regression model.
Which parts of your model might be affected by influential observations?
Influential observations could affect the model coefficients, the standard errors, or the predicted values.
For example, if deleting an observation results in a large change in parameter estimates, then that observation has a significant influence on the parameters.
If deleting an observation results in a change in the standard errors, then the observation influences the precision of the parameters.
Which diagnostic statistic may be used to detect outliers?
The STUDENT residuals (also known as studentized or standardized residuals)
What diagnostic statistics may be used to detect influential observations?
Cook’s D statistics, RSTUDENT residuals, and DFFITS statistics.
What diagnostic statistic do you use to determine which predictor variable is being influenced?
DFBETAS (difference in betas)
What does it mean if the RSTUDENT value differs from the STUDENT residual?
The observation is probably influential
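To see how the two residuals can diverge, here is a hedged Python sketch (not SAS output) for a one-predictor model. It assumes the usual textbook definitions: STUDENT divides the residual by its standard error using the full-data error variance, while RSTUDENT uses the error variance estimated with that observation deleted. The data are hypothetical, with the fourth point deliberately placed far from the line.

```python
import math

# Hypothetical data, invented for illustration; the fourth point is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.3, 1.8, 3.4, 9.0, 4.6, 6.3, 6.8, 8.2]

n = len(x)
p = 2  # parameters estimated: intercept and slope

# Least-squares fit of y = b0 + b1 * x.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - p)  # error variance estimated from all observations

student = []
rstudent = []
for e, h in zip(residuals, leverage):
    # Internally studentized: uses the full-data variance estimate.
    student.append(e / math.sqrt(s2 * (1 - h)))
    # Externally studentized: variance re-estimated without this observation.
    s2_deleted = (sse - e ** 2 / (1 - h)) / (n - p - 1)
    rstudent.append(e / math.sqrt(s2_deleted * (1 - h)))

for i, (t, r) in enumerate(zip(student, rstudent), start=1):
    print(f"obs {i}: STUDENT = {t:7.2f}   RSTUDENT = {r:7.2f}")
```

For well-behaved points the two values stay close. For the unusual point, removing it shrinks the estimated error variance, so its RSTUDENT value is noticeably larger in magnitude than its STUDENT value.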
Which statistic is most useful for identifying influential observations for explanatory models when the purpose of your model is parameter estimation?
Cook’s D
What does Cook’s D measure?
The Cook’s D statistic measures the distance between the set of parameter estimates with that observation deleted from your regression analysis, and the set of parameter estimates with all the observations in your regression analysis.
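The deletion idea can be sketched in Python (not SAS) by refitting the line once per observation and measuring how far the fitted values move. This assumes the standard form of Cook's D, where the shift is scaled by the number of parameters times the full-model error variance. The data and the function names fit_line and cooks_distance are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def cooks_distance(x, y):
    """Cook's D for each observation, computed by deleting it and refitting."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)

    distances = []
    for i in range(n):
        # Refit with observation i removed.
        b0_i, b1_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        fitted_i = [b0_i + b1_i * xj for xj in x]
        # Squared shift in fitted values caused by the change in estimates.
        shift = sum((f - fi) ** 2 for f, fi in zip(fitted, fitted_i))
        distances.append(shift / (p * s2))
    return distances


# Hypothetical data: the last point sits far out in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 14.0]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0]

d = cooks_distance(x, y)
for i, di in enumerate(d, start=1):
    print(f"obs {i}: Cook's D = {di:.3f}")
```

Measuring the shift in fitted values is equivalent to measuring the distance between the two sets of parameter estimates in the metric defined by the predictor data. In this made-up example the last observation has by far the largest value, because deleting it changes the slope the most.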