Regression: The Linear Model Flashcards
What does a linear model with several predictors look like on a graph?
A regression plane in 3D (with two predictors); with more than two predictors it becomes a hyperplane that cannot be visualised directly.
SSR
The residual sum of squares: the sum of squared differences between the observed and predicted values. It measures the error in a linear model, so smaller values indicate a better fit.
What is cross-validation of a linear regression model?
Assessing whether the model accurately predicts the same outcome in a different sample of people (i.e., whether it generalises).
Methods of cross validation
Adjusted R squared
Stein's formula
data splitting
What does adjusted R squared do?
Tells us how much variance in Y would be accounted for if the model had been derived from the population from which the sample was taken.
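As a sketch, Wherry's adjusted R squared (the version most software reports) can be computed directly from R squared, the sample size n, and the number of predictors k; the example values below are made up for illustration:

```python
# Wherry's adjusted R^2: shrinks R^2 to estimate the variance the model
# would account for had it been derived from the population.
# n = sample size, k = number of predictors (example values are made up).
def adjusted_r_squared(r_squared, n, k):
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(adjusted_r_squared(0.60, 50, 5), 3))  # 0.555
```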
What does Steins formula do?
Tells us how well the model cross-validates: how much variance it would account for if applied to a different sample.
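One common statement of Stein's formula (the form presented in Field's Discovering Statistics) can be sketched as follows; the example values are illustrative:

```python
# Stein's formula: estimates the R^2 we would expect if the model were
# applied to an entirely different sample, i.e. how well it cross-validates.
def stein_adjusted_r_squared(r_squared, n, k):
    shrinkage = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - shrinkage * (1 - r_squared)

# With R^2 = 0.60, n = 50, k = 5 this gives roughly 0.49, a harsher
# (and more realistic) estimate than Wherry's adjustment.
```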
What are 2 oversimplified but common rules of thumb for sample size when using a linear model?
10 cases per predictor, or 15 cases per predictor
What is a good method of deciding desired sample size?
Desired effect size
Amount of power wanted for statistical significance
Size of sample needed for a large effect
77 participants (with up to 20 predictors)
If a medium effect is expected, use a sample size of
55-150 cases (with up to 20 predictors)
If small effect expected use sample size of
1043 cases with 20 predictors
3 main stages in fitting a linear model
Initial data checks
Run initial regression
Check residuals
4 steps in the initial checks when fitting a linear model
Check linearity and look for unusual cases
Use graphs: scatterplots
If there is a lack of linearity
Transform the data
Fitting linear regression model: run initial regression
Save diagnostic statistics
Fitting linear regression model : check residuals
Plot ZPRED against ZRESID to check 3 things: linearity, homoscedasticity, and independence. Check normality of the residuals with a histogram.
Fit general linear model: If glm assumptions met and no bias
Model can be generalized
Fit general linear model: if heteroscedasticity is found, use either
Weighted least squares regression
OR
Bootstrap and transform data
Fit general linear model: If no normality
Bootstrap and transform
OR
Use a multi level model
Fit general linear model: If data lacks independence
Use a multi level model
Glm: multicollinearity defn
A strong correlation between two or more predictor variables
Is less than perfect collinearity avoidable?
No. It is virtually unavoidable
What does perfect collinearity mean?
One predictor variable has a perfect linear correlation (r = 1 or -1) with another predictor
Glm: multicollinearity and untrustworthy b values
As the correlation between two predictor variables increases, the standard errors of the b values increase. This increases the chance that a b is unrepresentative of its population value.
Compare glm with one predictor: what does large value of r squared mean
Better fit of model
Compare glm with one predictor: test statistic for assessing the significance of R squared
F statistic
It reflects how much variability the model explains relative to how much it leaves unexplained.
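The F-statistic for the overall model can be computed from R squared alone; a minimal sketch (the example values are illustrative, not from a real dataset):

```python
# F-statistic for the overall model fit, computed from R^2,
# sample size n and number of predictors k.
def f_from_r_squared(r_squared, n, k):
    explained_per_df = r_squared / k                     # per model df
    unexplained_per_df = (1 - r_squared) / (n - k - 1)   # per residual df
    return explained_per_df / unexplained_per_df

print(round(f_from_r_squared(0.60, 50, 5), 1))  # 13.2
```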
What is b value?
Gradient of regression line. Strength of relationship.
What does multicollinearity do to R?
It limits the size of R.
Why does multicollinearity limit the size of r
When two or more predictor variables are correlated, they account for overlapping portions of the same variance, so each contributes less unique variance to R squared.
Why multicollinearity is a problem
It makes it difficult to assess the importance of each predictor variable
Two steps for Identifying multicollinearity
Correlation matrix
Variance inflation factor
Identify multicollinearity by scanning the correlation matrix
Find highly correlated predictor variables:
r >= .8 or .9
Identifying multicollinearity: variance inflation factor indicates
Whether a predictor variable has a strong linear correlation with the other predictor variables
Identifying multicollinearity : interpretation of variance inflation factor
If the largest VIF is 10 or more (tolerance .10 or less), a serious problem is indicated.
If the average VIF is substantially greater than 1, the regression may be biased.
Tolerance
The reciprocal of VIF (1/VIF). Tolerance below .1 indicates a serious problem; below .2 a potential problem.
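A minimal sketch of the VIF/tolerance arithmetic. In the two-predictor case, R_j squared is just the squared correlation between the two predictors; with more predictors you would regress each predictor on all the others to obtain R_j squared:

```python
# VIF and tolerance for predictor j, given R_j^2 (the R^2 from
# regressing predictor j on the remaining predictors).
def vif(r_j_squared):
    return 1 / (1 - r_j_squared)

def tolerance(r_j_squared):
    return 1 - r_j_squared  # equivalently 1 / vif(r_j_squared)

r = 0.9                       # correlation between two predictors
print(round(vif(r ** 2), 2))  # 5.26: high, but below the VIF = 10 threshold
```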
What is an eigenvalue
The length of one axis of the ellipse drawn around a scatterplot of the data.
What is an eigenvector?
The two lines (axes) that run across an ellipse drawn around a scatterplot of the data.
What happens to residuals when a model is a poor fit?
Residuals will be large.
What are three types of residuals?
standardised, unstandardised, and studentised.
What are unstandardised residuals?
The raw difference between predicted and observed scores.
What are standardised residuals?
unstandardised residuals converted to z scores.
What are studentised residuals?
Unstandardised residuals divided by an estimate of their standard deviation that varies from case to case (adjusting for each case's leverage).
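The three residual types can be sketched as follows, assuming we already have the observed values, predicted values, the residual standard deviation s, and each case's leverage (hat) value; all names are illustrative:

```python
import math

# Sketch of the three residual types. y = observed values, y_hat =
# predicted values, s = residual standard deviation, h = leverage values.
def residual_types(y, y_hat, s, h):
    unstandardised = [yi - yhi for yi, yhi in zip(y, y_hat)]
    standardised = [e / s for e in unstandardised]  # converted to z-scores
    studentised = [e / (s * math.sqrt(1 - hi))      # case-by-case SD estimate
                   for e, hi in zip(unstandardised, h)]
    return unstandardised, standardised, studentised
```

Note that a high-leverage case (h close to 1) gets a larger studentised residual than its standardised one, because its standard-deviation estimate shrinks.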
Name six ways to assess influential cases.
Mahalanobis distance, Cook's distance, deleted residuals, studentised deleted residuals, leverage (hat) values, and DFFit.
What is adjusted predicted value?
the predicted value of the outcome for a case from a model where the case has been deleted.
What is the deleted residual
The difference between the adjusted predicted value and the observed value.
What is the studentised deleted residual?
A deleted residual divided by the standard error.
What is the leverage (hat) value?
The influence of the observed value of the outcome variable over the predicted values.
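A quick sketch of a common rule of thumb for leverage: the average leverage is (k + 1)/n, and cases with more than two or three times the average are worth examining (the example values are illustrative):

```python
# Average leverage for k predictors and n cases; cases with leverage
# more than two or three times this average deserve scrutiny.
def average_leverage(k, n):
    return (k + 1) / n

avg = average_leverage(5, 50)
print(avg, 2 * avg)  # 0.12 0.24 -> investigate cases with hat > 0.24
```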
What is mahalanobis distance?
The distance of a case from the means of the predictor variables.
What is Cook's distance?
A measure of the overall influence of a case on the model.
What is DFFit?
The difference between the adjusted predicted value and the original predicted value.
What is DFBeta?
The difference between a parameter estimated using all cases and estimated when one case is excluded.
What is covariance ratio (CVR)?
quantifies the degree to which a case influences the variance of the regression parameters.
Check the assumptions of homoscedasticity and linearity of residuals by?
Plotting the standardised predicted values against the standardised residuals: a random array of dots indicates the data are linear and homoscedastic.
Partial plots: residuals of the outcome variable against each predictor variable; evenly spaced dots around the line indicate homoscedasticity.
Test normality of residuals?
Histogram and normal probability (P-P) plot
What statistics does Bayesian regression give?
An estimate of b and 95% credible intervals for the model parameters, e.g. there is a 95% probability that the population value of b lies within the interval.