STAT 2 - LAB 2 Flashcards
Fit a linear model with 2 predictors (rm and chas)
library(MASS)  # assumed source of the Boston data set (ISLR2 also provides one)
lm.multiple <- lm(formula = medv ~ rm + chas, data = Boston)
summary(lm.multiple)
Fit a model with the 2 predictors and their interaction
lm.multiple.add <- lm(formula = medv ~ rm*chas, data=Boston)
summary(lm.multiple.add)
Test β3 = 0 (the interaction coefficient) using anova
anova(lm.multiple, lm.multiple.add)
1. model with the 2 predictors only
2. model with the interaction term added
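Because the two models differ by a single coefficient, the F statistic from anova should match the square of the interaction term's t statistic in summary. A minimal sketch that checks this, assuming Boston is loaded from the MASS package (where chas is a 0/1 numeric column, so the coefficient is named rm:chas):

```r
library(MASS)  # assumption: Boston is taken from MASS

lm.multiple     <- lm(medv ~ rm + chas, data = Boston)
lm.multiple.add <- lm(medv ~ rm * chas, data = Boston)

# F statistic from the nested-model comparison (row 2 holds the test)
f.stat <- anova(lm.multiple, lm.multiple.add)$F[2]

# t statistic of the interaction coefficient in the larger model
t.stat <- summary(lm.multiple.add)$coefficients["rm:chas", "t value"]

# With one extra coefficient, F equals t^2 (up to rounding)
all.equal(f.stat, t.stat^2)
```

So for a single added term the anova comparison and the t-test in summary give the same p-value; anova becomes necessary when several coefficients are tested at once.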
create a fitted vs residuals plot
plot(fitted(lm.simple), resid(lm.simple), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "orange")
Fit a polynomial of order 2 to predict medv through rm
lm.poly <- lm(formula = medv ~ rm+ I(rm^2), data=Boston)
Fit a polynomial of order 4 to predict medv through rm
lm.fit.nonlinear4 <- lm(formula = medv ~ poly(rm, 4, raw = TRUE), data = Boston)
Create four diagnostic plots of the model in EQ. 4 using the function plot.
plot(lm.poly)
Are there any points with high leverage?
sum(hatvalues(lm.poly) > 2*mean(hatvalues(lm.poly)))
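The cutoff 2*mean(hatvalues) works because the hat values of a linear model sum to the number of fitted coefficients, so their mean is (number of coefficients)/n. A small sketch illustrating this and listing the flagged rows, again assuming Boston comes from MASS; the names h, n.coef, n.obs and high.lev are introduced here for illustration only:

```r
library(MASS)  # assumption: Boston is taken from MASS

lm.poly <- lm(medv ~ rm + I(rm^2), data = Boston)

h      <- hatvalues(lm.poly)
n.coef <- length(coef(lm.poly))  # intercept, rm, rm^2
n.obs  <- nrow(Boston)

# The mean leverage equals (number of coefficients) / n
all.equal(mean(h), n.coef / n.obs)

# Row positions of the observations above twice the mean leverage
high.lev <- which(h > 2 * mean(h))
head(Boston[high.lev, c("rm", "medv")])
```

Using which instead of sum returns the positions of the flagged observations, which helps when you want to inspect them rather than just count them.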
Are there any points with a large residual?
sum(abs(rstandard(lm.poly))>2)
Are there any influential points?
cooks.lm.poly <- cooks.distance(lm.poly)
sum(cooks.lm.poly>4/length(cooks.lm.poly))
Fit the model in EQ. 4 on the Boston data after removing the influential points.
no.cooks <- cooks.lm.poly <=4/length(cooks.lm.poly)
lm.no.cooks <- lm(formula = medv ~ rm + I(rm^2), data=Boston, subset = no.cooks)
After removing the influential points, check the fitted vs residuals plot
plot(fitted(lm.no.cooks), residuals(lm.no.cooks), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "orange")
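To judge how much the flagged points matter, it can help to put the two sets of coefficients side by side. A sketch under the same assumptions as the cards above (Boston from MASS, the 4/n Cook's distance cutoff):

```r
library(MASS)  # assumption: Boston is taken from MASS

lm.poly       <- lm(medv ~ rm + I(rm^2), data = Boston)
cooks.lm.poly <- cooks.distance(lm.poly)

# TRUE for observations at or below the 4/n cutoff (the ones we keep)
no.cooks    <- cooks.lm.poly <= 4 / length(cooks.lm.poly)
lm.no.cooks <- lm(medv ~ rm + I(rm^2), data = Boston, subset = no.cooks)

# Coefficients with and without the influential points
cbind(all.points = coef(lm.poly), no.influential = coef(lm.no.cooks))

# How many observations were dropped and how many remain
c(dropped = sum(!no.cooks), kept = nobs(lm.no.cooks))
```

Large shifts between the two columns suggest the original fit was being pulled by a few observations.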
Fit the multiple linear regression model with medv as the response variable, and all other variables as predictors except rad.
lm.fit.multiple <- lm(formula = medv ~ . - rad, data = Boston)
Calculate variance inflation factor (VIF) score for the predictor variables. Your comments?
library(car)
car::vif(lm.fit.multiple)
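Predictors with a VIF above 3 are quite high, which points to collinearity with the other predictors (other common rules of thumb use 5 or 10 as the cutoff).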
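The VIF of a predictor is 1/(1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. A minimal sketch that computes it by hand without the car package, assuming Boston comes from MASS; X and vif.by.hand are names introduced here for illustration:

```r
library(MASS)  # assumption: Boston is taken from MASS

lm.fit.multiple <- lm(medv ~ . - rad, data = Boston)

# Predictor columns of the design matrix (intercept dropped)
X <- model.matrix(lm.fit.multiple)[, -1]

# Regress each predictor on all the others and convert R^2 to a VIF
vif.by.hand <- sapply(colnames(X), function(v) {
  others <- X[, colnames(X) != v, drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})

sort(round(vif.by.hand, 2), decreasing = TRUE)
```

For a model with only numeric predictors like this one, these values should agree with car::vif(lm.fit.multiple).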