data analysis: summary and ANOVA tables Flashcards
p≥0.1
no evidence against null hypothesis
0.01≤p<0.1
low/moderate evidence against null hypothesis
0.001≤p<0.01
strong evidence against null hypothesis
p<0.001
very strong evidence against null hypothesis
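The evidence scale on the cards above can be sketched as a small lookup. The helper name evidence_label is hypothetical, and the sketch assumes a boundary value belongs to the weaker-evidence band (so p = 0.1 counts as no evidence, as the first card states).

```r
# Hypothetical helper: map p-values to the evidence labels used on these cards.
evidence_label <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.1, Inf),
    labels = c("very strong", "strong", "low/moderate", "none"),
    right = FALSE  # bands are [lower, upper), so p = 0.1 falls in "none"
  )
}

as.character(evidence_label(c(0.0004, 0.005, 0.03, 0.25)))
# "very strong" "strong" "low/moderate" "none"
```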
how can we determine whether our model is good from the summary(model) output?
R² always satisfies 0≤R²≤1; a good model has R² as close to 1 as possible
if R² = 1 then SSR/SST = 1, which implies SSE = 0, so eᵢ = 0 for all i and we say the model has a “perfect fit”
how can we determine MSE (mean square error) from the summary(model) output?
MSE = “residual standard error”²
how can we determine R² from the summary(model) output?
R² = “multiple R-squared” = SSR/SST
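A minimal sketch of reading MSE and R² off summary(model) programmatically. The model and the built-in mtcars data are illustrative assumptions, not part of the cards.

```r
# Illustrative model on R's built-in mtcars data (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(model)

rse <- s$sigma        # the "Residual standard error" line of the output
mse <- rse^2          # MSE = (residual standard error)^2
r2  <- s$r.squared    # the "Multiple R-squared" value

# Cross-check both against their definitions.
sse <- sum(residuals(model)^2)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
all.equal(mse, sse / df.residual(model))   # MSE = SSE / (n - p)
all.equal(r2, 1 - sse / sst)               # R^2 = SSR/SST = 1 - SSE/SST
```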
β₀ is always the ….
…. intercept
how would you determine a confidence interval from the summary(model) output?
estimate ± c.v × (standard error)
note: c.v = t(α/2, n-p), the t critical value on the residual degrees of freedom (n-2 in simple linear regression); it can be looked up using t-tables
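The interval can also be computed rather than looked up. This is a sketch under assumptions: the mtcars model, the coefficient wt and the 95% level are chosen only for illustration.

```r
# Illustrative model on R's built-in mtcars data (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- coef(summary(model))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

alpha <- 0.05
cv <- qt(1 - alpha / 2, df = df.residual(model))   # t critical value on n - p df

est <- coefs["wt", "Estimate"]
se  <- coefs["wt", "Std. Error"]
ci  <- c(lower = est - cv * se, upper = est + cv * se)   # estimate ± c.v × s.e.

ci
confint(model, "wt", level = 1 - alpha)   # built-in equivalent, for comparison
```

qt() replaces the t-table lookup, and confint() should return the same two endpoints.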
how can the F-stat calculated for ANOVA be verified?
the F-statistic is part of the output in summary(model)
note that its degrees of freedom, k and n-p, are also reported there
how do we calculate F from anova(model)?
MSR/MSE
how do we calculate DF from anova(model)?
DF for regression = k, the sum of the Df entries for the predictors (also given as part of summary(model))
DF for error = n-p, the Df entry for residuals (also given as part of summary(model))
total DF is always n-1 = k + (n-p)
how do we calculate SSR from anova(model)?
sum sq for x1 + sum sq for x2 (in general, add the sum sq entries for every predictor)
how do we calculate SSE from anova(model)?
sum sq for residuals
how do we calculate SST from anova(model)?
SSR + SSE
how do we calculate MSR from anova(model)?
SSR/DF (regression)
how do we calculate MSE from anova(model)?
SSE/DF (error)
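The anova(model) calculations on the cards above can be sketched end to end as follows. The two-predictor mtcars model is an illustrative assumption; the column names "Df" and "Sum Sq" are those of the table R prints.

```r
# Illustrative model on R's built-in mtcars data (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)
tab <- anova(model)                     # rows: wt, hp, Residuals

is_resid <- rownames(tab) == "Residuals"

ssr <- sum(tab[!is_resid, "Sum Sq"])    # SSR: add the predictor rows
sse <- tab[is_resid, "Sum Sq"]          # SSE: the Residuals row
sst <- ssr + sse                        # SST = SSR + SSE

k      <- sum(tab[!is_resid, "Df"])     # DF for regression
df_err <- tab[is_resid, "Df"]           # DF for error, n - p

msr <- ssr / k                          # MSR = SSR / DF (regression)
mse <- sse / df_err                     # MSE = SSE / DF (error)
f   <- msr / mse                        # overall F = MSR / MSE

f
summary(model)$fstatistic               # named vector: value, numdf, dendf
```

The last two lines are the verification step: f should agree with the value entry, and k and df_err with numdf and dendf.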
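what is the definition of SSR?
regression sum of squares, Σ(ŷᵢ − ȳ)²: the part of the variability in the response that is explained by the fitted line
how do we interpret SSR?
for a given SST, higher SSR means better model fit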
what is adj R squared?
R² adjusted for the number of variables in the regression: R²adj = 1 − (SSE/(n-p))/(SST/(n-1)). unlike R², adding a variable that explains little can make R²adj lower
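A minimal sketch of the adjustment, assuming the usual definition R²adj = 1 − (SSE/(n-p))/(SST/(n-1)) and an illustrative mtcars model (neither the data nor the variable names come from the cards).

```r
# Illustrative model on R's built-in mtcars data (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(model))     # number of coefficients, intercept included

sse <- sum(residuals(model)^2)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r2     <- 1 - sse / sst
r2_adj <- 1 - (sse / (n - p)) / (sst / (n - 1))   # penalises extra coefficients

all.equal(r2_adj, summary(model)$adj.r.squared)   # the "Adjusted R-squared" value
r2_adj <= r2                                      # the adjustment never raises R^2
```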
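what is the definition of SSE?
error (residual) sum of squares, Σ(yᵢ − ŷᵢ)²: the part of the variability left in the error term, unexplained by the model
what is the definition of SST?
total sum of squares, Σ(yᵢ − ȳ)²: dispersion of the observed response values around their mean
what is the definition of MSE?
average squared difference between the fitted values and the actual values, SSE/(n-p) (dividing by the error DF rather than n)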
how do we interpret MSE?
greater MSE (for a given MSR) gives a smaller F = MSR/MSE, so the regression is less likely to be significant
how do we interpret F?
large values of F support the conclusion that the overall relationship is statistically significant
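How large is "large" can be judged from the p-value attached to F. A sketch, again on an illustrative mtcars model that is not part of the cards:

```r
# Illustrative model on R's built-in mtcars data (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)
fs <- summary(model)$fstatistic   # named vector: value, numdf, dendf

# Upper-tail probability of the F distribution: P(F >= observed value).
p_value <- unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
p_value
```

The result is the p-value printed on the F-statistic line of summary(model), and it can be read against the evidence scale at the top of the deck.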