Model Checking Flashcards
Var[e_i]=?
σ^2 (1-h_ii)
Cov[e_i,e_j]=? (i ≠ j)
-σ^2 h_ij
Standardize residuals
e_i / sqrt(σ^2 (1-h_ii))
Studentise residuals
Replace σ^2 in Standardized residual with S^2
Constant variance?
Homoscedasticity
To check linearity of model
Plot r_i against x_i
To check homoscedasticity
Plot r_i against ŷ_i (fitted values)
h_ii =
Rule of thumb for outlier observations when residuals are standardised
If |r_i| > 2, flag the observation as an outlier
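The studentised-residual rule of thumb can be sketched in a few lines of plain Python for the SLRM. This is a minimal illustration, not the course's own code: the data are made up, the function names `slr_fit` and `studentised_residuals` are hypothetical, and the leverage is taken as h_ii = 1/n + (x_i - x̄)^2 / S_xx, the standard SLRM form.

```python
import math

def slr_fit(x, y):
    """Least squares fit of y = b0 + b1*x; returns leverages and raw residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # leverages h_ii (SLRM form)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # raw residuals e_i
    return h, e

def studentised_residuals(x, y):
    """r_i = e_i / sqrt(S^2 (1 - h_ii)), with S^2 = SS_E / (n - 2)."""
    n = len(x)
    h, e = slr_fit(x, y)
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    return [ei / math.sqrt(s2 * (1 - hi)) for ei, hi in zip(e, h)]

# Made-up data: the sixth response sits well above the trend of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.0, 8.1]

r = studentised_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]  # 0-based indices
print(flagged)
```

With these made-up numbers only the sixth observation (index 5) exceeds the |r_i| > 2 threshold.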
Large leverage?
h_ii > 2p/n (4/n in the SLRM, where p = 2)
Very large leverage
h_ii > 3p/n (6/n in the SLRM, where p = 2)
Cook’s distance?
Statistic to measure influence of an observation
How to determine if Cook’s statistic is unusually large?
If D_i is bigger than the 50th percentile of the F distribution with p and n-p degrees of freedom (where p is the number of parameters)
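As a hedged sketch, Cook's distance for the SLRM can be computed from the shortcut D_i = e_i^2 h_ii / (p S^2 (1-h_ii)^2) and checked against the definition based on deleting observation i and refitting. The data are made up and the function names are hypothetical. The comparison with the 50th percentile of the F distribution is left out because the Python standard library has no F quantile function.

```python
def fit(x, y):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def cooks_distances(x, y, p=2):
    """Shortcut form: D_i = e_i^2 h_ii / (p S^2 (1 - h_ii)^2)."""
    n = len(x)
    b0, b1 = fit(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - p)
    return [ei ** 2 * hi / (p * s2 * (1 - hi) ** 2) for ei, hi in zip(e, h)]

def cooks_by_deletion(x, y, p=2):
    """Definition: squared shift in all fitted values when case i is dropped,
    divided by p S^2 (S^2 from the full fit)."""
    n = len(x)
    b0, b1 = fit(x, y)
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    out = []
    for i in range(n):
        c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum(((b0 + b1 * xj) - (c0 + c1 * xj)) ** 2 for xj in x)
        out.append(shift / (p * s2))
    return out

# Made-up data with one point far from the others in x.
x = [1, 2, 3, 4, 5, 6, 7, 12]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.0]

d_short = cooks_distances(x, y)
d_delete = cooks_by_deletion(x, y)
print([round(d, 4) for d in d_short])
```

The two functions should agree up to rounding error, which shows that no refitting is needed in practice.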
Pure error?
Replications
More than one observation for some values of an explanatory variable;
Y_ij for x_i
When multiple observations at single x_i
Sum of squares for residuals (Y_ij for x_i)
Pure error sum of squares
Lack of fit sum of squares
In SLRM SS_E =
SS_LoF + SS_PE
ANOVA table columns
Source of variation, d.f., SS, MS, VR
E(SS_PE) =
(N-m)σ^2
If SLRM is true then E(SS_LoF) =
(m-2)σ^2
MS_PE and MS_LoF give estimators?
Both give unbiased estimators of σ^2,
but the latter only if the SLRM is true
F test for lack of fit:
-H_0?
SLRM is true
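F test for lack of fit:
-H_1?
SLRM is not true (the straight-line model does not fit)
F test for lack of fit:
-Test statistic?
F = MS_LoF / MS_PE
F test for lack of fit:
-F stat under H_0
F follows the F distribution with m-2 and N-m degrees of freedom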
Can only do F test for LoF if
We have replications (not repeated measurements of same sampling unit)
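The decomposition SS_E = SS_LoF + SS_PE and the lack-of-fit F statistic can be sketched as below. This is an illustration under assumptions: the data are made up, `lack_of_fit` is a hypothetical helper, SS_PE is taken as the sum over i and j of (Y_ij - Ȳ_i)^2, and SS_LoF as the sum over i of n_i (Ȳ_i - ŷ_i)^2, where n_i is the number of replicates at x_i and m is the number of distinct x values.

```python
from collections import defaultdict

def lack_of_fit(x, y):
    """Pure-error / lack-of-fit decomposition for a straight-line fit."""
    N = len(x)
    xbar, ybar = sum(x) / N, sum(y) / N
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar

    # Group the replicated responses Y_ij by their x_i value.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    m = len(groups)  # number of distinct x values

    ss_pe = 0.0
    ss_lof = 0.0
    for xi, ys in groups.items():
        gmean = sum(ys) / len(ys)
        ss_pe += sum((yij - gmean) ** 2 for yij in ys)          # within-group spread
        ss_lof += len(ys) * (gmean - (b0 + b1 * xi)) ** 2       # group mean vs fitted line

    ss_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    f_stat = (ss_lof / (m - 2)) / (ss_pe / (N - m))             # MS_LoF / MS_PE
    return {"SS_E": ss_e, "SS_PE": ss_pe, "SS_LoF": ss_lof,
            "df_LoF": m - 2, "df_PE": N - m, "F": f_stat}

# Made-up data: two replicates at each of four x values, with visible curvature.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.2, 3.9, 4.1, 9.1, 8.8, 16.2, 15.9]

res = lack_of_fit(x, y)
print(res)
```

The F value would then be compared with the F distribution on (m-2, N-m) degrees of freedom; a large value is evidence against the SLRM.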
0 vector
h_ii is?
ith diagonal element of the hat matrix
ith mean response? (Matrix)
Estimator of ith mean response
Variance of estimator of ith mean response
Estimator of variance of estimator of ith mean response
Variance of estimator of beta zero
Vector of residuals
Multiple regression model written in vectors
Vector of fitted values
ith fitted value? (Vector)
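For reference, the standard matrix forms of the quantities on the cards above can be written as follows. The notation is an assumption and may differ from the course's: X is the n×p design matrix, x_i^T its ith row, H the hat matrix, S^2 the residual mean square, and μ_i the ith mean response. The block assumes the amsmath package.

```latex
\begin{align*}
\mathbf{Y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon}
  && \text{(model)} \\
\hat{\boldsymbol{\beta}} &= (X^{T}X)^{-1}X^{T}\mathbf{Y}
  && \text{(least squares estimator)} \\
H &= X(X^{T}X)^{-1}X^{T}
  && \text{(hat matrix, diagonal elements } h_{ii}\text{)} \\
\hat{\mathbf{Y}} &= X\hat{\boldsymbol{\beta}} = H\mathbf{Y}
  && \text{(vector of fitted values)} \\
\mathbf{e} &= \mathbf{Y} - \hat{\mathbf{Y}} = (I - H)\mathbf{Y}
  && \text{(vector of residuals)} \\
\mu_i &= \mathbf{x}_i^{T}\boldsymbol{\beta}, \qquad
\hat{\mu}_i = \hat{Y}_i = \mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}
  && \text{(ith mean response and its estimator)} \\
\operatorname{Var}(\hat{\mu}_i) &= \sigma^{2}\,\mathbf{x}_i^{T}(X^{T}X)^{-1}\mathbf{x}_i = \sigma^{2}h_{ii},
\qquad \widehat{\operatorname{Var}}(\hat{\mu}_i) = S^{2}h_{ii} \\
\operatorname{Var}(\hat{\beta}_0) &= \sigma^{2}\,\bigl[(X^{T}X)^{-1}\bigr]_{11}
\end{align*}
```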
h_ii indicates?
Properties of h_ii:
As var(e_i) = σ^2 (1-h_ii)
Properties of h_ii:
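h_ii is usually small/large when?
Small when x_i is close to the centroid of the explanatory variables, large when x_i is far from it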
Centroid?
The vector of means of each feature across all data points
Properties of h_ii:
When p=2?
SLRM, h_ii = 1/n + (x_i - x̄)^2 / S_xx, where S_xx = Σ_j (x_j - x̄)^2
Properties of h_ii:
Range of value for h_ii
1/n ≤ h_ii ≤ 1
Properties of h_ii:
Sum of h_ii
p (the number of parameters)
Average leverage
p/n
High leverage
h_ii > 2p/n
Very high leverage
h_ii > 3p/n
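The leverage properties on these cards can be checked numerically for the SLRM (p = 2). This is a small sketch with made-up x values and a hypothetical `leverages` helper, assuming the SLRM form h_ii = 1/n + (x_i - x̄)^2 / S_xx.

```python
def leverages(x):
    """SLRM leverages h_ii = 1/n + (x_i - xbar)^2 / S_xx."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Made-up design: the last x value lies far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 20]
n, p = len(x), 2

h = leverages(x)
total = sum(h)                                               # should equal p
high = [i for i, hi in enumerate(h) if hi > 2 * p / n]       # high leverage
very_high = [i for i, hi in enumerate(h) if hi > 3 * p / n]  # very high leverage
print(round(total, 10), high, very_high)
```

In this design the leverages sum to p = 2, so the average is p/n, and only the last observation (index 7) crosses both thresholds. The observation at x = 6 equals the mean of x, so its leverage sits at the lower bound 1/n.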
Cook’s distance in vectors
Cook’s distance in vectors (reduced)
PRESS residuals
PRediction Error Sum of Squares
PRESS
Sum of squares of PRESS residuals
PRESS residuals simplified
What does PRESS assess?
The model’s predictive ability
-used for calculating predicted R^2
Predicted R^2 defined?
R^2(pred) = 1 - PRESS / SS_T, where SS_T is the total sum of squares
When is Predicted R^2 used?
In MLRM to indicate how well the model predicts responses to new observations
A good model would have R^2
and R^2(pred) both high and close to each other
Large discrepancy in R^2 and R^2(pred)
Means the model may be overfitted
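A possible way to compute PRESS and R^2(pred) for a straight-line fit is sketched below. It assumes the usual shortcut that the ith PRESS residual equals e_i / (1 - h_ii) and checks it against leaving each observation out and refitting. The data are made up and the function names are hypothetical.

```python
def fit(x, y):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def press_stats(x, y):
    """PRESS from the shortcut e_i / (1 - h_ii), plus R^2 and R^2(pred)."""
    n = len(x)
    b0, b1 = fit(x, y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    press = sum((ei / (1 - hi)) ** 2 for ei, hi in zip(e, h))
    ss_e = sum(ei ** 2 for ei in e)
    ss_t = sum((yi - ybar) ** 2 for yi in y)
    return {"PRESS": press, "R2": 1 - ss_e / ss_t, "R2_pred": 1 - press / ss_t}

def press_by_refitting(x, y):
    """PRESS from the definition: predict each y_i from a fit without case i."""
    total = 0.0
    for i in range(len(x)):
        c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (c0 + c1 * x[i])) ** 2
    return total

# Made-up data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.3, 2.1, 2.8, 4.4, 4.9, 6.2, 6.8, 8.3]

stats = press_stats(x, y)
press_check = press_by_refitting(x, y)
print(stats, press_check)
```

Because each PRESS residual is at least as large in absolute value as the ordinary residual, PRESS is never smaller than SS_E, so R^2(pred) never exceeds R^2.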
If X^T X is singular, then?
No unique least squares estimators exist
Singularity of X^T X caused by
Linear dependence among explanatory variables
Problems of multicollinearity
- some or all estimators will have large variances
- very different models may fit equally well, therefore variable selection may be difficult
- some parameters may have the wrong sign
What is the use of the variance inflation factor?
Used to indicate when multicollinearity may be a problem
VIF_j=?
1 / (1 - R^2_j), where, for a regression with p-1 predictors, R^2_j is the coefficient of determination (as a proportion, not a %) from the model with X_j as a function of the remaining p-2 explanatory variables
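A minimal sketch of the VIF calculation, assuming the usual definition VIF_j = 1 / (1 - R^2_j) and the simplest case of exactly two predictors. With two predictors, regressing one on the other gives R^2_j equal to their squared sample correlation, so both predictors share the same VIF. The data are made up and the function names are hypothetical.

```python
def r_squared(u, v):
    """R^2 from a straight-line regression of v on u (squared sample correlation)."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    return suv ** 2 / (suu * svv)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    return 1.0 / (1.0 - r_squared(x1, x2))

# Made-up predictors: x2 is almost exactly 2 * x1, so they are nearly collinear.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
vif_collinear = vif_two_predictors(x1, x2)

# Made-up predictors with zero sample correlation: VIF takes its minimum value of 1.
z1 = [1, 2, 3, 4]
z2 = [1, -1, -1, 1]
vif_uncorrelated = vif_two_predictors(z1, z2)

print(vif_collinear, vif_uncorrelated)
```

For more than two predictors, R^2_j would come from a multiple regression of X_j on all the other explanatory variables, which this sketch does not cover.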
The scatterplot of the residuals vs fitted values is useful to check if
-the variance of the error is constant
- if there is any trend in the residuals (thus some term is missing in the regression model)
- if there is any outlier