4. F-Tests and Standardisation Flashcards
How do we test the significance of the overall model?
F-test
What is an F-Test?
The process of testing the statistical significance of a test statistic called the F-ratio
What is not sufficient to test the significance of the overall model?
Testing individual predictors and R²
What is the F-ratio?
Ratio of explained variance to unexplained variance
F = (SS model / df model) / (SS residual / df residual)
so
F = MS model / MS residual
Tests the null hypothesis that all the regression slopes in a model are zero, i.e. that our predictors tell us nothing about our outcome and explain no variance
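Worked example (a minimal sketch, not a definitive implementation): the F-ratio arithmetic for a model with one predictor, using only the Python standard library. The data values and all variable names are hypothetical and chosen only to keep the numbers easy to follow.

```python
# Hypothetical toy data, chosen only to illustrate the arithmetic.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(y)
k = 1  # number of predictors (slopes) in the model

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single predictor.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]

# Sums of squares: total = model + residual.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_model = ss_total - ss_residual

# Mean squares are sums of squares divided by their degrees of freedom.
df_model = k
df_residual = n - k - 1
ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual

f_ratio = ms_model / ms_residual
print(f"F({df_model}, {df_residual}) = {f_ratio:.2f}")
```

For these values SS model = 3.6 and SS residual = 2.4, so MS model = 3.6, MS residual = 0.8 and F = 4.5.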
What are mean squares?
Mean squares are sums of squares calculations divided by the associated degrees of freedom.
How is the null hypothesis explained in terms of F-test?
The null hypothesis for the model says that the best guess of any individual's y value is the mean of y plus error.
Or, that the x variables carry no information collectively about y.
i.e. the slopes all = 0
What would the different results from a F-Test demonstrate?
Large F-ratio = stronger evidence that the model is significant, because the model explains much more variance than is left in the residuals
F > 1 = more model variance than residual variance
F is close to 1 if the null is true
How do you test if your F-test is significant?
- Select an alpha level
- Calculate the critical value of F
- Compare the observed F-ratio to the critical value
- The F-ratio is evaluated against an F-distribution with df model and df residual at the pre-defined alpha
If the F-ratio is more extreme than (greater than) the critical value, we reject the null
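The steps above can be sketched in code. This is only an illustration under stated assumptions: the critical value is approximated by Monte Carlo simulation (a swapped-in technique, so the block needs nothing beyond the standard library), whereas in practice an F table or a statistics library's quantile function, for example `scipy.stats.f.ppf`, would normally supply it. The function names, the degrees of freedom and the observed F value are all hypothetical.

```python
import random


def simulate_f(df_model, df_residual, rng):
    """Draw one F value under the null: the ratio of two independent
    chi-square variables, each divided by its degrees of freedom."""
    chi2_model = sum(rng.gauss(0, 1) ** 2 for _ in range(df_model))
    chi2_residual = sum(rng.gauss(0, 1) ** 2 for _ in range(df_residual))
    return (chi2_model / df_model) / (chi2_residual / df_residual)


def approx_critical_f(alpha, df_model, df_residual, n_sims=20000, seed=1):
    """Approximate the critical value as the (1 - alpha) quantile of
    simulated null F values. This is an approximation, not an exact value."""
    rng = random.Random(seed)
    draws = sorted(simulate_f(df_model, df_residual, rng) for _ in range(n_sims))
    return draws[int((1 - alpha) * n_sims)]


alpha = 0.05                   # step 1: select the alpha level
df_model, df_residual = 2, 20  # hypothetical model: k = 2, n = 23
observed_f = 5.0               # hypothetical observed F-ratio

critical_f = approx_critical_f(alpha, df_model, df_residual)  # step 2
reject_null = observed_f > critical_f                         # step 3

print(f"approximate critical F = {critical_f:.2f}, reject null: {reject_null}")
```

The tabled critical value for F(2, 20) at alpha = 0.05 is about 3.49, so the simulated value should land close to that.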
What is an f-distribution?
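The probability distribution of a ratio of two independent variance estimates (each a chi-square variable divided by its degrees of freedom). Its shape is determined by two df values: df model and df residual.
It is also the distribution used to test equality of variances from two normal populations.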
What are degrees of freedom?
Number of independent values associated with the different calculations
Df are typically the combination of sample size and the number of things you need to calculate/estimate.
What are the three different types of degrees of freedom? (Name only)
Residual DF
Total DF
Model DF
What are residual degrees of freedom?
The number of values still free to vary once the model has been estimated, i.e. the remaining dimensions that you could use to generate a new data set that looks like the current data set
n-k-1
SS residual calculation is based on our model, in which we estimate k β terms (-k) and an intercept (-1)
What are total DF?
SS total is calculated from the observed yᵢ and the mean of y, so one value (the mean) is estimated (-1)
n-1
What is model DF?
Number of slopes (predictors) in the model that are estimated from the data = k
SS model depends on the estimated slopes (β)
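A small sketch of the three degrees-of-freedom formulas above, assuming n observations and k predictors plus an intercept. The function name and the example values are hypothetical.

```python
def degrees_of_freedom(n, k):
    """Return (df_model, df_residual, df_total) for a linear model with
    n observations, k predictors and an intercept."""
    if n <= k + 1:
        raise ValueError("need more observations than estimated parameters")
    df_model = k             # one per estimated slope
    df_residual = n - k - 1  # minus k slopes, minus 1 intercept
    df_total = n - 1         # minus 1 for the mean of y
    return df_model, df_residual, df_total


df_model, df_residual, df_total = degrees_of_freedom(n=100, k=3)
print(df_model, df_residual, df_total)
```

Model df and residual df always add up to total df: k + (n - k - 1) = n - 1.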
What are unstandardized coefficients?
Coefficients expressed in the original units in which the data were collected
They are useful when those units are meaningful
What are standardized coefficients?
Coefficients from a model in which the variables have been z-scored (each value's deviation from the mean divided by the standard deviation)
The interpretation of a coefficient becomes the change in y, in standard deviation units, for every one standard deviation increase in x
Why is standardization useful?
Useful for comparison if variables are on different scales
Useful if scales are arbitrary
What happens to R², F-test, T-test and B0 when coefficients are standardized?
R², F-test and T-test stay the same
B0 = 0 when standardised
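A quick check of this card (a minimal sketch, assuming a single predictor and hypothetical toy data): fit the same model to raw and to z-scored variables and compare the intercept, R² and F. The helper names `fit_simple` and `z_score` are made up for this illustration.

```python
from statistics import mean, stdev


def z_score(values):
    """Deviation from the mean divided by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def fit_simple(x, y):
    """Least-squares fit of y on one predictor.
    Returns (intercept, slope, r_squared, f_ratio)."""
    n = len(y)
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_total = sum((b - my) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    # One predictor: df model = 1, df residual = n - 2.
    f_ratio = (ss_total - ss_residual) / (ss_residual / (n - 2))
    return intercept, slope, r_squared, f_ratio


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

raw_intercept, raw_slope, raw_r2, raw_f = fit_simple(x, y)
std_intercept, std_slope, std_r2, std_f = fit_simple(z_score(x), z_score(y))

print(f"raw:          b0 = {raw_intercept:.3f}, R2 = {raw_r2:.3f}, F = {raw_f:.3f}")
print(f"standardized: b0 = {std_intercept:.3f}, R2 = {std_r2:.3f}, F = {std_f:.3f}")
```

Only the intercept and the slope change scale. With one predictor the slope's t statistic satisfies t² = F, so an unchanged F implies an unchanged t.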
Why should we be cautious in using standardization?
Just because you can put regression coefficients on a common metric doesn’t mean they can be meaningfully compared.
The SD is a poor measure of spread for skewed distributions, so be cautious about standardizing skewed variables
What does standardization do to the correlation?
Standardized slope (β̂₁*) = correlation coefficient (r) for a linear model with a single continuous predictor.
They are the same:
r is a standardized measure of linear association
β̂₁* is a standardized measure of the linear slope.
Something similar is true for linear models with multiple predictors.
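Standardized slopes are closely related to the part (semi-partial) correlation coefficients; the two are equal when the predictors are uncorrelated with each other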
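For the single-predictor case, the equality can be checked without refitting the model: rescaling the unstandardized slope b by sd(x) / sd(y) gives the standardized slope, which matches Pearson's r. This is a minimal sketch on hypothetical toy data, and all variable names are made up for the illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
s_xx = sum((a - mx) ** 2 for a in x)
s_yy = sum((b - my) ** 2 for b in y)

# Unstandardized slope, in original units of y per unit of x.
b1 = s_xy / s_xx

# Standardized slope: rescale by the ratio of standard deviations.
beta1_std = b1 * stdev(x) / stdev(y)

# Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"b1 = {b1:.4f}, standardized slope = {beta1_std:.4f}, r = {r:.4f}")
```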
What is the value of the intercept when continuous variables are standardised?
0