4. F-Tests and Standardisation Flashcards

Question 1

Q

How do we test the significance of the overall model?

Question 2

Q

What is an F-Test?

Answer

A

The process of testing the statistical significance of the test statistic (the F-ratio)

Question 3

Q

What is not sufficient to test the significance of the overall model?

Answer

A

Testing the individual predictors and inspecting R² on their own

Question 4

Q

What is the F-ratio?

Answer

A

Ratio of explained variance to unexplained variance

F = (SS model / df model) / (SS residual / df residual)

so

F = MS model / MS residual

Tests the null hypothesis that all the regression slopes in the model are zero, i.e. that the predictors tell us nothing about the outcome and explain no variance
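The calculation above can be sketched in a few lines of plain Python. This is a minimal illustration for a model with one predictor; the data values and variable names are hypothetical and chosen only to make the arithmetic visible.

```python
# Illustrative data only (hypothetical values, not from the flashcards).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

n = len(y)
k = 1  # number of predictors (slopes) in the model

mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares estimates for a single predictor.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

# Sums of squares: total = model + residual.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Degrees of freedom and mean squares.
df_model = k
df_residual = n - k - 1
ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual

f_ratio = ms_model / ms_residual
print(f"F({df_model}, {df_residual}) = {f_ratio:.2f}")
```

Because these made-up points lie very close to a straight line, almost all of SS total ends up in SS model, so the F-ratio is far above 1.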

Question 5

Q

What are mean squares?

Answer

A

Mean squares are sums of squares calculations divided by the associated degrees of freedom.

Question 6

Q

How is the null hypothesis explained in terms of the F-test?

Answer

A

The null hypothesis for the model says that the best guess of any individual's y value is the mean of y plus error.

Or, that the x variables collectively carry no information about y.

i.e. all the slopes = 0

Question 7

Q

What would the different results from an F-test demonstrate?

Answer

A

Large F-ratio = stronger evidence that the model is significant, because the model variance is large relative to the residual variance

F > 1 = more model variance than residual variance

F is close to 1 if the null is true
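The claim that F sits close to 1 when the null is true can be checked with a small simulation. The sketch below assumes normally distributed errors, in which case MS model and MS residual each behave like a chi-square variable divided by its degrees of freedom. The degrees of freedom, seed and cut-off of 4 are hypothetical choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def chi_square(df):
    """Draw one chi-square variate as a sum of df squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))


df_model, df_residual = 3, 50  # hypothetical degrees of freedom
n_sims = 5000

# Under the null, F is a ratio of two independent scaled chi-square variables.
f_values = [
    (chi_square(df_model) / df_model) / (chi_square(df_residual) / df_residual)
    for _ in range(n_sims)
]

mean_f = sum(f_values) / n_sims
share_above_4 = sum(f > 4 for f in f_values) / n_sims

print(f"mean F under the null: {mean_f:.2f}")
print(f"share of simulated F-ratios above 4: {share_above_4:.3f}")
```

Simulated F-ratios cluster around 1 and only a small share are large, which is why a large observed F counts as evidence against the null.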

Question 8

Q

How do you test if your F-test is significant?

Answer

A

Select an alpha level
Calculate the critical value of F
Compare the observed F-ratio to the critical value
The F-ratio is evaluated against an F-distribution with df model and df residual at the pre-defined alpha

If the observed F-ratio is more extreme than the critical value, we reject the null
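The steps above can be sketched without a statistics library in one special case: when df model = 2, the upper tail of the F-distribution has a simple closed form. The function names and the numbers below are hypothetical, and the formulas apply only to two predictors; for other df model values a library routine such as `scipy.stats.f.ppf` would normally be used instead.

```python
def f_critical_two_predictors(alpha, df_residual):
    """Critical F for df_model = 2 only (closed form for this special case)."""
    return (df_residual / 2) * (alpha ** (-2 / df_residual) - 1)


def p_value_two_predictors(f_ratio, df_residual):
    """Upper-tail p-value P(F >= f_ratio) for df_model = 2 only."""
    return (1 + 2 * f_ratio / df_residual) ** (-df_residual / 2)


# Step 1: select an alpha level.
alpha = 0.05

# Hypothetical study: n = 30 observations, k = 2 predictors.
df_residual = 30 - 2 - 1
f_observed = 5.2  # hypothetical observed F-ratio

# Step 2: calculate the critical value of F.
f_crit = f_critical_two_predictors(alpha, df_residual)

# Step 3: compare the observed F-ratio to the critical value.
reject_null = f_observed > f_crit
p = p_value_two_predictors(f_observed, df_residual)

print(f"critical F(2, {df_residual}) = {f_crit:.2f}")
print(f"observed F = {f_observed}, p = {p:.4f}, reject null: {reject_null}")
```

Comparing the observed F to the critical value and comparing the p-value to alpha are two views of the same decision, so they always agree.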

Question 9

Q

What is an F-distribution?

Answer

A

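The sampling distribution of the ratio of two independent variance estimates from normal populations, defined by two degrees-of-freedom values; it is the reference distribution used to test the equality of two variances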

Question 10

Q

What are degrees of freedom?

Answer

A

Number of independent values associated with the different calculations

Df are typically the combination of sample size and the number of things you need to calculate/estimate.

Question 11

Q

What are the three different types of degrees of freedom? (Name only)

Answer

A

Residual DF

Total DF

Model DF

Question 12

Q

What are residual degrees of freedom?

Answer

A

The remaining dimensions that you could use to generate a new data set that looks like the current data set

n-k-1

SS residual calculation is based on our model, in which we estimate k β terms (-k) and an intercept (-1)

Question 13

Q

What are total DF?

Answer

A

The SS total calculation is based on the observed y_i and the mean of y; estimating the mean uses up one degree of freedom (-1)

n-1

Question 14

Q

What is model DF?

Answer

A

Number of slope parameters in the model that are estimated from the data = k

SS model depends on the estimated slopes (β)
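The three degrees-of-freedom formulas fit together: model df plus residual df always equals total df. A minimal sketch, with a hypothetical function name and example sizes:

```python
def degrees_of_freedom(n, k):
    """Return (df_model, df_residual, df_total) for n observations and k predictors."""
    df_model = k              # one per estimated slope
    df_residual = n - k - 1   # lose k slopes and one intercept
    df_total = n - 1          # lose one for the mean of y
    return df_model, df_residual, df_total


# Hypothetical example: 100 participants, 3 predictors.
df_model, df_residual, df_total = degrees_of_freedom(n=100, k=3)
print(df_model, df_residual, df_total)
```

The additivity mirrors the sums of squares, where SS model plus SS residual equals SS total.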

Question 15

Q

What are unstandardized coefficients?

Answer

A

Coefficients expressed in the original units in which the data were collected

They are useful when the units are meaningful

Question 16

Q

What are standardized coefficients?

Answer

A

Coefficients from a model in which the variables have been z-scored (each value's deviation from the mean divided by the standard deviation)

The interpretation of the coefficients becomes the increase in y in standard deviation units for every standard deviation increase in x
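As a rough illustration of z-scoring, the sketch below standardizes a predictor and an outcome and refits the slope. It also shows that the same standardized slope can be obtained by rescaling the unstandardized slope by SD(x) / SD(y). The data and names are hypothetical.

```python
import statistics


def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """Least-squares slope for a single predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: hours studied and test score.
hours = [2, 4, 5, 7, 9, 10]
score = [50, 55, 61, 68, 72, 80]

b = ols_slope(hours, score)                            # score points per hour
b_std_from_z = ols_slope(z_score(hours), z_score(score))  # SDs of y per SD of x
b_std_rescaled = b * statistics.stdev(hours) / statistics.stdev(score)

print(f"unstandardized slope: {b:.3f}")
print(f"standardized slope:   {b_std_from_z:.3f}")
```

The unstandardized slope reads as "points per hour", while the standardized slope reads as "standard deviations of score per standard deviation of hours".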

Question 17

Q

Why is standardization useful?

Answer

A

Useful for comparison if variables are on different scales

Useful if scales are arbitrary

Question 18

Q

What happens to R², the F-test, the t-tests and B0 when coefficients are standardized?

Answer

A

R², the F-test and the t-tests for the slopes stay the same

B0 = 0 when standardised
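These two facts can be checked numerically. The sketch below fits a one-predictor model to raw and to z-scored versions of the same hypothetical data, then compares the intercept and R². The helper names and data are assumptions made for illustration.

```python
import statistics


def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def fit(x, y):
    """One-predictor OLS: return (intercept, slope, r_squared)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    return intercept, slope, 1 - ss_residual / ss_total


# Hypothetical data: age in years and income in thousands.
age = [23, 31, 38, 44, 52, 60]
income = [21.0, 30.5, 33.0, 41.5, 47.0, 58.0]

raw_intercept, raw_slope, raw_r2 = fit(age, income)
std_intercept, std_slope, std_r2 = fit(z_score(age), z_score(income))

print(f"raw:          b0 = {raw_intercept:.3f}, R2 = {raw_r2:.4f}")
print(f"standardized: b0 = {std_intercept:.3f}, R2 = {std_r2:.4f}")
```

Since F can be written as (R² / k) / ((1 - R²) / (n - k - 1)), an unchanged R² with unchanged degrees of freedom implies an unchanged F-ratio.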

Question 19

Q

Why should we be cautious in using standardization?

Answer

A

Just because you can put regression coefficients on a common metric doesn’t mean they can be meaningfully compared.

The SD is a poor measure of spread for skewed distributions, so be cautious about using standardized coefficients with skewed variables

Question 20

Q

What does standardization do to the correlation?

Answer

A

Standardized slope ($\hat{\beta}^*_1$) = correlation coefficient (r) for a linear model with a single continuous predictor.

They are the same:

r is a standardized measure of linear association
$\hat{\beta}^*_1$ is a standardized measure of the linear slope.
Something similar is true for linear models with multiple predictors.

Standardized slopes are closely related to (though not numerically identical to) the part (semi-partial) correlation coefficients
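For the single-predictor case, the equality between the standardized slope and r follows in one line. The notation here is an assumption added for the derivation: $s_{xy}$ is the sample covariance of x and y, $s_x$ and $s_y$ are the sample standard deviations, and $\hat{\beta}_1 = s_{xy} / s_x^2$ is the unstandardized least-squares slope.

```latex
\hat{\beta}^*_1
  = \hat{\beta}_1 \, \frac{s_x}{s_y}
  = \frac{s_{xy}}{s_x^2} \cdot \frac{s_x}{s_y}
  = \frac{s_{xy}}{s_x \, s_y}
  = r
```

The first step rescales the slope from raw units to standard deviation units; the last step is the definition of the Pearson correlation.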

Question 21

Q

What is the value of the intercept when continuous variables are standardised?