Multiple Linear Regression Flashcards

Understand multiple linear regression

1
Q

What is the difference between simple and multiple linear regression?

A
  • Simple takes into account only one covariate
  • Multiple takes into account more than one covariate

2
Q

What is the best way to display the results of multiple linear regression?

A

Using a scatter plot

3
Q

What is the general equation for predicting y given any x in multiple linear regression?

A

Y = B0 + B1X1 + B2X2 + … + BnXn
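As a sketch in Python, the prediction is just the intercept plus a weighted sum of the covariates (the coefficient values below are hypothetical, for illustration only):

```python
# Minimal sketch of the multiple regression prediction
# Y = B0 + B1*X1 + B2*X2 + ... + Bn*Xn.
# The coefficients here are made up for illustration.

def predict(b0, betas, xs):
    """Intercept b0 plus the dot product of coefficients and covariates."""
    return b0 + sum(b * x for b, x in zip(betas, xs))

y_hat = predict(1.0, [2.0, -0.5], [3.0, 4.0])  # 1 + 2*3 - 0.5*4 = 5.0
```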

4
Q

What are Factor variables?

A

Variables fitted as factors allow the response to vary with the level of the covariate

5
Q

What are Treatment contrast variables?

A

One level forms the baseline; the remaining levels have corresponding coefficients

6
Q

What are Dummy Variables?

A

Variables that switch on (x=1) and switch off (x=0)
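A minimal Python sketch of this switching-on behaviour (the helper name `dummy_encode` is made up):

```python
# Sketch: encode a categorical covariate as 0/1 dummy variables.
# Each level gets an indicator that switches on (1) for observations
# at that level and off (0) otherwise.

def dummy_encode(values, levels):
    """Return one 0/1 indicator column per level."""
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

dummies = dummy_encode(["a", "b", "a"], ["a", "b"])
# dummies == {"a": [1, 0, 1], "b": [0, 1, 0]}
```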

7
Q

What types of values can covariates take, and what type does the response take, in multiple linear regression?

A
  • Continuous
  • Categorical (use dummy variables)
  • The response takes continuous values
8
Q

What does the ordinary R^2 value mean?

A

Describes the absolute fit to the data without taking into account the number of covariates

9
Q

What does the adjusted R^2 value mean?

A
  • Describes the fit to the data, taking into account the number of covariates
  • No longer readily interpretable, as it no longer has to lie between 0 and 1
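A sketch of the usual adjusted R² formula, assuming n observations and p covariates:

```python
# Sketch: adjusted R^2 penalises the ordinary R^2 for the number of
# covariates p, given n observations. Unlike ordinary R^2 it can be
# negative, which is why it is less readily interpretable.

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```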
10
Q

What are some of the problems with model selection in multiple linear regression?

A
  • Including too few variables throws away information
  • Including too many variables substantially raises the standard errors and p-values
  • Models that are too simple or too complex predict poorly
11
Q

What is used to test whether variables are collinear?

A

Variance Inflation Factors (VIF)

12
Q

How is VIF calculated?

A

VIF = 1 / (1 − R^2), where R^2 comes from regressing that covariate on all the other covariates
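As a quick sketch:

```python
# Sketch: VIF for covariate j, where r2_j is the R^2 obtained by
# regressing covariate j on all the other covariates.

def vif(r2_j):
    return 1.0 / (1.0 - r2_j)

vif(0.9)  # about 10, well above the rule-of-thumb threshold of 5
```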

13
Q

At what VIF score should model selection be reconsidered?

A

When the VIF score is greater than 5

14
Q

What is the best way of dealing with collinearity?

A
  • Check the VIF score of the suspected collinear variable
  • Remove the covariate from the model if its VIF score is greater than 5

15
Q

What are the hypotheses of multiple linear regression with regard to the coefficients?

A

H0: all coefficients are equal to 0
H1: at least one coefficient is not equal to 0

16
Q

What can the F-test be used to compare in Multiple linear regression?

A

The F-test can be used to compare nested models with the original model

17
Q

What can be used to compare two non-nested equations?

A

AIC or BIC can be used to compare non-nested equations

18
Q

What is Occam’s razor?

A

When comparing models of equal explanatory power, the simplest model should be chosen

19
Q

What does a small AIC or BIC score show?

A

A good model of the data

20
Q

What is the difference between BIC and AIC?

A
  • BIC's penalty per parameter grows with the sample size (N)
  • AIC's penalty depends only on the number of parameters used (P)
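The standard formulas make the difference concrete (here `log_lik` is the model's maximised log-likelihood):

```python
import math

# Sketch: both criteria penalise the number of parameters P, but
# BIC's per-parameter penalty ln(N) grows with sample size, while
# AIC's is a fixed 2.

def aic(log_lik, p):
    return 2 * p - 2 * log_lik

def bic(log_lik, p, n):
    return p * math.log(n) - 2 * log_lik
```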

21
Q

What is forward selection?

A

Starts with a simple model and adds variables

22
Q

What is backwards selection?

A

Starts with the most complex model and removes variables

23
Q

Why might AICc be used instead of AIC?

A

When N is not much larger than P; AICc applies a small-sample correction that vanishes as N becomes much greater than P
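A sketch of the usual small-sample correction term, which shrinks as N grows relative to P:

```python
# Sketch: AICc adds a correction to AIC that matters when N is not
# much larger than P, and vanishes as N grows.

def aicc(aic_value, n, p):
    return aic_value + 2 * p * (p + 1) / (n - p - 1)
```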

24
Q

What are AIC weights based off of?

A

Based on the difference between the fit of each model and the best-fitting of those models
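A sketch of how the weights come from those AIC differences:

```python
import math

# Sketch: Akaike weights. Each model's weight is based on its AIC
# difference from the best (lowest-AIC) model; the weights sum to 1.

def aic_weights(aics):
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

aic_weights([100.0, 102.0, 110.0])  # best model gets the largest weight
```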

25
Q

Describe what an interaction term is

A

An interaction term is a covariate whose value changes the slope coefficient relating x and y
- Similar to synergy in chemistry

26
Q

In the equation:
Y = B0 + B1X + B2X.F
What is the X.F term and what does it do?

A

The X.F term is an interaction and it changes the gradient of the regression line

27
Q

What are the four assumptions about the errors in multiple linear regression?

A
  • Described by the normal distribution
  • Constant variance
  • Mean zero (the fitted value gives the mean of the response)
  • Independent of one another
28
Q

What are the two main ways of assessing Normality of Data?

A
  • Qualitative (histograms, QQ-plots)
  • Formal testing (Shapiro-Wilk, Kolmogorov-Smirnov tests)

29
Q

What are QQ norm-plots?

A

Plots of the ordered residuals against their standardised normal quantiles for a given range of probabilities

30
Q

What are the null and alternative hypotheses of the Shapiro-Wilk test?

A

H0: the data is normally distributed
H1: the data is not normally distributed
- The SW test is unusual in that the null hypothesis is the outcome you actually expect/want

31
Q

What should we see on a plot of residuals against fitted values if the errors have constant variance?

A

We should see a patternless horizontal band rather than a funnel or a slope

32
Q

What are partial residuals used for?

A

They are used when we have multiple covariates in a model and may not be fitting the correct model
- They are found by adding the estimated relationship for a covariate back onto the model residuals

33
Q

What are some of the diagnostic properties of partial residuals?

A
  • The slope is the regression coefficient
  • Extent of the scatter tells us about the support for the function/model implemented
  • Can identify large residuals
34
Q

How might we use a bootstrap to solve the problem of interactions with partial residuals?

A
  • Sample the data with replacement to make a data set of the same size as the original one
  • Fit the model to the new data set
  • Repeat this many times and take a confidence interval from the refitted estimates
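The steps above can be sketched as a percentile bootstrap for an arbitrary statistic (note that the standard bootstrap resamples with replacement; the helper name `bootstrap_ci` is made up):

```python
import random
import statistics

def bootstrap_ci(data, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample WITH replacement, recompute the
    statistic each time, then take the alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    boots = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = boots[int(n_boot * alpha / 2)]
    hi = boots[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

low, high = bootstrap_ci([2.1, 2.5, 1.9, 2.4, 2.2, 2.6], statistics.mean)
```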
35
Q

How is correlation indexed?

A

Indexed by the correlation coefficient, which lies between -1 and 1
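A sketch of the standard Pearson computation:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; always lies in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pearson_r([1, 2, 3], [2, 4, 6])  # perfectly linear data, so r is close to 1
```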

36
Q

What is the causal correlation fallacy?

A

Causality implies correlation but correlation may not always imply causality

37
Q

What is a type 1 multiple linear regression test?

A

The sums of squares are considered sequentially, in the order the terms enter the model

38
Q

What is a type 2 multiple linear regression test?

A

The sums of squares are considered as if the interaction term were last in the model