Linear Regression Flashcards

simple linear regression, multiple linear regression, model selection, diagnostics

1
Q

What is regression?

A

A way to study relationships between variables.

2
Q

What are the two main reasons we’d use regression?

A
  • description and explanation (genuine interest in the nature of the relationship between variables)
  • prediction (using variables to predict others)
3
Q

What are linear regression models?

A
  • contain explanatory variable(s) which help us explain or predict the behaviour of the response variable
  • assume a constantly increasing or decreasing (straight-line) relationship between each explanatory variable and the response

4
Q

What structure does a linear model have?

A

response = intercept + (slope x explanatory variable) + error

yi = β0 + β1xi + εi
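
A minimal sketch of this structure, assuming Python with NumPy; the parameter values and data are made up for illustration:

import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0      # assumed true intercept, slope, error sd
x = rng.uniform(0, 10, size=100)         # explanatory variable
eps = rng.normal(0.0, sigma, size=100)   # error term: Normal, zero mean
y = beta0 + beta1 * x + eps              # response = intercept + slope*x + error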

5
Q

What is the intercept of a linear model?

A

β0

  • the expected value of the response when the explanatory variables are 0
  • where the regression cuts the vertical axis
6
Q

What is the slope of a linear model?

A

β1, gradient of the regression line

7
Q

What is the error term of a linear model?

A

εi

  • not all data follows the relationship exactly
  • εi allows for deviations
  • normally distributed in the y dimension (zero mean, variance is estimated as part of the fitting process)

8
Q

What is the Least Squares (LS) Criterion?

A
  • can be used to fit the regression
  • finds parameters that minimise:

Σ (data - model)^2

9
Q

What is a residual?

A

The vertical distance between the observed data and the best fit line.

10
Q

How is the slope estimated?

A

β1(hat) = (Σ (xi-x̄) * yi) / (Σ (xi-x̄)^2)

x̄ is the mean explanatory variable

11
Q

How is the intercept estimated?

A

β0(hat) = y̅ - (β1(hat) * x̄)

x̄ is the mean explanatory variable
y̅ is the mean of the response
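
A sketch applying both estimators, assuming Python with NumPy and simulated data; np.polyfit minimises the same least-squares criterion, so it serves as a check:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)  # slope estimate
beta0_hat = ybar - beta1_hat * xbar                           # intercept estimate

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit for comparison
assert np.allclose([beta1_hat, beta0_hat], [slope, intercept])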

12
Q

How is the variance estimate calculated?

A

s^2 = (1/(n - k - 1))*Σ (yi - yi(hat))^2

n is number of observations, k is number of slope parameters estimated
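
The same formula in code, continuing the kind of simulated example above (Python with NumPy assumed):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)
slope, intercept = np.polyfit(x, y, 1)

n, k = len(y), 1                              # simple regression: one slope parameter
fitted = intercept + slope * x
s2 = np.sum((y - fitted) ** 2) / (n - k - 1)  # variance estimate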

13
Q

How do we work out how much of the total observed variation has been explained?

A

Work out the proportion of unexplained variation and subtract it from 1:

R^2 = 1 - ((Σ(yi - yi(hat))^2)/(Σ(yi - y̅)^2))

R^2 = 1 - (SSerror/SStotal)

numerator: error (residual) sum of squares
denominator: total sum of squares
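
The same quantities in code (Python with NumPy assumed, data simulated):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

ss_error = np.sum((y - fitted) ** 2)      # unexplained (residual) variation
ss_total = np.sum((y - y.mean()) ** 2)    # total variation about the mean
r_squared = 1 - ss_error / ss_total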

14
Q

What is the definition of the best line?

A

One that minimises the residual sums-of-squares.

15
Q

What are the main reasons to use multiple covariates?

A
  • description (interest in finding relationships between such variables)
  • prediction (knowledge of some will help us predict others)
16
Q

What is added to a simple regression model to make it a multiple regression model?

A

More explanatory variables (of the form βp*xpi).

17
Q

What model is used for the noise of a multiple regression model?

A

Normal distribution, 0 mean, variance σ^2.

18
Q

What are dummy variables?

A
  • switch on (x=1) or off (x=0) depending on level of the factor variable
  • the first level of the group acts as the baseline; the rest switch on when applicable (a factor with n levels needs n - 1 dummy variables)
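
A sketch of this n - 1 coding, assuming Python with NumPy and a hypothetical three-level factor:

import numpy as np

factor = np.array(["a", "b", "c", "b", "a"])   # hypothetical factor variable
levels = ["a", "b", "c"]                       # "a" acts as the baseline
# one 0/1 column per non-baseline level; x = 1 switches that level on
dummies = np.column_stack([(factor == lev).astype(int) for lev in levels[1:]])
print(dummies)   # rows are observations; columns are the "b" and "c" indicators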
19
Q

What is parameter inference?

A

In order to make general statements about model parameters we can generate ranges of plausible values for these parameters and test “no-relationship” hypotheses.

20
Q

What test statistic value is used when calculating the confidence intervals for slope parameters?

A

t(α/2, df=N-P-1)

N: total number of observations
P: number of explanatory variables fitted in the model
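
A sketch of the resulting interval, assuming Python with SciPy; the estimate and standard error here are made-up stand-ins for values from a fitted model:

from scipy import stats

N, P, alpha = 50, 1, 0.05
b1_hat, se_b1 = 0.48, 0.06                        # hypothetical slope estimate and se
t_crit = stats.t.ppf(1 - alpha / 2, df=N - P - 1)
ci = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)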

21
Q

What is the null hypothesis for parameter inference?

A

H0: βp = 0

H1: βp does not equal 0

22
Q

What is the equation for the adjusted R^2?

A

Adjusted R^2 = 1 - ((N - 1)*(1 - R^2))/(N - P - 1)

N: total number of observations
P: number of explanatory variables fitted in the model
R^2: squared correlation
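
In code (plain Python; N, P and R^2 are hypothetical values):

N, P = 50, 3          # hypothetical sample size and number of explanatory variables
r_squared = 0.70      # hypothetical unadjusted R^2
adj_r_squared = 1 - (N - 1) * (1 - r_squared) / (N - P - 1)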

23
Q

What is the standard error for the prediction on xp (xp any value)?

A

se(y(hat)) = sqrt(MSE * ((1/n) + ((xp - x̄)^2)/(Σ(xi - x̄)^2)))

MSE: mean square error/residual from ANOVA table
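
A sketch of the calculation, assuming Python with NumPy; the MSE and data values are made up for illustration:

import numpy as np

mse = 1.1                                # hypothetical MSE from the ANOVA table
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical observed x values
xp = 2.5                                 # prediction point
n, xbar = len(x), x.mean()
se_pred = np.sqrt(mse * (1 / n + (xp - xbar) ** 2 / np.sum((x - xbar) ** 2)))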

24
Q

Why do we want an appropriate number of covariates in our model? What happens if there are too many/few? What if the model is too simple/complex?

A

too few: we throw away valuable information
non-essential variables: standard errors and p-values tend to be too large
too simple/complex: the model will have poor predictive abilities

25
Q

What happens when collinear variables are put together in a model?

A
  • the model is unstable
  • inflated standard errors

26
Q

What are Variance Inflation Factors (VIFs)?

A

Used to detect collinearity.

VIFp = 1/(1 - Rp^2)

Rp^2: the R^2 from regressing the pth covariate on the other covariates

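A sketch of the calculation, assuming Python with NumPy and simulated covariates; column 2 is constructed to be collinear with column 0, so both should show inflated VIFs:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # induce collinearity

def vif(X, j):
    # regress covariate j on the remaining covariates (with an intercept)
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])
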
27
Q

How should variables be removed?

A

One at a time.

28
Q

How does p-value based model selection work?

A
  • for covariates with one associated coefficient, retention can be based on the associated p-value (large p-values suggest omission)

29
Q

What type of regression models does the F-test work on? What can we use for other models?

A

Nested models. We can use AIC or BIC on both nested and non-nested models.

30
Q

What is Akaike's Information Criterion (AIC)?

A

The smaller the AIC value, the better the model.

AIC = -2*log-likelihood + 2P

P: number of estimated parameters
log-likelihood: calculated using the estimated parameters in the model

31
Q

What is AICc?

A

Used when the sample size isn't much larger than the number of parameters in the model.

AICc = AIC + (2P(P + 1))/(N - P - 1)

When N >> P, AICc -> AIC.

32
Q

What is BIC?

A

Differs from AIC by employing a penalty that grows with the sample size (N).

BIC = -2*log-likelihood + log(N)*P

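The three criteria in code (plain Python; the log-likelihood, N and P are hypothetical stand-ins for fitted-model quantities):

import math

loglik, N, P = -120.3, 40, 5
aic = -2 * loglik + 2 * P
aicc = aic + (2 * P * (P + 1)) / (N - P - 1)
bic = -2 * loglik + math.log(N) * P
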
33
Q

What values of BIC represent a better model?

A

Smaller BIC values.

34
Q

How are AIC weights calculated?

A

Δi(AIC) = AICi - minimum AIC

wi(AIC) = exp{-(1/2)*Δi(AIC)} / (Σk exp{-(1/2)*Δk(AIC)})

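In code (Python with NumPy; the candidate AIC values are made up):

import numpy as np

aics = np.array([210.2, 208.9, 215.4])   # hypothetical AICs for candidate models
delta = aics - aics.min()
weights = np.exp(-0.5 * delta) / np.sum(np.exp(-0.5 * delta))   # weights sum to 1
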
35
Q

What is interaction?

A
  • similar to 'synergy' in chemistry: a non-additive effect (e.g. A = +10, B = +20, A+B = -10)
  • if the interaction term is significant, the p-values associated with the main effects are irrelevant
  • interactions should always come last in the sequence of predictors

36
Q

What values can R^2 take?

A

Between 0 and 1.

37
Q

What assumptions do we make about the errors of a linear model?

A

We assume one Normal distribution provides the (independent) noise.

38
Q

How do we assess Normality?

A
  • qualitative assessment from plotting (histogram of residuals, QQ-norm plot)
  • formal test of Normality (Shapiro-Wilk)

39
Q

What do QQ-norm plots tell us? And how are they formed?

A
  • plot the quantiles of two sets of data against one another
  • if the shapes are similar we get a straight line (y = x), suggesting the data are normally distributed
  • the residuals are put in ascending order, standardised (divided by the sd), and plotted against the Normal distribution

40
Q

How is the ith point on a QQ-norm plot found?

A

p(i) = i/(n + 1)

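A sketch of how these plotting positions are used, assuming Python with NumPy/SciPy and hypothetical residuals:

import numpy as np
from scipy import stats

resid = np.array([0.3, -1.2, 0.8, -0.1, 1.5, -0.7])   # hypothetical residuals
n = len(resid)
std_resid = np.sort(resid) / resid.std(ddof=1)        # ascending order, standardised
p = np.arange(1, n + 1) / (n + 1)                     # p(i) = i/(n + 1)
theoretical = stats.norm.ppf(p)                       # Normal quantiles to plot against
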
41
Q

What is the Shapiro-Wilk test?

A
  • tests for Normality
  • H0: the data are normally distributed

42
Q

What is the Breusch-Pagan test?

A
  • a formal test of the constant error variance assumption (H0: constant error variance)
  • a model which satisfies the assumption would produce a residual plot with a horizontal line

43
Q

How do we assess independence?

A
  • Durbin-Watson test (H0: uncorrelated errors)
  • independence can be violated in ways that cannot be tested (e.g. pseudoreplication)

44
Q

How can we tell which variable in a signal causes non-linearity?

A

Use partial (residual) plots. These are found by adding the estimated relationship (for the pth predictor, βp*xpi) to the residuals (ri) of the model.

45
Q

When do we bootstrap (for linear regression models)?

A
  • the distribution of the residuals is horrible
  • we are reasonably happy with the signal model
  • independence isn't an issue

46
Q

What values can correlation take?

A

The correlation coefficient (r) can take values between -1 and 1. (r = -1 and r = 1 correspond to points lying exactly on a downward or upward sloping straight line.)

47
Q

How is the significance of r calculated?

A

t = r*sqrt(n - 2) / sqrt(1 - r^2)

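In code (Python with SciPy; r and n are hypothetical), comparing t against a t distribution with n - 2 degrees of freedom:

import math
from scipy import stats

r, n = 0.45, 30
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value
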
48
Q

Causality implies correlation. True/False?

A

True, but not the other way around: correlation does not imply causality.