Linear Regression Flashcards

simple linear regression, multiple linear regression, model selection, diagnostics

1
Q

What is regression?

A

A way to study relationships between variables.

2
Q

What are the two main reasons we’d use regression?

A
  • description and explanation (genuine interest in the nature of the relationship between variables)
  • prediction (using variables to predict others)
3
Q

What are linear regression models?

A
  • contain explanatory variable(s) which help us explain or predict the behaviour of the response variable
  • assume a constantly increasing or decreasing (straight-line) relationship between each explanatory variable and the response

4
Q

What structure does a linear model have?

A

response = intercept + (slope x explanatory variable) + error

yi = β0 + β1xi + εi
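
A minimal sketch of this structure, assuming Python with NumPy; the parameter values and data are made up for illustration:

import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0      # assumed true intercept, slope, error sd
x = rng.uniform(0, 10, size=100)         # explanatory variable
eps = rng.normal(0.0, sigma, size=100)   # error term: Normal, zero mean
y = beta0 + beta1 * x + eps              # response = intercept + slope*x + error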

5
Q

What is the intercept of a linear model?

A

β0

  • the expected value of the response when the explanatory variables are 0
  • where the regression cuts the vertical axis
6
Q

What is the slope of a linear model?

A

β1, gradient of the regression line

7
Q

What is the error term of a linear model?

A

εi

  • not all data follows the relationship exactly
  • εi allows for deviations
  • normally distributed in the y dimension (zero mean, variance is estimated as part of the fitting process)

8
Q

What is the Least Squares (LS) Criterion?

A
  • can be used to fit the regression
  • finds parameters that minimise:

Σ (data - model)^2

9
Q

What is a residual?

A

The vertical distance between the observed data and the best fit line.

10
Q

How is the slope estimated?

A

β1(hat) = (Σ (xi-x̄) * yi) / (Σ (xi-x̄)^2)

x̄ is the mean explanatory variable

11
Q

How is the intercept estimated?

A

β0(hat) = y̅ - (β1(hat) * x̄)

x̄ is the mean explanatory variable
y̅ is the mean of the response
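
A sketch applying both estimators, assuming Python with NumPy and simulated data; np.polyfit minimises the same least-squares criterion, so it serves as a check:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)  # slope estimate
beta0_hat = ybar - beta1_hat * xbar                           # intercept estimate

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit for comparison
assert np.allclose([beta1_hat, beta0_hat], [slope, intercept])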

12
Q

How is the variance estimate calculated?

A

s^2 = (1/(n - k - 1))*Σ (yi - yi(hat))^2

n is number of observations, k is number of slope parameters estimated
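
The same formula in code, continuing the kind of simulated example above (Python with NumPy assumed):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)
slope, intercept = np.polyfit(x, y, 1)

n, k = len(y), 1                              # simple regression: one slope parameter
fitted = intercept + slope * x
s2 = np.sum((y - fitted) ** 2) / (n - k - 1)  # variance estimate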

13
Q

How do we work out how much of the total observed variation has been explained?

A

Work out the proportion of unexplained variation and subtract it from 1:

R^2 = 1 - ((Σ(yi - yi(hat))^2)/(Σ(yi - y̅)^2))

R^2 = 1 - (SSerror/SStotal)

numerator: error (residual) sum of squares
denominator: total sum of squares
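
The same quantities in code (Python with NumPy assumed, data simulated):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

ss_error = np.sum((y - fitted) ** 2)      # unexplained (residual) variation
ss_total = np.sum((y - y.mean()) ** 2)    # total variation about the mean
r_squared = 1 - ss_error / ss_total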

14
Q

What is the definition of the best line?

A

One that minimises the residual sums-of-squares.

15
Q

What are the main reasons to use multiple covariates?

A
  • description (interest in finding relationships between such variables)
  • prediction (knowledge of some will help us predict others)
16
Q

What is added to a simple regression model to make it a multiple regression model?

A

More explanatory variables (of the form βp*xpi).

17
Q

What model is used for the noise of a multiple regression model?

A

Normal distribution, 0 mean, variance σ^2.

18
Q

What are dummy variables?

A
  • switch on (x=1) or off (x=0) depending on level of the factor variable
  • the first level of the group acts as the baseline; the rest switch on when applicable (a factor with n levels needs n - 1 dummy variables)
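
A sketch of this n - 1 coding, assuming Python with NumPy and a hypothetical three-level factor:

import numpy as np

factor = np.array(["a", "b", "c", "b", "a"])   # hypothetical factor variable
levels = ["a", "b", "c"]                       # "a" acts as the baseline
# one 0/1 column per non-baseline level; x = 1 switches that level on
dummies = np.column_stack([(factor == lev).astype(int) for lev in levels[1:]])
print(dummies)   # rows are observations; columns are the "b" and "c" indicators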
19
Q

What is parameter inference?

A

In order to make general statements about model parameters we can generate ranges of plausible values for these parameters and test “no-relationship” hypotheses.

20
Q

What test statistic value is used when calculating the confidence intervals for slope parameters?

A

t(α/2, df=N-P-1)

N: total number of observations
P: number of explanatory variables fitted in the model
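
A sketch of the resulting interval, assuming Python with SciPy; the estimate and standard error here are made-up stand-ins for values from a fitted model:

from scipy import stats

N, P, alpha = 50, 1, 0.05
b1_hat, se_b1 = 0.48, 0.06                        # hypothetical slope estimate and se
t_crit = stats.t.ppf(1 - alpha / 2, df=N - P - 1)
ci = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)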

21
Q

What is the null hypothesis for parameter inference?

A

H0: βp = 0

H1: βp does not equal 0

22
Q

What is the equation for the adjusted R^2?

A

Adjusted R^2 = 1 - ((N - 1)*(1 - R^2))/(N - P - 1)

N: total number of observations
P: number of explanatory variables fitted in the model
R^2: squared correlation
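
In code (plain Python; N, P and R^2 are hypothetical values):

N, P = 50, 3          # hypothetical sample size and number of explanatory variables
r_squared = 0.70      # hypothetical unadjusted R^2
adj_r_squared = 1 - (N - 1) * (1 - r_squared) / (N - P - 1)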

23
Q

What is the standard error for the prediction on xp (xp any value)?

A

se(y(hat)) = sqrt(MSE * ((1/n) + ((xp - x̄)^2)/(Σ(xi - x̄)^2)))

MSE: mean square error/residual from ANOVA table
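
A sketch of the calculation, assuming Python with NumPy; the MSE and data values are made up for illustration:

import numpy as np

mse = 1.1                                # hypothetical MSE from the ANOVA table
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical observed x values
xp = 2.5                                 # prediction point
n, xbar = len(x), x.mean()
se_pred = np.sqrt(mse * (1 / n + (xp - xbar) ** 2 / np.sum((x - xbar) ** 2)))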

24
Q

Why do we want an appropriate number of covariates in our model? What happens if there are too many/few? What if the model is too simple/complex?

A

too few: we throw away valuable information
non-essential variables: standard errors and p-values tend to be too large
too simple/complex: the model will have poor predictive abilities

25
Q

What happens when collinear variables are put together in a model?

A
  • the model is unstable
  • inflated standard errors

26
Q

What are Variance Inflation Factors (VIFs)?

A

Used to detect collinearity.

VIFp = 1/(1 - Rp^2)

Rp^2: the R^2 from regressing the pth covariate on the other covariates

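A sketch of the calculation, assuming Python with NumPy and simulated covariates; column 2 is constructed to be collinear with column 0, so both should show inflated VIFs:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # induce collinearity

def vif(X, j):
    # regress covariate j on the remaining covariates (with an intercept)
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])
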
27
Q

How should variables be removed?

A

One at a time.

28
Q

How does p-value based model selection work?

A
  • for covariates with one associated coefficient, retention can be based on the associated p-value (large p-values suggest omission)

29
Q

What type of regression models does the F-test work on? What can we use for other models?

A

Nested models. We can use AIC or BIC on both nested and non-nested models.

30
Q

What is Akaike's Information Criterion (AIC)?

A

The smaller the AIC value, the better the model.

AIC = -2*log-likelihood + 2P

P: number of estimated parameters
log-likelihood: calculated using the estimated parameters in the model

31
Q

What is AICc?

A

Used when the sample size isn't much larger than the number of parameters in the model.

AICc = AIC + (2P(P + 1))/(N - P - 1)

When N >> P, AICc -> AIC.

32
Q

What is BIC?

A

Differs from AIC by employing a penalty that grows with the sample size (N).

BIC = -2*log-likelihood + log(N)*P

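The three criteria in code (plain Python; the log-likelihood, N and P are hypothetical stand-ins for fitted-model quantities):

import math

loglik, N, P = -120.3, 40, 5
aic = -2 * loglik + 2 * P
aicc = aic + (2 * P * (P + 1)) / (N - P - 1)
bic = -2 * loglik + math.log(N) * P
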
33
Q

What values of BIC represent a better model?

A

Smaller BIC values.

34
Q

How are AIC weights calculated?

A

Δi(AIC) = AICi - minimum AIC

wi(AIC) = exp{-(1/2)*Δi(AIC)} / (Σk exp{-(1/2)*Δk(AIC)})

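In code (Python with NumPy; the candidate AIC values are made up):

import numpy as np

aics = np.array([210.2, 208.9, 215.4])   # hypothetical AICs for candidate models
delta = aics - aics.min()
weights = np.exp(-0.5 * delta) / np.sum(np.exp(-0.5 * delta))   # weights sum to 1
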
35
Q

What is interaction?

A
  • similar to 'synergy' in chemistry: a non-additive effect (e.g. A = +10, B = +20, A+B = -10)
  • if the interaction term is significant, the p-values associated with the main effects are irrelevant
  • interactions should always come last in the sequence of predictors

36
Q

What values can R^2 take?

A

Between 0 and 1.

37
Q

What assumptions do we make about the errors of a linear model?

A

We assume one Normal distribution provides the (independent) noise.

38
Q

How do we assess Normality?

A
  • qualitative assessment from plotting (histogram of residuals, QQ-norm plot)
  • formal test of Normality (Shapiro-Wilk)

39
Q

What do QQ-norm plots tell us? And how are they formed?

A
  • plot the quantiles of two sets of data against one another
  • if the shapes are similar we get a straight line (y = x), suggesting the data are normally distributed
  • the residuals are put in ascending order, standardised (divided by the sd), and plotted against the Normal distribution

40
Q

How is the ith point on a QQ-norm plot found?

A

p(i) = i/(n + 1)

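A sketch of how these plotting positions are used, assuming Python with NumPy/SciPy and hypothetical residuals:

import numpy as np
from scipy import stats

resid = np.array([0.3, -1.2, 0.8, -0.1, 1.5, -0.7])   # hypothetical residuals
n = len(resid)
std_resid = np.sort(resid) / resid.std(ddof=1)        # ascending order, standardised
p = np.arange(1, n + 1) / (n + 1)                     # p(i) = i/(n + 1)
theoretical = stats.norm.ppf(p)                       # Normal quantiles to plot against
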
41
Q

What is the Shapiro-Wilk test?

A
  • tests for Normality
  • H0: the data are normally distributed

42
Q

What is the Breusch-Pagan test?

A
  • a formal test of the constant error variance assumption (H0: constant error variance)
  • a model which satisfies the assumption would produce a residual plot with a horizontal line

43
Q

How do we assess independence?

A
  • Durbin-Watson test (H0: uncorrelated errors)
  • independence can be violated in ways that cannot be tested (e.g. pseudoreplication)

44
Q

How can we tell which variable in a signal causes non-linearity?

A

Use partial (residual) plots. These are found by adding the estimated relationship (for the pth predictor, βp*xpi) to the residuals (ri) of the model.

45
Q

When do we bootstrap (for linear regression models)?

A
  • the distribution of the residuals is horrible
  • we are reasonably happy with the signal model
  • independence isn't an issue

46
Q

What values can correlation take?

A

The correlation coefficient (r) can take values between -1 and 1. (r = -1 and r = 1 correspond to points lying exactly on a downward or upward sloping straight line.)

47
Q

How is the significance of r calculated?

A

t = r*sqrt(n - 2) / sqrt(1 - r^2)

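In code (Python with SciPy; r and n are hypothetical), comparing t against a t distribution with n - 2 degrees of freedom:

import math
from scipy import stats

r, n = 0.45, 30
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value
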
48
Q

Causality implies correlation. True/False?

A

True, but not the other way around: correlation does not imply causality.