Multiple Linear regression Flashcards by jack totten

What is the difference between simple and multiple linear regression?

Simple linear regression takes into account only one covariate

- Multiple linear regression takes into account more than one covariate

What is the best way to display the results of multiple linear regression?

Using a scatter plot

What is the general equation for predicting y given any x in multiple linear regression?

Y = B0 + B1X1 + B2X2 + … + BnXn
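
As a minimal sketch of the prediction equation above, the function below multiplies each covariate by its coefficient and adds the intercept. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def predict(intercept, coefficients, covariates):
    """Return intercept + sum of coefficient * covariate for one observation."""
    if len(coefficients) != len(covariates):
        raise ValueError("need exactly one coefficient per covariate")
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


# Hypothetical fitted model: intercept 2.0, slopes 0.5 and -1.0
y_hat = predict(2.0, [0.5, -1.0], [4.0, 3.0])
# y_hat is 2.0 + 0.5*4.0 - 1.0*3.0 = 1.0
```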

What are Factor variables?

Variables fitted as factors allow the response to vary with the level of the covariate

What are Treatment contrast variables?

One level forms the baseline; each remaining level has its own coefficient, measured relative to that baseline

What are Dummy Variables?

Variables that switch on (x=1) and switch off (x=0)
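
A rough sketch of how a categorical covariate could be turned into dummy variables, using one level as the baseline (which gets no column of its own). The helper name `dummy_code` and the level names are hypothetical.

```python
def dummy_code(values, baseline):
    """Return the non-baseline levels and one row of 0/1 dummies per value."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


levels, rows = dummy_code(["control", "drugA", "drugB", "drugA"], baseline="control")
# levels is ["drugA", "drugB"]
# rows is [[0, 0], [1, 0], [0, 1], [1, 0]]; the baseline row is all zeros
```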

What are the value types co-variates take and what type does the response take in multiple linear regression?

Continuous
Categorical (use dummy variables)
The response takes continuous values

What does the ordinary R^2 value mean?

Describes the proportion of variation in the response explained by the model, without taking into account the number of covariates

What does the adjusted R^2 value mean?

Describes the fit of the model to the data, penalised for the number of covariates
Less readily interpretable than ordinary R^2 because it can fall below 0, so it is no longer a proportion between 0 and 1
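
The two measures can be sketched as follows, assuming the usual definitions R^2 = 1 - SS_res / SS_tot and adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1), where p counts covariates excluding the intercept. The data values are made up for illustration.

```python
def r_squared(observed, fitted):
    """Ordinary R^2: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_covariates):
    """Adjusted R^2, penalising the number of covariates (intercept excluded)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_covariates - 1)


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(observed, fitted)                 # about 0.99
adj = adjusted_r_squared(r2, n_obs=5, n_covariates=2)  # about 0.98

# A weak fit with several covariates can give a negative adjusted value
weak = adjusted_r_squared(0.1, n_obs=5, n_covariates=2)  # about -0.8
```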

What are some of the problems with model selection in multiple linear regression?

Including too few variables throws away data
Including too many variables inflates the standard errors and p-values of the coefficients
Too simple or too complex models have poor predicting abilities

What is used to test if variables are collinear?

Variance Inflation Factors (VIF)

How is VIF calculated?

One divided by one minus the R^2 value, where R^2 comes from regressing that covariate on all the other covariates: VIF = 1 / (1 - R^2)
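
A small sketch of the VIF calculation. It assumes the simplest case of only two covariates, where the R^2 from regressing one covariate on the other equals their squared Pearson correlation; with more covariates the R^2 would have to come from a full auxiliary regression. The data are invented.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_from_r_squared(r2):
    """VIF = 1 / (1 - R^2), with R^2 from regressing a covariate on the others."""
    return 1 / (1 - r2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x1, x2)            # about 0.8
vif = vif_from_r_squared(r ** 2)  # about 1 / 0.36, i.e. roughly 2.78
```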

What is the average VIF score at which model selection should be altered?

When VIF score is greater than 5

What is the best way of dealing with collinearity?

Checking the VIF score of the suspected collinear variable

- Remove the covariate from the model if the VIF score is greater than 5

What are the hypotheses of multiple linear regression with regard to factor coefficients?

H0: all coefficients are equal to 0
H1: at least one coefficient is not equal to 0

What can the F-test be used to compare in Multiple linear regression?

The F-test can be used to compare nested models with the original model
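
A sketch of the F statistic for comparing a reduced (nested) model with a fuller one, assuming the standard form based on residual sums of squares (RSS). All numbers are hypothetical; turning the statistic into a p-value needs the F distribution, which the Python standard library does not provide.

```python
def nested_f_statistic(rss_reduced, rss_full, n_params_reduced, n_params_full, n_obs):
    """Return (F, numerator df, denominator df) for two nested linear models."""
    df_numerator = n_params_full - n_params_reduced   # extra parameters in the full model
    df_denominator = n_obs - n_params_full            # residual df of the full model
    f_stat = ((rss_reduced - rss_full) / df_numerator) / (rss_full / df_denominator)
    return f_stat, df_numerator, df_denominator


# Hypothetical fits: the full model has 4 parameters, the reduced model 2
f_stat, df1, df2 = nested_f_statistic(
    rss_reduced=120.0, rss_full=80.0, n_params_reduced=2, n_params_full=4, n_obs=44
)
# f_stat is (40 / 2) / (80 / 40) = 10.0 on (2, 40) degrees of freedom
```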

What can be used to compare two non-nested equations?

AIC or BIC can be used to compare non-nested equations

What is Occam’s razor?

when comparing models of equal explanatory power, the simplest model should be chosen

What does a small AIC or BIC score show?

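A better model of the data, relative to other candidate models fitted to the same data (the absolute score has no meaning on its own)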

What is the difference between BIC and AIC?

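Both penalise the number of parameters used (P): AIC adds 2P, while BIC adds P ln(N)

- BIC's penalty therefore also grows with the sample size (N), so it favours simpler models more strongly when N is large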
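
The two criteria can be sketched from their standard definitions, AIC = 2P - 2 ln(L) and BIC = P ln(N) - 2 ln(L), where L is the maximised likelihood. The log-likelihood and counts below are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per parameter."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(N) per parameter."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical model: log-likelihood -50, 3 parameters, 100 observations
aic_value = aic(-50.0, 3)        # 6 + 100 = 106
bic_value = bic(-50.0, 3, 100)   # 3 * ln(100) + 100, roughly 113.8
```

Because ln(N) exceeds 2 once N is 8 or more, BIC penalises each extra parameter more heavily than AIC for all but very small samples.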

What is forward selection?

Starts with a simple model and adds variables
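
One way forward selection could be sketched: start from the empty model and repeatedly add whichever variable lowers a score the most, stopping when no addition helps. The scoring rule is an assumption here (the card does not name one); the example uses a hypothetical lookup table of AIC values instead of fitting real models.

```python
def forward_select(candidates, score):
    """Greedy forward selection; `score` maps a list of variables to a number (lower is better)."""
    selected = []
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every model formed by adding one more variable
        trial_best, variable = min((score(selected + [v]), v) for v in remaining)
        if trial_best >= best_score:
            break  # no single addition improves the score
        selected.append(variable)
        remaining.remove(variable)
        best_score = trial_best
    return selected, best_score


# Hypothetical AIC values for the models the search can reach
toy_aic = {
    frozenset(): 120.0,
    frozenset({"age"}): 110.0,
    frozenset({"dose"}): 115.0,
    frozenset({"noise"}): 121.0,
    frozenset({"age", "dose"}): 104.0,
    frozenset({"age", "noise"}): 111.5,
    frozenset({"age", "dose", "noise"}): 105.5,
}
chosen, final_aic = forward_select(
    ["age", "dose", "noise"], lambda variables: toy_aic[frozenset(variables)]
)
# chosen is ["age", "dose"] with final_aic 104.0; adding "noise" would raise the AIC
```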

What is backwards selection?

Starts with the most complex model and removes variables

Why might AICc be used instead of AIC?

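When the sample size N is small relative to the number of parameters P (a common rule of thumb is N/P < 40); when N»P the correction becomes negligible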
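
A sketch of the small-sample correction, assuming the usual form AICc = AIC + 2P(P + 1) / (N - P - 1). The AIC value and counts are hypothetical.

```python
def aicc(aic_value, n_params, n_obs):
    """AIC with the small-sample correction term added."""
    return aic_value + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


small_sample = aicc(106.0, n_params=3, n_obs=20)    # 106 + 24/16 = 107.5
large_sample = aicc(106.0, n_params=3, n_obs=1000)  # correction is only 24/996
```

The correction shrinks towards zero as N grows relative to P, so AICc and AIC agree for large samples.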

What are AIC weights based off of?

Based on the difference between the AIC of each model and the AIC of the best (lowest-AIC) model in the set
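
A sketch of the weight calculation, assuming the standard Akaike-weight formula: each model's difference from the lowest AIC is converted to exp(-difference / 2) and the results are normalised to sum to 1. The AIC values are hypothetical.

```python
import math


def aic_weights(aic_values):
    """Return Akaike weights for a list of AIC values from the same data set."""
    best = min(aic_values)
    relative = [math.exp(-(value - best) / 2) for value in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


weights = aic_weights([100.0, 102.0, 110.0])
# The lowest-AIC model gets the largest weight; the weights sum to 1
```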

Describe what an interaction term is

An interaction term is a term in which the value of one covariate changes the slope coefficient relating another covariate x to y
- Similar to synergy in chemistry

In the equation Y = B0 + B1X + B2X.F, what is the X.F term and what does it do?

The X.F term is an interaction and it changes the gradient of the regression line depending on the value of F
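
To see how the X.F term changes the gradient, note that the equation can be regrouped so that X is multiplied by (B1 + B2 times F). A minimal sketch, treating F as a 0/1 dummy and using hypothetical coefficient values:

```python
def predict_with_interaction(b0, b1, b2, x, f):
    """Evaluate intercept + b1*x + b2*x*f for one observation."""
    return b0 + b1 * x + b2 * x * f


def slope_in_x(b1, b2, f):
    """Gradient of the regression line with respect to x at a given value of f."""
    return b1 + b2 * f


# Hypothetical coefficients
slope_when_off = slope_in_x(2.0, 0.5, f=0)  # 2.0
slope_when_on = slope_in_x(2.0, 0.5, f=1)   # 2.5
```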

What are the four assumptions about the errors in multiple linear regression?

- Described by the normal distribution
- Constant variance
- Mean of zero, unrelated to the fitted values
- Independent of one another

What are the two main ways of assessing Normality of Data?

- Qualitative (histograms, QQ plots)
- Formal testing (Shapiro-Wilk, Kolmogorov-Smirnov tests)

What are QQ norm-plots?

Plots of the ordered residuals against the quantiles expected from a standard normal distribution at the corresponding probabilities
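
A sketch of how the points on such a plot could be computed with the standard library only. The plotting positions (i - 0.5) / n are one common convention and are an assumption here; the residual values are invented.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical quantile, standardised ordered residual) pairs."""
    n = len(residuals)
    ordered = sorted(residuals)
    centre, spread = mean(ordered), stdev(ordered)
    standardised = [(r - centre) / spread for r in ordered]
    # Standard normal quantiles at evenly spaced probabilities
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardised))


points = qq_points([0.3, -1.2, 0.8, -0.1, 0.2])
# If the residuals are roughly normal, the points lie close to a straight line
```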

What are the null hypothesis and alternative hypothesis of the Shapiro-Wilk test?

H0: The data are normally distributed
H1: The data are not normally distributed
- Unlike most tests, the null is the outcome you actually expect/want

What should we see on a plot of residuals against fitted values if the data has a constant error variance?

We should see a patternless horizontal band rather than a slope or funnel shape

What are partial residuals used for?

They are used when a model has multiple covariates and we want to check whether the relationship assumed for one covariate is the correct one
- They are found by adding the estimated relationship for that covariate (its coefficient times its value) to the residuals of the model
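
A minimal sketch of that calculation for a linear term: the partial residual for a covariate is the model residual plus that covariate's coefficient times its value. The residuals, coefficient and covariate values are hypothetical.

```python
def partial_residuals(residuals, coefficient, covariate_values):
    """Add the estimated contribution of one covariate back onto the model residuals."""
    return [e + coefficient * x for e, x in zip(residuals, covariate_values)]


# Hypothetical model residuals and a covariate with fitted coefficient 2.0
partials = partial_residuals([0.5, -0.5, 0.0], 2.0, [1.0, 2.0, 3.0])
# partials is [2.5, 3.5, 6.0]; plotted against the covariate, their slope is the coefficient
```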

What are some of the diagnostic properties of partial residuals?

- The slope is the regression coefficient
- Extent of the scatter tells us about the support for the function/model implemented
- Can identify large residuals

How might we use a bootstrap to solve the problem of interactions with partial residuals?

- Sample the data with replacement to make a data set of the same dimensions as the original one
- Fit the model to the new data set
- Repeat this many times and take a confidence interval from the resulting estimates
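
The resampling steps can be sketched as below. This is a simplified illustration, not the full partial-residual procedure: it resamples rows with replacement (as the bootstrap is usually defined), refits a single-covariate least-squares slope each time, and takes a percentile interval. The data, the seed and the function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope for a single covariate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_resamples:
        idx = rng.choices(range(n), k=n)  # sample row indices with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined if every resampled x is identical
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * n_resamples)]
    upper = slopes[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.0, 2.8, 5.1, 7.2, 8.9, 11.1, 13.0, 14.8, 17.2, 19.0]
lower, upper = bootstrap_slope_ci(xs, ys)
```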

How is correlation indexed?

Indexed by the correlation coefficient which lies between -1 and 1

What is the causal correlation fallacy?

Causality implies correlation but correlation may not always imply causality

What is a type 1 multiple linear regression test?

The sums of squares are considered sequentially, in the order the terms enter the model

What is a type 2 multiple linear regression test?

The sums of squares for each main effect are considered after all the other main effects, as if the interaction term were last in the model
