Linear regression Flashcards by James Clinch

In linear regression, what notation is used to represent the intercept and the slope (regression coefficient), respectively (based on sample data)?

a and b

In a regression equation, what does Yi denote?

Observed scores on the DV

In a regression equation, what does Xi denote?

Observed scores on the IV

In a regression equation, what does Y(hat) denote?

Predicted scores on the DV

In a regression equation, what does ei denote?

Residual scores in the regression model (i.e. the difference between observed and predicted scores)
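
The notation on the cards so far (a, b, Xi, Yi, Y(hat), ei) can be tied together with a small worked example. This is only a sketch for the one-IV case, using made-up data and illustrative variable names; it is not taken from the deck.

```python
# Simple (one-IV) least-squares fit on made-up data; all names are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # observed scores on the IV (Xi)
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # observed scores on the DV (Yi)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope b = sum of cross-products / sum of squares of X
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x                      # intercept

y_hat = [a + b * xi for xi in x]             # predicted scores, Y(hat)
e = [yi - yh for yi, yh in zip(y, y_hat)]    # residuals, ei = Yi - Y(hat)i

print("intercept a:", a)
print("slope b:", b)
print("residuals:", e)
```

For this data the slope comes out at 0.6 and the intercept at 2.2 (up to floating-point rounding), and the residuals sum to zero, which is a property of any least-squares fit that includes an intercept.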

In regression analysis, what does OLS stand for?

Ordinary Least Squares

Ordinary Least Squares (OLS) estimates are biased, T/F?

FALSE

They are unbiased

In a regression equation, what does p denote?

Number of partial regression coefficients

This applies in multiple regression analysis, where you have multiple IVs

Ordinary Least Squares (OLS) estimates are very efficient, T/F?

TRUE

What metric do we use to calculate the strength of prediction of our overall regression model?

R squared

What does r squared actually tell us?

The proportion of the variance in the DV accounted for by our regression model
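
As a rough illustration of "proportion of variance accounted for", r squared can be computed as 1 minus the ratio of residual to total sum of squares. The function name and the data below are made up for this sketch, and it covers only the one-IV case.

```python
def r_squared(x, y):
    """R squared for a one-IV least-squares fit: 1 - SS_residual / SS_total."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left over after the model has made its predictions
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of the DV around its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up IV scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # made-up DV scores
r2 = r_squared(x, y)
print(round(r2, 3))
```

Here the residual sum of squares is 2.4 and the total sum of squares is 6, so the model accounts for 60% of the variance in the DV.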

What is the range of possible values for r squared?

0-1

To calculate the confidence interval on r squared, you need the upper and lower degrees of freedom associated with the F statistic, T/F?

TRUE

R squared is biased and consistent, T/F?

TRUE

The bias is upward, which means you need to get the adjusted R squared too

If you have lots of IVs and a small sample size, what should you do to that r squared?

Adjust it

So you want to compare the strength of two partial regression coefficients. What are your two options?

1. STANDARDISE the coefficients

2. Use a SEMIPARTIAL CORRELATION

How can you tell you are looking at an R output containing standardised regression coefficients?

There will be no intercept presented

When looking at standardised regression coefficients, what is the unit they are expressed in?

SDs
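
To see why standardised coefficients are in SD units and why the intercept drops out, here is a minimal sketch for the one-IV case. The helper name and data are made up; the sketch rescales the raw slope by SD(X)/SD(Y) and then checks it against a refit on z-scores.

```python
import statistics


def slope_intercept(x, y):
    """Return (intercept, slope) of a one-IV least-squares fit."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up IV scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # made-up DV scores

a, b = slope_intercept(x, y)
sd_x = statistics.stdev(x)
sd_y = statistics.stdev(y)

# Standardised slope: change in the DV (in SDs) per one-SD change in the IV
beta = b * sd_x / sd_y

# Same thing done the long way: convert both variables to z-scores and refit
zx = [(xi - statistics.mean(x)) / sd_x for xi in x]
zy = [(yi - statistics.mean(y)) / sd_y for yi in y]
z_intercept, z_slope = slope_intercept(zx, zy)

print("standardised slope:", beta)
print("slope from z-score fit:", z_slope)
print("intercept from z-score fit:", z_intercept)
```

The two slopes agree, and the intercept from the z-score fit is zero apart from floating-point noise, which is why output based on standardised variables has no meaningful intercept to show.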

What are you looking at when you’re looking at (squared) semi partial correlation?

The proportion of the variance in the DV explained by the IV you have isolated, over and above all the other IVs in the model

‘This IV uniquely accounts for x % of variation in the DV’
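
One common way to write this "unique" contribution is as the drop in R squared when the IV is removed from the model. The symbols below are notation chosen for this sketch, not the deck's: the first term is R squared for the full model with all k IVs, and the second is R squared for the model with IV j left out.

```latex
sr_j^2 = R^2_{\text{full}} - R^2_{\text{without } j}
```

Multiplying the result by 100 gives the percentage in the statement above.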

When commenting on the proportion of the variance that a given IV accounts for in the DV, should you use semipartial correlation or SQUARED semipartial correlation?

SQUARED!

What are the four statistical assumptions underlying the linear regression model?

1. Independence of scores
2. Linearity (DV scores are a linear function of IV scores)
3. Homoscedasticity (constant variance of residuals)
4. Normality of residual scores

In the context of assumptions for regression models… what does LINEARITY refer to

The assumption that scores on the DV are a linear function of scores on the IV

In the context of assumptions for regression models… what does HOMOSCEDASTICITY refer to

It means…

The variance of the residual scores… is the same for any score on each independent variable

In the context of regression analysis, how do we check for LINEARITY ... and what are we looking for when we do the looking

1. Scatterplot matrix of… the DV and all IVs
2. Scatterplot matrix of… residual scores and observed IVs AND residual scores and predicted DV scores
3. Marginal model plots of… scores on DV and observed IVs AND predicted DV scores

We are looking for straight lines

In the context of regression analysis, how do we check for HOMOSCEDASTICITY ... and what are we looking for when we do the looking

1. Scatterplot matrix of... residuals and observed IVs
2. Scatterplot matrix of... residuals and predicted DV scores
And in both cases, we're looking for a shaft (an even band), not a triangle
3. Breusch-Pagan test
In which case, we're looking for a large p value

In the context of assumptions for regression models... what does NORMALITY OF RESIDUALS refer to

Residual scores are normally distributed

In the context of regression analysis, how do we check for NORMALITY OF RESIDUALS ... and what are we looking for when we do the looking

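The usual ways... histograms, Q-Q plots, boxplots, and we're looking for a roughly symmetric, bell-shaped distribution (points close to the line on a Q-Q plot)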

What happens if one of the four assumptions for regression analysis is violated?

It will distort your CIs

When do we use a Breusch-Pagan test?

In the context of regression analysis, when checking your HOMOSCEDASTICITY assumption (constancy of variance of residuals)

When doing a Breusch-Pagan test, what are we actually looking for?

The p value, and we want it to be LARGE

What is the Breusch-Pagan test actually applied to? (3)

1. each IV alone
2. all IVs together
3. the regression model itself

What's the R function for Breusch-Pagan?

ncvTest() (from the car package)
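
To make the logic of the test concrete, here is a rough sketch of one variant for a single IV: regress the squared residuals on the IV, multiply that auxiliary R squared by n, and compare the result to a chi-square distribution with 1 degree of freedom. This is the studentised (n times R squared) form, so it is an assumption that it corresponds to what a given R function reports; ncvTest() may use a different form of the statistic and its numbers need not match. All function names and data are made up.

```python
import math


def fit(x, y):
    """Return (intercept, slope, R squared) of a one-IV least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    r2 = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return mean_y - b * mean_x, b, r2


def breusch_pagan_one_iv(x, y):
    """Studentised Breusch-Pagan statistic and p value for a single IV."""
    a, b, _ = fit(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: do the squared residuals depend on the IV?
    _, _, aux_r2 = fit(x, squared_resid)
    lm = len(x) * aux_r2
    # Upper-tail probability of a chi-square with 1 df
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


x = [float(i) for i in range(1, 11)]
signs = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
# "Shaft": scatter around the line has the same size everywhere
y_flat = [xi + 0.5 * s for xi, s in zip(x, signs)]
# "Triangle": scatter around the line grows with the IV
y_fan = [xi + 0.2 * xi * s for xi, s in zip(x, signs)]

lm_flat, p_flat = breusch_pagan_one_iv(x, y_flat)
lm_fan, p_fan = breusch_pagan_one_iv(x, y_fan)
print("flat:", lm_flat, p_flat)
print("fan:", lm_fan, p_fan)
```

The fan-shaped data give the larger statistic and the smaller p value, which is the pattern the cards describe: a large p value is what you hope to see when the variance is constant.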

In the context of regression analysis, how do we look for outliers?

Using studentised residuals

When do we use STUDENTISED RESIDUALS?

When checking for outliers (in the context of regression analysis)

What counts as a large score for STUDENTISED RESIDUALS?

3 in large to moderate samples; maybe 3.5 to 4 in smaller samples
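
A sketch of how studentised residuals can be computed for a one-IV model may help here. It assumes the externally studentised version (each case is scaled by a residual SD estimated without that case), which is one common definition; the function name and data are made up, with one case deliberately placed far from the line.

```python
import math


def studentised_residuals(x, y):
    """Externally studentised residuals for a one-IV least-squares fit."""
    n = len(x)
    k = 1                                   # number of IVs
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    df = n - k - 1                          # residual degrees of freedom
    s2 = sum(ei ** 2 for ei in resid) / df  # residual variance
    out = []
    for xi, ei in zip(x, resid):
        h = 1 / n + (xi - mean_x) ** 2 / sxx          # leverage of this case
        r = ei / math.sqrt(s2 * (1 - h))              # internally studentised
        out.append(r * math.sqrt((df - 1) / (df - r ** 2)))
    return out


x = [float(i) for i in range(1, 11)]
# Roughly y = 2x, except the seventh case, which sits about 6 units too high
y = [2.5, 3.3, 6.3, 7.6, 10.6, 11.5, 20.0, 15.7, 18.4, 19.4]

t = studentised_residuals(x, y)
worst = max(range(len(t)), key=lambda i: abs(t[i]))
print("largest studentised residual is case", worst + 1)
```

The seventh case has by far the largest absolute value and exceeds the cutoff of 3 mentioned above, so it would be flagged as an outlier.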

In the context of regression analysis, how do we look for influential cases?

Using COOK's d!

What do we use Cook's d for?

When checking for influential cases (in the context of regression analysis)

What counts as a large score for Cook's d?

Anything more than 1
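
As a minimal sketch of what Cook's d measures, the function below combines each case's residual with its leverage for a one-IV model. It uses the standard textbook formula; the function name and data are made up, with the last case placed far out on the IV and well below the line so that it pulls the fit toward itself.

```python
def cooks_distance(x, y):
    """Cook's d for each case in a one-IV least-squares fit."""
    n = len(x)
    p = 2                                    # parameters estimated: intercept + slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in resid) / (n - p)   # residual variance
    d = []
    for xi, ei in zip(x, resid):
        h = 1 / n + (xi - mean_x) ** 2 / sxx      # leverage of this case
        d.append(ei ** 2 * h / (p * s2 * (1 - h) ** 2))
    return d


# First nine cases follow roughly y = 2x; the tenth is extreme on X and off the line
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.0, 15.9, 18.2, 20.0]

d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=lambda i: d[i])
print("most influential case:", most_influential + 1)
print("exceeds 1:", d[most_influential] > 1)
```

Only the tenth case exceeds the threshold of 1 here. A case needs both a sizeable residual and high leverage to score this high, which is what separates an influential case from a plain outlier.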

In regression notation, what does k represent?

Number of IVs

How do you calculate degrees of freedom for a linear regression?

df = n - k - 1

If you have a linear regression with three IVs and 100 observations, how many degrees of freedom do you have?

df = n - k - 1
df = 100 - 3 - 1
df = 96

If your degrees of freedom decrease because you add IVs, what happens to your (unadjusted) r squared value?

It increases (it can never go down)... hence the need for adjusting...

What does adjusting r squared do for you?

Compensates for the upward bias in r squared that occurs when you have low degrees of freedom (lots of IVs relative to the sample size)
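
The adjustment can be sketched with the usual formula, adjusted R squared = 1 - (1 - R squared)(n - 1)/(n - k - 1), which reuses the df = n - k - 1 from the earlier card. The function names are made up, and the R squared of 0.30 is a hypothetical value chosen only for illustration.

```python
def residual_df(n, k):
    """Residual degrees of freedom: n observations, k IVs."""
    return n - k - 1


def adjusted_r_squared(r2, n, k):
    """Shrink R squared according to how many IVs were used relative to n."""
    return 1 - (1 - r2) * (n - 1) / residual_df(n, k)


# The card's example: 3 IVs and 100 observations
df = residual_df(100, 3)
print(df)   # 96

# Hypothetical R squared of 0.30 with the same n and k
adj_large_n = adjusted_r_squared(0.30, n=100, k=3)

# Same R squared and k, but only 10 observations: the adjustment bites harder
adj_small_n = adjusted_r_squared(0.30, n=10, k=3)

print(adj_large_n, adj_small_n)
```

With 96 degrees of freedom the adjustment barely moves the value (about 0.278), whereas with only 6 degrees of freedom the same 0.30 shrinks below zero, which shows how little an unadjusted r squared means when there are many IVs and few cases.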

Linear regression Flashcards

(43 cards)