Linear regression Flashcards

1
Q

In linear regression, what is the notation used to represent the intercept and the slope (regression coefficient), respectively (based on sample data)?

A

a and b

2
Q

In a regression equation, what does Yi denote?

A

Observed scores on the DV

3
Q

In a regression equation, what does Xi denote?

A

Observed scores on the IV

4
Q

In a regression equation, what does Y(hat) denote?

A

Predicted scores on the DV

5
Q

In a regression equation, what does ei denote?

A

Residual scores in the regression model (i.e. the difference between observed and predicted scores: ei = Yi - Y(hat)i)
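
A minimal sketch of the notation on the cards so far (a, b, Xi, Yi, Y(hat), ei), written in Python rather than the R these cards refer to. The data and the helper name fit_simple_ols are made up for illustration, and it only covers the one-IV case.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (a, b): the least-squares intercept and slope for one IV."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope (regression coefficient)
    a = y_bar - b * x_bar    # intercept
    return a, b

# Made-up illustration data: X holds the Xi (IV), Y holds the Yi (DV)
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_simple_ols(X, Y)
Y_hat = [a + b * xi for xi in X]             # predicted scores on the DV
e = [yi - yh for yi, yh in zip(Y, Y_hat)]    # residuals: observed minus predicted
```

With an intercept in the model, the residuals always sum to zero (up to rounding), which is a quick sanity check on any fit.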

6
Q

In regression analysis, what does OLS stand for?

A

Ordinary Least Squares

7
Q

Ordinary Least Squares (OLS) estimates are biased, T/F?

A

FALSE

They are unbiased (provided the regression assumptions hold)
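
One way to see what "unbiased" means is a small simulation: generate many samples from a model with a known slope and check that the OLS slope estimates average out to that true value. This is only a sketch under assumed settings (true intercept 1, true slope 2, normal errors with SD 1, 2000 replications), all of which are made up here.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for one IV."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)           # fixed seed so the run is repeatable
TRUE_A, TRUE_B = 1.0, 2.0        # assumed population intercept and slope
x = list(range(1, 11))

slopes = []
for _ in range(2000):
    # New sample each time: same X scores, fresh random errors
    y = [TRUE_A + TRUE_B * xi + rng.gauss(0, 1) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)   # should sit very close to TRUE_B
```

Any single estimate can miss the true slope; unbiasedness is a statement about the average over repeated samples.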

8
Q

In a regression equation, what does p denote?

A

Number of partial regression coefficients

this applies in multiple regression analysis, where you have multiple IVs

9
Q

Ordinary Least Squares (OLS) estimates are very efficient, T/F?

A

TRUE

10
Q

What metric do we use to calculate the strength of prediction of our overall regression model?

A

R²

11
Q

What does R² actually tell us?

A

The proportion of the variance in the DV accounted for by our regression model
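
As a rough illustration, R squared can be computed as 1 minus the ratio of residual variation to total variation in the DV. The data below are made up, and the helper name r_squared is just for this sketch.

```python
def r_squared(y, y_hat):
    """Proportion of variance in the DV accounted for by the predictions."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1 - ss_res / ss_tot

# Made-up DV scores and the fitted values from the least-squares line
# for IV scores 1..5 (intercept 0.09, slope 1.97)
Y = [2.1, 3.9, 6.2, 7.8, 10.0]
Y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]

r2 = r_squared(Y, Y_hat)
```

Here almost all of the variation in Y is accounted for, so r2 comes out just below 1.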

12
Q

What is the range of possible values for r squared?

A

0 to 1

13
Q

To calculate the confidence interval on R squared, you need the upper (numerator) and lower (denominator) degrees of freedom associated with the F statistic, T/F?

A

TRUE

14
Q

R squared is biased and consistent, T/F

A

TRUE

The bias (sample R squared tends to overestimate the population value) means you need to get the adjusted R squared too

15
Q

If you have lots of IVs and a small sample size, what should you do to that R²?

A

Adjust it
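
A sketch of the usual adjustment, assuming the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of IVs. The function name and the example numbers are made up.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k IVs (model has an intercept)."""
    df_resid = n - k - 1
    if df_resid <= 0:
        raise ValueError("need more observations than IVs plus one")
    return 1 - (1 - r2) * (n - 1) / df_resid

# Same unadjusted R squared and same number of IVs, different sample sizes
small_sample = adjusted_r_squared(0.50, n=20, k=8)
large_sample = adjusted_r_squared(0.50, n=200, k=8)
```

The adjustment bites hard with 8 IVs and only 20 cases, and barely matters with 200 cases.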

16
Q

So you want to compare the strength of two partial regression coefficients. What are your two options?

A
  1. STANDARDISE the coefficients
  2. Use a SEMIPARTIAL CORRELATION

17
Q

How can you tell you are looking at an R output containing standardised regression coefficients?

A

There will be no intercept presented (when every variable is standardised, the intercept is zero)

18
Q

When looking at standardised regression coefficients, what is the unit they are expressed in?

A

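Standard deviations (SDs): the coefficient is the predicted change in the DV, in SDs, for a one-SD increase in the IV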
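
A small sketch of the conversion for the one-IV case: multiply the raw slope by SD(X)/SD(Y). The data and the name standardised_slope are made up, and the raw slope of 1.97 is the least-squares slope for these particular data.

```python
from statistics import stdev

def standardised_slope(b, x, y):
    """Convert a raw slope b into SD units: b * SD(x) / SD(y)."""
    return b * stdev(x) / stdev(y)

X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.0]
b = 1.97   # raw least-squares slope for the made-up data above

beta = standardised_slope(b, X, Y)
```

With a single IV the standardised slope equals the Pearson correlation between X and Y, so it cannot fall outside -1 to 1.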

19
Q

What are you looking at when you’re looking at a (squared) semipartial correlation?

A

The proportion of the variance in the DV uniquely explained by the IV you have isolated, over and above what the other IVs in the model explain

‘This IV uniquely accounts for x % of variation in the DV’

20
Q

When commenting on the proportion of the variance that a given IV accounts for in the DV, should you use semipartial correlation or SQUARED semipartial correlation?

A

SQUARED!
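
A sketch for the two-IV case, working from correlations only. The three correlations are hypothetical, and the function names are made up. It shows that the squared semipartial for IV1 is the same thing as the gain in R squared when IV1 is added to a model that already contains IV2.

```python
from math import sqrt

def r2_two_ivs(r_y1, r_y2, r_12):
    """R squared for a model with two IVs, from the pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def semipartial(r_y1, r_y2, r_12):
    """Semipartial correlation of IV1 with the DV (IV2 removed from IV1 only)."""
    return (r_y1 - r_y2 * r_12) / sqrt(1 - r_12**2)

# Hypothetical correlations: DV with IV1, DV with IV2, IV1 with IV2
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3

sr1 = semipartial(r_y1, r_y2, r_12)
sr1_squared = sr1 ** 2                                 # unique proportion of variance
r2_gain = r2_two_ivs(r_y1, r_y2, r_12) - r_y2 ** 2     # R² (both IVs) minus R² (IV2 only)
```

sr1_squared and r2_gain agree, which is why the squared version is the one to quote as a proportion of variance.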

21
Q

What are the four statistical assumptions underlying the linear regression model?

A
  1. Independence of scores
  2. Linearity (scores on the DV are a linear function of scores on the IVs)
  3. Homoscedasticity (constant variance of residuals)
  4. Normality of residual scores
22
Q

In the context of assumptions for regression models… what does LINEARITY refer to?

A

The assumption that scores on the DV are a linear function of scores on the IVs

23
Q

In the context of assumptions for regression models… what does HOMOSCEDASTICITY refer to?

A

It means…

The variance of the residual scores… is the same for any score on each independent variable

24
Q

In the context of regression analysis, how do we check for LINEARITY, and what are we looking for when we do the looking?

A
  1. Scatterplot matrix of… the DV and ALL IVs
  2. Scatterplots of… residual scores against observed IVs AND residual scores against predicted DV scores
  3. Marginal model plots of… scores on DV against observed IVs AND against predicted DV scores

We are looking for straight lines

25
Q

In the context of regression analysis, how do we check for HOMOSCEDASTICITY, and what are we looking for when we do the looking?

A
  1. Scatterplot of… residuals against observed IVs
  2. Scatterplot of… residuals against predicted DV scores

And in both cases, we’re looking for a shaft (an even band), not a triangle (a fan)

  3. Breusch-Pagan test

In which case, we’re looking for a large p value

26
Q

In the context of assumptions for regression models… what does NORMALITY OF RESIDUALS refer to?

A

Residual scores are normally distributed

27
Q

In the context of regression analysis, how do we check for NORMALITY OF RESIDUALS, and what are we looking for when we do the looking?

A

The usual ways… histograms, Q-Q plots and boxplots of the residuals

28
Q

What happens if one of the four assumptions for regression analysis is violated?

A

It will distort your CIs (they can no longer be trusted)

29
Q

When do we use a Breusch-Pagan test?

A

In the context of regression analysis, when checking your HOMOSCEDASTICITY assumption (constancy of variance of residuals)

30
Q

When doing a Breusch-Pagan test, what are we actually looking for?

A

The p value, and we want it to be LARGE (a non-significant result means no evidence against constant variance)

31
Q

What is the Breusch-Pagan test actually applied to? (3)

A
  1. each IV alone
  2. all IVs together
  3. the regression model itself
32
Q

What’s the R function for Breusch-Pagan?

A

ncvTest() (from the car package)
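
To show roughly what a test like this does under the hood, here is a hand-rolled Python sketch of a Breusch-Pagan style score test for the one-IV case. It is not the car implementation: the function names and data are made up, and it assumes the textbook recipe (scale the squared residuals by their mean, regress them on the IV, take half the regression sum of squares, and refer that to a chi-square with 1 df).

```python
from math import erfc, sqrt

def _ols(x, y):
    """Intercept, slope and Sxx for a one-IV least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxx

def breusch_pagan_simple(x, y):
    """Score statistic and p value for non-constant variance (one IV only)."""
    n = len(x)
    a, b, sxx = _ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / n       # residual variance, dividing by n
    u = [e * e / sigma2 for e in resid]          # scaled squared residuals
    _, slope_u, _ = _ols(x, u)                   # do they trend with the IV?
    stat = slope_u ** 2 * sxx / 2                # half the regression sum of squares
    p_value = erfc(sqrt(stat / 2))               # upper tail of chi-square, 1 df
    return stat, p_value

x = list(range(1, 21))
sign = [1 if xi % 2 == 0 else -1 for xi in x]
y_shaft = [2 * xi + s for xi, s in zip(x, sign)]                # even spread
y_triangle = [2 * xi + 0.5 * xi * s for xi, s in zip(x, sign)]  # spread grows with x

stat_shaft, p_shaft = breusch_pagan_simple(x, y_shaft)
stat_triangle, p_triangle = breusch_pagan_simple(x, y_triangle)
```

The "shaft" data give a large p value and the "triangle" data a small one, matching the plots you would eyeball first.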

33
Q

In the context of regression analysis, how do we look for outliers?

A

Using studentised residuals

34
Q

When do we use STUDENTISED RESIDUALS?

A

When checking for outliers (in the context of regression analysis)

35
Q

What counts as a large STUDENTISED RESIDUAL?

A

3 in large to moderate samples

maybe 3.5 to 4 in smaller samples
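
A sketch of how studentised residuals can be computed by hand for the one-IV case. It assumes the externally studentised (case-deleted) version; the data, the planted outlier and the function name are all made up.

```python
from math import sqrt

def studentised_residuals(x, y):
    """Externally studentised residuals for a one-IV regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)                 # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Internal version: residual divided by its own standard error
    internal = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]
    # External version: as if each case were left out when estimating the variance
    return [r * sqrt((n - 3) / (n - 2 - r * r)) for r in internal]

x = list(range(1, 21))
y = [xi + (0.1 if xi % 2 == 0 else -0.1) for xi in x]   # nearly perfect line
y[9] += 5.0                                              # plant an outlier at x = 10

t = studentised_residuals(x, y)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 3]   # cut-off of 3
```

Only the planted case ends up in flagged; every other case sits well inside the cut-off.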

36
Q

In the context of regression analysis, how do we look for influential cases?

A

Using Cook’s d!

37
Q

What do we use Cook’s d for?

A

Checking for influential cases (in the context of regression analysis)

38
Q

What counts as a large Cook’s d?

A

Anything more than 1
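
A sketch of Cook's d by hand for the one-IV case, assuming the standard formula d = r² × h / (p × (1 - h)), where r is the internally studentised residual, h the leverage and p the number of estimated coefficients (2 here: intercept and slope). The data and names are made up.

```python
def cooks_distance(x, y):
    """Cook's d for each case in a one-IV regression."""
    n = len(x)
    p = 2                                   # intercept + one slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    d = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx      # leverage of this case
        r2 = e * e / (s2 * (1 - h))              # squared studentised residual
        d.append(r2 * h / (p * (1 - h)))
    return d

# Ten cases on the line y = x, plus one far-out case that ignores the trend
x = list(range(1, 11)) + [30]
y = [float(xi) for xi in range(1, 11)] + [0.0]

d = cooks_distance(x, y)
influential = [i for i, di in enumerate(d) if di > 1]   # cut-off of 1
```

The last case combines an extreme IV score with a large residual, so it is the only one flagged.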

39
Q

In regression notation, what does k represent?

A

Number of IVs

40
Q

How do you calculate degrees of freedom for a linear regression?

A

df = n - k - 1 (residual df, where n is the number of observations and k is the number of IVs)

41
Q

If you have a linear regression with three IVs and 100 observations, how many degrees of freedom do you have?

A

df = n - k - 1

df = 100 - 3 - 1

df = 96

42
Q

If your degrees of freedom decrease, what happens to your (unadjusted) R squared value?

A

It cannot decrease, and in practice it almost always increases…

hence the need for adjusting…

43
Q

What does adjusting R squared do for you?

A

Compensates for the upward bias in R squared that occurs when you have low degrees of freedom (lots of IVs relative to your sample size)