L6 - Linear Regression Flashcards

1
Q

What is the purpose of a simple linear regression?

A

Simple linear regression involves fitting a line of best fit to a scatterplot of two linearly related variables, X and Y.

2
Q

What are the 6 features of a simple linear regression?

A
  1. Line of best fit
  2. Method of least squares
  3. Variance partitioning and residuals
  4. Coefficient of determination (R-squared)
  5. Regression equations/interpretation
  6. Coefficients and beta values
3
Q

How do we develop a ‘line of best fit’?

A

Find the slope and the Y-intercept

Y = MX + C

4
Q

What is the difference between a multiple regression and a linear regression?

A

Multiple regression has multiple predictors

Linear regression has a single predictor

5
Q

What is the total variance in a regression?

A

How far the actual score is from Ybar (the original estimate, i.e. the mean of Y)

Total variance is based on Y - Ybar (Y = actual score, Ybar = original estimate, the mean)

  • Y actual = top
  • Y1 (regression line) = middle
  • Y original prediction (i.e. class average) = Ybar, at the bottom
  • Y1 is an improvement on the original predictor
  • The gap between Y1 and Ybar is the explained variance (i.e. the improvement): the “regression component”, the bit you have explained
6
Q

What is the difference between Y and Y1 in this example called?

A

The Residual (Error) of the model

How far off the model is from the actual number

7
Q

If you take Y1 (predicted) away from Y (actual) for every person, square the differences and then add them up, what do you get?

A

The Error Sum of Squares

Amount of variance we have not been able to explain with the variables

“Error variance”

8
Q

What is the total sum of squares?

A

Total SS = sum of the squared differences between the actual scores and the mean value of Y.

For every person in the study, we take Y - Ybar, square it, and then add the numbers up.

9
Q

What is the Regression Sum of Squares (RSS)?

A

The Explained Variance.

The difference between the original average estimate (Ybar) and the closer estimate (Y1) obtained after a regression has been done

Calculation: sum of (predicted value - mean of Y), squared; i.e. the sum of (Y1 - Ybar)²

10
Q

What is the calculation of R2 (R-squared)?

A

RSS/TSS

(TSS = total sum of squares)

(RSS = regression sum of squares)
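As a sanity check, the three sums of squares and R-squared can be computed in plain Python. This is a minimal illustration with made-up data; the variable names are mine.

```python
# Fit a least-squares line to made-up data, then partition the variance.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 8.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
intercept = y_bar - slope * x_bar
y_pred = [intercept + slope * xi for xi in x]           # Y1, the regression line

tss = sum((yi - y_bar) ** 2 for yi in y)                # total SS
ess = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))  # error SS (unexplained)
rss = sum((yp - y_bar) ** 2 for yp in y_pred)           # regression SS (explained)

r_squared = rss / tss   # proportion of total variance explained
```

Because the predictions come from a least-squares fit, TSS = RSS + ESS, so R-squared always falls between 0 and 1.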

11
Q

What is the multiple coefficient of determination (R2)?

A

Coefficient of determination (R2) is the proportion of total variance explained by the model

  • When it is high, the regression line is really close to the actual points: the variables are capturing the variance well (a big improvement on the rough estimate of the mean). When it is low, they are not.
  • It tells us how strongly the Y variable is related to the X variable(s).
12
Q

How do you know if (R2) is meaningful (significant)? (e.g. is 20% of the variance meaningful?)

A

F test

The F ratio: systematic (explained) variance divided by error variance

In ANOVA terms, the between sum of squares versus the within sum of squares

If between is sufficiently bigger than within, the difference between your groups is bigger than the difference within your groups, and the result is significant

13
Q

What does a significant F ratio mean?

A

Your model is significant in explaining a meaningful amount of variance in the results (Y)

14
Q

When is a small F ratio still likely to be significant?

A

When the sample size is larger (more degrees of freedom lowers the critical value of F)

15
Q

How do you interpret a regression equation?

Example: Y= 45.67 + 0.67Age

Always has an exam question on this

A

Example: Y = 45.67 + 0.67Age

means that each 1-unit increase in age is associated with a 0.67-unit increase in Y.

What does the coefficient (0.67) mean?

Each unit increase in age is associated with a 0.67 increase in Y. (45.67 is the intercept: the predicted Y when age = 0.)

16
Q

What is a Beta Value?

A

Standardised coefficients (in standard-deviation units) that can be compared across predictors.

17
Q

What happens when you have a strong beta value?

A

The stronger the beta value, the greater the relative importance of that predictor

18
Q

How do you calculate a Beta Value (i.e. standardise the coefficients)?

A

Take the SD of the particular predictor, divide it by the SD of the dependent measure, and multiply that ratio by each coefficient (beta = b × Sx/Sy).

19
Q

What is the main issue with a regression coefficient?

A

It doesn’t tell you which variable is most important; there’s no effect size

The size of the coefficient will differ depending on the scale of the measure

20
Q

Comparing regression coefficients tells you nothing about the importance of the variable.

How do you understand its importance?

A

You have to standardise the coefficients.

Take the SD of the particular predictor, divide it by the SD of the dependent measure, and multiply that ratio by each coefficient.

This gives a standardised beta.

21
Q

What are residuals in regression?

A

The variance you haven’t explained

The actual score minus the predicted score

22
Q

The name for (R2) is…

A

Coefficient of determination

23
Q

What are the two “principles” of multiple regression?

A

Explanation vs. prediction

Prediction: doesn’t care about theory, just which group you belong to

(e.g. gathering data online to predict behaviour, as Google does)

More practical: which people are most likely to be, e.g., problem gamblers

Explanation: explaining which variable is the best predictor

E.g. Bronfenbrenner’s model: which variables are most likely to impact child behaviour?

Which of the levels of influence is MOST influential when tested against the others, and how much variance is attributed to each predictor variable

24
Q

What is Multicollinearity?

A

Means that many of your predictor variables are correlated with each other

When you explain variance, X1, X2, and X3 are all related, so each individually correlates with Y, but when you put them together they eat up each other’s variance because they overlap.

25
Q

What is the function for a simple linear regression?

A

Y=mx + c

Where m is the slope

and c= the Y-intercept

X is the independent variable

Y is the dependent variable.

26
Q

How do we obtain a “line of best fit”?

A

method of least squares

involves minimising the squared deviations between the actual scores of Y and those predicted by the resultant regression equation (Y’).
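A minimal sketch of the least-squares solution in plain Python: the closed-form slope and intercept that minimise the squared deviations. The data are made up for illustration.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # predictor scores
y = [2.0, 3.0, 5.0, 4.0, 6.0]   # actual Y scores
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Minimising sum((Y - Y')^2) gives the usual closed-form solution:
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)          # slope
c = y_bar - m * x_bar                           # Y-intercept

y_prime = [c + m * xi for xi in x]              # predicted scores (Y')
```

Any other line through the scatterplot would produce a larger sum of squared deviations than this one.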

27
Q

Regression involves calculating three sum of squares

What are they?

A

Total sum of squares, Error sum of squares, Regression sum of squares.

28
Q

What is the total sum of squares?

A

Total SS = difference between actual scores and the mean value of Y.

29
Q

What is the error sum of squares?

A

Error SS (unexplained) = difference between Y’ and the actual values of Y.

30
Q

What is the regression sum of squares?

A

Reg SS (explained) = difference between Y’ (predicted) and the mean of Y.

31
Q

In linear regression, what is the coefficient of determination?

What is the value equal to?

A

The R-squared value.

The value is equal to:

regression sum of squares (variance explained) / total sum of squares

32
Q

What is the F-value in a linear regression?

A

This is the ratio of the;

Regression Mean square/ Error mean square

33
Q

What is the formula for getting the F-value?

A

F-value = Regression mean square / Error mean square

Regression mean square = RSS / DFregg

ie. divide RSS by the degrees of freedom (of that component)

  • RSS = regression sum of squares
  • DF = degrees of freedom

34
Q

How do we obtain degrees of freedom for regression mean square?

A

DFregg = No. coefficients estimated (including the constant) – 1.​

35
Q

What is the formula for the error mean square (ESS) for obtaining the F-value?

F-value = Regression mean square / Error mean square

A

Error mean square = ESS / DFerror, where DFerror = total observations (N) – No. coefficients (k) – 1.

ie. the error sum of squares divided by its degrees of freedom

36
Q

How do we obtain the degrees of freedom for the error sum of squares?

A

DFerror = Total observations - total coefficients - 1
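Putting the last few cards together, the F-value can be sketched numerically in Python. The sums of squares and sample size here are made up.

```python
# Made-up sums of squares for a simple regression with N = 4 observations
# and k = 1 predictor.
rss, ess = 24.2, 1.8
n_obs, k = 4, 1

df_reg = k                # no. coefficients estimated (incl. constant) minus 1
df_error = n_obs - k - 1  # N - k - 1

reg_mean_square = rss / df_reg        # RSS / DFregg
error_mean_square = ess / df_error    # ESS / DFerror

f_value = reg_mean_square / error_mean_square
```

The resulting F-value is then compared against the critical F for (df_reg, df_error) degrees of freedom to decide significance.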

37
Q

Interpret the following linear regression example

Y= 45.67 + 0.67Age

important, always an exam question on this

A

For each unit increase in age, Y increases by 0.67.

Y = 45.67 + 0.67Age

Every 1-unit change in age leads to a 0.67-unit increase in Y.

38
Q

When we are using a linear regression equation, do we use standardised or unstandardised values?

A

Unstandardised

We only standardise afterwards to obtain some measure of the relative importance of variables

39
Q

In linear regression, we standardise coefficients to understand the relative importance of each variable.

How do we standardise coefficients?

What is this called?

A

By multiplying each coefficient by the ratio of the standard deviation of that predictor to the standard deviation of the dependent measure

This is called the Beta value.

ie. ratio = Sx / Sy

Sy = SD of dependent measure

Sx = SD of particular predictor
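A quick check of the standardisation in plain Python: in a simple regression with one predictor, the resulting beta equals the Pearson correlation between X and Y. Data are made up.

```python
import math

x = [1.0, 2.0, 3.0, 4.0]   # predictor
y = [1.0, 3.0, 4.0, 8.0]   # dependent measure
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                      # unstandardised coefficient
sd_x = math.sqrt(sxx / (n - 1))    # SD of the particular predictor (Sx)
sd_y = math.sqrt(syy / (n - 1))    # SD of the dependent measure (Sy)

beta = b * (sd_x / sd_y)           # standardised beta
r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
# with a single predictor, beta and r are identical
```

This equality is what makes betas comparable across predictors measured on different scales.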

40
Q

Once we obtain beta values for the coefficients of a linear regression, what test do we use to determine the relative significance of a coefficient in comparison to others?

A

t-test

41
Q

What is the difference between a simple linear regression and a multiple regression?

A

Simple linear regression: 1 predictor variable

Multiple linear regression: 2 or more predictor variables

42
Q

Exactly the same coefficients, statistical tests etc. apply to a multiple regression as a simple linear regression. The main difference lies in the selection of different methods for entering variables into the equation.

What are the two classifications to consider in multiple regression?

A

hierarchical vs. statistical

43
Q

What order do we add the variables in with hierarchical regression?

A

The variables are entered in the order that is theoretically important.

44
Q

What order do we add the variables in with statistical regression?

A

Variables are entered according to a specific statistical criterion

e.g., the one with the next highest correlation with the dependent measure.

45
Q

Which is considered more robust, hierarchical or statistical regression?

A

Hierarchical

You are controlling for what goes in and which order, based on a systematic or theory driven model.

46
Q

There are two types of adding variables to a multiple regression analysis.

What are those two types?

A

Standard vs. Stepwise

Standard: In what is called ‘ordinary least squares regression’, all variables get entered at once.

Stepwise: In stepwise procedures, including hierarchical or theory-driven entry procedures, the variables go in step by step.

47
Q

What are the 3 types of stepwise regression?

A
  1. Forwards method: variables go in according to the highest first-order and then partial correlation.
  2. Backwards method: all variables go in, then the one with the lowest partial correlation gets removed, until there is no significant change in R-squared.
  3. Stepwise: combination of backwards and forwards.
48
Q

Stepwise is considered “dodgy”, why?

A

Requires a lot more power; it is atheoretical; and it is open to wild goose chases if a correlation is a Type 1 error.

It is influenced by Type 1 errors:

  • Things can be significant by chance.
  • This produces model inconsistency or unreliability.
49
Q

Why is hierarchical regression not considered “dodgy”?

A

It is based on theory and not on chance.

No chance of the model being based on type 1 errors.

50
Q

What do squared semi-partial correlations tell us?

A

How much variation in the dependent measure that particular predictor uniquely explains.

51
Q

As squared semi-partial correlations tell us how much variation in the dependent measure that particular predictor uniquely explains, adding up all the squared part correlations will add up to R-squared (variance explained)

True or False?

A

False

There might be small amounts of variation explained by non-included variables (not in the equation).

A lot of the variance might be shared by 2 or more predictors.
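A numeric illustration of this card, using standard two-predictor formulas and made-up zero-order correlations: when the predictors are correlated, the squared semi-partials sum to less than R-squared, and the leftover is variance the predictors share.

```python
# Made-up zero-order correlations between two predictors (1, 2) and Y.
r1y, r2y, r12 = 0.6, 0.5, 0.4

# R-squared for the two-predictor model (standard formula).
r2_full = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)

# Squared semi-partials: the drop in R-squared when each predictor
# is removed (with one predictor alone, R-squared is its r squared).
sr1_sq = r2_full - r2y**2
sr2_sq = r2_full - r1y**2

shared = r2_full - (sr1_sq + sr2_sq)   # variance shared between the predictors
```

With r12 = 0 the shared term vanishes and the squared semi-partials add up exactly to R-squared.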

52
Q

What are the two types of relationships that can be found in regression models?

A

Mediators

Moderators

53
Q

What is a mediated relationship?

A

Where a variable mediates the relationship between two variables.

e.g. B mediates the relationship between A and C. Variable B carries the relationship between A and C.

A only correlates with C because A gives rise to B, which in turn, gives rise to C.

54
Q

Is this example a mediated or a moderated relationship?

Example: No. of work hours (A) might correlate with decreases in work satisfaction (C). However, this may only occur when increases in work lead to increases in stress levels (B), which is what decreases work satisfaction.

A

Mediated.

A and C are only correlated when B exists

55
Q

When can you test for mediation?

A

An analysis of mediation only makes sense if all 3 variables are correlated at least moderately.

If this weren’t the case, then the effect probably wasn’t there.

56
Q

The Baron and Kenny (1986) Method is a test of…

A

Mediation

57
Q

How would the Baron and Kenny (1986) method operate in this example?

A

We run 2 regressions

R1: Run a regression with the number of work hours as the predictor of work satisfaction

R2: Run a regression with both the number of work hours and the mediator (stress) in the equation.

Then, compare the beta coefficients for No. hours between R1 and R2.

Usually the beta value will be higher in R1; if it has gone down in R2, partial mediation has occurred. If it is fully reduced to 0, full mediation has occurred.
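A hedged Python sketch of the two-regression comparison, using made-up data constructed so that stress (B) fully carries the effect of work hours (A) on satisfaction (C). The closed-form normal equations for two centred predictors stand in for a regression routine.

```python
hours = [1.0, 2.0, 3.0, 4.0, 5.0]                                          # A
stress = [3 * h + e for h, e in zip(hours, [0.5, -0.3, 0.1, -0.2, 0.4])]   # B
satis = [2 * s for s in stress]                    # C, driven entirely by B

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a, b, c = center(hours), center(stress), center(satis)

# R1: simple regression of satisfaction on hours alone.
b_hours_r1 = dot(a, c) / dot(a, a)

# R2: both hours and the mediator in the equation
# (closed-form solution for two centred predictors).
den = dot(a, a) * dot(b, b) - dot(a, b) ** 2
b_hours_r2 = (dot(a, c) * dot(b, b) - dot(b, c) * dot(a, b)) / den
b_stress_r2 = (dot(b, c) * dot(a, a) - dot(a, c) * dot(a, b)) / den

# The hours coefficient is positive in R1 but drops to zero in R2:
# full mediation in this constructed example.
```

With noisier data the R2 coefficient would shrink without reaching zero, which is the partial-mediation pattern the card describes.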

58
Q

What is the difference between a partial mediation and a full mediation?

A

When doing a mediation analysis (Baron and Kenny method), if the beta coefficient has reduced in size in the second equation, it is partial mediation. If it has reduced to 0, it is full mediation.

59
Q

If there has been a partial mediation, how can we tell if the mediation is meaningful?

A

Sobel Test

This test can be used to test differences in the magnitude of beta coefficients between the first and second equations.

60
Q

When is using a Sobel Test appropriate?

A

When you have a very large sample and where you can assume normality in the product term used to capture the indirect effect.

(i.e., product term of the coefficients corresponding to pathways between the predictor-mediator- outcome variable).

61
Q

What should you do if there is more than one mediator in your regression model?

A

A lot of people test these individually.

If you want to determine which one is the best mediator (based on the assumption that the two are correlated), then it is better to run them all in the same model.

62
Q

What is a moderator in a regression model?

A

A moderator is a third variable which influences the nature or magnitude of the relationship between two other variables.

63
Q

How do we test for moderation in a regression model?

A

By testing for a significant A x B interaction.

  • We can obtain an interaction term simply by multiplying A’s and B’s scores together to give a product term.
  • We then conduct a hierarchical analysis. On Step 1, we enter the main effects (A) and (B), and then the product term is entered on Step 2.
  • The idea is to show how much additional variation can be explained: whether the interaction term shows anything above and beyond what is already explainable in terms of main effects.
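With a binary moderator (e.g. sex coded 0/1), the product-term idea can be sketched directly in Python: in the Step 2 model Y = b0 + b1·A + b2·Group + b3·(A × Group), the interaction coefficient b3 equals the difference between the group-specific slopes. Data are made up to echo the next card's example.

```python
def slope(x, y):
    # Least-squares slope of y on x.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
           sum((xi - x_bar) ** 2 for xi in x)

age = [1.0, 2.0, 3.0, 4.0]
memory_g0 = [1.4, 1.8, 2.2, 2.6]   # group coded 0: slope 0.40
memory_g1 = [2.9, 2.8, 2.7, 2.6]   # group coded 1: slope -0.10

b_age_g0 = slope(age, memory_g0)
b_age_g1 = slope(age, memory_g1)

# The product term's coefficient is exactly the slope difference:
b_interaction = b_age_g1 - b_age_g0
```

A non-zero (and significant) b_interaction is what the Step 2 hierarchical test is looking for.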
64
Q

What is a significant interaction in a regression?

A

A significant interaction means that the relationship between two variables, as expressed in the standardised slope coefficient (beta), is not consistent across the levels of the other factor.

  • For example, the coefficient for Age might be 0.40 for females (i.e., increases in age increase predicted memory) and –0.10 for males (i.e., increases in age slightly decrease predicted memory function).
  • The main thing is that the two coefficients vary significantly.
65
Q

What is the best way to analyse a regression interaction?

  • Explain how you would run a regression.
  • e.g. Memory as predicted by age, comparing males and females
A

Break it down into simple linear effects.

We select males only and then run a simple linear regression (memory as predicted by age). This gives us an equation.

Then we do the same with females only.

We thus have two equations of the form Memory = Constant + Beta × Age.

By slotting in some made-up values for age, e.g., 20, 25, 30, up to 80, one can then get predicted memory scores for males and females separately. We then plot these two functions, and this gives us a clear depiction of the nature of the interaction.

66
Q

What are the 4 assumptions of linear regression?

A

1. Homoscedasticity (The variance of the residuals is the same for any value of X)

2. Normality (For any fixed value of X, Y is normally distributed)

3. Linearity (The relationship between X and the mean of Y is linear)

4. Independence (Observations are independent of each other, i.e. no serial dependence in the errors)