Linear & Multiple Regression Flashcards

1
Q

Who is touted as the ‘Father of Behavioural Statistics’?

A

Francis Galton

Inspiringly, he is also known for not being very strong at traditional statistics, and he had a severe breakdown while studying at Cambridge.

2
Q

Describe:

The relationship between regression models and variance.

A

Regression models involve predictor variables that account for at least some of the variance seen in the outcome variable.

Variance can be thought of as the extent to which values differ from the mean.

Regressions are a predictive tool for this.

3
Q

When trying to identify a question suited for a regression model, what key word should you look out for?

(In most cases)

A

Predict(s)

(e.g. “what predicts romantic interest on a date?”)

4
Q

Give the general equation for a regression model.

A

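y = a + bx + e

y : outcome variable (what you are predicting). The model's predicted value, y’, is a + bx.

a : the intercept (the mean value of y when all predictors are zero).

b : the slope for the predictor variable x (predictors are the variables intended to explain variance seen in y). With several predictors, each gets its own term (b1x1 + b2x2 + ...).

e : error (accounting for the fact no model is perfect).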
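
As a rough illustration of the equation above, the sketch below estimates the intercept and slope by ordinary least squares for one predictor. The data (hours of study and test scores) and the helper name `fit_line` are made up for this example.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: hours of study (predictor) and test score (outcome).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = fit_line(hours, score)
predicted = [a + b * h for h in hours]              # a + bx for each person
errors = [s - p for s, p in zip(score, predicted)]  # e = observed - predicted

print(f"intercept a = {a:.1f}, slope b = {b:.1f}")
```

With an intercept in the model, the least-squares errors always sum to (approximately) zero, which is one quick sanity check on a fitted line.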

5
Q

Why, in a regression model, is the intercept the value of the mean of the outcome variable?

A

When studying normally distributed populations, assuming a value will be close or equal to the mean should be correct more often due to chance alone.

The predictor variables should then ideally improve the accuracy of the prediction and account for variance in the outcome.
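
One way to see why the mean is a sensible starting guess: among all single "constant" guesses, the mean gives the smallest total squared error. The sketch below checks this on a handful of made-up outcome values; the function name and numbers are assumptions for illustration only.

```python
from statistics import mean

def sum_squared_error(values, guess):
    """Total squared distance between each value and one constant guess."""
    return sum((v - guess) ** 2 for v in values)

# Hypothetical outcome scores.
outcome = [4, 7, 5, 9, 10]
baseline = mean(outcome)

# Compare the mean against a few other constant guesses.
candidates = [5, 6, baseline, 8, 9]
errors = {g: sum_squared_error(outcome, g) for g in candidates}
best_guess = min(errors, key=errors.get)

print(f"mean = {baseline}, best constant guess = {best_guess}")
```

A predictor earns its place in the model only if it reduces the error below this mean-only baseline.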

6
Q

List:

The FOUR assumptions of a regression model.

A
  1. Independence of measurements.
  2. Normal distribution of variables.
  3. Linearity of predictor-outcome relationship.
  4. Homoscedasticity of residuals.

Homoscedasticity means the residuals (the error of the model) have a roughly constant, random spread across the range of predicted values. There shouldn’t be a ‘systematic error pattern’.

Note this is what was tested in PSYC232, but some sources seem to contradict it.

7
Q

Explain:

Residual plots should display random variance.

And how does this relate to regressions?

A

If a residual plot (indicator of error) showed a non-random pattern, this would mean the predictor variables are NOT sufficiently predicting or explaining the variance in the outcome variable.

It likely means there are stronger predictors not being considered.
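
A small sketch of what a non-random residual pattern can look like: here a straight line is fitted to made-up data that actually follow a curve, so the residuals come out positive at both ends and negative in the middle. The data and the helper `fit_line` are assumptions for this example.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data with a curved (quadratic) relationship.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# A U-shaped run of residuals is a systematic pattern, not random scatter.
print([round(r, 1) for r in residuals])
```

In this toy case the pattern arises because the straight line misses the curve in the data, which is one of several reasons a residual plot can look non-random.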

8
Q

What is the purpose of the model coefficient box in a (multiple) regression output?

A

It provides important information on which exact predictor variables improve accuracy in predicting the outcome.

9
Q

What is the purpose of the model fit box in a (multiple) regression output?

A

It gives holistic information on how well all of the variables in the model predict the outcome.

This is particularly denoted by ‘R’.

10
Q

What does R² represent in a (multiple) regression output?

A

The amount of variance in the outcome explained by the regression model.

This is an overall assessment, and does not tell you which variables contribute greater accuracy than others.

Having around or over 50% explained is considered impressive when studying human behaviour!
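
To make "variance explained" concrete, the sketch below computes R² as one minus the ratio of leftover (residual) squared error to total squared error around the mean. The observed scores and the model's predictions are hypothetical numbers chosen for illustration.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Proportion of variance in the outcome explained by the predictions."""
    y_bar = mean(observed)
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

# Hypothetical observed outcomes and predictions from some fitted model.
observed = [52, 55, 61, 64, 68]
predicted = [51.8, 55.9, 60.0, 64.1, 68.2]

r2 = r_squared(observed, predicted)
print(f"R-squared = {r2:.3f}")
```

Predicting the mean for everyone gives an R² of 0, and perfect predictions give 1, so the value can be read as the improvement over the mean-only guess.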

11
Q

How does the adjusted R² differ from R² in a multiple regression output?

A

It accounts for the number of predictors used in the regression model.

Having a larger number of predictors increases the chance of the model appearing more accurate by accident, and so the adjusted R² reduces its value based on the number of predictors relative to the sample size.
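
A minimal sketch of the penalty, assuming the commonly used adjustment formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors. Software packages may label or compute this slightly differently, and the numbers below are invented.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for sample size n and p predictors (common formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R-squared of 0.50 and same sample size, different predictor counts.
same_fit_few = adjusted_r_squared(0.50, n=50, p=2)
same_fit_many = adjusted_r_squared(0.50, n=50, p=10)

print(f"2 predictors:  {same_fit_few:.3f}")
print(f"10 predictors: {same_fit_many:.3f}")
```

The model with ten predictors is marked down more heavily than the model with two, even though both explain the same raw share of variance.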

12
Q

When would the adjusted R² value be the exact same as the R² value?

(In a regression model output)

A

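When there is only one predictor variable used in the model. (In practice the two values are then nearly identical; the standard adjustment formula still applies a very small penalty, which shrinks as the sample size grows.)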

13
Q

What is the general equation for calculating error in a (multiple) regression output?

A

1 - R²

In words, this means it accounts for the remaining variance not explained by the regression model.

Around 10 - 30% error is considered relatively normal for behavioural research.

14
Q

What is the format for presenting the overall model test from a regression output?

A

F(df1, df2) = F, p-value

Note, the second F refers to the test statistic from the results.
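
For a sense of where the numbers in that format come from, the sketch below uses the standard relationship between R² and the overall F statistic for a model with an intercept: df1 is the number of predictors (p), df2 is n - p - 1, and F = (R²/df1) / ((1 - R²)/df2). The inputs are invented, and the p-value is left out because it needs the F distribution from a statistics library.

```python
def overall_f(r2, n, p):
    """Overall model F statistic and its degrees of freedom."""
    df1 = p                 # number of predictors
    df2 = n - p - 1         # residual degrees of freedom
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2

# Hypothetical model: R-squared of 0.40, 103 participants, 2 predictors.
f, df1, df2 = overall_f(r2=0.40, n=103, p=2)
report = f"F({df1}, {df2}) = {f:.2f}"

print(report)
```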

15
Q

What does having a statistically significant regression model imply?

A

That, on average, the model is more accurate at predicting the true value of a measurement compared to simply guessing the mean of the outcome variable.

16
Q

What does the estimate (B) of a model coefficient box in a multiple regression output represent?

A

The amount of change in the value of the outcome associated with one unit of change in the predictor variable.

Note that this is the unstandardised estimate.

It may also be thought of as the effect size.

17
Q

What is the purpose of the standardised estimate (β) in the model coefficient box of a multiple regression output?

A

It allows us to compare predictors’ effect sizes when they use different scales.
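
A short sketch of why standardising helps, assuming the usual conversion β = B × (SD of predictor / SD of outcome). The same made-up income data are entered once in dollars and once in thousands of dollars: the unstandardised B changes by a factor of 1000, while β stays the same. All names and values here are hypothetical.

```python
from statistics import mean, stdev

def slope(x, y):
    """Unstandardised least-squares slope (B) for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def standardised_beta(x, y):
    """Slope rescaled into standard-deviation units."""
    return slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: the same incomes on two different scales.
income_dollars = [20000, 30000, 40000, 50000]
income_thousands = [20, 30, 40, 50]
wellbeing = [3, 5, 6, 8]

b_dollars = slope(income_dollars, wellbeing)
b_thousands = slope(income_thousands, wellbeing)
beta_dollars = standardised_beta(income_dollars, wellbeing)
beta_thousands = standardised_beta(income_thousands, wellbeing)

print(f"B: {b_dollars:.5f} vs {b_thousands:.2f}")
print(f"beta: {beta_dollars:.3f} vs {beta_thousands:.3f}")
```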

18
Q

What is ONE major benefit of using a multiple regression over just a simple linear regression?

A

Multiple regressions allow you to control for other predictor variables when analysing the change in the outcome associated with the focus predictor variable.

(e.g. analysing perceived attractiveness, controlling for gender - as the specific study found women were less attracted on average than the men were).
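
To illustrate what "controlling for" does, the sketch below uses made-up data where the outcome is built as 1 + 2x + 3z. Ignoring z, the slope for x comes out too large; removing the part of x and y that z already explains (a residual-on-residual regression, which gives the same slope a two-predictor regression would) recovers the slope of 2. The variable names and numbers are assumptions for this example.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(x, y):
    """What is left of y after removing the part predicted by x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data: z is a control variable (coded 0/1) related to x.
z = [0, 0, 0, 1, 1, 1]
x = [1, 2, 3, 3, 4, 5]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

_, simple_b = fit_line(x, y)                                # ignores z
_, controlled_b = fit_line(residuals(z, x), residuals(z, y))  # controls for z

print(f"slope ignoring z: {simple_b:.1f}, controlling for z: {controlled_b:.1f}")
```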

19
Q

True or False:

You can include categorical variables in a regression.

(e.g. gender).

A

True

However, a single numerically coded variable can only distinguish two groups, and you have to label each category so there is a one-unit difference between them (i.e. 0 and 1, or 1 and 2, etc.).
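
A brief sketch of why the one-unit difference matters: with a two-group predictor coded one unit apart, the slope equals the difference between the two group means, whichever pair of labels is used. With 0/1 coding the intercept is also the mean of the group coded 0. The ratings and the helper `fit_line` are invented for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical ratings for two groups of three people each.
rating = [6, 7, 8, 4, 5, 6]
group_01 = [0, 0, 0, 1, 1, 1]   # coded 0 and 1
group_12 = [1, 1, 1, 2, 2, 2]   # same groups coded 1 and 2

a_01, b_01 = fit_line(group_01, rating)
a_12, b_12 = fit_line(group_12, rating)

mean_first = mean(rating[:3])
mean_second = mean(rating[3:])

print(f"slope (0/1) = {b_01}, slope (1/2) = {b_12}")
print(f"difference in group means = {mean_second - mean_first}")
```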

20
Q

Define:

Regression toward the Mean

A

A statistical phenomenon brought about by the fact extreme values tend to be followed by values that are closer to the mean.

This occurs when measuring something more than once, and can give the false impression of a trend occurring.

It makes sense, as an extreme value is typically less common, and so is unlikely to be followed by yet another extreme value.

An example of this may be seen with the so-called ‘2nd Album Syndrome’, which leads people to believe doing really well with your first album ‘causes’ you to do worse in your next. This is not the case; it is just unlikely to have another extreme outlier in ratings.

Many things in life tend to stabilise over time.
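
The effect can be reproduced with a small simulation. In this sketch, which rests on invented assumptions, each person's score is a stable "talent" plus independent luck on each occasion. The people who scored in the top 5% the first time still score above average the second time, but noticeably closer to the mean, even though nothing caused them to get worse.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical model: score = stable talent + occasion-specific luck.
talent = [random.gauss(0, 1) for _ in range(n)]
first = [t + random.gauss(0, 1) for t in talent]
second = [t + random.gauss(0, 1) for t in talent]

# Select the people with the most extreme (top 5%) first scores.
cutoff = sorted(first)[int(0.95 * n)]
top = [i for i in range(n) if first[i] >= cutoff]

top_first_mean = mean(first[i] for i in top)
top_second_mean = mean(second[i] for i in top)

print(f"top group, first attempt:  {top_first_mean:.2f}")
print(f"top group, second attempt: {top_second_mean:.2f}")
```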

21
Q

What factors may lead to an increased pattern of regression toward the mean?

A
  • The more extreme a sample/measurement is compared to the mean, the more noticeable the regression toward the mean.
  • The less correlated the two measurements are with each other, the stronger the regression toward the mean.
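
Both factors can be seen in one simple rule of thumb, sketched below under the assumption that the two measurements have the same mean and spread and are linearly related: the expected follow-up score is the mean plus the correlation (r) times the first score's distance from the mean. The function name and numbers are hypothetical.

```python
def expected_followup(first_score, average, r):
    """Expected second score given the first score and the correlation r."""
    return average + r * (first_score - average)

average = 100
extreme, mild = 130, 105   # one extreme and one mild first score

# Factor 1: the more extreme the first score, the larger the expected drop.
drop_extreme = extreme - expected_followup(extreme, average, r=0.5)
drop_mild = mild - expected_followup(mild, average, r=0.5)

# Factor 2: the weaker the correlation, the larger the expected drop.
drop_high_r = extreme - expected_followup(extreme, average, r=0.9)
drop_low_r = extreme - expected_followup(extreme, average, r=0.2)

print(f"extreme vs mild score (r = 0.5): {drop_extreme:.1f} vs {drop_mild:.1f}")
print(f"r = 0.9 vs r = 0.2 (score 130):  {drop_high_r:.1f} vs {drop_low_r:.1f}")
```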