W2: Simple Linear Regression Flashcards

1
Q

Regression

A

Deriving an equation for predicting one variable from another

2
Q

A regression model needs to answer:

A

How well can we predict Y, given a value of X?

How much variance in Y can we explain?

3
Q

Types of regression model

A

Simple linear regression: a single independent variable (the one IV)

Multiple linear regression: two or more independent variables (all the IVs)

4
Q

What is the aim of the total sums of squares (TSS)?

A

TSS captures the variability in Y that we aim to explain or predict

It is calculated from Y alone and does not involve X

5
Q

How do we know if the regression line predicts Y well?

A

The closer the actual scores are to the predicted scores, the better the model predicts Y: there is less variability around the line (better for generalisation)

The further the actual scores are from the line (the predicted scores), the worse the model predicts Y: there is more variability around the line

6
Q

What is a residual?

A

A residual is the difference between the actual Y and the predicted Y for a given value of X
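A minimal sketch in Python of how residuals are computed; the scores and the fitted line here are made up for illustration, not taken from the cards:

```python
import numpy as np

# Hypothetical data and an illustrative regression line Y-hat = 2 + 0.5 * X
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.4, 3.1, 3.4, 4.2])

y_hat = 2 + 0.5 * x   # predicted scores from the regression line
residuals = y - y_hat # residual = actual Y minus predicted Y
print(residuals)
```

Each residual is the vertical distance from a data point to the line at its value of X.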

7
Q

Residual SS vs Regression SS

A

Residual sums of squares → small in a good model (little unexplained variability)

Regression sums of squares → large in a good model (much explained variability)
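A short sketch, with made-up scores, showing how the total SS splits into regression SS and residual SS when a line is fitted by ordinary least squares (here via numpy's `polyfit`):

```python
import numpy as np

# Hypothetical scores; fit a simple linear regression with numpy
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.5, 5.0])

b1, b0 = np.polyfit(x, y, 1)  # slope and intercept

y_hat = b0 + b1 * x
tss = np.sum((y - y.mean()) ** 2)         # total SS: variability of Y around its mean
reg_ss = np.sum((y_hat - y.mean()) ** 2)  # regression SS: variability the line explains
res_ss = np.sum((y - y_hat) ** 2)         # residual SS: variability left over

print(bool(np.isclose(tss, reg_ss + res_ss)))  # True: TSS = regression SS + residual SS
```

With these scores the regression SS dominates the residual SS, which is what a well-fitting line looks like.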

8
Q

Whereabouts are the negative and positive residuals?

A

(+) residual: above the regression line
(-) residual: below the regression line

9
Q

Why does the residual sum always equal 0? And what happens to the sum if there is an outlier?

A

The residual sum is always 0 because the regression line sits in the middle of the data points: the positive and negative residuals cancel out exactly

Even with an outlier, the residual sum is still 0, because the regression line skews towards the outlier to compensate and adjust
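A quick numerical check of this card, using made-up scores that include an outlier; an OLS line fitted with an intercept always leaves residuals that sum to (numerically) zero:

```python
import numpy as np

# Hypothetical scores where 12.0 is an outlier pulling the line towards it
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.5, 12.0])

b1, b0 = np.polyfit(x, y, 1)      # OLS fit with an intercept
residuals = y - (b0 + b1 * x)

print(residuals.sum())  # numerically zero: positives and negatives cancel
```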

10
Q

Conditional and marginal distribution

A

Marginal distribution of Y: the spread (variance) of scores around the mean of Y
* Wider and more spread out → more variability than the conditional distribution

Conditional distribution of Y: the spread (variance) of scores around the regression line, for any given value of X
* Narrower and tighter

11
Q

R^2: coefficient of determination

A

The variability in Y is split into regression and error

Variability that is explained → regression

Variability that is not explained → residual

12
Q

R^2 effect:

A

R^2 ranges from 0-1 (often expressed as a percentage)
Closer to 0 (or 0%): the weaker the effect, the less variance the model explains
Closer to 1 (or 100%): the stronger the effect, the more variance the model explains
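A minimal sketch, with made-up scores, of computing R^2 from the sums of squares (R^2 = 1 − residual SS / total SS):

```python
import numpy as np

# Hypothetical scores; fit a simple linear regression
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.5, 5.0])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

res_ss = np.sum((y - y_hat) ** 2)     # unexplained variability
tss = np.sum((y - y.mean()) ** 2)     # total variability in Y
r_squared = 1 - res_ss / tss          # share of variance the model explains

print(round(r_squared, 3))
```

An R^2 near 1 like this one means nearly all the variance in Y is explained by the model.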

13
Q

Different ways to find correlation

A

pwcorr (Stata's pairwise-correlation command) or the square root of R^2 → the correlation between the predicted values of Y and the actual values of Y
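A quick check of this card in Python (numpy standing in for Stata's pwcorr, with made-up scores): the correlation between actual and predicted Y equals the square root of R^2:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.5, 5.0])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(y, y_hat)[0, 1]  # correlation between actual and predicted Y

print(bool(np.isclose(r, np.sqrt(r_squared))))  # True
```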

14
Q

What effects are tested in regression? What statistical significance tests are used?

A

Effects that are tested:
* Model-as-a-whole
* Individual variable or predictor effects

Statistical significance:
* Model as a whole: F ratio and p-value
* Individual predictors: t-statistic and p-value

15
Q

How is mean square (MS) calculated?

A

The sums of squares divided by its degrees of freedom (MS = SS / df)
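A small worked example with made-up SS values: each mean square is SS / df, and the F ratio from the model-as-a-whole test is the ratio of the two mean squares. For one predictor and n cases, df for the regression is 1 and df for the residual is n − 2:

```python
# Illustrative values, not from the cards
n = 5
reg_ss, res_ss = 6.4, 0.10

ms_regression = reg_ss / 1        # MS = SS / df (df_regression = 1)
ms_residual = res_ss / (n - 2)    # df_residual = n - 2
f_ratio = ms_regression / ms_residual

print(round(f_ratio, 1))
```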

16
Q

F Ratio (Model as a whole) H0 and H1

A

H0 (null hypothesis): the model is no better than the intercept-only model (aka the null model)
H0: all regression coefficients = 0

H1 (alternative hypothesis): the regression model is significantly better than the null model
H1: at least one regression coefficient is not equal to 0

17
Q

t-statistic (individual variable) H0 and H1

A

H0: b=0
H1: b≠0

18
Q

What p-value counts as statistically significant?

A

p<.05

19
Q

Model as a whole R^2 effect size

A

R-squared
* 2% - 12%: small effect
* 13% - 25%: medium effect
* 26% and above: large effect

20
Q

Individual variable statistical significance via the beta coefficient

A

Beta coefficient, unstandardised: no standard cut-offs
* Need to understand the variable's scale to know whether the effect is big or small
* Significance can be judged from the p-value

Beta coefficient, standardised:
* Interpreted similarly to a correlation coefficient

21
Q

Why is there a need for standardised regression coefficient?

A

Because unstandardised effect sizes are only useful if the ‘natural’ scale is known
* Without a natural scale, direct comparisons cannot be made, as variables may be on different scales
* A one-point increase can be a small change or a big change (e.g. one point on a 1-7 scale is bigger than one point on a 0-100 scale)
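A sketch, with made-up scores, of how standardising removes the scale: the unstandardised slope b is multiplied by sd(X)/sd(Y), and in simple regression the result equals the correlation coefficient (as card 20 notes, standardised coefficients are read like correlations):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.5, 5.0])

b1, b0 = np.polyfit(x, y, 1)               # unstandardised slope (scale-dependent)
beta = b1 * x.std(ddof=1) / y.std(ddof=1)  # standardised slope (scale-free)
r = np.corrcoef(x, y)[0, 1]

print(bool(np.isclose(beta, r)))  # True: in simple regression, beta equals r
```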