W2: Simple Linear Regression Flashcards
Regression
Deriving an equation for predicting one variable from another
What a regression model needs to answer:
How well can we predict Y, given a value of X?
How much variance in Y can we explain?
Types of regression line
Simple linear regression: a single independent variable (one IV)
Multiple linear regression: two or more independent variables (multiple IVs)
Aim of the total sum of squares (TSS)
Aims to capture the total variability in Y that we want to explain or predict
TSS measures the variability of Y around its own mean, so it does not involve X at all
How do we know if the regression line predicts Y well?
The closer the actual scores are to the predicted scores, the better the model predicts Y and the less variability there is around the line (better for generalisation)
The further the actual scores are from the line (the predicted scores), the worse the model predicts Y and the more variability there is around the line
What is a residual?
A residual is the difference between the actual Y and the predicted Y for a given value of X
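The residual definition can be sketched numerically. This is an illustration in Python with made-up data (the course itself appears to use Stata); `np.polyfit` is used here as a convenient least-squares fitter:

```python
import numpy as np

# Hypothetical data: X values and actual Y scores
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Fit a least-squares line y_hat = b*x + a
b, a = np.polyfit(x, y, 1)
y_hat = b * x + a          # predicted Y for each X
residuals = y - y_hat      # actual minus predicted
```

Each entry of `residuals` is one residual: positive where the actual score sits above the line, negative where it sits below.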
Residual SS vs Regression SS
Residual sum of squares → want a small value (little unexplained variability)
Regression sum of squares → want a large value (much explained variability)
Whereabouts are the negative and positive residuals?
(+) residual: above the regression line
(-) residual: below the regression line
Why do the residuals always sum to 0? And what would happen to the sum if there were an outlier?
The residuals always sum to 0 because the least-squares line sits in the 'middle' of the data points: the positive residuals above the line exactly balance the negative residuals below it
Even with an outlier, the residual sum is still 0, as the regression line shifts to compensate and the residuals rebalance
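This balancing act can be checked directly. A small Python sketch with hypothetical data, including a deliberate outlier in Y:

```python
import numpy as np

# Hypothetical data with a clear outlier in the last Y value
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 30.0])

b, a = np.polyfit(x, y, 1)       # least-squares line shifts toward the outlier
residuals = y - (b * x + a)

residual_sum = residuals.sum()   # effectively zero (up to floating-point error)
```

The line tilts toward the outlier, yet the positive and negative residuals still cancel out.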
Conditional and marginal distribution
Marginal distribution of Y: spread or variance of scores around the mean
* Wider and more spread out → more variability compared to the conditional distribution
Conditional distribution of Y: spread or variance of scores around the regression line, for a given value of X
* Narrower and tighter → less variability than the marginal distribution
R^2: coefficient of determination
Total variability in Y is split into regression and error
If variability is explained → regression
If variability is not explained → residual
R^2 effect:
R^2 ranges from 0-1 (often expressed as a percentage)
Closer to 0 (or 0%): the weaker the effect, the less variance the model explains
Closer to 1 (or 100%): the stronger the effect, the more variance that the model explains
Different ways to find correlation
pwcorr (Stata's pairwise correlation command) or the square root of R^2 → the correlation between the predicted values of Y and the actual values of Y
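The R^2 / correlation link can be verified by hand. A Python illustration with hypothetical data (standing in for Stata's `pwcorr`), computing R^2 from the sums of squares and comparing it with the predicted-vs-actual correlation:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.2, 2.8, 5.1, 5.9, 7.2])

b, a = np.polyfit(x, y, 1)
y_hat = b * x + a

ss_res = ((y - y_hat) ** 2).sum()       # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
r_squared = 1 - ss_res / ss_tot

# Correlation between predicted and actual Y: equals sqrt(R^2)
r_pred_actual = np.corrcoef(y_hat, y)[0, 1]
```

So taking the square root of R^2 recovers the same correlation that a pairwise-correlation command would report between predicted and actual Y.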
What are the effects tested in regression? What statistical significance test is used for each?
Effects that are tested:
* Model-as-a-whole
* Individual variable or predictor effects
Statistical significance:
* Model as a whole: F ratio and p-value
* Individual predictor: t statistic and p-value
How is mean square (MS) calculated?
Sums of squares divided by degrees of freedom (MS = SS / df)
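The MS and F calculations can be sketched end-to-end. A Python illustration with hypothetical data; for simple regression the regression df is 1 (one predictor) and the residual df is n − 2:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 3.1, 4.8, 4.9, 6.2, 7.5, 7.9])
n = len(x)

b, a = np.polyfit(x, y, 1)
y_hat = b * x + a

ss_reg = ((y_hat - y.mean()) ** 2).sum()  # regression SS, df = 1
ss_res = ((y - y_hat) ** 2).sum()         # residual SS, df = n - 2

ms_reg = ss_reg / 1         # mean square = SS / df
ms_res = ss_res / (n - 2)
f_ratio = ms_reg / ms_res   # F ratio for the model as a whole
```

A large F ratio means the explained mean square dwarfs the unexplained one, which is what the model-as-a-whole test looks for.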
F Ratio (Model as a whole) H0 and H1
H0 (null hypothesis): the model is no better than the intercept only model (aka the null model)
H0: all regression coefficients = 0
H1 (alternative hypothesis): the regression model is significantly better than the null model
H1: at least one regression coefficient is not equal to 0
t statistic (individual variable) H0 and H1
H0: b=0
H1: b≠0
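The t test for the slope can also be computed by hand. A Python sketch with hypothetical data; it also checks a handy fact for simple regression: the model F ratio equals the slope's t statistic squared:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 2.4, 2.9, 4.6, 4.9, 6.4, 7.2, 7.8])
n = len(x)

b, a = np.polyfit(x, y, 1)
residuals = y - (b * x + a)

# Standard error of the slope b
ms_res = (residuals ** 2).sum() / (n - 2)
se_b = np.sqrt(ms_res / ((x - x.mean()) ** 2).sum())

t_stat = b / se_b  # tests H0: b = 0

# With a single predictor, F for the whole model equals t squared
ss_reg = (((b * x + a) - y.mean()) ** 2).sum()
f_ratio = (ss_reg / 1) / ms_res
```

This is why, in simple linear regression, the model-as-a-whole test and the slope test always agree on significance.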
Statistical significance of the p-value
p<.05
Model as a whole R^2 effect size
R-squared
* 2% - 12%: small effect
* 13% - 25%: medium effect
* 26% and above: large effect
Individual variable: statistical significance via the beta coefficient
Beta coefficient (unstandardised): no standard cut-offs
Need to understand the variable's scale to know whether the effect is big or small
Significance can still be judged from the p-value
Beta coefficient: standardised
Interpret similarly to correlation coefficient
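The link to the correlation coefficient is exact in simple regression: the standardised beta is the unstandardised slope rescaled by the ratio of standard deviations, and it comes out equal to Pearson's r. A Python sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data on different scales
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([55.0, 60.0, 68.0, 71.0, 80.0, 84.0])

b, a = np.polyfit(x, y, 1)                     # unstandardised slope
beta_std = b * x.std(ddof=1) / y.std(ddof=1)   # standardised coefficient

# In simple regression the standardised beta equals Pearson's r
r = np.corrcoef(x, y)[0, 1]
```

This is why a standardised beta can be read like a correlation: same sign, same −1 to 1 range.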
Why is there a need for standardised regression coefficients?
Because unstandardised effect sizes are only useful if the 'natural' scale is known
* Without a natural scale, direct comparisons cannot be made, as variables may be on different scales
* A one-point increase can be a small change or a big change (e.g. one point on a 1-7 scale is a bigger change than one point on a 0-100 scale)
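One way to remove the scale problem is to z-score both variables before fitting: the slope of the refitted line is then the standardised beta. A Python sketch with hypothetical data matching the example scales above (1-7 predictor, 0-100 outcome):

```python
import numpy as np

# Hypothetical: predictor on a 1-7 scale, outcome on a 0-100 scale
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([20.0, 35.0, 38.0, 55.0, 62.0, 74.0, 88.0])

# z-score both variables, then refit: the slope is now the standardised beta
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_std, _ = np.polyfit(zx, zy, 1)
# beta_std is scale-free, so it can be compared across predictors
```

After z-scoring, "one point" means one standard deviation for every variable, so slopes become directly comparable.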