Simple Regression & Multiple Regression (W9) ✅ Flashcards
What are the main differences between bivariate linear correlation and regression?
Similarity: both are used when the relationship between x and y can be described with a straight line
Differences:
1. Correlation ONLY determines the strength of relationship between x and y
- Regression:
- allows us to estimate how much y will change as a result of a given change in x
- regression also distinguishes between the variable being predicted & the variable used to predict (which is measured, NOT manipulated)
x = predictor/independent/ explanatory variable
y = outcome/dependent/ criterion variable
=> HOWEVER! regression still does not provide direct evidence of causality (it does NOT show that x causes y)
What are the 3 stages of regression?
- Analyse the relationship between variables (find strength and direction of the relationship)
- Propose a model to explain that relationship
-> regression line = line of best fit
- Evaluate the model: assess goodness of fit
-> is our regression model better at predicting y than the simplest model (which assumes no relationship between x & y and predicts only the mean of all y-values)?
What are the two properties of the regression line?
a = the intercept
-> value of y when x is 0 (starting point)
b = the slope
-> how much y changes when x increases by 1 unit
Formula to calculate y-value based on x-value:
y = bx + a
(when x and y are negatively correlated, b is negative)
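A minimal sketch in Python of fitting y = bx + a by least squares (made-up data; variable names are illustrative, not from the deck):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# b = covariance(x, y) / variance(x); a = mean(y) - b * mean(x)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
print(f"predicted y at x = 6: {b * 6 + a:.3f}")
```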
How to calculate goodness of fit and assess it?
Calculating goodness of fit:
- Calculate the total sum of squares (SST): the squared differences between the observed values of y and the mean of y (the simplest model, where b = 0)
-> variance not explained by the simplest model
- Calculate the residual sum of squares (SSR): the squared differences between the observed values of y and those predicted by the regression line
-> variance not explained by the regression model
- Calculate the model sum of squares (SSM): reflects the improvement in prediction using the regression model compared to the simplest model
-> SST - SSR = SSM
=> The larger the SSM, the bigger the improvement
Assessing the goodness of fit -> using F-test
-> takes the degrees of freedom (df) into account
-> rather than using Sum of Squares (SS) values, use Mean Squares (MS) values
F = MSM / MSR
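A minimal sketch of the SST / SSR / SSM decomposition and the F ratio on this card (illustrative data; the df formulas assume k = 1 predictor):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the regression line first (see the earlier sketch)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = b * x + a

SST = np.sum((y - y.mean()) ** 2)  # total: error of the simplest (mean-only) model
SSR = np.sum((y - y_hat) ** 2)     # residual: error left over by the regression model
SSM = SST - SSR                    # model: improvement over the simplest model

k, n = 1, len(y)
MSM = SSM / k                      # df_model = k
MSR = SSR / (n - k - 1)            # df_residual = n - k - 1
F = MSM / MSR
print(f"SST={SST:.2f}  SSR={SSR:.2f}  SSM={SSM:.2f}  F={F:.1f}")
```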
How to interpret the goodness of fit?
- If the regression model is good at predicting y:
-> the improvement in prediction due to the model (MSM) will be large
-> level of inaccuracy of the model (MSR) will be small
=> F value further from 0
- Assess probability: assume the Null Hypothesis is true ("the regression model and the simplest model are equal in terms of predicting y")
-> MSM = 0
- A significant result (p < 0.05) suggests that the regression model provides a better fit for the data than the simplest model
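A minimal sketch of turning the F value into a p-value with SciPy's F distribution (the F, k and n values here are illustrative stand-ins, e.g. from the sketch above):

```python
from scipy import stats

F, k, n = 30.0, 1, 20                     # F value, no. of predictors, sample size (made up)
p = stats.f.sf(F, dfn=k, dfd=n - k - 1)   # survival function = P(F-dist >= observed F)
print(f"F({k}, {n - k - 1}) = {F:.2f}, p = {p:.4f}")
if p < 0.05:
    print("The regression model predicts y better than the mean-only model.")
```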
What are the assumptions of linear regression (as compared to linear correlation)?
- Linearity: x and y must be linearly related
- Absence of outliers
-> regression is extremely sensitive to outliers
-> it may be appropriate to remove them
- Normality, linearity and homoscedasticity, independence of residuals
- Normality: residuals are normally distributed around the predicted outcome
- Linearity: residuals have a straight-line relationship with the predicted outcome
- Homoscedasticity: the variance of the residuals about the predicted outcome should be the same for all predicted scores
=> No non-parametric equivalent
What should (1) the Normal P-P plot and (2) the Scatterplot of the Regression Standardized Residuals ideally look like?
- (1) Ideally, data points will lie in a reasonably straight diagonal line, from bottom left to top right
-> no major deviations from normality
- (2) Ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (around 0)
-> don't want to see a systematic pattern in the residuals (curvilinear, or higher on one side)
-> Outliers: standardised residuals > 3.3 or < -3.3
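A minimal sketch of the outlier check on this card: standardise the residuals and flag any beyond ±3.3 (data are made up, and the simple "divide by the SD" standardisation here is an illustrative stand-in, not exactly SPSS's formula):

```python
import numpy as np

def standardised_residuals(y, y_hat):
    """Centre the residuals and divide by their sample standard deviation."""
    resid = y - y_hat
    return (resid - resid.mean()) / resid.std(ddof=1)

y     = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.1, 4.1, 6.0, 8.0, 9.9])   # predictions from some fitted model
z = standardised_residuals(y, y_hat)
print("standardised residuals:", np.round(z, 2))
print("outlier indices:", np.where(np.abs(z) > 3.3)[0])
```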
What are the relationships between R, R^2 & Adjusted R^2 (in Model Summary Table)? Are R^2 and r^2 the same?
R (√R^2): strength of relationship between x and y
-> sign is not given
R^2: proportion of variance in y explained by the model (SSM), relative to the total variance in y (SST)
-> R^2 = SSM/SST
Adjusted R^2: corrects R^2 for the number of predictors and the sample size, giving a less optimistic estimate of the population value
Are R^2 and r^2 the same?
-> If only one predictor then yes
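A quick check of the "one predictor" claim: computing R^2 as SSM/SST and comparing it with the squared Pearson r (illustrative data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the line, then compute R^2 = SSM/SST = 1 - SSR/SST
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = b * x + a

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r  = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {R2:.4f}, r^2 = {r**2:.4f}")   # identical with a single predictor
```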
Explain multiple regression and its assumptions, as compared to simple linear regression.
- Multiple regression allows us to assess the influence of several predictor variables (e.g. x1, x2, x3, etc.) on the outcome variable (y), whether the predictor variables are:
-> combined
-> considered separately
-> y = b1x1 + b2x2 + … + a
----
Assumptions:
- Linearity
- Absence of outliers
- Multicollinearity: ideally, predictors should be correlated with the outcome variable (y), NOT with one another
-> predictors risk measuring the same thing if r = .9 or above between them
- Normality, linearity and homoscedasticity, independence of residuals
- Sufficient sample size
-> results might be over-optimistic (not generalisable) if there are too few participants (Ps)
=> the checks for the P-P plot and Scatterplot are the same as for simple regression
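A minimal sketch of a two-predictor regression fitted with NumPy least squares, plus the r = .9 intercorrelation check from the multicollinearity bullet (all data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y  = 2.0 * x1 - 1.5 * x2 + 3.0 + rng.normal(scale=0.5, size=50)  # made-up outcome

X = np.column_stack([x1, x2, np.ones_like(x1)])  # the ones column gives the intercept a
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # solves y = b1*x1 + b2*x2 + a
b1, b2, a = coef
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, a = {a:.2f}")

# Multicollinearity check: predictors should not be highly intercorrelated
r12 = np.corrcoef(x1, x2)[0, 1]
if abs(r12) >= 0.9:
    print("Warning: predictors may be measuring the same thing.")
```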