Simple Regression & Multiple Regression (W9) ✅ Flashcards

1
Q

What is the main difference between bivariate linear correlation and regression?

A

Similarity: both are used when the relationship between x and y can be described with a straight line

Differences:
  1. Correlation ONLY determines the strength of the relationship between x and y
  2. Regression:
    - allows us to estimate how much y will change as a result of a given change in x
    - regression also distinguishes between the variable being predicted & the variable used to predict (NOT manipulated)

x = predictor/independent/explanatory variable
y = outcome/dependent/criterion variable

=> HOWEVER! regression still does not provide direct evidence of causality (it does NOT show that x causes y)

2
Q

What are the 3 stages of regression?

A
  1. Analyse the relationship between the variables (find the strength and direction of the relationship)
  2. Propose a model to explain that relationship
    -> regression line = line of best fit
  3. Evaluate the model: assess goodness of fit
    -> is our regression model better at predicting y than the simplest model (which assumes no relationship between x & y and simply predicts the mean of all y-values)?
3
Q

What are the two properties of the regression line?

A

a = the intercept
-> value of y when x is 0 (starting point)

b = the slope
-> how much y changes when x increases by 1 unit

Formula to calculate y-value based on x-value:
y = bx + a
(when x and y are negatively correlated, b is negative)
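The slope and intercept formulas above can be sketched in Python; the data points here are invented purely for illustration:

```python
import numpy as np

# Toy data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# b (slope) = covariance of x and y / variance of x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# a (intercept) = mean(y) - b * mean(x)
a = y.mean() - b * x.mean()

y_hat = b * x + a  # predicted y for each x
print(round(b, 2), round(a, 2))  # → 1.99 0.05
```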

4
Q

How do you calculate goodness of fit and assess it?

A

Calculating goodness of fit:

  1. Calculate the total sum of squares (SST): the sum of squared differences between the observed values of y and the mean of y (the simplest model, where b = 0)
    -> variance not explained by the simplest model
  2. Calculate the residual sum of squares (SSR): the sum of squared differences between the observed values of y and those predicted by the regression line
    -> variance not explained by the regression model
  3. Calculate the model sum of squares (SSM): reflects the improvement in prediction using the regression model compared to the simplest model
    -> SSM = SST - SSR
    => The larger the SSM, the bigger the improvement

Assessing the goodness of fit -> using F-test
-> take the df into account
-> rather than using Sum of Squares (SS) values, use Mean Squares (MS) values
F = MSM / MSR
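The three steps and the F ratio can be sketched as follows (same toy data idea as above; all numbers are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the regression line first
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = b * x + a

sst = np.sum((y - y.mean()) ** 2)  # total: error of the simplest model
ssr = np.sum((y - y_hat) ** 2)     # residual: error of the regression model
ssm = sst - ssr                    # model: the improvement in prediction

df_m = 1             # one predictor
df_r = len(y) - 2    # n minus the two estimated parameters (a and b)
f = (ssm / df_m) / (ssr / df_r)    # F = MSM / MSR
```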

5
Q

How to interpret the goodness of fit?

A
  • If the regression model is good at predicting y:
    -> the improvement in prediction due to the model (MSM) will be large
    -> the level of inaccuracy of the model (MSR) will be small
    => the F value will be large (far from 0)
  • Assess probability: assume the Null Hypothesis is true: “the regression model and the simplest model are equal in terms of predicting y”
    -> i.e. MSM = 0
  • A significant result (p < 0.05) suggests that the regression model provides a better fit for the data than the simplest model
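Turning an F value into a probability can be sketched with scipy; the F statistic and degrees of freedom below are made-up numbers:

```python
from scipy.stats import f

f_value = 25.0                 # illustrative F statistic
df_model, df_residual = 1, 18  # illustrative degrees of freedom
p = f.sf(f_value, df_model, df_residual)  # P(F >= f_value) under the null
# p < 0.05 -> the regression model fits better than the simplest model
```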
6
Q

What are the assumptions of linear regression (as compared to linear correlation)?

A
  1. Linearity: x and y must be linearly related
  2. Absence of outliers
    -> regression is extremely sensitive to outliers
    -> it may be appropriate to remove them
  3. Normality, linearity and homoscedasticity, independence of residuals
  • Normality: residuals are normally distributed around the expected outcome
  • Linearity: residuals and the outcome have a straight-line relationship
  • Homoscedasticity: the variance of the residuals should be constant across all levels of the outcome

=> No non-parametric equivalent

7
Q

What do you look for in: (1) the Normal P-P plot and (2) the Scatterplot of the Regression Standardized Residuals?

A
  1. Ideally, data points will lie in a reasonably straight diagonal line, from bottom left to top right
    -> no major deviations from normality
  2. Ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
    -> Don’t want to see a systematic pattern to the residuals (curvilinear, or higher on one side)
    -> Outliers: standardised residuals > 3.3 or < -3.3
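The ±3.3 outlier rule can be sketched with a simplified standardisation (dividing the residuals by their standard deviation; statistics packages apply a slightly more refined formula, and the data here are invented):

```python
import numpy as np

x = np.arange(21, dtype=float)
y = 2 * x
y[10] += 50.0  # inject one clear outlier into otherwise perfect data

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (b * x + a)

z = residuals / residuals.std(ddof=1)   # simplified standardised residuals
outliers = np.where(np.abs(z) > 3.3)[0]
print(outliers)  # → [10]
```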
8
Q

What are the relationships between R, R^2 & Adjusted R^2 (in Model Summary Table)? Are R^2 and r^2 the same?

A

R (√R^2): the strength of the relationship between x and y
-> the sign (direction) is not given

R^2: the proportion of variance in y explained by the model (SSM), relative to the total variance in y (SST)
-> R^2 = SSM/SST

Are R^2 and r^2 the same?
-> Yes, but only when there is a single predictor
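With a single predictor, R^2 = SSM/SST equals the squared Pearson correlation; this can be checked numerically on toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = b * x + a

sst = np.sum((y - y.mean()) ** 2)
ssm = sst - np.sum((y - y_hat) ** 2)
r_squared = ssm / sst            # R^2 = SSM / SST

r = np.corrcoef(x, y)[0, 1]      # Pearson's r
# With one predictor, r_squared equals r ** 2 (up to rounding)
```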

9
Q

Explain multiple regression and its assumptions, as opposed to simple linear regression.

A
  • Multiple regression allows us to assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y), whether the predictors are:
    -> combined
    -> considered separately
    -> y = b1x1 + b2x2 + … + a
    ----
    Assumptions:
  1. Linearity
  2. Absence of outliers
  3. Multicollinearity: ideally, predictors should be correlated with the outcome variable (y), NOT with one another
    -> if two predictors correlate at r = .9, there is a chance they are measuring the same thing
  4. Normality, linearity and homoscedasticity, independence of residuals
  5. Sufficient sample size
    -> results might be over-optimistic (not generalisable) if there are too few participants

=> the checks for the P-P plot and Scatterplot are the same as for simple regression
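The equation y = b1x1 + b2x2 + … + a can be fitted by ordinary least squares; a sketch with two invented predictors (the outcome is built from a known relationship so the recovered coefficients can be checked):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 3.0 * x1 + 0.5 * x2 + 1.0   # outcome built from b1=3, b2=0.5, a=1

# Design matrix: one column per predictor, plus a column of 1s for the intercept a
X = np.column_stack([x1, x2, np.ones_like(x1)])
b1, b2, a = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b1, 2), round(b2, 2), round(a, 2))  # → 3.0 0.5 1.0
```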
