Simple linear regression Flashcards

1
Q

What is the equation for simple linear regression?

A

Y = b0 + b1X + e
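
A minimal Python/NumPy sketch (illustrative values b0 = 2, b1 = 0.5, not from the card) showing data generated from this model and the coefficients recovered by least squares:

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)    # predictor
e = rng.normal(0, 1, size=100)      # residual/error term
Y = 2.0 + 0.5 * X + e               # Y = b0 + b1*X + e
b1, b0 = np.polyfit(X, Y, deg=1)    # least-squares estimates
print(b0, b1)                       # close to 2.0 and 0.5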

2
Q

What is b0?

A

The intercept
- the point at which the regression line crosses the Y axis
- the predicted value of Y when X = 0
(labelled as the constant in SPSS)

3
Q

What is b1?

A

The slope/gradient
- a measure of how much Y changes as X changes
- regardless of sign (pos/neg), the larger the absolute value of b1, the steeper the slope

4
Q

What is e?

A

Residual/prediction error
- the difference between the observed value of the outcome variable and what the model predicts (e = Yobs - Ypred)
- represents how wrong the prediction is for a particular case

5
Q

What is the equation for Ypred?

A

Ypred = b0 + b1X
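
A two-line sketch with hypothetical coefficients (b0 = 2, b1 = 0.5):

b0, b1 = 2.0, 0.5        # hypothetical estimates
Y_pred = b0 + b1 * 4.0   # prediction for X = 4 -> Y_pred = 4.0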

6
Q

What is a regression line?

A

Line of best fit - the line that best represents the data by minimising the sum of squared residuals

7
Q

What is a prediction?

A

Best guess at Y given X
X doesn’t have to cause Y or come before Y in time

8
Q

What values show how well the model fits the observed data? (goodness of fit)

A

R2
F-ratio

9
Q

What does the model refer to?

A

The regression line

10
Q

What values show how the variables relate to each other?

A

The Intercept
Beta values (slope)

11
Q

What is residual sum of squares? (SSR)

A

Square the residuals and then add them up - a gauge of how well the model (line) fits the data: the smaller the SSR, the better the fit
- also called error variance - how much error there is in the model
(Residual/error variance)

12
Q

What is the equation for total sum of squares (SST)?

A

SSTotal = SSModel + SSResidual

13
Q

What is the model sum of squares (SSM)?

A

Sum of squared differences between Ypred and the sample mean - represents the improvement from the baseline model (the mean) to the regression model
(Model variance)
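
A NumPy sketch (simulated data, illustrative only) computing SSR, SSM and SST from cards 11-13 and confirming the partition SST = SSM + SSR:

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)   # simulated data
b1, b0 = np.polyfit(X, Y, deg=1)
Y_pred = b0 + b1 * X

SSR = np.sum((Y - Y_pred) ** 2)          # residual sum of squares
SSM = np.sum((Y_pred - Y.mean()) ** 2)   # model sum of squares
SST = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
print(np.isclose(SST, SSM + SSR))        # True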

14
Q

In any regression model, what is the overall variation of the outcome variable (Y) due to?

A
  1. Model/regression - how much variance in observed Y the predicted values explain. This variance is measured by the deviations of the predicted values from the sample mean, Y̅.
  2. Error/residual - how much variance is left over in observed Y after accounting for the predicted values - measured by the deviations of the observed values from the predicted values
15
Q

What is the Total sum of squares (SST)?

A

Total variance in outcome variable - partitioned into model variance and residual/error variance

16
Q

What is the equation for R2?

A

R2 = SSM/SST
Variance in outcome explained by model / total variance in outcome variable to be explained
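
A sketch (simulated data, assuming SciPy is available) checking that SSM/SST matches the squared correlation in simple regression:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
fit = linregress(X, Y)
Y_pred = fit.intercept + fit.slope * X
R2 = np.sum((Y_pred - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)  # SSM/SST
print(np.isclose(R2, fit.rvalue ** 2))   # True: R2 equals r squared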

17
Q

What is R2?

A
  • provides the proportion of variance accounted for by the model
  • value ranges between 0 and 1 (the higher the value, the better the model)
  • interpreted as a percentage when multiplied by 100, eg. R2 = .69 means 69% of the variance in the outcome variable is explained by the model
18
Q

What is the equation for the F ratio?

A

F = MSM / MSR
Model mean squares / residual or error mean squares

19
Q

What is the equation for model mean squares (MSM)?

A

MSM = SSM / dfM

20
Q

What is the equation for residual/error mean squares?

A

MSR = SSR / dfR

21
Q

What is the F ratio?

A

The ratio of explained variance to unexplained variance (error) in the model
- MSM should be larger than MSR (F-statistic greater than 1)
- reported as an ANOVA - comparing the ratio of systematic variance to unsystematic variance
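
A sketch (simulated data) putting cards 18-21 together, with the degrees of freedom defined in cards 22-23 - F = MSM/MSR, dfM = k, dfR = N - k - 1, and a p-value from SciPy's F distribution:

import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(seed=1)
N, k = 100, 1                                    # k = number of predictors
X = rng.uniform(0, 10, size=N)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=N)
b1, b0 = np.polyfit(X, Y, deg=1)
Y_pred = b0 + b1 * X

MSM = np.sum((Y_pred - Y.mean()) ** 2) / k       # SSM / dfM
MSR = np.sum((Y - Y_pred) ** 2) / (N - k - 1)    # SSR / dfR
F = MSM / MSR
p = f_dist.sf(F, k, N - k - 1)                   # right-tail p-value
print(F, p)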

22
Q

What is dfM?

A

k - the number of predictors

23
Q

What is the equation for dfR?

A

dfR = N - k - 1 (N minus the number of estimated coefficients, ie. the k predictors plus the intercept)

24
Q

What are the 2 ways the hypothesis (overall test) in regression can be phrased?

A

Can the scores on Y be predicted from the scores on X and the regression line?
- Null hyp: Predicted values of Y are the same regardless of the value of X (or simply, there is no relationship between Y and X).
Does the model (Ypred) explain a significant amount of variance in the outcome variable (Yobs)?
- Null hyp: Population R2 = 0
- Ratio of model variance to error variance tested using the F-test (ANOVA)
OR:
H1: The regression line is a significantly better model than the flat model
H0: The regression line is no better than the flat (mean) model

25
Q

What do the coefficients refer to?

A

The characteristics of the regression line:
- Beta values: the slope of the regression line
- The intercept

26
Q

What is the unstandardised beta?

A

The value of the slope (b1)
- the change in Y for every one unit change in X
- expressed in the units of measurement
If b1 is 0, there is no relationship between X and Y (flat line - as the predictor variable changes, the predicted value of the outcome stays constant)
- If the variable significantly predicts the outcome, the b value should differ from 0 - tested using a t-test (H0: b = 0) - if the test is significant, interpret it as evidence that the predictor contributes significantly to the ability to estimate values of the outcome.

27
Q

What is the standardised beta?

A
  • a measure of the slope
  • the standardised change in Y for a one standard deviation change in X
  • as X increases by one standard deviation, Y changes by β1 standard deviations
28
Q

In simple regression (1 predictor) what is b1 equal to?

A

Standardised b1 (β1) = r(xy) - in simple regression the standardised slope equals the Pearson correlation between X and Y
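
A sketch (simulated data) verifying this - the slope fitted to z-scored X and Y equals Pearson's r:

import numpy as np
from scipy.stats import pearsonr, zscore

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
beta1, _ = np.polyfit(zscore(X), zscore(Y), deg=1)   # standardised slope
r, _ = pearsonr(X, Y)
print(np.isclose(beta1, r))                          # True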

29
Q

When should you use unstandardised b?

A
  • when you want coefficients to refer to meaningful units
  • when you want a regression equation to predict values of Y
30
Q

When should you use standardised β?

A

(independent of units)
- when you want an effect size measure eg. small/med/large β is equivalent to small/med/large r (.1/.3/.5)
- when you want to compare the strength of a relationship between predictor and outcome

31
Q

What is covariance?

A

The extent to which variables co-vary (change together)
High covariance means there is a large overlap between patterns of change (variance) observed in each variable
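
A sketch (simulated data) computing the covariance and its link to correlation, r = cov(X, Y) / (sx * sy):

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
cov = np.cov(X, Y)[0, 1]                     # sample covariance
r = cov / (X.std(ddof=1) * Y.std(ddof=1))    # Pearson correlation
print(cov, r)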

32
Q

What should you do before running a regression analysis?

A
  • Detect bias from unusual cases (outliers)
  • Check assumptions of linear regression
33
Q

What are outliers in linear regression?

A

An observation with a large residual (the difference between observed and predicted values - e = Yobs - Ypred)
- may distort results by pulling the regression line away from the line of best fit for most cases
- a case has the potential to be an influential outlier if its standardised (Z) score on one or more predictors, or its standardised residual, is in excess of +/-3.29

34
Q

Why are outliers an issue in regression?

A

They influence the model’s ability to predict all cases

35
Q

What does how influential an outlier is depend on?

A

Distance between Yobs and Ypred (the residual) - the larger the distance, the weaker the prediction
Leverage (an unusual value on a predictor) - high-leverage cases can either weaken or strengthen the prediction depending on where they lie relative to the trend - on trend = strengthens results
Large leverage + large distance -> negative impact of pulling or tilting the regression line away from the line of best fit

36
Q

What is the minimum value that makes standardised residuals or predictors potential influential outliers?

A

+/-3.29 (p<.001)
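
A sketch (simulated data with one planted outlier) flagging cases whose standardised residual exceeds this cut-off:

import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
Y[0] += 8                                    # plant an artificial outlier
b1, b0 = np.polyfit(X, Y, deg=1)
resid = Y - (b0 + b1 * X)
flags = np.where(np.abs(zscore(resid)) > 3.29)[0]
print(flags)                                 # should include index 0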

37
Q

How can outliers be dealt with?

A
  • check the data were entered and coded correctly - outliers caused by errors in data entry or by participants not following the procedure (eg. reaction times that are impossibly short or long) can justifiably be removed
  • outliers CAN represent genuine data - even in a normal distribution, extreme scores occur by chance (roughly 1 in 100 beyond +/-2.58 SD, and 1 in 1000 beyond +/-3.29 SD)
38
Q

What are the 4 assumptions of linear regression?

A
  1. Linearity
  2. Independence
  3. Normality of residuals
  4. Homogeneity of variance (homoscedasticity)
39
Q

What is linearity?

A

The outcome (continuous variable) is linearly related to predictors

40
Q

What is the independence assumption?

A

Observations are randomly and independently sampled from the population - the residuals are not related to each other.
Residuals are not independent in cases such as:
- repeated observations on the same participant
- observations from related participants (twins, students in the same class)
If this assumption is violated, the model standard errors (SEs) will be invalid, as will the confidence intervals (CIs) and significance tests based on them.
- ensure INDEPENDENT sampling in the design

41
Q

What is the normality of residuals assumption?

A

Residuals (not the IVs or DV) should be normally distributed
- check using a histogram and a normal probability plot
- want observed and expected cumulative frequencies to be very similar - points falling on a 45-degree straight line
- in small samples, a lack of normality invalidates confidence intervals and significance tests, BUT in large samples it does not, due to the central limit theorem
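
A sketch (simulated data, assuming Matplotlib and SciPy) of the two checks above - a histogram of the residuals and a normal probability plot:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
b1, b0 = np.polyfit(X, Y, deg=1)
resid = Y - (b0 + b1 * X)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(resid, bins=15)                       # should look roughly normal
stats.probplot(resid, dist="norm", plot=ax2)   # points near the 45-degree line
plt.show()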

42
Q

What is the homoscedasticity assumption (homogeneity of variance)?

A

The variability of residuals should be the same for all values of Ypred.

Violating this assumption invalidates confidence intervals and significance tests.

Check using residual scatterplot - should be NO funnelling of residuals.
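
A sketch (simulated, homoscedastic data, assuming Matplotlib) of the residual scatterplot check - residuals against predicted values, with no funnel shape expected:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)
b1, b0 = np.polyfit(X, Y, deg=1)
Y_pred = b0 + b1 * X

plt.scatter(Y_pred, Y - Y_pred)      # residuals vs predicted values
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values (Ypred)")
plt.ylabel("Residuals (Yobs - Ypred)")
plt.show()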