Topic 5 - Linear Model Flashcards

1
Q

Learning outcome (LO)

A

LO5 Model and explain the relationship between two variables using linear regression.

2
Q

Key steps in linear regression

A
  1. Produce a scatterplot
  2. Calculate the correlation coefficient
  3. Produce a regression line
  4. Produce a residual plot
  5. Check that the assumptions hold
  6. Perform predictions with the data
3
Q

Step 1: Produce a scatterplot

A

Pair of variables (x = independent variable (IV), y = dependent variable (DV))
- A scatterplot lets us do an initial data analysis (IDA) and get a first impression of whether a linear model is appropriate

4
Q

Step 2: Calculate correlation coefficient

A

Linear correlation
- How tightly the ‘cloud’ of points clusters around a line through the middle
- Tight cluster = strong correlation

5
Q

Correlation Coefficient

A
  • ‘r’ is a numerical summary which measures the clustering of points around a line
  • It indicates both the sign and the strength of the linear association
  • Always between -1 and 1 (inclusive)
6
Q

Population correlation coefficient

A

‘rpop’ is the mean of the products of the two variables in standard units, where standard units = (value - mean) / SD
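
A minimal sketch of this definition in Python, assuming the population SD (dividing by n). The function name `r_pop` and the small dataset are made up for illustration:

```python
from statistics import mean, pstdev

def r_pop(xs, ys):
    """Population correlation: mean of the products of standard units."""
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)  # population SDs (divide by n)
    # Convert each value to standard units: (value - mean) / SD
    zx = [(x - x_bar) / sd_x for x in xs]
    zy = [(y - y_bar) / sd_y for y in ys]
    # r is the mean of the pairwise products of the standard units
    return mean([a * b for a, b in zip(zx, zy)])

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = r_pop(xs, ys)
print(round(r, 4))
```

The result is positive and below 1, so the points rise with x but do not sit exactly on a line.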

7
Q

Population Vs Sample

A

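rpop = calculated for the whole population (uses population SDs, dividing by n)
rsample = calculated for a sample of the population (uses sample SDs, dividing by n - 1)

  • Applied to the same data, both formulas give the same result, because the n (or n - 1) factors cancel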
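
The cancellation can be checked numerically. This is a sketch only; the function names `r_population` and `r_sample` and the data are illustrative assumptions, not part of the course material:

```python
from statistics import mean, pstdev, stdev

def r_population(xs, ys):
    """Divide by n throughout (population SDs)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    return cov / (pstdev(xs) * pstdev(ys))

def r_sample(xs, ys):
    """Divide by n - 1 throughout (sample SDs)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_p = r_population(xs, ys)
r_s = r_sample(xs, ys)
print(r_p, r_s)  # the two values agree up to floating-point rounding
```

The two only disagree if the conventions are mixed, for example dividing the sum of products by n while using sample SDs.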
8
Q

Properties of the correlation coefficient

A

Value:
- When r = +/- 1, all points lie exactly on the regression line

Symmetry:
- The correlation coefficient is NOT affected by interchanging the variables (swapping the x & y axes = same r value)

Scaling/ Shifting:
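- ‘r’ stays the same if a variable is shifted by a constant or multiplied by a positive constant
- Multiplying one variable by a negative constant flips the sign of ‘r’ but keeps its size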
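
These properties can be checked on a small example. The helper `corr` and the data below are illustrative assumptions:

```python
from statistics import mean, pstdev

def corr(xs, ys):
    """Correlation as the mean of products of standard units."""
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return mean([((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
                 for x, y in zip(xs, ys)])

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = corr(xs, ys)
r_swapped = corr(ys, xs)                    # symmetry: swap x and y
r_shifted = corr([x + 10 for x in xs], ys)  # shift x by a constant
r_scaled = corr([3 * x for x in xs], ys)    # multiply x by a positive constant
r_flipped = corr([-3 * x for x in xs], ys)  # multiply x by a negative constant

print(r, r_swapped, r_shifted, r_scaled, r_flipped)
```

Only the last value differs from the others, and only in sign.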

9
Q

Step 3: Produce a regression line

A

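Uses the 5 summaries (x̄, ȳ, SDx, SDy, r)

Regression line connects (x̄, ȳ) to
(x̄ + SDx, ȳ + r × SDy)

- Slope = r × SDy / SDx
- Intercept = ȳ - slope × x̄
- (The line through (x̄ + SDx, ȳ + SDy) is the SD line, not the regression line)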
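
A sketch of building the regression line of y on x from the five summaries, where the line rises r × SDy for each SDx step in x. The function name `regression_line` and the data are illustrative assumptions:

```python
from statistics import mean, pstdev

def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = mean([((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
              for x, y in zip(xs, ys)])
    slope = r * sd_y / sd_x              # rise of r * SDy per run of SDx
    intercept = y_bar - slope * x_bar    # line passes through (x_bar, y_bar)
    return slope, intercept

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = regression_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

For this dataset the slope works out to 0.6 and the intercept to 2.2, so the fitted line is y = 2.2 + 0.6x.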

10
Q

Step 4: Produce a residual plot

A

Residual:
- Is the vertical gap between a point and the regression line (positive if the point is above the line, negative if below)
- Represents the error between the actual value and the predicted value: residual = actual y - predicted y
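
A short sketch of computing residuals for a fitted line. The data are illustrative, and the slope is computed with the equivalent least-squares form (sum of cross-deviations over sum of squared x-deviations):

```python
from statistics import mean

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
# Least-squares slope and intercept of the regression line of y on x
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * x for x in xs]
# Residual = actual y minus predicted y (positive means the point is above the line)
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print([round(e, 2) for e in residuals])
print(round(sum(residuals), 10))  # residuals of a regression line sum to (about) zero
```

Plotting `residuals` against `xs` gives the residual plot.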

11
Q

Residual plot

A
  • Graphs the residuals vs. ‘x’

If a linear regression is appropriate, then:
- The residual plot should show no pattern
- Should be random about a horizontal line at zero
- Should have constant variance within vertical strips along the x axis

12
Q

Step 5: Check assumptions

A

2 main diagnostic checks:
- Does the scatterplot look linear?
- Does the residual plot look random and homoscedastic (constant spread)?

13
Q

Step 6: Perform Predictions

A

We can make predictions only once we are satisfied with step 5
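
A sketch of a prediction helper. The function name `predict`, the data, and the choice to refuse x values outside the observed range (a simple guard against extrapolation) are all illustrative assumptions:

```python
from statistics import mean, pstdev

def predict(x_new, xs, ys):
    """Predict y at x_new from the regression line of y on x."""
    if not min(xs) <= x_new <= max(xs):
        # Assumed policy for this sketch: no extrapolation beyond the data
        raise ValueError("x_new lies outside the observed x range")
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = mean([((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
              for x, y in zip(xs, ys)])
    slope = r * sd_y / sd_x
    intercept = y_bar - slope * x_bar
    return intercept + slope * x_new

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat = predict(3.5, xs, ys)
print(round(y_hat, 4))
```

Calling `predict(10, xs, ys)` raises a `ValueError` here, because 10 is outside the x values the line was fitted on.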

14
Q

Most common mistakes in regression

A
  1. Interpreting ‘r’ as a percentage
  2. Comparing 2 values of ‘r’ as percentages
  3. Underestimating the effect of outliers on ‘r’
  4. Assuming that a strong correlation means the regression line is a good fit
  5. Assuming that datasets with similar r values will look similar to each other
  6. Inflating the linear association by grouping data
  7. Mistaking association for causation
  8. Rearranging the regression equation rather than refitting (to predict x from y, fit a new line of x on y)
  9. Extrapolating without justification
  10. Forgetting to check the scatterplot
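
To illustrate mistake 3, the sketch below adds a single outlier to perfectly linear data and recomputes ‘r’. The helper `corr` and both datasets are made up for illustration:

```python
from statistics import mean, pstdev

def corr(xs, ys):
    """Correlation as the mean of products of standard units."""
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return mean([((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
                 for x, y in zip(xs, ys)])

# Five points exactly on the line y = x
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
r_clean = corr(xs, ys)

# The same points plus one outlier at (10, 0)
r_outlier = corr(xs + [10], ys + [0])

print(round(r_clean, 4), round(r_outlier, 4))
```

One point is enough to turn a perfect positive correlation into a weak negative one, which is why the scatterplot should be checked before trusting ‘r’.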
15
Q

Prediction error (RMS Error)

A

RMS error:
- Represents the typical size of the gap between the points and the regression line (the root mean square of the residuals)

RMS error = √(1 - r²) × SDy
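
A sketch comparing the RMS error computed directly from the residuals with the shortcut formula, assuming population SDs (dividing by n). The data and variable names are illustrative:

```python
from math import sqrt
from statistics import mean, pstdev

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = mean([((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
          for x, y in zip(xs, ys)])

# Regression line of y on x
slope = r * sd_y / sd_x
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Direct definition: square the residuals, take the mean, then the square root
rms_direct = sqrt(mean([e ** 2 for e in residuals]))
# Shortcut from the summaries r and SDy
rms_shortcut = sqrt(1 - r ** 2) * sd_y

print(round(rms_direct, 4), round(rms_shortcut, 4))
```

The two values agree, and both shrink towards 0 as r approaches +/- 1.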
