Topic 5 - Linear Model Flashcards

Question 1

Q

LO

Answer

A

LO5 Model and explain the relationship between two variables using linear regression.

Question 2

Q

Key steps in linear regression

Answer

A

Produce a scatterplot
Calculate the correlation coefficient
Produce a regression line
Produce a residual plot
Check that the assumptions hold
Make predictions from the model

Question 3

Q

Step 1: Produce a scatterplot

Answer

A

Pair of variables (x = independent variable (IV), y = dependent variable (DV))
- A scatterplot lets us do an initial data analysis (IDA) and get a first impression of whether a linear model is appropriate

Question 4

Q

Step 2: Calculate correlation coefficient

Answer

A

Linear correlation
- How tightly the ‘cloud’ of points clusters around a line through the middle
- Tight cluster = strong correlation

Question 5

Q

Correlation Coefficient

Answer

A

‘r’ is a numerical summary which measures the clustering of points around a line
It indicates both the sign and the strength of the linear association
Always lies between -1 and 1 (inclusive)

Question 6

Q

Population correlation coefficient

Answer

A

‘rpop’ is the mean of the product of the variables in standard units
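
A minimal sketch of this definition, assuming the population SD (divide by n) is used for the standard units. The function name `correlation` and the data are made up for illustration:

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of x and y in standard units."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)  # population SDs (divide by n)
    z_x = [(x - mean_x) / sd_x for x in xs]  # x in standard units
    z_y = [(y - mean_y) / sd_y for y in ys]  # y in standard units
    return fmean([a * b for a, b in zip(z_x, z_y)])

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```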

Question 7

Q

Population Vs Sample

Answer

A

rpop = whole population
rsample = sample of population

Both formulas give the same value of r, provided each uses its matching SD (population SD with rpop, sample SD with rsample)
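
A rough check of this on made-up data, assuming rpop divides by n and uses population SDs while rsample divides by n - 1 and uses sample SDs. The function names `r_pop` and `r_sample` are hypothetical:

```python
from statistics import fmean, pstdev, stdev

def sum_of_products(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all points."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def r_pop(xs, ys):
    # Population version: divide by n, use population SDs
    n = len(xs)
    return sum_of_products(xs, ys) / (n * pstdev(xs) * pstdev(ys))

def r_sample(xs, ys):
    # Sample version: divide by n - 1, use sample SDs
    n = len(xs)
    return sum_of_products(xs, ys) / ((n - 1) * stdev(xs) * stdev(ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_pop(xs, ys), r_sample(xs, ys))  # the two values agree up to rounding
```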

Question 8

Q

Properties of the correlation coefficient

Answer

A

Value:
- When r = +/- 1, all points lie exactly on the regression line

Symmetry:
- Correlation coefficient is NOT affected by interchanging variables (swapping the x & y axes = same r value)

Scaling/ Shifting:
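- ‘r’ stays the same if a variable is shifted (a constant is added) or multiplied by a positive constant
- Multiplying one variable by a negative constant flips the sign of ‘r’ but keeps its size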
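
These properties can be checked numerically. A small sketch on made-up data, where `correlation` is a hypothetical helper (not from the course material) that computes r in standard units:

```python
from math import isclose
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of x and y in standard units."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                  for x, y in zip(xs, ys)])

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                     # symmetry: swap x and y
r_shifted = correlation([x + 10 for x in xs], ys)   # shift x by a constant
r_scaled = correlation([3 * x for x in xs], ys)     # positive scaling of x
r_negated = correlation([-2 * x for x in xs], ys)   # negative scaling of x

print(isclose(r, r_swapped), isclose(r, r_shifted), isclose(r, r_scaled))
print(isclose(r_negated, -r))  # the sign flips, the size does not change
```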

Question 9

Q

Step 3: Produce a regression line

Answer

A

Uses the 5 summaries (x̄, ȳ, SDx, SDy, r)

Regression line connects (x̄, ȳ) to
(x̄ + SDx, ȳ + r × SDy)
(the line connecting (x̄, ȳ) to (x̄ + SDx, ȳ + SDy) is the SD line, not the regression line)
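
A minimal sketch of building the line from the five summaries, assuming the standard result that the regression slope is r × SDy / SDx and the line passes through (x̄, ȳ). The function name `regression_line` and the data are made up for illustration:

```python
from statistics import fmean, pstdev

def regression_line(mean_x, mean_y, sd_x, sd_y, r):
    """Slope and intercept of the regression line from the five summaries."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = fmean(xs), fmean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
           for x, y in zip(xs, ys)])

slope, intercept = regression_line(mean_x, mean_y, sd_x, sd_y, r)
print(slope, intercept)

# One SD to the right of the mean in x, the line rises by r * SDy
y_at_one_sd = intercept + slope * (mean_x + sd_x)
```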

Question 10

Q

Step 4: Produce a residual plot

Answer

A

Residual:
- The vertical distance (gap) between a point and the regression line: positive if the point is above the line, negative if below
- Represents the error between the actual value and the predicted value
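
A short sketch of computing residuals as actual minus predicted, assuming the line is fitted by least squares (which gives the same slope as r × SDy / SDx). The data and variable names are made up for illustration:

```python
from statistics import fmean

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = fmean(xs), fmean(ys)

# Least-squares slope and intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual - predicted

for x, e in zip(xs, residuals):
    print(x, round(e, 2))  # these (x, residual) pairs are what a residual plot shows
```

For a least-squares line the residuals sum to zero (up to rounding), which is why a residual plot is centred on a horizontal line at zero.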

Question 11

Q

Residual plot

Answer

A

Plots the residuals against ‘x’

If a linear regression is appropriate, then:
- The residual plot should show no pattern
- Should be random about a horizontal line at zero
- Should have constant variance within vertical strips along the x axis

Question 12

Q

Step 5: Check assumptions

Answer

A

2 main diagnostic checks:
- Does the scatterplot look linear?
- Does the residual plot look random, with constant spread (homoscedasticity)?

Question 13

Q

Step 6: Perform Predictions

Answer

A

Only once we are satisfied with Step 5 can we make predictions

Question 14

Q

Most common mistakes in regression

Answer

A

Interpreting ‘r’ as a percentage
Comparing 2 values of ‘r’ as percentages
Underestimating the effect of outliers on ‘r’
Assuming that a strong correlation means a good fit for the regression line
Assuming that datasets with similar r values will look similar to each other
Inflating the linear association by grouping data
Mistaking association for causation
Rearranging the regression equation rather than refitting
Extrapolating without justification
Forgetting to check the scatterplot

Question 15

Q

Prediction error (RMS Error)

Answer

A

RMS error:
- Represents the typical (root-mean-square) gap between the points and the regression line

RMS error = √(1 − r²) × SDy
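
A quick check of the shortcut formula against the direct calculation, assuming the population SD of y and a least-squares regression line. The data and variable names are made up for illustration:

```python
from math import sqrt
from statistics import fmean, pstdev

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = fmean(xs), fmean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)  # population SDs (divide by n)
r = fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
           for x, y in zip(xs, ys)])

# Regression line from the five summaries
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Direct route: root of the mean of the squared residuals
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
rms_direct = sqrt(fmean([e ** 2 for e in residuals]))

# Shortcut route: sqrt(1 - r^2) * SDy
rms_formula = sqrt(1 - r ** 2) * sd_y

print(rms_direct, rms_formula)  # the two routes agree up to rounding
```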
