Linear Model Flashcards
The scatter plot can be summarised by the following five numerical summaries…
- sample mean and sample SD of X
- sample mean and sample SD of Y
- correlation coefficient (r)
correlation coefficient (r)
a numerical summary which measures how tightly the points cluster around a line.
- It indicates both the sign and strength of the linear association.
- r is always between -1 and 1.
- If r is positive, the cloud of points slopes up.
- If r is negative, the cloud of points slopes down.
The correlation coefficient (r) is the mean of the product of the variables in standard units
True
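A minimal sketch of this definition in R, using made-up data (the values of x and y are illustrative, not from the source). It assumes the population SD (dividing by n) when converting to standard units; pop_sd is a hypothetical helper defined here, not a built-in.

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

# Hypothetical helper: population SD (divides by n rather than n - 1)
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Convert each variable to standard units
zx <- (x - mean(x)) / pop_sd(x)
zy <- (y - mean(y)) / pop_sd(y)

# Mean of the product of the standard units
r_manual <- mean(zx * zy)

# Built-in correlation for comparison
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)
```

If the sample SD from sd() is used instead, the equivalent calculation is sum(zx * zy) / (n - 1) rather than a plain mean.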
Properties of the Correlation Coefficient
Symmetry - The correlation coefficient is not affected by interchanging the variables.
Scaling - The correlation coefficient is shift and scale invariant (multiplying one variable by a negative number flips the sign of r but not its size).
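A quick check of both properties in R, as a sketch on made-up data (the values and the constants 3, 10, 0.5 and -2 are arbitrary choices for illustration).

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

r <- cor(x, y)

# Symmetry: swapping the variables leaves r unchanged
r_swapped <- cor(y, x)

# Shift and (positive) scale: r is unchanged
r_rescaled <- cor(3 * x + 10, 0.5 * y - 2)

# Negative scale on one variable: same size, opposite sign
r_flipped <- cor(-x, y)

c(r = r, swapped = r_swapped, rescaled = r_rescaled, flipped = r_flipped)
```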
Outliers have no influence on ‘r’
False
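To see why this is false, the sketch below adds one extreme point to a made-up, nearly linear data set and recomputes r. The data and the outlier at (20, 0) are assumptions chosen for illustration.

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

# Correlation of the original, nearly linear data
r_clean <- cor(x, y)

# Add a single point far from the rest of the cloud
x_out <- c(x, 20)
y_out <- c(y, 0)

# Correlation after adding the outlier
r_outlier <- cor(x_out, y_out)

c(clean = r_clean, with_outlier = r_outlier)
```

In this example a single point is enough to pull r from close to 1 down to a negative value.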
Nonlinear association can be detected by the correlation coefficient
False
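A small illustration in R, assuming a made-up data set where y depends perfectly on x, but through a curve rather than a line.

```r
# Hypothetical data: y is an exact function of x, but not a linear one
x <- -3:3
y <- x^2

# The correlation is essentially zero despite the perfect relationship
r <- cor(x, y)
r
```

Here r gives no hint of the association, which is why the scatter plot should always be checked as well.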
correlation coefficient in R
cor()
linear regression
lm(y~x)
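A short sketch of fitting the line and reading off its coefficients, on made-up data. The manual formulas (slope = r × SD of y / SD of x, and the line passing through the point of averages) are included as a cross-check; the variable names are illustrative.

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

# Fit the regression of y on x
fit <- lm(y ~ x)

# Named vector with elements "(Intercept)" and "x"
coefs <- coef(fit)

# Cross-check: slope is r * SD(y) / SD(x), and the line
# passes through the point of averages (mean(x), mean(y))
slope_manual <- cor(x, y) * sd(y) / sd(x)
intercept_manual <- mean(y) - slope_manual * mean(x)

coefs
```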
how to put regression line on a plot
abline(lm(y ~ x), col = "…")
Prediction error (residual)
vertical distance (or ‘gap’) of a point above or below the regression line: the observed y minus the predicted y.
Residual plot
graphs the residuals vs x.
* If the linear fit is appropriate for the data, it should show no pattern (random points around 0).
* It is used to check the appropriateness of the linear model.
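One way to draw a residual plot in R, sketched on made-up data. The axis labels and the dashed reference line are presentation choices, not part of the source.

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

fit <- lm(y ~ x)

# Residuals: observed y minus the value predicted by the line
res <- resid(fit)
res_manual <- y - fitted(fit)

# Residuals against x, with a reference line at 0
plot(x, res, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)
```

For a least-squares line with an intercept, the residuals always sum to (numerically) zero, so what matters in the plot is whether they show a pattern, such as a curve or a fan shape.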
is extrapolating reliable?
no, predicting outside the range of the observed x values is unreliable, because the linear pattern may not continue there.
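A sketch of how to flag extrapolation before trusting a prediction, on made-up data. The new x values 3.5 and 50 are arbitrary; predict() will return a number for both, so the range check has to be done separately.

```r
# Hypothetical example data (not from the source)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

fit <- lm(y ~ x)

# New x values: one inside the observed range, one far outside it
new_x <- data.frame(x = c(3.5, 50))

# predict() returns a value for both, with no warning
pred <- predict(fit, newdata = new_x)

# Flag which predictions are interpolation (TRUE) vs extrapolation (FALSE)
inside_range <- new_x$x >= min(x) & new_x$x <= max(x)

data.frame(x = new_x$x, predicted = pred, inside_range = inside_range)
```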
before predicting using a linear model you should…
check the scatter plot and the residual plot
If the vertical strips on the scatter plot show equal spread in the y direction…
then the data is homoscedastic, otherwise the data is heteroscedastic.
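A rough numerical version of the vertical-strip check, sketched on simulated data. Everything here is an assumption for illustration: the seed, the sample size, the four equal-width strips, and the two simulated responses (one with constant noise, one with noise that grows with x).

```r
set.seed(1)

# Simulated data (not from the source)
x <- runif(200, 0, 10)
y_homo <- 2 + 0.5 * x + rnorm(200, sd = 1)                # constant spread
y_hetero <- 2 + 0.5 * x + rnorm(200, sd = 0.2 + 0.3 * x)  # spread grows with x

# Cut the x axis into four vertical strips
strips <- cut(x, breaks = c(0, 2.5, 5, 7.5, 10), include.lowest = TRUE)

# SD of y within each strip
spread_homo <- tapply(y_homo, strips, sd)
spread_hetero <- tapply(y_hetero, strips, sd)

rbind(homoscedastic = spread_homo, heteroscedastic = spread_hetero)
```

Roughly equal SDs across strips suggest homoscedastic data; SDs that rise or fall steadily across strips suggest heteroscedastic data.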
homoscedastic
an assumption of equal or similar variances in different groups being compared