Chapter 3 Regression Flashcards
What is a regression line?
The line of best fit: the straight line that fits the supplied data as closely as possible
Why is the line called the least squares regression line?
A residual is the distance between an actual observed point and the line (vertical distance for y on x), calculated as point - line, so points below the line have negative residuals
The sum of all the residuals = 0 because the positives and negatives cancel, hence we square them
The line is positioned to minimise the total of the squared residuals
Hence least squares
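A quick sketch of why the residuals get squared. The data below is made up for illustration, and the line y = 2.2 + 0.6x is the least squares y on x line for that made-up data. A flat line through the mean also has residuals that sum to 0, so the plain sum can't tell a good line from a bad one, but the sum of squares can.

```python
# Hypothetical data, invented purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Residual = observed y minus the y given by the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sum_of_squares(a, b):
    """Total of the squared residuals for the line y = a + b*x."""
    return sum(r ** 2 for r in residuals(a, b))

# Least squares line for this data: y = 2.2 + 0.6x
print(sum(residuals(2.2, 0.6)))   # about 0
print(sum_of_squares(2.2, 0.6))   # about 2.4

# Flat line y = 4 through the mean: residuals still sum to 0
print(sum(residuals(4, 0)))       # 0
print(sum_of_squares(4, 0))       # 6, a worse fit than 2.4

# Nudging the gradient away from 0.6 also makes the squares total bigger
print(sum_of_squares(2.2, 0.7))   # about 2.95
```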
When do you use y on x and when x on y?
When you have an x value and want to predict y, use y on x
If you have a y value and want to predict x, use x on y
Remember these AREN'T THE SAME LINE. They will be similar, but you can't just rearrange or reflect one to get the other
= they won't give the same line of best fit
Looking at the derivation, what point do all regression lines go through?
The mean point of the data, (x bar, y bar)
Therefore the y on x and x on y lines intersect at (x bar, y bar)
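A minimal sketch to check both facts at once, using made-up data: the y on x and x on y lines come out with different gradients, yet both pass through (x bar, y bar). The variable names (a, b for y on x; c, d for x on y) are just labels chosen here.

```python
# Hypothetical data, invented purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Summary statistics written as sums of deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# y on x: y = a + b*x (minimises vertical residuals)
b = s_xy / s_xx
a = y_bar - b * x_bar

# x on y: x = c + d*y (minimises horizontal residuals)
d = s_xy / s_yy
c = x_bar - d * y_bar

# Gradients compared on the same axes (dy/dx): b for y on x, 1/d for x on y
print(b, 1 / d)            # about 0.6 and 1.0, so not the same line

# Both lines go through the mean point (3, 4)
print(a + b * x_bar)       # about 4, which is y bar
print(c + d * y_bar)       # about 3, which is x bar
```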
In which situations can we use y on x, x on y, or both?
When it's random on non-random (y is random, x is controlled), we are predicting y values from x
This is y on x ONLY
But if they are both random, such as weight and height, we can use either line
Remember when calculating, it's not just sum x times sum y, there is also sum xy
Don't slip up
Similarly it's not just (sum of y) all squared, there is also sum of y^2. DON'T SLIP UP
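To make the difference concrete, here is a small sketch with made-up data. It assumes the usual summary-statistic forms S_xy = sum xy - (sum x)(sum y)/n and S_yy = sum y^2 - (sum y)^2/n, which need both kinds of sum.

```python
# Hypothetical data, invented purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x = sum(xs)                                # 15
sum_y = sum(ys)                                # 20
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 66: multiply pairs first, then add
sum_y_sq = sum(y ** 2 for y in ys)             # 86: square first, then add

# These are NOT the same quantities
print(sum_xy, sum_x * sum_y)     # 66 vs 300
print(sum_y_sq, sum_y ** 2)      # 86 vs 400

# Both versions appear in the summary statistics
s_xy = sum_xy - sum_x * sum_y / n      # 66 - 300/5 = 6.0
s_yy = sum_y_sq - sum_y ** 2 / n       # 86 - 400/5 = 6.0
print(s_xy, s_yy)
```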
Interpolation vs extrapolation
Why is extrapolation unreliable?
Interpolation (predicting inside the range of the data) gives a reliable estimate. Extrapolation (predicting outside the range) could give a good prediction, but it's UNRELIABLE because we can't confirm the trend continues beyond the data points
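One way to picture this is a small sketch that labels each prediction by whether the x value sits inside the observed range. The data range and the line y = 2.2 + 0.6x are made up for illustration, and `predict` is a hypothetical helper name.

```python
# Hypothetical x values that the line was fitted to
xs = [1, 2, 3, 4, 5]

# Hypothetical y on x line for that data: y = a + b*x
a, b = 2.2, 0.6

def predict(x):
    """Return the predicted y and whether it was interpolated or extrapolated."""
    inside = min(xs) <= x <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x, kind

print(predict(3.5))   # inside 1 to 5, so a reliable estimate
print(predict(20))    # outside 1 to 5, trend not confirmed, so unreliable
```

The line happily returns a number either way, which is exactly the danger: nothing in the calculation warns you that the trend might not hold out at x = 20.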
Sum of all residuals?
0, which is why we square them and then minimise
Again, definition of regression line
Aims to minimise the SUM of the squares of all the specified (y or x) residuals
What is the coefficient of determination?
R^2 is the proportion of the variation in y that is explained by the variation in x
So if R^2 = 0.5, then 50% of the variation in y (house prices, say) is explained by variation in x, and the rest is due to other factors
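A short sketch of the calculation on made-up data, assuming the standard form R^2 = S_xy^2 / (S_xx * S_yy). It also computes 1 - (sum of squared residuals) / S_yy, which gives the same value for a least squares line and shows the "explained variation" idea directly.

```python
# Hypothetical data, invented purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

# Coefficient of determination from the summary statistics
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Same thing via the residuals of the y on x line y = a + b*x
b = s_xy / s_xx
a = y_bar - b * x_bar
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared_alt = 1 - unexplained / s_yy

print(r_squared, r_squared_alt)   # both about 0.6
```

Here about 60% of the variation in y is explained by the variation in x, and the remaining 40% or so is down to other factors.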
When should we use a regression line?
Only if the relationship is linear; otherwise we can't