Regression Flashcards
What is a regression line
(Least squares regression line)
Basically the line of best fit
It’s the line which minimises the sum of the squared VERTICAL distances from each data point, also known as the LEAST SQUARES REGRESSION LINE
What will every regression line go through
The mean point (xbar, ybar)
Why are there TWO regression lines, one for y and one for x
Regression line for y (y on x) minimises the VERTICAL distances, so for any value of x it will return a y
If you want to put y in and return x, then your line must minimise the HORIZONTAL DISTANCES instead, and you need a new formula
What is the gradient both times, y on x and x on y
Y on x is Sxy/Sxx
X on y is Sxy/Syy
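Quick sketch of both gradients in Python. The data set and the helper name s() are made up for illustration, they are not from the original cards.

```python
# Made-up data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def s(us, vs):
    """Sum of (u - ubar)(v - vbar); gives Sxx, Syy or Sxy depending on inputs."""
    u_bar = sum(us) / len(us)
    v_bar = sum(vs) / len(vs)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))

sxx = s(xs, xs)
syy = s(ys, ys)
sxy = s(xs, ys)

gradient_y_on_x = sxy / sxx  # line that minimises vertical distances
gradient_x_on_y = sxy / syy  # line that minimises horizontal distances
print(gradient_y_on_x, gradient_x_on_y)
```

Same top (Sxy) both times, only the bottom changes.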
Okay so what are both equations then
Make it easy, y on x
y - ybar = m (x - xbar), with m = Sxy/Sxx
X on y
x - xbar = m (y - ybar), with m = Sxy/Syy
And here x = my + c
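Sketch of both equations in Python, assuming made-up data and a hypothetical helper fit() (neither is from the cards). It builds each line from the "through the mean point" form and checks that both really do go through (xbar, ybar).

```python
def fit(us, vs):
    """Least squares line for v on u: returns (gradient, intercept)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    m = s_uv / s_uu
    c = v_bar - m * u_bar  # rearranged from v - vbar = m (u - ubar)
    return m, c

# Made-up data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

m_yx, c_yx = fit(xs, ys)  # y on x: y = m x + c
m_xy, c_xy = fit(ys, xs)  # x on y: x = m y + c

# Both lines pass through the mean point, up to rounding
print(m_yx * x_bar + c_yx, y_bar)
print(m_xy * y_bar + c_xy, x_bar)
```

Swapping the arguments is all it takes to go from y on x to x on y.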
How to know Y ON X or X ON Y
Random on non random?
1) When it’s RANDOM ON NON RANDOM, you obviously want the non random to give you a random result, so you want x to give you a y, so you want y on x.
- it wouldn’t make sense to do x on y, giving you a non random result from a random variable
2) HOWEVER with RANDOM ON RANDOM, you can use both! So x on y and y on x!
Interpolation vs extrapolation
Extrapolation is predicting values outside the range of the data, might be useful but might be unreliable
Interpolation is predicting between data points, inside the range of the data
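Tiny Python sketch of the difference. The function name prediction_type() and the data are made up for illustration.

```python
def prediction_type(x_new, xs):
    """Label a prediction by whether x_new is inside the observed x range."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

# Made-up x values, purely for illustration
xs = [1, 2, 3, 4, 5]
print(prediction_type(3.5, xs))  # interpolation
print(prediction_type(9, xs))    # extrapolation
```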
Residuals?
So what does the least squares regression line mean
A residual is the distance between the collected point and the predicted value for it, found by putting the collected data point’s x into the line.
The least squares regression line is the one MINIMISING THE SUM OF THE SQUARES of the residuals (squared because they can be negative)
What’s the sum of all residuals
Sum of all residuals (this can be derived) WILL ALWAYS BE EQUAL TO 0!!!!
Formula for residual
Y value - predicted y value
y - (bx + a)
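Sketch in Python of the residual formula, using made-up data (not from the cards), with b = Sxy/Sxx and a = ybar - b xbar. It also shows the residuals adding up to 0, give or take floating point rounding.

```python
# Made-up data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Gradient b = Sxy / Sxx, intercept a = ybar - b * xbar
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = y value - predicted y value
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print(residuals)
print(sum(residuals))  # zero, up to rounding
```

Some residuals are positive and some negative, which is exactly why they cancel.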
What does the value of a equal in the equation for regression
Why
We know the regression line always goes through the mean point (xbar, ybar)
So in the equation ybar = b xbar + a, rearrange for a = ybar - b xbar
Again what does regression line aim to do with residuals
Minimise the sum of squares of the residuals, as the plain sum is always 0
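Rough check in Python that the least squares values really do give the smallest sum of squared residuals. The data and the helper name sum_sq_residuals() are made up, and nudging a and b a little is only a spot check, not a proof.

```python
# Made-up data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

def sum_sq_residuals(a, b):
    """Sum of squared residuals for the line y = b x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Least squares values: b = Sxy / Sxx, a = ybar - b * xbar
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
best = sum_sq_residuals(a, b)

# Nudge the intercept or gradient slightly: the total always gets bigger
nudges = [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]
for da, db in nudges:
    print(sum_sq_residuals(a + da, b + db) > best)
```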
When can we and when should we not use a regression line (shape of curve)
If it doesn’t seem to be a linear association, rather curved, can’t use
- if some of the data fits but the rest doesn’t, it might not be appropriate, might have to extrapolate, think about context
What is the coefficient of determination and how to use it
It’s r^2, PMCC squared
It’s the proportion of the variation in y explained by change in x
So if it’s 50%, it means the variation in y is 50% explained by x, that’s it, the rest is OTHER FACTORS
Like location etc for house scenario
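Sketch of r^2 in Python with made-up data (not the house scenario). It works it out two ways: as PMCC squared, Sxy^2 / (Sxx Syy), and as 1 minus (sum of squared residuals / Syy), which is the "proportion explained" idea.

```python
# Made-up data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# PMCC squared
r_squared = sxy ** 2 / (sxx * syy)

# Same thing via residuals of the y on x line
b = sxy / sxx
a = y_bar - b * x_bar
ss_res = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / syy

print(r_squared, explained)  # both about 0.6, so about 60% explained by x
```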