Chapter 3) Regression Flashcards
What is a regression line?
This is the BEST line of best fit you can draw
Why is it called the least squares of residuals line?
A residual is the distance between each data point and the line of best fit
You want the line that makes these distances as small as possible overall
But you can’t add up the distances directly, as the positive and negative values will cancel out
Thus we square them and then minimise the total
This is called the least squares of residuals line
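A quick sketch of why the raw residuals can’t be used, on a small made-up data set (the five points are an assumption for illustration, not from the chapter). A flat line through the mean of y and the least squares line y = 2.2 + 0.6x both have residuals that add up to 0, so only the squared totals tell them apart:

```python
xs = [1, 2, 3, 4, 5]   # made-up x values (assumption)
ys = [2, 4, 5, 4, 5]   # made-up y values (assumption)

def residuals(a, b):
    """Residuals (actual y minus predicted y) for the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

flat = residuals(4.0, 0.0)     # horizontal line through the mean of y
fitted = residuals(2.2, 0.6)   # least squares line for this data

# Raw sums: positives and negatives cancel for BOTH lines
sum_flat = sum(flat)
sum_fitted = sum(fitted)

# Squared sums: now the better line clearly wins
sq_flat = sum(r ** 2 for r in flat)
sq_fitted = sum(r ** 2 for r in fitted)
```

Here both raw sums are 0 (up to rounding), but the squared totals are 6 for the flat line and 2.4 for the least squares line.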
When do you use the y on x regression line and when do you use x on y?
Do both lines give the same line of best fit?
NO, they give two different ones
Use y on x when you have an x value and want to use it to predict a y value
Use x on y when you have a y value and want to predict an x value
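A minimal sketch of the two lines on a made-up data set (the data and the helper names `mean` and `regression` are assumptions for illustration). It uses the standard least squares formulas, gradient = S_uv / S_uu and intercept = mean of v minus gradient times mean of u:

```python
xs = [1, 2, 3, 4, 5]   # made-up x values (assumption)
ys = [2, 4, 5, 4, 5]   # made-up y values (assumption)

def mean(values):
    return sum(values) / len(values)

def regression(u, v):
    """Least squares line of v on u, returned as (intercept, gradient)."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    gradient = s_uv / s_uu
    return v_bar - gradient * u_bar, gradient

a, b = regression(xs, ys)   # y on x:  y = a + b*x
c, d = regression(ys, xs)   # x on y:  x = c + d*y

y_from_x = a + b * 4        # have x = 4, want y: use y on x
x_from_y = c + d * 4.5      # have y = 4.5, want x: use x on y
```

For this data y on x is y = 2.2 + 0.6x, while x on y is x = -1 + y, so the two lines really are different.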
When can we use x on y and y on x, and when can we ONLY use y on x, based on the types of random data involved?
Can only use y on x when it’s RANDOM on NON-RANDOM: x is controlled, so you use your controlled x values to predict y, which means only using y on x
However, when it’s random on random, either can be used to predict the other!
Based on the formula for the regression line, what point does each regression line go through, and thus where do y on x and x on y intersect?
They both go through the MEAN of x and the mean of y!
Thus both lines will intersect at the same place!
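Written out with the usual summary statistics (the symbols S_xx, S_yy and S_xy are the standard ones, assumed here since the cards don’t state the formula), the two lines look like this:

```latex
\begin{align*}
y \text{ on } x:&\quad y - \bar{y} = \frac{S_{xy}}{S_{xx}}\,(x - \bar{x}) \\
x \text{ on } y:&\quad x - \bar{x} = \frac{S_{xy}}{S_{yy}}\,(y - \bar{y})
\end{align*}
```

Putting x equal to the mean of x and y equal to the mean of y makes both sides 0 in each equation, so the mean point lies on both lines.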
Why are you not allowed to extrapolate to predict y values?
What are you doing when using the values in between to predict?
When using values in between to predict, you know the trend between these points, this is interpolation, so it’s reliable
But as soon as you extrapolate, you cannot confirm the trend will continue, so it could give a good prediction but you can’t be sure, so overall it’s UNRELIABLE
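One way to make the check concrete is a small helper that compares the new x value with the range of the data (the function name `prediction_kind` and the data are hypothetical, just for illustration):

```python
def prediction_kind(x_new, observed_xs):
    """Say whether predicting at x_new stays inside the observed x range."""
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"

observed_xs = [1, 2, 3, 4, 5]              # made-up data range (assumption)
inside = prediction_kind(3.5, observed_xs)  # between the smallest and largest x
outside = prediction_kind(9, observed_xs)   # beyond the largest x
```

Anything labelled extrapolation should be treated as unreliable, even if the line happens to give a sensible number.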
What is a residual?
This is the difference between the ACTUAL value and the PREDICTED value
Can be positive or negative
What is the sum of all the residuals?
This will be 0!
That’s why we square them as part of the calculation
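A short derivation of why the sum is 0, assuming the y on x line is written y = a + bx with n data points (this notation is an assumption, the cards don’t fix one):

```latex
\begin{align*}
\sum_{i=1}^{n} e_i
  &= \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr) \\
  &= n\bar{y} - na - bn\bar{x} \\
  &= n\bigl(\bar{y} - a - b\bar{x}\bigr) = 0
\end{align*}
```

The last step works because the regression line goes through the mean point, so the mean of y equals a + b times the mean of x.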
So again, what does a regression line aim to do?
Definition
It aims to MINIMISE the sum of the SQUARES of the residuals
So y on x aims to minimise the sum of the squared y residuals
So the sum of the squares of the vertical distances from each point to the line (the residuals) is minimised
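A sketch comparing the two regression lines on a made-up data set (the data and function names are assumptions; the line coefficients are the least squares ones for this particular data). It suggests that y on x wins on vertical distances, while x on y wins on horizontal distances:

```python
xs = [1, 2, 3, 4, 5]   # made-up x values (assumption)
ys = [2, 4, 5, 4, 5]   # made-up y values (assumption)

def vertical_ss(a, b):
    """Sum of squared vertical gaps to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def horizontal_ss(a, b):
    """Sum of squared horizontal gaps to the same line, read as x = (y - a) / b."""
    return sum((x - (y - a) / b) ** 2 for x, y in zip(xs, ys))

# For this data: y on x is y = 2.2 + 0.6x; x on y is x = -1 + y, i.e. y = 1 + x
y_on_x = (2.2, 0.6)
x_on_y = (1.0, 1.0)

v_yx, v_xy = vertical_ss(*y_on_x), vertical_ss(*x_on_y)       # vertical totals
h_yx, h_xy = horizontal_ss(*y_on_x), horizontal_ss(*x_on_y)   # horizontal totals
```

Here the vertical totals are 2.4 (y on x) against 4 (x on y), and the horizontal totals are about 6.67 (y on x) against 4 (x on y).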
How to find the COEFFICIENT of determination and how to use it
Equal to (PMCC)², written R²
Basically it tells you what proportion of the variation in y is explained by the variation in x
So say you have house price (y) and floor area (x), and the coefficient was 0.6
This tells you that THE FLOOR AREA ONLY EXPLAINS 60% OF THE VARIATION IN HOUSE PRICE, the remaining 40% is explained by OTHER FACTORS
- considering that house price also depends on location, number of rooms, bathrooms etc., it makes sense
So it gives you the proportion of the variation in y that the chosen factor actually explains
In simple terms, what does the coefficient of determination do?
Gives the proportion of the variation in y that is explained by the variation in x
So like 60% of the y variation depends on the variation of the x factor chosen
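A minimal sketch of the calculation on a made-up data set (these five points are an assumption, chosen so the answer comes out as 0.6 like the house price example; they are not real house data). It squares the PMCC, then cross-checks against 1 minus unexplained variation over total variation:

```python
import math

xs = [1, 2, 3, 4, 5]   # made-up x values (assumption)
ys = [2, 4, 5, 4, 5]   # made-up y values (assumption)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

pmcc = s_xy / math.sqrt(s_xx * s_yy)   # product moment correlation coefficient
r_squared = pmcc ** 2                  # coefficient of determination

# Cross-check: fit y on x, then compare leftover variation with total variation
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared_check = 1 - ss_res / s_yy
```

Both routes give 0.6 here, so 60% of the variation in y is explained by x and the other 40% is left in the residuals.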
So when should we use a regression line?
Only when there is a strong linear association, because if the pattern looks curved then it won’t work
Thus check the scatter diagram and determine this!