Chapter 3) Regression Flashcards

1
Q

What is a regression line?

A

This is the BEST line of best fit you can draw: the straight line that minimises the sum of the squared residuals

2
Q

Why is it called the least squares regression line?

A

Residuals are the (signed) vertical distances between each data point and the line of best fit

You want a line that makes these distances as small as possible overall

But you can't just add the residuals up, as positive and negative values cancel out

Thus we square them and then minimise the sum of the squares

This is why it is called the least squares regression line
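A minimal Python sketch of the least squares line of y on x, y = a + bx. It uses the standard results b = Sxy/Sxx and a = ȳ − b x̄, which come from minimising the sum of squared residuals. The data and the helper names `sxx`, `sxy` and `y_on_x` are made up for illustration.

```python
# Minimal sketch: least squares regression line of y on x, y = a + b*x,
# using the summary statistics Sxx and Sxy. Data are hypothetical.
def sxx(xs):
    """Sxx = sum(x^2) - (sum x)^2 / n"""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

def sxy(xs, ys):
    """Sxy = sum(xy) - (sum x)(sum y) / n"""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    b = sxy(xs, ys) / sxx(xs)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)   # a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = y_on_x(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

For this data Sxx = 10 and Sxy = 6, so the line is y = 2.2 + 0.6x.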

3
Q

When do you use the y on x regression line, and when do you use x on y?

Do both lines give the same line of best fit?

A

NO, they generally give two different lines (they only coincide when the correlation is perfect, r = ±1)

Use y on x when you have an x value and want to use it to predict the y value

Use x on y when you have a y value and want to predict the x value
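A sketch showing that the two lines really are different for the same data. It assumes hypothetical data and a hand-rolled helper `s` (giving Sxx, Syy or Sxy); y on x uses gradient b = Sxy/Sxx, while x on y uses d = Sxy/Syy.

```python
# Sketch: the y-on-x and x-on-y least squares lines for the same
# (hypothetical) data are generally different lines.
def mean(vals):
    return sum(vals) / len(vals)

def s(us, vs):
    """S_uv = sum(u*v) - sum(u)*sum(v)/n; s(xs, xs) gives Sxx."""
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / len(us)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# y on x: y = a + b*x, minimises the squared vertical (y) residuals
b = s(xs, ys) / s(xs, xs)
a = mean(ys) - b * mean(xs)

# x on y: x = c + d*y, minimises the squared horizontal (x) residuals
d = s(xs, ys) / s(ys, ys)
c = mean(xs) - d * mean(ys)

print(f"y on x: y = {a:.2f} + {b:.2f}x")   # use to predict y from x
print(f"x on y: x = {c:.2f} + {d:.2f}y")   # use to predict x from y
print("predicted y at x = 4:", round(a + b * 4, 2))
print("predicted x at y = 5:", round(c + d * 5, 2))
```

Here y on x is y = 2.2 + 0.6x, while x on y is x = −1 + y, which rearranges to y = x + 1. The gradients (0.6 and 1) differ, so each line should only be used in the direction it was fitted for.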

4
Q

When can we use both x on y and y on x, and when can we ONLY use y on x, based on the types of variables involved (random or non-random)?

A

You can only use y on x when y is RANDOM and x is NON-RANDOM (controlled). You use your controlled x values to predict y, so only y on x makes sense

However, when both variables are random, either line can be used, depending on which variable you want to predict!

5
Q

Based on the formula for the regression line, what point does each regression line go through, and thus where do the y on x and x on y lines intersect?

A

They both go through the MEAN point (x̄, ȳ)! For y on x, a = ȳ − b x̄, so substituting x = x̄ gives y = ȳ (and likewise for x on y)

Thus both lines intersect at (x̄, ȳ)!
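A sketch checking this numerically: fit both lines to some hypothetical data, solve them simultaneously, and compare the intersection with (x̄, ȳ). The helpers `mean` and `s` are illustrative, not from any library.

```python
# Sketch: both regression lines pass through the mean point (x_bar, y_bar),
# so they intersect there. Data are hypothetical.
def mean(vals):
    return sum(vals) / len(vals)

def s(us, vs):
    """S_uv = sum(u*v) - sum(u)*sum(v)/n"""
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / len(us)

xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]

b = s(xs, ys) / s(xs, xs)          # y on x: y = a + b*x
a = mean(ys) - b * mean(xs)
d = s(xs, ys) / s(ys, ys)          # x on y: x = c + d*y
c = mean(xs) - d * mean(ys)

# Solve y = a + b*x and x = c + d*y simultaneously:
# x = c + d*(a + b*x)  =>  x * (1 - b*d) = c + d*a.
# b*d equals r**2, so (1 - b*d) is 0 only when r = ±1 (the lines coincide).
x_int = (c + d * a) / (1 - b * d)
y_int = a + b * x_int

print("intersection:", (round(x_int, 6), round(y_int, 6)))
print("mean point:  ", (mean(xs), mean(ys)))
```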

6
Q

Why are you not allowed to extrapolate to predict y values?

What are you doing when you use values within the range of the data to predict?

A

When using values within the range of the data to predict, this is interpolation. You know the trend holds between these points, so the prediction is reliable

But as soon as you extrapolate (go outside the range of the data), you cannot confirm the trend continues, so it could give a good prediction but you can't be sure. Overall it's UNRELIABLE
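A small sketch of the check: treat a prediction as interpolation only if x lies between the smallest and largest observed x values. The line y = 2.2 + 0.6x and the data are hypothetical, and `predict` is an illustrative helper, not a library function.

```python
# Sketch: label a prediction as interpolation (x inside the observed
# range) or extrapolation (outside it). Line and data are hypothetical.
def predict(a, b, x, xs):
    """Predict y = a + b*x and say whether it is interpolation or extrapolation."""
    if min(xs) <= x <= max(xs):
        kind = "interpolation"
    else:
        kind = "extrapolation (unreliable)"
    return a + b * x, kind

xs = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6   # y on x line assumed fitted to data with these x values

for x in (3.5, 12):
    y, kind = predict(a, b, x, xs)
    print(f"x = {x}: y ≈ {y:.2f} [{kind}]")
```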

7
Q

What is a residual?

A

This is the difference between the ACTUAL value and the PREDICTED value (residual = actual − predicted)

It can be positive or negative

8
Q

What is the sum of all the residuals?

A

For the least squares regression line, this will be 0!

That's why we square them as part of the calculation
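A sketch illustrating this with hypothetical data: fit the least squares line, compute residual = actual − predicted for each point, and compare the plain sum (0, up to floating-point rounding) with the sum of squares.

```python
# Sketch: residuals of the least squares line sum to 0 (up to rounding),
# but their squares do not. Data are hypothetical.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx
a = y_bar - b * x_bar

# residual = actual - predicted
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("residuals:", [round(e, 2) for e in residuals])
print("sum of residuals is 0 (to rounding):", abs(sum(residuals)) < 1e-9)
print("sum of squared residuals:", round(sum(e * e for e in residuals), 4))
```

Here the residuals are −0.8, 0.6, 1.0, −0.6 and −0.2: they cancel to 0, while their squares add to 2.4.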

9
Q

So again, what does a regression line aim to do?

Definition

A

It aims to MINIMISE the sum of the SQUARES of the relevant residuals

So y on x aims to minimise the sum of the squared y residuals
i.e. the sum of the squares of the vertical distances from each point to the line (the residuals) is minimised
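A sketch showing the "minimise" part numerically: nudging the least squares line of y on x in any direction makes the sum of squared vertical residuals bigger. The data, the nudges and the helper `ssr` are all hypothetical; y = 2.2 + 0.6x is the least squares line for this particular data.

```python
# Sketch: the y on x least squares line has a smaller sum of squared
# vertical residuals (SSR) than nearby lines. Data and nudges are hypothetical.
def ssr(a, b, xs, ys):
    """Sum of squared vertical (y) residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6          # least squares line for this data

best = ssr(a, b, xs, ys)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.2, -0.05)]:
    other = ssr(a + da, b + db, xs, ys)
    print(f"a={a + da:.2f}, b={b + db:.2f}: SSR={other:.3f}  (best {best:.3f})")
    assert other > best  # every nudged line does worse
```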

10
Q

How do you find the COEFFICIENT of determination, and how do you use it?

A

It is equal to (PMCC)², i.e. r²

Basically it tells you what proportion of the variation in y is explained by the variation in x

So say you have house price (y) and floor area (x), and the coefficient of determination was 0.6

This tells you that THE FLOOR AREA ONLY EXPLAINS 60% OF THE VARIATION IN HOUSE PRICE; the remaining 40% is explained by OTHER FACTORS
- considering that house price also depends on location, number of rooms, bathrooms etc., this makes sense

So it gives you the proportion of the variation in y that is actually explained by the chosen factor x
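A sketch computing r² for some hypothetical data (not real house prices). `pmcc` is a hand-written helper using r = Sxy / √(Sxx·Syy), and r² is then just its square.

```python
# Sketch: coefficient of determination r^2 = (PMCC)^2 = Sxy^2 / (Sxx * Syy).
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
r2 = r ** 2
print(f"r = {r:.3f}, r^2 = {r2:.2f}")
print(f"{r2:.0%} of the variation in y is explained by x; "
      f"{1 - r2:.0%} by other factors")
```

With Sxx = 10, Syy = 6 and Sxy = 6, r² = 36/60 = 0.6, so this made-up data happens to give the same 60% / 40% split as the house price example.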

11
Q

In simple terms, what does the coefficient of determination do?

A

It gives the proportion of the variation in y that is explained by the variation in x

So e.g. 60% of the variation in y is explained by the variation in the chosen x factor

12
Q

So when should we use a regression line?

A

Only when there is a strong linear association; if the scatter looks curved, a straight line won't model it well

Thus check the scatter diagram to determine this!
