Week 5 Flashcards

1
Q

predictive analytics uses historical data to

A

tell us something about the future

2
Q

IV vs DV

A

IV - used to predict the DV; plotted on the x-axis
DV - what we are trying to predict from the other variables; plotted on the y-axis

3
Q

what are we trying to estimate with lm

A

B0 and B1, the coefficients (intercept and slope)
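
A minimal sketch of how B0 and B1 can be estimated by least squares for a single IV. The card refers to an lm function; this example uses plain Python instead, and the helper name `fit_line` and the toy data are assumptions made for illustration.

```python
# Closed-form least-squares estimates for one IV (hypothetical helper).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Toy data that lies exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_line(xs, ys)
print("B0:", b0, "B1:", b1)
```

Because the toy points sit exactly on a line, the fitted B0 and B1 recover that line; with real data they would be the values that minimize the total squared error.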

4
Q

what are we trying to minimize for each variable and for each model

A

E (the error term) for each variable, and the total error for the model

5
Q

Why use SSE

A

squaring magnifies large deviations and keeps positive and negative errors from cancelling out

6
Q

issue with SSE and fix

A

depends on the number of points: more points = higher SSE
fix - use RMSE: it is normalized by N and is in the same units as the DV, so if the DV is price the RMSE will be in $
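
A small sketch of the point above, assuming RMSE is computed as the square root of SSE divided by N. The data values are made up: the same errors are repeated four times to show that SSE grows with the number of points while RMSE does not.

```python
import math

def sse(actual, predicted):
    # Sum of squared errors.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    # Root mean squared error: SSE normalized by N, back in the DV's units.
    return math.sqrt(sse(actual, predicted) / len(actual))

# Hypothetical prices in $; every prediction is off by exactly 1.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 11.0, 15.0]

# Same errors, four times as many points.
actual_big = actual * 4
predicted_big = predicted * 4

print("SSE:", sse(actual, predicted), "vs", sse(actual_big, predicted_big))
print("RMSE:", rmse(actual, predicted), "vs", rmse(actual_big, predicted_big))
```

The SSE quadruples with the larger data set, while the RMSE stays the same because the typical error per point has not changed.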

7
Q

a high R-squared means

A

the model fits the data well and the errors are small, but this does not guarantee it will work well on unseen data

8
Q

MAPE vs MAE when the data set has a high average

A

MAPE is better if the data set's average is high, because it shows the error as a %. MAE will be higher when the values in the data set are higher
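
A short sketch of why this matters, using two made-up data sets whose predictions are both off by 10%. The function names and numbers are illustrative assumptions, not taken from the course material.

```python
def mae(actual, predicted):
    # Mean absolute error, in the DV's own units.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean absolute percentage error, as a % of the actual values.
    n = len(actual)
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

# Hypothetical cheap items: predictions are 10% too high.
cheap_actual, cheap_pred = [10.0, 20.0], [11.0, 22.0]
# Hypothetical expensive items: predictions are also 10% too high.
pricey_actual, pricey_pred = [1000.0, 2000.0], [1100.0, 2200.0]

print("MAE:", mae(cheap_actual, cheap_pred), "vs", mae(pricey_actual, pricey_pred))
print("MAPE:", mape(cheap_actual, cheap_pred), "vs", mape(pricey_actual, pricey_pred))
```

The MAE is a hundred times larger for the expensive items even though the model is equally good in relative terms; the MAPE is about the same for both.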

9
Q

what is R-squared

A

the percentage decrease in SSE: how much the SSE has dropped compared to the error of the baseline model (SST)
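
The definition above corresponds to R-squared = 1 - SSE/SST, where SST is the squared error of a baseline that always predicts the average. A minimal sketch with made-up numbers (the function name is an assumption):

```python
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    # Error of the model.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Error of the baseline that always predicts the mean.
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1 - sse / sst

actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 7.0]
r2 = r_squared(actual, predicted)
print("R-squared:", round(r2, 3))
```

Here SST is 20 and SSE is 2, so the model removes 90% of the baseline's squared error.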

10
Q

it is hard to get a model with a good R-squared (0.8+) on real data, so what value is good

A

0.3

11
Q

model R^2 gets better if you add more variables that contribute more than 0 to R^2, but at a

A

diminishing rate

12
Q

Not all variables should be used because

A

the model will overfit the data

13
Q

issue with overfitting

A

the model will perform badly on unseen data because it has not learned that data, it has just memorized the old data
The coefficients are tuned to minimize error on the old data, so predictions on future data will have large errors

14
Q

significance is based on the confidence level we want. If confidence is 95%, what is the p-value cutoff and what is insignificant

A

5% (0.05); if the p-value is greater than that, the variable is not significant

15
Q

Coefficient (beta) = 0.6 means

A

if the IV increases by 1 unit, the DV will increase by 0.6 units, holding the other variables constant

16
Q

sign of overfitting

A

Adjusted R-squared can increase or decrease. If you add a new variable and adjusted R-squared goes down, the model is overfitting
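
A sketch of why adjusted R-squared can go down, assuming the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), with n observations and k IVs. The R-squared values below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    # n = number of observations, k = number of IVs in the model.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: 30 observations, 3 IVs, R^2 = 0.500.
before = adjusted_r_squared(0.500, n=30, k=3)
# Adding a 4th IV nudges R^2 up only slightly, to 0.505.
after = adjusted_r_squared(0.505, n=30, k=4)

print("before:", round(before, 4), "after:", round(after, 4))
```

Plain R-squared rose, but the penalty for the extra variable outweighs the tiny gain, so adjusted R-squared falls.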

17
Q

goal is to include only significant variables in regression because

A

other variables will cause overfitting

18
Q

what does correlation do

A

reflects the linear relationship between two variables. It measures the degree to which the two variables are linearly related to each other and is between -1 and 1

19
Q

linear regression assumption about correlation

A

all IVs are independent of each other, so there is no correlation between them

the x variables are assumed to be independent, not dependent on the other x variables

20
Q

what is a sign for worry in correlation

A

a correlation beyond ±0.6 (below -0.6 or above 0.6)

21
Q

why do we split data

A

the model may just be minimizing error on the data it has seen rather than learning to make predictions, so we split to see whether it performs well or has just overfit. Training should be 80% of the data; use the lm function with the training data to build the model
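
A rough sketch of an 80/20 split. The card builds the model with lm; this example uses plain Python with a hand-written least-squares fit instead. The synthetic data, the seed, and the helper names `fit_line` and `rmse` are all assumptions for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the shuffle is repeatable

# Synthetic data (hypothetical): y is roughly 5 + 3x plus noise.
rows = [(x, 5 + 3 * x + random.gauss(0, 2)) for x in range(1, 51)]

random.shuffle(rows)
cut = len(rows) * 8 // 10            # 80% of the rows go to training
train, test = rows[:cut], rows[cut:]

def fit_line(data):
    # Least-squares intercept and slope for one IV.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def rmse(data, b0, b1):
    return math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in data) / len(data))

b0, b1 = fit_line(train)             # build on the training data only
print("train RMSE:", rmse(train, b0, b1))
print("test RMSE:", rmse(test, b0, b1))
```

If the test RMSE were much worse than the training RMSE, that would suggest the model has overfit the training data.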

22
Q

if coefficient is 0 it means

A

the IV has no impact on the DV

23
Q

output of RMSE/ MAE

A

tells us predictions are within that amount of error on average. The difference is that RMSE gives more weight to larger errors, making it sensitive to outliers or large deviations
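
A small sketch of the difference, using two made-up lists of prediction errors: one where every error is the same size and one with a single large outlier.

```python
import math

def mae_from_errors(errors):
    # Average magnitude of the errors.
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    # Squaring before averaging gives large errors more weight.
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

even_errors = [1.0, -1.0, 1.0, -1.0]      # all errors the same size
outlier_errors = [1.0, -1.0, 1.0, 9.0]    # one large deviation

print("even:", mae_from_errors(even_errors), rmse_from_errors(even_errors))
print("outlier:", mae_from_errors(outlier_errors), rmse_from_errors(outlier_errors))
```

With equal-sized errors the two metrics agree; with the outlier, RMSE rises well above MAE.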

24
Q

MAE is used when

A

we want the average magnitude of errors, regardless of direction. It is less sensitive to outliers, with a focus on overall accuracy rather than punishing large deviations

25
Q

MAPE output tells us

A

average deviation as a % from the actual price

26
Q

R^2 is only useful for whom, and what is the solution

A

analysts; use RMSE, MAPE, or MAE instead as they are easier to understand

27
Q

when r = 0

A

indicates that there is no linear correlation between x and y. However, it does not necessarily imply that there is no relationship between them.
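
A quick sketch of this point with made-up data: a perfectly linear relationship gives r = 1, while y = x^2 on a symmetric range gives r = 0 even though y is completely determined by x. The function name `pearson_r` is an assumption.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
linear_ys = [2 * x + 1 for x in xs]   # straight line
curved_ys = [x ** 2 for x in xs]      # U-shaped, not linear

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)
print("linear:", r_linear, "curved:", r_curved)
```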

28
Q

best metrics for testing data

A

MAE, RMSE, MAPE

29
Q

what does a negative R^2 mean

A

This means the baseline model (predicting the average price for all wines) is doing better than your model and yours is useless. So build a better model or just use the baseline
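
A minimal sketch of how R^2 goes negative, assuming R^2 = 1 - SSE/SST. The wine prices and the "bad model" predictions below are invented for illustration.

```python
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1 - sse / sst

# Hypothetical wine prices.
prices = [10.0, 20.0, 30.0]

# Baseline: predict the average price for every wine.
baseline_pred = [sum(prices) / len(prices)] * len(prices)
# A bad model that gets the trend backwards.
bad_pred = [30.0, 20.0, 10.0]

r2_baseline = r_squared(prices, baseline_pred)
r2_bad = r_squared(prices, bad_pred)
print("baseline:", r2_baseline, "bad model:", r2_bad)
```

The baseline scores exactly 0 by construction; the bad model's SSE is larger than SST, so its R^2 is below 0.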

30
Q

what is standard error

A

measure of uncertainty in the estimate of a coefficient
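
A sketch of one way the standard error of the slope can be computed for a single-IV regression, assuming the usual formula sqrt((SSE / (n - 2)) / Sxx). The data and the helper name `slope_with_se` are made up for illustration.

```python
import math

def slope_with_se(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 because two coefficients were estimated.
    se_b1 = math.sqrt((sse / (n - 2)) / sxx)
    return b1, se_b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, se_b1 = slope_with_se(xs, ys)
print("slope:", round(b1, 3), "standard error:", round(se_b1, 3))
```

A standard error that is large relative to the coefficient means the estimate is uncertain, which is what drives a high p-value.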

31
Q

what are residuals in model summary

A

residuals are the errors left over when you build a model; the summary tells you the distribution of these errors