week 5 Flashcards

(31 cards)

1
Q

Predictive analytics uses historical data to

A

tell us something about the future

2
Q

IV vs DV

A

IV (independent variable): used to predict the DV; plotted on the x-axis
DV (dependent variable): what we are trying to predict from the other variables; plotted on the y-axis

3
Q

what are we trying to estimate with lm

A

the coefficients: B0 (intercept) and B1 (slope)
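
A minimal sketch of what fitting a one-variable linear model estimates. The card refers to R's lm; this is a plain-Python stand-in using the ordinary least squares formulas, and the function name `fit_line` and the data are made up for illustration.

```python
# Ordinary least squares for one IV: estimate B0 (intercept) and B1 (slope).
def fit_line(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept makes the line pass through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1

x = [1, 2, 3, 4, 5]      # IV (made-up data)
y = [3, 5, 7, 9, 11]     # DV, generated as y = 1 + 2x
b0, b1 = fit_line(x, y)
print(b0, b1)
```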

4
Q

what are we trying to minimize for each data point and for each model

A

the error (E) for each data point and the total error for the model

5
Q

Why use SSE

A

squaring magnifies large deviations and stops positive and negative errors from cancelling out

6
Q

issue with SSE and fix

A

depends on the number of points: more points means a higher SSE
fix: use RMSE, which is normalized by N and is in the same units as the DV, so if the DV is price, RMSE is in $
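
A small sketch of the point above, assuming made-up data and hypothetical helper names `sse` and `rmse`: duplicating the data set doubles SSE, while RMSE stays the same because it is divided by N.

```python
import math

def sse(actual, predicted):
    # Sum of squared errors
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    # Root mean squared error: SSE divided by N, then square-rooted
    return math.sqrt(sse(actual, predicted) / len(actual))

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]   # every prediction is off by 1

sse_small, rmse_small = sse(actual, predicted), rmse(actual, predicted)
# Same per-point error, but twice as many points
sse_big, rmse_big = sse(actual * 2, predicted * 2), rmse(actual * 2, predicted * 2)

print(sse_small, sse_big)     # SSE grows with the number of points
print(rmse_small, rmse_big)   # RMSE does not
```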

7
Q

a high R-squared means

A

the model fits the data well and the errors are small, but this does not guarantee it will work well on unseen data

8
Q

MAPE vs MAE when the data set has a high average

A

MAPE is better when the data set average is high, because it reports error as a %. MAE grows with the scale of the data, so it will be larger when the values are larger
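
A hedged illustration with made-up prices and hypothetical helper names `mae` and `mape`: both data sets are predicted 10% too high, so MAPE is the same, but MAE is 100 times larger on the expensive set.

```python
def mae(actual, predicted):
    # Mean absolute error, in the units of the DV
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean absolute percentage error; assumes no actual value is 0
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)

cheap_actual, cheap_pred = [10.0, 20.0], [11.0, 22.0]            # 10% off
dear_actual, dear_pred = [1000.0, 2000.0], [1100.0, 2200.0]      # 10% off

print(mae(cheap_actual, cheap_pred), mae(dear_actual, dear_pred))
print(mape(cheap_actual, cheap_pred), mape(dear_actual, dear_pred))
```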

9
Q

what is R-squared

A

the percentage decrease in SSE: how much SSE has dropped compared to the baseline model (SST), i.e. R^2 = 1 - SSE/SST
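
A minimal sketch of that calculation, with made-up numbers and a hypothetical helper `r_squared`. The baseline model is assumed to predict the mean of the actual values for every point.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    # SSE: squared error of the model
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # SST: squared error of the baseline that always predicts the mean
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst

actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

r2 = r_squared(actual, predicted)
print(r2)   # SSE is 4 and SST is 20, so SSE dropped by about 80% versus the baseline
```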

10
Q

it is hard to get a model with a good fit (R^2 of 0.8+) on real data, so what value counts as good

A

0.3

11
Q

model R^2 gets better if you add more variables that have an R^2 above 0, but at a

A

diminishing rate

12
Q

Not all variables should be used because

A

the model will overfit the data

13
Q

issue with overfitting

A

the model performs badly on unseen data because it has only memorized the old data rather than learned the pattern
The coefficients are tuned to minimize error on the old data, so predictions on future data will have large errors

14
Q

significance is based on the confidence level we want. If confidence is 95%, what is the p-value threshold and what is insignificant

A

5% (0.05); a p-value greater than this is not significant

15
Q

Coefficient (beta) = 0.6 means

A

if the IV increases by 1 unit, the DV increases by 0.6 units, holding the other variables constant

16
Q

sign of overfitting

A

Adjusted R-squared can increase or decrease when a variable is added. If you add a new variable and adjusted R-squared goes down, the model is overfitting
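
A hedged sketch using the standard adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of IVs. The R^2 values, n and p below are invented for illustration, and `adjusted_r_squared` is a hypothetical helper name.

```python
def adjusted_r_squared(r2, n, p):
    # n = number of observations, p = number of IVs in the model
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up scenario: a third variable nudges plain R^2 up from 0.500 to 0.505
before = adjusted_r_squared(0.500, n=30, p=2)
after = adjusted_r_squared(0.505, n=30, p=3)

print(before, after)
print(after < before)   # True: the penalty for the extra variable outweighs the tiny gain
```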

17
Q

goal is to include only significant variables in regression because

A

other variables will cause overfitting

18
Q

correlation does what

A

measures the degree to which two variables are linearly related to each other. It is between -1 and 1
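
A minimal sketch of the Pearson correlation coefficient on made-up data; `pearson_r` is a hypothetical helper name. A perfect increasing line gives 1 and a perfect decreasing line gives -1.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
r_up = pearson_r(x, [2, 4, 6, 8])     # y rises with x
r_down = pearson_r(x, [8, 6, 4, 2])   # y falls as x rises
print(r_up, r_down)
```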

19
Q

linear regression assumption about correlation

A

all IVs are independent of each other, so there is no correlation between them

the x variables are assumed to be independent, not dependent on other x variables

20
Q

what is a sign for worry in correlation

21
Q

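why do we split data

A

a model may just be minimizing error on the data it has seen rather than making good predictions. Splitting lets us see whether it performs well on unseen data or has just overfit. The training set should be 80% of the data; use the lm function with the training data to build the model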
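
A rough sketch of an 80/20 split in plain Python. The card builds the model with R's lm on the training part; here only the split itself is shown, and `train_test_split`, the seed and the row data are illustrative assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    # Shuffle a copy so the original order is left untouched
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    # First part is for building the model, the rest is held back for testing
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))            # stand-in for 10 observations
train, test = train_test_split(rows)
print(len(train), len(test))      # 8 2
```

The model is fitted on `train` only; error metrics are then computed on `test` to check for overfitting.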

22
Q

if a coefficient is 0, it means

A

the IV has no impact on the DV

23
Q

output of RMSE / MAE

A

both tell us that predictions are within that amount of error on average. The difference is that RMSE gives more weight to larger errors, making it sensitive to outliers or large deviations
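
A small sketch with made-up errors showing that sensitivity: with one large miss among four predictions, RMSE rises more than MAE. The helper names are hypothetical.

```python
import math

def mae_from_errors(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

even_errors = [1, 1, 1, 1]      # all predictions off by 1
outlier_errors = [1, 1, 1, 9]   # one prediction is badly off

print(mae_from_errors(even_errors), rmse_from_errors(even_errors))        # equal
print(mae_from_errors(outlier_errors), rmse_from_errors(outlier_errors))  # RMSE is larger
```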

24
Q

MAE used when

A

MAE is the average magnitude of the errors, regardless of direction. It is less sensitive to outliers, so use it when the focus is on overall accuracy rather than punishing large deviations

25
Q

MAPE output tells us

A

average deviation as a % from the actual price

26
Q

R^2 is only useful for whom, and what is the solution

A

analysts; for everyone else use RMSE, MAPE, or MAE as they are easier to understand

27
Q

when r = 0

A

indicates that there is no linear correlation between x and y. However, it does not necessarily imply that there is no relationship between them.

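
A hedged example on made-up data: y is fully determined by x (y = x squared), yet the correlation coefficient is 0 because the relationship is not linear. `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # a perfect, but curved, relationship

r = pearson_r(x, y)
print(r)   # the positive and negative halves cancel out
```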
28
Q

best metrics for testing data

A

MAE, RMSE, MAPE

29
Q

what does a negative R^2 mean

A

the baseline model (predicting the average price for all wines) is doing better than your model, so yours is useless. Build a better model or just use the baseline

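
A minimal sketch of how R^2 goes negative, on made-up numbers with a hypothetical helper `r_squared`: when the model's SSE is larger than the baseline's SST, 1 - SSE/SST drops below 0.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst

actual = [2.0, 4.0, 6.0, 8.0]
bad_model = [8.0, 6.0, 4.0, 2.0]                  # predicts the trend backwards
baseline = [sum(actual) / len(actual)] * len(actual)   # always predicts the average

r2_bad = r_squared(actual, bad_model)
r2_baseline = r_squared(actual, baseline)
print(r2_bad, r2_baseline)   # the bad model scores below the baseline's 0
```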
30
Q

what is standard error

A

a measure of uncertainty in the estimate of a coefficient

31
Q

what are residuals in model summary

A

residuals are the errors (actual minus predicted) of the model you built; the summary tells you the distribution of these errors