Week 5 Flashcards by Ryan Riggs

Predictive analytics uses historical data to

tell us something about the future

IV vs DV

IV (independent variable): used to predict the DV; plotted on the x-axis
DV (dependent variable): what we are trying to predict from the other variables; plotted on the y-axis

What are we trying to estimate with lm?

The coefficients: B0 (intercept) and B1 (slope)
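
As a rough sketch of what estimating B0 and B1 involves in the one-variable case, the least-squares formulas can be written out by hand. This is plain Python rather than R, the function name `fit_line` and the sample numbers are made up for illustration, and it is not a description of how lm works internally.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y move together, and how much x varies on its own
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

# Made-up data that lies exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_line(x, y)
print(b0, b1)
```

Because the sample points sit exactly on a line, the fitted intercept and slope recover that line.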

What are we trying to minimize for each observation and for the model as a whole?

The error E for each observation, and the total error for the model

Why use SSE (sum of squared errors)?

Squaring magnifies large deviations and stops positive and negative errors from cancelling out

Issue with SSE, and the fix

Issue: SSE depends on the number of points; more points means a higher SSE
Fix: use RMSE, which is normalized by N and is in the same units as the DV, so if the DV is price, RMSE is in $
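
A small sketch of this point, with invented numbers: repeating the same data ten times multiplies SSE by ten, while RMSE stays the same. The helper names `sse` and `rmse` are chosen here for illustration.

```python
import math

def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean squared error: SSE normalized by N, then square-rooted."""
    return math.sqrt(sse(actual, predicted) / len(actual))

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]  # every prediction is off by 1

small_sse = sse(actual, predicted)
small_rmse = rmse(actual, predicted)

# Same errors, ten times as many points
big_sse = sse(actual * 10, predicted * 10)
big_rmse = rmse(actual * 10, predicted * 10)

print(small_sse, big_sse)    # SSE grows with the number of points
print(small_rmse, big_rmse)  # RMSE does not
```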

A high R-squared means

the model fits the data well and the errors are small, but this does not guarantee it will work well on unseen data

MAPE vs MAE when the data set's average value is high

MAPE is better when the data set's average is high, because it reports the error as a %. MAE is in the DV's units, so it will be larger when the values in the data set are larger
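
To illustrate with made-up prices: scaling every value by 100 scales MAE by 100, while MAPE is unchanged. The function names and numbers below are assumptions for the sketch, and MAPE as written here requires non-zero actual values.

```python
def mae(actual, predicted):
    """Mean absolute error, in the units of the DV."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)

cheap_actual, cheap_pred = [10.0, 20.0], [11.0, 18.0]

# Same relative errors, but every price is 100 times higher
pricey_actual = [a * 100 for a in cheap_actual]
pricey_pred = [p * 100 for p in cheap_pred]

print(mae(cheap_actual, cheap_pred), mae(pricey_actual, pricey_pred))
print(mape(cheap_actual, cheap_pred), mape(pricey_actual, pricey_pred))
```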

What is R-squared?

The percentage decrease in SSE: how much the model's SSE has dropped compared to the error of the baseline model (SST). R^2 = 1 - SSE/SST
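
A minimal sketch of that calculation, assuming the baseline model predicts the average of the actual values for every point. The data and the name `r_squared` are invented for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE / SST, where SST is the baseline model's squared error."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst

actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]  # made-up model output

r2 = r_squared(actual, predicted)
print(r2)
```

Here SST is 20 and SSE is 4, so the model removes 80% of the baseline's squared error.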

It is hard to get a model with good accuracy (R^2 of 0.8+) on real data, so what value is good?

0.3

Model R^2 gets better if you add more variables that have an R^2 above 0, but at a

diminishing rate

Not all variables should be used because

the model will overfit the data

Issue with overfitting

The model will perform badly on unseen data because it has not learned that data; it has just memorized the old data.
It changes its coefficients to minimize error on the old data, so when it is asked to predict the future it makes errors because it is overfitted.

Significance is based on the confidence level we want. If confidence is 95%, what is the p-value cutoff, and what counts as insignificant?

5% (0.05). A p-value greater than that is not significant

Coefficient (beta) = 0.6 means

if the IV increases by 1 unit, the DV is predicted to increase by 0.6 units, holding the other variables constant.
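
A tiny sketch of that reading, using a hypothetical one-variable model. The intercept of 2.0 and the name `predict` are invented for the example; only the 0.6 comes from the card.

```python
B0 = 2.0  # hypothetical intercept
B1 = 0.6  # the coefficient (beta) from the card

def predict(iv):
    """Predicted DV for a given IV value under the hypothetical model."""
    return B0 + B1 * iv

# Raising the IV by one unit changes the prediction by the coefficient
change = predict(11) - predict(10)
print(round(change, 6))
```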

sign of overfitting

Adjusted R-squared can increase or decrease. If you add a new variable and adjusted R-squared goes down, you are overfitting the model
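
One way to see this is the usual adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of IVs. The R^2 values below are invented: the extra variable nudges R^2 up slightly, yet adjusted R^2 falls.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model with 2 IVs, then the same model with a weak third IV
before = adjusted_r_squared(0.500, n=20, p=2)
after = adjusted_r_squared(0.505, n=20, p=3)

print(round(before, 4), round(after, 4))
```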

goal is to include only significant variables in regression because

other variables will cause overfitting

What does correlation do?

Reflects the linear relationship between two variables. It measures the degree to which the two variables are linearly related to each other, and it is between -1 and 1
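
As an illustration, the Pearson correlation coefficient can be computed directly from its definition. The name `pearson_r` and the data are made up; a perfectly increasing line gives 1 and a perfectly decreasing one gives -1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises

print(r_up, r_down)
```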

linear regression assumption about correlation

All x variables are independent, so there is no correlation among them

The x variables are assumed to be independent, not dependent on the other x variables

What is a sign for worry in correlation?

A correlation beyond -0.6 or 0.6 (below -0.6 or above 0.6)

why do we split data

The model may just be minimizing error on the data it has seen rather than learning to make predictions, so we hold back test data to see whether it performs well or has just overfit. Training data should be 80%; build the model using the lm function with the training data
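
A minimal sketch of an 80/20 split, assuming the rows can be shuffled at random. The function name `train_test_split`, the seed, and the stand-in rows are all illustrative choices, not something the card specifies.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 observations
train, test = train_test_split(rows)

print(len(train), len(test))
```

The model would then be built on `train` only, and its error measured on `test`.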

If a coefficient is 0, it means

the IV has no impact on the DV

output of RMSE/ MAE

Tells us that predictions are within that amount of error on average. The difference is that RMSE gives more weight to larger errors, making it sensitive to outliers or large deviations
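
A short sketch of that difference, with invented error values: when all errors are the same size the two metrics agree, and a single large error pulls RMSE up much more than MAE.

```python
import math

def mae(errors):
    """Mean absolute error from a list of (actual - predicted) errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error from a list of (actual - predicted) errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

even_errors = [1, -1, 1, -1, 1]
one_outlier = [1, -1, 1, -1, 10]  # same errors, but the last one is large

print(mae(even_errors), rmse(even_errors))
print(mae(one_outlier), round(rmse(one_outlier), 2))
```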

MAE used when

MAE is the average magnitude of the errors, regardless of direction. It is less sensitive to outliers, so use it when the focus is on overall accuracy rather than on punishing large deviations

MAPE output tells us

the average deviation as a % of the actual price

R^2 is only useful for whom, and what is the solution?

Analysts. For everyone else, use RMSE, MAPE, or MAE, as they are easier to understand

when r = 0

indicates that there is no linear correlation between x and y. However, it does not necessarily imply that there is no relationship between them.
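
A quick illustration with made-up points: y = x^2 over a range symmetric around zero has r = 0, even though y is completely determined by x. The helper name `pearson_r` is chosen for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # a perfect, but non-linear, relationship

r = pearson_r(x, y)
print(r)
```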

best metrics for testing data

MAE, RMSE, MAPE

What does a negative R^2 mean?

The baseline model (predict the average price for all wines) is doing better than your model, and yours is useless. So build a better model or just use the baseline
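
A small sketch of how this happens, using R^2 = 1 - SSE/SST with invented prices: when the model's predictions are further off than simply predicting the average, SSE exceeds SST and R^2 drops below zero.

```python
actual = [10.0, 12.0, 14.0, 16.0]     # made-up prices
predicted = [20.0, 20.0, 20.0, 20.0]  # a poor model's output

baseline = sum(actual) / len(actual)  # baseline model: always predict the average

sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
sst = sum((a - baseline) ** 2 for a in actual)
r2 = 1 - sse / sst

print(sse, sst, r2)
```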

What is standard error?

A measure of the uncertainty in the estimate of a coefficient
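
As a sketch for the one-variable case only, the standard error of the slope can be computed as sqrt((SSE / (n - 2)) / Sxx), where Sxx is the sum of squared deviations of x from its mean. The function name and data are invented for illustration.

```python
import math

def slope_standard_error(x, y):
    """Return (b1, standard error of b1) for a one-variable least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)  # two estimated coefficients: b0 and b1
    return b1, math.sqrt(residual_variance / sxx)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1 = slope_standard_error(x, y)
print(round(b1, 3), round(se_b1, 3))
```

A small standard error relative to the coefficient means the estimate is fairly certain; a large one means it is not.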

What are residuals in the model summary?

Residuals are the errors left over when you build a model; the summary tells you the distribution of these errors
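
A rough sketch of such a distribution summary, computing residuals as actual minus predicted and reporting the minimum, quartiles, and maximum. The numbers are made up, and the quartile method used here is one of several conventions, so it may not match another tool's output exactly.

```python
import statistics

actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # made-up model output

# Residual = actual value minus the model's prediction
residuals = [a - p for a, p in zip(actual, predicted)]

q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
summary = {
    "min": min(residuals),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(residuals),
}
print({name: round(value, 2) for name, value in summary.items()})
```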

Week 5 Flashcards (31 cards)