Regression Flashcards

1
Q

What questions can regression answer?

A

How do systems work?
ex: how many runs the average home run is worth
-effects of economic factors on presidential elections

Make predictions about what will happen in the future
ex: -height in the future
-price of oil in the future
-housing demand in the next 6 months

2
Q

Simple Linear Regression

A

-one predictor
-y = response
-x = predictor

Equation:

y = a_0 + a_1 x_1
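
A minimal sketch (data and names are illustrative, not from the cards) of estimating a_0 and a_1 with NumPy's closed-form least-squares quantities:

```python
import numpy as np

# Illustrative data: x = predictor, y = response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates for y = a0 + a1*x
a1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # slope = cov(x, y) / var(x)
a0 = y.mean() - a1 * x.mean()                   # intercept
print(a0, a1)
```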

3
Q

general linear regression equation

A

with m predictors
y = response
x_j = predictors

y = a_0 + \sum_{j=1}^{m} a_j x_j
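
A sketch of fitting the m-predictor version with np.linalg.lstsq (data illustrative):

```python
import numpy as np

# Illustrative data: n = 6 observations, m = 2 predictors
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([5.0, 4.2, 11.1, 10.3, 17.2, 16.1])

# Prepend a column of ones so a0 is estimated along with a1..am
X1 = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coeffs)  # [a0, a1, a2]
```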

4
Q

How do you measure the quality of a regression line’s fit?

A

the sum of squared errors (SSE)

-based on the distances between the true responses and our estimates

5
Q

simple linear regression prediction error

A

y_i = actual
\hat{y}_i = prediction

error: y_i - \hat{y}_i, or y_i - (a_0 + a_1 x_{i1})

6
Q

Sum of squared errors equation

A

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
or
SSE = \sum_{i=1}^{n} (y_i - (a_0 + a_1 x_{i1}))^2
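
The formula as a one-liner in NumPy (arrays illustrative):

```python
import numpy as np

y = np.array([2.1, 3.9, 6.2])     # actual responses
yhat = np.array([2.0, 4.1, 6.0])  # model predictions

sse = np.sum((y - yhat) ** 2)     # sum of squared errors
print(sse)
```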

7
Q

What is the best-fit SLR line?

A

the line that minimizes the sum of squared errors
-defined by a_0 and a_1
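
A sketch (assumed data) showing that numerically minimizing SSE over (a0, a1) recovers the best-fit line:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(params):
    a0, a1 = params
    return np.sum((y - (a0 + a1 * x)) ** 2)

# The best-fit line is the (a0, a1) pair with minimum SSE
result = minimize(sse, x0=[0.0, 0.0])
print(result.x)  # [a0, a1]
```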

8
Q

How do we measure the quality of a model's fit?

A

likelihood

9
Q

What is likelihood? What is maximum likelihood?

A

-measures the probability (density) of the observed data for any given parameter set; we treat the observed data as correct and assume we have information about the variance

-maximum likelihood: the parameter set that gives the highest probability (density)
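
A sketch (assuming Gaussian errors with a known variance, all values illustrative) of evaluating the log-likelihood of a candidate parameter set:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
sigma = 0.5  # assumed known error standard deviation

def log_likelihood(a0, a1):
    # Probability density of the observed data under this parameter set
    return np.sum(norm.logpdf(y, loc=a0 + a1 * x, scale=sigma))

print(log_likelihood(0.0, 2.0))   # one candidate parameter set
print(log_likelihood(0.1, 1.95))  # another; higher value = more likely
```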

10
Q

What is Maximum Likelihood Estimation (MLE)? What are you minimizing to calculate it?

A

the set of parameters that minimizes the sum of squared errors (assuming normally distributed errors)

z_i = observations
y_i = model estimates

minimize \sum_{i=1}^{n} (z_i - y_i)^2
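
A short derivation (assuming Gaussian errors with a fixed variance sigma^2) of why maximizing the likelihood is the same as minimizing the sum of squared errors:

```latex
L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(z_i - y_i)^2}{2\sigma^2}\right)

\ln L = -\frac{n}{2}\ln(2\pi\sigma^2)
        - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(z_i - y_i)^2

% The first term and 1/(2\sigma^2) are constants, so maximizing L
% is equivalent to minimizing \sum_{i=1}^{n}(z_i - y_i)^2.
```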

11
Q

Maximum likelihood in the context of linear regression

A

regression: y = a_0 + \sum_{j=1}^{m} a_j x_j
sum of squared errors: \sum_{i=1}^{n} (z_i - y_i)^2

substitute the regression equation for y_i in the sum of squared errors:

minimize \sum_{i=1}^{n} \left(z_i - \left(a_0 + \sum_{j=1}^{m} a_j x_{ij}\right)\right)^2

12
Q

How can you use likelihood to compare two different models?

A

the likelihood ratio

13
Q

Akaike Information Criterion equation. What is the penalty term and what does it do?

A

L*: maximum likelihood value
k: # of parameters we're investigating

AIC = 2k - 2\ln(L^*)

Penalty term (2k) balances likelihood with simplicity
-helps avoid overfitting
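
A sketch of the formula as a function (the log-likelihood value is illustrative):

```python
def aic(log_l_star, k):
    # AIC = 2k - 2*ln(L*): parameter penalty minus fit quality
    return 2 * k - 2 * log_l_star

# e.g. a regression with m = 2 predictors has k = m + 1 = 3 parameters
print(aic(log_l_star=-42.7, k=3))  # illustrative value
```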

14
Q

AIC with regression? Do you want AIC to be smaller or larger?

A

substitute the maximum likelihood regression equation; the # of parameters is m+1 (the m coefficients plus the constant a_0)

-we prefer models with smaller AIC; smaller AIC encourages fewer parameters and higher likelihood

15
Q

corrected AIC

A

-AIC works well if we have infinitely many data points
-this never happens

-so add a correction term:

AICc = AIC + \frac{2k(k+1)}{n-k-1}
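
Continuing the AIC sketch above, the correction is a one-liner (values illustrative):

```python
def aicc(aic_value, k, n):
    # Small-sample correction; requires n > k + 1
    return aic_value + (2 * k * (k + 1)) / (n - k - 1)

print(aicc(aic_value=91.4, k=3, n=25))  # illustrative values
```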

16
Q

Comparing models with AIC

A

if AIC_1 < AIC_2, the relative likelihood that the higher-AIC model is actually better =
e^{(AIC_1 - AIC_2)/2}
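
A sketch of the comparison (AIC values illustrative):

```python
import math

aic1, aic2 = 91.4, 95.0  # model 1 has the smaller AIC

# Relative likelihood that the higher-AIC model is actually better
rel = math.exp((aic1 - aic2) / 2)
print(rel)  # ~0.17 here
```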

17
Q

Bayesian Information Criterion (BIC)

A

L*: maximum likelihood value
k: # of parameters we're investigating
n: number of data points

BIC = k\ln(n) - 2\ln(L^*)
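
The same sketch style for BIC (values illustrative):

```python
import math

def bic(log_l_star, k, n):
    # BIC = k*ln(n) - 2*ln(L*): the penalty grows with the data size n
    return k * math.log(n) - 2 * log_l_star

print(bic(log_l_star=-42.7, k=3, n=25))  # illustrative values
```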

18
Q

AIC vs. BIC

A

BIC's penalty term > AIC's penalty term
-BIC encourages models with fewer parameters than AIC does

-only use BIC when there are more data points than parameters

19
Q

BIC comparison between 2 models on the same dataset…

A

if |BIC_1 - BIC_2| > 10, the smaller-BIC model is very likely to be better

if between 6 and 10, the smaller-BIC model is likely better

if between 2 and 6, somewhat likely better

if between 0 and 2, slightly likely better
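
These thresholds translate into a small helper (a sketch; thresholds taken from this card, values illustrative):

```python
def bic_evidence(bic1, bic2):
    # How likely the smaller-BIC model is to be the better one
    gap = abs(bic1 - bic2)
    if gap > 10:
        return "very likely better"
    elif gap > 6:
        return "likely better"
    elif gap > 2:
        return "somewhat likely better"
    return "slightly likely better"

print(bic_evidence(101.2, 93.5))  # illustrative values -> "likely better"
```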

20
Q

Is there a hard and fast rule for choosing between AIC, BIC, or maximum likelihood?

A

No; all 3 can give valuable information, and looking at all 3 can help you decide which model is best

21
Q

Regression coefficients for predictions and forecasting

A

the response increases by the coefficient * the variable

in other words, if the variable = 1, that increases the response by the coefficient amount (descriptive)

if we are forecasting:
-same idea, but the coefficient tells us how much the predicted response will increase when the variable = 1 (predictive)

22
Q

Which of the components of analytics can regression be used for?

A

Descriptive and predictive analytics
not prescriptive

23
Q

Causation

A

one thing causes another thing

24
Q

correlation

A

two things tend to happen together or not together
- they don’t nescessarily cause each other

25
Q

When is there causation?

A

-the cause comes before the effect
-the idea of causation makes sense
-there are no outside factors that could cause the relationship
-be careful before claiming causation

26
Q

Transforming data

A

-adjust the data so the fit is linear; common approaches (see the sketch after this list):
-quadratic regression
-response transform
-Box-Cox transformation
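
A sketch of a Box-Cox transform of the response with SciPy (data illustrative; boxcox requires positive values):

```python
import numpy as np
from scipy.stats import boxcox

# Illustrative right-skewed response (must be positive for Box-Cox)
y = np.array([1.2, 1.5, 2.1, 3.4, 5.9, 11.0, 24.5])

# boxcox returns the transformed data and the fitted power lambda
y_transformed, lam = boxcox(y)
print(lam)            # power chosen by maximum likelihood
print(y_transformed)  # regress on this transformed response
```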

27
Q

variable interaction

A

ex: a 2-year-old's height at adulthood; if both parents are tall, maybe the kid will be even taller, i.e. their heights interact

-y = a_0 + a_1 x_1 + a_2 x_2 + a_3 (x_1 x_2)
-the interaction term is a new column of data that we can use as a new input x_3 (see the sketch below)
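
A sketch of building that interaction column (heights illustrative):

```python
import numpy as np

x1 = np.array([170.0, 182.0, 165.0, 190.0])  # e.g. one parent's height
x2 = np.array([178.0, 195.0, 172.0, 188.0])  # e.g. the other parent's height

# The interaction term is the elementwise product, used as a new input x3
x3 = x1 * x2
X = np.column_stack([x1, x2, x3])
print(X[:2])
```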

28
Q

p-value of coefficient

A

estimates the probability that the coefficient is really 0
-a form of hypothesis testing

if p-value > 0.05:
-the factor can be removed from the model

-other thresholds can be used
-higher thresholds: more factors can be included
-possibility of including an irrelevant factor
-lower thresholds: fewer factors can be included
-possibility of leaving out a relevant factor

(see the sketch below)
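
A sketch of reading coefficient p-values from statsmodels (data and names illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                   # two illustrative predictors
y = 3.0 + 2.0 * X[:, 0] + rng.normal(size=100)  # only the first one matters

X1 = sm.add_constant(X)        # adds the a0 column
results = sm.OLS(y, X1).fit()
print(results.pvalues)         # one p-value per coefficient; large => candidate to drop
```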

29
Q

p-value warnings

A

with large amounts of data, p-values get small even when attributes are not at all related to the response

p-values are only probabilities, even when meaningful
-e.g. 100 attributes with p-values of 0.02 each: each has a 2% chance of not being significant
-so expect about 2 of them to not really be relevant

30
Q

confidence interval

A

a range where the coefficient probably lies, and how close that range is to 0

31
Q

T-statistic

A

the coefficient divided by its standard error
-related to the p-value

32
Q

interpreting coefficient

A

-sometimes you discover that the coefficient, multiplied by the attribute, still doesn't make much of a difference, even if the p-value is very low

ex: estimating household income with age as one of the attributes
-if the coefficient is 1, then even with a low p-value the attribute really isn't very important; it's unlikely to make even a $100 difference

33
Q

R squared value (coefficient of determination)

A

-estimate of how much variability your model accounts for
-ex: R^2 = 59%
-the model accounts for about 59% of the variability in the data
-the remaining 41% is either randomness or other factors

34
Q

adjusted r squared

A

R^2 adjusted for the # of attributes used
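
The card doesn't give the formula; one standard form, with n data points and k attributes, is:

```latex
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```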

35
Q

interpreting r squared, what is a good value?

A

-some things aren’t easily modeled
-things can affect real life systems especially when humans are involved
-r-squared values of .4 or .3 are quite good

36
Q

what is the null hypothesis?

A

the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.

37
Q

r squared formula

A

R^2 = 1 - \frac{SSE_{residuals}}{SSE_{total}}
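
A closing sketch computing the formula directly (arrays illustrative):

```python
import numpy as np

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # actual responses
yhat = np.array([2.0, 4.1, 6.0, 8.3, 9.6])  # model predictions

sse_residuals = np.sum((y - yhat) ** 2)
sse_total = np.sum((y - y.mean()) ** 2)     # variability around the mean

r_squared = 1 - sse_residuals / sse_total
print(r_squared)
```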