Linear regression (week 3-5) Flashcards

1
Q

Linear regression formula

A

Y = alpha + Beta_1 X_1 + ... + Beta_k X_k + E (with a single explanatory variable: Y = alpha + Beta X + E)

2
Q

What is Y in Linear Regression?
What is alpha in Linear Regression?
What is Beta in Linear Regression?
What is X in Linear Regression?
What is E in Linear Regression?

A

Y is the dependent variable
Alpha is the intercept parameter
Beta is the regression coefficient
X is the explanatory variable
E is the i.i.d. error term (assumed N(0, variance))

3
Q

Estimated Linear Regression

A

Same form, but with hats on all the coefficients (alpha-hat, Beta-hat), and the error E is replaced by the residual Ɛ

4
Q

E vs Ɛ?

A

E is the i.i.d. error term (it captures uncertainty). Ɛ is the residual term (the difference between the data and the fitted model). The more Ɛ behaves like E, the better. Ɛ contains both the uncertainty and whatever the model does not capture.

5
Q

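What if the relationship is not linear?

A

Transform the variables, e.g. take logs, so that the relationship becomes approximately linear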

6
Q

How to minimize Ɛ?

A

Minimize the SSR (sum of squared residuals):
1. Write SSR = Σ(Y - Yhat)^2
2. Differentiate SSR with respect to alpha and Beta
3. Set the derivatives to 0 and solve for alpha and Beta
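
For a single explanatory variable, setting the two derivatives to zero gives closed-form estimates. The sketch below shows one way to compute them in plain Python. The function name `fit_simple_ols` and the data are made up for illustration and are not part of the course notes.

```python
def fit_simple_ols(x, y):
    """Return (alpha_hat, beta_hat) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Solving d(SSR)/d(beta) = 0 gives beta_hat = S_xy / S_xx
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    # Solving d(SSR)/d(alpha) = 0 gives alpha_hat = y_bar - beta_hat * x_bar
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
alpha_hat, beta_hat = fit_simple_ols(x, y)
print(alpha_hat, beta_hat)
```

Because the made-up points lie exactly on a line, the fit should recover its intercept and slope.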

7
Q

What is δ^2?

A

It is the estimate of the error variance: δ^2 = Σ(y - yhat)^2 / (n - 2)

8
Q

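Why divide by n-2 in δ^2?

A

Dividing by n-2 makes δ^2 an unbiased estimator of the error variance (two degrees of freedom are used up by estimating alpha and Beta)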

9
Q

what if is too large?

A

Use the formulas for alpha-tilde and Beta-tilde (alpha and Beta with a squiggly line on top), and test the hypotheses for both alpha-tilde and Beta-tilde

10
Q

Goodness-of-fit measured using

A

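R^2 = Regression SS / Total SS
It lies between 0% and 100%
To test the fit, calculate the test statistic F = Regression SS / (Residual SS / (n-2))
Compare it with the F(1, n-2) distribution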
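
As a rough illustration of the quantities on this card, the sketch below fits a simple regression to made-up data and computes the three sums of squares, R^2 and the F statistic. The function name `goodness_of_fit` and the data are assumptions for the example only.

```python
def goodness_of_fit(x, y):
    """Return (r_squared, f_stat, sums_of_squares) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                / sum((xi - x_bar) ** 2 for xi in x))
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * xi for xi in x]

    total_ss = sum((yi - y_bar) ** 2 for yi in y)                   # data vs sample mean
    regression_ss = sum((fi - y_bar) ** 2 for fi in fitted)         # model vs sample mean
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # data vs model

    r_squared = regression_ss / total_ss
    # With one explanatory variable the statistic has (1, n-2) degrees of freedom
    f_stat = regression_ss / (residual_ss / (n - 2))
    return r_squared, f_stat, (total_ss, regression_ss, residual_ss)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_squared, f_stat, (total_ss, regression_ss, residual_ss) = goodness_of_fit(x, y)
print(r_squared, f_stat)
```

The F statistic would then be compared with a critical value of the F(1, n-2) distribution, taken from tables or statistical software.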

11
Q

What does R^2 mean?

A

The proportion of the total variability in the data that is explained by the model

12
Q

What does the Total Sum of Squares mean?

A

The sum of squared deviations between the data and the sample mean (total variability)

13
Q

What does the Regression SS mean?

A

The sum of squared deviations between the model estimates and the sample mean (data variability explained by the model)

14
Q

What does the Residual SS mean?

A

The sum of squared deviations between the data and the model estimates (data variability not explained by the model)
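
The three sums of squares are linked by one identity, which holds for a least-squares fit that includes an intercept. Written out, with \bar{y} for the sample mean and \hat{y}_i for the fitted values (this notation is chosen here for illustration):

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{Total SS}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{Regression SS}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{Residual SS}}
```

This is why R^2 = Regression SS / Total SS can equally be written as 1 - Residual SS / Total SS.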

15
Q

What does Y* mean?

A

It is the new Y, i.e. the value of Y in the future, then divided by 100

16
Q

prediction interval of Y*

A

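(alpha-hat + Beta-hat * x*) ± t_{n-2} * sqrt(δ^2) * sqrt(1 + 1/n + (x* - xbar)^2 / (Σx^2 - n * xbar^2))
where x* is the new x, xbar is the mean of the old x values, Σx^2 is the sum of the squared old x values, and t_{n-2} is the critical value of the t distribution with n-2 degrees of freedom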
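
A minimal sketch of this calculation in plain Python is given below. The function name `prediction_interval`, the data and the new point are made up. The critical value is passed in by hand: 3.182 is the tabulated 97.5% point of the t distribution with 3 degrees of freedom, which matches n = 5 in this example.

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Return (lower, upper) prediction limits for a new Y observed at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2  # sum of x^2 minus n * xbar^2
    beta_hat = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Estimated error variance with n-2 degrees of freedom
    resid_var = sum((yi - alpha_hat - beta_hat * xi) ** 2
                    for xi, yi in zip(x, y)) / (n - 2)
    centre = alpha_hat + beta_hat * x_new
    half_width = (t_crit * math.sqrt(resid_var)
                  * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / s_xx))
    return centre - half_width, centre + half_width


# Hypothetical data, n = 5, so the t distribution has 3 degrees of freedom
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
lower, upper = prediction_interval(x, y, x_new=6, t_crit=3.182)
print(lower, upper)
```

The interval widens as x_new moves away from the mean of the old x values, because of the (x* - xbar)^2 term.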

17
Q

Residual is ... - ...
and it is good if ...?

A

Dependent variable - fitted value
Good if the residuals are randomly scattered and normally distributed

18
Q

model fitting

A

Estimate alpha, Beta and δ^2

19
Q

Predict with 95% interval!

A
1. Model fitting
2. Goodness of fit
3. Plot the residuals (the more scattered, the better)
4. Prediction interval
20
Q

Multicollinearity is

A

When two or more explanatory variables are highly correlated, which makes the parameter estimates vague, imprecise, and unreliable.

21
Q

Adjusted R^2 vs R^2

A

As the number of explanatory variables increases, R^2 also increases. However, an overly complex model is not good either, so we introduce the adjusted R^2, which penalizes additional variables.

22
Q

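Adjusted R^2 formula

A

Adjusted R^2 = 1 - (n-1)(1-R^2)/(n-k-1)
where n is the number of observations and k is the number of explanatory variables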
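
As a quick check of the formula, here is a small Python helper. The name `adjusted_r2` and the numbers are illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (n - 1) * (1 - r2) / (n - k - 1)


# Hypothetical values: R^2 = 0.9 from 11 observations and 2 explanatory variables
print(adjusted_r2(0.9, n=11, k=2))
```

With R^2 and n held fixed, increasing k lowers the adjusted value, which is the penalty for extra variables.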

23
Q

Forward selection

A

-Start with a single explanatory variable and check the adjusted R^2
-Add further variables one at a time; if the adjusted R^2 is higher, keep the new variable in the model

24
Q

Backward selection

A

-Start with all the explanatory variables
-Remove them one by one, checking the adjusted R^2 each time

25
Q

Interaction term

A

Add x1:x2 if the relationship between Y and x1 depends on x2
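
Written out as an equation, a model with two explanatory variables and their interaction might look like the following. The coefficient labels \beta_1, \beta_2, \beta_3 are chosen here for illustration.

```latex
Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + E
\qquad\Longrightarrow\qquad
\frac{\partial Y}{\partial x_1} = \beta_1 + \beta_3 x_2
```

The slope of Y with respect to x1 is no longer a constant: it changes with x2 whenever \beta_3 is non-zero.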

26
Q

Categorical variable is

A

-Takes discrete categories/levels/classes
-Qualitative by nature
-Handled by setting up dummy variables
-A linear regression can include both continuous and categorical variables
-Can also be included in interaction terms

27
Q

Nominal vs Ordinal

A

Nominal categories are unordered; ordinal categories are ordered

28
Q

How many dummy variables are needed?

A

n-1 for a categorical variable with n levels, because one of the levels is used as the base
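
To make the n-1 rule concrete, the sketch below builds dummy columns by hand for a made-up colour variable with three levels. The function name `dummy_encode` and the data are hypothetical.

```python
def dummy_encode(values, base):
    """Return (column_names, rows) with one 0/1 column per non-base level."""
    levels = sorted(set(values))
    non_base = [level for level in levels if level != base]
    # The base level is represented by a row of all zeros
    rows = [[1 if value == level else 0 for level in non_base] for value in values]
    return non_base, rows


# Hypothetical categorical variable with 3 levels, so 2 dummy variables
colours = ["red", "green", "blue", "green"]
columns, rows = dummy_encode(colours, base="blue")
print(columns)
for colour, row in zip(colours, rows):
    print(colour, row)
```

Each estimated dummy coefficient is then read as the difference from the base level.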

29
Q

What can we do to improve the adjusted R^2?

A

-Check the significance stars in the output (is each variable significant or not?)
-Combine variables
-Do not forget to check for multicollinearity

30
Q

How do we know which variables to combine?

A

Try combining those with similar estimates (neighbours)

31
Q

If we check the correlation and detect multicollinearity, what do we have to test?

A

Test whether the correlation is significant:
Let corr(x1, x2) = rho
Test statistic: t = rho * sqrt((n-2)/(1-rho^2))
Compare |t| with the critical value qt(0.975, n-2)
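
The test statistic can be sketched in plain Python as follows. The function name `correlation_t_stat` and the data are made up. The critical value is entered by hand: 3.182 is the tabulated value corresponding to qt(0.975, 3), matching n = 5 here.

```python
import math


def correlation_t_stat(x1, x2):
    """Return (rho, t) for testing whether corr(x1, x2) differs from zero."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    rho = s12 / math.sqrt(s11 * s22)
    # Not defined when rho is exactly +1 or -1 (division by zero)
    t_stat = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return rho, t_stat


# Hypothetical explanatory variables with n = 5 observations
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
rho, t_stat = correlation_t_stat(x1, x2)
t_crit = 3.182  # tabulated 97.5% point of t with n-2 = 3 degrees of freedom
print(rho, t_stat, abs(t_stat) > t_crit)
```

If |t| exceeds the critical value, the correlation is treated as significant at the 5% level.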