Linear Regression (Weeks 3-5) Flashcards
Linear regression formula
Y = alpha + beta*X + E
What is Y in Linear Regression?
What is alpha in Linear Regression?
What is Beta in Linear Regression?
What is X in Linear Regression?
What is E in Linear Regression?
Y is dependent var
Alpha is intercept parameter
Beta is regression coefficient
X is explanatory variable
E is the i.i.d. error term, E ~ N(0, sigma^2)
Estimated Linear Regression
Same formula, but with hats on the coefficients (alpha-hat, beta-hat) and with the error E replaced by the residual Ɛ: Y = alpha-hat + beta-hat*X + Ɛ
E vs Ɛ?
E is the i.i.d. error term (captures the model's inherent uncertainty); Ɛ is the residual (the difference between the data and the fitted model). A good model has residuals that behave like E: they contain the uncertainty and whatever the model did not capture
What if its not linear?
Transform the variables, e.g. take the log of Y and/or X, then fit a linear model to the transformed data
How to minimize Ɛ
Minimize the SSR (Sum of Squared Residuals):
1. SSR = Σ(Y - Yhat)^2
2. Differentiate with respect to alpha and beta
3. Set the derivatives to 0 and solve
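Setting the two derivatives to zero gives the usual closed-form least-squares estimates. A minimal Python sketch (numpy assumed; the data here are made up for illustration):

```python
import numpy as np

# illustrative data (not from the course)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# closed-form solutions of d(SSR)/d(beta) = 0 and d(SSR)/d(alpha) = 0
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

y_hat = alpha_hat + beta_hat * x
ssr = np.sum((y - y_hat) ** 2)  # sum of squared residuals at the minimum
print(alpha_hat, beta_hat, ssr)
```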
What is σ̂^2?
σ̂^2 = (Σ(y - yhat)^2) / (n - 2), the estimate of the error variance
Why divide by n - 2 in σ̂^2?
It makes the estimator unbiased: two parameters (alpha and beta) are estimated, so two degrees of freedom are lost
What if σ̂^2 is too large (estimates too uncertain)?
Use the standardized estimators alpha-tilde = (alpha-hat - alpha)/se(alpha-hat) and beta-tilde = (beta-hat - beta)/se(beta-hat), and test the hypotheses for both alpha and beta
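A sketch of the coefficient t-tests in Python (numpy assumed; data and critical value are illustrative; the standard-error formulas are the standard simple-regression ones):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)  # unbiased variance estimate

# t statistics for H0: beta = 0 and H0: alpha = 0
t_beta = beta_hat / np.sqrt(sigma2_hat / sxx)
t_alpha = alpha_hat / np.sqrt(sigma2_hat * (1 / n + x.mean() ** 2 / sxx))
t_crit = 3.1824  # qt(0.975, n-2) in R, here n - 2 = 3 df
print(t_beta, t_alpha)  # on these data beta is clearly significant, alpha is not
```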
Goodness-of-fit measured using
R^2 = regression SS / Total SS
between 0% to 100%
How to calculate the test statistic for goodness of fit?
Use the F-test: F = Regression SS / (Residual SS / (n - 2)), compared against the F(1, n-2) distribution
R^2 means?
The proportion of the total variability in the data that is explained by the model
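The three sums of squares and R^2 can be computed directly; a Python sketch (numpy assumed, illustrative data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

total_ss = np.sum((y - y.mean()) ** 2)           # data vs sample mean
regression_ss = np.sum((y_hat - y.mean()) ** 2)  # model estimate vs sample mean
residual_ss = np.sum((y - y_hat) ** 2)           # data vs model estimate
r2 = regression_ss / total_ss                    # between 0 and 1
print(r2)
```

Note that Total SS = Regression SS + Residual SS, which is why R^2 stays between 0% and 100%.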
Total Sum Of Square means?
Deviations between data and sample mean (total variability)
Regression SS mean?
Deviations between model estimate and sample mean (data variability explained by model)
Residual SS mean?
Deviations between data and model estimate (data variability unexplained by model)
What does Y* mean?
Y* is a new (future) observation of Y, at a new value x*
Prediction interval of Y*
(alpha-hat + beta-hat*x*) ± t_{n-2} * sqrt(σ̂^2) * sqrt(1 + 1/n + (x* - x̄)^2 / (Σx^2 - n*x̄^2)), where x̄ and Σx^2 come from the old data
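The interval can be sketched in Python (numpy assumed; data, x*, and the t critical value are illustrative; note Σx^2 - n*x̄^2 = Σ(x - x̄)^2):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)  # equals sum(x**2) - n * x.mean()**2
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
sigma2_hat = np.sum((y - (alpha_hat + beta_hat * x)) ** 2) / (n - 2)

x_star = 6.0     # new x at which to predict Y*
t_crit = 3.1824  # qt(0.975, n-2) in R, here n - 2 = 3 df
half_width = t_crit * np.sqrt(sigma2_hat) * np.sqrt(1 + 1/n + (x_star - x.mean())**2 / sxx)
lo = alpha_hat + beta_hat * x_star - half_width
hi = alpha_hat + beta_hat * x_star + half_width
print(lo, hi)  # 95% prediction interval for Y*
```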
residual is .. - …
and good if
Dependent variable - fitted value
Good if the residuals are randomly scattered and approximately normally distributed
model fitting
Estimate alpha, beta, and σ̂^2
Predict with 95% interval!
- model fitting
- goodness of fit
- plot the residuals (the more randomly scattered, the better)
- prediction interval
Multicollinearity is
When two or more explanatory variables are highly correlated, leading to vague, imprecise, and unreliable parameter estimates
Adjusted R^2 vs R^2
As the number of explanatory variables increases, R^2 also increases (it never decreases). However, an overly complex model is not good either, so we introduce the adjusted R^2, which penalises extra variables
Adjusted R^2 formula
Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n = number of observations and k = number of explanatory variables
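A small Python sketch of the formula, showing the penalty for complexity (the R^2, n, and k values are made up for illustration):

```python
# adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    """n observations, k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# adding variables always raises R^2, but adjusted R^2 penalises complexity:
print(adjusted_r2(0.90, 20, 2))
print(adjusted_r2(0.91, 20, 8))  # higher R^2, yet lower adjusted R^2
```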
Forward selection
- Start with a single explanatory variable and check the adjusted R^2
- If adding a variable raises the adjusted R^2, keep it in the model; repeat
Backward selection
- Start with all explanatory variables
- Remove them one by one, checking the adjusted R^2 after each removal
Interaction term
Add an interaction term x1:x2 if the relationship between Y and x1 depends on x2
Categorical variable is
- made up of discrete categories/levels/classes
- qualitative by nature
- handled by setting up dummy variables
- a linear regression can include both continuous and categorical variables
- interaction terms can also be included
Nominal vs Ordinal
Unordered, ordered
How many dummy variables are needed?
For a categorical variable with n levels: n - 1, because one level is used as the baseline
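A Python sketch of the dummy encoding (numpy assumed; the levels and observations are made up; "A" is taken as the baseline):

```python
import numpy as np

# a nominal variable with 3 levels needs 3 - 1 = 2 dummies; "A" is the baseline
levels = ["A", "B", "C"]
obs = ["A", "C", "B", "A", "C"]

# one column per non-baseline level: (is_B, is_C)
dummies = np.array([[1 if o == lev else 0 for lev in levels[1:]] for o in obs])
print(dummies)  # a row of zeros means the baseline level "A"
```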
What can we do to improve the adjusted R?
- Check the stars (whether each variable is significant)
- Combine variables/levels where appropriate
- Do not forget to check for multicollinearity
How do we know which variables (levels) to combine?
Combine levels with similar estimates (neighbouring coefficients)
If we compute the correlation and suspect multicollinearity, what do we test?
Test whether the correlation is significant:
- assume corr(x1, x2) = rho
- t = rho * sqrt((n-2)/(1-rho^2))
- compare with qt(0.975, n-2)
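The same test in Python (numpy assumed; the two strongly correlated series and the hard-coded critical value are illustrative; `qt` in the card is R's t quantile function):

```python
import numpy as np

# two explanatory variables that move almost in lockstep (illustrative data)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.1, 2.3, 2.9, 4.2, 5.1, 5.9])
n = len(x1)

rho = np.corrcoef(x1, x2)[0, 1]
t_stat = rho * np.sqrt((n - 2) / (1 - rho ** 2))
t_crit = 2.7764  # qt(0.975, n-2) in R, here n - 2 = 4 df
print(abs(t_stat) > t_crit)  # significant correlation -> multicollinearity concern
```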