Week 5 DSE Flashcards

1
Q

What is the disadvantage of running separate simple linear regressions instead of one multiple regression?

A

It ignores potential confounding factors or synergy (interaction) effects, which can lead to misleading results.

2
Q

If there are 2 or more independent variables, how do we minimize the error?

A

use least squares to find the regression plane that best fits the data.

3
Q

If there are 2 predictors, how many parameters are we supposed to estimate?

A

3

In general, p + 1: one slope per predictor plus the intercept.
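A minimal sketch of the p + 1 count, using R's built-in mtcars dataset (an illustration, not from these cards):

```r
# With p = 2 predictors, lm() estimates p + 1 = 3 parameters:
# the intercept b0 plus one slope per predictor.
fit <- lm(mpg ~ wt + hp, data = mtcars)
length(coef(fit))  # 3
```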

4
Q

What is the error term of the i-th point?

A

ϵ_i = y_i − ŷ_i
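A quick check of this definition in R, using the built-in mtcars dataset (an illustration, not from the cards): the residuals reported by `lm()` are exactly observed minus fitted values.

```r
fit <- lm(mpg ~ wt, data = mtcars)
eps <- mtcars$mpg - fitted(fit)   # e_i = y_i - yhat_i
all.equal(unname(eps), unname(residuals(fit)))  # TRUE
```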

5
Q

What does b2 = 188 mean?

A

Holding the expenditure on x1 constant, every 1-unit increase in x2 increases sales by 188 units.

6
Q

How many degrees of freedom does the RSE have in a multiple linear model?

A

n-(p+1)
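A sketch verifying the n − (p + 1) degrees of freedom, again using the built-in mtcars dataset (an assumption for illustration): computing the RSE by hand matches the residual standard error R reports.

```r
# RSE uses n - (p + 1) degrees of freedom in a multiple regression.
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars); p <- 2
rss <- sum(residuals(fit)^2)
rse <- sqrt(rss / (n - (p + 1)))
all.equal(rse, summary(fit)$sigma)  # TRUE
```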

7
Q

What is R^2?

A

fraction of variance explained

8
Q

What is a flaw of R^2 in multilinear regression?

A

Its value never decreases, even if we add redundant variables to the regression model.

9
Q

What causes the flaw in R^2 in multilinear regression?

A

Least squares solves for the coefficients such that the RSS is minimized.

If a new variable does not improve the model fit, its estimated coefficient will be zero, and R^2 remains unchanged.

If the variable improves the model fit, its estimated coefficient will be nonzero, so R^2 increases.

Hence R^2 cannot decrease; it can only stay the same or increase.

Solution: use adjusted R^2.
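A small demonstration of this mechanism in R, using the built-in mtcars dataset and a made-up pure-noise predictor (both illustrative assumptions): R^2 cannot go down when the noise variable is added, while adjusted R^2 typically drops.

```r
# Adding a redundant predictor can never lower R^2,
# but adjusted R^2 can drop (it penalizes the extra parameter).
set.seed(1)
mtcars$noise <- rnorm(nrow(mtcars))           # pure noise, unrelated to mpg
s1 <- summary(lm(mpg ~ wt, data = mtcars))
s2 <- summary(lm(mpg ~ wt + noise, data = mtcars))
s2$r.squared >= s1$r.squared                  # TRUE: R^2 never decreases
c(s1$adj.r.squared, s2$adj.r.squared)         # compare adjusted R^2 values
```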

10
Q

Can adjusted R^2 be negative?

A

Yes

11
Q

What is the formula for penalization factor?

It is always ______

A

(n - 1) / (n - (p + 1))

It is always larger than 1.
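A sketch showing where the penalization factor sits in the adjusted R^2 formula, verified against R's own output on the built-in mtcars dataset (an illustrative choice):

```r
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (p + 1));
# the factor (n - 1) / (n - (p + 1)) is > 1 whenever p >= 1.
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars); p <- 2
r2 <- summary(fit)$r.squared
adj <- 1 - (1 - r2) * (n - 1) / (n - (p + 1))
all.equal(adj, summary(fit)$adj.r.squared)  # TRUE
```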

12
Q

How does R^2 adjusted compare to R^2?

A
  • Always smaller than R^2, and can be negative
  • When a new independent variable is added, adjusted R^2 can decrease (if the variable does not improve the fit enough)

It includes the penalization factor (n - 1) / (n - (p + 1)).

13
Q

What are H0 and H1 for multiple linear regression?

A

H0 : β1 = β2 = … = βp = 0
H1 : at least one βj is not zero.

14
Q

Which test is used for hypothesis testing in multiple regression?

A

F-statistic

Refer to the p-value of the F-statistic in the R output table.
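A sketch of how to pull the overall F-test out of an R fit, using the built-in mtcars dataset (an illustration; in this course's examples the Auto data would play the same role):

```r
# The overall F-test (H0: all slopes are 0) is reported by summary().
fit <- lm(mpg ~ wt + hp, data = mtcars)
fs <- summary(fit)$fstatistic          # named vector: value, numdf, dendf
p_value <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
p_value < 0.05   # TRUE here: reject H0
```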

15
Q

What is interpolation?

A

Predicting Y for a value of X that is within the range of the original data

16
Q

What is extrapolation

A

Predicting Y for a value of X that is outside the range of the original data

17
Q

How does overfitting arise?

A

Using too many variables or too complex a model often leads to overfitting.

18
Q

How do you know when you are overfitting?

A

The model won't be able to predict new observations well;
it is only able to fit the current data (almost) perfectly.

19
Q

What is the problem of collinearity in multiple linear regression?

A

When interpreting coefficients, we can only hold other factors constant figuratively.

However, when two or more Xs are highly correlated, we can't even hold factors constant figuratively: their separate effects cannot be disentangled.

20
Q

What are 2 solutions to collinearity?

A

Remove the redundant variable. You will need subject knowledge to understand which one is redundant.

Combine the collinear variables into a single variable
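A minimal sketch of both solutions in R, using the built-in mtcars dataset, where disp and wt are highly correlated (the `size` variable is a made-up combined predictor for illustration):

```r
# disp and wt are highly correlated (collinear) predictors
cor(mtcars$disp, mtcars$wt)                    # high correlation

# Solution 1: drop one of the redundant variables
fit1 <- lm(mpg ~ wt, data = mtcars)

# Solution 2: combine them into a single variable,
# e.g. the average of the standardized versions
mtcars$size <- as.numeric((scale(mtcars$disp) + scale(mtcars$wt)) / 2)
fit2 <- lm(mpg ~ size, data = mtcars)
```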

21
Q

How to write operations within linear regression model equations in R?

A

Use I() to protect arithmetic inside a formula:

lm_mpg2 = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)

22
Q

If x1= a and x2= a^2, is it still a linear model? Why?

A

Yes. The model is still linear in the coefficients, even though it is nonlinear in x.

23
Q

How to exclude variables in R?

A

Use . - :

e.g. lm(mpg ~ . - name, ..)

. - name means every variable EXCEPT name

24
Q

If you have a categorical variable, how many coefficients for it will you see in R?

A

n - 1 (for a variable with n levels)

If you know n - 1 of them, you can figure out the last one. (With 3 levels: if the first 2 dummies are 0, the last one must be 1.)
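A quick sketch of this dummy-coding rule in R, using the built-in mtcars dataset (an illustration, not from the cards): a factor with 3 levels produces only 2 dummy coefficients; the omitted level is the baseline.

```r
# A factor with k levels contributes k - 1 dummy coefficients;
# the omitted level is the baseline (all dummies 0).
mtcars$cyl_f <- factor(mtcars$cyl)    # 3 levels: 4, 6, 8
fit <- lm(mpg ~ cyl_f, data = mtcars)
length(coef(fit))                     # 3 = intercept + (3 - 1) dummies
```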

25
Q

How to add labels to nominal categorical data?

A

Auto$origin = factor(Auto$origin,
labels = c("American", "European", "Japanese"))

26
Q

How to test whether there is a relationship between Y and any of the Xs?

A

Look at the F-statistic and its p-value.

Reject H0 if the p-value is small.

27
Q

What are some modelling guidelines?

A

Start from a simple model and move to more complex ones.

Make sure you understand what the parameters mean, especially after using transformations like log(Y).

After fixing the initial model, add other variables one at a time, or by logical groups (like demographics).

Watch out for signs of collinearity: high or perfect correlation, large changes in estimated coefficients AFTER ADDING A VARIABLE, reversed signs, etc.

As long as you know that conceptually a variable should be in the model, the variable should be in the model.

Run model diagnostics (make sure assumptions are met; may need to try a polynomial).

Check for interactions.

Sometimes simplicity of presentation is preferred to a better-fitting model.

28
Q

Should you include a variable in a model if it is not statistically significant?

A

Yes. As long as you know that conceptually the variable should be in the model, it should be in the model.

29
Q

If the difference is major, is simplicity still preferred to a better fit?

A

NO!!