W3 Multiple Linear Regression Flashcards

1
Q

what is multiple linear regression

A

a model of the linear relationship between the dependent variable Y and 2 or more independent X variables

2
Q

equation of line in multiple regression

A

Y = B0 + B1X1 + B2X2 + … + BkXk + E

(B0 is the intercept, B1 to Bk are the slopes, E is the random error)
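
A minimal sketch of how the B values in this equation can be estimated by least squares, using only the standard library. The helper names (`solve_linear_system`, `fit_multiple_regression`) and the data are made up for illustration; in practice a spreadsheet or statistics package does this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(x_rows, y):
    """Return [b0, b1, ..., bk] that minimise the sum of squared residuals."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 is for b0
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 1 + 2*X1 + 3*X2 with no error term,
# so the fit should recover those three numbers.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
coefficients = fit_multiple_regression(x_rows, y)
```

With real data the error term E is not zero, so the fitted values only approximate the true B values.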

3
Q

what is r^2, the coefficient of multiple determination

A

the proportion of the variation in Y that is explained by the set of independent X variables
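
As a rough illustration, r^2 can be computed as 1 - SSE/SST, which equals SSR/SST when the predictions come from a least-squares fit with an intercept. The function name and the numbers below are made up; the predicted values are placeholders, not the output of a real fit.

```python
def r_squared(actual, predicted):
    """Share of the variation in Y explained by the model (1 - SSE/SST)."""
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)                # total variation
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    return 1 - sse / sst


# Illustrative numbers only: SST = 126 and SSE = 4 for these lists.
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted = [9.0, 13.0, 16.0, 19.0, 23.0]
r2 = r_squared(actual, predicted)
```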

4
Q

why is using adjusted r^2 more reliable

A

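r^2 never decreases when a new X variable is added to the model, even a useless one; adjusted r^2 accounts for the degrees of freedom lost, so it only goes up when the new variable improves the model enough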
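
A small sketch of the usual adjusted r^2 formula, 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of X variables. The function name and the r^2 values are assumptions chosen to show a case where r^2 rises slightly but adjusted r^2 falls.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust r^2 for sample size n and number of X variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical models fitted to the same 15 observations.
adj_two_vars = adjusted_r_squared(r2=0.75, n=15, k=2)
adj_three_vars = adjusted_r_squared(r2=0.76, n=15, k=3)

# r^2 went up (0.75 to 0.76) but the adjusted value went down,
# which suggests the third variable is not worth keeping.
third_variable_helps = adj_three_vars > adj_two_vars
```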

5
Q

what is a residual error

A

the difference between the actual (observed) Y value and the Y value predicted by the model

6
Q

should residuals be random or not

A

random: a residual plot should show no pattern

7
Q

confidence interval for the population slope B

A

coefficient b ± critical t value (n - k - 1 degrees of freedom) * standard error of b
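
A minimal sketch of this interval. The function name and the inputs are hypothetical; the critical t value is assumed to be looked up from a t table (2.1788 is the two-tailed 95% value for 12 degrees of freedom).

```python
def slope_confidence_interval(b, standard_error, t_critical):
    """Return (lower, upper) limits: b plus or minus t_critical * standard_error."""
    margin = t_critical * standard_error
    return b - margin, b + margin


# Hypothetical regression output: slope 2.5 with standard error 0.4,
# n = 15 observations and k = 2 X variables, so 15 - 2 - 1 = 12 df.
lower, upper = slope_confidence_interval(b=2.5, standard_error=0.4, t_critical=2.1788)

# If the interval does not contain 0, the slope is significant at that level.
slope_is_significant = not (lower <= 0 <= upper)
```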

8
Q

how to know when to reject null hypothesis

A

reject the null hypothesis if the F stat is greater than the critical F value
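
A sketch of the overall F test, assuming the usual definition F = MSR / MSE with MSR = SSR / k and MSE = SSE / (n - k - 1). The sums of squares are made-up numbers, and the critical value is taken from an F table (about 3.89 for alpha = 0.05 with 2 and 12 degrees of freedom).

```python
def overall_f_statistic(ssr, sse, n, k):
    """F statistic for the null hypothesis that every slope is zero."""
    msr = ssr / k            # mean square regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse


# Hypothetical ANOVA values: MSR = 60, MSE = 2.5.
f_stat = overall_f_statistic(ssr=120.0, sse=30.0, n=15, k=2)
f_critical = 3.89  # table value, alpha = 0.05, df = (2, 12)
reject_null = f_stat > f_critical
```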

9
Q

how to test contributions of a single variable

A

run the regression with all variables

run it again with all variables except the one we're testing, then compare the two regression sums of squares (partial F test)
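
The comparison of the two regressions can be sketched as a partial F statistic. The function name, its `variables_dropped` parameter and the numbers are assumptions for illustration; the SSR and MSE values would come from the two ANOVA tables.

```python
def partial_f_statistic(ssr_full, ssr_reduced, mse_full, variables_dropped=1):
    """Partial F for the variable(s) left out of the reduced model."""
    extra_ssr = ssr_full - ssr_reduced  # contribution of the dropped variable(s)
    return (extra_ssr / variables_dropped) / mse_full


# Hypothetical values: SSR with all variables, SSR without the tested one,
# and MSE from the regression with all variables.
partial_f = partial_f_statistic(ssr_full=120.0, ssr_reduced=100.0, mse_full=2.5)
f_critical = 4.75  # table value, alpha = 0.05, df = (1, 12)
variable_contributes = partial_f > f_critical
```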

10
Q

why might you want to test contributions of a single variable

A

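a variable may only look useful because it is getting a leg up from other variables that already have an effect; testing its own contribution shows whether it adds anything given the others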

11
Q

what does the coefficient of partial determination tell us

A

the proportion of the variation in Y that is explained by 1 variable when the others are held constant

12
Q

when are dummy variables used

A

when an independent variable is categorical

13
Q

what two numbers are used in place of numerical data as the X values in an equation that includes a dummy variable

A

1 for present

0 for absent
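
A tiny sketch of coding a two-category variable as a dummy variable. The function name and the fireplace data are made up for illustration.

```python
def to_dummy(values, present_label):
    """Code a categorical column as 1 (present) or 0 (absent)."""
    return [1 if value == present_label else 0 for value in values]


# Hypothetical categorical column: does the house have a fireplace?
has_fireplace = ["yes", "no", "no", "yes"]
fireplace_dummy = to_dummy(has_fireplace, present_label="yes")
```

The resulting column of 1s and 0s can then be used as an X variable like any numerical column.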

14
Q

how to test interaction between independent variables

A

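partial F = [SSR(all) - SSR(all except new variables)] / MSE(all)

(if more than one new variable is added, divide the top by the number of new variables first)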

15
Q

when is logistic regression used

A

when the Y variable is binary (a dummy variable) eg
prefer A or B
voted or didn’t vote

16
Q

In what industry is logistic regression used

A

machine learning and AI

17
Q

what is the odds ratio

A

probability of the event of interest / (1 - probability of the event of interest)

18
Q

what is estimated odds ratio

A

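e^ln(estimated odds ratio), where ln(estimated odds ratio) = b0 + b1X1 + b2X2 + … is the value from the logistic regression equation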

19
Q

what is estimated probability

A

estimated odds ratio / (1 + estimated odds ratio)
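
A short sketch that chains the three logistic regression formulas: odds from a probability, estimated odds from the log odds, and estimated probability from the estimated odds. The function names and the coefficients are hypothetical.

```python
import math


def odds_ratio(probability):
    """Odds of the event of interest: p / (1 - p)."""
    return probability / (1 - probability)


def estimated_odds_ratio(b0, slopes, xs):
    """e raised to the log odds predicted by the logistic regression equation."""
    log_odds = b0 + sum(b * x for b, x in zip(slopes, xs))
    return math.exp(log_odds)


def estimated_probability(estimated_odds):
    """Convert estimated odds back into a probability."""
    return estimated_odds / (1 + estimated_odds)


# Hypothetical fitted model: ln(odds) = -1.0 + 0.5 * X1, evaluated at X1 = 2.
est_odds = estimated_odds_ratio(b0=-1.0, slopes=[0.5], xs=[2])
est_prob = estimated_probability(est_odds)
```

Here the log odds are 0, so the odds are 1 and the estimated probability is 0.5, meaning the event is as likely as not.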

20
Q

when testing to see if a non-linear model should be used, how do we pick the model with the best fit

A

the one with the highest r^2 (compare adjusted r^2 if the models have different numbers of X variables)

21
Q

two types of transformations to transform non-linear models into linear ones

A

square root

log
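
A sketch of why a log transformation helps, assuming the data follow an exponential curve Y = a * e^(bX). Taking ln(Y) gives ln(Y) = ln(a) + bX, which is a straight line. The function name and the data (a = 2, b = 0.5, no error term) are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single X variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs = [1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs]  # curved in the original units

# After the log transformation the relationship is linear,
# so an ordinary straight-line fit recovers ln(a) and b.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
```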

22
Q

what is the problem with collinearity in regression

A

you cannot hold one variable constant because of the close relationship between variables

23
Q

what to do in the case of collinear variables

A

avoid regression or choose one to include

24
Q

indications that collinearity has happened

A

incorrect signs on coefficients
large change in value of previous coefficient when new one is added
variance of model increases when new variable is added

25
Q

how to lower probability of collinearity

A

remove unimportant variables

26
Q

purpose of a partial f test

A

to see the level of contribution of variables

27
Q

what is stepwise regression

A

adding variables 1 by 1

if the model improves (adjusted r^2 goes up, since plain r^2 never goes down), keep the variable, otherwise don't

28
Q

how to create stepwise regression graphs

A

insert > scatter >

29
Q

how to add trend lines and equations of lines on scatter plots

A

right click on a point > add trend line and pick the type which gives the best r^2

30
Q

if you have to make a squared or rooted version of a variable, do you still include the normal version when you select data for regression

A

yes, include both the normal and the transformed version

31
Q

if there is significant interaction between two variables, how can you make them into one variable to simplify the model

A

multiply them together and include this new variable in the regression analysis, then check whether it improves the model significantly
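
A minimal sketch of building the interaction variable as an extra column. The function name and the data rows are made up for illustration.

```python
def add_interaction_column(rows, i, j):
    """Append the product of columns i and j to every row of X values."""
    return [list(row) + [row[i] * row[j]] for row in rows]


# Hypothetical rows of (X1, X2); the new third column is X1 * X2.
rows = [(2, 3), (4, 5), (1, 0)]
rows_with_interaction = add_interaction_column(rows, 0, 1)
```

The regression is then run on all three columns, and a partial F test on the new column shows whether the interaction is significant.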

32
Q

if you had to construct a 95% confidence level interval of slope, where would you find this

A

in the coefficients table from regression

33
Q

how would you compute and interpret the coefficients of partial determination.

A
  1. perform anova on data with all independent variables
  2. perform anova on data with all independent variables except the one whose contribution we are interested in
  3. find the absolute difference between both of their regression sums of squares
  4. get the partial F stat by putting the answer from step 3 over the MSE from the first anova
  5. get the partial r^2: answer from step 3 / (SST - SSR from step 1 + answer from step 3)
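
The last step can be sketched as follows, assuming the standard formula for the coefficient of partial determination. The function name and the sums of squares are hypothetical stand-ins for values read off the two ANOVA tables.

```python
def coefficient_of_partial_determination(ssr_full, ssr_reduced, sst):
    """Share of the variation left unexplained by the other variables
    that the tested variable accounts for."""
    contribution = abs(ssr_full - ssr_reduced)  # SSR(tested | others)
    return contribution / (sst - ssr_full + contribution)


# Hypothetical values: SSR with all variables = 120, SSR without the
# tested variable = 100, total sum of squares = 150.
partial_r2 = coefficient_of_partial_determination(
    ssr_full=120.0, ssr_reduced=100.0, sst=150.0
)
```

With these numbers the tested variable explains 40% of the variation in Y that the other variables leave unexplained.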