Extensions Of Multiple Regression Flashcards

Question 1

Q

What is a high-leverage point vs. an outlier?

Answer

A

A high-leverage point is an observation with an extreme value for one of the independent variables
An outlier is an observation with an extreme value for the dependent variable

Question 2

Q

What are 2 methods for detecting influential data points? Which method is for high-leverage points and which method is for outliers? (LS)

Answer

A

leverage measure (high leverage point/ independent variable)
studentized residuals (outliers/dependent variables)

Question 3

Q

What does the leverage measure capture when trying to detect influential data points, and when is an observation considered potentially influential? What is the leverage measure for?

Answer

A

hii = leverage of observation i, a measure of the distance between the ith value of an independent variable and the mean value of that variable

if hii > 3*((k+1)/n), the observation is considered potentially influential

k = number of independent variables
n = number of observations

The leverage measure is for identifying high-leverage points. It takes a value between 0 and 1 that quantifies the distance between the ith value for an independent variable and the mean value. A higher value indicates more potential influence.
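
As a rough illustration of the leverage rule, here is a minimal sketch for a regression with a single independent variable (k = 1). It assumes the standard one-variable leverage formula hii = 1/n + (xi - mean)^2 / sum of squared deviations; the function names and the sample data are hypothetical, not from the card.

```python
def leverage_one_variable(x):
    """Leverage h_ii for each observation when there is one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    # 1/n plus the squared distance from the mean, scaled by total squared distance
    return [1 / n + (xi - mean_x) ** 2 / ss_x for xi in x]


def flag_high_leverage(x, k=1):
    """Return the indexes of observations whose leverage exceeds 3 * (k + 1) / n."""
    n = len(x)
    cutoff = 3 * (k + 1) / n
    return [i for i, h in enumerate(leverage_one_variable(x)) if h > cutoff]


# Made-up data: the last value sits far from the mean of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
h = leverage_one_variable(x)
flagged = flag_high_leverage(x)
print(flagged)
```

With this data only the last observation is flagged. The leverages always sum to k + 1, which is why the cutoff is expressed as a multiple of (k + 1)/n, the average leverage.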

Question 4

Q

What are the steps of the studentized residuals method for detecting influential data points?

Answer

A

Run the regression on all n observations
Delete 1 observation and re-run the regression on the remaining n - 1 observations
residual (ei) = observed value of the deleted observation - value predicted for it by the re-run regression
Repeat to calculate this residual for each observation in the data set
Calculate the standard deviation of the residuals
studentized residual (ti) = ei / standard deviation of residuals
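
The steps above can be sketched for a one-variable regression as follows. This is a minimal illustration, not a definitive implementation: it assumes that the "standard deviation of the residual" is taken as the standard error of the prediction for the deleted observation, which is one common way to compute a studentized residual. The function names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    return mean_y - b1 * mean_x, b1


def studentized_residuals(x, y):
    """Delete each observation in turn, refit, and scale its residual."""
    results = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0, b1 = fit_line(xs, ys)
        # Residual of the deleted point against the line fitted without it
        e_i = y[i] - (b0 + b1 * x[i])
        m = len(xs)
        mean_x = sum(xs) / m
        ss_x = sum((xj - mean_x) ** 2 for xj in xs)
        sse = sum((yj - (b0 + b1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        s = math.sqrt(sse / (m - 2))  # m - 2 equals n - k - 2 with k = 1
        std_err = s * math.sqrt(1 + 1 / m + (x[i] - mean_x) ** 2 / ss_x)
        results.append(e_i / std_err)
    return results


# Made-up data: y is close to 2x except for the last observation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 30.0]
t = studentized_residuals(x, y)
most_extreme = max(range(len(t)), key=lambda i: abs(t[i]))
print(most_extreme, abs(t[most_extreme]) > 3)
```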

Question 5

Q

What does studentized residual measure?

Answer

A

ti (studentized residual) measures the number of standard deviations an observation lies away from the regression line estimated without that observation

Question 6

Q

When is the studentized residual considered an outlier?

Answer

A

An observation is considered an outlier if the absolute value of its studentized residual |ti| (if negative, turn to positive) is greater than 3, or greater than the critical value of the t statistic

degrees of freedom for the t statistic = n-k-2

n = # of observations
k = # of independent variables

Question 7

Q

What are the 2 values that dummy variables take on?

Answer

A

1 if true

0 if false

Question 8

Q

What do you need to use when there are more than 2 categories and why?

Answer

A

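Use n-1 dummy variables for n categories, to avoid multicollinearity. Using n-1 dummy variables, you create a model where one category is implicitly represented by the absence of all other dummy variables (every dummy equals 0). With n dummies, the dummies would always sum to 1 and duplicate the intercept, which is the multicollinearity problem.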
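
A small sketch of the n-1 rule, using quarters of the year as made-up categories. The function name `encode_dummies` and its `drop_first` parameter are hypothetical, chosen only for this illustration.

```python
def encode_dummies(values, drop_first=True):
    """Turn category labels into 0/1 dummy columns, optionally dropping one category."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


quarters = ["Q1", "Q2", "Q3", "Q4", "Q1"]

# n dummies: every row sums to 1, which duplicates the intercept column.
all_names, all_rows = encode_dummies(quarters, drop_first=False)
row_sums = [sum(row) for row in all_rows]

# n - 1 dummies: Q1 is the base category, represented by all zeros.
names, rows = encode_dummies(quarters)
print(names, rows[0])
```

Here Q1 becomes the base category, so each dummy coefficient would be read relative to Q1.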

Question 9

Q

What are the 3 types of dummy variables?

Answer

A

intercept dummies
slope dummies
intercept and slope dummies

Question 10

Q

How does adding an intercept dummy change a simple linear regression formula?

Answer

A

simple linear regression
y = b0 + b1x

simple linear regression with an intercept dummy
y = b0 + d0D + b1x

when the dummy variable D = 0
y = b0 + b1x

when the dummy variable D = 1
y = (b0 + d0) + b1x

D = dummy variable (0 or 1)
d0 = coefficient on the dummy variable

Question 11

Q

How does adding an intercept dummy variable affect the regression on a graph?

Answer

A

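If the intercept dummy variable is 1, the intercept shifts by the dummy's coefficient (up if the coefficient is positive, down if negative). The new regression line stays parallel to the simple linear regression line because the slope is unchanged.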
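
A quick numeric check of the parallel shift, as a minimal sketch. The coefficient values and the names `b0`, `b1`, `d0` (dummy coefficient) are assumptions made for illustration.

```python
def predict_intercept_dummy(x, dummy, b0=2.0, b1=0.5, d0=1.5):
    """y = b0 + d0 * D + b1 * x, where D is 0 or 1."""
    return b0 + d0 * dummy + b1 * x


# The vertical gap between the two lines is the same at every x.
gap_at_0 = predict_intercept_dummy(0, 1) - predict_intercept_dummy(0, 0)
gap_at_10 = predict_intercept_dummy(10, 1) - predict_intercept_dummy(10, 0)
print(gap_at_0, gap_at_10)
```

The gap equals the dummy coefficient at both points, which is what "parallel" means here.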

Question 12

Q

How does adding a slope dummy change a simple linear regression formula?

Answer

A

simple linear regression
y = b0 + b1x

simple linear regression with a slope dummy
y = b0 + b1x + d1(D*x)

when the dummy variable D = 0
y = b0 + b1x

when the dummy variable D = 1
y = b0 + (b1+d1)x

D = dummy variable (0 or 1)
d1 = coefficient on the slope dummy term (D*x)

Question 13

Q

How does adding a slope dummy variable affect the regression on a graph?

Answer

A

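The intercept stays the same, but when the dummy variable is 1 the slope changes by the dummy's coefficient: the regression line becomes steeper if the coefficient is positive and flatter if it is negative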
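
A minimal sketch of the slope effect. The coefficient values and the name `d1` for the slope dummy coefficient are assumptions made for illustration.

```python
def predict_slope_dummy(x, dummy, b0=2.0, b1=0.5, d1=0.25):
    """y = b0 + b1 * x + d1 * (D * x), where D is 0 or 1."""
    return b0 + b1 * x + d1 * dummy * x


# Both lines start from the same intercept at x = 0.
same_intercept = predict_slope_dummy(0, 0) == predict_slope_dummy(0, 1)

# Slope = change in y for a one-unit change in x.
slope_without = predict_slope_dummy(1, 0) - predict_slope_dummy(0, 0)
slope_with = predict_slope_dummy(1, 1) - predict_slope_dummy(0, 1)
print(same_intercept, slope_without, slope_with)
```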

Question 14

Q

If the p-value is greater than or less than 5%, when can you reject the null hypothesis and when can you not?

Answer

A

p value > 0.05 don’t reject the null hypothesis

p value < 0.05 reject the null hypothesis (indicates a significant result)

Question 15

Q

What's the formula for the odds of an event occurring?

Answer

A

p / (1-p)

p = probability

Question 16

Q

What are log odds (logit) and why use log odds?

Answer

A

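logistic regression (logit)

log odds = ln(p / (1-p))

We use log odds because a regression line fitted directly to a probability can give predicted values below 0 or greater than 1, and regular odds can never be negative. Log odds can take any value from negative infinity to positive infinity, so they can be modeled with a linear equation, and the probability implied by the fitted log odds always falls between 0 and 1.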
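
A short sketch of odds and log odds for a few made-up probabilities. The function names are hypothetical.

```python
import math


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def log_odds(p):
    """Natural log of the odds (the logit)."""
    return math.log(odds(p))


for p in (0.1, 0.5, 0.8, 0.99):
    print(p, round(odds(p), 4), round(log_odds(p), 4))
```

Odds run from 0 upward, while log odds are symmetric around 0: a probability of 0.5 gives log odds of 0, and probabilities of 0.1 and 0.9 give log odds of equal size and opposite sign.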

Question 17

Q

What is the regression equation for logistic transformation or logit and how can it be reorganized to isolate the probability?

Answer

A

ln(p / (1-p)) = b0 + b1X1 + b2X2 + b3X3 + e

p = 1 / (1 + exp(-(b0 + b1X1 + b2X2 + b3X3)))

b0 = intercept
b1, b2, b3 = slope coefficients
X1, X2, X3 = independent variables
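
A minimal sketch of turning fitted log odds back into a probability. The coefficient and input values are made up, and the function name is hypothetical.

```python
import math


def logistic_probability(b0, coefs, xs):
    """p = 1 / (1 + exp(-(b0 + b1*X1 + b2*X2 + ...)))."""
    z = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients and independent variable values
b0, coefs, xs = -1.0, [0.8, -0.5, 0.2], [2.0, 1.0, 3.0]
p = logistic_probability(b0, coefs, xs)

# Round trip: the log odds of p should recover the linear combination.
z = b0 + sum(b * x for b, x in zip(coefs, xs))
recovered = math.log(p / (1 - p))
print(round(p, 4), round(z, 4), round(recovered, 4))
```

Note the minus sign applies to the whole linear combination, not only to b0.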

Question 18

Q

What is Maximum likelihood estimation (MLE)?

Answer

A

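A statistical method for finding the parameters of a distribution that best fit a set of observed data. It chooses the parameter values that maximize the likelihood (probability) of observing the data that actually occurred. In essence, we move our distribution over the data until it sits where the observed data points are most probable; for the center of a normal distribution, that is the sample mean.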
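
To make the idea concrete, here is a minimal sketch that finds the maximum likelihood estimate of a success probability by trying many candidate values. The 0/1 data and the grid search approach are assumptions for illustration; real software typically uses a numerical optimizer instead.

```python
import math


def bernoulli_log_likelihood(p, data):
    """Log of the probability of observing the 0/1 data if the success probability is p."""
    return sum(math.log(p) if d == 1 else math.log(1 - p) for d in data)


# Made-up outcomes: 7 successes out of 10
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Try p = 0.01, 0.02, ..., 0.99 and keep the value with the highest log likelihood.
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: bernoulli_log_likelihood(p, data))
sample_mean = sum(data) / len(data)
print(p_hat, sample_mean)
```

The estimate lands on the sample mean, which matches the intuition on the card.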

Question 19

Q

What is the likelihood ratio (LR) test, what is its formula, what does a high vs low value mean, and what range of values can it take?

Answer

A

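The LR test assesses how well two statistical models fit a dataset by comparing their log likelihoods (the log of the probability of observing the data given each model). Essentially, it helps determine if a more complex (unrestricted) model significantly improves the fit over a simpler (restricted) one

LR = -2 (log likelihood restricted model - log likelihood unrestricted model)

For a logistic regression, the log likelihood is always negative; a higher log likelihood (closer to 0) = better fit

The LR test statistic itself is 0 or positive, because the unrestricted model fits at least as well as the restricted model; a higher LR = the unrestricted model improves the fit by more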
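
A minimal sketch of the LR formula using a made-up 0/1 data set. As an assumption for illustration, the restricted model fixes the success probability at 0.5 and the unrestricted model uses the probability that best fits the data (the sample proportion).

```python
import math


def bernoulli_log_likelihood(p, data):
    """Log of the probability of observing the 0/1 data if the success probability is p."""
    return sum(math.log(p) if d == 1 else math.log(1 - p) for d in data)


# Made-up outcomes: 7 successes out of 10
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

ll_restricted = bernoulli_log_likelihood(0.5, data)
ll_unrestricted = bernoulli_log_likelihood(sum(data) / len(data), data)

# LR = -2 * (log likelihood restricted - log likelihood unrestricted)
lr = -2 * (ll_restricted - ll_unrestricted)
print(round(ll_restricted, 4), round(ll_unrestricted, 4), round(lr, 4))
```

Both log likelihoods come out negative, the unrestricted one is closer to 0, and the resulting LR statistic is positive.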