Extensions Of Multiple Regression Flashcards

1
Q

What is a high-leverage point vs. an outlier?

A
  • a high-leverage point is an observation with an extreme value for one of the independent variables
  • an outlier is an observation with an extreme value for the dependent variable
2
Q

What are 2 methods for detecting influential data points? Which method is for high-leverage points and which is for outliers? (LS)

A
  • leverage measure (high-leverage points / independent variables)
  • studentized residuals (outliers / dependent variable)
3
Q

What is the formula for leverage when detecting influential data points, and when is an observation flagged as potentially influential? What is the leverage measure for?

A

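hii is based on the distance between the observation's value and the mean value of the independent variable. With one independent variable:

hii = 1/n + (xi - mean of x)^2 / sum over j of (xj - mean of x)^2

if hii > 3(k+1)/n, the observation is flagged as potentially influential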

k = number of independent variables
n = number of observations

  • for identifying high-leverage points. It takes a value between 0 and 1 that quantifies the distance between the ith value for an independent variable and the mean value. A higher value indicates more influence.
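
A minimal sketch of the leverage check, assuming a regression with a single independent variable (k = 1) and the textbook leverage formula for that case. The function names and the data are made up for illustration.

```python
def leverage_values(x):
    """Leverage h_ii for each observation, one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / ss_x for xi in x]


def high_leverage_flags(x, k=1):
    """Flag observations whose leverage exceeds 3(k+1)/n."""
    n = len(x)
    cutoff = 3 * (k + 1) / n
    return [h > cutoff for h in leverage_values(x)]


# Hypothetical data: the last x value sits far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
h = leverage_values(x)
flags = high_leverage_flags(x)
```

Here only the last observation is flagged, and the leverage values sum to k + 1, which is a handy sanity check.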
4
Q

What are the steps of the studentized residuals method for detecting influential data points?

A
  1. run the regression on all n observations
  2. delete 1 observation and re-run the regression on the remaining n-1 observations
  3. residual (ei) = observed value of the deleted observation - value predicted for it by the re-run regression
  4. repeat to calculate this residual for each observation in the data set
  5. calculate the standard deviation of the residuals
  6. studentized residual (ti) = ei / standard deviation of residuals
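
The steps above can be sketched for a regression with one independent variable. This is an illustration under assumptions, not a reference implementation: the helper names and data are hypothetical, and the "standard deviation of the residual" is taken to be the standard error of the deleted residual from the re-run regression.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def studentized_deleted_residuals(x, y):
    """Delete each observation in turn, refit, and scale its residual."""
    results = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        m = len(xs)
        b0, b1 = fit_line(xs, ys)
        mean_x = sum(xs) / m
        sxx = sum((v - mean_x) ** 2 for v in xs)
        sse = sum((yv - (b0 + b1 * xv)) ** 2 for xv, yv in zip(xs, ys))
        s = math.sqrt(sse / (m - 2))  # residual std. error of the refit
        e_i = y[i] - (b0 + b1 * x[i])  # deleted residual
        se = s * math.sqrt(1 + 1 / m + (x[i] - mean_x) ** 2 / sxx)
        results.append(e_i / se)
    return results


# Hypothetical data: y is roughly 2x, except the observation at x = 9.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 30.0, 20.1]
t = studentized_deleted_residuals(x, y)
outliers = [i for i, ti in enumerate(t) if abs(ti) > 3]
```

With this data, only the observation at index 8 ends up with an absolute studentized residual above 3.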
5
Q

What does studentized residual measure?

A

ti (studentized residual) measures how many standard deviations an observation lies from the regression line

6
Q

When is the studentized residual considered an outlier?

A

if the absolute value of the studentized residual, |ti|, is greater than 3, or greater than the critical value of the t-statistic

degrees of freedom for the t-statistic = n - k - 2

n = # of observations
k = # of independent variables

7
Q

What are the 2 values that dummy variables take on?

A

1 if true

0 if false

8
Q

What do you need to use when a categorical variable has more than 2 categories, and why?

A
  • use n-1 dummy variables for n categories, to avoid multicollinearity. The omitted category is represented implicitly by all the dummy variables being 0. Using all n dummies together with an intercept would make them perfectly collinear, because the n dummies always sum to 1
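
A small sketch of n-1 dummy encoding, using a hypothetical helper and made-up quarterly labels. The base category is arbitrarily taken to be the first one in sorted order.

```python
def encode_dummies(values, drop_first=True):
    """Return (column names, rows of 0/1 dummies) for a categorical variable."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"]

# n-1 dummies: Q1 is the base case and shows up as all zeros.
cols, rows = encode_dummies(quarters)

# All n dummies: every row sums to 1, duplicating the intercept column.
full_cols, full_rows = encode_dummies(quarters, drop_first=False)
row_sums = [sum(r) for r in full_rows]
```

The constant row sums in the full encoding are the source of the perfect multicollinearity with the intercept.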
9
Q

What are the 3 types of dummy variables?

A
  • intercept dummies
  • slope dummies
  • intercept and slope dummies
10
Q

How does adding an intercept dummy change a simple linear regression formula?

A

simple linear regression
y = b0 + b1x

simple linear regression with intercept dummy D
y = b0 + d0D + b1x

when D = 0
y = b0 + b1x

when D = 1
y = (b0 + d0) + b1x

d0 = coefficient on the intercept dummy
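
A quick numeric illustration of the intercept dummy, with hypothetical coefficient values (b0, b1 and d0 are made up, and d0 denotes the coefficient on the dummy).

```python
def predict_intercept_dummy(x, dummy, b0, b1, d0):
    """y = b0 + d0*dummy + b1*x, where dummy is 0 or 1."""
    return b0 + d0 * dummy + b1 * x


b0, b1, d0 = 2.0, 0.5, 3.0  # hypothetical coefficients
xs = [0.0, 1.0, 2.0, 3.0]

base = [predict_intercept_dummy(x, 0, b0, b1, d0) for x in xs]
shifted = [predict_intercept_dummy(x, 1, b0, b1, d0) for x in xs]

# The vertical gap between the two lines is d0 at every x (parallel lines).
gaps = [s - b for s, b in zip(shifted, base)]
```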

11
Q

How does adding an intercept dummy variable affect the regression on a graph?

A

if the intercept dummy variable is 1, the intercept shifts by the dummy's coefficient (up if the coefficient is positive, down if it is negative), so the regression line moves but stays parallel to the line for the dummy = 0 group

12
Q

How does adding a slope dummy change a simple linear regression formula?

A

simple linear regression
y = b0 + b1x

simple linear regression with slope dummy D
y = b0 + b1x + d1Dx

when D = 0
y = b0 + b1x

when D = 1
y = b0 + (b1 + d1)x

d1 = coefficient on the slope dummy (the dummy multiplied by x)
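
A quick numeric illustration of the slope dummy, again with hypothetical coefficient values (d1 denotes the coefficient on the dummy-times-x term).

```python
def predict_slope_dummy(x, dummy, b0, b1, d1):
    """y = b0 + b1*x + d1*dummy*x, where dummy is 0 or 1."""
    return b0 + b1 * x + d1 * dummy * x


b0, b1, d1 = 2.0, 0.5, 0.25  # hypothetical coefficients

# Same intercept for both groups (x = 0).
intercept_0 = predict_slope_dummy(0.0, 0, b0, b1, d1)
intercept_1 = predict_slope_dummy(0.0, 1, b0, b1, d1)

# Change in y for a one-unit change in x, for each group.
slope_0 = predict_slope_dummy(1.0, 0, b0, b1, d1) - intercept_0
slope_1 = predict_slope_dummy(1.0, 1, b0, b1, d1) - intercept_1
```

The two groups share the intercept, while the slope moves from b1 to b1 + d1 when the dummy is 1.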

13
Q

How does adding a slope dummy variable affect the regression on a graph?

A

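intercept stays the same, but when the dummy variable is 1 the slope changes from b1 to b1 + d1: the regression line becomes steeper if d1 is positive and flatter if d1 is negative (assuming b1 is positive)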

14
Q

If the p-value is greater than or less than 5%, when can you reject the null hypothesis and when can you not?

A

p value > 0.05 don’t reject the null hypothesis

p value < 0.05 reject the null hypothesis (indicates a significant result)

15
Q

What’s the formula for odds of an event occurring?

A

p / (1 - p)

p = probability
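
As a worked example of the odds formula, assuming hypothetical helper names:

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def probability_from_odds(o):
    """Invert the odds formula: p = odds / (1 + odds)."""
    return o / (1 + o)


three_to_one = odds(0.75)    # a 75% probability is odds of 3 to 1
even_odds = odds(0.5)        # a 50% probability is odds of 1 to 1
back_to_p = probability_from_odds(three_to_one)
```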

16
Q

What are log odds (the logit), and why use them?

A

logistic regression (logit)

log odds = ln(p / (1 - p))

we use log odds because a linear regression fitted directly to a probability can produce values below 0 or above 1. Log odds can take any value from negative to positive infinity, so they can be modelled with a linear equation, and converting the fitted log odds back to a probability always gives a value between 0 and 1
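
A short sketch of the log odds transformation (the function name is an assumption). It shows that probabilities inside 0 to 1 map to values on both sides of zero.

```python
import math


def log_odds(p):
    """Natural log of the odds; defined for 0 < p < 1."""
    return math.log(p / (1 - p))


neutral = log_odds(0.5)   # odds of 1, so log odds of 0
likely = log_odds(0.9)    # positive log odds
unlikely = log_odds(0.1)  # negative log odds, mirror image of 0.9
extreme = log_odds(0.999999)  # grows without bound as p approaches 1
```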

17
Q

What is the regression equation for logistic transformation or logit and how can it be reorganized to isolate the probability?

A

ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + e

p = 1 / (1 + exp(-(b0 + b1X1 + b2X2 + b3X3)))

b0 = intercept
b1, b2, b3 = slope coefficients
X1, X2, X3 = independent variables
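
To illustrate isolating the probability, here is a minimal sketch with made-up coefficients and inputs (none of these numbers come from a fitted model).

```python
import math


def logistic_probability(intercept, slopes, xs):
    """p = 1 / (1 + exp(-(b0 + b1*X1 + ... + bk*Xk)))."""
    z = intercept + sum(b * x for b, x in zip(slopes, xs))
    return 1 / (1 + math.exp(-z))


b0 = -1.0                 # hypothetical intercept
slopes = [0.8, -0.5, 0.3]  # hypothetical b1, b2, b3
xs = [2.0, 1.0, 3.0]       # hypothetical X1, X2, X3

p = logistic_probability(b0, slopes, xs)

# Round trip: the log odds of p recover the linear part of the equation.
z = b0 + sum(b * x for b, x in zip(slopes, xs))
recovered = math.log(p / (1 - p))
```

Whatever values the linear part takes, the resulting p stays strictly between 0 and 1.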

18
Q

What is Maximum likelihood estimation (MLE)?

A
  • statistical method for finding the parameter values of a distribution that make the observed data most probable (that maximize the likelihood). In essence, we slide the candidate distribution across the data until it sits where the observed points are most likely; for a normal distribution, that centers it on the sample mean.
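
A toy illustration of the idea, assuming a simple Bernoulli (0/1) outcome and a brute-force grid search rather than the optimizers that statistical packages use. The data are made up.

```python
import math


def bernoulli_log_likelihood(p, outcomes):
    """Log-likelihood of 0/1 outcomes when each has success probability p."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)


outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical data, 7 successes

# Try candidate values of p and keep the one with the highest log-likelihood.
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: bernoulli_log_likelihood(p, outcomes))

sample_mean = sum(outcomes) / len(outcomes)
best_ll = bernoulli_log_likelihood(p_hat, outcomes)
```

The grid search lands on the sample proportion of successes, which is the known maximum likelihood estimate in this simple case.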
19
Q

What is the likelihood ratio (LR) test, what is its formula, what do high vs. low values mean, and what range of values can it take?

A

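the log-likelihood of a logistic regression is always negative; the LR test statistic itself is never negative

LR test = assesses how well two statistical models fit a dataset by comparing their likelihoods (the probability of observing the data given each model). Essentially, it helps determine whether a more complex (unrestricted) model significantly improves the fit over a simpler (restricted) one

higher log-likelihood (closer to 0) = better fit; a larger LR statistic = stronger evidence that the unrestricted model fits better

LR = -2 (log likelihood restricted model - log likelihood unrestricted model)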
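
A worked example of the LR formula with made-up log-likelihoods. It assumes the restricted model drops 2 variables, so the statistic is compared with the 5% chi-square critical value for 2 degrees of freedom (5.991).

```python
def likelihood_ratio_statistic(ll_restricted, ll_unrestricted):
    """LR = -2 * (log-likelihood restricted - log-likelihood unrestricted)."""
    return -2 * (ll_restricted - ll_unrestricted)


ll_restricted = -120.5    # hypothetical log-likelihood, simpler model
ll_unrestricted = -112.3  # hypothetical log-likelihood, fuller model

lr = likelihood_ratio_statistic(ll_restricted, ll_unrestricted)

# Assumption: 2 restrictions, 5% significance level.
CHI2_CRITICAL_5PCT_2DF = 5.991
prefer_unrestricted = lr > CHI2_CRITICAL_5PCT_2DF
```

Both log-likelihoods are negative, yet the statistic is positive because the unrestricted model's log-likelihood is closer to 0.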