Regression Flashcards

1
Q

OLS – Assumptions of residuals

A

Normality – residuals should be normally distributed (check with a QQ plot of residuals)

Linearity between the IVs and the DV, and no substantial collinearity among the predictors

Homoscedasticity – the variance of Y does not depend on X (check residuals against X and against the predicted values)

Independence – the error of one case provides no information about the error of another case
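
A minimal sketch of these checks in Python (statsmodels and matplotlib assumed; data and names hypothetical):

```python
# Sketch: visual checks of the OLS residual assumptions.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))    # hypothetical predictors
y = X @ [1.0, 0.5, -0.3] + rng.normal(size=200)   # hypothetical outcome

res = sm.OLS(y, X).fit()

# Normality: points should hug the 45-degree line.
sm.qqplot(res.resid, line="45", fit=True)

# Homoscedasticity: residuals vs. predicted values should show no fan/funnel.
plt.figure()
plt.scatter(res.fittedvalues, res.resid)
plt.axhline(0, color="gray")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()
```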

2
Q

OLS – Suppressor

A

When the relationship between two predictors hides or suppresses their real relationship with Y (Cohen et al., 2003).
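
A toy simulation of suppression (Python with statsmodels assumed; x2 is a hypothetical suppressor unrelated to Y):

```python
# Sketch: classic suppression — x2 is unrelated to y but correlated with
# the irrelevant variance in x1, so adding x2 strengthens x1's estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
x1 = signal + noise            # predictor contaminated with irrelevant variance
x2 = noise                     # suppressor: correlates with x1, not with y
y = signal + rng.normal(size=n)

r1 = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
r2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(r1.params[1], r1.rsquared)   # x1 alone: attenuated slope, lower R^2
print(r2.params[1], r2.rsquared)   # with suppressor: slope near 1, higher R^2
```

With the suppressor included, x1's slope and the model R² both rise, even though x2 by itself predicts nothing.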

3
Q

Spurious Effect

A

Full redundancy – also called a confounding effect; X1 appears related to Y only because X2 causes both X1 and Y

4
Q

Collinearity

A

one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy
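
One way to quantify this is an auxiliary regression of each predictor on the others; the R² from that regression is the quantity behind the VIF. A sketch (Python with statsmodels assumed; hypothetical data):

```python
# Sketch: auxiliary regression — regress one predictor on the rest;
# high R^2 means it is nearly a linear combination of the others.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = 0.7 * x1 + 0.7 * x2 + 0.1 * rng.normal(size=300)  # nearly redundant

aux = sm.OLS(x3, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(aux.rsquared)              # close to 1 -> severe collinearity
print(1 / (1 - aux.rsquared))    # this quantity is x3's VIF
```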

5
Q

Logit regression

A

Binary DV

Used when the distribution of errors is Bernoulli / standard logistic (flatter than the normal)
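
A minimal fitting sketch (Python with statsmodels assumed; simulated data):

```python
# Sketch: logistic regression for a binary DV.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(500, 2)))    # hypothetical predictors
p = 1 / (1 + np.exp(-(X @ [-1.0, 0.8, -0.5])))    # true logistic probabilities
y = rng.binomial(1, p)                            # Bernoulli outcome

logit_res = sm.Logit(y, X).fit()
print(logit_res.summary())
print(np.exp(logit_res.params))                   # coefficients as odds ratios
```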

6
Q

Probit regression

A

Used when the distribution of errors is assumed normal (e.g., the DV is an artificially dichotomized continuous variable)
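
The mechanics mirror the logit sketch, swapping in the normal CDF (Python with statsmodels assumed; simulated data):

```python
# Sketch: probit regression — same setup as logit, but the link is the normal CDF.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, norm.cdf(X @ [-1.0, 0.8, -0.5]))  # normal-CDF probabilities

probit_res = sm.Probit(y, X).fit()
print(probit_res.summary())
```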

7
Q

Logit/probit assumptions

A

Variables are normally distributed

Linear relationship between X and Y (to detect nonlinearity: use theory, examine residual plots, or run a regression with squared/cubic terms – see the sketch after this list)

Measures are reliable
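
A sketch of the squared/cubic-terms check mentioned above (Python with statsmodels assumed; simulated data):

```python
# Sketch: detect nonlinearity by adding squared/cubic terms and testing them.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=600)
p = 1 / (1 + np.exp(-(0.5 * x + 0.6 * x**2)))   # true relation is curved
y = rng.binomial(1, p)

lin = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
poly = sm.Logit(y, sm.add_constant(np.column_stack([x, x**2, x**3]))).fit(disp=0)

# Likelihood-ratio test: do the polynomial terms improve fit?
lr = 2 * (poly.llf - lin.llf)
print(lr)   # compare to a chi-square with 2 df; a large value -> nonlinearity
```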

8
Q

Logit best when (in comparison to OLS)

A

Better fit for low base-rate phenomena (10–20%); if the base rate is larger than 50%, use OLS.

9
Q

Polynomial Regression

A

A form of regression analysis in which the relationship between the independent variables and the dependent variable is modeled as an nth-degree polynomial.

Models are usually fit with the method of least squares; under the Gauss–Markov theorem, least squares yields the minimum-variance unbiased coefficient estimates.

Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data with a curvilinear relationship between the dependent and independent variables.
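
A minimal sketch (Python with statsmodels assumed): the model stays linear in the parameters even though it is curvilinear in x.

```python
# Sketch: polynomial regression as linear regression on powers of x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=200)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(size=200)   # quadratic truth

X = sm.add_constant(np.column_stack([x, x**2]))     # powers of x as predictors
res = sm.OLS(y, X).fit()
print(res.params)                                   # roughly [1, 2, -0.5]
```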

10
Q

Polynomial Regression Assumptions

A

The behavior of a dependent variable can be explained by a linear, or curvilinear, additive relationship between the dependent variable and a set of k independent variables (xi, i=1 to k).

The relationship between the dependent variable and any independent variable is linear or curvilinear (specifically polynomial).

The independent variables are independent of each other.

The errors are independent, normally distributed with mean zero and a constant variance (OLS).

11
Q

Negative Binomial

A

A discrete distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures is observed.

Negative binomial regression shares many assumptions with Poisson regression, such as linearity in the model parameters, independence of individual observations, and multiplicative effects of the independent variables.

However, compared with Poisson regression, negative binomial regression allows the conditional variance of the outcome variable to be greater than its conditional mean (overdispersion), which offers greater flexibility in model fitting.
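
A sketch contrasting the two on overdispersed counts (Python with statsmodels assumed; simulated data):

```python
# Sketch: overdispersed counts — Poisson vs. negative binomial GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(1000, 1)))
mu = np.exp(X @ [0.5, 0.4])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))   # variance exceeds the mean

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(pois.aic, nb.aic)   # NB should fit better when counts are overdispersed
```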

12
Q

Maximum Likelihood

A

We look for the parameter values that maximize the probability of our observed data given the curve; maximizing the probability of the data as a function of the parameters is the same as maximizing the likelihood of those parameters.
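
A minimal sketch of MLE by direct optimization (Python with scipy assumed; estimating a normal mean and SD from simulated data):

```python
# Sketch: maximum likelihood by minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical sample

def neg_log_likelihood(params):
    mu, log_sd = params                            # log-sd keeps sd positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sd)))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(fit.x[0], np.exp(fit.x[1]))                  # roughly 5.0 and 2.0
```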

13
Q

MLE Assumptions

A

The i.i.d. assumption, which states that:

1) Data must be independently distributed.
2) Data must be identically distributed.

14
Q

Multinomial logistic Regression

A

Used to predict categorical placement in, or the probability of category membership on, a dependent variable based on multiple independent variables.

The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).
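
A minimal fitting sketch (Python; assumes statsmodels' MNLogit and simulated three-category data):

```python
# Sketch: multinomial logit for a 3-category DV.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = sm.add_constant(rng.normal(size=(600, 2)))
B = np.array([[0.0, 0.5, -0.5],    # intercepts per category (ref = 0)
              [0.0, -0.4, 0.6],    # slopes for x1
              [0.0, 0.3, 0.2]])    # slopes for x2
util = X @ B
y = np.argmax(util + rng.gumbel(size=util.shape), axis=1)  # noisy utilities

mn = sm.MNLogit(y, X).fit(disp=0)
print(mn.summary())   # one coefficient set per non-reference category
```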

15
Q

Multinomial logistic Regression Assumptions

A

Assumption 1- Your dependent variable should be measured at the nominal level, with three or more categories.

Assumption 2- You have one or more independent variables that are continuous, ordinal or nominal (including dichotomous variables). However, ordinal independent variables must be treated as either continuous or categorical.

Assumption 3- You should have independence of observations, and the dependent variable should have mutually exclusive and exhaustive categories (i.e., no individual belongs to two different categories).

Assumption 4- There should be no multicollinearity. Multicollinearity occurs when you have two or more independent variables that are highly correlated with each other.

Assumption 5- There needs to be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.

Assumption 6- There should be no outliers, high leverage values or highly influential points for the scale/continuous variables.

16
Q

Cox Regression model (survival)

A

builds a predictive model for time-to-event data.

produces a survival function that predicts the probability that the event of interest has occurred at a given time t for given values of the predictor variables

information from censored subjects, that is, those that do not experience the event of interest during the time of observation, contributes usefully to the estimation of the model.
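
A minimal fitting sketch (assumes the Python lifelines package; column names hypothetical):

```python
# Sketch: Cox proportional-hazards fit on time-to-event data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(10)
n = 300
df = pd.DataFrame({
    "age": rng.normal(60, 10, size=n),         # hypothetical covariate
    "duration": rng.exponential(5, size=n),    # observed time
    "event": rng.binomial(1, 0.7, size=n),     # 0 = censored subject
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()   # hazard ratios per covariate
```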

17
Q

Cox Regression model (survival) Assumptions

A

1) independence of survival times between distinct individuals in the sample,

2) a multiplicative relationship between the predictors and the hazard (as opposed to the additive, linear relationship in multiple linear regression)

3) a constant hazard ratio over time (the proportional-hazards assumption).

18
Q

Problems with models and solutions

A

Misspecification

Heteroskedasticity

Multicollinearity

Endogeneity

19
Q

Misspecification

A

If we look at the residual distribution and something looks wrong, possible causes:

The DV is not truly dichotomous

Omitted variables

Wrong functional form – linear vs. nonlinear (missing interactions)
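
A RESET-style sketch for detecting a wrong functional form (Python with statsmodels assumed; simulated data):

```python
# Sketch: add powers of the fitted values and see whether they pick up
# signal the linear specification missed (Ramsey RESET idea, by hand).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=400)
y = x + 0.5 * x**2 + rng.normal(size=400)     # truth is nonlinear

base = sm.OLS(y, sm.add_constant(x)).fit()
aug = sm.OLS(y, sm.add_constant(np.column_stack(
    [x, base.fittedvalues**2, base.fittedvalues**3]))).fit()
print(aug.pvalues[2:])   # small p-values on added terms -> misspecification
```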

20
Q

Heteroskedasticity

A

A situation where the variance of the residuals is unequal over the range of measured values. If heteroskedasticity exists, the population used in the regression contains unequal variance, and the analysis results may be invalid. It is diagnosed with a plot of the errors, in which you don't want a pattern; if a pattern appears, it could indicate problems with:

Normality

Outliers

Linearity
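
A sketch of a formal check, the Breusch–Pagan test (Python with statsmodels assumed; simulated data):

```python
# Sketch: Breusch-Pagan test for heteroskedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(12)
x = rng.uniform(1, 10, size=400)
y = 2 * x + rng.normal(scale=x, size=400)   # error spread grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_pvalue)   # small p-value -> reject constant variance
```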

21
Q

Multicollinearity

A

Inflates standard errors and deflates t-values

Correct data with bootstrapping

Principal components – force predictors to be orthogonal

Centering helps

Correlation table – want correlations no higher than 0.8

VIF – reflects the amount of shared variance among the predictors – want values no higher than 10 (see the sketch below)

R squared of auxiliary regressions (each predictor regressed on the others)
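
A VIF sketch (Python with statsmodels assumed; hypothetical data):

```python
# Sketch: VIF for each predictor; values above ~10 flag multicollinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(13)
x1 = rng.normal(size=300)
x2 = x1 + 0.3 * rng.normal(size=300)   # strongly collinear with x1
x3 = rng.normal(size=300)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):          # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```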

22
Q

Endogeneity

A

Omitted variables

Simultaneity

Omitted selection

Common method variance

Measurement error