Regression Flashcards

1
Q

OLS – Assumptions of residuals

A

Normality – residuals should be normally distributed (check with a QQ plot of residuals)

Linearity between the IVs and the DV, and no substantial collinearity among the predictors

Homoscedasticity – the variance of Y does not depend on X (check residuals against X and against the predicted values)

Independence – the error of one case provides no information about the error of another case
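
A minimal sketch of these checks in Python (statsmodels and matplotlib assumed; data and names hypothetical):

```python
# Sketch: visual checks of the OLS residual assumptions.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))    # hypothetical predictors
y = X @ [1.0, 0.5, -0.3] + rng.normal(size=200)   # hypothetical outcome

res = sm.OLS(y, X).fit()

# Normality: points should hug the 45-degree line.
sm.qqplot(res.resid, line="45", fit=True)

# Homoscedasticity: residuals vs. predicted values should show no fan/funnel.
plt.figure()
plt.scatter(res.fittedvalues, res.resid)
plt.axhline(0, color="gray")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()
```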

2
Q

OLS – Suppressor

A

When the relationship between two predictors hides or suppresses their real relationship with Y (Cohen et al., 2003).
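
A toy simulation of suppression (Python with statsmodels assumed; x2 is a hypothetical suppressor unrelated to Y):

```python
# Sketch: classic suppression — x2 is unrelated to y but correlated with
# the irrelevant variance in x1, so adding x2 strengthens x1's estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
x1 = signal + noise            # predictor contaminated with irrelevant variance
x2 = noise                     # suppressor: correlates with x1, not with y
y = signal + rng.normal(size=n)

r1 = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
r2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(r1.params[1], r1.rsquared)   # x1 alone: attenuated slope, lower R^2
print(r2.params[1], r2.rsquared)   # with suppressor: slope near 1, higher R^2
```

With the suppressor included, x1's slope and the model R² both rise, even though x2 by itself predicts nothing.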

3
Q

Spurious Effect

A

Full redundancy – also called a confounding effect; X1 appears related to Y only because X2 causes both X1 and Y

4
Q

Collinearity

A

one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy
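
One way to quantify this is an auxiliary regression of each predictor on the others; the R² from that regression is the quantity behind the VIF. A sketch (Python with statsmodels assumed; hypothetical data):

```python
# Sketch: auxiliary regression — regress one predictor on the rest;
# high R^2 means it is nearly a linear combination of the others.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = 0.7 * x1 + 0.7 * x2 + 0.1 * rng.normal(size=300)  # nearly redundant

aux = sm.OLS(x3, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(aux.rsquared)              # close to 1 -> severe collinearity
print(1 / (1 - aux.rsquared))    # this quantity is x3's VIF
```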

5
Q

Logit regression

A

Binary DV

Used when the distribution of errors is Bernoulli / standard logistic (flatter than the normal)
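
A minimal fitting sketch (Python with statsmodels assumed; simulated data):

```python
# Sketch: logistic regression for a binary DV.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(500, 2)))    # hypothetical predictors
p = 1 / (1 + np.exp(-(X @ [-1.0, 0.8, -0.5])))    # true logistic probabilities
y = rng.binomial(1, p)                            # Bernoulli outcome

logit_res = sm.Logit(y, X).fit()
print(logit_res.summary())
print(np.exp(logit_res.params))                   # coefficients as odds ratios
```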

6
Q

Probit regression

A

Used when the distribution of errors is assumed normal (e.g., the DV is an artificially dichotomized continuous variable)
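
The mechanics mirror the logit sketch, swapping in the normal CDF (Python with statsmodels assumed; simulated data):

```python
# Sketch: probit regression — same setup as logit, but the link is the normal CDF.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, norm.cdf(X @ [-1.0, 0.8, -0.5]))  # normal-CDF probabilities

probit_res = sm.Probit(y, X).fit()
print(probit_res.summary())
```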

7
Q

Logit/probit assumptions

A

Variables are normally distributed

Linear relationship between X and Y (to detect nonlinearity: use theory, examine residual plots, or run a regression with squared/cubic terms – see the sketch after this list)

Measures are reliable
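
A sketch of the squared/cubic-terms check mentioned above (Python with statsmodels assumed; simulated data):

```python
# Sketch: detect nonlinearity by adding squared/cubic terms and testing them.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=600)
p = 1 / (1 + np.exp(-(0.5 * x + 0.6 * x**2)))   # true relation is curved
y = rng.binomial(1, p)

lin = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
poly = sm.Logit(y, sm.add_constant(np.column_stack([x, x**2, x**3]))).fit(disp=0)

# Likelihood-ratio test: do the polynomial terms improve fit?
lr = 2 * (poly.llf - lin.llf)
print(lr)   # compare to a chi-square with 2 df; a large value -> nonlinearity
```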

8
Q

Logit best when (in comparison to OLS)

A

Better fit for low base-rate phenomena (10–20%); if the base rate is larger than 50%, use OLS.

9
Q

Polynomial Regression

A

A form of regression analysis in which the relationship between the independent variables and the dependent variable is modeled as an nth-degree polynomial.

Models are usually fit with the method of least squares; under the Gauss–Markov theorem, least squares yields the minimum-variance unbiased coefficient estimates.

Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data with a curvilinear relationship between the dependent and independent variables.
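
A minimal sketch (Python with statsmodels assumed): the model stays linear in the parameters even though it is curvilinear in x.

```python
# Sketch: polynomial regression as linear regression on powers of x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=200)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(size=200)   # quadratic truth

X = sm.add_constant(np.column_stack([x, x**2]))     # powers of x as predictors
res = sm.OLS(y, X).fit()
print(res.params)                                   # roughly [1, 2, -0.5]
```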

10
Q

Polynomial Regression Assumptions

A

The behavior of a dependent variable can be explained by a linear, or curvilinear, additive relationship between the dependent variable and a set of k independent variables (xi, i=1 to k).

The relationship between the dependent variable and any independent variable is linear or curvilinear (specifically polynomial).

The independent variables are independent of each other.

The errors are independent, normally distributed with mean zero and a constant variance (OLS).

11
Q

Negative Binomial

A

A discrete distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures is observed.

Negative binomial regression shares many assumptions with Poisson regression, such as linearity in the model parameters, independence of individual observations, and multiplicative effects of the independent variables.

However, compared with Poisson regression, negative binomial regression allows the conditional variance of the outcome variable to be greater than its conditional mean (overdispersion), which offers greater flexibility in model fitting.
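
A sketch contrasting the two on overdispersed counts (Python with statsmodels assumed; simulated data):

```python
# Sketch: overdispersed counts — Poisson vs. negative binomial GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(1000, 1)))
mu = np.exp(X @ [0.5, 0.4])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))   # variance exceeds the mean

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(pois.aic, nb.aic)   # NB should fit better when counts are overdispersed
```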

12
Q

Maximum Likelihood

A

We look for the parameter values that maximize the probability of our observed data given the curve; maximizing the probability of the data as a function of the parameters is the same as maximizing the likelihood of those parameters.
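
A minimal sketch of MLE by direct optimization (Python with scipy assumed; estimating a normal mean and SD from simulated data):

```python
# Sketch: maximum likelihood by minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical sample

def neg_log_likelihood(params):
    mu, log_sd = params                            # log-sd keeps sd positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sd)))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(fit.x[0], np.exp(fit.x[1]))                  # roughly 5.0 and 2.0
```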

13
Q

MLE Assumptions

A

The i.i.d. assumption, which states that:

1) Data must be independently distributed.
2) Data must be identically distributed.

14
Q

Multinomial logistic Regression

A

Used to predict categorical placement in, or the probability of category membership on, a dependent variable based on multiple independent variables.

The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).
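
A minimal fitting sketch (Python; assumes statsmodels' MNLogit and simulated three-category data):

```python
# Sketch: multinomial logit for a 3-category DV.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = sm.add_constant(rng.normal(size=(600, 2)))
B = np.array([[0.0, 0.5, -0.5],    # intercepts per category (ref = 0)
              [0.0, -0.4, 0.6],    # slopes for x1
              [0.0, 0.3, 0.2]])    # slopes for x2
util = X @ B
y = np.argmax(util + rng.gumbel(size=util.shape), axis=1)  # noisy utilities

mn = sm.MNLogit(y, X).fit(disp=0)
print(mn.summary())   # one coefficient set per non-reference category
```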

15
Q

Multinomial logistic Regression Assumptions

A

Assumption 1- Your dependent variable should be measured at the nominal level, with three or more categories.

Assumption 2- You have one or more independent variables that are continuous, ordinal or nominal (including dichotomous variables). However, ordinal independent variables must be treated as either continuous or categorical.

Assumption 3- You should have independence of observations, and the dependent variable should have mutually exclusive and exhaustive categories (i.e., no individual belongs to two different categories).

Assumption 4- There should be no multicollinearity. Multicollinearity occurs when you have two or more independent variables that are highly correlated with each other.

Assumption 5- There needs to be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.

Assumption 6- There should be no outliers, high leverage values or highly influential points for the scale/continuous variables.

16
Q

Cox Regression model (survival)

A

builds a predictive model for time-to-event data.

produces a survival function that predicts the probability that the event of interest has occurred at a given time t for given values of the predictor variables

information from censored subjects, that is, those that do not experience the event of interest during the time of observation, contributes usefully to the estimation of the model.
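
A minimal fitting sketch (assumes the Python lifelines package; column names hypothetical):

```python
# Sketch: Cox proportional-hazards fit on time-to-event data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(10)
n = 300
df = pd.DataFrame({
    "age": rng.normal(60, 10, size=n),         # hypothetical covariate
    "duration": rng.exponential(5, size=n),    # observed time
    "event": rng.binomial(1, 0.7, size=n),     # 0 = censored subject
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()   # hazard ratios per covariate
```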

17
Q

Cox Regression model (survival) Assumptions

A

1) independence of survival times between distinct individuals in the sample,

2) a multiplicative relationship between the predictors and the hazard (as opposed to the additive, linear relationship in multiple linear regression)

3) a constant hazard ratio over time (the proportional-hazards assumption).

18
Q

Problems with models and solutions

A

Misspecification

Heteroskedasticity

Multicollinearity

Endogeneity

19
Q

Misspecification

A

If we look at the residual distribution and something looks wrong, possible causes:

The DV is not truly dichotomous

Omitted variables

Wrong functional form – linear vs. nonlinear (missing interactions)
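
A RESET-style sketch for detecting a wrong functional form (Python with statsmodels assumed; simulated data):

```python
# Sketch: add powers of the fitted values and see whether they pick up
# signal the linear specification missed (Ramsey RESET idea, by hand).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=400)
y = x + 0.5 * x**2 + rng.normal(size=400)     # truth is nonlinear

base = sm.OLS(y, sm.add_constant(x)).fit()
aug = sm.OLS(y, sm.add_constant(np.column_stack(
    [x, base.fittedvalues**2, base.fittedvalues**3]))).fit()
print(aug.pvalues[2:])   # small p-values on added terms -> misspecification
```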

20
Q

Heteroskedasticity

A

A situation where the variance of the residuals is unequal over the range of measured values. If heteroskedasticity exists, the population used in the regression contains unequal variance, and the analysis results may be invalid. It is diagnosed with a plot of the errors, in which you don't want a pattern; if a pattern appears, it could indicate problems with:

Normality

Outliers

Linearity
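
A sketch of a formal check, the Breusch–Pagan test (Python with statsmodels assumed; simulated data):

```python
# Sketch: Breusch-Pagan test for heteroskedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(12)
x = rng.uniform(1, 10, size=400)
y = 2 * x + rng.normal(scale=x, size=400)   # error spread grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_pvalue)   # small p-value -> reject constant variance
```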

21
Q

Multicollinearity

A

Inflates standard errors and deflates t-values

Correct data with bootstrapping

Principal components – force predictors to be orthogonal

Centering helps

Correlation table – want correlations no higher than 0.8

VIF – reflects the amount of shared variance among the predictors – want values no higher than 10 (see the sketch below)

R squared of auxiliary regressions (each predictor regressed on the others)
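
A VIF sketch (Python with statsmodels assumed; hypothetical data):

```python
# Sketch: VIF for each predictor; values above ~10 flag multicollinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(13)
x1 = rng.normal(size=300)
x2 = x1 + 0.3 * rng.normal(size=300)   # strongly collinear with x1
x3 = rng.normal(size=300)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):          # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```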

22
Q

Endogeneity

A

Omitted variables

Simultaneity

Omitted selection

Common method variance

Measurement error