Regression Flashcards
OLS- Assumptions of residuals
Normality – distribution of residuals should be normal (QQ plot of residuals)
Linearity between the IVs and the DV, multivariate normality, and no collinearity among predictors
Homoscedasticity – variance of Y does not depend on X (check residuals vs. X and vs. predicted values; see the sketch after this card)
Independence – the error of one case provides no information about the errors of another case
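A minimal Python sketch of these residual checks, using statsmodels on simulated data (all variable names here are hypothetical):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)  # simulated linear data

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Normality: QQ plot of residuals against a normal reference line.
sm.qqplot(fit.resid, line="45", fit=True)

# Homoscedasticity: residuals vs. fitted values should show no fan shape.
plt.figure()
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```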
OLS- Suppressor
when the relationship between two predictors hides or suppresses their real relationship with Y (Cohen et al., 2003)
Spurious Effect
full redundancy – also called a confounding effect; X1 is related to Y only because X2 causes both X1 and Y (see the simulation sketch below)
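A small simulation makes the spurious-effect pattern concrete; a hedged sketch on simulated data (hypothetical variables) in which X2 causes both X1 and Y, so X1 appears related to Y until X2 is controlled:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)   # X1 caused by X2
y = 1.0 * x2 + rng.normal(size=n)    # Y caused by X2 only

# Bivariate regression: X1 looks predictive of Y (spurious).
print(sm.OLS(y, sm.add_constant(x1)).fit().params)

# Adding the confounder X2 drives the X1 coefficient toward zero.
X = sm.add_constant(np.column_stack([x1, x2]))
print(sm.OLS(y, X).fit().params)
```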
Collinearity
one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy
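One common diagnostic is the variance inflation factor (VIF); a minimal sketch with statsmodels on simulated near-collinear predictors (names hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)  # nearly a linear copy of x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

# A VIF above ~10 (some authors use 5) is a common rule of thumb.
for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept's VIF is not meaningful
    print(name, variance_inflation_factor(X.values, i))
```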
Logit regression
binary DV
when the outcome is Bernoulli and the distribution of errors is standard logistic (flatter than the normal); see the combined sketch after the probit card
Probit regression
when the distribution of errors is normal (e.g., the DV is an artificially dichotomized continuous variable)
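A minimal sketch fitting both models to the same simulated binary outcome with statsmodels (variable names hypothetical); the coefficient scales differ because the logistic error distribution is flatter than the normal:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true logistic model
y = rng.binomial(1, p)

X = sm.add_constant(x)
logit_fit = sm.Logit(y, X).fit(disp=0)
probit_fit = sm.Probit(y, X).fit(disp=0)

# Logit estimates run roughly 1.6-1.8 times the probit estimates
# because the two error distributions have different scales.
print(logit_fit.params, probit_fit.params)
```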
Logit/probit assumptions
Errors follow the assumed distribution (standard logistic for logit, normal for probit)
Linear relationship between X and the logit/probit of Y (to detect nonlinearity: use theory, examine residual plots, or test squared/cubic terms; see the sketch after this card)
Measures are reliable
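The squared-term check mentioned above can be sketched as follows (simulated data, hypothetical names): refit the logit with an added x² term and inspect its p-value.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=800)
p = 1 / (1 + np.exp(-(0.3 + 1.0 * x)))   # truly linear in the logit
y = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([x, x ** 2]))
fit = sm.Logit(y, X).fit(disp=0)

# A non-significant x^2 term is consistent with linearity in the logit.
print(fit.pvalues[2])
```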
Logit best when (in comparison to OLS)
Better fit for low base-rate phenomena (10–20%); at base rates nearer 50%, OLS approximates logit well (see the sketch below)
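A hedged illustration of the base-rate point on simulated data (hypothetical setup): with a rare outcome, an OLS linear probability model can produce fitted "probabilities" below zero, while logit predictions stay in (0, 1).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=2000)
p = 1 / (1 + np.exp(-(-2.5 + 1.0 * x)))   # base rate around 10%
y = rng.binomial(1, p)
X = sm.add_constant(x)

ols_pred = sm.OLS(y, X).fit().fittedvalues
logit_pred = sm.Logit(y, X).fit(disp=0).predict(X)

print("base rate:", y.mean())
print("OLS predictions below 0:", (ols_pred < 0).sum())
print("logit predictions below 0:", (logit_pred < 0).sum())  # always 0
```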
Polynomial Regression
a form of regression analysis in which the relationship between the independent variables and the dependent variable is modeled as an nth-degree polynomial.
models are usually fit with the method of least squares; by the Gauss–Markov theorem, least squares yields the minimum-variance estimates among linear unbiased estimators of the coefficients.
Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data with a curvilinear relationship between the dependent and independent variables (see the sketch below).
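A minimal sketch of polynomial regression as OLS on expanded features (simulated data, hypothetical names); note the model is still linear in the coefficients even though it is curvilinear in x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=300)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + rng.normal(size=300)

# "Linear" regression on polynomial terms: linear in the coefficients.
X = sm.add_constant(np.column_stack([x, x ** 2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # recovers roughly (1.0, 0.5, -0.8)
```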
Polynomial Regression Assumptions
The behavior of a dependent variable can be explained by a linear, or curvilinear, additive relationship between the dependent variable and a set of k independent variables (xi, i=1 to k).
The relationship between the dependent variable and any independent variable is linear or curvilinear (specifically polynomial).
The independent variables are independent of each other.
The errors are independent, normally distributed with mean zero and a constant variance (OLS).
Negative Binomial
a discrete distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures occurs
shares many common assumptions with Poisson regression, such as linearity in model parameters, independence of individual observations, and the multiplicative effects of independent variables.
However, compared with Poisson regression, negative binomial regression allows the conditional variance of the outcome variable to be greater than its conditional mean (overdispersion), which offers greater flexibility in model fitting (see the sketch below)
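Poisson regression models counts via a log link with conditional variance equal to the mean; negative binomial adds a dispersion parameter. A minimal comparison on simulated overdispersed counts (hypothetical setup; the NB dispersion is held at the statsmodels family default rather than estimated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
mu = np.exp(0.5 + 0.7 * x)
# Gamma-mixed Poisson draws are negative binomial (overdispersed).
y = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=1000))

X = sm.add_constant(x)
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# Family default alpha; sm.NegativeBinomial would estimate it instead.
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()

# The NB fit should typically show a better (higher) log-likelihood here.
print(pois.llf, nb.llf)
```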
Maximum Likelihood
looking for the curve that maximizes the probability of our data given a set of curve parameters; the probability of the data, viewed as a function of the parameters, is the likelihood, so maximizing one maximizes the other (see the sketch below).
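A minimal from-scratch sketch of the idea (assumed normal data; names hypothetical): write the negative log-likelihood and minimize it numerically.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params):
    mu, log_sigma = params          # log-sigma keeps sigma positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                   - (data - mu) ** 2 / (2 * sigma ** 2))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))   # close to (5.0, 2.0)
```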
MLE Assumptions
The i.i.d. assumption states that:
1) Data must be independently distributed.
2) Data must be identically distributed.
Multinomial logistic Regression
used to predict categorical placement in or the probability of category membership on a dependent variable based on multiple independent variables.
The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).
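A minimal sketch with statsmodels' MNLogit on a simulated three-category outcome (hypothetical names); category 0 serves as the reference:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 1500
x = rng.normal(size=n)
# Category utilities; category 0 is the reference.
u = np.column_stack([np.zeros(n), 0.5 + 1.0 * x, -0.5 - 1.0 * x])
p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=row) for row in p])

X = sm.add_constant(x)
fit = sm.MNLogit(y, X).fit(disp=0)
print(fit.params)   # one coefficient column per non-reference category
```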
Multinomial logistic Regression Assumptions
Assumption 1- Your dependent variable should be measured at the nominal level, with three or more categories.
Assumption 2- You have one or more independent variables that are continuous, ordinal or nominal (including dichotomous variables); ordinal independent variables must be treated as either continuous or categorical.
Assumption 3- You should have independence of observations, and the dependent variable should have mutually exclusive and exhaustive categories (i.e., no individual belongs to two different categories).
Assumption 4- There should be no multicollinearity. Multicollinearity occurs when you have two or more independent variables that are highly correlated with each other.
Assumption 5- There needs to be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.
Assumption 6- There should be no outliers, high leverage values or highly influential points for the scale/continuous variables.