Linear Regression Models Flashcards
What are some recommendations when selecting control variables?
- When in doubt, leave them out!
- Select conceptually meaningful CVs and avoid proxies
- When feasible, include CVs in hypotheses and models
- Clearly justify the measures of CVs and the methods of control.
- Subject CVs to the same standards of reliability and validity as are applied to other variables.
- If the hypotheses do not include CVs, do not include CVs in the analysis
- Conduct comparative tests of relationships between the IVs and CVs.
- Run results with and without the CVs and contrast the findings.
- Report standard descriptive statistics and correlations for CVs, and the correlations between the measured predictors and their partialled counterparts.
- Be cautious when generalizing results involving residual variables.
What is regression analysis?
Regression analyses are a set of statistical techniques that allow one to assess the relationship between one dependent variable (DV) and several independent variables (IVs).
What does a parameter estimate explain?
Parameter estimates in multiple regression are the unstandardized regression coefficients (B weights). The B weight for a particular IV represents the change in the DV associated with a one-unit change in that IV, with all other IVs held constant.
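A minimal sketch of this interpretation, using hypothetical synthetic data and plain numpy least squares (the variable names and generating equation are illustrative, not from the source): when the DV is built as y = 1 + 2·x1 + 3·x2 with no noise, the recovered B weights are exactly those per-unit changes.

```python
import numpy as np

# Hypothetical data: DV generated as y = 1 + 2*x1 + 3*x2 (no noise),
# so the estimated B weights should recover these values exactly.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1 + 2 * x1 + 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept column + IVs
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # unstandardized B weights

print(b)  # ~ [1., 2., 3.]: each B is the change in y per one-unit change in that IV
```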
What is one important limitation to Regression Analyses ?
Regression analyses reveal relationships among variables but do not imply that the relationships are causal.
An apparently strong relationship between variables could stem from many sources, including the influence of other, currently unmeasured variables.
What are some assumptions of multiple regression?
- Linearity: the population model is linear in its parameters.
- Random sampling
- No perfect collinearity/ multicollinearity
- Zero conditional mean of error: violations can be caused by misspecifying relationships or by omitting important variables correlated with the IVs.
- Homoscedasticity: the error term must have the same variance at every value of the IVs.
- Normality
- No extreme outliers (influential cases)
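One assumption above, no perfect multicollinearity, is commonly screened with variance inflation factors. This is a numpy-only sketch under assumed synthetic data; the helper `vif` and the rule of thumb that values far above 1 signal collinearity are illustrative, not from the source. VIF for IV j is 1/(1 - R²_j), where R²_j comes from regressing IV j on the other IVs.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (IVs only, no intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1 - resid.var() / X[:, j].var()   # R^2 of IV j on the other IVs
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)              # independent of the others
v = vif(np.column_stack([x1, x2, x3]))
print(v)  # large VIFs for x1 and x2, near 1 for x3
```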
What does the assumption of homoscedasticity mean?
It is the assumption that the standard deviations of errors of prediction are approximately equal for all predicted DV scores.
Homoscedasticity means that the band enclosing the residuals is approximately equal in width at all values of the predicted DV.
Heteroscedasticity may occur when some of the variables are skewed and others are not.
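A rough numeric version of the "equal band width" check, assuming hypothetical data with constant error variance (all names and the split-at-the-median comparison are illustrative): residual spread is compared between low and high predicted values.

```python
import numpy as np

# Sketch: compare residual spread across low vs. high predicted DV values.
# Roughly equal standard deviations are consistent with homoscedasticity.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=500)
y = 2 * x + rng.normal(scale=1.0, size=500)   # constant error variance by construction

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ b
resid = y - pred

low = resid[pred <= np.median(pred)]
high = resid[pred > np.median(pred)]
print(low.std(), high.std())  # similar widths -> no sign of heteroscedasticity
```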
Mention the Major Types of Multiple Regression.
There are three major analytic strategies in multiple regression: standard multiple regression, sequential (hierarchical) regression, and statistical (stepwise) regression.
Differences among the strategies involve what happens to overlapping variability due to correlated IVs and who determines the order of entry of IVs into the equation.
Explain Standard Multiple Regression
In the standard, or simultaneous, model, all IVs enter the regression equation at once.
Each IV is evaluated in terms of what it adds to prediction of the DV that is different from the predictability afforded by all the other IVs.
Explain Sequential Multiple Regression
In sequential regression (hierarchical regression), IVs enter the equation in an order specified by the researcher.
Each IV is assessed in terms of what it adds to the equation at its own point of entry.
The researcher normally assigns order of entry of variables according to logical or theoretical considerations. Variables with greater theoretical importance could also be given early entry.
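A sketch of the sequential idea with numpy on hypothetical data (the "control first, focal predictor second" scenario and the helper `r_squared` are assumptions for illustration): each step's contribution is the increment in R² at its point of entry.

```python
import numpy as np

def r_squared(X_ivs, y):
    """R^2 from regressing y on the given IVs (intercept added automatically)."""
    A = np.column_stack([np.ones(len(y)), X_ivs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(3)
n = 300
control = rng.normal(size=(n, 1))   # entered at Step 1 (e.g., a control variable)
focal = rng.normal(size=(n, 1))     # entered at Step 2 (the focal predictor)
y = 0.5 * control[:, 0] + 0.8 * focal[:, 0] + rng.normal(size=n)

r2_step1 = r_squared(control, y)
r2_step2 = r_squared(np.hstack([control, focal]), y)
print(r2_step2 - r2_step1)  # delta R^2: what the focal IV adds at its point of entry
```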
Explain Statistical (Stepwise) Regression
Statistical regression (stepwise regression) is a procedure in which the order of entry of variables is based solely on statistical criteria. It is typically used to develop a subset of IVs that is useful in predicting the DV and to eliminate those IVs that do not add prediction beyond the IVs already in the equation.
There are three versions of statistical regression: forward selection, backward deletion, and stepwise regression.
In forward selection, the equation starts out empty and IVs are added one at a time, provided they meet the statistical criteria for entry; once in the equation, an IV stays in. Selection often starts with the IV that has the highest simple correlation with the DV.
In backward deletion, the equation starts out with all IVs entered, and they are deleted one at a time if they do not contribute significantly to the regression.
Stepwise regression combines the two: IVs are added as in forward selection, but variables already in the equation can also be removed if they become the least useful in explaining variance.
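A toy forward-selection sketch in numpy. The entry criterion here is a minimum R² gain rather than a significance test, and the data, threshold, and helper names are assumptions for illustration only; real stepwise procedures use F-to-enter/F-to-remove criteria.

```python
import numpy as np

def r2(cols, y):
    """R^2 from regressing y on the given list of IV columns (intercept added)."""
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ coef).var() / y.var()

def forward_select(X, y, min_gain=0.01):
    """Repeatedly add the IV giving the largest R^2 gain until no gain exceeds min_gain."""
    chosen, remaining, current = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = []
        for j in remaining:
            cols = [X[:, c] for c in chosen + [j]]
            gains.append((r2(cols, y) - current, j))
        best_gain, best_j = max(gains)
        if best_gain < min_gain:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current += best_gain
    return chosen

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 2] + 0.7 * X[:, 0] + rng.normal(size=200)  # columns 1 and 3 are noise
sel = forward_select(X, y)
print(sel)  # column 2 enters first (strongest), then column 0
```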
What are the reasons for choosing the different types of regression?
To simply assess relationships among variables and answer the basic question of multiple correlation, the method of choice is the standard multiple regression.
Reasons for using sequential regression are theoretical or for testing hypotheses. It allows the researcher to control the advancement of the regression process.
Statistical regression is a model-building rather than model-testing procedure. As an exploratory technique, it may be useful for such purposes as eliminating variables that are clearly superfluous to tighten up future research.
When should you use centering in multiple regression?
When interactions of IVs are included in the prediction equation, they can cause problems of multicollinearity unless the IVs have been centered: converted to deviation scores so that each variable has a mean of zero.
Centering an IV does not affect its simple correlation with other variables, but it does affect regression coefficients for interactions or powers of IVs included in the regression equation.
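A quick numeric illustration of both points, with hypothetical data (the means and sample size are arbitrary): the raw product term x·z is strongly correlated with x when x and z have non-zero means, while the centered product is not.

```python
import numpy as np

# Sketch: the product term x*z overlaps heavily with x when x and z have
# non-zero means; centering both variables to mean zero reduces that overlap.
rng = np.random.default_rng(5)
x = rng.normal(loc=5.0, size=1000)
z = rng.normal(loc=5.0, size=1000)

raw_corr = np.corrcoef(x, x * z)[0, 1]

xc, zc = x - x.mean(), z - z.mean()              # deviation scores, mean zero
centered_corr = np.corrcoef(xc, xc * zc)[0, 1]

print(raw_corr, centered_corr)  # centered correlation is much closer to zero
```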
What is a type I error?
We reject H0 although it is actually true.
What is a type II error?
The error that occurs when one fails to reject a null hypothesis that is actually false.
What is used for goodness of fit test in multiple regression?
R^2 and adjusted R^2.
Adjusted R^2 is generally preferred because it penalizes the model for each added IV.
Both indicate how much of the variance in the DV is explained by the model.
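The penalty can be shown with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1); the synthetic all-noise IVs below are an illustrative assumption. Plain R² is inflated above zero even when the IVs carry no information, while the adjusted version corrects downward.

```python
import numpy as np

def fit_r2(X, y):
    """Plain R^2 from regressing y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ coef).var() / y.var()

rng = np.random.default_rng(6)
n, k = 60, 5
X = rng.normal(size=(n, k))                  # all IVs are pure noise
y = rng.normal(size=n)

r2_plain = fit_r2(X, y)
adj = 1 - (1 - r2_plain) * (n - 1) / (n - k - 1)  # penalizes each added IV

print(r2_plain, adj)  # plain R^2 > 0 even for noise IVs; adjusted R^2 is smaller
```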