Regression, Simple, Multiple, & Logistic (Fields Ch. 7 & 8) Flashcards
Adjusted R²
a measure of the loss of predictive power or shrinkage in regression. The adjusted R² tells us how much variance in the outcome would be accounted for if the model had been derived from the population from which the sample was taken.
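The adjustment that SPSS reports (Wherry's formula) can be computed by hand; a minimal sketch in Python, with made-up numbers:

```python
# Adjusted R-squared via the Wherry formula (the one SPSS reports).
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """n = sample size, k = number of predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(r_squared=0.35, n=100, k=3))  # ~0.33, slightly below 0.35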
Autocorrelation
when the residuals of two observations in a regression model are correlated.
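A common diagnostic for this is the Durbin–Watson statistic; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 + 0.5 * x + rng.normal(size=200)   # invented data

model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))        # values near 2 suggest independent residuals
```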
bᵢ
unstandardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in the units of measurement of the predictor. It is the change in the outcome associated with a unit change in the predictor.
βᵢ
standardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in a standardized form. It is the change in the outcome (in standard deviations) associated with a one standard deviation change in the predictor.
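The relationship between bᵢ and βᵢ can be seen by fitting the same model on raw and on z-scored variables; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(10, 2, size=300)
y = 5 + 1.5 * x + rng.normal(0, 3, size=300)       # invented data

# Unstandardized b: change in y per one-unit change in x
b = sm.OLS(y, sm.add_constant(x)).fit().params[1]

# Standardized beta: change in y (in SDs) per one-SD change in x
zx, zy = (x - x.mean()) / x.std(), (y - y.mean()) / y.std()
beta = sm.OLS(zy, sm.add_constant(zx)).fit().params[1]
print(b, beta)
```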
Cook’s distance
a measure of the overall influence of a case on a model. Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
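These distances can be pulled from a fitted model's influence diagnostics; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1 + x + rng.normal(size=100)           # invented data

results = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = results.get_influence().cooks_distance
print(np.where(cooks_d > 1)[0])            # cases exceeding the suggested cut-off of 1
```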
Cross-validation
assessing the accuracy of a model across different samples. This is an important step in generalization. In a regression model there are two main methods of cross-validation: adjusted R² or data splitting, in which the data are split randomly into two halves, and a regression model is estimated for each half and then compared.
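The data-splitting method looks like the following in outline; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 3 + 2 * x + rng.normal(size=200)       # invented data

idx = rng.permutation(200)                 # split the data randomly into two halves
for half in (idx[:100], idx[100:]):
    fit = sm.OLS(y[half], sm.add_constant(x[half])).fit()
    print(fit.params, fit.rsquared)        # compare coefficients and fit across halves
```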
Dummy variables
a way of recoding a categorical variable with more than two categories into a series of variables all of which are dichotomous and can take on values of only 0 or 1. There are seven basic steps to create such variables: (1) count the number of groups you want to recode and subtract 1; (2) create as many new variables as the value you calculated in step 1 (these are your dummy variables); (3) choose one of your groups as a baseline (i.e., a group against which all other groups should be compared, such as a control group); (4) assign that baseline group values of 0 for all of your dummy variables; (5) for your first dummy variable, assign the value 1 to the first group that you want to compare against the baseline group (assign all other groups 0 for this variable); (6) for the second dummy variable assign the value 1 to the second group that you want to compare against the baseline group (assign all other groups 0 for this variable); (7) repeat this process until you run out of dummy variables.
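The seven steps map directly onto pandas' get_dummies; a minimal sketch in which the group labels are invented and 'control' is chosen as the baseline:

```python
import pandas as pd

df = pd.DataFrame({"group": ["control", "drugA", "drugB",
                             "control", "drugB"]})     # invented data

# 3 groups - 1 = 2 dummy variables (steps 1-2); dropping the
# 'control' column makes it the baseline, coded 0 on both
# remaining dummies (steps 3-7).
dummies = pd.get_dummies(df["group"], prefix="group", dtype=int)
dummies = dummies.drop(columns="group_control")
print(pd.concat([df, dummies], axis=1))
```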
F-ratio
a test statistic with a known probability distribution (the F-distribution). It is the ratio of the average variability in the data that a given model can explain to the average variability unexplained by that same model. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments.
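In regression this ratio is the model mean square over the residual mean square; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=150)
y = 1 + 0.8 * x + rng.normal(size=150)              # invented data

results = sm.OLS(y, sm.add_constant(x)).fit()
f_by_hand = results.mse_model / results.mse_resid   # explained / unexplained
print(f_by_hand, results.fvalue)                    # the two should match
```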
Generalization
the ability of a statistical model to say something beyond the set of observations that spawned it. If a model generalizes it is assumed that predictions from that model can be applied not just to the sample on which it is based, but to a wider population from which the sample came.
Goodness of fit
an index of how well a model fits the data from which it was generated. It’s usually based on how well the data predicted by the model correspond to the data that were actually collected.
Heteroscedasticity
the opposite of homoscedasticity. This occurs when the residuals at each level of the predictor variable(s) have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different.
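One common (though not the only) formal check is the Breusch–Pagan test; a minimal sketch assuming statsmodels, with invented data whose spread grows with the predictor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=200)
y = x + rng.normal(scale=x, size=200)      # spread of errors grows with x

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(lm_pvalue)                           # a small p-value suggests heteroscedasticity
```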
Hierarchical regression
a method of multiple regression in which the order in which predictors are entered into the regression model is determined by the researcher based on previous research: variables already known to be predictors are entered first, new variables are entered subsequently.
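A two-step entry can be mimicked by fitting nested models and testing whether the second step improves fit; a minimal sketch assuming statsmodels, with invented variable names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"known": rng.normal(size=200),
                   "extra": rng.normal(size=200)})     # invented data
df["outcome"] = 1 + 2 * df["known"] + 0.5 * df["extra"] + rng.normal(size=200)

step1 = smf.ols("outcome ~ known", data=df).fit()          # known predictor first
step2 = smf.ols("outcome ~ known + extra", data=df).fit()  # new variable second
print(sm.stats.anova_lm(step1, step2))     # does the new variable improve the model?
```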
Homoscedasticity
an assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. Put another way, at each point along any predictor variable, the spread of residuals should be fairly constant.
Independent errors
for any two observations in regression the residuals should be uncorrelated (or independent).
Mean squares
a measure of average variability. For every sum of squares (which measures the total variability) it is possible to create mean squares by dividing by the number of things used to calculate the sum of squares (or some function of it).
Model sum of squares
a measure of the total amount of variability for which a model can account. It is the difference between the total sum of squares and the residual sum of squares.
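The decomposition can be verified by hand; a minimal sketch in plain NumPy, with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 2 + x + rng.normal(size=100)           # invented data

slope, intercept = np.polyfit(x, y, 1)     # least-squares fit
resid = y - (intercept + slope * x)
ss_total = np.sum((y - y.mean()) ** 2)     # total sum of squares
ss_resid = np.sum(resid ** 2)              # residual sum of squares
ss_model = ss_total - ss_resid             # model sum of squares
print(ss_model)
```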
Multicollinearity
a situation in which two or more variables are very closely linearly related.
· To check for multicollinearity, use the VIF values from the table labelled Coefficients in the SPSS output.
· If these values are less than 10, then there probably isn’t cause for concern.
· If you take the average of VIF values, and it is not substantially greater than 1, then there’s also no cause for concern.
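Outside SPSS, the same VIF check can be run in Python; a minimal sketch assuming statsmodels, with invented and deliberately collinear data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(vifs, np.mean(vifs))   # values over 10, or a mean well above 1, are a worry
```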
Multiple R
the multiple correlation coefficient. It is the correlation between the observed values of an outcome and the values of the outcome predicted by a multiple regression model.
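This identity (multiple R is the correlation between observed and predicted outcomes, and R² is its square) can be checked directly; a minimal sketch assuming statsmodels, with invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 2))
y = 1 + X @ np.array([1.0, -0.5]) + rng.normal(size=200)   # invented data

results = sm.OLS(y, sm.add_constant(X)).fit()
multiple_r = np.corrcoef(y, results.fittedvalues)[0, 1]    # observed vs predicted
print(multiple_r, np.sqrt(results.rsquared))               # the two should match
```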
Multiple regression
an extension of simple regression in which an outcome is predicted by a linear combination of two or more predictor variables. The form of the model is: Yᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ + … + bₙXₙᵢ + εᵢ, in which the outcome is denoted as Y, and each predictor is denoted as X. Each predictor has a regression coefficient b associated with it, and b₀ is the value of the outcome when all predictors are zero.
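A model of exactly this form can be fitted with statsmodels' formula interface; a minimal sketch with invented data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
df = pd.DataFrame({"X1": rng.normal(size=200),
                   "X2": rng.normal(size=200)})            # invented data
df["Y"] = 2 + 1.5 * df["X1"] - 0.7 * df["X2"] + rng.normal(size=200)

results = smf.ols("Y ~ X1 + X2", data=df).fit()
print(results.params)      # b0 (Intercept), b1, b2
```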
Ordinary least squares (OLS)
a method of regression in which the parameters of the model are estimated using the method of least squares.
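The least-squares estimates have a closed form, b = (X'X)⁻¹X'y; a minimal sketch in plain NumPy, with invented data:

```python
import numpy as np

rng = np.random.default_rng(11)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=100)  # invented data

# Minimizing the sum of squared residuals gives b = (X'X)^-1 X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)       # should be near the true values 2, 1, -0.5
```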
Outcome variable
a variable whose values we are trying to predict from one or more predictor variables.
Perfect collinearity
exists when at least one predictor in a regression model is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated: they have a correlation coefficient of 1).
Predicted value
the value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.
Predictor variable
a variable that is used to try to predict values of another variable known as an outcome variable.