Chapter 8: Advanced Correlational Strategies Flashcards
Regression Equation
An equation that provides a mathematical description of how variables are related; allows us to predict scores on one variable based on one or more other variables.
Linear Regression
Involves 1 predictor variable.
Multiple Regression
Involves 2+ predictor variables.
Linear Regression Equation
An equation that defines the straight line that best represents the linear relationship between two variables.
y = β₀ + β₁x
y = criterion (dependent/outcome) variable
x = predictor variable
β₀ = regression constant (y-intercept)
β₁ = regression coefficient (slope)
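As a concrete sketch, the equation is just a line: plug a predictor score into β₀ + β₁x to get a predicted criterion score. The coefficient values below are made up for illustration.

```python
# Hypothetical coefficients: intercept b0 = 2.0, slope b1 = 0.5 (illustrative only)
b0, b1 = 2.0, 0.5

def predict(x):
    """Predicted criterion score for a given predictor score x."""
    return b0 + b1 * x

print(predict(10))  # 2.0 + 0.5 * 10 = 7.0
```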
b
The unstandardized regression coefficient; you can interpret b as the predicted change in y for a one-unit change in x (on the original scales of each variable).
β
The standardized regression coefficient; the predicted change in y (in standard deviations) for a one standard deviation change in x (independent of the original scales of the variables); allows you to compare the size of different slopes.
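A minimal sketch of the link between b and β, using made-up data: the standardized coefficient simply rescales b by the ratio of the two standard deviations.

```python
from statistics import mean, stdev

# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
mx, my = mean(x), mean(y)

# Unstandardized slope: b = cov(x, y) / var(x)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
b = cov / stdev(x) ** 2

# Standardized slope: rescale b by the ratio of standard deviations
beta = b * stdev(x) / stdev(y)
```

With one predictor, β equals Pearson's r, which is why it is scale-free and comparable across slopes.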
Residual
The vertical distance of each data point from the regression line (y - ŷ); residuals represent unexplained error.
“Least Squares Criterion”
Choose the line that minimizes the sum of the squared residuals, Σ(y - ŷ)² (i.e. minimizes total squared "residual error").
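A sketch of both cards above with made-up data: compute the residuals y − ŷ, and note that the closed-form least-squares estimates b₁ = Σ(x−x̄)(y−ȳ)/Σ(x−x̄)² and b₀ = ȳ − b₁x̄ give a smaller sum of squared residuals than any nearby line.

```python
from statistics import mean

# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
mx, my = mean(x), mean(y)

# Closed-form least-squares estimates
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Residuals y - ŷ; for a least-squares line these always sum to zero
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(a, b):
    """Sum of squared residuals for the candidate line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
```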
Three Types of Multiple Regression
- Standard (or Simultaneous)
- Stepwise
- Hierarchical
Standard (Simultaneous) Multiple Regression
All of the predictor variables are entered into the regression analysis at the same time. The resulting equation provides a regression constant (β₀ or intercept) and separate regression coefficients for each predictor (β₁, β₂, β₃, …).
Stepwise Multiple Regression
Builds the regression equation by entering predictor variables one at a time based on their ability to predict the outcome variable. Each step looks at unique associations (unique variance “above and beyond” the other predictors).
Hierarchical Multiple Regression
The predictor variables are entered into the equation in an order that is predetermined by the researcher. As each new variable is entered into the equation, the researcher tests whether the new variable significantly predicts unique variance in the criterion variable. Can be used to control for confounding variables, to test interactions with continuous variables (moderation), and to test for mediation.
R
The multiple correlation coefficient; it describes the degree of the relationship between the criterion variable and the set of ALL predictor variables. The larger the value of R, the better job the regression equation does of predicting the criterion variable from the predictor variables.
R²
The proportion of variance in the criterion variable that can be accounted for by the set of all predictor variables.
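R² can be computed as one minus the ratio of unexplained to total variability. A sketch with made-up data (b₀ and b₁ are the least-squares estimates for these numbers):

```python
# Made-up data and its least-squares line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
b0, b1 = 1.8, 0.8

y_hat = [b0 + b1 * xi for xi in x]
my = sum(y) / len(y)

ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variability
ss_tot = sum((yi - my) ** 2 for yi in y)                  # total variability
r_squared = 1 - ss_res / ss_tot                           # proportion explained
```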
How do you control for confounding variables using hierarchical multiple regression (steps)?
Step 1: Enter the “confound” or control variable.
Step 2: Enter the predictor you are interested in to test its unique effects, over and above the control variable.
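The two-step logic can be sketched in plain Python (the data and variable names are made up; real analyses would use a statistics package). A tiny OLS helper fits each step, and the increase in R² from Step 1 to Step 2 is the focal predictor's unique contribution over and above the control variable.

```python
def ols_r2(y, predictors):
    """R-squared from an OLS fit of y on an intercept plus the given predictor columns.
    Solves the normal equations (X'X)b = X'y by Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    for i in range(k):  # forward elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            b[r] -= f * b[i]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    y_hat = [sum(cj * xj for cj, xj in zip(coef, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: a control variable, a focal predictor, and an outcome
control = [1, 2, 3, 4]
focal = [2, 1, 4, 3]
outcome = [5, 4, 11, 10]

r2_step1 = ols_r2(outcome, [control])         # Step 1: control variable only
r2_step2 = ols_r2(outcome, [control, focal])  # Step 2: add the focal predictor
delta_r2 = r2_step2 - r2_step1                # unique variance explained by the focal predictor
```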
Main Effect
Individual effect due to one variable (e.g. does exercise intensity influence sleep?).
Interaction
Combined effect of two variables (e.g. does the effect of exercise intensity on sleep differ depending on time of day?). An interaction means that the effect of one variable depends on the value of another variable.
How do you test interactions using hierarchical multiple regression?
- Center your predictor variables.
- Calculate an “interaction term” (multiply the two centered predictor variables).
- Conduct a hierarchical multiple regression:
- Step 1: Enter the two centered predictor variables.
- Step 2: Enter the interaction term.
- Interpret the results at Step 2 (two main effects, plus the interaction).
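The first two bullets can be sketched with made-up scores; fitting Steps 1 and 2 would then proceed as in any hierarchical multiple regression.

```python
from statistics import mean

# Hypothetical predictors: exercise intensity and time of day (coded numerically)
intensity = [1, 2, 3, 4, 5]
time_of_day = [3, 1, 4, 2, 5]

# Center each predictor at its mean (centered scores sum to zero)
c_int = [v - mean(intensity) for v in intensity]
c_time = [v - mean(time_of_day) for v in time_of_day]

# Interaction term = product of the two centered predictors
interaction = [a * b for a, b in zip(c_int, c_time)]

# Step 1 would enter c_int and c_time; Step 2 adds the interaction term
```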
Moderation
An interaction can also be called moderation (one variable moderates the effect of the other).
Mediation
The association between a predictor and outcome variable can be accounted for or explained by another variable.
Step 1: Show that the predictor predicts the outcome.
Step 2: Add the proposed mediator; if the predictor's effect shrinks (partial mediation) or disappears (full mediation), the mediator helps explain the association.
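A sketch of the two steps with made-up, perfectly mediated data (x drives m, and m alone drives y). "Controlling for m" is done here by residualizing both x and y on m (the Frisch–Waugh trick), which yields the same direct-effect slope as entering both predictors in a multiple regression.

```python
from statistics import mean

def slope(y, x):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def resid(y, x):
    """Residuals of y after regressing y on x."""
    b1 = slope(y, x)
    b0 = mean(y) - b1 * mean(x)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical fully mediated chain: x -> m -> y
x = [1, 2, 3, 4, 5]
m = [2, 3, 5, 6, 9]            # mediator, driven (imperfectly) by x
y = [3 * mi for mi in m]       # outcome, driven only by the mediator

total_effect = slope(y, x)                       # Step 1: x predicts y
direct_effect = slope(resid(y, m), resid(x, m))  # Step 2: x's effect with m controlled
# Full mediation: the direct effect drops to zero once m is controlled
```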
Cross-Lagged Panel Design
In a cross-lagged panel design, the correlation between two variables is calculated at two different points in time.
- Correlate the scores on x at T1 with y at T2
- Correlate the scores on y at T1 with x at T2.
If x causes y, then the correlation between x at T1 and y at T2 should be larger than the correlation between y at T1 and x at T2.
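The comparison above amounts to computing two Pearson correlations from panel data. A sketch with made-up scores in which x at T1 tracks y at T2 closely, but not the reverse:

```python
from statistics import mean, stdev

def pearson(a, b):
    """Pearson correlation between two score lists."""
    ma, mb = mean(a), mean(b)
    n = len(a)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)
    return cov / (stdev(a) * stdev(b))

# Hypothetical panel data: x and y each measured at Time 1 and Time 2
x_t1 = [1, 2, 3, 4, 5]
y_t2 = [2, 3, 5, 5, 7]   # closely tracks earlier x
y_t1 = [3, 1, 4, 2, 5]
x_t2 = [2, 4, 3, 5, 4]   # unrelated to earlier y

r_x1_y2 = pearson(x_t1, y_t2)  # x leading y
r_y1_x2 = pearson(y_t1, x_t2)  # y leading x
# The pattern r_x1_y2 > r_y1_x2 is consistent with x causing y
```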
Structural Equation Modeling (SEM)
In structural equation modeling, the researcher makes a prediction regarding how a set of variables are causally related. This prediction implies that the variables ought to be correlated in a particular pattern. This predicted pattern is then compared to the actual pattern of correlations.
Fit Index
In structural equation modeling, the fit index indicates how well the hypothesized model fits the observed data. If the hypothesized model does not adequately fit the data, we can conclude that the model is not likely to be correct. By comparing fit indices for various models, the researcher can determine which model fits the data the best.
Factor Analysis
A class of statistical techniques that are used to analyze the interrelationships among a large number of variables. The presence of correlations among several variables suggests that the variables may all be related to some underlying factors.
Uses of Factor Analysis
- To study the underlying structure of psychological constructs.
- To reduce a large number of variables to a smaller, more manageable set of data.
- To confirm that self-report measures of attitude and personality are unidimensional (measure only one thing).
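A rough illustration of the "underlying factor" idea (not a full factor analysis): for a made-up correlation matrix of three items that all correlate highly, the dominant eigenvalue is well above 1 (Kaiser's greater-than-1 rule), which is consistent with a single underlying factor.

```python
# Hypothetical correlation matrix for three test items that all intercorrelate highly
R = [[1.0, 0.8, 0.7],
     [0.8, 1.0, 0.6],
     [0.7, 0.6, 1.0]]

def largest_eigenvalue(M, iters=200):
    """Power iteration: dominant eigenvalue of a symmetric matrix."""
    k = len(M)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = max(abs(c) for c in w)
        v = [c / norm for c in w]
    # Rayleigh quotient at the converged vector
    w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)

ev1 = largest_eigenvalue(R)
# One eigenvalue much greater than 1, with the rest small, suggests one factor
```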