validity part 2 Flashcards
explain incremental validity
Does using this test improve upon other tests that we are already using?
- Yes, if the correlation between the criterion and the new test is significantly greater in magnitude than the correlations between the criterion and the other tests
Does adding this test to a battery of tests that we are already using allow us to predict the criterion more accurately?
- Use hierarchical multiple regression
what is the regression formula
Allows us to estimate the score on a dependent variable (Y) from an independent variable (X)
Yest = BX + C
- B = “beta weight” (how much to weight the score on the independent variable; the number of units Y increases for each unit increase in X)
- C = constant (the value of the dependent variable when the independent variable is 0).
NOTE: The dependent variable (Y) is the variable we want to predict from the independent variable (X)
B = slope of the line; C = y-axis intercept
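As a worked illustration of Yest = BX + C, the sketch below finds the least-squares values of B and C from a small sample and then predicts Y for a new X. The function names (`fit_simple_regression`, `predict`) and the scores are hypothetical, invented for this example only.

```python
def fit_simple_regression(x, y):
    """Return (B, C) for the least-squares line Yest = B*X + C."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # B (slope): co-variation of X and Y divided by the variation in X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # C (intercept): the value of Yest when X = 0
    c = mean_y - b * mean_x
    return b, c


def predict(x_value, b, c):
    """Apply the regression formula Yest = B*X + C to one X score."""
    return b * x_value + c


# Hypothetical scores, invented for illustration only
x_scores = [1, 2, 3, 4, 5]
y_scores = [3, 5, 7, 9, 11]
b, c = fit_simple_regression(x_scores, y_scores)
y_est = predict(6, b, c)  # estimated Y for a person scoring X = 6
```

Because these invented scores fall exactly on a line, the fit recovers that line; real data would scatter around it.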
how to calculate regression formula
- To calculate a regression formula, we must have measured values for both X and Y in our sample (this is known as the derivation sample)
The regression formula gives us the best equation for predicting Y from X
- B = by what number should we multiply each value of X
- C = what is the value of Y when X = 0
We then hope to be able to use the formula in a new sample where we have also measured values of X and Y
We want to see how well the formula works in the new sample
But ... SHRINKAGE will occur
what is shrinkage
- The drop in predictive accuracy that always occurs when we apply a regression formula calculated on one sample to a new sample
- This happens because the calculated values of B and C are optimal for the derivation sample, and therefore could be capitalizing on chance relationships within this sample
what is cross validation
- To estimate the degree of shrinkage, we apply the regression formula to a new sample (cross-validation sample)
- We examine the correlation between the actual values of Y (measured in each sample) and the values of Y predicted by the formula (Yest), separately in the derivation sample and in the cross-validation sample
- If the correlation is significantly lower in the cross-validation sample (i.e., if substantial shrinkage has occurred), then the formula cannot be generalized beyond the derivation sample
- Important: For a fair test of cross-validation, the derivation sample and cross-validation sample must be matched on relevant demographic factors.
cross validation: calculating shrinkage
1. Calculate the values of Yest, B, and C in Sample 1 using the measured values for Y and X in Sample 1
2. Calculate the correlation between the estimated values of Y and the actual values of Y in Sample 1
3. Recruit Sample 2, which is matched in important respects to Sample 1
4. Measure the independent variable (X) and the dependent variable (Y) in Sample 2
5. Calculate the estimated values of Y in Sample 2 by applying the values of B and C that were calculated in Sample 1 (i.e., enter the Sample 2 values of X into the formula BX + C)
6. Compare the estimated values of Y (Yest) to the actual values of Y in Sample 2 by calculating the correlation between Yest and Y in Sample 2
7. Compare the correlation between Yest and Y in Sample 2 (Step 6) to the correlation between Yest and Y in Sample 1 (Step 2). The difference between these two correlation coefficients is the shrinkage
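The procedure above can be sketched in a few lines of Python. This is a minimal illustration for one IV, not a definitive implementation; the helper names (`fit_simple_regression`, `pearson_r`, `estimate_shrinkage`) and all scores are hypothetical.

```python
import math


def fit_simple_regression(x, y):
    """Return (B, C) for the least-squares line Yest = B*X + C."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def estimate_shrinkage(x1, y1, x2, y2):
    """Return (r in Sample 1, r in Sample 2, shrinkage)."""
    # B and C are calculated in Sample 1 only
    b, c = fit_simple_regression(x1, y1)
    # Correlation between Yest and actual Y in Sample 1
    r1 = pearson_r([b * xi + c for xi in x1], y1)
    # Apply the Sample 1 weights to the X scores of Sample 2
    yest2 = [b * xi + c for xi in x2]
    # Correlation between Yest and actual Y in Sample 2
    r2 = pearson_r(yest2, y2)
    # Shrinkage is the difference between the two correlations
    return r1, r2, r1 - r2


# Hypothetical scores, invented for illustration only
x1, y1 = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # derivation sample
x2, y2 = [1, 2, 3, 4, 5], [2, 1, 4, 3, 3]   # cross-validation sample
r1, r2, shrinkage = estimate_shrinkage(x1, y1, x2, y2)
```

With these invented scores the correlation is lower in Sample 2 than in Sample 1, so the shrinkage is positive.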
double cross validation
- Common procedure = recruit a large sample then split the sample into two (Samples 1 and 2)
- Sample 2 is the cross-validation sample for Sample 1 (weights calculated in Sample 1 are applied to Sample 2 and shrinkage estimated)
- Sample 1 is the cross-validation sample for Sample 2 (weights calculated in Sample 2 are applied to Sample 1 and shrinkage estimated)
formula calculated on sample 1, cross validated on sample 2 and vice versa
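A minimal sketch of double cross-validation for one IV, assuming the large sample is split at random into two halves. The function names, the fixed random seed, and the generated scores are all hypothetical choices made for this example.

```python
import math
import random


def fit(x, y):
    """Return (B, C) for the least-squares line Yest = B*X + C."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def cross_validate(derivation, validation):
    """Fit on the derivation sample, apply to the other sample, return shrinkage."""
    x_d, y_d = zip(*derivation)
    x_v, y_v = zip(*validation)
    b, c = fit(x_d, y_d)
    r_derivation = pearson_r([b * xi + c for xi in x_d], y_d)
    r_validation = pearson_r([b * xi + c for xi in x_v], y_v)
    return r_derivation - r_validation


def double_cross_validate(pairs, seed=0):
    """Split (X, Y) pairs into two halves and cross-validate in both directions."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    half = len(shuffled) // 2
    sample_1, sample_2 = shuffled[:half], shuffled[half:]
    return {
        "sample_1_to_2": cross_validate(sample_1, sample_2),
        "sample_2_to_1": cross_validate(sample_2, sample_1),
    }


# Hypothetical scores: a linear trend plus a small repeating disturbance
x = list(range(1, 21))
y = [2 * xi + (xi * 7) % 5 for xi in x]
result = double_cross_validate(list(zip(x, y)))
```

Each entry in `result` is the shrinkage for one direction; in any single split a value can come out slightly negative by chance.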
regression simple vs. multiple
simple: one IV
multiple: two or more IVs
multiple correlation
(R) = correlation between the actual values of Y and the values of Y that are predicted by the multiple regression (MR) formula.
squaring the multiple correlation
(R²) tells us the proportion of variance in the DV that can be predicted by the combined set of IVs
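A short sketch of how R and R² could be computed once predicted values are in hand. It assumes the values of Yest came from a least-squares MR formula fitted on the same sample; the function name `multiple_r` and the scores are hypothetical.

```python
import math


def multiple_r(y_actual, y_predicted):
    """Correlation between actual Y and the Y predicted by the MR formula."""
    n = len(y_actual)
    mean_a = sum(y_actual) / n
    mean_p = sum(y_predicted) / n
    sap = sum((a - mean_a) * (p - mean_p) for a, p in zip(y_actual, y_predicted))
    saa = sum((a - mean_a) ** 2 for a in y_actual)
    spp = sum((p - mean_p) ** 2 for p in y_predicted)
    return sap / math.sqrt(saa * spp)


# Hypothetical scores, invented for illustration only
y_actual = [10, 12, 15, 19, 24]
y_predicted = [11, 12, 16, 18, 23]

r = multiple_r(y_actual, y_predicted)
r_squared = r ** 2  # proportion of DV variance predicted by the set of IVs
```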
what to do when satisfied with the results of the shrinkage analysis
then re-combine the samples and calculate a new regression formula on the entire sample
multiple regression: sample size
Absolute minimum is 10 participants per IV
- e.g., if we have 3 IVs we need at least 30 participants
Preferable to have 20, some sources say 50 or even up to 100 participants per IV
The larger the sample, the smaller the shrinkage
multiple regression: multicollinearity
- Occurs when IVs are highly correlated with one another
- In this case, two variables are really measuring the same construct and so might be redundant
Why a problem?
- Requires a larger sample size
- Could result in larger shrinkage
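One simple way to screen for this is to correlate every pair of IVs and flag the high ones. The sketch below is only an illustration: the cutoff of 0.8 is an assumption made for the example (these notes do not give a threshold), and the function name and test scores are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def flag_collinear_pairs(ivs, cutoff=0.8):
    """Return (name_a, name_b, r) for each pair of IVs with |r| >= cutoff.

    The default cutoff is an assumed value for illustration, not a rule.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(ivs.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical scores on three tests, invented for illustration only
ivs = {
    "test_a": [1, 2, 3, 4, 5, 6],
    "test_b": [2, 4, 5, 8, 10, 12],  # nearly a rescaled copy of test_a
    "test_c": [5, 1, 4, 2, 6, 3],
}
flagged = flag_collinear_pairs(ivs)
```

Here only the test_a / test_b pair is flagged, which suggests the two tests may be measuring the same construct.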
hierarchical multiple regression
- We determine by how much R-squared increases when an additional IV is added to the regression formula
- For example, when we add an additional test to an existing battery of tests
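The increase in R-squared can be sketched as follows: fit the formula with the existing test only, fit it again with the new test added, and subtract. This is a minimal pure-Python illustration using the normal equations; the helper names (`solve`, `r_squared`) and all scores are hypothetical, and in practice a statistics package would do the fitting.

```python
def solve(a, b):
    """Solve the linear system a·w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R-squared of a least-squares fit of y on the given IVs plus a constant."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    k = len(cols)
    # Normal equations: (X'X) w = X'y
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    w = solve(xtx, xty)
    yest = [sum(w[j] * cols[j][t] for j in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - ye) ** 2 for yi, ye in zip(y, yest))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical scores, invented for illustration only
existing_test = [1, 2, 3, 4, 5, 6, 7, 8]
new_test = [2, 1, 4, 3, 6, 5, 8, 7]
criterion = [3, 4, 6, 7, 9, 9, 13, 12]

r2_step_1 = r_squared([existing_test], criterion)            # existing battery
r2_step_2 = r_squared([existing_test, new_test], criterion)  # new test added
r2_change = r2_step_2 - r2_step_1                            # incremental validity
```

In the sample used to fit the formula, adding an IV never lowers R-squared, so the question is whether the increase is large enough to matter and whether it survives cross-validation.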
different methods of multiple regression
hierarchical, stepwise, logistic, discriminant