Lecture 5 Flashcards
What is Validity?
-how well are we measuring what we are supposed to measure?
-total variance of test scores (σ²x) = construct of interest (σ²ci) + systematic error of measurement (σ²se) + random error of measurement (σ²re)
-test scores must be reliable in order to be valid
-validity is about the proportion of variance that can be attributed to the construct of interest (validity = σ²ci / σ²x)
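The variance decomposition above can be checked with a minimal numeric sketch (all variance values below are hypothetical, chosen only to illustrate the formula):

```python
# Hypothetical variance components (illustrative numbers, not from the lecture)
var_construct = 6.0    # σ²ci: variance due to the construct of interest
var_sys_error = 2.0    # σ²se: systematic error of measurement
var_rand_error = 2.0   # σ²re: random error of measurement

# Total observed variance of test scores: σ²x = σ²ci + σ²se + σ²re
var_total = var_construct + var_sys_error + var_rand_error

# Validity = proportion of total variance attributable to the construct
validity = var_construct / var_total   # 6 / 10 = 0.6
```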
What are the types of validity?
-face validity
-content validity
-criterion-related validity (multiple subtypes)
-experimental validity
-construct validity
What is face validity?
-test appears to be assessing what it is supposed to assess
-extent to which a test is subjectively viewed as covering the construct it is supposed to assess
-usefulness: acceptance, cooperation, etc.
-limits: bias, social desirability, etc.
-face validity is the only type that is optional
What is content validity?
-test covers key aspects of the construct it aims to assess (includes a representative sample of target behaviours)
-typically assessed by experts (scholars, clinicians, etc.)
What is criterion-related validity?
-established via comparison of test scores with “objective” criterion assumed to provide some “true” reflection of the underlying construct
-criterion refers to an external measure or source of information that informs us about the real presence of the construct that we want to assess
What are the 2 categories in the first subtype of criterion-related validity?
-concurrent: criterion is administered simultaneously
-predictive: criterion is administered later (used in selection; can pose practical problems)
What are the 4 categories in the second subtype of criterion-related validity?
-congruent: criterion assesses the same construct
-convergent: criterion assesses a related construct (different construct assumed to be related)
-discriminant: criterion assesses a construct known to be: (a) opposite of target construct (negative correlation); (b) unrelated to target construct (no correlation)
-discriminative: criterion is categorical (aim is to predict group membership) [do scores on the test differ between groups of people]
What are 2 ways criterion-related (concurrent or predictive) discriminative validity can be assessed?
-Mean comparisons
-Chi-Square
What are mean comparisons?
-groups (serving as the criterion) are compared based on their test scores treated as continuous variables (norm-referenced or not)
-ex: compare engineers and musicians scores on a test of “musical abilities” using a t-test
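The t-test example above can be sketched with the standard library; the group scores are hypothetical, invented only to show the mean comparison (pooled-variance t statistic):

```python
import math
import statistics as st

# Hypothetical "musical abilities" scores for the two criterion groups
musicians = [82, 88, 75, 90, 85, 79, 86]
engineers = [60, 72, 65, 58, 70, 63, 67]

def t_independent(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = st.variance(a), st.variance(b)                 # sample variances
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
    return (st.mean(a) - st.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = t_independent(musicians, engineers)  # large t -> groups differ on the test
```

A clearly large t with these made-up scores would support discriminative validity: the test separates the two groups.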
What is Chi-Square?
-groups (serving as the criterion) are compared based on their test scores treated as categorical variables (criterion-referenced).
-ex: compare frequency of individuals receiving a diagnosis of bipolar disorder based on test scores in a group of psychology students and a group of psychiatric patients
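A minimal sketch of that chi-square comparison, with hypothetical frequencies (the counts and groups are illustrative, not data from the lecture):

```python
# Hypothetical 2x2 frequencies: diagnosis based on test scores, by group
#                     diagnosed   not diagnosed
# psychology students      4           96
# psychiatric patients    40           60
observed = [[4, 96], [40, 60]]

def chi_square(table):
    """Pearson chi-square for a contingency table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (obs - exp) ** 2 / exp
    return chi2

chi2 = chi_square(observed)  # large chi2 -> diagnosis frequency differs by group
```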
In which types of criterion-related validity do we expect a strong positive correlation and which formula is used?
-congruent and convergent [predictive]
-r²xy = σ²ci / σ²x (squaring the correlation between the test and the criterion measure gives a rough indicator of how much of the total variance in the test can be attributed to the construct of interest, as captured by that specific criterion measure)
-1 - r²xy = σ²e(s+r) / σ²x (how much error there is in the test scores in total, combining the 2 sources of error [random and systematic])
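The two formulas above amount to splitting test-score variance into a construct part and an error part; a tiny sketch with a hypothetical correlation value:

```python
# Hypothetical correlation between test and criterion (illustrative value)
r_xy = 0.70

# r²xy: proportion of test-score variance attributable to the construct
validity_proportion = r_xy ** 2        # 0.49

# 1 - r²xy: proportion attributable to error (systematic + random combined)
error_proportion = 1 - r_xy ** 2       # 0.51

# The two proportions always sum to 1 (the total variance)
```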
In selection procedures, which type of validity do we look at and using what method & formula?
-in selection we work with validity that is predictive [congruent and convergent]
-in selection, validity is assessed using regression rather than a correlation (because the relation is unidirectional: the test predicts the criterion)
-Y’ = a + b(X)
-there is always a discrepancy between the observed score on the criterion (Y) and the score that is predicted (Y’) based on the test scores (X), unless validity is perfect (which never happens).
-the difference, or discrepancy, is called the Prediction Error
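The prediction equation and the prediction error can be sketched directly; the intercept, slope, and scores below are hypothetical:

```python
# Hypothetical regression coefficients for predicting the criterion from X
a, b = 10.0, 0.5   # intercept and slope (illustrative values)

def predict(x):
    """Predicted criterion score: Y' = a + b*X."""
    return a + b * x

# One applicant: observed criterion score vs. the score predicted from the test
x, y_observed = 80, 55.0
y_predicted = predict(x)                      # 10 + 0.5 * 80 = 50.0
prediction_error = y_observed - y_predicted   # 55.0 - 50.0 = 5.0
```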
What is the standard error of the estimate?
-it tells us how much prediction error there is, on average, when we use scores from the test to predict the outcome
What are the 2 ways the standard error of the estimate can be calculated?
-first method (long): (1) the prediction residuals are estimated: Y - Y’; (2) the standard deviation of these residuals is the standard error of the estimate.
-second method (short): √(1 - r²xy) × σy (σy = standard deviation of scores on the criterion) [1 - r²xy is the proportion of total variance attributed to measurement error; cue card 11]
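Both methods give the same number; a minimal sketch with hypothetical paired test (X) and criterion (Y) scores showing the long and short routes agree:

```python
import math
import statistics as st

# Hypothetical paired test (X) and criterion (Y) scores (illustrative data)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 3, 5, 4, 7, 6, 8]

n = len(xs)
mx, my = st.mean(xs), st.mean(ys)
sx, sy = st.pstdev(xs), st.pstdev(ys)   # population SDs so the methods match exactly
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

# Long method: fit the regression line, then take the SD of the residuals Y - Y'
b = r * sy / sx
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
see_long = st.pstdev(residuals)

# Short method: sqrt(1 - r²) * σy
see_short = math.sqrt(1 - r ** 2) * sy
```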
How do we calculate Confidence interval?
-CI for the predicted score on the criterion (e.g., success on the job):
-Y’ = a + b(X) ± (z)(standard error of the estimate)
-e.g., 95% CI: Y’ ± 1.96 × SEE; 99% CI: Y’ ± 2.58 × SEE
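A short sketch of the confidence interval around a predicted score; the coefficients, test score, and SEE below are hypothetical:

```python
# Hypothetical regression coefficients, test score, and standard error of estimate
a, b = 10.0, 0.5
x = 80
see = 4.0

y_pred = a + b * x   # Y' = 50.0

# 95% CI: Y' ± 1.96 * SEE ; 99% CI: Y' ± 2.58 * SEE
ci_95 = (y_pred - 1.96 * see, y_pred + 1.96 * see)   # (42.16, 57.84)
ci_99 = (y_pred - 2.58 * see, y_pred + 2.58 * see)   # wider than the 95% CI
```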
How do we verify the efficacy of a selection process?
-we rely on analyses of sensitivity and specificity
-A: true positive (efficient; selected)
-B: false positive (not efficient; selected)
-C: false negative (efficient; not selected)
-D: true negative (not efficient; not selected)
What are all the factors that are used to describe the efficacy of a selection process?
-sensitivity = A/(A+C)
-specificity = D/(D+B)
-positive predictive power (PPP) = A/(A+B)
-negative predictive power (NPP) = D/(D+C)
-percentage of correct classification: (A+D)/(A+B+C+D) (all correct classifications / all participants)
-base rate = (A+C)/(A+B+C+D)
-selection rate = (A+B)/(A+B+C+D)
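All of these indices follow directly from the four cells; a sketch with hypothetical counts (the A, B, C, D values are invented for illustration):

```python
# Hypothetical 2x2 selection outcomes (illustrative counts)
A, B, C, D = 30, 10, 20, 40   # true pos, false pos, false neg, true neg
N = A + B + C + D

sensitivity = A / (A + C)       # ability to identify true cases -> 0.6
specificity = D / (D + B)       # ability to exclude true non-cases -> 0.8
ppp = A / (A + B)               # positive predictive power -> 0.75
npp = D / (D + C)               # negative predictive power -> 2/3
pct_correct = (A + D) / N       # proportion correctly classified -> 0.7
base_rate = (A + C) / N         # proportion presenting the characteristic -> 0.5
selection_rate = (A + B) / N    # proportion selected -> 0.4
```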
What is sensitivity?
-proportion of cases presenting the characteristic that are correctly identified = A/(A+C) [ability to identify true cases].
What is Specificity?
-proportion of the cases not presenting the characteristic that are correctly identified = D/(D+B) [ability to exclude true non-cases]
What is PPP?
-the chance that an individual who receives a positive diagnosis on your measure really suffers from that problem = A/(A+B)
What is NPP?
-the chance that an individual who receives a negative diagnosis on your measure really does not suffer from that problem = D/(D+C)
What is base rate?
-proportion of cases presenting the characteristic: (A+C)/(A+B+C+D)
-the higher the base rate, the lower the sensitivity: some people presenting the characteristic will be excluded anyway (not enough room to admit them all)
-the lower the base rate, the lower the specificity: people not presenting the characteristic will need to be selected to meet the selection quotas (to fill the positions)
-[i.e., the base rate reflects everyone who can do it]
What is selection rate?
-proportion of cases that are selected: (A+B)/(A+B+C+D)
-when it is very high, there is no need for any selection (there is enough room for everyone)
-the higher the selection rate, the lower the specificity (people not presenting the characteristic will be selected to fill the quotas)
-the lower the selection rate, the lower the sensitivity (people presenting the characteristic will be excluded, as there is not enough room)
-[i.e., the selection rate reflects how many positions you are filling]
What is the random selection protocol?
-uses the base rate (BR) and the selection rate (SR) to fill in the expected 2x2 table
-calculate how much is left for the 2nd column and 2nd row by subtracting the BR and SR frequencies from the total sample size
-for A, multiply the SR by the number of people from the BR (i.e., those presenting the characteristic); do the same for B, but using the SR and the total of the “really not efficient” column
-OR, for B and C, subtract A from the SR and BR totals; obtain D by subtraction as well
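The steps above can be sketched as code; the sample size, base rate, and selection rate below are hypothetical:

```python
# Hypothetical inputs: fill the expected 2x2 table under random selection
N = 100
base_rate = 0.5        # proportion presenting the characteristic
selection_rate = 0.4   # proportion selected

# Marginal totals from BR and SR; the remainders come from subtraction
present_total = base_rate * N           # people presenting the characteristic
absent_total = N - present_total        # 2nd column: the rest
selected_total = selection_rate * N     # people selected
rejected_total = N - selected_total     # 2nd row: the rest

# A = SR x (number presenting the characteristic); same logic for B
A = selection_rate * present_total      # 0.4 * 50 = 20.0
B = selection_rate * absent_total       # 0.4 * 50 = 20.0
# OR obtain the remaining cells by subtraction from the margins
C = present_total - A                   # 30.0
D = absent_total - B                    # 30.0
```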