Test Construction Flashcards
Alternate Forms Reliability
yields a coefficient of equivalence
Coefficient Alpha (Cronbach’s Alpha)
method for assessing internal consistency reliability when items are not answered dichotomously
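For reference, the standard formula (k = number of items, \sigma_i^2 = variance of item i, \sigma_X^2 = variance of total scores):
    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)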
KR-20
method for assessing internal consistency reliability when items are answered dichotomously (they are either correct or not correct)
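For reference, the usual formula (p_i = proportion answering item i correctly, q_i = 1 - p_i, \sigma_X^2 = total-score variance):
    KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_i p_i q_i}{\sigma_X^2}\right)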
Kappa Statistic
used to measure inter-rater reliability when data are nominal or ordinal (discontinuous)
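For reference (p_o = observed proportion of agreement, p_e = proportion of agreement expected by chance):
    \kappa = \frac{p_o - p_e}{1 - p_e}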
Test-Retest Reliability
yields a coefficient of stability
Spearman Brown Formula
corrects the artificially low reliability coefficient obtained with the split-half method (the coefficient is low because each half is only half as long as the full test)
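For reference, the general formula, where n is the factor by which the test is lengthened (n = 2 for the split-half correction) and r_{hh} is the correlation between the two halves:
    r_{new} = \frac{n\,r}{1 + (n-1)r} \qquad \text{split-half: } r_{xx} = \frac{2\,r_{hh}}{1 + r_{hh}}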
Size of reliability coefficient
smaller when it is easy to get the correct answer by random guessing
Difficulty Index
ranges from 0 (no one answers the item correctly) to 1 (everyone answers it correctly)
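For reference:
    p = \frac{\text{number of examinees answering the item correctly}}{\text{total number of examinees}}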
orthogonal factors v. oblique factors
orthogonal=uncorrelated (independent), oblique=correlated (dependent)
Concurrent Validity
type of criterion-related validity; the extent to which test scores correlate with an external criterion measured at about the same time
Divergent (Discriminant) Validity
demonstrated when scores on a measure have low correlations with measures of unrelated traits; a large coefficient indicates poor discriminant validity
cross-validation
re-checking a test’s criterion-related validity in a new sample during test revision; the validity coefficient typically “shrinks” on cross-validation
external validity
researcher’s ability to generalize the results of the study to other individuals, settings, conditions
internal validity
researcher’s ability to determine whether there is a causal relationship between variables
pearson r
used to measure inter-rater reliability and to calculate criterion-related validity when both variables are measured on a continuous scale
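For reference, one common computational form (z_x, z_y = standard scores on X and Y; N = number of score pairs):
    r = \frac{\sum z_x z_y}{N}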
methods of assessing internal consistency reliability
split-half (must be corrected with the Spearman-Brown formula), KR-20, Cronbach’s alpha
4 methods of assessing reliability
inter-rater, internal consistency, alternate forms, test-retest
standard error of measurement
the standard deviation of a theoretically normal distribution of test scores acquired by one individual on equivalent tests (related to the reliability coefficient and the SD of the test)
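For reference (SD_x = standard deviation of test scores, r_{xx} = reliability coefficient):
    SEM = SD_x \sqrt{1 - r_{xx}}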
calculating confidence interval of a true test score
person’s score + or - one or two standard errors of measurement (68% vs 95%)
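A hypothetical worked example: if an examinee’s obtained score is 100 and the SEM is 5, the 68% confidence interval is 100 ± 5 = 95 to 105, and the 95% confidence interval is 100 ± 10 = 90 to 110.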
standard error of estimate
standard deviation of the distribution of obtained criterion scores around a predicted criterion score; indexes the error made when estimating criterion scores from predictor scores
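For reference (SD_y = standard deviation of criterion scores, r_{xy} = criterion-related validity coefficient):
    SEE = SD_y \sqrt{1 - r_{xy}^2}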
Taylor-Russell Tables
estimate the proportion of successful selection decisions (the improvement over the base rate) when a predictor is introduced, given its validity coefficient, the selection ratio, and the base rate
incremental validity is optimized when
base rate is moderate (.5), and selection ratio is low
item response theory
used to estimate the extent to which an examinee possesses a certain trait based on his or her responses to individual items
factors affecting criterion-related validity
range of scores (a more heterogeneous sample of examinees yields a higher validity coefficient), reliability of the predictor and of the criterion, and criterion contamination (which usually inflates the validity coefficient)
relationship between reliability of the predictor and criterion related validity
criterion-related validity cannot be higher than the square root of the reliability of the predictor
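For reference, the usual bound (r_{xy} = validity coefficient, r_{xx} and r_{yy} = reliabilities of predictor and criterion):
    r_{xy} \le \sqrt{r_{xx}\,r_{yy}} \le \sqrt{r_{xx}}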
percentile corresponding with one standard deviation above the mean, and 2 standard deviations above
approximately 84 and 98
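Derivation: about 34.1% of scores fall between the mean and +1 SD and about 13.6% between +1 and +2 SD, so 50 + 34.1 ≈ 84th percentile and 50 + 34.1 + 13.6 ≈ 98th percentile.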
type 1 error
the null hypothesis is falsely rejected
Spearman rank order (rho) correlation coefficient
used when both variables are ranks
phi correlation coefficient
used when both variables are true dichotomies
biserial correlation coefficient
when one variable is continuous and one is an artificial dichotomy
contingency correlation coefficient
when both variables are nominal
when there is a moderator variable, make sure the test has
differential validity
in a positively skewed distribution, the measures of central tendency from greatest to lowest
mean, median, mode
MST
“mean square total”=estimate of total variability, reflecting treatment effects plus error; note that the sums of squares, not the mean squares, are additive (SST = SSB + SSW)
MSW
“mean square within”=estimate of variability that is due purely to error
MSB
“mean square between”=estimate of variability due to treatment effects plus error
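For reference (k = number of groups, N = total number of subjects):
    MSB = \frac{SSB}{k-1}, \qquad MSW = \frac{SSW}{N-k}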
degrees of freedom for t-test for independent samples
N-2
where item characteristic curve hits the y axis
probability of getting answer right by guessing
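For reference, in the three-parameter logistic (3PL) IRT model the lower asymptote c_i is the guessing parameter (a_i = item discrimination, b_i = item difficulty, \theta = examinee ability):
    P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}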
statistical regression
tendency for extreme test scores to move toward the mean on retesting (a threat to internal validity when participants are selected because of their extreme scores on a pretest)
purpose of rotation in factor analysis
makes pattern of factor loadings easier to interpret
Solomon Four Group Design
controls for testing/test practice (which is a threat to internal validity)
F ratio
mean square between groups divided by mean square within groups (MSB/MSW)
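A hypothetical worked example: with 3 groups of 10 subjects each, if MSB = 40 and MSW = 10, then F = 40/10 = 4.0, evaluated with df = (k - 1, N - k) = (2, 27).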
eta
used to calculate the correlation between X and Y when the relationship is thought to be curvilinear (nonlinear)