Test Construction Flashcards
Reliability
consistency, repeatability; rxx ranges from 0 to 1.0 (rxx = .85 means 85% of score variance is reliable and 15% is error) (.80 is the usual minimum for acceptability)
o Test-retest-correlate 1st and 2nd score (r)
o Alternate forms- correlate form A and B
o Internal consistency
§ Split half (correlate odd and even items; correct with the Spearman-Brown formula)
§ KR-20/21 (use for dichotomously scored items); coefficient alpha (Cronbach's alpha, use for items with multiple score points) (conceptually: split the test in half in every possible way/permutation, then average all possible split-half reliabilities)
o Inter-rater reliability-kappa
§ Tries to reduce subjectivity
Factors affecting the reliability coefficient
§ Number of items: more items, more reliable
§ More homogeneous items (similar content), more reliable
§ Unrestricted range of scores yields higher reliability; achieved by having diverse (heterogeneous) examinees (high, medium, and low scorers) and mid-range difficulty items
§ Lower reliability if items are easy to guess (e.g., true/false)
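The split-half correction and coefficient alpha above can be sketched in Python; `spearman_brown` and `cronbach_alpha` are illustrative names, and `cronbach_alpha` expects one list of scores per item, aligned across respondents:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def spearman_brown(r_half, k=2.0):
    """Spearman-Brown prophecy: reliability of a test k times as long
    as the one that produced reliability r_half (k=2 corrects split-half)."""
    return k * r_half / (1 + (k - 1) * r_half)

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum(item variances)/total variance)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
```

For example, a split-half correlation of .50 corrects to 2(.50)/(1 + .50) ≈ .67 for the full-length test.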
Cohen’s d effect size
0.2 is small, 0.5 is med, 0.8 is large
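A minimal sketch of computing d from two groups (the pooled SD here uses the sample variances; the function name is illustrative):

```python
def cohens_d(group1, group2):
    """Cohen's d: difference in group means divided by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd
```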
Standard error of measurement
Deals with reliability; an index of the amount of measurement error due to the unreliability of the test; used to establish a CI around an obtained score
SEM = SD√(1 − rxx): the lower the SD and the higher the rxx, the lower the SEM
When rxx is 1, SEM = 0; when rxx is 0, SEM = SD
68% CI: use ±1 SEM to get the range
95% CI: ±2 SEM
99% CI: ±3 SEM
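The SEM formula and CI rules above, as a small sketch (e.g., with SD = 15 and rxx = .91, SEM = 15√.09 = 4.5):

```python
def sem(sd, rxx):
    """Standard error of measurement: SEM = SD * sqrt(1 - rxx)."""
    return sd * (1 - rxx) ** 0.5

def confidence_interval(score, sd, rxx, n_sem=1):
    """CI around an obtained score: score +/- n_sem * SEM
    (1 SEM ~ 68%, 2 SEM ~ 95%, 3 SEM ~ 99%)."""
    e = n_sem * sem(sd, rxx)
    return (score - e, score + e)
```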
Content Validity
tests familiarity with a particular content or behavior domain
use expert opinion (content validity ratio)
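The content validity ratio mentioned above (Lawshe's CVR) compares, per item, the number of experts rating the item "essential" against half the panel:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (ne - N/2) / (N/2); ranges from -1 (no expert
    says essential) through 0 (half do) to +1 (all do)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)
```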
Construct validity
hypothetical trait
convergent and discriminant validity
factor analysis
multitrait-multimethod matrix
Criterion-related validity
estimate or predict standing or performance on a criterion measure
concurrent (criterion collected at about the same time; estimates current status) and predictive validity (criterion collected later; makes predictions)
Interpreting Criterion-related validity coefficients:
shared variability is established by squaring the coeff.
rxy = 0.6 becomes .36: 36% of criterion variance is accounted for by the predictive relationship
Standard Error of the Estimate (SEE)
used to construct a CI around a predicted (estimated) criterion score, criterion-related validity
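The squared-coefficient rule and the SEE can both be sketched briefly (the SEE formula assumed here is the standard SD_y√(1 − rxy²); function names are illustrative):

```python
def shared_variance(rxy):
    """Proportion of criterion variance accounted for by the predictor."""
    return rxy ** 2

def see(sd_criterion, rxy):
    """Standard error of the estimate: SD_y * sqrt(1 - rxy^2).
    When rxy = 0 the SEE equals the criterion SD; when rxy = 1 it is 0."""
    return sd_criterion * (1 - rxy ** 2) ** 0.5
```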
Taylor-Russell Tables
table used to find the probability of selecting a successful employee using a certain measure (true positives, i.e., good hires/satisfactory performers) (e.g., 80% of new hires selected with the new measure will be satisfactory performers)
Will need the following:
Criterion-related validity coefficient: the correlation between the score on the measure and job performance
Base rate-rate of success before predictor is introduced (0 to 1)
o After the predictor is introduced, look at the amount of improvement in the success rate (incremental validity), which results from the addition of the predictor test
o Moderate base rates optimize incremental validity
o Selection ratio: number of openings divided by the number of applicants (a low selection ratio optimizes incremental validity, e.g., 1 opening, many applicants)
o Incremental validity is optimized when the base rate is 0.5 and the selection ratio is 0.1
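The logic behind a Taylor-Russell lookup can be illustrated with a Monte Carlo sketch under the tables' own assumption of a bivariate-normal predictor/criterion relationship (this simulates the table entry rather than reproducing the published values; names are illustrative):

```python
import random

def taylor_russell_sim(validity, base_rate, selection_ratio, n=20000, seed=0):
    """Simulate hiring the top fraction on a predictor and return the
    success rate among those hired (cf. a Taylor-Russell table entry)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # criterion = validity * predictor + noise, so corr(x, y) = validity
    ys = [validity * x + (1 - validity ** 2) ** 0.5 * rng.gauss(0, 1)
          for x in xs]
    # choose the success cutoff so that base_rate of everyone succeeds
    cutoff = sorted(ys)[int(n * (1 - base_rate))]
    hired = sorted(range(n), key=lambda i: xs[i], reverse=True)
    hired = hired[:int(n * selection_ratio)]
    return sum(ys[i] >= cutoff for i in hired) / len(hired)
```

With a valid predictor (e.g., validity .50, base rate .50, selection ratio .10), the simulated success rate among hires comes out well above the .50 base rate, which is exactly the improvement the tables quantify.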
Correction for attenuation
Correction for attenuation adjusts a validity coefficient for the effect of imperfect reliability (reliability is never perfect, but the correction estimates how much more valid the measure would be with perfectly reliable measurement)
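The standard correction divides the observed validity by the square root of the product of the two reliabilities (set ryy = 1 to correct for predictor unreliability only):

```python
def correct_for_attenuation(rxy, rxx, ryy=1.0):
    """Estimated validity if predictor (and criterion) were perfectly
    reliable: rxy / sqrt(rxx * ryy)."""
    return rxy / (rxx * ryy) ** 0.5
```

For example, an observed validity of .30 with reliabilities of .50 for both measures corrects to .30/√.25 = .60.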
ipsative measures
ipsative measures (/ˈɪpsətɪv/; from Latin: ipse, ‘of the self’) are those where respondents compare two or more desirable options and pick the one they prefer most.
Ceiling and Floor Effects
measure doesn't include an adequate range of items at the extremes; e.g., a floor effect occurs when a test is unusually difficult and many test-takers score at or near the bottom of the scale (a ceiling effect is the reverse: the test is too easy and many score at or near the top)
Cross-Validation
a test is often revalidated with a sample of individuals different from the original validation sample; "shrinkage" refers to the reduction in a criterion-related validity coefficient upon cross-validation; shrinkage is greatest if the original validation sample is small, the original item pool is large, the number of items retained is small relative to the size of the item pool, and/or items are not chosen on the basis of a previously formulated hypothesis
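Shrinkage from chance capitalization can be demonstrated with a sketch that uses purely random data (so the items have zero true validity): selecting the best-looking items on a derivation sample produces a sizable validity there, which collapses on a fresh cross-validation sample. All names and parameters below are illustrative.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def shrinkage_demo(n_items=100, n=200, keep=10, seed=1):
    """Return (derivation validity, cross-validation validity) for a
    composite of the `keep` items that looked best on sample A."""
    rng = random.Random(seed)

    def sample():
        crit = [rng.gauss(0, 1) for _ in range(n)]
        items = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_items)]
        return crit, items

    crit_a, items_a = sample()
    crit_b, items_b = sample()
    # keep the items with the largest |r| on sample A, sign-aligned
    best = sorted(range(n_items),
                  key=lambda i: abs(corr(items_a[i], crit_a)),
                  reverse=True)[:keep]
    signs = {i: (1 if corr(items_a[i], crit_a) > 0 else -1) for i in best}

    def composite(items):
        return [sum(signs[i] * items[i][p] for i in best) for p in range(n)]

    return corr(composite(items_a), crit_a), corr(composite(items_b), crit_b)
```

Running the demo shows a clearly positive derivation-sample validity and a near-zero cross-validated one, matching the conditions listed above (large item pool, few items retained, no prior hypothesis).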