Exam 2 Flashcards
consistency in measurement of some (real or hypothetical) characteristic
reliability
the theoretical number that is one’s perfectly accurate representation of knowledge as a score on a test
True Score
a person’s true score, plus some error
observed test score
unsystematic variability introduced into scores; random; can cause deviation from true score
error
Sources of Psychological Measurement Error
- test construction
- test administration
- test scoring
items are created/selected from a large population of possible items from within the domain
may affect how well one performs on a test; someone may perform very well on some subtests and poorly on others
content sampling
factors that influence the test taker’s attention, concentration, motivation, etc.
test administration
physical appearance, departure from test standardization procedure, not placing materials in proper orientation, incorrect timing, etc.
examiner influences
test environment, test anxiety, medication effects, extended testing session - fatigue
physical or psychological discomfort
estimates of the ratio of true score variance to total variance (cannot be negative)
reliability coefficients
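A minimal Python sketch (simulated, illustrative values only) of reliability as the ratio of true score variance to total variance:

```python
import numpy as np

# Hypothetical illustration: simulate true scores plus random error
# and estimate reliability as true-score variance / total variance.
rng = np.random.default_rng(0)
true_scores = rng.normal(100, 15, size=10_000)   # assumed true scores
error = rng.normal(0, 5, size=10_000)            # unsystematic error
observed = true_scores + error

reliability = true_scores.var() / observed.var()
print(round(reliability, 3))  # ~ 15^2 / (15^2 + 5^2) = 0.90
```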
test-retest/stability, interrater agreement, internal consistency, alternate forms, etc.
types of reliability
reliability indices should meet or exceed
.85 or .90
index of how an individual’s scores may vary over tests presumed to be parallel
standard error of measurement (SEm)
as rxx increases from 0 to 1…
SEm decreases from SD to 0
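A small sketch of the standard SEm formula, SEm = SD√(1 − rxx), showing the decrease from SD to 0 (values assumed for illustration):

```python
import math

def sem(sd: float, rxx: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - rxx)

# As rxx rises from 0 to 1, SEm falls from SD (15 here) to 0.
for rxx in (0.0, 0.5, 0.9, 1.0):
    print(rxx, round(sem(15, rxx), 2))
# 0.0 -> 15.0, 0.5 -> 10.61, 0.9 -> 4.74, 1.0 -> 0.0
```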
Methods of estimating rxx
- internal consistency
- alternate forms
- test-retest
- interrater agreement
static vs. dynamic characteristic
a static characteristic changes little over time
a dynamic characteristic changes considerably over time
degree to which evidence and theory support the interpretations of test scores for proposed uses of tests
relates to inferences or interpretations made about performance based on scores from the measure.
validity
- content validity
- criterion related validity
- construct validity
trinitarian model
Evidence based on:
- test content
- response processes
- internal structure
- relations with other variables
- consequences of testing
unified theory (Messick)
examination of test content
content validity
- test-retest
- alternate forms
- interrater agreement
Methods of Reliability
Rα
Cronbach’s coefficient alpha
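A minimal sketch of coefficient alpha, assuming a respondents-by-items score matrix (the function name and data layout are illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = test items.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)"""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)
```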
when an item belongs to more than one factor
cross loading
z scores for 68%
95%
99%
1
1.96
2.58
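A hedged example tying these z values to SEm: a 95% confidence band around an observed score (the SD, rxx, and score are assumed):

```python
import math

# Hypothetical example: 95% confidence band around an observed score
# using SEm and the matching z value (1.96).
sd, rxx, observed = 15, 0.91, 108        # assumed values
sem = sd * math.sqrt(1 - rxx)            # 15 * sqrt(0.09) = 4.5
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem
print(f"95% CI: {lo:.1f} to {hi:.1f}")   # 99.2 to 116.8
```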
Which reliability estimate should be used with test-retest?
stability
SEdiff formula
SEdiff = SD√(2 − rxx1 − rxx2)
rxx1 = reliability of test 1
rxx2 = reliability of test 2
multiplied by z (e.g., 1.96 for 95%), the result is the number of points required for a SIGNIFICANT DIFFERENCE between two scores
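A small sketch of the SEdiff formula with assumed values:

```python
import math

def se_diff(sd: float, rxx1: float, rxx2: float) -> float:
    """SEdiff = SD * sqrt(2 - rxx1 - rxx2)."""
    return sd * math.sqrt(2 - rxx1 - rxx2)

# Hypothetical values: SD = 15, two subtests with rxx of .90 and .85.
sed = se_diff(15, 0.90, 0.85)
print(round(sed, 2))          # 7.5
print(round(1.96 * sed, 1))   # ~14.7 points needed for a significant
                              # difference at the 95% level
```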
Which reliability estimate should be used with internal consistency?
coefficient alpha, KR-20, split-half
Which reliability estimate should be used with alternate forms?
equivalence
compare the test in question with an already accepted standard measure administered at about the same time
concurrent validity
when correlations are superficially high because two tests are too similar
criterion contamination
test in question is compared to a standard gathered at a future time; predicts performance on a standard at a future time
ex: HS GPA and ACT predicting college GPA
predictive validity
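An illustrative sketch of predictive validity as the predictor-criterion correlation (the scores below are made up):

```python
import numpy as np

# Hypothetical sketch: predictive validity as the correlation between
# a predictor (ACT score) and a criterion gathered later (college GPA).
act = np.array([21, 25, 30, 18, 27, 33, 24, 29])          # made-up scores
gpa = np.array([2.7, 3.0, 3.6, 2.4, 3.2, 3.8, 2.9, 3.4])  # made-up GPAs

validity = np.corrcoef(act, gpa)[0, 1]
print(round(validity, 2))  # high r -> predictor forecasts the criterion
```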
experimental procedures that cause a change in the construct can be used to test a measure’s validity
theory consistent intervention effects
two or more measures designed to measure the same construct should produce high correlations (large amount of shared variance)
convergent validity
two or more measures designed to measure different constructs should produce low/near zero correlations (little shared variance)
divergent validity
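A simulated illustration of both patterns (the construct names and noise levels are assumptions):

```python
import numpy as np

# Hypothetical sketch: two anxiety measures (same construct) should
# correlate highly (convergent); an anxiety measure and a vocabulary
# test (different constructs) should correlate near zero (divergent).
rng = np.random.default_rng(1)
anxiety = rng.normal(size=500)
anxiety_a = anxiety + rng.normal(scale=0.5, size=500)  # measure 1
anxiety_b = anxiety + rng.normal(scale=0.5, size=500)  # measure 2
vocab = rng.normal(size=500)                           # unrelated construct

print(round(np.corrcoef(anxiety_a, anxiety_b)[0, 1], 2))  # ~ .80 (convergent)
print(round(np.corrcoef(anxiety_a, vocab)[0, 1], 2))      # ~ .00 (divergent)
```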
use scores to classify subjects into groups
discriminant analysis
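A minimal sketch using scikit-learn’s LinearDiscriminantAnalysis with made-up scores and group labels:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical sketch: use test scores to classify subjects into groups.
# X = two test scores per subject; y = known group membership.
X = [[85, 90], [88, 95], [60, 55], [58, 62], [83, 88], [62, 60]]
y = [1, 1, 0, 0, 1, 0]          # made-up group labels

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[80, 85]]))  # predicted group for a new subject
```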
conceptualization
is there a need? who will it be used for? content? administration? responses?
scaling
how we assign “points” to responses
try-out, analyze, revise
first “test” administration, analysis, revision
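One common analysis at the try-out step, sketched with illustrative data: item difficulty (proportion correct) and corrected item-total correlation:

```python
import numpy as np

# Hypothetical item analysis for the try-out step.
responses = np.array([  # rows = examinees, columns = items (1 = correct)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

difficulty = responses.mean(axis=0)  # p value (proportion correct) per item
print(difficulty)

for j in range(responses.shape[1]):
    rest = responses.sum(axis=1) - responses[:, j]  # total minus this item
    r = np.corrcoef(responses[:, j], rest)[0, 1]
    print(f"item {j}: corrected item-total r = {r:.2f}")
```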
Spearman-Brown
Used to estimate reliability after changing test length
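A small sketch of the Spearman-Brown prophecy formula (the 20-item scenario is hypothetical):

```python
def spearman_brown(rxx: float, n: float) -> float:
    """Predicted reliability when test length is multiplied by n:
    r_new = n * rxx / (1 + (n - 1) * rxx)"""
    return n * rxx / (1 + (n - 1) * rxx)

# Hypothetical: a 20-item test with rxx = .70, doubled to 40 items.
print(round(spearman_brown(0.70, 2), 2))  # 0.82
# Also used to step up a split-half correlation (n = 2).
```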
KR-20
Used with dichotomous (right/wrong, 0/1) items
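A minimal KR-20 sketch for a 0/1 response matrix (the function name and data layout are illustrative):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR-20 for dichotomous (0/1) items; rows = examinees, columns = items.
    KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)"""
    k = items.shape[1]
    p = items.mean(axis=0)            # proportion correct per item
    pq = (p * (1 - p)).sum()          # sum of item variances (p*q)
    total_var = items.sum(axis=1).var(ddof=0)
    return (k / (k - 1)) * (1 - pq / total_var)
```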