Midterm 1 - Reliability (Ch. 4) Flashcards
(reliability/validity) is a condition for (reliability/validity)
reliability is a condition for validity
reliability means ____ of scores
consistency
What are 3 assumptions of Classical Test Theory about measurement?
- no measurement instrument is perfectly reliable (observed score = true score + error)
- measurement errors are random (mean error = 0)
- measurement errors are normally distributed
(random/systematic) error lowers reliability of a test
random!
(T/F) systematic measurement errors lower reliability of a test over time
FALSE; does not lower reliability because the test is inaccurate by the same amount every time
What impact do random and systematic errors have on the average of a distribution?
- random: no effect on average, just variability around it
- systematic: changes average (bias)
Classical test theory assumes that the variance of observed scores = ____ + _____
variance of true scores + error variance (random error)
σX² = σT² + σE²
The reliability coefficient (rxx’) measures the proportion of variance in observed test scores accounted for by ____
variability in true scores
What is the formula for the reliability coefficient?
rxx’ = true score variance/observed score variance
rxx’ = σT²/σX²
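To make this concrete, here is a minimal Python sketch (all numbers hypothetical) that simulates observed = true + random error and estimates rxx’ as true score variance over observed score variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 10,000 simulated examinees
true_scores = rng.normal(100, 15, size=10_000)  # sigma_T = 15
errors = rng.normal(0, 5, size=10_000)          # random error: mean 0, sigma_E = 5
observed = true_scores + errors                 # X = T + E

# rxx' = sigma_T^2 / sigma_X^2
rxx = true_scores.var() / observed.var()
print(round(rxx, 2))  # ~0.90, since 15^2 / (15^2 + 5^2) = 225/250
```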
the reliability coefficient is always between __ and __
0 and 1
Classical test theory assumes that two tests are parallel if:
1.
2.
3.
- true scores are the same (which means equal observed score means)
- equal error variance
- same correlations with other tests
Standard error of measurement (SEM) indicates the amount of _____ expected in an individual’s (true/observed) test score. SEM corresponds to _____
uncertainty or error; observed test score
corresponds to SD of distribution of scores one would obtain by repeatedly testing a person
Identify each of the variables in this equation:
SEM = SDx * √(1-rxx’)
if SDx is high, SEM will be (higher/lower)
if rxx’ is high, SEM will be (higher/lower)
SEM = standard error of measurement
SDx = standard deviation of observed scores
rxx’ = reliability of observed scores
if SDx is high, SEM will be HIGHER
if rxx’ is high, SEM will be LOWER
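A quick Python sketch of the SEM formula with made-up values, showing both effects at once:

```python
import math

def sem(sd_x, rxx):
    """SEM = SDx * sqrt(1 - rxx')"""
    return sd_x * math.sqrt(1 - rxx)

print(sem(15, 0.90))  # ~4.74
print(sem(15, 0.50))  # ~10.61 -> lower rxx' means HIGHER SEM
print(sem(30, 0.90))  # ~9.49  -> higher SDx means HIGHER SEM
```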
Confidence interval (CI) is a range of scores that _____
we feel confident will include the true score
Identify each of the variables in the following formula:
CI = X +/- (zCI)(SEM)
CI = confidence interval
X = observed score
zCI = z score corresponding to the chosen confidence level
SEM = standard error of measurement
what is the z score (zCI) for a 95% confidence interval?
1.96
What is the 95% confidence interval for an observed score of 110 with a SEM of 5?
CI = X +/- (zCI)(SEM)
CI = 110 +/- 1.96(5)
CI = 110 +/- 9.8
CI = 100.2 to 119.8 (approx 100 to 120)
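The same worked example as a small Python check (values straight from this card):

```python
def confidence_interval(x, sem, z=1.96):
    """CI = X +/- (zCI)(SEM)"""
    return x - z * sem, x + z * sem

print(confidence_interval(110, 5))  # approx (100.2, 119.8)
```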
Reliability of a test can be increased by ____
adding items! (as long as they’re valid)
What does the Spearman-Brown formula do?
predicts effect of lengthening or shortening a test on reliability
Identify the variables in the Spearman-Brown formula:
Predicted rxx’ = [n(rxx’)] / [1+(n-1)rxx’]
Predicted rxx’ = predicted reliability at the new test length; rxx’ = reliability of the original test
n = degree of increase/decrease in test length (ex double = 2; half = .5)
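A small Python sketch of Spearman-Brown (the starting rxx’ = .60 is hypothetical):

```python
def spearman_brown(rxx, n):
    """Predicted rxx' = n*rxx' / (1 + (n-1)*rxx')"""
    return (n * rxx) / (1 + (n - 1) * rxx)

print(spearman_brown(0.60, 2))    # doubling the test -> 0.75
print(spearman_brown(0.60, 0.5))  # halving the test  -> ~0.43
```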
What are the 3 main types of reliability?
- test-retest
- internal consistency
- interrater agreement
What is test-retest reliability?
- coefficient of stability
- admin same test to same group @ diff occasions and correlate both sets of scores
(T/F) when evaluating reliability we always want test-retest reliability to be high
FALSE - should match construct (should be lower when measuring a construct we expect to change!)
What is another form of test-retest reliability?
- coefficient of equivalence
- make two similar (parallel/alternate) versions of a test and admin both to same group within a very short period, then correlate scores
What is internal consistency?
- look at how consistently examinees performed across items or subsets of items
- one test administration!
What are the two methods we discussed for assessing internal consistency?
- Split-half methods (older): correlate scores on the first half vs. the second half of items, OR on odd vs. even items (then apply Spearman-Brown, since each half is shorter than the full test)
- contemporary approach: Cronbach’s alpha (avg of all possible split-half correlations); unaffected by how items are arranged in test
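A minimal Python sketch of Cronbach’s alpha using its standard computational form, (k/(k-1)) * (1 - sum of item variances / variance of total scores); the 5-item, 4-examinee data below is made up:

```python
import numpy as np

def cronbach_alpha(items):
    """items: rows = examinees, columns = test items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-item scale, 4 examinees (toy data)
scores = np.array([[4, 5, 4, 5, 4],
                   [2, 2, 3, 2, 2],
                   [5, 4, 5, 5, 5],
                   [3, 3, 2, 3, 3]])
print(round(cronbach_alpha(scores), 2))  # ~0.96 (items here are very consistent)
```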
(T/F) we always want internal consistency coefficients to be as high as possible
FALSE
- higher for narrow constructs (ex sadness)
- lower for broader constructs (ex neuroticism)
**very high coefficients can indicate insufficient sampling of the domain (overly redundant items)!
What is interrater reliability?
- have diff observers rate a certain behaviour
- look @ level of agreement between observers
Identify the variables in the Kappa formula for 2 categories
k = (po - pe) / (1 - pe)
This formula assesses ____
k ranges from __ to __ with higher values indicating ___
k = kappa (interrater agreement corrected by chance)
po = observed proportion of agreement (how often raters agreed out of all observations)
pe = proportion of agreement expected by chance (based on each rater's base rates)
assesses interrater agreement corrected by chance
k ranges from -1 to 1; higher = higher agreement
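A tiny Python sketch of the kappa formula with hypothetical proportions:

```python
def cohens_kappa(po, pe):
    """k = (po - pe) / (1 - pe)"""
    return (po - pe) / (1 - pe)

# Hypothetical: raters agree on 85% of cases; chance agreement = 50%
print(cohens_kappa(0.85, 0.50))  # ~0.70 -> agreement well beyond chance
```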
What is one challenge of the interrater reliability method?
- can be hard to precisely define a construct
What is the main source of error for each of these forms of reliability coefficients?
- test-retest
- parallel forms
- internal consistency
- interrater
- test-retest: time sampling (length of test-retest interval)
- parallel forms: item sampling (homogeneity of items)
- internal consistency: item sampling (homogeneity, length)
- interrater: observer differences