Reliability Flashcards
Statement 1: The Neale Analysis of Reading involves children being told a word and then pointing to the picture (in an array of four pictures) that corresponds to that word.
Statement 2: The Neale Analysis of Reading includes scores based on accuracy and speed (amongst other things).
(a) Both statements are true.
(b) Statement 1 true; Statement 2 false.
(c) Statement 1 false; Statement 2 true.
(d) Both statements are false.
The answer was c. See Lecture 3. Statement 1 is false because it describes the Peabody Picture Vocabulary Test, not the Neale Analysis of Reading (the task described doesn’t actually involve any reading). Statement 2 is true because accuracy and speed are two of the three key outcome measures of the Neale Analysis of Reading (the third being comprehension).
A man has a motorcycle crash that involves a closed-head injury. Before his crash, he completed an intelligence test where he scored 40 (mean 50, standard deviation 10). After his crash, he completed the same intelligence test and scored 35. If the standard error of the difference is 3 points for the test, then which of the following statements best describes the data?
(a) His intelligence has increased significantly after the crash (95% confidence).
(b) His intelligence has decreased significantly after the crash (95% confidence).
(c) His intelligence has not significantly changed after the crash (95% confidence).
(d) His initial intelligence is not significantly lower than the average person (95% confidence).
The answer was c. See Lecture 3 - section on the standard error of the difference. If the SEdiff is 3, then the man’s intelligence score needs to decrease by at least twice this to represent a statistically significant difference (95% confidence) - i.e. it needs to have decreased by 6 or more. In fact, it’s only decreased by 5 (40 to 35) - so this change is not significant.
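The decision rule in this answer can be sketched in Python. This is a minimal illustration, assuming the rule of thumb used above (a change must be at least twice the SEdiff to count as significant at roughly 95% confidence). The function name `is_significant_change` and its `multiplier` parameter are hypothetical, not taken from the lecture.

```python
def is_significant_change(score_before, score_after, se_diff, multiplier=2.0):
    """Return True if the change is at least `multiplier` x SEdiff.

    multiplier=2.0 is the rule of thumb from the answer above
    (an assumption; the more precise z value is about 1.96).
    """
    change = abs(score_after - score_before)
    threshold = multiplier * se_diff
    return change >= threshold


# Values from the flashcard: 40 before, 35 after, SEdiff = 3.
print(is_significant_change(40, 35, 3))  # False: a change of 5 is below the threshold of 6
```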
Two students score 89 and 94 in a multiple-choice IQ test, which has been shown to have a standard error of measurement of 3. The mean of the test is 85 and the standard deviation is 15. Are their scores significantly different (95% level of confidence)?
(a) Depends on the validity of the IQ test used.
(b) Depends on whether the IQ test used is norm- or criterion-referenced.
(c) Yes, their IQ scores are significantly different.
(d) No, their IQ scores are not significantly different.
The answer was d. See Lecture 3 - section on the standard error of the difference. Using the formula for SEdiff, we put in the SEM of 3: square root of (3 squared + 3 squared) = square root of 18 = 4.24. To be significant, the difference between the students’ scores must be more than twice the SEdiff (i.e. 2 x 4.24 = 8.48). The actual difference (94 - 89 = 5) is less than this. Therefore their scores are not significantly different.
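The arithmetic in this answer can be checked with a short Python sketch. It assumes the SEdiff formula described above (square root of the sum of the two squared SEMs); the function name `se_diff_from_sems` is hypothetical.

```python
import math


def se_diff_from_sems(sem_a, sem_b):
    """Standard error of the difference between two scores."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


sem = 3
se_diff = se_diff_from_sems(sem, sem)  # sqrt(18), about 4.24
threshold = 2 * se_diff                # about 8.49 when not rounded first
difference = abs(94 - 89)              # 5

print(round(se_diff, 2), round(threshold, 2), difference > threshold)
# prints: 4.24 8.49 False
```

The threshold prints as 8.49 rather than 8.48 because the code doubles the unrounded SEdiff; the conclusion is the same either way.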
Imagine you had a questionnaire with 10 items and you were disappointed that its internal consistency was .69. What effect would adding another 10 items be predicted to have on the reliability?
(a) Predicted reliability of new test = .96.
(b) Predicted reliability of new test = .69.
(c) Predicted reliability of new test = .77.
(d) Predicted reliability of new test = .82.
The answer was d. You need to use the Spearman-Brown prediction formula shown in Lecture 3, where n = 20/10 = 2 (the 20-question-long new test is double the length of the 10-question-long existing test) and rxx = .69. So the new reliability will be: (2 x .69)/(1 + ((2-1) x .69)) = .82.
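The Spearman-Brown calculation above can be written as a small Python function. This is a sketch of the formula as stated in the answer; the function and parameter names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# 10 items lengthened to 20 items, so length_factor = 20 / 10 = 2.
predicted = spearman_brown(reliability=0.69, length_factor=20 / 10)
print(round(predicted, 2))  # 0.82
```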
If you had a dataset of test scores where the range of scores was substantially restricted, which estimate of reliability would be affected (assuming the data could in principle yield statistics on all reliability estimates)?
(a) Internal consistency.
(b) Test-retest and alternate forms.
(c) Internal consistency, alternate forms, and test-retest.
(d) Internal consistency, alternate forms, test-retest, and inter-rater reliability.
The answer was d. Restriction of range affects any correlational type measure, which would include things like inter-rater reliability (see correlation lecture). One student asked whether restriction of range refers only to situations where a specific group was sampled (e.g. 1st year students instead of a cross-section of the university). The answer is no: restriction of range covers any situation where the range of scores is restricted (which would affect all the estimates).
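The effect of restriction of range on a correlation can be illustrated with a small simulation. This is a sketch only: the data are simulated (hypothetical true scores plus independent error on two testing occasions), the cut-off of 55 is arbitrary, and `pearson_r` is a hand-written helper rather than anything from the lecture.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: each person has a true score, and each of two
# testing occasions adds independent measurement error.
true_scores = [rng.gauss(50, 10) for _ in range(5000)]
time1 = [t + rng.gauss(0, 5) for t in true_scores]
time2 = [t + rng.gauss(0, 5) for t in true_scores]

r_full = pearson_r(time1, time2)

# Restrict the range: keep only people who scored 55 or more at time 1.
kept = [(a, b) for a, b in zip(time1, time2) if a >= 55]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

With these settings the test-retest correlation in the restricted subsample should come out noticeably lower than in the full sample, even though the test itself is unchanged.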
What are alternate forms tests?
(a) When there are two or more versions of a test that have the same mean and standard deviation.
(b) When there are two or more versions of a test that are equivalent in content and difficulty, but may not have exactly the same means and standard deviation.
(c) When there are two or more versions of a test that have equivalent variance, but may not have exactly the same mean and standard deviation.
(d) When there are different versions of a test that test the same construct but use conceptually different approaches.
The answer was b. See Lecture 3. Alternate forms tests are different versions of a test that are generally equivalent but don’t necessarily have the same mean and standard deviation. If the means and standard deviations of the two test versions are the same then the two test versions would be called “parallel forms”. Also note that variance is just standard deviation squared, so option c can’t be true: versions with equivalent variance would have the same standard deviation. Finally, option d is incorrect because alternate forms tests wouldn’t have conceptually different approaches, since they’re supposed to be equivalent to each other.
According to Classical Test Theory, if a test has very high reliability then:
(a) The error variance must be very high.
(b) The total variance must be very high.
(c) Virtually all of the total variance must be accounted for by true variance.
(d) Virtually all of the total variance must be accounted for by error variance.
The answer was c. See Lecture 3. If virtually all of the total variance is accounted for by true variance, this means measurement error must be very low. This means reliability is high.
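The relationship in this answer can be shown numerically. The sketch below assumes the usual Classical Test Theory definition (total variance = true variance + error variance, and reliability = true variance / total variance); the variance values are made up for illustration.

```python
def reliability_from_variances(true_variance, error_variance):
    """Reliability = true variance / total variance (total = true + error)."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance


# Hypothetical variances: mostly true variance gives high reliability.
print(reliability_from_variances(true_variance=95, error_variance=5))   # 0.95
print(reliability_from_variances(true_variance=50, error_variance=50))  # 0.5
```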
Statement 1: As part of the process of calculating Cronbach’s alpha, you have to split the questionnaire into two halves, calculate the total score for each half, and then multiply the total scores together.
Statement 2: As part of the process of calculating Cronbach’s alpha, you have to adjust for the homogeneity of the test by applying a special version of the Spearman-Brown formula.
(a) Both statements are true.
(b) Statement 1 true; Statement 2 false.
(c) Statement 1 false; Statement 2 true.
(d) Both statements are false.
The answer was d. Statement 1 is false: you average the correlations; you don’t multiply them. Statement 2 is false: you do indeed have to apply a version of the Spearman-Brown formula, but not to adjust for the homogeneity of the test. It’s because a half-length test is likely to be less reliable than a full-length test.
If you had a heterogeneous test, which estimate of reliability should you AVOID?
(a) Internal consistency.
(b) Test-retest.
(c) Alternate forms.
(d) Inter-rater reliability.
The answer was a. If your measure is heterogeneous then internal consistency might be an inappropriate estimate of reliability because it assumes that all the items in your test are measuring the same thing.
Statement 1: When we calculate the confidence interval of an individual’s test score, we are assuming that their observed score must be their true score.
Statement 2: If a revised version of a test is found to be more unreliable, then this will increase the Standard Error of Measurement (assuming that the standard deviation of the revised test is the same as the original).
(a) Both statements are true.
(b) Statement 1 true; Statement 2 false.
(c) Statement 1 false; Statement 2 true.
(d) Both statements are false.
The answer was a. See Lecture 3. Statement 1: The key words that solve this are “individual’s score” and “calculation”. To CALCULATE the confidence interval of an INDIVIDUAL’S test score (when we add and subtract twice the standard error of measurement), we have to assume the true score is in the middle of the distribution (we’re doing this whether we like it or not when we assume the confidence interval is symmetrical around their observed score). That is, we don’t actually know what the true score is, but our best guess of what it is (which we need to be able to calculate a confidence interval at all) would be the observed score – because that’s all we have to go on for our individual. This doesn’t mean we “believe” the true score will literally be the observed score – it’s just an assumption for the sake of the calculation (one previous student thought this statement was false because the true score is the observed score plus error). Statement 2: This is true because increased test unreliability will be associated with an increased Standard Error of Measurement (remembering that the formula for estimating SEM uses test reliability and test standard deviation - where the latter remains unchanged in this question).
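Both statements can be illustrated with a short Python sketch. It assumes the standard formula SEM = SD x sqrt(1 - reliability) and the "observed score plus or minus twice the SEM" interval described above; the function names, reliabilities, SD and observed score are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed_score, sem, multiplier=2.0):
    """Symmetric interval centred on the observed score (about 95% with 2 x SEM)."""
    return observed_score - multiplier * sem, observed_score + multiplier * sem


# Hypothetical values: same SD of 15, reliability drops from .91 to .84.
sem_original = standard_error_of_measurement(sd=15, reliability=0.91)
sem_revised = standard_error_of_measurement(sd=15, reliability=0.84)

print(round(sem_original, 2))         # 4.5
print(round(sem_revised, 2))          # 6.0
print(confidence_interval(100, 6.0))  # (88.0, 112.0)
```

The interval is centred on the observed score, which is the assumption discussed under Statement 1, and the less reliable version has the larger SEM, as in Statement 2.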
If you were measuring dynamic characteristics, which estimate of reliability should you AVOID?
(a) Internal consistency.
(b) Test-retest.
(c) Alternate forms.
(d) Inter-rater reliability.
The answer was b. Test-retest reliability should be avoided because, with a dynamic characteristic, you expect the results to change over time.
If you had a restriction of range/variance, which estimate of reliability should you AVOID?
(a) Internal consistency.
(b) Test-retest.
(c) Alternate forms.
(d) All of the above.
The answer was d. A restricted range lowers correlations, and so affects every correlation-based estimate of the test’s reliability.
If you had a speed test, which estimate of reliability should you USE?
(a) Internal consistency.
(b) Test-retest.
(c) Alternate forms.
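(d) Both b and c.
The answer was d. Test-retest and alternate forms are both suitable for a speed test; internal consistency is not.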
If you had a criterion referenced test, which estimate of reliability should you AVOID?
(a) Internal consistency.
(b) Test-retest.
(c) Alternate forms.
(d) All of the above.
The answer was d. Pass or fail scoring restricts the range of scores, which affects all of these estimates.