Test construction Flashcards
What is the item discrimination index?
The item discrimination index (D) is the difference between the percentage of examinees with high total test scores who answered the item correctly and the percentage of examinees with low total test scores who answered it correctly. D ranges from -1.00 to +1.00; when the same percentage of examinees in the two groups answered the item correctly, D equals 0.
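A minimal sketch of computing D, assuming dichotomously scored items and high/low groups already formed from total scores (the function name and numbers are just illustrative):

```python
# Minimal sketch of the item discrimination index (D), assuming
# dichotomous item scores (1 = correct, 0 = incorrect) and that the
# high- and low-scoring groups were already formed from total test scores.
def discrimination_index(high_group_item_scores, low_group_item_scores):
    p_high = sum(high_group_item_scores) / len(high_group_item_scores)
    p_low = sum(low_group_item_scores) / len(low_group_item_scores)
    return p_high - p_low  # ranges from -1.00 to +1.00

# Example: 8 of 10 high scorers vs. 3 of 10 low scorers answered the item correctly.
print(discrimination_index([1] * 8 + [0] * 2, [1] * 3 + [0] * 7))  # 0.5
```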
How can you increase the reliability coefficient?
Reliability coefficients tend to be larger for longer tests than for shorter tests, as long as the added items address content similar to that of the original items, and when the tryout sample is heterogeneous with regard to the content measured by the test so that there is an unrestricted range of scores.
- reliability is maximized when the range of scores is unrestricted; when examinees are heterogeneous, the range of scores is maximized
- item difficulty also affects the range: all easy or all hard items lead to uniformly high or low test scores, so the average difficulty level of items should be mid-range
Explain classical test theory
Classical test theory is also known as true score theory and predicts that an obtained test score (X) is the combination of a true score (T) and measurement error (E), i.e., X = T + E, with measurement error referring to random factors that affect test performance in unpredictable ways.
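A quick simulation (made-up numbers) can make X = T + E concrete; under this model, reliability is the ratio of true score variance to obtained score variance:

```python
import random

# Illustrative sketch of classical test theory: X = T + E (all numbers made up).
random.seed(0)
true_scores = [random.gauss(50, 10) for _ in range(10_000)]   # T
errors = [random.gauss(0, 5) for _ in range(10_000)]          # E: random, mean 0
obtained = [t + e for t, e in zip(true_scores, errors)]       # X = T + E

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Under this model, reliability = true score variance / obtained score variance.
print(variance(true_scores) / variance(obtained))  # roughly 100 / (100 + 25) = .80
```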
How do you interpret a reliability coefficient?
A reliability coefficient is interpreted directly as the amount of variability in test scores that’s due to true score variability. When a test’s reliability coefficient is .90, this means that 90% of variability in test scores is due to true score variability and the remaining 10% is due to measurement error.
What is the spearman-brown formula used for?
Test length is one of the factors that affects the size of the reliability coefficient, and the Spearman-Brown formula is often used to estimate the effect of lengthening or shortening a test on its reliability coefficient. This formula is especially useful for correcting the split-half reliability coefficient: assessing split-half reliability involves splitting the test in half and correlating scores on the two halves, and because each half is shorter than the full test, split-half reliability tends to underestimate the test's actual reliability. The Spearman-Brown formula is therefore used to estimate the reliability coefficient for the full length of the test.
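A minimal sketch of the Spearman-Brown prophecy formula (standard form, with illustrative values):

```python
# Sketch of the Spearman-Brown prophecy formula:
#     r_new = (n * r) / (1 + (n - 1) * r)
# where r is the current reliability and n is the factor by which test length changes.
def spearman_brown(r, n):
    return (n * r) / (1 + (n - 1) * r)

# Correcting a split-half coefficient of .70 (doubling the half-test length, n = 2):
print(spearman_brown(0.70, 2))  # ≈ 0.82
# Predicting the effect of tripling the number of items:
print(spearman_brown(0.70, 3))  # 0.875
```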
When is Cohen’s Kappa coefficient used?
The kappa coefficient is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale (e.g., when a rating scale classifies children as either meeting or not meeting the diagnostic criteria for ADHD).
used to evaluate inter-rater reliability
corrected for chance agreement between raters
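A minimal sketch of kappa for two raters, using made-up ratings and the standard formula kappa = (observed agreement - chance agreement) / (1 - chance agreement):

```python
from collections import Counter

# Sketch of Cohen's kappa for two raters assigning nominal categories (ratings made up).
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (p_observed - p_chance) / (1 - p_chance)

rater_a = ["ADHD", "ADHD", "no ADHD", "no ADHD", "ADHD", "no ADHD"]
rater_b = ["ADHD", "no ADHD", "no ADHD", "no ADHD", "ADHD", "no ADHD"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.67
```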
When do you use the Kuder-Richardson 20?
Kuder-Richardson 20 (KR-20) can be used to assess a test's internal consistency reliability when test items are dichotomously scored (e.g., as correct or incorrect); it is an alternative to Cronbach's alpha.
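A minimal sketch of KR-20 using the standard formula, with a tiny made-up data matrix:

```python
# Sketch of KR-20 for dichotomously scored items:
#     KR-20 = (k / (k - 1)) * (1 - sum(p * q) / total score variance)
# The tiny data matrix (rows = examinees, columns = items) is made up.
def kr20(item_matrix):
    n = len(item_matrix)          # examinees
    k = len(item_matrix[0])       # items
    p = [sum(row[j] for row in item_matrix) / n for j in range(k)]  # proportion correct per item
    pq_sum = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq_sum / var_total)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(data), 2))  # ≈ 0.70
```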
What is test reliability?
the extent to which a test provides consistent information
r = reliability coefficient (a correlation coefficient)
- ranges from 0 to 1
- interpreted as the amount of variability in test scores that's due to true score variability
- do NOT square this; interpret it as is
formula to calculate standard error of measurement
SEM = SD × √(1 - r)
where r = reliability coefficient and SD = standard deviation of the test scores
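A minimal sketch of the SEM formula (the SD = 15 and r = .91 values are just illustrative):

```python
import math

# Sketch of the standard error of measurement: SEM = SD * sqrt(1 - r).
def sem(sd, r):
    return sd * math.sqrt(1 - r)

# Example: a test with SD = 15 and a reliability coefficient of .91
print(sem(15, 0.91))  # ≈ 4.5
```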
how to construct CI for 68%, 95% and 99%
from the person's obtained score, add and subtract 1 SEM for a 68% CI, 2 SEMs for a 95% CI, and 3 SEMs for a 99% CI
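A minimal sketch of building these intervals from the SEM (made-up score and test statistics):

```python
import math

# Sketch of a confidence interval around an obtained score, using the
# 1/2/3-SEM rule of thumb from this card (score and test values made up).
def confidence_interval(score, sd, r, n_sems):
    sem = sd * math.sqrt(1 - r)
    return score - n_sems * sem, score + n_sems * sem

# Obtained score of 100, SD = 15, reliability = .91 (SEM ≈ 4.5):
print(confidence_interval(100, 15, 0.91, 1))  # 68% CI ≈ (95.5, 104.5)
print(confidence_interval(100, 15, 0.91, 2))  # 95% CI ≈ (91.0, 109.0)
```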
what does squaring a correlation coefficient tell you?
a correlation coefficient is squared only when it represents the correlation between two different tests (e.g., a validity coefficient), not when it is a reliability coefficient
when squared, it provides a measure of shared variability, i.e., the proportion of variability in one measure that is "accounted for by" or "explained by" the other; e.g., a validity coefficient of .80 squared is .64, so 64% of the variability is shared
What does cronbach’s alpha measure?
internal consistency reliability; unlike KR-20, it can be used with multi-point items (e.g., Likert ratings) as well as dichotomously scored items
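A minimal sketch of Cronbach's alpha using the standard formula (the Likert-type data are made up):

```python
# Sketch of Cronbach's alpha:
#     alpha = (k / (k - 1)) * (1 - sum of item variances / total score variance)
# The Likert-type data (rows = examinees, columns = items) are made up.
def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def cronbachs_alpha(item_matrix):
    k = len(item_matrix[0])
    item_vars = [variance([row[j] for row in item_matrix]) for j in range(k)]
    total_var = variance([sum(row) for row in item_matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 3],
]
print(round(cronbachs_alpha(data), 2))  # ≈ 0.94
```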
What is the problem with split-half reliability?
for split-half reliability, you split the test in half, administer it, and then look at the correlation between the two halves
the problem is that shorter tests are less reliable than longer tests, so the correlation between the two half-tests underestimates the full test's true reliability
this is corrected with the Spearman-Brown prophecy formula
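A minimal sketch of the split-half procedure with the Spearman-Brown correction, assuming an odd/even item split and dichotomous scoring (data made up):

```python
# Sketch of split-half reliability with a Spearman-Brown correction,
# assuming an odd/even item split and dichotomous scoring (data made up).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_matrix):
    odd_totals = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)            # underestimates full-test reliability
    return (2 * r_half) / (1 + r_half)                     # Spearman-Brown correction (n = 2)

data = [  # rows = examinees, columns = items
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(data), 2))  # ≈ 0.84
```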
what is percent agreement?
used to assess inter-rater reliability for 2 or more raters; does not take chance agreement into account and can overestimate reliability
Cohen's kappa is preferred because it is corrected for chance agreement
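A minimal sketch of percent agreement, reusing the made-up ratings from the kappa example above to show that it runs higher than kappa:

```python
# Sketch of percent agreement for two raters; compare with the kappa example
# above (same made-up ratings): the raw figure runs higher because it
# includes agreements expected by chance.
def percent_agreement(rater_a, rater_b):
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

rater_a = ["ADHD", "ADHD", "no ADHD", "no ADHD", "ADHD", "no ADHD"]
rater_b = ["ADHD", "no ADHD", "no ADHD", "no ADHD", "ADHD", "no ADHD"]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.83, vs. kappa ≈ 0.67
```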
What are factors that affect the reliability coefficient?
content homogeneity - homogeneous content leads to larger reliability coefficients
range of scores - reliability coefficients are larger when the range of test scores is unrestricted
guessing - the easier it is to guess the correct answer, the lower the reliability coefficient