Comps - Assessment (Reliability) Flashcards
Reliability
Reflects the degree to which differences in observed scores are consistent with differences in true scores
Because true scores are __________, reliability is __________; it can only be __________, not actually calculated
theoretical, theoretical, estimated
In layman’s terms, what does reliability do?
It provides an index of score stability, giving you an idea of how consistently you are measuring something
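How does reliability relate to validity?
Independent (conceptually distinct): a test can measure something very consistently without measuring what it is intended to measure. However, reliability is necessary but not sufficient for validity, because unreliable scores limit how valid they can be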
If measurements are __________, then there is theoretically little __________
consistent, error
What happens when observed and true score variance are equal?
The reliability coefficient (true score variance / observed score variance) is 1, meaning that there is no measurement error (“perfect” reliability)
Classical Test Theory (CTT)
The conceptual basis for reliability. It was pioneered by Spearman and focuses on overall (total) test scores rather than individual items
CTT Equation
Observed Score (X) = True Score (T) + Error (E)
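A small simulation can make the CTT equation concrete. This is a sketch under assumed conditions (normally distributed true scores and independent, zero-mean errors; the function `simulate_ctt` and its parameters are hypothetical). It builds observed scores as X = T + E and estimates reliability as the ratio of true score variance to observed score variance. With no error, the two variances are identical and the ratio is exactly 1.

```python
import random
import statistics

def simulate_ctt(n_people=10_000, true_sd=10.0, error_sd=5.0, seed=0):
    """Return (true, error, observed) score lists with observed = true + error.

    Illustrative assumptions: true scores ~ Normal(100, true_sd) and errors
    ~ Normal(0, error_sd), drawn independently of each other.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E
    return true_scores, errors, observed

T, E, X = simulate_ctt()
# Reliability = true score variance / observed score variance.
# Theoretical value here: 10**2 / (10**2 + 5**2) = 0.80; the estimate lands near it.
reliability = statistics.pvariance(T) / statistics.pvariance(X)
print(f"estimated reliability: {reliability:.2f}")

# With no error, observed and true score variance are equal, so the ratio is exactly 1.
T0, _, X0 = simulate_ctt(error_sd=0.0)
print(statistics.pvariance(T0) / statistics.pvariance(X0))  # 1.0
```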
What is the goal of CTT in assessments?
To minimize the error component, ensuring maximum congruence between the observed score and the true score
CTT assessment examples
Woodcock-Johnson, standardized testing (GRE, SAT, ACT), Wechsler, Stanford-Binet, MMPI
CTT items must measure the following:
-Same latent variable
-Same scale
-Same degree of precision
-Same amount of error
Tau-Equivalence
A less restrictive model closely related to CTT
How are CTT and Tau-Equivalence different?
Same assumptions as CTT, except that the amount of error (and therefore the variance) of individual items can differ
Tau-Equivalence items must measure the following:
-Same latent variable
-Same scale
-Same degree of precision
-Possibly different amount of error
Essential Tau-Equivalent Model
Same assumptions as the tau-equivalent model, but individual items can measure the same variable with different degrees of precision and different amounts of error
What does the Essential Tau-Equivalent Model consider?
That the true score may be measured more accurately by one item than another
Essential Tau-Equivalent Model items must measure the following:
-Same latent variable
-Same scale
-Possibly different degree of precision
-Possibly different amount of error
Which model of reliability is the most representative and practical?
Essential Tau-Equivalent Model
Alternate Forms (method for estimating reliability)
Uses 2 different forms of a test that must measure the same thing; the correlation between Form 1 and Form 2 is an indicator of reliability. The method is often impractical because it is hard to determine whether the 2 forms are tau-equivalent
With Alternate Forms, how do we know that one true score is equal to the other?
We cannot be fully certain. Even if the forms really are equivalent, exposure to one form might affect performance on the other: the person might be bored or remember information from the first form. Very few tests offer parallel alternate forms because they are difficult, expensive, and time-consuming to build
Test-Retest Model (method for estimating reliability)
The same test is given to the same people twice; the correlation between the two sets of scores is the reliability coefficient
What is a good timeframe to give a second test in the test-retest model?
2-8 weeks (the interval should be long enough to minimize memory, practice, and learning effects, yet short enough to prevent maturational or historical changes that would affect the true score)
Two Assumptions of the Test-Retest Model
The true score remains stable across testing conditions and error variance is equal in both administrations
Why might the true score not remain stable across testing conditions?
-State vs. trait (your traits should not change, but your state while taking the test might)
-Carry-over effects, practice effects, maturation effects, interval between administrations
-Additional sources of error that might make performance better or worse
-Introduces sources of error that lower the reliability coefficient
State
Mood, emotional state
Trait
Personality, negative affectivity (predisposed to view the world negatively)
Why might the error variance not be equal in both administrations?
-If the testing conditions differ or the intervals are far apart
-Testing at the same time of day in the same testing conditions can reduce the likelihood of different error variance
-Increased likelihood of carry-over effects when you learn how to take a test as you take it
Internal Consistency
Looks at inter-relatedness of test items and treats different parts of a test as different forms
What does internal consistency assume?
Unidimensionality (you can only divide a test into different forms if all of the items are measuring the same thing)
If items are strongly correlated with each other, what does that suggest?
That each item is measuring the same construct
What are the two main methods for estimating internal consistency?
Split-Half Reliability and Coefficient Alpha (Cronbach’s Alpha)
Split-Half Reliability
Items on a test are sorted into parallel subtests of equal length (evens vs. odds or down the middle). The correlation between the two scores is the reliability coefficient
What does CTT say about shorter tests, regarding Split-Half Reliability?
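Shorter tests are less reliable, so the correlation between two half-length tests underestimates the reliability of the full test; it is usually stepped up to full length with the Spearman-Brown formula. This correction assumes that added items are equivalent to existing ones and that the split halves are parallel, with the same true score and the same variance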
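The length adjustment is usually done with the Spearman-Brown prophecy formula, r_new = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened. The function below is a minimal sketch of that standard formula (the name `spearman_brown` is hypothetical); it inherits the assumptions on the card, and the input reliabilities are illustrative, not from any real test.

```python
def spearman_brown(r, length_factor=2.0):
    """Spearman-Brown prophecy formula: predicted reliability when a test is
    lengthened by `length_factor` using items equivalent to the existing ones.

    length_factor=2 steps a half-test correlation up to full-test length;
    values below 1 predict the effect of shortening the test.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

print(round(spearman_brown(0.60), 2))       # 0.75: full-length estimate from a half-test r of .60
print(round(spearman_brown(0.75, 0.5), 2))  # 0.6: predicted reliability after halving a .75 test
```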
Coefficient Alpha (Cronbach’s Alpha)
Each item is considered its own subtest, and alpha is grounded in the tau-equivalent model. It is the most widely used method for estimating reliability. Conceptually, alpha equals the average of all possible split-half reliability coefficients
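Coefficient alpha can be computed from item and total-score variances: α = k/(k − 1) × (1 − Σ item variances / total-score variance), for k items. Below is a minimal standard-library sketch; `cronbach_alpha` is a hypothetical helper and the five response rows are made-up numbers, not real test data. It uses sample variances (n − 1 denominator) throughout; since that factor cancels in the ratio, population variances would give the same alpha.

```python
import statistics

def cronbach_alpha(data):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*data))  # one tuple of scores per item
    sum_item_var = sum(statistics.variance(col) for col in item_columns)
    total_var = statistics.variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical 4-item responses from 5 people (made-up numbers)
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(responses), 3))  # 0.949
```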
Diagnostically, what is a good way to think about Coefficient Alpha?
It is a quick check that the items are tapping into a common construct (e.g., the BDI items all reflect the construct of depression)
Is Split-Half Reliability or Coefficient Alpha more accurate?
Coefficient Alpha
Standard Error of Measurement (SEM)
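An estimate of the amount of error inherent in an obtained score; it is computed directly from reliability (SEM = SD × √(1 − r), where r is the reliability coefficient)
What does a higher SEM mean?
Lower reliability: SEM and reliability are inversely related, so a larger SEM means more error in each obtained score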
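The SEM formula, SEM = SD × √(1 − r), shows the inverse relationship directly. This is a sketch with hypothetical helper names; the SD of 15 and the reliabilities are illustrative values, not taken from a particular test. The score band uses the simple observed score ± z × SEM approach, which is one common convention (some texts center the band on an estimated true score instead).

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); a larger SEM means more error in each score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate band around an observed score (z = 1.96 gives roughly 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical scale with SD = 15 (IQ-style metric)
print(round(standard_error_of_measurement(15, 0.90), 2))  # 4.74
print(round(standard_error_of_measurement(15, 0.70), 2))  # 8.22: lower reliability, larger SEM
low, high = score_band(100, 15, 0.90)
print(f"95% band: {low:.1f} to {high:.1f}")  # 90.7 to 109.3
```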
Reliability and Clinical Diagnosis
Our diagnostic criteria are supposed to approximate the natural phenomenon that is a disorder. How well the diagnosis a patient receives matches the disorder they actually have is described by sensitivity and specificity
What is the problem with reliability of a clinical diagnosis?
The DSM diagnosis could be correlated with various other disorders because the DSM defines a diagnosis and not a disorder
What are the three non-overlapping sources of variance in diagnoses?
Disorder, contaminants, and random error
Disorder
A portion is due to variance among the patients sampled in characteristics related to the disorder that affect the diagnosis
Contaminants
A portion is due to variance among the patients sampled in characteristics unrelated to the disorder but that affect the diagnosis
Random Error
Factors not characteristic of the patient that impact the diagnosis (e.g., random fluctuations, errors by the diagnosticians, or lack of clarity in the diagnostic criteria)
The reliability of a diagnosis is measured and described via __________ and __________
sensitivity and specificity
Sensitivity
The probability that a person who has the disorder is correctly diagnosed, i.e., the true positive rate (am I accurately getting the diagnosis?)
What does high sensitivity suggest?
Low type II error (miss)
Specificity
The probability that a person who does not have the disorder is correctly ruled out, i.e., the true negative rate (am I accurately ruling this out?)
What does high specificity suggest?
Low type I error (false alarm)
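Sensitivity and specificity come from a 2 × 2 table comparing diagnoses with a reference standard. The counts below are invented for illustration and the function names are hypothetical. Missed cases (false negatives) correspond to the type II errors (misses) named on these cards, and false positives to the type I errors (false alarms).

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disorder who receive the diagnosis."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disorder who are correctly ruled out."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for 200 patients checked against a reference standard
tp, fn = 45, 5    # 50 have the disorder: 45 diagnosed, 5 missed (type II errors)
tn, fp = 120, 30  # 150 do not: 120 ruled out, 30 false alarms (type I errors)

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.80
```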
Type I Error
You think you found a significant effect when there really is not one
Type II Error
You miss a significant effect that is really there