Comps - Assessment (Reliability) Flashcards by Jenna Deutch

Reliability

Reflects the degree to which differences in observed scores are consistent with differences in true scores

Because true scores are __________, reliability is __________; it can only be __________, not actually calculated

theoretical, theoretical, estimated

In layman’s terms, what does reliability do?

It provides an index of score stability, giving you an idea of how consistently you are measuring something

Reliability is __________ of validity

Independent

If measurements are __________, then there is theoretically little __________

consistent, error

What happens when observed and true score variance are equal?

The reliability coefficient is 1, meaning that there is no measurement error (“perfect” reliability)

Classical Test Theory (CTT)

The conceptual foundation of reliability. It was pioneered by Spearman and focuses on overall test scores

CTT Equation

Observed Score (X) = True Score (T) + Error (E)
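
As a minimal sketch of the equation above, the Python snippet below builds observed scores from made-up true scores and errors, then takes the ratio of true-score variance to observed-score variance, which is how CTT defines reliability. The data and the helper name reliability_ratio are illustrative assumptions. True scores are never actually observed, so in practice this ratio can only be estimated.

```python
from statistics import pvariance

def reliability_ratio(true_scores, errors):
    """Ratio of true-score variance to observed-score variance (hypothetical helper)."""
    observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E
    return pvariance(true_scores) / pvariance(observed)

# Made-up data; the errors are chosen to be uncorrelated with the true scores.
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]

with_error = reliability_ratio(true_scores, errors)      # 25 / 26, a little below 1
no_error = reliability_ratio(true_scores, [0, 0, 0, 0])  # observed equals true, so 1.0
print(with_error, no_error)
```

With no error, observed and true score variance are equal and the ratio is exactly 1; any error variance pulls it below 1.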

What is the goal of CTT in assessments?

To minimize the error component, ensuring maximum congruence between the observed score and the true score

CTT assessment examples

Woodcock-Johnson, standardized testing (GRE, SAT, ACT), Wechsler, Stanford-Binet, MMPI

CTT items must measure the following:

-Same latent variable
-Same scale
-Same degree of precision
-Same amount of error

Tau-Equivalence

Adjacent model to CTT

How are CTT and Tau-Equivalence different?

Tau-equivalence keeps the same assumptions as CTT, except that the amount of error can differ from item to item

Tau-Equivalence items must measure the following:

-Same latent variable
-Same scale
-Same degree of precision
-Different amount of error

Essential Tau-Equivalent Model

Same assumptions as tau-equivalent, but individual items can measure the same variable with different degrees of precision and different error

What does the Essential Tau-Equivalent Model consider?

That the true score may be measured more accurately by one item than another

Essential Tau-Equivalent Model must measure the following:

-Same latent variable
-Same scale
-Different degree of precision
-Different amount of error

Which model of reliability is the most representative and practical?

Essential Tau-Equivalent Model

Alternate Forms (method for estimating reliability)

Uses two different forms of a test that must measure the same thing. The correlation between Form 1 and Form 2 is the indicator of reliability. The method is often impractical because it is hard to determine whether the two forms are tau-equivalent

With Alternate Forms, how do we know that one true score is equal to the other?

Even if the forms really are equivalent, exposure to one form might affect your performance on the other form. The person might be bored or might remember information from the first form. Very few tests have parallel alternate forms because creating them is difficult, expensive, and time-consuming

Test-Retest Model (method for estimating reliability)

The same people take the same test twice, with a time interval between the two administrations
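
A rough sketch of how a test-retest estimate could be computed: correlate each person's score at the first administration with their score at the second. The scores and the helper name pearson_r are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for five people tested twice.
time_1 = [10, 12, 14, 16, 18]
time_2 = [11, 12, 15, 15, 19]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

The same calculation would apply to alternate forms, with Form 1 and Form 2 scores in place of the two administrations.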

What is a good timeframe to give a second test in the test-retest model?

2-8 weeks (want it to be long enough to minimize memory, practice, and learning effects while being short enough to prevent maturational developments or historical changes that would affect the true score)

Two Assumptions of the Test-Retest Model

The true score remains stable across testing conditions and error variance is equal in both administrations

Why might the true score not remain stable across testing conditions?

-State vs. trait (your trait should not change, but state while taking the test might)
-Carry-over effects, practice effects, maturation effects, interval between administrations
-Additional sources of error that might make performance better or worse
-Introduces sources of error that lower the reliability coefficient

State

Mood, emotional state

Trait

Personality, negative affectivity (predisposed to view the world negatively)

Why might the error variance not be equal in both administrations?

-If the testing conditions differ or the intervals are far apart
-Testing at the same time of day in the same testing conditions can reduce the likelihood of different error variance
-Increased likelihood of carry-over effects when you learn how to take a test as you take it

Internal Consistency

Looks at inter-relatedness of test items and treats different parts of a test as different forms

What does internal consistency assume?

Unidimensionality (you can only divide a test into different forms if all of the items are measuring the same thing)

If items are strongly correlated with each other, what does that suggest?

That each item is measuring the same construct

What are the two forms of internal consistency?

Split-Half Reliability and Coefficient Alpha (Cronbach's Alpha)

Split-Half Reliability

Items on a test are sorted into two parallel subtests of equal length (e.g., even vs. odd items, or first half vs. second half). The correlation between the two subtest scores is the reliability coefficient

What does CTT say about shorter tests, regarding Split-Half Reliability?

Shorter tests are less reliable. Correcting for test length assumes that added items are equivalent to existing ones and that the split halves are parallel, with the same true score and the same variance
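
A rough sketch of an odd-even split follows. Because each half is only half as long as the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula, which is standard in psychometrics but is not named on these cards. The item scores and function names are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half(item_scores):
    """Correlate odd-item and even-item half scores (rows = people, columns = items)."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    return pearson_r(odd_half, even_half)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

# Hypothetical responses: five people, four items.
item_scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]

r_half = split_half(item_scores)
r_full = spearman_brown(r_half)
print(round(r_half, 3), round(r_full, 3))
```

The stepped-up value is higher than the half-test correlation, which reflects the point that longer tests are more reliable.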

Coefficient Alpha (Cronbach's Alpha)

Each item is treated as its own subtest, and the method is grounded in the tau-equivalent model. It is the most widely used method for estimating reliability. Alpha itself is reported as the reliability coefficient
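
For illustration, coefficient alpha can be sketched with its usual formula: k / (k - 1) times (1 - sum of item variances / variance of total scores), where k is the number of items. The formula is standard but is not written out on this card, and the data and function name below are assumptions.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a people-by-items table of scores."""
    k = len(item_scores[0])                      # number of items
    items = list(zip(*item_scores))              # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses: five people, four items.
item_scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(item_scores)
print(round(alpha, 3))
```

Alpha rises when items vary together, because the variance of the total score then grows much larger than the sum of the individual item variances.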

Diagnostically, what is a good way to think about Coefficient Alpha?

It is a quick way to know that the items are tapping into the construct being measured (e.g., the BDI has items measuring the construct of depression)

Is Split-Half Reliability or Coefficient Alpha more accurate?

Coefficient Alpha

Standard Error of Measurement (SEM)

An estimate of the amount of error inherent in an obtained score; directly reflects reliability

What does higher SEM mean?

The score is less reliable: SEM and reliability are negatively related, so SEM rises as reliability falls
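
The negative relationship can be seen in the usual SEM formula, SEM = SD * sqrt(1 - reliability). The formula is standard but is not stated on this card. The standard deviation of 15 and the reliability values below are assumptions chosen for illustration.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability estimate."""
    return sd * sqrt(1 - reliability)

# As reliability goes up, SEM goes down (SD held at a hypothetical 15).
for reliability in (0.70, 0.80, 0.91):
    sem = standard_error_of_measurement(15, reliability)
    print(reliability, round(sem, 2))
```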

Reliability and Clinical Diagnosis

Our diagnostic criteria are supposed to be an approximation of the natural phenomenon that is a disorder. The quality of the relationship between what patients have and what they are diagnosed with is based on sensitivity and specificity

What is the problem with reliability of a clinical diagnosis?

The DSM diagnosis could be correlated with various other disorders because the DSM defines a diagnosis and not a disorder

What are the three non-overlapping sources of variance in diagnoses?

Disorder, contaminants, and random error

Disorder

Variance among the patients sampled in characteristics that are related to the disorder and affect the diagnosis

Contaminants

Variance among the patients sampled in characteristics that are unrelated to the disorder but still affect the diagnosis

Random Error

Factors not characteristic of the patient that impact the diagnosis (e.g., random fluctuations, errors by the diagnosticians, or lack of clarity in the diagnostic criteria)

The reliability of a clinical diagnosis is measured and described via __________ and __________

sensitivity and specificity

Sensitivity

The probability of correctly diagnosing true positives (i.e., am I accurately getting the diagnosis?)

What does high sensitivity suggest?

Low type II error (miss)

Specificity

The probability of correctly diagnosing true negatives (i.e., am I accurately ruling this out?)

What does high specificity suggest?

Low type I error (false alarm)
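
As a small worked example, sensitivity and specificity can be computed from counts of correct and incorrect diagnoses. The counts and function names below are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of people with the disorder who are correctly diagnosed."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of people without the disorder who are correctly ruled out."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 50 people with the disorder, 100 without.
sens = sensitivity(true_pos=40, false_neg=10)  # 40 / 50
spec = specificity(true_neg=90, false_pos=10)  # 90 / 100
print(sens, spec)
```

Fewer misses (false negatives) raise sensitivity, and fewer false alarms (false positives) raise specificity.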

Type I Error

You think you found a significant effect when there really is not one

Type II Error

You miss a significant effect that is really there