Test Construction: Reliability Flashcards
Reliability Coefficient: Range and Acceptable Value
Range: 0.0 to +1.0
Acceptable Test Reliability: r = .80 or higher
Reliability is also known as…
Consistency, Precision
*a reliable test yields consistent, dependable results
Test-Retest Reliability: Overview
Same group
Same exam repeated
r = coefficient of stability/consistency
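A minimal sketch (not part of the original flashcards; scores are hypothetical): the coefficient of stability is simply the correlation between two administrations of the same exam to the same group.
    import numpy as np

    time1 = [82, 75, 90, 68, 88]   # hypothetical scores, first administration
    time2 = [80, 78, 92, 70, 85]   # same examinees, same exam, later administration
    r_stability = np.corrcoef(time1, time2)[0, 1]   # test-retest reliability coefficient
    print(round(r_stability, 2))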
Test-Retest Reliability: Source of error
Time Sampling Factors
Alternate Forms Reliability
aka Equivalent Forms/Parallel Forms
Same group
Compare scores on alternate/equivalent form of exam
r = coefficient of equivalence
Alternate Forms Reliability: Sources of Error
Content Sampling
Time Sampling
Test-retest reliability and alternate forms reliability should NOT be used when there is a high likelihood of these 2 types of error:
Time Sampling
Practice Effects
Internal Consistency Reliability: Two Methods
Split-Half Reliability
Cronbach’s Alpha
Internal Consistency Reliability: Split-Half Reliability
One test
One group
Test is split into equal halves so that each examinee has two scores
r = correlation between two halves/scores
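A sketch of an odd-even split, assuming a hypothetical 0/1 examinee-by-item matrix; this is an illustration of the idea, not a prescribed procedure.
    import numpy as np

    items = np.array([[1, 0, 1, 1, 0, 1],
                      [1, 1, 1, 0, 1, 1],
                      [0, 0, 1, 0, 0, 1],
                      [1, 1, 1, 1, 1, 1],
                      [0, 1, 0, 0, 1, 0]])   # hypothetical item responses
    odd_score = items[:, 0::2].sum(axis=1)    # each examinee's score on odd-numbered items
    even_score = items[:, 1::2].sum(axis=1)   # each examinee's score on even-numbered items
    r_half = np.corrcoef(odd_score, even_score)[0, 1]   # uncorrected split-half reliability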
Internal Consistency Reliability: Split-Half Reliability Underestimation
Splitting the test halves the number of items; shorter tests are less reliable, so the half-test correlation underestimates the full test's reliability
*Corrected with the Spearman-Brown prophecy formula
Internal Consistency Reliability: Spearman-Brown formula
r = estimated reliability of the full-length test (corrects the half-test correlation)
“Long Spear cuts you in half”
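A worked sketch of the standard two-half form of the formula, using a hypothetical half-test correlation of .70:
    r_half = 0.70                          # correlation between the two halves (hypothetical)
    r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown estimate for the full-length test
    print(round(r_full, 2))                # 0.82 -- the correction raises the half-test estimate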
Internal Consistency Reliability: Cronbach’s Alpha
Like split-half:
One test
One group
Looks at all items for inter-item consistency
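A minimal sketch of the usual alpha computation, assuming a hypothetical examinee-by-item score matrix; cronbach_alpha is an illustrative helper, not a library function.
    import numpy as np

    def cronbach_alpha(items):
        # items: examinees x items score matrix (illustrative helper, hypothetical data)
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                          # number of items
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)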
Internal Consistency Reliability: Cronbach's Alpha Characteristics
Conservative
Considered the lower boundary of the test’s reliability
Internal Consistency Reliability: Kuder-Richardson Formula 20 (KR-20)
Variation of Cronbach's alpha for dichotomous scores (e.g., yes/no, right/wrong)
“KRonbach 2.0”
2.0 = "2" for a second version of Cronbach's alpha and "2" for dichotomous (two-option) scoring
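A sketch of the textbook KR-20 computation for 0/1 items; kr20 is an illustrative helper, and the examinee-by-item data layout is assumed.
    import numpy as np

    def kr20(items):
        # items: examinees x items matrix of 0/1 scores (illustrative helper)
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                         # number of items
        p = items.mean(axis=0)                     # proportion passing each item
        q = 1 - p                                  # proportion failing each item
        total_var = items.sum(axis=1).var(ddof=0)  # variance of total test scores
        return (k / (k - 1)) * (1 - (p * q).sum() / total_var)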
Internal Consistency Reliability: Sources of Error
Content sampling
Heterogeneity of content
Internal Consistency Reliability: Most Effective Application
When test measures single characteristic
When characteristic fluctuates over time
When practice effects are likely
Internal Consistency Reliability: When Not to Use
Speed test
*fixed time prevents completion of all items
Best Reliability Approach for Speed Test
Alternate Forms Reliability
Inter-Rater Reliability: Two Methods, Three Statistics
a.k.a. Inter-Scorer, Inter-Observer
Used when scores depend on a rater’s judgment
Two methods:
- Correlation coefficient: kappa (k) statistic or coefficient of concordance (Kendall)
- Percent Agreement
Inter-Rater Reliability: When to Use Kappa Statistic (k)
Nominal or Ordinal scales
*Mnemonic: kNOw the other raters
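A worked sketch with hypothetical agreement figures, using the usual chance-corrected form kappa = (p_observed - p_chance) / (1 - p_chance):
    p_observed = 0.80   # proportion of cases on which the raters agreed (hypothetical)
    p_chance = 0.50     # agreement expected by chance alone (hypothetical)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(round(kappa, 2))   # 0.6 -- lower than raw agreement once chance is removed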
Inter-Rater Reliability: Coefficient of Concordance
aka Kendall’s Coefficient
3 or more raters
Ranked Ratings
“Everyone ranked Kendall Jenner Pepsi Ad as horrible”
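A sketch of the no-ties form of Kendall's W, assuming a hypothetical raters-by-items matrix of rank values; kendalls_w is an illustrative helper.
    import numpy as np

    def kendalls_w(ranks):
        # ranks: raters x items matrix of rankings, no ties (illustrative helper)
        ranks = np.asarray(ranks, dtype=float)
        m, n = ranks.shape                                # m raters, n items ranked
        rank_sums = ranks.sum(axis=0)                     # total rank given to each item
        s = ((rank_sums - rank_sums.mean()) ** 2).sum()   # spread of the rank totals
        return 12 * s / (m ** 2 * (n ** 3 - n))           # 0 = no agreement, 1 = perfect agreement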
Inter-Rater Reliability: Sources of Error
- motivation
- biases
- characteristics of measuring device
- observer drift
Inter-Rater Reliability: Observer Drift
Error introduced when raters who work together influence each other's ratings, making them increasingly similar over time
Factors that affect Reliability: Test Length
Shorter test = fewer items (less data) and more measurement error; longer tests are generally more reliable (see the sketch below)
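A sketch of the general Spearman-Brown form, which shows the test-length effect directly; the starting reliability and lengthening factor n are hypothetical.
    def lengthened_reliability(r, n):
        # general Spearman-Brown: estimated reliability after changing test length by a factor n
        return (n * r) / (1 + (n - 1) * r)

    print(round(lengthened_reliability(0.70, 2), 2))    # doubling the test: .70 -> 0.82
    print(round(lengthened_reliability(0.70, 0.5), 2))  # halving the test:  .70 -> 0.54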