Reliability Flashcards

1
Q

Classical Test Theory

definition & formula

A

Measurement theory describing how observed test scores relate to a construct (the underlying concept the measure is intended to capture).

X = T + e

X: observed score
T: true score
e: measurement error
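The X = T + e decomposition can be illustrated with a quick simulation; this is a minimal sketch, and the means and SDs are made-up values rather than figures from any real test:

```python
import random
import statistics

# Simulate Classical Test Theory: observed score X = true score T + random error e.
# The distribution parameters (mean 100, SDs 15 and 5) are illustrative only.
random.seed(42)
n = 10_000
true_scores = [random.gauss(100, 15) for _ in range(n)]   # T
errors = [random.gauss(0, 5) for _ in range(n)]           # e, expected value 0
observed = [t + e for t, e in zip(true_scores, errors)]   # X = T + e

# Because random error averages to ~0, the mean of X approximates the mean of T.
print(round(statistics.mean(errors), 2))
print(round(statistics.mean(observed) - statistics.mean(true_scores), 2))
```

Both printed values land near 0, showing why random error leaves the group mean unbiased even though individual scores are noisy.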

2
Q

CTT Error

A

Assumes random error of measurement, not systematic

3
Q

3 CTT Assumptions

A
  1. The expected value of error (e) is 0; error is random.
  2. T and e are uncorrelated.
  3. Error at time 1 is uncorrelated with error at time 2.
4
Q

Path Diagram Element: Box

A

Observable variable

5
Q

Path Diagram Element: Circle

A

Latent (unobservable) variable

6
Q

Path Diagram Element: single-headed straight arrow

A

Regression path; one variable influences the other

7
Q

Path Diagram Element: curved double-headed arrow

A

Covariance (association) between two variables; unstandardized, so it depends on the scales of the variables (as opposed to correlation, which is standardized and can only range from -1 to 1)

8
Q

Index of Reliability

A
  • Square root of rxx
  • Factor loading
  • Estimate of the correlation between true scores and observed scores
9
Q

Coefficient of Reliability

A
  • rxx
  • Estimated correlation between X1 and X2
  • Association between a measure and itself over time (or with another measure)
10
Q

Test-Retest Reliability

A
  • For a stable construct, where the correlation between true scores across time is 1, we estimate reliability as the correlation between observed scores at time 1 and time 2
  • The extent to which the time 1 and time 2 scores fall short of a perfect correlation of 1 is attributed to measurement error (rather than to a change in the true score)
11
Q

Systematic Error

A
  • Can be either positive or negative
  • Influences scores consistently for a person or sample; same value every time
  • Affects the mean of scores, not their variability, yielding a biased estimate of the average
  • Decreases accuracy of both group-level and individual scores
12
Q

Random Error

A
  • Expected value = 0
  • Errors occur by chance and have no consistent effect on an individual or sample
  • Affects variability (noise around the mean) but not the mean itself
  • A large number of observations cancels out random errors
  • Group-level means are accurate, but individual scores are less precise
13
Q

Within-Person Random Error

A

No person or group level bias; means approximate T

14
Q

Within-Person Systematic Error

A

Positive bias for individual

15
Q

Between-Person Random Error

A

Person-level bias; the group mean approximates T, but group variance is inflated

16
Q

Between-Person Systematic Error

A

Increases bias for each person, group mean is higher than T

17
Q

Reliability

A
  • Consistency, repeatability
  • equal to (true score variance) / (observed score variance)
  • Can only be estimated because we only have observed scores, not T or e
  • Coefficient of reliability: inversely related to measurement error; depends on variance of scores
  • Increases with greater # of items (because we’re averaging)
  • Shortcut: test many people over time
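The ratio definition and the more-items effect can both be sketched by simulation; the variances below are illustrative choices, not values from any real scale:

```python
import random
import statistics

# Reliability rxx = var(T) / var(X). Averaging more items shrinks error
# variance (to 100/k for k items here), raising rxx.
random.seed(7)
n = 20_000
T = [random.gauss(0, 10) for _ in range(n)]          # true scores, var(T) ~ 100

def rxx_with_k_items(k):
    # Observed score = T plus the mean of k independent item errors (sd 10 each).
    X = [t + statistics.mean([random.gauss(0, 10) for _ in range(k)]) for t in T]
    return statistics.variance(T) / statistics.variance(X)

print(round(rxx_with_k_items(1), 2))   # theory: 100 / (100 + 100)  = 0.5
print(round(rxx_with_k_items(4), 2))   # theory: 100 / (100 + 25)   = 0.8
```

Quadrupling the number of items moves the estimate from roughly 0.5 toward 0.8, which is the "more items = higher reliability" point on the card.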
18
Q

Standard Error of Measurement

A
  • Estimates the extent to which an observed score deviates from T (true score)
  • 95% of the time, T is expected to fall within +/- 2 SEoM
  • Higher reliability = lower SEoM
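Assuming the standard formula SEM = SD_x * sqrt(1 - rxx), a worked example with made-up numbers (an IQ-like SD of 15 and a reliability of 0.90):

```python
import math

# Standard error of measurement from the observed-score SD and reliability.
sd_x = 15.0      # SD of observed scores (illustrative)
rxx = 0.90       # reliability coefficient (illustrative)

sem = sd_x * math.sqrt(1 - rxx)               # SEoM
lo, hi = 100 - 2 * sem, 100 + 2 * sem         # ~95% band around an observed score of 100
print(round(sem, 2), round(lo, 1), round(hi, 1))   # → 4.74 90.5 109.5
```

Note the inverse relationship on the card: pushing rxx toward 1 drives the SEM, and the width of the band, toward 0.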
19
Q

Test-Retest Reliability & Types

A
  • Consistency of scores across time (typically a 2-week interval)
    1. Relative (coefficient of stability, dependability)
    2. Absolute (coefficient of repeatability)
20
Q

Coefficient of Stability

A
  • Relative measure of test-retest reliability
  • Pearson’s correlation between time 1 and time 2 scores separated by a delay (days/weeks)

21
Q

Coefficient of Dependability

A
  • Relative measure of test-retest reliability
  • Pearson’s correlation between time 1 and time 2 scores administered back-to-back (minutes apart)

22
Q

Coefficient of Repeatability

A
  • Absolute measure of test-retest reliability
  • Reflects consistency of scores across time by defining a range in which 95% of score differences are expected to be
  • Higher CR = greater unreliability
  • Smaller CR = more consistent scores = stronger reliability
  • Uses a Bland Altman Plot
23
Q

Bland Altman Plot

A
  • Used to demonstrate the (absolute test-retest) coefficient of repeatability
  • Bias (i.e., systematic measurement error) plus and minus the CR defines the upper and lower limits of agreement (LOA)
    • Bias = mean difference between time 1 and time 2 scores across all subjects
      • The closer to 0, the stronger the absolute test-retest reliability at the group level
  • X-axis: mean of each subject’s scores across time
  • Y-axis: difference between time 1 and time 2 scores
  • Line of equality: y = 0, meaning perfect consistency across time
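A minimal sketch of the quantities behind a Bland-Altman plot, assuming the common definition CR = 1.96 × SD of the difference scores; the paired scores are toy data:

```python
import statistics

# Bias, coefficient of repeatability (CR), and limits of agreement (LOA)
# from paired time-1/time-2 scores (made-up data for illustration).
t1 = [10.0, 12.5, 11.0, 14.0, 13.5, 9.5, 12.0, 15.0]
t2 = [10.5, 12.0, 11.5, 13.5, 14.0, 10.0, 11.5, 15.5]

diffs = [a - b for a, b in zip(t1, t2)]          # y-axis of the plot
bias = statistics.mean(diffs)                    # systematic error; near 0 is good
cr = 1.96 * statistics.stdev(diffs)              # 95% of differences fall within ±CR
loa = (bias - cr, bias + cr)                     # lower and upper limits of agreement
print(round(bias, 3), round(cr, 2))
```

A smaller CR (tighter LOA band around the bias line) corresponds to the "more consistent scores = stronger reliability" point on the previous card.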
24
Q

Inter-rater Reliability

A
  • Consistency of scores across raters
  • Uses intraclass correlation for continuous data
  • Uses Cohen’s kappa for binary/categorical variables
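For the categorical case, Cohen's kappa corrects observed agreement for chance agreement; a toy two-rater example with made-up codes:

```python
from collections import Counter

# Cohen's kappa for two raters assigning binary codes (toy data).
r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
r2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

n = len(r1)
po = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
c1, c2 = Counter(r1), Counter(r2)
pe = sum(c1[k] * c2[k] for k in c1) / n ** 2       # agreement expected by chance
kappa = (po - pe) / (1 - pe)                       # chance-corrected agreement
print(round(kappa, 2))                             # → 0.58
```

Here the raters agree 80% of the time, but because chance alone would yield 52% agreement, kappa is considerably lower than the raw agreement rate.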
25
Q

Intra-rater Reliability

A

Consistency of scores within the same rater

26
Q

Parallel-Forms Reliability

A
  • Consistency of scores across two parallel forms (two equivalent measures of the same construct)
  • Can be immediate or delayed
  • Coefficient of equivalence: Pearson’s correlation, ratio of true score variance to observed score variance
  • Controls for specific error (error that is particular to a specific measure)
  • Assumes that parallel forms are equivalent, which requires same T and variability
27
Q

Coefficient of Equivalence

A
  • Measure of parallel-forms reliability
  • Pearson’s correlation between the two forms; estimates the ratio of true score variance to observed score variance

28
Q

Internal Consistency

A
  • Consistency of scores across items within a measure (at a single time point)
  • Necessary but insufficient for unidimensionality
  • Use split-half reliability (Spearman-Brown prediction, Cronbach’s alpha, Omega)
29
Q

Split-Half Reliability

A
  • Used to measure internal consistency
  • Randomly take half of the items on a measure and correlate scores with the other half
  • But CTT states that fewer items = less reliable…
  • So the Spearman-Brown prediction formula predicts reliability after a change in test length
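The Spearman-Brown prediction formula is r' = n·r / (1 + (n - 1)·r), where r is the current reliability and n is the factor by which test length changes. A short sketch:

```python
# Spearman-Brown prediction: reliability of a test whose length is changed
# by factor n, given current reliability r.
def spearman_brown(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

# Correcting a split-half correlation of 0.6 back up to full length (n = 2):
print(round(spearman_brown(0.6, 2), 2))   # → 0.75
```

This is why a split-half correlation understates the full test's reliability: doubling the length (n = 2) raises the estimate.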
30
Q

Cronbach’s Alpha

A
  • Approximately equal to the mean reliability estimate of all possible split-halves
  • Affected by:
    • Number of items; more items inflate alpha
    • Variance of item scores; low variance/SD leads to underestimating internal consistency
    • Violations of assumptions (that items relate equally to the construct (tau equivalence) and that the scale is unidimensional)
  • Omega is a better option
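Alpha can be computed directly from item and total-score variances as α = (k/(k-1)) · (1 - Σ item variances / variance of totals); a sketch on a tiny made-up 4-item dataset:

```python
import statistics

# Cronbach's alpha on toy data: rows = respondents, columns = items.
items = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
k = len(items[0])                                  # number of items
cols = list(zip(*items))                           # per-item score columns
item_vars = [statistics.variance(c) for c in cols]
totals = [sum(row) for row in items]               # total score per respondent

alpha = (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))
print(round(alpha, 2))                             # → 0.95
```

The toy items co-vary strongly, so alpha is high; with the same item variances but weaker inter-item covariance, the total-score variance would shrink and alpha would fall.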
31
Q

Types of Omega & Uses

A
  • Used to estimate reliability of split-halves; better option than Cronbach’s Alpha
  • Omega total for continuous, unidimensional data
  • Hierarchical omega for continuous, multi-dimensional data
  • Categorical omega for categorical data
32
Q

Reliability Type Rankings

A

delayed parallel-forms < test-retest, inter-rater < immediate parallel-forms, internal consistency, intra-rater

33
Q

Why does high test-retest stability not guarantee reliability?

A

A high test-retest coefficient does not by itself establish reliability, because systematic error does not reduce stability or repeatability coefficients; within-person systematic error actually inflates apparent stability.

34
Q

Generalizability Theory

A
  • Alternative to CTT because it treats measurement conditions (facets) as sources of error contributing to a given score
  • Examines the extent to which scores are consistent across a specific set of conditions
  • Takes multiple sources of error (facets) into account simultaneously
  • “UNIVERSE SCORE” = a person’s true score across all conditions in the universe
  • Estimates reliability and validity via the GENERALIZABILITY COEFFICIENT (relative)