Lecture 4 Flashcards

1
Q

What is Reliability?

A

-can we reproduce the results? are they trustworthy?
-reliability is only about random measurement error
-observed score = true score + random measurement error
-we are looking at inter-individual differences (consistency in how people differ from one another)
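In symbols (standard classical test theory notation; the card only gives the verbal version):

$X = T + E$, and, assuming the error $E$ is random and uncorrelated with the true score $T$, $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$

The reliability formulas on the later cards are ratios of these variance components.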

2
Q

What are 3 factors that influence reliability?

A

-any event that creates inconsistencies in performance
-content sampling
-statistical factors

3
Q

What does the formula $r_{xx} = \sigma_T^2 / \sigma_X^2$ mean?

A

-theoretically, the correlation between a test and itself (e.g., between two strictly parallel forms)
-this correlation corresponds to the proportion of the variance in the total score that is due to true score variance
-the correlation varies from 0 to 1
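A quick worked example with made-up variances (not from the lecture): if $\sigma_T^2 = 8$ and $\sigma_X^2 = 10$, then $r_{xx} = 8/10 = .80$, i.e., 80% of the observed differences between people reflect true differences.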

4
Q

What are some examples of events that create inconsistencies in performance?

A

-true changes in results (over time only)
-random changes in the persons being assessed (tiredness, sickness, etc.)
-random changes in the test administration process (location, instructions, noise, etc.)
-random changes in the scoring procedures
-random changes in the item response process (e.g., guessing)

5
Q

What does the formula $1 - r_{xx} = \sigma_E^2 / \sigma_X^2$ = % error mean?

A

-gives us the proportion of the total (observed) score variance that is due to random measurement error
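Continuing the made-up example from the previous formula card ($\sigma_T^2 = 8$, $\sigma_X^2 = 10$, so $\sigma_E^2 = 2$): $1 - r_{xx} = 2/10 = .20$, i.e., 20% of the observed score variance is random error.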

6
Q

What are some examples of content sampling?

A

-representativeness of the items (luck)
-clarity of the items
-test length (longer tests = higher reliability)

7
Q

What are some examples of statistical factors?

A

-regression toward the mean
-range restriction/extension (see the simulation sketch below)
*extension: more variability in responses, which makes correlations look stronger
*restriction: less variability, which makes it harder to get a strong correlation
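A minimal simulation sketch of the range-restriction point (all numbers and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two positively correlated score sets (e.g., two administrations of a test).
n = 5_000
true_scores = rng.normal(100, 15, n)
form_a = true_scores + rng.normal(0, 8, n)   # observed = true score + random error
form_b = true_scores + rng.normal(0, 8, n)

full_r = np.corrcoef(form_a, form_b)[0, 1]

# Range restriction: keep only people scoring above the median on form A.
keep = form_a > np.median(form_a)
restricted_r = np.corrcoef(form_a[keep], form_b[keep])[0, 1]

print(f"full-range correlation: {full_r:.2f}")        # ~.78 in this setup
print(f"restricted correlation: {restricted_r:.2f}")  # noticeably lower
```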

8
Q

What are some sources of error and their corresponding reliability assessment?

A

-stability/time sampling –> test-retest reliability
-inter-item consistency/content sampling –> alternate forms reliability OR split half reliability
-scale score reliability/content sampling –> Cronbach's alpha
-error due to the assessors –> inter-rater agreement and inter-rater reliability

9
Q

What is the test-retest reliability (stability/time sampling)?

A

-the test is administered twice, at two separate time points, to assess how stable the scores are over time
-the time interval should be carefully selected to: (a) limit the possibility of true score changes; (b) limit the effects of recall [which artificially inflates stability] (typically 1-2 weeks to 1 month)
-we want a correlation between the scores obtained on the two administrations of about .80-.90 (depends on the time interval and on what is known about the construct); see the sketch below
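A minimal sketch of how that correlation is obtained, using made-up scores for the same eight people tested twice:

```python
import numpy as np

# Made-up scores for the same 8 people tested at two time points (illustration only).
time_1 = np.array([12, 15, 9, 20, 14, 17, 11, 18])
time_2 = np.array([13, 14, 10, 19, 15, 16, 12, 17])

# Test-retest reliability = Pearson correlation between the two administrations.
test_retest_r = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest r = {test_retest_r:.2f}")  # about .98 for these toy scores
```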

10
Q

What is alternate forms reliability (inter-item consistency/content sampling)?

A

-two equivalent (alternate) forms of the same test are administered either simultaneously (captures error due to content sampling) or at two separate time points (captures error due to content + time sampling)
-assesses item similarity across forms through the correlation between the two forms
-differences reflect random errors related to item sampling: difficulty, representativeness, clarity, guessing, etc.
-it is important to counterbalance the administration order across two subsamples (half take form A first, half take form B first)

11
Q

What is a pro and con of alternate forms reliability?

A

-pro: no recall effect (not concerned about memory bias)
-con: not the same items

12
Q

What is split half reliability (inter-item consistency/content sampling)?

A

-one test is simply split in two (then the two halves are correlated)
-important to ensure content similarity in this process; if all items are equivalent, the split can be done randomly; fatigue effects can be controlled by splitting as a function of the order of appearance of the items (e.g., odd vs. even items)
-differences in scores will reflect random error due to item sampling (difficulty, representativeness, clarity, guessing, etc.)
-underestimation: the longer the test, the higher the reliability, so the correlation between two half-length tests underestimates the full test's reliability
-the Spearman-Brown prophecy formula corrects for this (k = 2, i.e., 2 times longer) [k is about how many times longer (or shorter) you want the test to be]; see the sketch below
-not as accurate as alternate forms reliability because there are many possible halves, each giving a slightly different estimate
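A minimal sketch of the split-half procedure with the Spearman-Brown correction, using made-up binary item responses and an odd/even split:

```python
import numpy as np

# Made-up item responses: 10 people x 8 items (1 = correct, 0 = incorrect).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

# Split by order of appearance (odd vs. even items) to balance fatigue effects.
half_1 = items[:, 0::2].sum(axis=1)
half_2 = items[:, 1::2].sum(axis=1)

# Correlation between the two half-test scores.
r_half = np.corrcoef(half_1, half_2)[0, 1]

# Spearman-Brown prophecy formula: estimated reliability of a test k times longer.
k = 2  # the full test is twice as long as each half
r_full = (k * r_half) / (1 + (k - 1) * r_half)

print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```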

13
Q

What is the Cronbach alpha (scale score reliability/content sampling)?

A

-coefficient alpha (α) or KR-20 (Kuder-Richardson formula 20, for binary items)
-roughly, this corresponds to doing all possible split-halves and combining them into a single estimate (it does this without actually dividing the test) [most precise way to assess reliability]; see the sketch below
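A minimal sketch of coefficient alpha computed directly from its variance formula (made-up Likert responses; real analyses would normally use statistical software):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_persons x n_items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up Likert responses: 6 people x 4 items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```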

14
Q

Are there cut-off scores to determine a good alpha?

A

-no, it depends:
-items in index/survey and tests of speed/power have no reason to be consistent/correlated with one another
-test length is positively related to reliability so we should expect higher alphas for longer tests (& vice versa)
-very short tests aiming to assess very broad constructs (content heterogeneity) will tend to have lower alphas
-Spearman-Brown prophecy formula can be used to estimate what the alpha would be based on a different number of equivalent items
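A quick made-up illustration of that last point: if a 5-item scale has $\alpha = .60$, the Spearman-Brown prophecy formula with $k = 2$ (doubling to 10 equivalent items) predicts $\frac{2 \times .60}{1 + (2 - 1) \times .60} = .75$.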

15
Q

What is the order of what we look at in the reliability statistics table?

A
  1. Cronbach's alpha [scale score reliability]
  2. alpha if item deleted
  3. corrected item-total correlation (how much each item measures the same thing as the rest of the test)
    [the scale mean and variance if item deleted are not important, but the variability should drop dramatically]
    (we look at these to see if some items are irrelevant and worth removing without affecting the alpha)
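A rough sketch of how "alpha if item deleted" and the corrected item-total correlations can be reproduced by hand (same kind of made-up data as the alpha sketch):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

# Made-up Likert responses: 6 people x 4 items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])

print(f"alpha for the full scale = {cronbach_alpha(responses):.2f}")

for i in range(responses.shape[1]):
    rest = np.delete(responses, i, axis=1)  # all items except item i
    # "Alpha if item deleted": alpha recomputed on the remaining items.
    alpha_if_deleted = cronbach_alpha(rest)
    # "Corrected item-total correlation": item i vs. the total score WITHOUT item i.
    item_total_r = np.corrcoef(responses[:, i], rest.sum(axis=1))[0, 1]
    print(f"item {i + 1}: alpha if deleted = {alpha_if_deleted:.2f}, "
          f"corrected item-total r = {item_total_r:.2f}")
```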
16
Q

What is the difference between inter-rater agreement and inter-rater reliability?

A

-the same participants are assessed, using the same test and procedure, by different assessors
-agreement: do the assessors agree on their final rating of participants (e.g., for what percentage of participants do they give the identical rating) [most relevant for criterion-referenced measurement]
-reliability: do the assessors classify participants in roughly the same order [most relevant for norm-referenced measurement]; see the sketch below
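A tiny made-up example of why the two are not the same: a rater who is consistently one point more lenient shows perfect inter-rater reliability but zero exact agreement:

```python
import numpy as np

# Made-up ratings of the same 6 participants by two raters (1-5 scale).
rater_a = np.array([1, 2, 4, 3, 2, 4])
rater_b = rater_a + 1          # rater B is consistently one point more lenient

# Inter-rater agreement: how often do the two raters give the identical rating?
agreement = np.mean(rater_a == rater_b)

# Inter-rater reliability: do the two raters order participants the same way?
reliability = np.corrcoef(rater_a, rater_b)[0, 1]

print(f"agreement = {agreement:.0%}")        # 0% - never the exact same rating
print(f"reliability r = {reliability:.2f}")  # 1.00 - identical rank ordering
```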

17
Q

What are the reliability assessments and the errors they account for?

A

-test-retest –> error due to time and content sampling
-alternate forms (delayed) –> error due to time and content sampling
-alternate forms (simultaneous) –> error due to content sampling
-split half –> error due to content sampling
-alpha –> error due to content sampling
-inter-rater –> error due to the assessors

18
Q

What is the Standard Error of Measurement?

A

-a concept similar to the standard deviation, and reflects the “average” amount of random measurement error associated with a specific individual test score;
-i.e., the average discrepancy between observed scores and the underlying true scores
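The card does not give the formula, but the standard classical test theory expression (with a made-up numerical example) is:

$SEM = s_X \sqrt{1 - r_{xx}}$

where $s_X$ is the standard deviation of the observed scores and $r_{xx}$ the reliability; e.g., with $s_X = 15$ and $r_{xx} = .91$, $SEM = 15 \times \sqrt{.09} = 4.5$.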

19
Q

What does the confidence interval allow us to measure?

A

-random error is random, so its distribution follows the normal curve –> about 95% of observed scores fall within 1.96 SEM of the true score
-95% confidence interval: score on the test +/- (1.96)*(standard error of measurement)
-it is with this confidence interval that it becomes possible to interpret and compare individual test scores (see the worked example below)
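A quick made-up worked example: an observed score of 100 with $SEM = 4.5$ gives $100 \pm 1.96 \times 4.5$, i.e., a 95% interval of roughly 91.2 to 108.8; two people whose intervals overlap cannot be confidently distinguished on this test.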