reliability pt 3 Flashcards

1
Q

inter-rater reliability

A

-Applies when judgment must be exercised in scoring responses (e.g., WAIS-IV VCI subtests)

  • Item level (agreement on item scores): How much agreement is there on each individual item? Assessed by the correlation between raters on assigned item scores.
  • Scale level (total score on the scale): How much agreement is there on the total score? Assessed by the correlation between raters on total scores.

If there are more than two raters, take the mean of the correlations for each pair of raters (A & B, B & C, A & C)
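
A minimal sketch of this averaging step in Python, assuming three raters’ item scores are available as lists (the raters and scores below are illustrative, not WAIS-IV data):

    import numpy as np
    from itertools import combinations

    # Illustrative item scores assigned by three raters to the same six responses
    ratings = {
        "A": [2, 1, 0, 2, 1, 2],
        "B": [2, 1, 1, 2, 1, 2],
        "C": [1, 1, 0, 2, 0, 2],
    }

    # Correlate every pair of raters (A & B, A & C, B & C), then average
    rs = [np.corrcoef(ratings[x], ratings[y])[0, 1]
          for x, y in combinations(ratings, 2)]
    print(f"mean pairwise r = {np.mean(rs):.2f}")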

2
Q

inter-rater reliability: categorical decisions

A

When there is a finite number of categories to which each person being rated can be assigned

  • Items: Pass/Fail (0,1)
  • Items: 0, 1, 2
  • Diagnosis: Present/Absent

Two methods for assessing:

  • Percent Agreement
  • Kappa
3
Q

inter-rater reliability: percent agreement

A
  • Percentage of all cases for which both raters make the same decision (i.e., both assign a score of 0 or both assign a score of 1)
  • Problem: Raters could agree simply by chance
  • Percent agreement can OVERESTIMATE inter-rater reliability
  • Cohen’s kappa (κ) takes chance agreement into account and is the preferred method for assessing inter-rater reliability
4
Q

how to calculate percent agreement

A

Two raters independently decide whether an item score should be 0 or 1 for N individuals who complete the item
A = number of times item was scored 0 by both #1 and #2
B = number of times item was scored 0 by #1 and 1 by #2
C = number of times item was scored 1 by #1 and 0 by #2
D = number of times item was scored 1 by both #1 and #2
Percent agreement = proportion of cases for which both raters gave the same score (either both 0 or both 1) = (A+D)/N (multiply by 100 for a percentage)
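
A short Python sketch of this calculation; the four cell counts are illustrative, not data from any manual:

    # Counts from the 2 x 2 agreement table (illustrative)
    A = 40   # both raters scored 0
    B = 5    # Rater #1 scored 0, Rater #2 scored 1
    C = 7    # Rater #1 scored 1, Rater #2 scored 0
    D = 48   # both raters scored 1
    N = A + B + C + D

    percent_agreement = (A + D) / N
    print(f"percent agreement = {percent_agreement:.0%}")  # 88%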

5
Q

calculating chance agreement for a score = 0

A

Total scores of 0 given by Rater #1 = A + B
Total scores of 0 given by Rater #2 = A + C
Proportion of cases given a score of 0 by Rater #1 = (A+B)/N
Proportion of cases given a score of 0 by Rater #2 = (A+C)/N
Chance agreement for a score of 0 =
(A+B)/N times (A+C)/N

6
Q

calculating chance agreement for score = 1

A

Total scores of 1 given by Rater #1 = C + D
Total scores of 1 given by Rater #2 = B + D
Proportion of cases given a score of 1 by Rater #1 = (C+D)/N
Proportion of cases given a score of 1 by Rater #2 = (B+D)/N
Chance agreement for a score of 1 =
(C+D)/N times (B+D)/N

7
Q

calculating total chance agreement

A

Add the chance agreement for a score of 0 to the chance agreement for a score of 1
(A+B)/N times (A+C)/N
PLUS
(C+D)/N times (B+D)/N
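
Putting the last three cards together in a Python sketch, finishing with kappa, which corrects observed agreement for chance: kappa = (observed − chance) / (1 − chance). The counts are the same illustrative ones used above:

    A, B, C, D = 40, 5, 7, 48   # illustrative 2 x 2 counts
    N = A + B + C + D

    observed = (A + D) / N                    # percent agreement
    chance_0 = (A + B) / N * (A + C) / N      # chance agreement on a score of 0
    chance_1 = (C + D) / N * (B + D) / N      # chance agreement on a score of 1
    chance = chance_0 + chance_1              # total chance agreement

    kappa = (observed - chance) / (1 - chance)
    print(f"observed = {observed:.2f}, chance = {chance:.2f}, kappa = {kappa:.2f}")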

8
Q

implications of reliability

A
  • There is no single value that represents the reliability of a test … we must specify which type of reliability we are estimating
  • The methods we have considered all permit us to estimate a specific type or source of error
  • To estimate multiple sources of error simultaneously, use Generalizability Theory
  • Test manuals will report all relevant types of reliability (test/retest; split-half; internal consistency; inter-rater)
9
Q

standard error of measurement

A
  • Reliability coefficients apply to the test itself
  • The SEM permits us to estimate how much error is likely to be present in an individual examinee’s score

10
Q

SEM in words

A
  • Step 1. Subtract the reliability of the test from 1.
  • Step 2. Take the square root of Step 1.
  • Step 3. Multiply the standard deviation of the test by Step 2.
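
As a formula: SEM = SD times the square root of (1 − rtt). A quick Python sketch (the SD of 15 and reliability of .90 are illustrative assumptions, not WAIS-IV values):

    import math

    def sem(sd, r_tt):
        """Standard error of measurement: SD * sqrt(1 - reliability)."""
        return sd * math.sqrt(1 - r_tt)

    print(round(sem(sd=15, r_tt=0.90), 2))  # 4.74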
11
Q

SEM and reliability

A
  • The SEM is INVERSELY related to reliability:
  • If reliability is high, SEM is low
  • If reliability is low, SEM is high
12
Q

standard error of measurement according to classical reliability theory

A

According to Classical Test Theory:
-Error is normally distributed around a mean of 0
-SEM = the standard deviation of the distribution of error scores

Using the probabilities associated with the normal curve

  • The probability is 68% that the amount of error is within 1 SEM
  • The probability is 95% that the amount of error is within 2 SEM
13
Q

estimating error

A

We can use the SEM to make probability statements about the amount of error associated with an observed score

NOTE: To do this accurately, we have to use the exact values rather than the “approximate” values we used in Chapter 1 of the Manual

  • The probability is 68% that the amount of error associated with an observed score is no more than +/- 1 SEM
  • The probability is 90% that the amount of error associated with an observed score is no more than +/- 1.65 times SEM.
  • The probability is 95% that the amount of error associated with an observed score is no more than +/- 1.96 times SEM.
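
A short sketch of these bands, assuming an illustrative SEM of 3 points:

    SEM = 3.0   # illustrative value
    for prob, z in [(68, 1.00), (90, 1.65), (95, 1.96)]:
        print(f"{prob}%: error no more than +/- {z * SEM:.2f} points")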
14
Q

confidence intervals for estimated true score

A

We can also construct confidence intervals around the estimated true score
-We can’t know the actual true score, but we can estimate it.

These confidence intervals tell us the range in which the person’s true score is likely to fall with a specified degree of certainty (probability)

These are the CI’s that are given in the table in the WAIS-IV Manual

Step 1. Calculate the estimated true score
Step 2. Calculate the standard error of estimate
Step 3. Calculate the desired confidence interval

15
Q

estimating the true score formula in words

A

Step 1. Subtract the Mean (M) from the observed score (Xo)
Step 2. Multiply Step 1 by the reliability of the test (rtt)
Step 3. Add the Mean to Step 2.
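
In symbols: estimated true score = M + rtt(Xo − M). A minimal Python sketch (the mean of 100 and reliability of .90 are illustrative assumptions):

    def estimated_true_score(x_obs, mean, r_tt):
        """Regress the observed score toward the mean: M + r_tt * (Xo - M)."""
        return mean + r_tt * (x_obs - mean)

    print(estimated_true_score(x_obs=70, mean=100, r_tt=0.90))  # 73.0

Note that the estimate (73) sits closer to the mean than the observed score (70), the regression toward the mean discussed two cards below.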

16
Q

standard error of estimate

A

Standard Error of Estimate (SEE) = SEM times the square root of the reliability (equivalently, SD times the square root of rtt(1 − rtt))
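
A Python sketch tying the three steps from the earlier card together (the test parameters and observed score are illustrative; in practice, use the tabled CIs in the WAIS-IV Manual):

    import math

    M, SD, r_tt = 100, 15, 0.90   # illustrative test parameters
    x_obs = 120                   # illustrative observed score

    t_est = M + r_tt * (x_obs - M)                      # Step 1: estimated true score
    see = SD * math.sqrt(1 - r_tt) * math.sqrt(r_tt)    # Step 2: SEE = SEM * sqrt(rtt)
    lo, hi = t_est - 1.96 * see, t_est + 1.96 * see     # Step 3: 95% CI
    print(f"estimated true score = {t_est:.1f}, 95% CI = [{lo:.1f}, {hi:.1f}]")

Note that the interval is centered on 118, not on the observed 120, which is the asymmetry described in the next card.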

17
Q

CI’s around estimated score….

A

…will sometimes be asymmetrical around the obtained score

Reason: regression towards the mean

  • The estimated true score will always be closer to the mean than the observed score
  • Est True Score > Observed Score when observed score is below the mean
  • Est True Score < Observed Score when observed score is above the mean
18
Q

difference between estimated true scores and observed scores

A

GREATER when

  • Reliability is LOWER
  • Observed Score is farther from Mean

LESS when

  • Reliability is HIGHER
  • Observed Score is closer to Mean

19
Q

standard error of difference

A
  • Used to decide if two scores are “significantly different” from one another
  • i.e., the observed difference between them is NOT just due to measurement error
20
Q

how to find SED in words

A
  • Step 1. Square the SEM of the first score
  • Step 2. Square the SEM of the second score.
  • Step 3. Add Steps 1 and 2
  • Step 4. Take the square root of Step 3.

The SED will always be larger than the larger of the two SEMs
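
A Python sketch of these steps; the two SEMs are illustrative, chosen to reproduce the 4.50 used in the next card:

    import math

    def sed(sem1, sem2):
        """Standard error of the difference between two scores."""
        return math.sqrt(sem1 ** 2 + sem2 ** 2)

    print(round(sed(2.85, 3.48), 2))  # 4.5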

21
Q

using SED

A
  • Multiplying the SED by 1.96 gives the amount of difference required for the scores to be considered significantly different at p < .05.
  • For the VCI and PRI, this difference is 1.96 times 4.50, or 8.82
  • The VCI and PRI must differ by at least 8.82 points (rounded to 9 points) in order for the difference to be considered statistically significant (not just due to measurement error) at p < .05.
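
A sketch of the decision rule, using the VCI/PRI comparison worked through in the next card:

    sed_vci_pri = 4.50              # SED for VCI vs. PRI (from the card above)
    threshold = 1.96 * sed_vci_pri  # 8.82 points, about 9 after rounding

    vci, pri = 109, 115
    diff = abs(vci - pri)           # 6 points
    print(f"difference = {diff}, needed = {threshold:.2f}, "
          f"significant at p < .05: {diff >= threshold}")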
22
Q

example of using SED

A
  • In other words, differences less than 9 points could be due entirely to measurement error and therefore cannot be considered “true” differences
  • VCI = 109 vs. PRI = 115: the difference is NOT statistically significant because it is only 6 points, which is less than 9 points
  • A difference that is less than 9 points could be due entirely to measurement error, i.e., the true scores actually might not differ from one another.
23
Q

SED and WAIS-IV

A
  • To get more precise values for the minimum differences required for statistical significance at p < .05, we can use Table B.1 on p. 230 of the Administration Manual.
  • The values in this table are computed using the reliability of the indices within each specific age range.
  • Reliability of the indices varies slightly with age

For most purposes, the differences given in the WAIS-IV Interpretation Manual are sufficient