Lecture 10: Scales and Reliability Flashcards
Reliability – consistency of measurement
Validity – accuracy of measurement (Is the test or tool measuring what it is meant to be measuring?)
________ is often assessed using a PCC.
For example, assessing _______ _______ often involves calculating the correlation between the measure and some other criterion for the same concept (e.g. the correlation between intelligence test score and school grades; sketched below).
validity
Pearson's correlation coefficient
criterion validity
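For instance, criterion validity in the intelligence-test example could be checked with a few lines of Python. This is only a sketch: the scores are invented for illustration, and scipy.stats.pearsonr simply returns Pearson's r together with a p-value.

```python
# Hypothetical data: intelligence test scores (the measure) and school
# grades (the criterion) for eight students (invented for illustration).
import numpy as np
from scipy.stats import pearsonr

iq_scores = np.array([95, 100, 105, 110, 115, 120, 125, 130])
school_grades = np.array([55, 60, 58, 65, 70, 72, 78, 80])

r, p = pearsonr(iq_scores, school_grades)
print(f"criterion validity (Pearson's r) = {r:.2f}, p = {p:.3f}")
```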
Reliability
In the development of multi-item scales, the internal consistency of such scales may be measured using a function of the Pearson correlations between the items that make up the scale (illustrated in the sketch after this card).
However, there are cases in which Pearson correlations are not an appropriate measure of reliability (e.g.).
the measurement of agreement between raters
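The lecture does not name the function of the inter-item correlations; one widely used choice is the standardized Cronbach's alpha, which depends only on the number of items and their average inter-item Pearson correlation. A minimal sketch, assuming made-up scores for 6 respondents on a 3-item scale:

```python
# Standardized Cronbach's alpha: k * mean_r / (1 + (k - 1) * mean_r),
# where mean_r is the average inter-item Pearson correlation and k is the
# number of items. The item scores are invented for illustration.
import numpy as np

items = np.array([
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
    [4, 4, 5],
])

k = items.shape[1]                             # number of items
corr = np.corrcoef(items, rowvar=False)        # k x k inter-item correlations
mean_r = corr[np.triu_indices(k, k=1)].mean()  # average off-diagonal correlation

alpha = k * mean_r / (1 + (k - 1) * mean_r)
print(f"mean inter-item r = {mean_r:.2f}, standardized alpha = {alpha:.2f}")
```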
Agreement between raters:
one criterion for reliability of a coding system for observations is __________________________.
the agreement between two (or more) raters.
• In psychological research, we are often interested in obtaining reliable ratings of observations.
• If only a single person provides ratings, reliability cannot be assessed: we have no way of knowing whether the rater’s assessments are merely subjective impressions or assessments that can be agreed on intersubjectively.
• Yet, if two (or more) people rate the same observations independently, we can assess whether raters are able to apply a given coding system consistently.
With one exception, Rater A rates consistently lower than the standard. The line in this plot represents perfect agreement: if the rater and the standard always agreed, then all points would be on the line.
Bias: Rater A versus Standard
Rater B sometimes underestimates and sometimes overestimates RF.
There is no evidence of bias, but Rater B’s ratings often differ considerably from the standard ratings. So Rater B’s ratings are unbiased, but imprecise.
Lack of Precision: Rater B versus Standard
Rater C’s ratings are pretty precise. Although not all agree with the standard perfectly, all are ‘quite close’ to the standard. There is no indication of bias or of any other systematic mistake.
Good Agreement: Rater C versus Standard
Rater D tends to overestimate scores when RF is low, and underestimate scores when RF is high. Rater D tends to use values towards the middle of the scale (points 3 to 6), in effect “shrinking” the scale. This might happen if a rater is not very confident of their understanding of RF, and reluctant to make clear judgements (i.e. low RF or high RF).
Scale Shift: Rater D versus Standard
In general, Pearson’s correlation coefficient ___ a good measure of inter-rater agreement – why?
is not
It does not detect bias (such as seen in Rater A), nor does it detect systematic differences in use of a scale (such as seen in Rater D).
A better alternative for measuring agreement between raters:
intraclass correlation coefficient (ICC).
According to Pearson’s r, Rater A’s ratings agree best with the standard, and Rater C is only in “third place”. ICC shows, more usefully, that Rater C’s ratings are most reliable. Note that both Pearson’s r and ICC are good at reflecting the imprecision of Rater B.
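To make the contrast concrete, here is a sketch in Python with invented ratings whose error patterns mimic Raters A–D above (bias, imprecision, good agreement, scale shrinkage). The numbers are not the lecture’s; only the qualitative pattern is meant to match. The ICC computed is ICC(A,1), the two-way, single-rater, absolute-agreement coefficient (what SPSS labels an “Absolute Agreement” ICC, single measures).

```python
import numpy as np


def icc_a1(ratings):
    """ICC(A,1) for an (n targets x k raters) matrix of ratings."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    # Two-way ANOVA decomposition: targets (rows) x raters (columns).
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-target mean square
    msc = ss_cols / (k - 1)               # between-rater mean square (bias)
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square (imprecision)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


standard = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
raters = {
    "A (biased)":    np.array([1, 1, 2, 3, 4, 5, 6, 7]),  # one point low, bar one exception
    "B (imprecise)": np.array([3, 1, 5, 2, 6, 5, 9, 6]),  # unbiased but scattered
    "C (good)":      np.array([1, 2, 4, 4, 5, 7, 7, 8]),  # close to the standard
    "D (shrunken)":  np.array([2, 3, 3, 4, 5, 5, 6, 7]),  # pulled towards the middle
}

for name, ratings in raters.items():
    r = np.corrcoef(standard, ratings)[0, 1]
    icc = icc_a1(np.column_stack([standard, ratings]))
    print(f"Rater {name}: Pearson r = {r:.3f}, ICC(A,1) = {icc:.3f}")

# With these data, Pearson's r ranks the biased Rater A highest, whereas the
# absolute-agreement ICC ranks Rater C highest; both coefficients are clearly
# lower for the imprecise Rater B.
```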
Intraclass correlation coefficient (ICC)
The ICC is a number that theoretically varies between 0 and 1:
• 1 would indicate perfect agreement
• 0 would indicate absence of any relationship between ratings (which would be expected if a rater picked their numbers at random).
• … when estimating the ICC, it can sometimes happen that you obtain a negative ICC estimate. This would also indicate poor agreement.
Inter-rater reliability
You can use the ICC to assess:
You can also assess:
There are no generally accepted standards to say what constitutes a ‘high enough’ ICC to judge a rater to be reliable. In psychology, often ICC > __ is judged to be ‘good enough’.
the agreement of one or more raters with a set of ‘standard’ ratings.
the agreement of two or more raters with one another, even in the absence of a ‘standard’.
.7
Pearson’s r only measures ________.
It does not detect bias, and does not necessarily detect all types of imprecision.
the linear relationship between ratings.
There are different types of ICCs:
- “Absolute Agreement” ICCs (in SPSS) take both bias and imprecision into account.
- “Consistency” ICCs (in SPSS) do not take bias into account.
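As a sketch of the difference, consider a hypothetical rater who scores every target exactly one point below the standard (invented data; the formulas are the two-way, single-rater consistency and absolute-agreement ICCs). The consistency ICC ignores the constant bias and comes out at 1, while the absolute-agreement ICC is pulled down by it.

```python
# Consistency vs absolute-agreement ICC for a rater with a constant bias
# of one scale point (invented data for illustration).
import numpy as np

standard = np.array([2, 3, 4, 5, 6, 7], dtype=float)
biased = standard - 1                    # always exactly one point too low
ratings = np.column_stack([standard, biased])
n, k = ratings.shape

grand = ratings.mean()
msr = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # targets
msc = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters (bias)
mse = (((ratings - grand) ** 2).sum()
       - msr * (n - 1) - msc * (k - 1)) / ((n - 1) * (k - 1))    # residual

icc_consistency = (msr - mse) / (msr + (k - 1) * mse)
icc_agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
print(f"consistency ICC        = {icc_consistency:.3f}")  # 1.000: bias ignored
print(f"absolute agreement ICC = {icc_agreement:.3f}")    # < 1: bias penalized
```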