Reliability Flashcards

1
Q

What is reliability?

A

The extent to which a measurement tool gives you consistent measures; also refers to the degree to which test scores are free from errors of measurement

2
Q

How can we measure reliability?

A

By seeing if someone gets close to the same score if they complete the same questionnaire a number of times; or if people give similar responses to a series of questions supposed to be measuring the same thing

3
Q

What is classical test theory?

A

The traditional conceptual basis of psychometrics; it’s the idea that every actual/observed score we measure can be decomposed into two parts: the true score and measurement error

4
Q

What is the true score, according to classical test theory?

A

The part of the observed score that reflects what we are actually trying to measure; it is assumed to be constant for an individual

5
Q

What is measurement error, according to classical test theory?

A

What we don’t want to measure; it’s random, and unrelated to the true score

6
Q

What is the equation for classical test theory?

A

X (observed score) = T (true score) + E (errors of measurement)

7
Q

What is true score theory?

A

Another term for classical test theory

8
Q

What is reliability in terms of the relationship between true and total variance?

A

Reliability (r) is the ratio of true variance (the variation in test scores in a sample if there were no measurement error) to total variance (the actual variation in the data, including error): r = true variance / total variance
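The variance ratio can be written out as a short derivation. This is a sketch in common classical test theory notation; the symbols \sigma_X^2, \sigma_T^2 and \sigma_E^2 (observed, true and error variance) are labels chosen here, and it assumes, as the cards state, that error is unrelated to the true score.

```latex
\begin{align}
X &= T + E \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2
  \quad \text{(because } \operatorname{Cov}(T, E) = 0 \text{)} \\
r &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
\end{align}
```

The last line shows why a smaller error variance goes with a higher reliability.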

9
Q

What does a lower measurement error tell us about the reliability?

A

It will be higher

10
Q

Why do we describe classical test theory in terms of variance rather than standard deviations?

A

Because variances are additive and can be broken up into components (true variance + error variance), whereas standard deviations are not additive

11
Q

If a person took the same test multiple times and the test had lower reliability, what would we expect with regard to the spread of their scores?

A

We’d expect them to be more spread out due to measurement error (and less spread out if higher reliability)

12
Q

Describe the various sources of measurement error

A

Test construction (item sampling/content sampling); Test administration (e.g. distractions during the test, fatigue, etc.); Test scoring (e.g. biased examiners, ambiguous scoring guidelines, technical errors); Other influences (e.g. self-efficacy, motivational factors, etc.)

13
Q

What is item sampling/content sampling?

A

Only certain items or content are included in the test so scores may vary according to this; e.g. in an exam, not everything is included so people may be advantaged or disadvantaged depending on what they focused on when revising

14
Q

Why can we only estimate the reliability of a test and not measure it directly?

A

Because true variance is hypothetical/theoretical, so instead we estimate reliability via different methods

15
Q

What are four methods available to us to help estimate the reliability of a test?

A

Internal consistency; Test-retest reliability; Alternate-forms reliability; Inter-rater reliability

16
Q

What is internal consistency (aka inter-item consistency or internal coherence)?

A

It’s how much the item scores in a test correlate with one another on average; are responses consistent across items? (e.g. Cronbach’s alpha, KR-20)

17
Q

What is test-retest reliability?

A

The correlation between scores on the same test by the same people done at two different times
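As a minimal sketch of how such a correlation could be computed, the Python below implements the Pearson correlation directly. The function name `pearson_r` and the scores are hypothetical, made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same four people at two time points
time_1 = [10, 12, 14, 16]
time_2 = [11, 13, 15, 17]
test_retest = pearson_r(time_1, time_2)
```

In this made-up data everyone scores one point higher the second time, yet the correlation is still 1, because the correlation reflects how consistently people keep their relative standing rather than whether the raw scores are identical.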

18
Q

Why might test-retest reliability not always be appropriate?

A

People might remember from the first attempt (but counterbalancing alternate forms can get around this)

19
Q

What is Cronbach’s alpha, and when should it be used?

A

A measure of internal consistency; it is typically used when items have more than two possible response options

20
Q

Describe the steps involved in calculating Cronbach’s alpha by hand.

A
  1. Split the questionnaire in half
  2. Calculate the total score for each half
  3. Work out the correlation between the total scores for each half
  4. Repeat steps 1-3 for all possible two way splits of the questionnaire
  5. Work out the average of all possible split-half correlations
  6. Adjust the correlation to account for shortening the test by applying a special version of the Spearman-Brown formula
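In practice, alpha is usually computed from item variances rather than by enumerating every split. The sketch below uses that variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which is a different route from the split-half steps listed above. The function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha from a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_variances = [variance(item) for item in zip(*scores)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 3 respondents x 2 items
responses = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(responses)
```

With only two items, alpha equals the Spearman-Brown corrected correlation between them when the two items have equal variances, as they do in this made-up data.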
21
Q

What is KR-20 and what is it used for?

A

Kuder-Richardson 20 formula, used for estimating internal consistency for questions with 2 outcomes (e.g. true/false); like Cronbach’s alpha, it gives an estimate of the mean of correlations between all possible halves of your questionnaire (then corrected for halving)
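A minimal sketch of KR-20 for items scored 0/1, using the usual formula KR-20 = k/(k-1) * (1 - sum of p*q / variance of total scores), where p is the proportion scoring 1 on an item and q = 1 - p. The function name `kr20` and the data are hypothetical, and the population variance of the totals is used here so that it matches the p*q item variances.

```python
from statistics import pvariance

def kr20(scores):
    """KR-20 from a list of respondents, each a list of 0/1 item scores."""
    k = len(scores[0])  # number of items
    n = len(scores)     # number of respondents
    # p = proportion scoring 1 on each item; q = 1 - p
    p_values = [sum(item) / n for item in zip(*scores)]
    sum_pq = sum(p * (1 - p) for p in p_values)
    # Population variance of total scores, consistent with p*q above
    total_variance = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical true/false data: 4 respondents x 2 items
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]  # items always agree
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]   # items are uncorrelated
```

In these made-up examples the perfectly consistent items give a KR-20 of 1 and the uncorrelated items give 0.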

22
Q

What’s the difference between parallel forms and alternate forms?

A

Parallel forms are two versions of a test whose means, SDs and correlations with other tests must be the same; alternate forms are two versions that only need equivalent content and difficulty level

23
Q

What is the coefficient of equivalence in the context of a test with parallel forms?

A

It’s the correlation between two versions of the same test

24
Q

List five issues/situations that might affect which reliability estimate you use

A
  1. Homogeneity/heterogeneity of the test
  2. Static vs. dynamic characteristics
  3. Restriction of range/variance
  4. Speed tests
  5. Criterion-referenced tests
25
Q

If a test is heterogeneous, what might be an inappropriate estimate of reliability, and what could you do about this?

A

Internal consistency; you could instead look at each subscale separately (e.g. the subscales of a Big Five personality test)

26
Q

What’s the difference between measuring something static and something dynamic?

A

Static remains the same over time (e.g. intelligence); dynamic is expected to change over time (e.g. fatigue/state anxiety)

27
Q

Which reliability measure could have issues if measuring something dynamic?

A

Test-retest reliability, as it assumes the thing being measured stays the same

28
Q

What’s the difference between a homogeneous test and a heterogeneous test?

A

Homogeneous – the test items measure the same thing; Heterogeneous – more than one independent thing is being measured (i.e. there are subscales that don’t intercorrelate highly)

29
Q

What special considerations do we need to take into account when analysing the psychometric properties of a speed test compared with a power test?

A

Measuring internal consistency may not be appropriate for a speed test, as people may get all attempted questions correct but run out of time, which spuriously inflates the correlation between items

30
Q

Which reliability measures would we use for a speed test?

A

Alternate-forms or test-retest reliability

31
Q

How does having more items on a test affect its reliability?

A

It tends to increase

32
Q

How can we estimate the change in reliability if the test is shortened or lengthened?

A

Using the Spearman-Brown formula

33
Q

What does n stand for in the Spearman-Brown adjusted reliability formula?

A

Number of items in new test divided by number of items in old test

34
Q

Does doubling the length of a test mean doubling the reliability?

A

No, increasing the number of items in a test has diminishing returns
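The diminishing returns can be seen with the general Spearman-Brown formula, predicted reliability = n*r / (1 + (n - 1)*r), where n is the new length divided by the old length. This is a minimal sketch; the function name and the starting reliability of .70 are hypothetical.

```python
def spearman_brown(reliability, n):
    """Predicted reliability when test length is multiplied by n."""
    return (n * reliability) / (1 + (n - 1) * reliability)

original = 0.70                            # hypothetical starting reliability
doubled = spearman_brown(original, 2)      # test twice as long
quadrupled = spearman_brown(original, 4)   # test four times as long
halved = spearman_brown(original, 0.5)     # test half as long
```

Starting from .70, doubling the test gives about .82 and quadrupling it gives about .90, so each doubling adds less than the one before.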

35
Q

What are the performance measures used in the Neale Analysis of reading?

A

Reading accuracy, reading rate, and comprehension

36
Q

The Neale Analysis of reading assesses oral reading comprehension and fluency. What age group is it aimed at?

A

Students aged 6 – 12 years; also used to diagnose reading difficulties in older readers

37
Q

How is a Neale Analysis of reading test carried out?

A

The child reads a selection of stories out loud then completes a comprehension test on the story; test administrator notes any errors and length of reading time

38
Q

In clinical practice, we usually only test a client on a particular test once or twice. No test is perfectly reliable so their scores will be an inaccurate measure of the underlying trait. What does this mean?

A

It’s critical to know HOW INACCURATE the scores we obtain from the test are likely to be in order to make sensible judgements

39
Q

Describe in detail what the standard error of measurement (SEM) is supposed to represent

A

If an individual takes a test a number of times, we assume the distribution will be approximately normal; the average distance of each attempt from their average obtained score (the SD) is the SEM, which tells you the likely margin of error (confidence interval) in the test score

40
Q

Why do we have to add and subtract double the SEM from an individual score in order to get the 95% confidence interval?

A

Because we assume that the client’s true score (their score if perfectly reliable) is at the middle of the distribution, where we’d also assume the actual score would be if they only took the test once

41
Q

What is the margin of error we usually report?

A

That 95% of an individual’s scores will be within 2 SEMs of the true score (+/- 2 standard deviations)

42
Q

If we assume a normal distribution, what percentage of an individual’s scores will be within 1 SEM of the true score? Within 3 SEMs?

A

68%; 99.7%
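These percentages follow from the normal distribution and can be checked with the error function. A minimal sketch, with a hypothetical function name:

```python
from math import erf, sqrt

def coverage_within(k):
    """Proportion of a normal distribution within k SDs of its mean."""
    return erf(k / sqrt(2))

within_1_sem = coverage_within(1)        # about 0.683
within_2_sem = coverage_within(2)        # about 0.954
within_3_sem = coverage_within(3)        # about 0.997
within_1_96_sem = coverage_within(1.96)  # about 0.950
```

The last value shows why 1.96 is the exact multiplier for a 95% interval, with 2 used as a convenient rounding.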

43
Q

Since we can’t actually work out the SEM by testing every client multiple times on a test, how do we work out what the SEM is?

A

We estimate it: SEM = Sx * sqrt(1 - r), where Sx is the SD of many test-takers’ scores (not one individual’s) and r is the reliability of the test; alternatively, it can be looked up in a table

44
Q

What can we assume about the SEM?

A

That it will be the same for everyone who takes the test

45
Q

When estimating the SEM, if 95% of scores fall within about 2 SDs (more precisely, 1.96) of the mean (which is taken to be the person’s actual score), then the 95% confidence interval is what?

A

The actual score +/- (2 x the SEM)

46
Q

If the reliability for the WAIS IQ test is .98 with a SD of 15, how would we work out the SEM?

A

SEM = 15 * sqrt(1 - .98) = 15 * 0.1414 = 2.12 (approximately)
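The same calculation as a small Python function. This is a sketch; the function name is hypothetical, and the inputs are the SD and reliability given in the card.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD of test scores * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

# SD of 15 and reliability of .98, as in the card above
sem = standard_error_of_measurement(15, 0.98)  # about 2.12
```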

47
Q

If someone gets an IQ score of 105, what would their 95% confidence interval be?

A

105 +/- (2 * 2.12) = 105 +/- 4.24; so from about 101 to 109
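A minimal sketch of the interval calculation. The function name is hypothetical; it defaults to the exact multiplier 1.96, and passing `z=2` reproduces the rounded rule used in the card.

```python
def confidence_interval(score, sem, z=1.96):
    """Return (lower, upper) bounds: score +/- z * SEM."""
    margin = z * sem
    return score - margin, score + margin

# Observed score of 105 with an SEM of 2.12, as in the cards above
lower, upper = confidence_interval(105, 2.12)
lower_2, upper_2 = confidence_interval(105, 2.12, z=2)
```

Either multiplier gives bounds that round to 101 and 109.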

48
Q

The standard error of the difference is used to work out whether someone’s score is significantly different from what?

A

Their own score on the same test (e.g. after intervention); Their score on another test of the same thing; Someone else’s score on the same test; Someone else’s score on another test

49
Q

If the 2 scores differ by more than 2 SEdiff, then what can we say?

A

That they’re significantly different at a 95% level of confidence
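The cards do not give a formula for SEdiff. A commonly used textbook version, assumed here, is SEdiff = sqrt(SEM1^2 + SEM2^2), which treats the errors in the two scores as independent. The sketch below applies it; the function names and scores are hypothetical.

```python
from math import sqrt

def se_difference(sem_1, sem_2):
    """Standard error of the difference between two scores (assumed formula)."""
    return sqrt(sem_1 ** 2 + sem_2 ** 2)

def differs_significantly(score_1, score_2, se_diff, multiplier=2):
    """True if the scores differ by more than `multiplier` SEdiff."""
    return abs(score_1 - score_2) > multiplier * se_diff

# Two scores on the same test, each with an SEM of 2.12
se_diff = se_difference(2.12, 2.12)                     # about 3.0
seven_points = differs_significantly(105, 112, se_diff)
five_points = differs_significantly(105, 110, se_diff)
```

With an SEdiff of about 3, a 7-point gap exceeds 2 SEdiff but a 5-point gap does not.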

50
Q

The reliable change index is often used in clinical contexts in place of the standard error of the difference; how is it calculated?

A

Work out the difference between 2 scores (e.g. the change after an intervention) and divide by the SEdiff; if the absolute value of the reliable change index is greater than 1.96 (i.e. the scores differ by roughly 2 SEdiff or more), then the change is statistically significant
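A minimal sketch of that calculation. The function name, the scores, and the SEdiff of 3.0 are hypothetical values chosen for illustration.

```python
def reliable_change_index(score_before, score_after, se_diff):
    """Difference between two scores expressed in SEdiff units."""
    return (score_after - score_before) / se_diff

# Hypothetical client: 105 before and 112 after an intervention
rci = reliable_change_index(105, 112, 3.0)
is_reliable_change = abs(rci) > 1.96
```

Taking the absolute value means a reliable decrease is flagged in the same way as a reliable increase.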

51
Q

What does alternate forms reliability measure?

A

How much a person’s scores correlate if they do two equivalent versions of the same test (sometimes called coefficient of equivalence)

52
Q

What is inter-rater reliability?

A

The correlation between scores on the same test by the same people provided by two different examiners

53
Q

What will be affected if scores are inappropriately restricted in the amount they can vary?

A

The correlation, and ALL reliability estimates
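The effect of a restricted range on a correlation can be illustrated with made-up data. This is a sketch only; the function name, the two "forms" and all of the scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of eight people on two forms of a test
form_a = [1, 2, 3, 4, 5, 6, 7, 8]
form_b = [2, 1, 4, 3, 6, 5, 8, 7]
full_r = pearson_r(form_a, form_b)

# Keep only people scoring 3 to 6 on form A (a restricted range)
kept = [(a, b) for a, b in zip(form_a, form_b) if 3 <= a <= 6]
restricted_r = pearson_r([a for a, _ in kept], [b for _, b in kept])
```

In this example the correlation is about .90 across the full range but only .60 in the restricted subset, even though the relationship between the two forms is the same throughout.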

54
Q

How might criterion-referenced tests affect reliability estimates?

A

There may be little variation in people’s responses (e.g. first-aid tests where most people pass); there’s a restriction of range, leading to problems with ANY reliability estimates (as they’re all based on variance)