Reliability Flashcards

1
Q

what is ‘reliability’ in psychological testing?

A

consistency in measurement, and how much error the measurement tool has

2
Q

define the word reliability

A

the precision with which a test score measures achievement - higher reliability is better

3
Q

what is reliability in measurement?

A

reliability in measurement refers to the desired consistency or reproducibility of test scores - a reliable measure is one whose scores vary less from test to test

4
Q

what assumption can be made when talking about test reliability?

A

we assume that there is always some error in our measurement - no test is perfectly reliable

5
Q

describe ‘random error’

A

the assumption that the differences in scores we get each time are due to measurement error, because it is unlikely that the person's true score has changed between measurements

6
Q

what is meant by ‘random error’?

A

observed scores vary randomly across observations

7
Q

what is meant by ‘systematic error’?

A

the error is not random - it is fixed (= systematic error): the same amount of error occurs every time. results are still reliable in a sense (consistent), but they are not accurate

8
Q

what is the mathematical formula for reliability?

A

X = T + e
X = observed score
T = true score
e = random error

9
Q

what can we determine from observed scores?

A

an estimate of a person's true score, obtained by taking the average of all their observed scores (see the sketch below)
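
A minimal sketch (Python, made-up numbers) of how this works under X = T + e: the random errors average out, so the mean of repeated observed scores approaches the true score.

    import numpy as np

    rng = np.random.default_rng(0)

    true_score = 100                    # T: the unobservable true score
    errors = rng.normal(0, 5, size=50)  # e: random error with mean zero
    observed = true_score + errors      # X = T + e over 50 administrations

    # averaging cancels the random error, so this lands close to 100
    print(observed.mean())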

10
Q

what are the assumptions for classical test theory?

A

1) each person has a true score we could obtain if there were no measurement error
2) there is measurement error - but this error is random
3) the true score of an individual doesn’t change with repeated applications of the same test, even though their observed score does
4) the distribution of random errors and thus observed test scores will be the same for all people

11
Q

What does SEM stand for?

A

standard error of measurement

12
Q

how do you work out measurement error?

A

by working out how, on average, an observed score on the test differs from the true score

13
Q

what does working out measurement error give you?

A

the standard deviation (SD) of the errors - that is, how spread out observed scores are around the true score

14
Q

What does the SD tell us?

A

the SD of the errors is the standard error of measurement (SEM) - the smaller the SEM, the more accurate and reliable a test is
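
In practice the SEM is usually estimated from the test's SD and its reliability coefficient, via the standard formula SEM = SD * sqrt(1 - r). A minimal sketch with made-up values:

    import math

    def sem(sd, reliability):
        # standard error of measurement: SEM = SD * sqrt(1 - r)
        return sd * math.sqrt(1 - reliability)

    # e.g. an IQ-style scale with SD = 15 and reliability .91
    print(sem(15, 0.91))  # 4.5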

15
Q

describe the domain sampling model

A

a central concept of Classical Test Theory: when we construct a test of some construct, we can't ask all possible questions pertaining to it, so we use only a sample of all possible test items on that construct

16
Q

what is a possible issue with the domain sampling model?

A

The fewer items we have (questions we ask), the more likely it is that we do not have a good spread of easy, moderate and difficult questions, which can introduce error. It is therefore important that test items adequately sample the construct.

17
Q

what is reliability analysis in the domain sampling model?

A

reliability analysis figures out how much error we would have if we used the score from a test containing only a sample of all possible questions as the estimate of true ability. We can do this because the observed score should be correlated with the true score, and as the sample of items gets larger, the estimate becomes more accurate.

18
Q

what are the different types of test reliability?

A

test-retest reliability, parallel forms reliability, internal consistency, inter-rater reliability

19
Q

what is test-retest reliability?

A

Test-retest reliability is investigated when we give someone a test at one point in time and then give them the same test at a later point in time.

20
Q

what factors play into test-retest reliability?

A

if the scores from the two administrations are highly correlated, we have high test-retest reliability

21
Q

what is the score correlation in test-retest reliability referred to as?

A

coefficient of stability
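
A minimal sketch (made-up scores) of estimating the coefficient of stability: correlate the same people's scores across the two occasions.

    import numpy as np

    time1 = np.array([12, 15, 9, 20, 17, 11])   # scores at first testing
    time2 = np.array([13, 14, 10, 19, 18, 12])  # same people, retested later

    stability = np.corrcoef(time1, time2)[0, 1]  # Pearson correlation
    print(round(stability, 2))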

22
Q

what is the source of error measured in test-retest reliability?

A

time sampling - the source of error, and the reason for some difference in scores across the two occasions, is the time that has passed between them

23
Q

what are some issues with test-retest reliability?

A

it wouldn't be useful to calculate when our test measures something that changes with time (e.g. stress or mood); we must take into account that the person's score will likely increase the second time around because they have taken the test before; and something could happen between the first and second administrations that would affect the correlation

24
Q

what is parallel forms reliability?

A

two different forms of the same test are administered

25
Q

how do we know if we have high parallel forms reliability?

A

if the scores on the two forms are highly correlated, we have good parallel forms reliability

26
Q

what is the score correlation in parallel forms reliability referred to as?

A

coefficient of equivalence

27
Q

what is the co-efficient of equivalence?

A

the correlation between the scores on the two forms in parallel forms reliability

28
Q

what is the source of error measured in parallel forms reliability?

A

item sampling - the source of error, and the reason for some difference in scores across the two occasions, is linked to the items that were included in each form

29
Q

what are some issues with parallel forms reliability?

A

if we give the different forms to people at two different times, how would that influence the correlation? We also need to decide whether to give the different forms to the same people or to different people - which is more useful and practical? Importantly, if we don't have two forms of a test, we will likely need to construct a new test in order to assess reliability

30
Q

what is internal consistency reliability?

A

the reliability of one test administered on one occasion

31
Q

what does internal consistency reliability do?

A

Internal consistency asks the question of whether different items within one test all measure the same thing, to the same extent

32
Q

what are the ways of measuring internal consistency?

A

split-half reliability, coefficient alpha

33
Q

where do errors in split-half reliability and coefficient alpha come from?

A

In each case the error comes from a lack of internal consistency: the less each item in a questionnaire taps into the construct being measured, the less internally consistent the questionnaire is.

34
Q

how do you get a reliability estimate using split-half reliability?

A

correlate the scores on the two halves to get a reliability estimate

35
Q

what is split-half reliability?

A

when you split the items in the test in half and correlate the scores on the two halves to get a reliability estimate

36
Q

what is an advantage of split-half reliability?

A

you only need one test; you don't need two forms of a test

37
Q

what is a challenge of split-half reliability?

A

to divide the test into equivalent/equal halves

38
Q

how do split-half reliability and the domain sampling model relate?

A

when we ask too few questions, it becomes difficult to adequately sample from the entire domain

39
Q

what is a result of using split-half reliability relating to the domain sampling model?

A

we will likely underestimate our reliability. For this reason, when we calculate split-half reliability and are concerned that the number of items in each half is too few, we use a calculation to correct for the reduced number of items and better estimate reliability. This calculation is called the Spearman-Brown correction.

40
Q

what is the Spearman-Brown Correction?

A

a calculation used to correct for too few items, giving a better estimate of reliability

41
Q

why would you apply the spearman-brown correction?

A

because each half of the split test will have reduced reliability compared to the total test, we apply this correction

42
Q

what does the spearman-brown correction do?

A

The Spearman-Brown correction formula essentially corrects for the number of items in a test changing (see the sketch below)
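
A minimal sketch (made-up responses) of split-half reliability with the Spearman-Brown correction; n is the factor by which the test is lengthened, so n = 2 projects a half-test correlation back up to full length.

    import numpy as np

    def spearman_brown(r, n=2):
        # projected reliability when a test is lengthened by factor n
        return n * r / (1 + (n - 1) * r)

    # rows = people, columns = items (hypothetical responses)
    items = np.array([[3, 4, 3, 5, 4, 4],
                      [1, 2, 2, 1, 2, 1],
                      [4, 5, 4, 4, 5, 5],
                      [2, 2, 3, 2, 1, 2]])

    odd = items[:, 0::2].sum(axis=1)   # half 1: odd-numbered items
    even = items[:, 1::2].sum(axis=1)  # half 2: even-numbered items

    r_half = np.corrcoef(odd, even)[0, 1]  # split-half correlation
    print(spearman_brown(r_half))          # corrected full-test estimate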

43
Q

what challenge relates to the spearman-brown correction on split-half reliability?

A

the correlation changes depending on which items are put in each half, so we will get a different reliability coefficient for each different split. Ideally we should have equivalent halves, which is a challenge in itself.

44
Q

how do we solve the challenge relating to the spearman-brown correction on split-half reliability?

A

coefficient alpha

45
Q

what does the coefficient/Cronbach’s alpha do?

A

coefficient/Cronbach’s Alpha estimates the consistency of responses to different scale items. It takes the average of all possible split-half correlations into account when calculating reliability.
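
A minimal sketch of Cronbach's alpha using its standard variance-based formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), on made-up data.

    import numpy as np

    def cronbach_alpha(items):
        # alpha = k/(k-1) * (1 - sum(item variances) / var(total score))
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # rows = people, columns = items (hypothetical responses)
    responses = np.array([[3, 4, 3, 5],
                          [1, 2, 2, 1],
                          [4, 5, 4, 4],
                          [2, 2, 3, 2]])
    print(round(cronbach_alpha(responses), 2))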

46
Q

how do you interpret the coefficient results?

A

0.00 = no consistency in measurement
1.00 = perfect consistency in measurement

47
Q

what levels of reliability based on coefficient results are appropriate?

A

0.70 = exploratory research
0.80 = basic research
0.90 = applied scenarios

48
Q

what correlation can be seen between reliability and the number of items?

A

We see a positive, non-linear relationship between the number of items and the reliability of a questionnaire: internal consistency increases rapidly from 2 to 10 items, increases steadily from 11 to 30 items, and tapers off at about 40 items. This is the reason most questionnaires aim to include between 20 and 40 items (see the sketch below).
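
A rough illustration of that tapering via the Spearman-Brown prophecy formula; the single-item reliability of .15 is an assumed value.

    # projected reliability of a k-item test, assuming each
    # single item has reliability .15 (Spearman-Brown prophecy)
    r1 = 0.15
    for k in (2, 10, 20, 30, 40):
        print(k, round(k * r1 / (1 + (k - 1) * r1), 2))
    # rises quickly at first (2 -> 10 items), then tapers off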

49
Q

what are some things Cronbach’s alpha can be affected by?

A

multidimensionality, bad test items, number of items

50
Q

what is inter-rater reliability?

A

Inter-rater reliability measures how consistently 2 or more raters/observers/judges agree on rating something

51
Q

how is the inter-rater reliability score interpreted?

A

when correlating the scores of 2 raters or judges, use Cohen’s Kappa; when calculating agreement between more than 2 raters’ scores, use Fleiss’ Kappa (see the sketch below).
Anything above .75 is considered excellent agreement, between .40 and .75 is considered satisfactory, and below .40 is considered poor. If inter-rater agreement is poor, we cannot use the data.
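
A minimal sketch of Cohen's Kappa for 2 raters, using scikit-learn's cohen_kappa_score; the ratings are made up.

    from sklearn.metrics import cohen_kappa_score

    rater_a = [1, 0, 1, 1, 0, 1, 0, 1]  # binary ratings from rater A
    rater_b = [1, 0, 1, 0, 0, 1, 0, 1]  # rater B on the same 8 cases

    kappa = cohen_kappa_score(rater_a, rater_b)
    print(round(kappa, 2))  # >.75 excellent, .40-.75 satisfactory, <.40 poor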

52
Q

how can reliability be improved?

A

increase the number of items, item analysis, inter-rater training, pilot testing, clear conceptualization, standardizing administration