Lecture 2 - Reliability Flashcards
reliability is a ______ property of a test
-explain
-who is reliability important for and why
If a test is not reliable it will never be valid; i.e. reliability is a
necessary (but obviously not sufficient) condition for validity
Reliability is particularly important for applied psychologists
(clinical psychologists, clinical neuropsychologists, educational
psychologists) as they deal with individual cases
-
what is a reliability coefficient
Reliability coefficients tell us how much of the variability
in scores on tests is true variability (i.e., signal) and how
much of it is measurement error (i.e., noise)
what is
- true variability
- measurement error
-If a psychological test has a reliability coefficient of (say)
0.8, then 80% of the variability in scores is true variability
(i.e., the test is picking up real differences in the construct
being measured)
It follows that 20% of the variability in scores reflects measurement error (i.e., noise in the instrument: anything unrelated to the construct that affects performance on the test)
reliability coefficient
The reliability coefficient can be seen as a signal-to-(signal plus
noise) ratio
Reliability (i.e., r11) = true variance / total variance
You will often see the reliability coefficient denoted as r11 or
rxx because it can be seen as the test’s correlation with (a
strictly parallel version of) itself – there is always
measurement error so the correlation is not perfect
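The two views of the reliability coefficient above (true variance over total variance, and the correlation between strictly parallel versions of a test) can be checked against each other with a small simulation. This is only a sketch under classical test theory assumptions: the sample size, the standard deviations and the function names are made up for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def simulate_parallel_forms(n_people=20000, true_sd=2.0, error_sd=1.0, seed=1):
    """Correlate two parallel forms: same true score, independent errors."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, true_sd) for _ in range(n_people)]
    form_a = [t + rng.gauss(0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson(form_a, form_b)


# True variance / total variance with the assumed values: 4 / (4 + 1) = 0.8
theoretical = 2.0 ** 2 / (2.0 ** 2 + 1.0 ** 2)
# The simulated correlation between the two forms should land close to 0.8
observed = simulate_parallel_forms()
print(theoretical, round(observed, 3))
```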
why reliability is important- what does it allow for
Reliability allows us to quantify the confidence we have in our
test results and allows us to assess whether differences
between an individual’s scores are liable to reflect true
differences in ability or may have simply arisen by chance
(i.e., measurement error)
can we reify a test score?
-reliability coefficients
Psychologists are often warned not to reify a test score: it is
only an estimate of an individual’s true ability level or mood
level etc
Reliability coefficients allow us to form confidence intervals
on scores to help remind us of the above (we will cover this
later)
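As a preview, one common way to turn a reliability coefficient into a confidence interval is through the standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below assumes that formula and centres the interval on the observed score (other conventions centre it on the estimated true score). The score of 110 and the SD of 15 are hypothetical values chosen for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM under classical test theory: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sd, reliability, z=1.96):
    """Interval around an observed score; z=1.96 gives roughly 95% coverage."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical case: observed score 110 on a test with SD 15 and reliability 0.8
low, high = confidence_interval(observed=110, sd=15, reliability=0.8)
print(round(low, 1), round(high, 1))  # roughly 96.9 and 123.1
```

Even with a reliability of 0.8 the interval spans about 26 points, which is a useful reminder not to reify the single observed score.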
what happens if we ignore reliability of tests
-Chapman and Chapman (1973) study
Furthermore, as much of clinical practice is concerned with
differences between an individual’s abilities, a failure to consider the
reliability of measures can lead the psychologist astray
Chapman & Chapman (1973) provided a classic illustration of
artefacts arising from differences in reliability
◦ Schizophrenic patients were compared to a healthy control sample on
two tasks
◦ The schizophrenic sample appeared to have a severe deficit on only one of the tasks (abstract reasoning)
◦ It was in fact the same task, but one version had been rendered less reliable (by
shortening the test)
(they used a shortened version of the test, so that version was not very reliable; the test given to the schizophrenia group was also cut to half its length)
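The general mechanism behind this kind of artefact can be sketched with a standard classical test theory result: if two groups differ by a fixed amount in true-score units, the standardized difference seen in observed scores shrinks by a factor of sqrt(reliability). The numbers below are invented and are not a reconstruction of the Chapman & Chapman data; the function name is hypothetical, and the sketch assumes the same reliability holds in both groups.

```python
import math


def observed_effect_size(true_effect_size, reliability):
    """Standardized group difference after attenuation by measurement error.

    Observed SD = true SD / sqrt(reliability), so a difference expressed in
    observed-score SD units is the true difference times sqrt(reliability).
    """
    return true_effect_size * math.sqrt(reliability)


# Same underlying deficit (1 true-score SD) measured with two versions of a task
long_version = observed_effect_size(1.0, reliability=0.9)
short_version = observed_effect_size(1.0, reliability=0.5)
print(round(long_version, 3), round(short_version, 3))  # about 0.949 vs 0.707
```

An identical deficit looks smaller on the less reliable version, so comparing tasks of unequal reliability can suggest a selective deficit that is not really there.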
how high should reliability coefficients be
There is no absolute rule (will depend on purpose) but various
standards have been proposed:
◦ Nunnally & Bernstein (1994) take a hard line and propose that
reliability coefficients should be above 0.90
Others are less demanding:
◦ Sattler (2001) suggests that tests with reliabilities of 0.70 and
above should be considered to be “reliable”
◦ Similarly, Cicchetti (1994) suggests tests with reliabilities below
0.70 should be considered “unreliable”
can reliability be too high? high reliability as a problem
-give an example
Yes:
If we are trying to measure a broad, multifaceted construct,
then a very high reliability may indicate a problem (Boyle, 1985)
This suggests we’re not measuring the whole concept
Take example of an anxiety measure:
- We could ask people ten different ways about whether they
experience muscle tension (a symptom of anxiety)
-The “measure” would be very reliable but would not be a good
measure of anxiety itself: anxiety is multifaceted, and the test only asks how tense people feel (this is just one symptom of anxiety, so the test does not necessarily measure anxiety itself well)
how can we decide if a test is reliable
- Cronbach’s Alpha
- Test-retest reliability
To be considered reliable a test should provide a consistent
measure
what is Cronbach’s alpha
-used when
-determined by
-what does it indicate
-used in questionnaire type tests
Cronbach’s alpha is determined by:
(a) the number of items in the test
(b) the size of the correlations between the items
Longer tests are more reliable
Tests in which the items have higher correlations with each
other are more reliable
You don’t need any maths to see why that makes sense
-reliability and test length
-vocabulary test
Take the example of a Vocabulary test
If we use only, say, 4 items the test is not going to be very reliable
There are an enormous number of words out there and we will not be
able to sample them at all well with only 4 items
Some people will, by chance, do much better on the particular 4 words than they would if we tested their vocabulary for all words
Equally, others will, by chance, do worse than their real overall level of
vocabulary knowledge
However, if we up the number of words substantially, these chance
advantages or disadvantages will even out
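The evening-out described above can be put into numbers with the Spearman-Brown prophecy formula, which the notes do not name but which is the standard way to predict reliability after lengthening a test. The sketch assumes the added items are just as good as the existing ones; the starting reliability of 0.4 for the 4-item test is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times longer.

    Assumes the added items are parallel to (as good as) the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical 4-item vocabulary test with reliability 0.4
four_items = 0.4
forty_items = spearman_brown(four_items, length_factor=10)   # 40 items
two_items = spearman_brown(four_items, length_factor=0.5)    # 2 items
print(round(two_items, 3), four_items, round(forty_items, 3))  # 0.25 0.4 0.87
```

The same function works in the other direction: a length factor below 1 shows how much reliability is lost when a test is shortened.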
are all longer tests reliable?
longer tests will be more reliable only provided
other things are equal
Suppose a psychologist is developing a test and carefully
selects items they think will be suitable
If the reliability is disappointing, simply throwing in a bunch
of additional poor items (items that are not closely related to
the other items or have ceiling or floor effects) will not help
much
longer tests are more reliable provided that the items in the longer test are as good (as
highly correlated with the other items) as those in the shorter version
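A rough way to see why padding a test with poor items does not help is the standardized form of Cronbach's alpha, which depends only on the number of items and their mean inter-item correlation. In the sketch below all values are invented: 10 good items correlating 0.4 with each other, and 10 extra items that correlate only 0.02 with everything. The helper names are hypothetical.

```python
from math import comb


def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha from item count and mean inter-item r."""
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


def mean_r_after_adding(n_good, r_good, n_poor, r_poor):
    """Mean inter-item correlation once poor items join the good ones.

    Pairs of good items keep r_good; every pair involving a poor item
    is assumed to correlate r_poor.
    """
    good_pairs = comb(n_good, 2)
    all_pairs = comb(n_good + n_poor, 2)
    return (good_pairs * r_good + (all_pairs - good_pairs) * r_poor) / all_pairs


original = standardized_alpha(10, 0.4)                 # 10 good items
padded_r = mean_r_after_adding(10, 0.4, 10, 0.02)      # mean r drops to 0.11
padded = standardized_alpha(20, padded_r)              # 20 items, 10 of them poor
matched = standardized_alpha(20, 0.4)                  # 20 equally good items
print(round(original, 3), round(padded, 3), round(matched, 3))  # 0.87 0.712 0.93
```

In this made-up case doubling the length with poor items lowers alpha, whereas doubling it with equally good items raises it.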
how can psychologists save time and shorten tests / short-form tests
Psychologists are always
looking for ways to save time and try to develop short forms of tests
Sometimes this can be done with only a marginal lowering of reliability because poor items (e.g., items that are not highly correlated with the other items)
are selectively dropped
reliability (Cronbach’s alpha) is a function of…
The reliability (Cronbach’s alpha) of a scale is a function of the correlations between the items and the number of items
designed to measure the same underlying construct. It evaluates how closely related the items are as a group.
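For raw questionnaire data, Cronbach's alpha is usually computed from variances: alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to a tiny invented data set (5 respondents, 3 items); the function name and the responses are illustrative only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses: one row per respondent, one column per item
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # about 0.923
```

Alpha is high here because respondents who score high on one item tend to score high on the others, so the total-score variance is much larger than the sum of the item variances.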