lecture 2- Reliability Flashcards by malaika imran

reliability is a ______ property of a test
-explain
-who is reliability important for and why

If a test is not reliable it will never be valid; i.e. reliability is a
necessary (but obviously not sufficient) condition for validity

Reliability is particularly important for applied psychologists
(clinical psychologists, clinical neuropsychologists, educational
psychologists) as they deal with individual cases

what is a reliability coefficient

Reliability coefficients tell us how much of the variability
in scores on tests is true variability (i.e., signal) and how
much of it is measurement error (i.e., noise)

what is
- true variability
- measurement error

- True variability: if a psychological test has a reliability coefficient of (say)
0.8, then 80% of the variability in scores is true variability
(i.e., the test is picking up real differences in the construct
being measured)

- Measurement error: it follows that the other 20% of the variability in scores reflects measurement error, i.e., noise in the instrument
(something other than the construct that affects performance)

reliability coefficient

The reliability coefficient can be seen as a signal-to-(signal plus
noise) ratio
Reliability (i.e., r11) = true variance / total variance

You will often see the reliability coefficient denoted as r11 or
rxx because it can be seen as the test’s correlation with (a
strictly parallel version of) itself – there is always
measurement error so the correlation is not perfect
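
The ratio above can be sketched in a few lines of Python. The function name and the variance values are hypothetical, chosen only so that the result matches the 0.8 example used earlier in these cards.

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """Return true variance as a proportion of total (true + error) variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance

# Hypothetical variances (not from the lecture), picked to give r11 = 0.8
r11 = reliability(true_variance=180.0, error_variance=45.0)
print(f"true variability:  {r11:.0%}")
print(f"measurement error: {1 - r11:.0%}")
```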

why reliability is important- what does it allow for

Reliability allows us to quantify the confidence we have in our
test results and allows us to assess whether differences
between an individual’s scores are liable to reflect true
differences in ability or may have simply arisen by chance
(i.e., measurement error)

can we reify a test score ?
-reliability coefficients

Psychologists are often warned not to reify a test score: it is
only an estimate of an individual’s true ability level, mood
level, etc.

Reliability coefficients allow us to form confidence intervals
on scores to help remind us of the above (we will cover this
later)
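
These cards do not give the confidence interval method itself. As a rough sketch only, one common textbook approach uses the standard error of measurement, SEM = SD x sqrt(1 - reliability), and places an interval around the obtained score. The function names, the SD of 15 and the reliability of 0.90 are assumptions for illustration; the method taught later in the course may differ.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), from classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sd: float, reliability: float, z: float = 1.96):
    """Interval centred on the obtained score (z = 1.96 gives roughly 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem

# Hypothetical example: obtained score 100, scale SD 15, reliability 0.90
sem = standard_error_of_measurement(sd=15.0, reliability=0.90)
lower, upper = confidence_interval(score=100.0, sd=15.0, reliability=0.90)
print(f"SEM = {sem:.2f}, 95% interval = {lower:.1f} to {upper:.1f}")
```

The lower the reliability, the wider the interval, which is the reminder not to treat the obtained score as the true score.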

what happens if we ignore reliability of tests
-chapman and chapman 1973 study

Furthermore, as much of clinical practice is concerned with
differences between an individual’s abilities, a failure to consider the
reliability of measures can lead the psychologist astray

Chapman & Chapman (1973) provided a classic illustration of
artefacts arising from differences in reliability
◦ Schizophrenic patients were compared to a healthy control sample on
two tasks
◦ The schizophrenic sample appeared to have a severe deficit on only one of the tasks (abstract reasoning)
◦ In fact both were the same task, but one version had been rendered less reliable (by
shortening the test)

(a short version of the test was used, cut to half its length, so that version was not that reliable)
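
One way to see how a difference in reliability alone can produce this kind of artefact: under classical test theory, measurement error inflates the observed SD without shifting group means, so a group difference expressed in SD units shrinks by a factor of sqrt(reliability). The sketch below uses made-up numbers, not data from Chapman & Chapman (1973), and the function name is hypothetical.

```python
import math

def observed_effect_size(true_effect_size: float, reliability: float) -> float:
    """Attenuate a true-score group difference (in SD units) by test reliability.

    Assumes error adds variance but does not shift means, so the observed SD
    equals the true-score SD divided by sqrt(reliability).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return true_effect_size * math.sqrt(reliability)

# Hypothetical values: the same underlying deficit on both versions of the task
true_deficit = 1.0
full_version = observed_effect_size(true_deficit, reliability=0.90)
short_version = observed_effect_size(true_deficit, reliability=0.60)
print(f"full version:  observed deficit = {full_version:.2f} SD")
print(f"short version: observed deficit = {short_version:.2f} SD")
```

The same true deficit looks smaller on the less reliable version, so comparing deficits across tasks of unequal reliability can mislead.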

how high should reliability coefficients be

There is no absolute rule (will depend on purpose) but various
standards have been proposed:
◦ Nunnally & Bernstein (1994) take a hard line and propose that
reliability coefficients should be above 0.90

Others are less demanding:
◦ Sattler (2001) suggests that tests with reliabilities of 0.70 and
above should be considered to be “reliable”
◦ Similarly, Cicchetti (1994) suggests tests with reliabilities below
0.70 should be considered “unreliable”

can reliability be too high? high reliability as a problem
-give an example

Yes:
if we are trying to measure a broad, multifaceted construct
then a very high reliability may indicate a problem (Boyle, 1985)
It suggests we’re not measuring the whole concept

Take the example of an anxiety measure:
- We could ask people ten different ways about whether they
experience muscle tension (a symptom of anxiety)

- The “measure” would be very reliable but would not be a good
measure of anxiety itself: anxiety is multifaceted, and the test just asks how tense people feel. Muscle tension is only one symptom of anxiety, so the items do not measure anxiety as a whole.

how can we decide if a test is reliable

Cronbach’s Alpha
Test-retest reliability

To be considered reliable a test should provide a consistent
measure

what is Cronbach’s alpha
-used when
-determined by
-what does it indicate

-used in questionnaire type tests

Cronbach’s alpha is determined by:
(a) the number of items in the test
(b) the size of the correlations between the items

Longer tests are more reliable

Tests in which the items have higher correlations with each
other are more reliable

You don’t need any maths to see why that makes sense
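
For anyone who does want the maths, a minimal sketch: the standardized form of Cronbach's alpha depends only on the number of items k and the mean inter-item correlation, alpha = k * r / (1 + (k - 1) * r). The function name and the example values are illustrative assumptions.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return (n_items * mean_inter_item_r) / (1.0 + (n_items - 1) * mean_inter_item_r)

# (a) More items, same mean correlation (hypothetical r = 0.30)
for k in (4, 10, 20):
    print(f"k = {k:2d}, r = 0.30 -> alpha = {standardized_alpha(k, 0.30):.2f}")

# (b) Same number of items, higher correlations between them
for r in (0.10, 0.30, 0.50):
    print(f"k = 10, r = {r:.2f} -> alpha = {standardized_alpha(10, r):.2f}")
```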

-reliability and test length
-vocabulary test

Take the example of a Vocabulary test

If we use only, say, 4 items the test is not going to be very reliable

There are an enormous number of words out there and we will not be
able to sample them at all well with only 4 items

Some people will, by chance, do much better on the particular 4 words than they would if we tested their vocabulary for all words

Equally, others will, by chance, do worse than their real overall level of
vocabulary knowledge

However, if we up the number of words substantially, these chance
advantages or disadvantages will even out
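
The "evening out" can be illustrated with the standard error of a proportion. The sketch assumes words are sampled at random and independently, and that a hypothetical person knows 60% of all words; both are simplifying assumptions, not claims from the lecture.

```python
import math

def standard_error_of_proportion(true_proportion: float, n_items: int) -> float:
    """Chance variation in the proportion correct on a random sample of n_items."""
    return math.sqrt(true_proportion * (1.0 - true_proportion) / n_items)

known = 0.60  # hypothetical: this person knows 60% of all words
for n_items in (4, 20, 100):
    se = standard_error_of_proportion(known, n_items)
    print(f"{n_items:3d} items: score typically off by about {se:.0%}")
```

With 4 items the score can easily land far from the person's real level; with many items the chance component shrinks.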

are all longer tests reliable?

longer tests will be more reliable only provided
other things are equal

Suppose a psychologist is developing a test and carefully
selects items they think will be suitable
If the reliability is disappointing, simply throwing in a bunch
of additional poor items (items that are not closely related to
the other items or have ceiling or floor effects) will not help
much

longer tests are more reliable provided that the items in the longer test are as good (as
highly correlated with the other items) as those in the shorter version
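
A small sketch of the "other things equal" point, using standardized alpha as an approximation. All values are made up: 10 good items correlating 0.40 with each other, extended by 5 items that are either equally good or poor (correlating 0.05 with every other item). The helper names are hypothetical.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return (n_items * mean_inter_item_r) / (1.0 + (n_items - 1) * mean_inter_item_r)

def mean_r_after_adding(n_good: int, r_good: float, n_poor: int, r_poor: float) -> float:
    """Mean inter-item correlation when each poor item correlates r_poor with every other item."""
    n_total = n_good + n_poor
    total_pairs = n_total * (n_total - 1) // 2
    good_pairs = n_good * (n_good - 1) // 2
    other_pairs = total_pairs - good_pairs  # every pair involving a poor item
    return (good_pairs * r_good + other_pairs * r_poor) / total_pairs

original = standardized_alpha(10, 0.40)
with_good = standardized_alpha(15, 0.40)
with_poor = standardized_alpha(15, mean_r_after_adding(10, 0.40, 5, 0.05))
print(f"10 good items:         alpha = {original:.2f}")
print(f"+5 equally good items: alpha = {with_good:.2f}")
print(f"+5 poor items:         alpha = {with_poor:.2f}")
```

In this made-up case the poor items do not merely fail to help: they pull the mean correlation down enough that alpha falls below the original value.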

how can psychologists save time and shorten tests / short-form tests

psychologists are always
looking for ways to save time and try to develop short forms of tests

Sometimes this can be done with only a marginal lowering of reliability because poor items (e.g., items that are not highly correlated with the other items)
are selectively dropped

reliability (cronbach’s alpha) is a function of…

the reliability (Cronbach’s alpha) of a scale is a function of the correlations between the items and the number of items
designed to measure the same underlying construct. It evaluates how closely related the items are as a group.
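
As a sketch of how alpha is computed from raw item scores, the usual variance form is alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The data below are invented (5 respondents, 4 items) purely to show the calculation.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(item_scores[0])
    items = list(zip(*item_scores))                      # one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1.0 - sum_item_variances / total_variance)

# Hypothetical questionnaire data: rows are respondents, columns are items
scores = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

The items here rise and fall together across respondents, so most of the total-score variance comes from shared variance and alpha is high.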

Reliability coefficients for the WAIS-IV
-the reliability of a composite is a function…

The reliability of a composite (an Index or IQ in this case) is a
function of the reliability of components (subtests) and the
correlation between the components

The reliability of a composite score, such as an index or IQ, indeed depends on both the reliability of its individual components (the subtests) and the correlations among those components.
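
A hedged sketch of this relationship for the simplest case: a unit-weighted sum of standardized components with uncorrelated measurement errors. The composite's error variance is the sum of the components' error variances, and its total variance grows with the correlations between components. The subtest reliabilities and correlations below are hypothetical, not WAIS-IV values.

```python
def composite_reliability(component_reliabilities, mean_intercorrelation):
    """Reliability of a unit-weighted sum of standardized components.

    Assumes every component has variance 1 and that errors are uncorrelated.
    """
    k = len(component_reliabilities)
    error_variance = sum(1.0 - r for r in component_reliabilities)
    composite_variance = k + k * (k - 1) * mean_intercorrelation
    return 1.0 - error_variance / composite_variance

# Hypothetical subtests, each with reliability 0.80 and mean intercorrelation 0.60
two_components = composite_reliability([0.80, 0.80], 0.60)
four_components = composite_reliability([0.80] * 4, 0.60)
print(f"2 components: {two_components:.3f}")
print(f"4 components: {four_components:.3f}")
```

Both composites are more reliable than any single component, and the composite built from fewer components is the less reliable of the two.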

do composites have superior reliability to the components

Composites will always have superior reliability to the
components they are derived from if the components are
correlated (and they always are)

This can be seen by comparing the reliabilities of the WAIS-IV
subtests with those for the Indexes

reliability coefficients for WAIS - IV IQs and indexes

The reliabilities of the WAIS-IV Indexes and FSIQ are uniformly
excellent, among the highest of any psychological instrument

In the case of FSIQ (in both US and UK), r11 is 0.98, so 98% of the variance in test scores is true variance and only 2% is measurement error

reliability for processing speed is ______ than others
why

a bit lower
-in part because it is a composite made up of only two components (coding and symbol search)

what is temporal stability
-how is temporal stability tested

Temporal stability refers to the extent to which a measure
yields consistent scores over time, i.e. stability coefficients
allow us to gauge the extent to which performance is affected by
day-to-day fluctuations / differences in mood, testing conditions

-refers to the consistency of a measure over time

temporal stability is assessed using the test-retest method

the temporal stability or test retest reliability of a scale is simply….

the correlation between scores at test and retest

why is it important to set an appropriate interval between administrations?
-why do you need to avoid inflating the estimate

Normally the interval between administrations is set so it is
unlikely that true change has occurred in the underlying ability

Against that must be set the need to avoid inflating the estimate of stability due to a testee’s memory for their previous answers

higher
some of the chance fluctuations on the components will cancel each other out

Temporal stability of FSIQ and Indexes is highly satisfactory,
again among the highest of any psychological instrument
Temporal stability of mental tests is generally very
impressive: e.g. Deary et al. (2000) found a (corrected)
correlation of .73 between an IQ test administered at age 11
and again at age 77 (66 year follow-up)

what are practice effects on cognitive tests

A psychologist often wants to know if an individual’s cognitive abilities have genuinely improved (e.g., as they recover from a head injury, or as a result of a psychological,
pharmacological or surgical intervention etc)

Similarly, a psychologist often wants to know if an individual’s cognitive abilities have genuinely deteriorated
(e.g., as a result of a degenerative condition, or as an unfortunate consequence of surgical intervention etc)

A complication is that there are practice effects on most
cognitive tests

Practice effects on cognitive tests are the improvements in test scores that result from repeated exposure to the same or similar assessments
- May exaggerate or give a false impression of recovery / improvement
- May mask a deterioration in functioning

WAIS-IV has no alternative forms, so the same test has to be administered if retesting
-would alternate forms abolish practice effects?
-if a test has high test-retest reliability, does this mean there will be no practice effects?

High test-retest reliability does NOT mean an absence of practice effects

example of practice effects -graph

To illustrate, consider an example where the test-retest reliability is 1.0 (i.e., scores at test and retest are perfectly correlated). However, everyone improved by 15 points (i.e., there is a large practice effect of 15 points).
-example: one case scored 30 at test but 45 at retest (is this a genuine improvement, or have they just become familiar with the test?)
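
The same point in code: a correlation only reflects whether people keep their relative positions, so adding a constant to everyone's score leaves it untouched. The five scores are hypothetical; the 30 to 45 case matches the example above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

# Hypothetical scores: everyone gains exactly 15 points at retest
test_scores = [20, 25, 30, 35, 40]
retest_scores = [score + 15 for score in test_scores]

stability = pearson_r(test_scores, retest_scores)
mean_gain = sum(retest_scores) / len(retest_scores) - sum(test_scores) / len(test_scores)
print(f"test-retest correlation = {stability:.2f}")
print(f"mean practice effect    = {mean_gain:.0f} points")
```

The stability coefficient is perfect even though no one obtained the same score twice.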

practice effects are fairly substantial for some WAIS-IV indexes
-which subtests are they particularly marked on
-effects on perceptual reasoning, working memory, processing speed

Particularly marked on visuoperceptual / psychomotor subtests
Practice effects are over 1/3rd of an SD for overall IQ, Perceptual Reasoning, and Working Memory
On Processing Speed (PS) the practice effect is over 2/3rds of an SD (a massive effect)
Perhaps counterintuitively, an identical score on PS at retest would therefore be a cause for concern!

are practice effects extreme in clinical settings?

Practice effects are not liable to be as extreme with more clinically realistic retest intervals

practice effects can still be detected after ___

a 7 year gap

Why is it important for psychologists to be aware of practice effects in cognitive testing?

- Understanding variability: practice effects can vary across different tests, affecting interpretation.
- Informal adjustments: psychologists can informally factor in practice effects when interpreting a person’s scores.
- Formal methods: there are statistical methods available to account for practice effects in analyses.
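
The cards do not say which formal methods are meant. As one illustration only, a practice-adjusted reliable change index subtracts the average practice gain from the observed change and divides by the standard error of the difference, SD x sqrt(2 x (1 - reliability)). The function name and all numbers below are hypothetical.

```python
import math

def practice_adjusted_change(score_test, score_retest, mean_practice_gain, sd, reliability):
    """Observed change minus the expected practice gain, in standard-error units."""
    se_difference = sd * math.sqrt(2.0 * (1.0 - reliability))
    return ((score_retest - score_test) - mean_practice_gain) / se_difference

# Hypothetical case: scale SD 15, test-retest reliability 0.90, typical practice gain 5 points
index = practice_adjusted_change(
    score_test=100.0, score_retest=103.0,
    mean_practice_gain=5.0, sd=15.0, reliability=0.90,
)
print(f"practice-adjusted change index = {index:.2f}")
```

A raw gain of 3 points becomes a slightly negative index once the expected practice gain is removed, though a value this close to zero would not usually be read as reliable change in either direction.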

lecture 2- Reliability Flashcards

(31 cards)