Chapter 4: Reliability Flashcards

1
Q

Reliability

A
  • the consistency with which a test measures what it purports to measure in any given set of circumstances

psychological tests have both systematic and unsystematic sources of unreliability

2
Q

Systematic Errors

A

Systematic errors produce predictably incorrect results from a measuring process: the error is constant in size and direction (e.g. a scale that always reads half a kilogram too heavy).

3
Q

Unsystematic errors

A

constitute random variations

If you weighed your package with your hand resting on it, the pressure of your hand incorrectly inflated the weight of your package. Each time you weigh the package, the outcome varies depending on how hard you press on the scale.

4
Q

domain-sampling model

A

sees a test as a representative sample of the larger domain of possible items that could be included in the test

o test reliability thus becomes a problem of sampling: we sample items from the domain of all possible items (not people)

5
Q

true position

A

The mean of the scores from all possible samples indicates the true position

the person’s ‘true score’ as it is called in classical test theory

6
Q

The standard deviation

A

The standard deviation of the distribution of scores from all possible samples about the true score tells us the likelihood of obtaining any particular sample score.

7
Q

standard error of measurement:

A

an expression of the precision of an individual test score as an estimate of the trait it purports to measure

o we use a sample to make estimates of the likely true score for an individual

o the SEM lets us state the interval in which the true score lies, with a stated degree of confidence
o if the interval is large, there is a great deal of imprecision in the measurement process and we cannot depend on any score obtained with this sample of items
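As a sketch (not from the textbook), the SEM can be computed from a test's standard deviation and reliability coefficient as SD × √(1 − r) and used to place a confidence interval around an observed score; the scale values below (SD = 15, r = 0.91, observed score 100) are hypothetical:

```python
import math

def sem(sd, reliability):
    # Standard error of measurement: sd * sqrt(1 - r)
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    # 95% confidence interval for the true score around an observed score
    e = sem(sd, reliability)
    return (observed - z * e, observed + z * e)

# Hypothetical IQ-style scale: SD = 15, reliability r = 0.91
print(round(sem(15, 0.91), 2))        # 4.5
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 1), round(high, 1))  # 91.2 108.8
```

Note how a higher reliability shrinks the SEM and therefore the interval; at r = 1.0 the interval collapses to the observed score itself.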

8
Q

reliability coefficient:

A

an index—often a Pearson product moment correlation coefficient—of the ratio of true score to error score variance in a test as used in a given set of circumstances

  • The reliability coefficient is used in forming judgments about the overall value of a particular test (e.g. is this a better test for some given purpose than another test?).
  • It quantifies the degree of consistency. There may be many reasons why a test is not consistent, such as errors in assessment that occur when the testing environment influences how the participants perform, or other issues related to how the tests are designed. Calculating the reliability coefficient can help us understand such errors in testing.
9
Q

z scores

A

Describes the position of a raw score in terms of its distance from the mean, measured in standard deviation units.

The z score is positive if the value lies above the mean, and negative if it lies below the mean. Scores expressed in this way are called standard normal deviates (z scores).
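A minimal sketch of the computation, using made-up scale values (mean 100, SD 15):

```python
def z_score(x, mean, sd):
    # Distance of a raw score from the mean, in standard deviation units
    return (x - mean) / sd

print(z_score(115, 100, 15))  # 1.0: one SD above the mean
print(z_score(85, 100, 15))   # -1.0: one SD below the mean
```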

10
Q

error score variability

A

drawing samples repeatedly from a domain gives rise to variation in obtained scores, which can be thought of as the mixture of true score and error score variability that makes up the observed score.

11
Q

variance

A
  • a measure of the spread, or dispersion, of scores within a sample
    o small variance indicates highly similar scores, all close to the sample mean
    o large variance indicates more scores at a greater distance from the mean and possibly spread over a larger range

• Variance: the average of the squared deviations of each score from the mean of the scores (the squared distance between each score and the mean).
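The definition above can be sketched directly; the scores here are invented for illustration:

```python
def variance(scores):
    # Mean of the squared deviations of each score from the mean
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```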

12
Q

observed score variance

A

The observed score variance = true score variance plus error score variance.
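A small simulation can illustrate this decomposition, assuming true scores and errors are independent (all numbers here are invented):

```python
import random

random.seed(0)
n = 100_000
true_scores = [random.gauss(100, 10) for _ in range(n)]  # Var(T) = 100
errors = [random.gauss(0, 5) for _ in range(n)]          # Var(E) = 25
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + e

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Var(X) should come out close to Var(T) + Var(E) = 125
print(round(var(observed)))
```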

13
Q

variance and reliability

A
  • we use the reliability coefficient to discover what proportion of the variance in observed scores is true score variance
  • this proportion will be less than 1.0 and in some cases a good deal less
  • if r = 0.5, 50 per cent of the variance in the scores obtained within the test is due to variance in true scores and the other 50 per cent to errors of measurement
  • if r = 1.0 (perfect reliability), the SEM is zero; that is, there is no error in estimating the true score
  • if the proportion of true score variance is zero (r = 0), then the SEM = 1, which is the standard deviation of a standard normal distribution; the obtained score gives us no more information about the true score than any other score we might have obtained at random

14
Q

The reliability coefficient is determined in three main ways:

A

1) equivalent forms reliability
2) split-half reliability
3) the formula for estimating the reliability of a lengthened test

o the reliability of a test that is longer than the original test by some factor is estimated by the Spearman-Brown formula (the Spearman-Brown prophecy): it tells us about an otherwise unknown state of affairs
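A sketch of the Spearman-Brown formula, r_k = k·r / (1 + (k − 1)·r), where k is the lengthening factor; the numbers below are illustrative:

```python
def spearman_brown(r, k):
    # Predicted reliability of a test lengthened by factor k
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose current reliability is 0.6:
print(round(spearman_brown(0.6, 2), 4))  # 0.75
```

The common k = 2 case is what corrects a split-half correlation back up to the reliability of the full-length test.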

15
Q

split-half reliability

A

the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)

o when speeded tests are being examined (those that must be completed within a time limit), this method of estimating reliability is not recommended
o the odd-even method is arbitrary, and different reliability estimates can result from the one test split in different ways

o useful in overcoming the logistical difficulties of test-retest reliability
o estimates of reliability based on a split-half correlation will be smaller than the actual reliability because each half contains fewer items than the full test
o other things being equal, longer tests have higher reliability
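A sketch of the odd-even procedure on a tiny invented data set (rows are examinees, columns are 0/1 item scores), followed by the Spearman-Brown correction to full length:

```python
# Hypothetical item responses: rows = examinees, columns = items
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
]

odd = [sum(row[0::2]) for row in items]   # half-test score on items 1, 3, 5
even = [sum(row[1::2]) for row in items]  # half-test score on items 2, 4, 6

def pearson(x, y):
    # Pearson product moment correlation
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

r_half = pearson(odd, even)         # reliability of a half-length test
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
print(round(r_half, 2), round(r_full, 2))  # 0.87 0.93
```

The corrected value is always at least as large as the half-test correlation, which is why reporting the uncorrected split-half figure understates reliability.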

16
Q

Solution to the inaccuracy of split-half reliability

A
  • Cronbach proposed that a test be split into subtests, each one item in length
    o this gives k subtests (k = the number of items in the test)
  • all subtests are correlated with all other subtests and the average correlation is calculated; this average correlation becomes the estimate of reliability
  • this method essentially determines the internal consistency of a test
  • Cronbach’s alpha: an estimate of reliability that is based on the average intercorrelation of the items in a test
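In practice alpha is usually computed from item and total-score variances, α = k/(k − 1) · (1 − Σ var(item) / var(total)); a sketch with invented data:

```python
def cronbach_alpha(items):
    # items: rows = examinees, columns = item scores
    k = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var([row[j] for row in items]) for j in range(k))
    total_var = var([sum(row) for row in items])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 0/1 responses: 5 examinees x 6 items
data = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
]
print(round(cronbach_alpha(data), 2))  # 0.75
```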
17
Q

Limitation of reliability

A
  • A test is not reliable in ALL situations
  • The variance of observed scores on the test is likely to differ depending on the particular sample of individuals we choose to study
  • It is better to think of the reliability coefficient and the SEM as applying to a test when applied to a particular type of sample and not as a property of the test itself.
18
Q

Generalisability theory

A

Cronbach proposed that the original theory be extended to include not just items but also occasions

  • Generalisability theory: asks the user to specify what generalisation they are seeking to make, and then ask whether there are data that support such a generalisation.
  • theory: a set of ideas and procedures that follow from the proposal that the consistency or precision of the output of a psychological assessment device depends on specifying the desired range of conditions over which this is to hold
  • Generalizability theory, or G theory, is a statistical framework for conceptualizing, investigating, and designing reliable observations. It is used to determine the reliability (i.e., reproducibility) of measurements under specific conditions. It is particularly useful for assessing the reliability of performance assessments
19
Q

Interrater reliability

A

the extent to which different raters agree in their assessments of the same sample of ratees

20
Q

which index of reliability is the correct one to use?

generalising to a domain of items

A
  • equivalent forms
  • split-half
  • internal consistency
21
Q

which index of reliability is the correct one to use?

generalising over occasions of testing

A

generalising over occasions of testing might be quite important, and reliability needs to be assessed in terms of some version of the test-retest procedure.

In this situation, test-retest reliability of the measure being used is a prime concern. If a test is known to have scores that drift over time, then it is of little use for this type of assessment.

22
Q

reliability across different tests

A

Stanford-Binet:
median alternate forms reliability of 0.91.
The latest version: 0.95 to 0.98 at the scale level and 0.84 to 0.89 at the subtest level.

Wechsler tests

  • Earlier versions reported reliabilities for Full Scale IQ from 0.95 to 0.97.
  • The latest version of the WAIS (IV) reports reliabilities of 0.98 at the scale level and 0.78 to 0.94 at the subtest level.

Thematic Apperception Test
o test-retest reliability over periods of one to two months was no better than 0.26, split-half reliability about 0.27, and equivalent forms at best 0.48 and in some cases as low as 0.29.

self-report measures
o The reliabilities of self-report tests are closer to those of cognitive tests, in the order of 0.75 to 0.85 for commercially produced tests.

23
Q

equivalent forms reliability

A

the estimate of reliability of a test obtained by comparing two forms of a test constructed to measure the same construct

o draw two samples from the domain of possible test items (this minimises practice effects)
o if the second form does not yield comparable scores, the test cannot be depended on
o correlating scores on the two forms for a reasonably large sample gives an estimate of the reliability coefficient

  • Parallel Forms: same distribution of scores (means and variance equal)
  • Alternate Forms: different distribution of scores (mean and variance may not be equal)
  • Both tests are matched for CONTENT and DIFFICULTY
24
Q

limitations of Cronbach's alpha

A

1) a test can have high internal consistency because its items have highly similar content; however, the domain itself might be so constricted that it is trivial.
2) high internal consistency does not guarantee that the items are all reflecting the one thing.
3) not suitable for tests measuring multiple factors (or traits); alpha can overestimate the reliability of the single factor thought to underlie the test (the one referred to in the label on the test).

25
Q

Classical test theory

A

CTT explains how we can calculate a true score, which is the score a test taker would achieve if there were no error at all in the test-taking process. Since error-free testing is impossible, we work with the observed score, the score the test taker actually achieved. CTT essentially tells us how consistent a test is.

-Classical test theory (CTT) in psychometrics is all about reliability.

Test scores are the result of:
o factors that contribute to consistency: the stable attributes under examination (true scores)
o factors that contribute to inconsistency: characteristics of the test taker, the test, or the situation that have nothing to do with the attribute being tested but affect scores (errors of measurement)

o X = T + e
  • X = obtained score
  • T = true score
  • e = errors of measurement
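Under this model the reliability coefficient is the ratio of true score variance to observed score variance; a small simulation (all numbers invented) makes that concrete:

```python
import random

random.seed(1)
n = 50_000
T = [random.gauss(50, 10) for _ in range(n)]  # true scores, Var(T) = 100
e = [random.gauss(0, 5) for _ in range(n)]    # errors of measurement, Var(e) = 25
X = [t + err for t, err in zip(T, e)]         # observed scores: X = T + e

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

reliability = var(T) / var(X)  # theoretical value: 100 / 125 = 0.8
print(round(reliability, 2))
```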