Lec2 - Ch6 Empirical estimates of reliability Flashcards

Empirical estimates of reliability

1
Q

why can’t reliability be calculated directly from real data?

A

reliability is a theoretical property of test scores, therefore it can only be estimated
(impossible to know the real true and error scores)

2
Q

what are the three methods to estimate reliability?

A
  • alternate forms
    > two versions of the same test
  • test-retest
    > two times of testing
  • internal consistency
    > parts of the test
  • see picture 1
3
Q

What must we take into account regarding the methods of estimation?

A
  • no single method is completely accurate
    > the accuracy of each method depends on a variety of assumptions
  • each method requires at least two testings
  • consistency is at the basis of reliability
4
Q

Alternate forms method

A
  • two different forms of the same test
    > compute the correlation between the two forms and interpret it as an estimate of reliability
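A minimal sketch of that computation, with hypothetical scores (the data below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical scores of the same six people on two alternate forms
form_a = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 17.0])
form_b = np.array([11.0, 16.0, 10.0, 19.0, 13.0, 18.0])

# The Pearson correlation between the two forms is interpreted as the
# reliability estimate (valid only under the parallel-forms assumption)
reliability_estimate = np.corrcoef(form_a, form_b)[0, 1]
print(round(reliability_estimate, 3))
```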
5
Q

when can we use the alternate forms method?

A
  • only if the two test forms are parallel
    > identical true scores
    > same error variance
    > correlation = reliability
    !! we can never be entirely sure that the two tests are parallel, but if they are “close enough”, then we could still use this method
6
Q

what are the disadvantages of alternate forms method?

A
  • we cannot be sure that the tests are parallel
  • we cannot be sure that the alternate forms reflect the same psychological construct
  • potential for carryover or contamination effect (due to repeated testing)
    > might cause error scores on one form to be correlated with error scores on the other form
  • see picture 2
7
Q

Test-retest method

A
  • administering the same test twice
  • correlation = reliability
    > the lower the correlation, the higher the effects of measurement error
  • we can be sure it measures the same construct
  • “stability coefficient”
8
Q

when can we use the test-retest method?

A
  • when the tests are parallel
  • when the tests are measuring a trait-like psychological construct (stable, does not change between tests)
9
Q

what are some disadvantages of test-retest method?

A
  • carryover effects
    > make the two testing situations as similar as possible
  • true scores can change between pre and post-test
    > some psychological attributes are unstable across time (e.g. mood)
  • if traits change, the correlation reflects both the reliability and the amount of change
  • many requirements (taking test twice - expensive - time-consuming - …)
10
Q

test-retest method

what are three factors affecting our confidence in the assumption that traits are stable?

A
  • kind of attribute measured (trait-like vs transient)
  • length of test-retest interval
    > large intervals = large psychological changes
    > short intervals = carryover or contamination effects
  • period during which the interval occurs
    > e.g. the stability of knowledge differs depending on age
11
Q

Internal consistency method

A
  • complete only one test, once
  • can be used for composite test scores
  • different parts of the test can be treated as different forms of a test
12
Q

internal consistency method

what factors affect the reliability of test scores?

A
  • consistency among parts of the test
    > if strong correlation, then likely to be reliable
  • test’s length
    > long test is more likely to produce reliable scores than short test
13
Q

what are the four internal consistency methods?

A
  • split-half approach
  • “raw alpha” approach
  • “standardized alpha” approach
  • omega
  • see picture 3
14
Q

Split-half estimates of reliability
- how to calculate it?

A

1- divide the items into two parallel subtests
> equal true scores and error variance
2- compute the correlation between the subtests
> if the test is reliable, you find consistency between the two halves
3- enter the correlation into the Spearman-Brown formula
> the formula is needed because the correlation is based on half-length tests only
- see picture 4
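A minimal sketch of these three steps, assuming an odd/even split of a hypothetical respondents-by-items score matrix (the split rule and the data are illustrative):

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Split-half estimate: correlate odd- and even-item half scores,
    then step the correlation up with the Spearman-Brown formula."""
    odd_half = items[:, 0::2].sum(axis=1)    # subtest 1: odd-numbered items
    even_half = items[:, 1::2].sum(axis=1)   # subtest 2: even-numbered items
    r_halves = np.corrcoef(odd_half, even_half)[0, 1]
    # Spearman-Brown: estimated reliability of the full-length test
    return 2 * r_halves / (1 + r_halves)

# Hypothetical data: 5 respondents x 4 items
responses = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 3, 2, 3],
    [5, 4, 5, 5],
], dtype=float)
print(round(split_half_reliability(responses), 3))
```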

15
Q

difficulties of the split-half method

A
  • arbitrary choice of how to split the test
    > for the split not to matter, all items should be highly correlated with each other (unrealistic)
  • not accurate for speeded tests
    > split-half reliability is almost always 1 (unrealistically high)
16
Q

in the table, which one is the correlation and which one is the reliability?

A

see picture 5

17
Q

in the item-level perspective, what do the methods differ on?

A
  • different response formats (binary vs nonbinary items)
  • applicability to data for different assumptions (parallel vs less strict tests)
  • different forms of data used (item variances, covariances, …)
18
Q

Raw coefficient alpha

A
  • each item is conceived as a subtest
    → the consistency of all items is used to estimate the reliability of scores for the whole test
  • (Cronbach’s alpha)
19
Q

how do you compute Cronbach’s alpha?

A

1- compute the variance of scores on the complete test
2- compute the covariance between each pair of items
3- sum the covariances
4- enter the variance and the summed covariances into the equation
- see picture 6
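A minimal sketch of these four steps on a hypothetical respondents-by-items matrix; the equation used is the standard raw-alpha formula, alpha = (k / (k − 1)) · (sum of covariances / total variance):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Raw coefficient alpha from a respondents-by-items score matrix."""
    k = items.shape[1]
    cov = np.cov(items, rowvar=False)             # k x k covariance matrix
    total_variance = cov.sum()                    # variance of the total test score
    sum_covariances = cov.sum() - np.trace(cov)   # all pairwise covariances (each counted twice)
    return (k / (k - 1)) * (sum_covariances / total_variance)

# Hypothetical data: 5 respondents x 4 items
responses = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 3, 2, 3],
    [5, 4, 5, 5],
], dtype=float)
print(round(cronbach_alpha(responses), 3))
```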

20
Q

Cronbach’s alpha
- lower bound to the reliability

A
  • Cronbach’s alpha tends to underestimate reliability
  • the real reliability is usually equal to or higher than Cronbach’s alpha
21
Q

what does it mean to have a 0 covariance between two items?

A
  • differences among participants’ responses in item 1 are inconsistent with differences among responses in item 2
    > they don’t measure the same construct (or)
    > one is heavily affected by measurement error
  • we would like all positive covariances among items
22
Q

what does the sum of covariances indicate?

A
  • it reflects the degree to which responses to all of the items are generally consistent with each other
  • the larger the sum is, the more consistent the items are with each other
23
Q

how do we make inferences from our sample to the population with Cronbach’s alpha?

A
  • use the sample’s alpha as a point estimate for the population
  • report a confidence interval (reflecting that the point estimate is only a best guess)
24
Q

what is a confidence interval?
+ important facts

A
  • represented by two values
  • usually 95% C.I.
  • “we are 95% confident that the alpha of the population lies between those two values”
    ! small samples will produce a wide and imprecise confidence interval
    ! negative values in C.I. are inconsistent with the concept of reliability
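The text does not specify how the interval is computed; one generic option is a percentile bootstrap, sketched below on simulated data (the whole setup is an assumption for illustration, not the book’s procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha(items):
    # raw coefficient alpha (see the earlier sketch)
    k = items.shape[1]
    cov = np.cov(items, rowvar=False)
    return (k / (k - 1)) * ((cov.sum() - np.trace(cov)) / cov.sum())

# Simulated sample: 50 respondents x 4 items sharing a common factor
true_score = rng.normal(size=(50, 1))
sample = true_score + rng.normal(scale=1.0, size=(50, 4))

# Percentile bootstrap: resample respondents with replacement, recompute
# alpha each time, and take the 2.5th and 97.5th percentiles as a 95% CI
boot = [alpha(sample[rng.integers(0, len(sample), len(sample))])
        for _ in range(2000)]
low, high = np.percentile(boot, [2.5, 97.5])
print(f"alpha = {alpha(sample):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Note how a smaller sample would widen the interval, matching the warning above.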
25
Q

why would we obtain a negative alpha estimate?

A
  • there is something wrong with the test
  • the test is fine, but one or more items need to be reverse-scored
26
Q

what can be established through statistical procedures, regarding alpha?

A
  • whether alpha equals a specific value in the population
  • whether alpha in one population is the same as alpha in another population
  • whether the alphas of two different tests are the same in a given population
27
Q

Standardized Alpha
- when to use it

A
  • (generalized Spearman-Brown formula)
  • appropriate if a test score is created by aggregating standardized responses to the items
    = to be used if responses are standardized
  • provides the most straightforward perspective on reliability
    > fundamental and intuitive terms
    ! standardized and raw alpha procedures often produce similar estimates in real data
28
Q

why would a test user standardize the responses?

A
  • if the items’ variances differ greatly from each other
    > the overall score would mostly reflect the items with the largest variabilities
29
Q

when is the differential weighting of items a big problem?

A
  • if scores from different measures were combined to form a new composite measure
  • if items had different response scales
30
Q

how to compute the standardized alpha

A

1- calculate the correlations between each pair of items
> these reflect the degree to which differences among the participants’ responses to the items are consistent with each other
2- average the correlations (= average interitem correlation)
> this reflects the degree to which responses to all items are generally consistent with each other
3- compute the equation
- see picture 7
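A minimal sketch of these steps; the final equation is the generalized Spearman-Brown form, standardized alpha = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the average interitem correlation:

```python
import numpy as np

def standardized_alpha(items: np.ndarray) -> float:
    """Standardized alpha from a respondents-by-items score matrix."""
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)   # step 1: inter-item correlations
    # step 2: average the k*(k-1) off-diagonal entries (the diagonal is all 1s)
    r_bar = (corr.sum() - k) / (k * (k - 1))
    # step 3: generalized Spearman-Brown equation
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# Hypothetical data: 5 respondents x 4 items
responses = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 3, 2, 3],
    [5, 4, 5, 5],
], dtype=float)
print(round(standardized_alpha(responses), 3))
```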

31
Q

KR20

A
  • “Kuder-Richardson 20”
  • raw alpha for binary items
32
Q

how can the KR20 be computed?

A

1- compute proportions of each of the possible answers for each item
> e.g. 0.75 for “yes” and 0.25 for “no” in item 1…
2- calculate variances for each item
3- calculate total score (sum all responses to items)
4- calculate variance of total score
5- use formula to calculate coefficient
- see picture 8
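A minimal sketch of these five steps for hypothetical yes/no data (1 = “yes”, 0 = “no”); the formula is KR20 = (k / (k − 1)) · (1 − Σpq / total variance):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR20 for binary (0/1) item responses."""
    k = items.shape[1]
    p = items.mean(axis=0)           # step 1: proportion answering 1 per item
    item_variances = p * (1 - p)     # step 2: variance of a binary item is p*q
    total_scores = items.sum(axis=1)      # step 3: total score per respondent
    total_variance = total_scores.var()   # step 4: population variance (matches p*q)
    # step 5: the KR20 coefficient
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 6 respondents x 4 yes/no items
binary = np.array([
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
], dtype=float)
print(round(kr20(binary), 3))
```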

33
Q

why is alpha so commonly used in psychology?

A
  • most statistical packages produce alpha coefficient
  • alpha requires little effort compared to other methods
  • alpha is based on data that is relatively easy to acquire
34
Q

with what kind of tests is the alpha method used?
- why is that impractical?

A
  • tests have to be essentially tau-equivalent or stricter
    > in practice, items are often linked to the test’s construct to different degrees
    > many tests are not unidimensional
    > many tests have correlated errors
    ! it is a matter of degree (alpha can still be acceptable even if a test does not perfectly meet all assumptions)
35
Q

How to compute Omega

A

1- run a factor analysis on participants’ responses to the test items
> signal = factor loading
> a strong factor loading → a strong correlation with the true score
2- sum the signal and noise values across all items on the test (this gives “total signal” and “total noise” values)
3- take the ratio of these values to estimate reliability
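A minimal sketch of this procedure, using scikit-learn’s FactorAnalysis as a stand-in for the factor-analysis step and simulated data; dedicated psychometric software estimates omega more carefully, so treat this only as an illustration of the signal/noise ratio:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def omega(items: np.ndarray) -> float:
    """Omega from a one-factor model of a respondents-by-items matrix."""
    fa = FactorAnalysis(n_components=1).fit(items)   # step 1: factor analysis
    loadings = fa.components_[0]                     # signal: one loading per item
    total_signal = loadings.sum() ** 2               # step 2: (sum of loadings)^2 ...
    total_noise = fa.noise_variance_.sum()           # ... plus summed unique variances
    return total_signal / (total_signal + total_noise)   # step 3: signal / (signal + noise)

# Simulated data: 200 respondents x 4 items driven by one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
items = factor + rng.normal(scale=0.8, size=(200, 4))
print(round(omega(items), 3))
```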

36
Q

Omega
- what is it based on?

A
  • conceptually: based on the idea that reliability is the ratio of signal to noise
  • computationally: based on factor analysis (to estimate signal and noise)
37
Q

What tests can Omega be applied to?

A
  • congeneric or stricter
    > wider range of circumstances than alpha
38
Q

what fault do all internal consistency methods have in common?

A

they might overestimate reliability
- fail to account for measurement error that transcends a single measurement occasion (version of carryover effect)
→ factors could influence the response to all items equally
→ this creates error correlation
→ this inflates reliability estimation

39
Q

internal consistency vs dimensionality

A
  • internal consistency ≠ dimensionality / internal homogeneity
    !! reliability based on internal consistency could be high even in multidimensional tests
40
Q

what are the three main factors affecting reliability?

A
  • consistency among parts of tests
  • length of tests
  • heterogeneity
41
Q

factors affecting reliability

Consistency among parts of tests

A
  • greater internal consistency → greater reliability
    > high consistency among items = high consistency between observed and true scores
42
Q

how can we estimate the reliability of a revised test?

A
  • Spearman-Brown prophecy formula
    > it forecasts what would happen if a test was revised in a particular way
  • equation for standardized alpha can also be used
  • see picture 9
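A minimal sketch of the prophecy formula, r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test’s length changes:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Forecast the reliability of a test lengthened (or shortened) by
    `length_factor`, assuming the new items are parallel to the old ones."""
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)

# e.g. doubling a test whose scores have reliability .70
print(round(spearman_brown(0.70, 2.0), 3))   # -> 0.824
# e.g. halving the same test
print(round(spearman_brown(0.70, 0.5), 3))   # -> 0.538
```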
43
Q

factors affecting reliability

how can we improve the internal consistency of a test?

A
  • rewrite some items to make them clearer
  • replace one or two items altogether
44
Q

factors affecting reliability

Length of a test

A
  • a long test produces more reliable scores than a short test
    > as length of test increases, true score variance increases more than error score variance
    > when doubling a test, true score variance quadruples while error score variance only doubles
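A one-step check of the doubling claim, assuming the added items form a parallel test with uncorrelated errors:

```latex
% Doubling appends a parallel test: X' = (T + E_1) + (T + E_2) = 2T + E_1 + E_2
\operatorname{Var}(2T) = 4\operatorname{Var}(T)
\qquad
\operatorname{Var}(E_1 + E_2) = \operatorname{Var}(E_1) + \operatorname{Var}(E_2) = 2\operatorname{Var}(E)
```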
45
Q

what must we consider when increasing the length of a test?

A
  • the link between length and reliability holds only when the new items are parallel to the old ones
    > the average interitem correlation would remain the same
  • practical limits on the number of items that can be included in a test
  • see picture 10
46
Q

sample heterogeneity

A
  • the greater the variability among respondents with respect to the psychological attribute being measured, the larger the reliability coefficient
  • reliability likely to be higher in heterogeneous sample compared to homogeneous sample
47
Q

what are the implications of sample heterogeneity?

A
  • test might produce scores that are highly reliable in one sample but not reliable in another sample
  • it highlights the utility of generalization studies
48
Q

Reliability generalization study

A
  • it reveals the degree to which a test produces different reliability estimates across different kinds of populations and research uses
  • used to identify and understand the ways in which sample characteristics affect the reliability of test scores
49
Q

Difference scores

A
  • if a person takes a test on two different occasions, you can subtract one test score from the other, creating a difference score (Di = Xi - Yi)
50
Q

Intraindividual change scores

A
  • each person has two scores on the same test
  • the difference score is obtained by subtracting one score from the other
51
Q

Intraindividual discrepancy scores

A
  • each person has scores on two measures from different tests
  • difference indicates discrepancy between attributes
52
Q

Interindividual difference scores

A
  • two different people are given the same test and the score of one person is subtracted from the score of the other
53
Q

how can you estimate the reliability of Difference scores?

A
  • see picture 11
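Picture 11 is not reproduced here; the standard classical-test-theory formula (assuming uncorrelated errors) is sketched below, and it also illustrates the two factors named in the next cards:

```python
def difference_score_reliability(sd_x: float, sd_y: float,
                                 rel_x: float, rel_y: float,
                                 r_xy: float) -> float:
    """Reliability of D = X - Y from each test's standard deviation,
    each test's reliability, and the observed correlation between them."""
    true_var = rel_x * sd_x**2 + rel_y * sd_y**2 - 2 * r_xy * sd_x * sd_y
    observed_var = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return true_var / observed_var

# Both tests reliable (.80) but correlated .50: difference reliability drops
print(difference_score_reliability(1, 1, 0.8, 0.8, 0.5))   # -> 0.6
# Same tests, uncorrelated: difference reliability equals their average (.80)
print(difference_score_reliability(1, 1, 0.8, 0.8, 0.0))   # -> 0.8
```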
54
Q

factors affecting the reliability of difference scores

A
  • correlation between tests’ observed scores
  • test score reliability
55
Q

how does the correlation between tests’ observed scores affect the reliability of difference scores?

A
  • the higher the correlation, the lower the reliability
    > the reliability of difference scores is highest when the tests are uncorrelated with each other
    > as the correlation increases, the reliability decreases
56
Q

how does the reliability of tests scores affect the reliability of difference scores?

A
  • test scores with high reliability produce difference scores that also have high reliability
    > test scores with low reliability produce difference scores with also low reliability
57
Q

what are the main four concepts to remember regarding the reliability of difference scores?

A
  • difference scores will be reliable if the tests are uncorrelated
  • difference scores will be reliable if the tests have high reliability
  • the reliability of difference scores will not be higher than the average reliability of the two individual test scores
  • the reliability of difference scores can be much smaller than the reliability of the two sets of individual test scores
58
Q

what is the main problem with difference scores?

A
  • the lack of variability in the difference scores can mask clear differences in their component scores
  • they might essentially reflect only one of the two variables used
    > e.g. pay attention to the measurement scale of the two tests
59
Q

discriminant validity

A

Degree to which test scores are not associated with other measures with which they should not be associated

60
Q

what is a solution for tests having different variabilities?

A
  • standardize the test scores before calculating the difference scores so that they share a common metric
    ! it still might not make sense to subtract one score from the other
    > difference scores are most meaningful if the two tests measure the same construct
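A minimal sketch of that standardization, with hypothetical scores on two very different scales:

```python
import numpy as np

# Hypothetical raw scores: test X on a 0-100 scale, test Y on a 1-5 scale
x = np.array([52.0, 61.0, 45.0, 70.0, 58.0])
y = np.array([3.1, 4.0, 2.5, 4.8, 3.6])

# z-score each test so both share the same metric (mean 0, SD 1),
# then compute the difference scores on that common metric
z_x = (x - x.mean()) / x.std()
z_y = (y - y.mean()) / y.std()
difference_scores = z_x - z_y
print(difference_scores.round(2))
```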
61
Q

in what cases do difference scores have psychometric faults?

A
  • high intercorrelations between the component tests
  • poor reliability of the component tests
  • unequal variances in the component tests