L3. Reliability - bias and empirical estimates Flashcards

1
Q

reliability
is it achievable?

and its importance

A
  • consistency of measurement
  • is a property of test scores, not the test itself
  • reflects the degree of correspondence between observed scores and true scores
  • can be observed via repeated measures
  • in practice, measurement error means there will never be perfect reliability
  • is a necessary but not sufficient condition for validity
  • greater true-score variance (relative to error variance) is good: it indicates a greater contribution of reliable variance to observed scores
  • if observed scores were nothing but measurement error, we would never find associations between variables and the whole exercise would be pointless
2
Q

Classical Test Theory (CTT)
True score
observed score
error score

A

CTT
- is a measurement theory that defines the conceptual basis of reliability and procedures for estimating the reliability of scores

True score
- a hypothetical score devoid of measurement error (not the construct itself, though)
- may be perfect in terms of reliability but not in terms of validity
- should be highly correlated with observed scores

Observed score
- Observed scores are the scores we obtain from tests or instruments.
- True score + error = observed score.
- All other things equal, we want the observed scores to be as close to their corresponding true scores as possible.
- should be highly correlated with true scores

Error score
- Error scores should have a mean of zero.
- Effectively, error cancels itself out across cases.
- Error scores should arise from a random process.
- Error scores should be uncorrelated with true scores.
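A minimal numeric sketch of the CTT decomposition, using made-up scores chosen so the error terms average to zero:

```python
# Toy illustration of CTT: observed = true + error.
# All values are hypothetical, picked so the errors sum to zero,
# as CTT assumes error does on average.
true_scores = [10.0, 12.0, 14.0, 16.0, 18.0]
error_scores = [1.0, -2.0, 0.5, 1.5, -1.0]  # sums to zero

observed = [t + e for t, e in zip(true_scores, error_scores)]
mean_error = sum(error_scores) / len(error_scores)

print(observed)    # [11.0, 10.0, 14.5, 17.5, 17.0]
print(mean_error)  # 0.0
```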

3
Q

4 ways to think of reliability

A
  1. the ratio of true score variance to observed score variance
  2. the squared correlation between observed scores and true scores (the unsquared correlation is called the reliability index)
  3. lack of error variance
  4. lack of correlation between observed scores and error scores
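Definitions 1 and 2 can be checked numerically. The true and error scores below are hypothetical, constructed so the error has mean zero and is uncorrelated with the true scores (the CTT assumptions), under which the two definitions give the same number:

```python
# Hypothetical true scores T and error scores E with mean(E) = 0
# and cov(T, E) = 0, so that var(X) = var(T) + var(E).
T = [10.0, 12.0, 14.0, 16.0, 18.0]
E = [1.0, -2.0, 2.0, -2.0, 1.0]
X = [t + e for t, e in zip(T, E)]  # observed = true + error

def mean(v):
    return sum(v) / len(v)

def var(v):  # population variance
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rel_ratio = var(T) / var(X)                  # definition 1
r_xt = cov(X, T) / (var(X) * var(T)) ** 0.5  # reliability index
rel_sq = r_xt ** 2                           # definition 2

print(round(rel_ratio, 4), round(rel_sq, 4))  # 0.7407 0.7407
```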
4
Q

parallel tests
alternate forms reliability

A
  • two tests that are psychometrically identical but whose items differ
  • all methods of estimating reliability are based on this notion
  • assumes tau-equivalence: the true scores of the two tests are equal, i.e. they measure the same construct
  • assumes equal error variances across the two tests
  • synonymous with parallel forms reliability
  • it is often argued that alternate forms reliability is effectively impossible in practice because:
    1. we can never be sure that the true scores associated with the two tests measure exactly the same construct
    2. the two tests are not based on the same items
    3. carry-over effects: respondents answer differently the second time they take a test, which contaminates the data
  • counterbalancing the order of the forms can help combat these limitations
5
Q

interpreting reliability

A

.60: unacceptably low
.70: minimum for beginning-stage research
.80: good level for research purposes
.90+: necessary in applied contexts where important decisions are made about individuals

6
Q

Test -retest reliability standards

A

run a Pearson correlation between the scores at time 1 and time 2.
Cicchetti (1994) recommended the following:
Poor: below .40
Fair: .40 to .59
Good: .60 to .74
Excellent: .75 and above
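A sketch of the computation on hypothetical test-retest data, with the band labels following Cicchetti's cut-offs:

```python
# Hypothetical scores for six respondents measured on two occasions.
time1 = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]
time2 = [13.0, 14.0, 12.0, 17.0, 15.0, 17.0]

# Pearson's r between the two occasions = test-retest estimate.
n = len(time1)
m1, m2 = sum(time1) / n, sum(time2) / n
num = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
den = (sum((a - m1) ** 2 for a in time1)
       * sum((b - m2) ** 2 for b in time2)) ** 0.5
r = num / den

def cicchetti(r):
    """Label r using Cicchetti's (1994) bands."""
    if r >= 0.75:
        return "Excellent"
    if r >= 0.60:
        return "Good"
    if r >= 0.40:
        return "Fair"
    return "Poor"

print(round(r, 3), cicchetti(r))  # 0.925 Excellent
```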

7
Q

Test-Retest reliability
interval

A
  • also called a ‘stability coefficient’
  • only one test is created, but it is administered on two different occasions
  • the same items are presented on both occasions, so the true scores should represent the same construct (tau-equivalence)
  • assumes the construct is stable (e.g. not mood, which changes frequently)
  • assumes equal error variances across occasions, which is hard to satisfy given how many conditions can vary between sessions
  • some do not accept this as an estimate of measurement error and instead treat it purely as a stability coefficient
  • on average smaller than internal-consistency reliability (ICR)
  • a paired t-test can check whether the means of time 1 and time 2 differ significantly: if the test is reliable and the construct stable, there should be no difference

interval
- the magnitude of the interval between the two testing sessions affects the magnitude of the correlation between the scores
- most test-retest studies use a 2- to 8-week interval
- waiting weeks between administrations is often impractical
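The paired t-test on the two occasions' means can be sketched as follows, again on hypothetical scores:

```python
# Paired t-test on hypothetical time 1 / time 2 scores:
# tests whether the mean difference between occasions is zero.
time1 = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]
time2 = [13.0, 14.0, 12.0, 17.0, 15.0, 17.0]

diffs = [b - a for a, b in zip(time1, time2)]
n = len(diffs)
mean_d = sum(diffs) / n
# sample standard deviation of the differences (n - 1 denominator)
sd_d = (sum((d - mean_d) ** 2 for d in diffs) / (n - 1)) ** 0.5
t_stat = mean_d / (sd_d / n ** 0.5)  # compare against t with n - 1 df

print(round(mean_d, 3), round(t_stat, 3))  # 0.333 0.791
```

With a small t statistic like this, the means do not differ significantly, which is what a reliable test of a stable construct should show.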

8
Q

internal-consistency reliability
factors that affect ICR

A
  • respondents complete one form of the test once
  • different parts of the same test are treated as different forms of the test
  • the most common method for estimating reliability
  • the correlation between two parts is an indication of reliability
  • a test can achieve high internal consistency while being very NARROW in breadth, e.g. by asking the same question repeatedly
  • ICR should be LARGER than all validity coefficients, e.g. ICR should exceed criterion validity

factors that affect ICR
1. test length: the longer the test, the more reliable the scores
2. the degree of consistency between the items/parts of the test

9
Q

Split half-method
Spearman brown formula

A
  • a type of ICR
  • split the test into halves, e.g. odd/even items or first half vs second half
  • you get a different ICR estimate depending on how you specify the split halves
  • the correlation between the halves really represents the reliability of only half the test, so it must be adjusted to reflect the whole length using Spearman-Brown
  • the Spearman-Brown formula adjusts the internal-consistency estimate to reflect the actual length of the test
  • applies only to the split-half method
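A sketch of the split-half method with the Spearman-Brown adjustment, on hypothetical item responses (rows = respondents, columns = items):

```python
# Hypothetical responses to a 4-item test.
item_scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 2, 1],
    [4, 3, 4, 4],
]

# odd/even split: sum items 1 & 3 vs items 2 & 4
half_a = [row[0] + row[2] for row in item_scores]
half_b = [row[1] + row[3] for row in item_scores]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r_half = pearson(half_a, half_b)    # reliability of half the test, ~0.810
# Spearman-Brown: step the estimate up to the full test length.
r_full = 2 * r_half / (1 + r_half)  # ~0.895
print(round(r_half, 3), round(r_full, 3))
```

Note the adjusted value is larger: a longer test is more reliable, so the half-test correlation understates the full test's reliability.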
10
Q

Cronbach’s alpha, why and what is it?

A
  • the problem with the split-half method is that you get a different ICR depending on how you specify the two halves
  • Cronbach's alpha represents the mean reliability across all possible split halves
  • a method of ICR estimation at the item level (each item is treated as a unique portion of the test)
  • the ratio of true score variance to total score variance
  • assumes unidimensionality and equal factor loadings
  • computed as the number of items squared times the mean inter-item covariance, divided by the sum of all the elements of the variance-covariance matrix
  • the formula represents the ratio of true score variance to total variance, which is how reliability is defined at the theoretical level
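A numeric sketch of Cronbach's alpha on hypothetical item scores, using the equivalent textbook form alpha = k/(k-1) x (1 - sum of item variances / variance of total scores):

```python
# Hypothetical responses (rows = respondents, columns = items).
items = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 2, 1],
    [4, 3, 4, 4],
]
k = len(items[0])                     # number of items
cols = list(zip(*items))              # item-wise columns
totals = [sum(row) for row in items]  # total score per person

def var(v):  # population variance
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

alpha = k / (k - 1) * (1 - sum(var(c) for c in cols) / var(totals))
print(round(alpha, 3))  # 0.922
```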
11
Q

Standardised Cronbachs (general spearman brown)

A
  • the Spearman-Brown formula generalised beyond two halves
  • like Cronbach's alpha, but based on inter-item correlations (r) instead of covariances
  • applies when RAW scores are converted to STANDARDISED scores before summing
  • uncommon in practice
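As a sketch, the generalised Spearman-Brown formula applied to an assumed (hypothetical) mean inter-item correlation:

```python
# Standardised alpha from the mean inter-item correlation r_bar.
# Both values below are hypothetical, chosen for illustration.
k = 10        # number of items
r_bar = 0.30  # mean inter-item correlation

std_alpha = (k * r_bar) / (1 + (k - 1) * r_bar)
print(round(std_alpha, 3))  # 0.811
```

With k = 2 this reduces to the ordinary two-half Spearman-Brown formula, 2r / (1 + r).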