Lecture 3 - Reliability Flashcards
Define reliability.
The extent to which a measurement tool gives consistent measurements.
What is reliability?
Consistency in measurement.
What is Classical Test Theory?
The concept that any actual/observed score is a combination of an individual’s true score and measurement error.
Classical Test Theory is the traditional conceptual basis of psychometrics. T/F
True
What is True Score Theory?
Another name for classical test theory.
What is a true score?
The aspect we actually want to measure, i.e. the underlying behaviour or trait captured by our measurement (e.g. real intelligence or real level of extroversion).
What is measurement error?
Everything captured within our observed score that isn’t what we wanted to measure.
If a whole egg was your observed score, what is the true score and measurement error?
Egg yolk - true score (e.g. middle of the brain, measuring intelligence, ability etc.)
Egg white - measurement error
Observed score is not fallible. T/F
False. It is fallible, i.e. prone to error.
True score is an ideal measurement (perfect and consistent) and constant for an individual. T/F
True.
Errors of measurement are random and unrelated to the true score. T/F
True.
Measurement error can easily be eliminated. T/F
False. It cannot be eliminated entirely.
How can the classical test theory (X=T+E) be described in terms of variation between people or more specifically, with variance?
Total Variation (X) = True Variation (T) (systematic) + Error Variation (E) (unsystematic)
What is reliability in terms of the relationship between true and total variance?
Reliability is the proportion of total score variance that is true score variance, i.e. true variance divided by total variance.
Variance is a way of measuring variation, and it is standard deviation squared. T/F
True.
Why do we describe classical test theory in terms of variance rather than standard deviations?
Variance is additive, standard deviation is not.
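The two ideas above (variance is additive; reliability is true variance over total variance) can be sketched with a small simulation. This is a hypothetical example with made-up numbers, not from the lecture:

```python
import random

random.seed(42)
n = 10_000

# Classical test theory: each observed score X is a true score T
# plus independent random error E (X = T + E).
T = [random.gauss(100, 15) for _ in range(n)]  # hypothetical true scores
E = [random.gauss(0, 5) for _ in range(n)]     # hypothetical measurement error
X = [t + e for t, e in zip(T, E)]              # observed scores

def var(xs):
    """Sample variance (standard deviation squared)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Variance is additive for independent components:
# var(X) ≈ var(T) + var(E) in a finite sample.
print(var(X), var(T) + var(E))

# Reliability = true variance / total variance
reliability = var(T) / var(X)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.9
```

Note that the standard deviations (15 and 5) do not add up to the standard deviation of X; only the variances do, which is why classical test theory is stated in terms of variance.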
Give an example of X=T+E in terms of driving.
Total variation - variation in scores from questionnaires measuring low and high speeders.
True variation - people's actual speeding behaviour.
Error variation - error due to the questionnaire not reflecting what people actually choose to do.
True variance.
Hypothetical variation of test scores in a sample if there is no measurement error.
Total variance.
Actual variation in data, including error variation.
Lower measurement error = Higher reliability
Higher measurement error = Lower reliability
T/F
True.
If a person took the same test multiple times and their scores were widely spread out, would this be considered low or high reliability? Why?
Low reliability, because their scores were inconsistent.
Describe the various sources of measurement error.
Test construction (e.g. item sampling/content sampling: not every piece of content can be asked, so some people may, by luck, happen to know the answers to the particular subset of questions included in an exam)
Test administration (e.g. whether or not there were any distracting noises when the test was administered)
Test scoring (e.g. whether markers are more or less stringent; biased examiners)
Other influences: motivation, self-efficacy, etc.
What is item sampling/content sampling?
The sample of items, drawn from the full content of the construct being assessed, that is included in a particular test or measure.
Why can we only estimate the reliability of a test and not measure it directly?
Because true variance is hypothetical and cannot be measured directly - therefore we can only infer reliability.
Four methods available to help ESTIMATE reliability of a test.
Internal consistency; test-retest; alternate/parallel forms; inter-rater reliability.
How much the item scores in a test correlate with one another on average (e.g. Cronbach's alpha, KR-20)
Internal consistency.
If a test involves an examiner making a rating - get two of them to do the rating independently and see how much their ratings correlate.
Inter-rater reliability.
If people sit the same test twice, how much do their scores correlate between the two sittings.
Test-retest reliability.
If people do two different versions of the same test, how much do their scores on the two versions correlate.
Alternate-forms reliability.
Internal consistency.
Conceptually, this is the average correlation between the items on your scale. If all items on questionnaire are measuring the same thing, do individuals give consistent responses?
What are alternative names for internal consistency?
Inter-item consistency
Internal coherence
High internal consistency.
Scores from the items in a questionnaire that measure the same thing are consistent.
Low internal consistency.
Scores are inconsistent. Unreliable test.
What is Cronbach’s alpha a measure of?
Internal consistency.
When should you use Cronbach’s alpha?
When there are more than two possible outcomes to a question.
How do you calculate Cronbach’s alpha in SPSS?
Select Analyze; Scale; Reliability Analysis; select model ‘Alpha’
Select all items in your scale.
Click OK
Describe the steps involved in calculating Cronbach’s alpha by hand.
- Split questionnaire in half.
- Calculate total score for each half.
- Compute bivariate correlation between total scores for each half
- Repeat with every possible split-halves of the questionnaire
- Work out the average of all split-half correlations
- Adjust the correlation using the Spearman-Brown formula.
What does the Spearman-Brown formula achieve in regards to measuring Cronbach’s alpha?
Splitting the questionnaire halves the number of items, and reliability depends on the number of items, so each split-half correlation underestimates the reliability of the full questionnaire. The Spearman-Brown formula corrects for this (for a half-length split: corrected r = 2r / (1 + r)).
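The by-hand steps above can be sketched in Python. The data are made up (six people answering a 4-item scale), and the result is only an estimate related to alpha; averaging split-half correlations matches standardized alpha exactly only under certain assumptions:

```python
from itertools import combinations
from statistics import mean

# Hypothetical data: rows = people, columns = items of a 4-item scale.
scores = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
]
k = len(scores[0])

def pearson(x, y):
    """Bivariate (Pearson) correlation between two lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Every possible split into two halves. Fixing item 0 in the first half
# ensures each split and its mirror image are counted only once.
correlations = []
for half in combinations(range(k), k // 2):
    if 0 not in half:
        continue
    other = [i for i in range(k) if i not in half]
    a = [sum(row[i] for i in half) for row in scores]   # total score, half 1
    b = [sum(row[i] for i in other) for row in scores]  # total score, half 2
    correlations.append(pearson(a, b))

# Average the split-half correlations, then apply the Spearman-Brown
# correction back up to full test length.
avg_r = mean(correlations)
alpha_estimate = 2 * avg_r / (1 + avg_r)
print(round(alpha_estimate, 3))
```

In practice you would use SPSS (or a statistics library) rather than this procedure, but it shows why splitting a test and why the Spearman-Brown adjustment are both needed.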
What is KR-20?
Kuder-Richardson 20. A measure of internal consistency used when the answers are dichotomous (e.g. true/false, yes/no, correct/incorrect).
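The standard KR-20 formula is (k / (k - 1)) × (1 - Σp·q / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. A minimal sketch with made-up dichotomous data (conventions differ on whether the total-score variance uses an n or n - 1 denominator; this sketch uses n - 1):

```python
# Hypothetical data: rows = people, columns = items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
n = len(scores)
k = len(scores[0])

# p_i = proportion correct on item i; q_i = 1 - p_i
p = [sum(row[i] for row in scores) / n for i in range(k)]
pq_sum = sum(pi * (1 - pi) for pi in p)

# Sample variance (n - 1 denominator) of total test scores
totals = [sum(row) for row in scores]
mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / (n - 1)

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(round(kr20, 3))
```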
An examination is multiple choice, with four possible responses per question. To work out the internal consistency of the examination, should I use Cronbach's alpha or the Kuder-Richardson 20 formula?
KR-20. Even though you’ve got four things to choose between, there are only two ways it can go. You can either get it right or wrong (two outcomes).