Reliability and coefficient alpha Flashcards
What is reliability?
Reliability is the desired consistency or reproducibility of test scores.
- A measure of the extent of error present in a test.
Do we assume there is always some error in measurement?
Yes, and that error is assumed to be random.
What do we assume leads to differences in a person’s scores across repeated administrations of a test?
Measurement error. A person’s true score is unlikely to change every time they take a test, so differences between their scores are attributed to error.
What do we expect the distribution of scores to be for a test?
Normal distribution.
What four assumptions underlie classical test theory?
- Each person has a true score we could obtain if there were no measurement error.
- There is measurement error, but this error is random.
- The true score of an individual doesn’t change with repeated administrations of the same test, even though their observed score does.
- The distribution of random errors (and thus observed test scores) will be the same for all people.
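These assumptions can be illustrated with a small simulation. This is a sketch under made-up values: one hypothetical person with a true score of 50 takes the same test 10,000 times, and each observed score is the true score plus random, normally distributed error with an assumed SD of 5. Because the error is random, it averages out, so the mean observed score lands close to the true score.

```python
import random
import statistics

random.seed(1)

TRUE_SCORE = 50.0   # hypothetical true score (unknowable in practice)
ERROR_SD = 5.0      # assumed SD of the random measurement error
N_ADMINISTRATIONS = 10_000

# Classical test theory: observed score X = true score T + random error E.
# T stays fixed across administrations; only E changes.
observed = [TRUE_SCORE + random.gauss(0.0, ERROR_SD) for _ in range(N_ADMINISTRATIONS)]
errors = [x - TRUE_SCORE for x in observed]

mean_observed = statistics.fmean(observed)
mean_error = statistics.fmean(errors)

print(f"mean observed score: {mean_observed:.2f}")  # close to 50
print(f"mean error:          {mean_error:.2f}")     # close to 0
```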
What is the domain sampling model?
- It is another central concept of classical test theory.
- When we construct a test of something, we can’t ask every possible question, so we use only a few test items (a sample).
- Using only a sample of items introduces error, so we need to determine whether the test items adequately sample the domain or construct.
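A minimal sketch of the domain sampling idea, using made-up values: a hypothetical domain of 1,000 items, where a person’s true score is taken to be the proportion of all domain items they would answer correctly (about 0.6 here). Each simulated test draws a random sample of items, and the average gap between the sample score and the true score shrinks as the test gets longer. The function name `mean_abs_error` is hypothetical.

```python
import random
import statistics

random.seed(2)

DOMAIN_SIZE = 1000
# Hypothetical domain: 1 = the person would answer this item correctly, 0 = not.
# Their "true score" is the proportion correct across the whole domain.
domain = [1 if random.random() < 0.6 else 0 for _ in range(DOMAIN_SIZE)]
true_score = statistics.fmean(domain)

def mean_abs_error(test_length: int, n_tests: int = 2000) -> float:
    """Average |sample proportion - true score| over many randomly built tests."""
    errors = []
    for _ in range(n_tests):
        items = random.sample(domain, test_length)  # draw items without replacement
        errors.append(abs(statistics.fmean(items) - true_score))
    return statistics.fmean(errors)

for k in (5, 20, 100):
    print(f"{k:>3} items: mean absolute error = {mean_abs_error(k):.3f}")
```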
What is the point of reliability analysis?
Reliability analysis is conducted to ascertain how much error we would make by using a score from a shorter test as an estimate of someone’s true ability.
What are three things to note regarding reliability analysis?
- Reliability = variance of the true score / variance of the observed score on the short test.
- Observed test scores should be correlated with the true score.
- As the sample of items gets larger, the estimate of the true score becomes more accurate.
- It is easy to work out reliability if we know the true score.
What can affect reliability measurements?
- Different ways of measuring reliability are sensitive to different sources of measurement error.
- We consider various sources of measurement error.
What is “Standard Error of Measurement”?
- We can work out how much measurement error we have by calculating how much, on average, an observed score on our test differs from the true score.
- We know that a person’s observed score differs from their true score, and that their true score is unknowable. But we can calculate the range in which a person’s true score is likely to fall by calculating the Standard Error of Measurement.
What is the formula for standard error of measurement?
$\mathrm{SEM} = SD\sqrt{1 - r}$
- SD is the standard deviation of the test scores.
- r is the reliability of the test.
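A small helper for the SEM formula, as a sketch. The function name and the example values (SD = 15, r = .91) are hypothetical and chosen only to make the arithmetic easy to check: 15 × √(1 − .91) = 15 × 0.3 = 4.5.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), where r is the test's reliability (0 <= r <= 1)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD of 15 and reliability of .91
sem = standard_error_of_measurement(15.0, 0.91)
print(f"SEM = {sem:.2f}")  # SEM = 4.50
```

Because the error term shrinks as r rises, a perfectly reliable test (r = 1) has an SEM of 0, while a test with r = 0 has an SEM equal to the SD of the scores.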
What do we do once we know the SEM?
- We can use it to create confidence intervals.
- The z-score for a 95% confidence interval= 1.96
- Lower bound= x-(1.96*SEM)
- Upper bound= x+(1.96*SEM)
where x is the person’s score on the test.
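The confidence interval steps can be sketched as below. Rather than hard-coding 1.96, this version derives z from the standard normal distribution with `statistics.NormalDist`, so other confidence levels also work. The function name, the observed score of 100 and the SEM of 4.5 are hypothetical; the result is 100 ± 1.96 × 4.5 = 100 ± 8.82.

```python
from statistics import NormalDist

def true_score_interval(observed: float, sem: float, confidence: float = 0.95) -> tuple[float, float]:
    """Interval observed ± z * SEM, with z taken from the standard normal distribution."""
    # For 95% confidence, z is the 97.5th percentile of N(0, 1), about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return observed - z * sem, observed + z * sem

# Hypothetical example: observed score of 100 and SEM of 4.5
lower, upper = true_score_interval(100.0, 4.5)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 91.18 to 108.82
```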
What are the different types of reliability?
- Test-retest reliability
- Parallel forms reliability
- Internal consistency (split-half reliability, Kuder-Richardson 20 reliability, coefficient/Cronbach’s alpha)
- Inter-rater reliability
What is test-retest reliability?
- The simplest way to establish reliability is to administer the test or scale to a sample on two different occasions. If the scale is reliable, the scores at the test and retest administrations should be strongly correlated.
- The correlation between the two sets of scores is also known as the coefficient of stability.
- The source of error measured is time sampling.
What are the issues with test-retest reliability?
- What is the optimal length of time between the administrations? If it is too short, participants may recall their answers from the first administration; if it is too long, extraneous events may influence the scores on the scale.
- It is problematic for measuring more transient states, such as mood, which can genuinely change between administrations.
- What if some event happens between the first and second administrations?
What is parallel-forms reliability?
Parallel-forms (alternate-forms) reliability requires the construction of two equivalent versions of the same test, whose items are closely matched. The two forms are then administered to the same set of people, either at different times or at the same time.
- The correlation between the two forms is known as the coefficient of equivalence.
- The source of error measured is item sampling.