Module 6: Reliability and Validity Flashcards
Reliability
Reliability refers to the consistency or stability of a measuring instrument. In other words, the measuring instrument must measure exactly the same way every time it is used.
Systematic errors
Problems that stem from the experimenter and the testing situation.
Trait errors
Problems that stem from the participants. Were they truthful? Did they feel well?
True score
The true score is what the score on the measuring instrument would be if there were no error.
Error score
The error score is any measurement error (systematic or trait).
Observed score
The score recorded for a participant on the measuring instrument used.
Conceptual formula for observed score
Observed score = True score + Error score
Random errors
Errors in measurement that lead to measurable values being inconsistent when repeated measurements of a constant attribute or quantity are taken
Conceptual formula for reliability
Reliability = True score / (True score + Error score)
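The two conceptual formulas above can be illustrated with a minimal numeric sketch in Python. One assumption beyond the flashcard wording: in practice the ratio is computed from the variances of the scores across a group of participants, and all numbers here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true scores for 100 participants and random error on each measurement.
true_scores = rng.normal(loc=50, scale=10, size=100)
error_scores = rng.normal(loc=0, scale=5, size=100)

# Observed score = True score + Error score
observed_scores = true_scores + error_scores

# Conceptual reliability: true-score variance as a share of total (observed) variance.
reliability = true_scores.var() / (true_scores.var() + error_scores.var())
print(round(reliability, 2))  # roughly 0.80 with these made-up spreads
```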
Correlation coefficients
A correlation coefficient measures the degree of relationship between two sets of scores and can vary between -1.00 and +1.00. The stronger the relationship between the variables, the closer the coefficient is to either -1.00 or +1.00.
Positive correlation
A positive correlation indicates a direct relationship between variables: when we see high scores on one variable, we tend to see high scores on the other.
The graph runs from the bottom left to the top right.
Negative correlation
A negative correlation indicates an inverse, or negative, relationship: high scores on one variable go with low scores on the other and vice versa.
The graph runs from the top left to the bottom right.
Rules-of-thumb correlation coefficient
- .00-.29 = none to weak
- .30-.69 = moderate
- .70-1.00 = strong
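A minimal sketch of computing a correlation coefficient with NumPy and reading it against the rule of thumb above; the scores are made up for illustration.

```python
import numpy as np

# Made-up scores for the same participants on two variables.
x = np.array([4, 7, 8, 3, 9, 5, 6, 2])
y = np.array([5, 6, 9, 2, 8, 6, 5, 3])

# Pearson correlation coefficient (off-diagonal element of the 2x2 correlation matrix).
r = np.corrcoef(x, y)[0, 1]

# Rule-of-thumb label from the flashcard above.
strength = "none to weak" if abs(r) < 0.30 else "moderate" if abs(r) < 0.70 else "strong"
print(f"r = {r:.2f} ({strength})")
```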
Types of reliability
- Test/retest
- Alternate forms
- Split-half
- Interrater reliability
Test/retest reliability
One of the most often used and obvious ways of establishing reliability is to repeat the same test on a second occasion. The correlation between the scores on the two administrations needs to be high for the test to be considered reliable.
Practice effects
Some people get better at the second testing, and this practice lowers the observed correlation
Alternate form reliability
Using alternate forms of the testing instrument and correlating individuals' performance on the two different forms.
Split-half reliability
Here you split the items of a single test into two halves (e.g. the odd-numbered versus the even-numbered items) and correlate participants' scores on the two halves.
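A minimal sketch of the split-half idea, assuming a made-up matrix of item scores (rows = participants, columns = items): the items are split into odd and even halves and the two half-test totals are correlated.

```python
import numpy as np

# Made-up item scores: 6 participants (rows) answering 8 items (columns).
items = np.array([
    [3, 4, 3, 5, 4, 3, 4, 5],
    [1, 2, 2, 1, 2, 1, 1, 2],
    [4, 5, 4, 4, 5, 5, 4, 4],
    [2, 2, 3, 2, 1, 2, 3, 2],
    [5, 4, 5, 5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3, 2, 3, 3],
])

# Total score on the odd-numbered items and on the even-numbered items.
odd_half = items[:, ::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# Split-half reliability: correlation between the two half-test scores.
split_half_r = np.corrcoef(odd_half, even_half)[0, 1]
print(round(split_half_r, 2))
```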
Interrater reliability
Here you test how consistent the assessments of two or more raters or judges are.
Con: you need to establish the reliability between the raters themselves.
Conceptual formula interrater reliability
Interrater reliability = (Number of agreements / Number of possible agreements) x 100
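A small sketch of the agreement formula above, assuming two hypothetical raters who each coded the same ten observations.

```python
# Hypothetical codes given by two raters to the same ten observations.
rater_a = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

# Interrater reliability = (number of agreements / number of possible agreements) x 100
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
interrater_reliability = agreements / len(rater_a) * 100
print(f"{interrater_reliability:.0f}% agreement")  # 8 of 10 codes agree -> 80%
```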
Cronbach’s alpha
A measure of internal consistency, a kind of average correlation between the items.
(I.e. checking whether a participant's answers to the different items of the test are consistent with one another.)
Rules-of-thumb Cronbach’s alpha
- > .80: reliability = good.
- .60 - .80: reliability = sufficient.
- Less than .60 = insufficient
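A minimal sketch of computing Cronbach's alpha for a made-up item-score matrix and reading it against the rule of thumb above. The computational formula used here, k / (k - 1) x (1 - sum of item variances / variance of the total score), is the standard one but is not spelled out on the flashcard itself.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (participants x items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                              # number of items
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Made-up responses: 5 participants (rows) on 4 items (columns).
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
]

alpha = cronbach_alpha(scores)
label = "good" if alpha > 0.80 else "sufficient" if alpha >= 0.60 else "insufficient"
print(f"alpha = {alpha:.2f} ({label})")
```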
Validity
Validity refers to whether a measuring instrument measures what it claims to measure. The extent to which the observations reflect what we want to measure, i.e. the extent to which the observation reflects the concept or construct under investigation.
Differences validity and reliability
- Reliability refers to observations (scores)
- Validity refers to conclusions based on observations.
- Reliability concerns random measurement error.
- Validity issues have to do with systematic error. E.g. our policemen are only 90 meters apart instead of 100, so the observation does not reflect what we want to measure.
E.g. John scores higher on the IQ-test than Peter. Reliability: are we sure their true scores are different? Validity: Is John more intelligent than Peter?
Statistically significant
What is important for validity coefficients is that they are statistically significant at the .05 or .01 level (i.e. the p-value falls below that threshold).
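A small sketch, assuming SciPy is available and using made-up data, of checking whether a validity coefficient is statistically significant at the .05 level.

```python
from scipy import stats

# Made-up scores: a new test (x) and an established criterion measure (y) for the same people.
x = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
y = [30, 34, 25, 41, 38, 27, 31, 40, 24, 35]

# Pearson correlation and its p-value.
r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.3f}, significant at .05: {p < .05}")
```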
7 types of validity
- Content validity
- Face validity
- Criterion validity
  - Concurrent validity
  - Predictive validity
- Construct validity
- Statistical Conclusion validity
- Internal validity
- External validity
  - Population validity
  - Ecological validity
Content validity
Looks at the content of tests. Does it cover a representative sample of the domain you are researching?
Face validity
Face validity is whether or not a test looks valid on its surface (not the content!). Does the operationalization appear to be valid on its surface?
Criterion validity
Criterion validity measures how accurately an instrument predicts the behavior or ability in question. There are two types of criterion validity.
• Concurrent validity is used to estimate present performance. Is the test for bipolar disorder good at distinguishing people with and without depression?
• Predictive validity is used to estimate future performance. Is the personality test a good predictor for study success?
Construct validity
Assesses the extent to which a measuring instrument accurately measures a theoretical construct or trait that it is designed to measure.
Some examples of theoretical constructs or traits are verbal fluency, neuroticism, depression, anxiety, intelligence, and scholastic aptitude. Can the conclusions that were made actually be drawn from the research that was done?
(Statistical) Conclusion validity
Do the observations allow for the conclusion that variables are related?
Internal validity
Does the operationalization allow for the conclusion that variables are causally related?
External validity
Extent of generalizability of the conclusions
• Population validity: does the sample allow for conclusions about the target population?
• Ecological validity: does the procedure followed in the study allow for conclusions about more natural circumstances?
Cons of test/retest-reliability
Con:
- Practice effects
- Individuals may remember how they answered previously, both correctly and incorrectly. In this case we may be testing their memories and not the reliability of the testing instrument
Cons alternate form reliability
Con:
- Difficult to make the two forms parallel: same number of items, same difficulty, etc.
- Practice effects (not as much as test/retest)
Con split-half reliability
Con:
- Assesses the reliability of the test itself, but not its stability over time (the test is not administered twice in its entirety).
- Difficult to divide the items equally