Week 10 Reliability & Validity of Measures Flashcards
Still part of PCA & Factor Analysis, but separated out for ease of learning
What are the types of reliability?
*Test-retest reliability
*Internal consistency reliability (use Coefficient alpha or Cronbach’s alpha)
- Split-half reliability
*Inter-rater reliability (use Cohen’s Kappa)
What types of validity are there?
- Face Validity
- Criterion Validity
- Construct Validity
- Content Validity
- External versus Internal Validity
- Convergent versus Discriminant Validity
How do we measure Test-Retest Reliability?
Test-retest measures temporal reliability.
*This is obtained by checking the consistency of measurement on at least 2 occasions; a correlation of at least .70 between occasions is sought.
Tell me more about Test-Retest Reliability
- “Stability” is measured using Test-Retest Reliability.
- Assessed by having an instrument (measure) completed by the same people during two different time periods.
- Strong positive correlations (r’s) between the repeated test times are sought.
- Difficulties may arise because of familiarity with the test items when completed on subsequent occasions – best to do this at least two weeks apart. Alternatively, “alternate form reliability” may be used, correlating the scores from the two equivalent forms of the test (see the sketch below).
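A minimal sketch in Python of how a test-retest correlation could be checked, assuming hypothetical scores from the same eight people on two occasions; the data are illustrative only, and the .70 benchmark is the one given on the card above.

```python
# Test-retest reliability: correlate the same people's scores at two time points.
from scipy.stats import pearsonr

time1 = [12, 15, 11, 18, 14, 16, 13, 17]  # hypothetical scores at time 1
time2 = [13, 14, 12, 17, 15, 16, 12, 18]  # hypothetical scores two weeks later

r, p = pearsonr(time1, time2)
print(f"Test-retest reliability: r = {r:.2f} (p = {p:.3f})")
# A value of r >= .70 is conventionally taken as acceptable stability.
```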
How do we measure Internal consistency reliability?
Internal consistency refers to the extent to which items in a measuring instrument or scale are all measuring the same thing.
- It requires only one administration.
- Internal consistency is usually reported with Cronbach’s Alpha.
Tell me more about Internal Consistency Reliability
- How well items “hang together”. Also called reliability of components, it is the degree of relatedness of individual items on a test.
- Split-half reliability – the Spearman-Brown formula is frequently used as a measure of split-half reliability. A related index is the average inter-correlation of all items, the mean of the item-to-item Pearson (r) correlations (a worked sketch follows this list).
- Coefficient alpha or Cronbach’s alpha corresponds to the mean of all the split-half coefficients resulting from different splittings of a test
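A minimal sketch, assuming a hypothetical 4-item scale answered by six people; the data and variable names are invented for illustration, not taken from the flashcards.

```python
import numpy as np
from scipy.stats import pearsonr

items = np.array([  # rows = respondents, columns = scale items (hypothetical)
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
])

# Split-half reliability: correlate odd- and even-item half scores,
# then step up with the Spearman-Brown formula r_sb = 2r / (1 + r).
odd_half = items[:, ::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(odd_half, even_half)
r_sb = 2 * r_half / (1 + r_half)

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total scores).
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)

print(f"Split-half (Spearman-Brown): {r_sb:.2f}")
print(f"Cronbach's alpha: {alpha:.2f}")
```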
How do we test inter-rater reliability?
Probable exam Question
- Cohen’s Kappa tests Inter-Rater Reliability
- Undertaken using a cross-tabulation of scores, from which the value of Cohen’s Kappa is produced to evaluate agreement between the 2 raters.
- The output identifies where the two raters agree and gives the percentage of agreement on each scale rated.
- Cohen’s Kappa is evaluated: “ideally .70 and above is required but in any clinical context it should be .8 or above” (Hills, 2008, p.288). Significance levels are also examined to see whether differences emerge between the raters (see the sketch below).
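A minimal sketch of Cohen’s Kappa for two raters, assuming hypothetical categorical ratings; the ratings are invented, and the calculation mirrors the kappa value reported from a crosstab of the two raters’ scores.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of the same 8 cases by two raters.
rater1 = ["yes", "yes", "no", "yes", "no", "no",  "yes", "no"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]

kappa = cohen_kappa_score(rater1, rater2)
print(f"Cohen's Kappa: {kappa:.2f}")
# Ideally .70 and above; in clinical contexts .80 and above (Hills, 2008).
```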
What is the relationship between Reliability and External Validity?
- External validity is the generalizability of causal relationships across people, settings, manipulations (not just time).
- e.g. is learning and cognition for college students the same for the general population? or is a Biomedical experiment on male volunteers transferable to women or non-volunteers?
- Standard treatment of the phenomenon – does the same effect hold for similar treatments?
What is Face Validity?
Face Validity asks whether the test, on the surface (its face), measures what it purports to measure.
What is Content Validity?
Content Validity evaluates whether the test adequately samples the relevant material it purports to measure – that is, does the test represent the kind of material it is measuring (for example, the range of abilities an IQ test should cover)?
What is Criterion Validity?
Criterion Validity is based on the degree to which the test is correlated to the expected outcome criteria – statistical aspects of the test.
*Researchers choose the most sensitive and relevant present criterion (Concurrent Validity) or future criterion (Predictive Validity) on which the test is based, and correlate the performance of the test or questionnaire with that criterion (see the sketch below).
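A minimal sketch of criterion validity, assuming hypothetical data: test scores are correlated with a criterion measured now (concurrent) and a criterion measured later (predictive). The variable names and values are illustrative only.

```python
from scipy.stats import pearsonr

test_scores     = [55, 62, 48, 70, 66, 59, 73, 51]
criterion_now   = [50, 60, 45, 68, 64, 55, 75, 49]  # e.g. a current performance rating
criterion_later = [52, 65, 47, 72, 60, 58, 78, 50]  # e.g. an outcome measured months later

r_concurrent, _ = pearsonr(test_scores, criterion_now)
r_predictive, _ = pearsonr(test_scores, criterion_later)
print(f"Concurrent validity: r = {r_concurrent:.2f}")
print(f"Predictive validity: r = {r_predictive:.2f}")
```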
What is Construct Validity?
The degree to which the test assesses what it purports to measure. This can be seen as either Convergent Validity or Discriminant Validity.
What is Convergent Construct Validity?
Convergent Construct Validity
- Convergence of related tests or behaviour
- Scale on attitudes toward Presidents related to scales on attitudes toward other politicians, like judges and governors
What is Discriminant Construct Validity?
Discriminant Construct Validity
- Distinctiveness of unrelated tests or behaviour
- Scale on attitudes toward Presidents unrelated to scales on attitudes toward older men (even though all previous Presidents have been older men)
What are the 4 types of validity in experimental design?
- External Validity (“Generalizability”)
- Construct Validity (Scale measures, what it says it measures)
- Statistical-Conclusion Validity (Accuracy of drawing certain statistical conclusions)
- Internal Validity (Observed effects can be attributed to the manipulation rather than to confounds)