Week 11 and 12: Reliability and Validity Flashcards
Define reliability
Consistency in measurement
List 3 ways that consistency of scores can be checked when re-examining the same people
- with the same test on different occasions
- with a different set of items measuring the same thing
- under different conditions of testing
What is standard error of measurement?
An estimate of the amount of error usually attached to an examinee’s obtained score
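As a rough illustration (all numbers below are made up), the SEM is commonly computed from a test's standard deviation and its reliability coefficient:

```python
import math

# Hypothetical values: test standard deviation and reliability coefficient
sd = 15.0           # e.g. an IQ-style scale with SD = 15
reliability = 0.90  # e.g. a Cronbach's alpha or test-retest r of .90

# SEM = SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # ~4.74
```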
What is a confidence interval?
A range of values within which we can be confident, at a chosen level (e.g. 95%), that the true value lies, such as the population mean or an examinee's true score
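Continuing the sketch above, a confidence interval around a hypothetical obtained score can be built from the SEM:

```python
import math

obtained = 110              # hypothetical obtained score for one examinee
sd, reliability = 15.0, 0.90
sem = sd * math.sqrt(1 - reliability)

z = 1.96                    # 95% confidence level
lower, upper = obtained - z * sem, obtained + z * sem
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # ~100.7 to 119.3
```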
What are some sources of random error?
- test construction
- test administration
- test scoring and interpretation
List the ways of testing reliability
- Cronbach's alpha
- test-retest
- split-half
- item-total correlations
How big should a reliability coefficient be?
Above .8, preferably .9
What does Cronbach's alpha measure?
The internal consistency of a test, based on the correlations between test items; it is equivalent to the average of all possible split-half correlations
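A minimal sketch of the usual variance-based formula for alpha, using made-up item responses:

```python
import numpy as np

# Made-up data: rows = respondents, columns = items on a 5-item scale
scores = np.array([
    [3, 4, 3, 5, 4],
    [2, 2, 3, 2, 3],
    [4, 5, 4, 4, 5],
    [1, 2, 1, 2, 2],
    [3, 3, 4, 3, 3],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```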
What is split-half reliability?
Taking half the items and seeing how they correlate with the other half
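A rough sketch of a split-half estimate, with the Spearman-Brown correction commonly applied to estimate full-length reliability (made-up data):

```python
import numpy as np

# Made-up data: rows = respondents, columns = items on a 6-item scale
scores = np.array([
    [3, 4, 3, 5, 4, 4],
    [2, 2, 3, 2, 3, 2],
    [4, 5, 4, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 4],
])

odd_half = scores[:, ::2].sum(axis=1)    # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
# Spearman-Brown correction estimates reliability of the full-length test
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```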
What are item-total correlations
Correlating each item with the total score on the rest of the scale (the scale with that item excluded)
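A small illustrative sketch of corrected item-total correlations (each item against the sum of the remaining items), with made-up data:

```python
import numpy as np

# Made-up data: rows = respondents, columns = items
scores = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

for i in range(scores.shape[1]):
    item = scores[:, i]
    rest = scores.sum(axis=1) - item  # total of all other items
    r = np.corrcoef(item, rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")
```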
What is test-retest reliability
- correlation between two testing intervals
- stability over time
- uses Pearson’s r
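A minimal sketch of a test-retest calculation with hypothetical time 1 and time 2 scores for the same people:

```python
import numpy as np

# Hypothetical scores for the same six people at time 1 and time 2
time1 = np.array([12, 18, 25, 9, 20, 15])
time2 = np.array([14, 17, 24, 11, 19, 16])

r = np.corrcoef(time1, time2)[0, 1]  # Pearson's r = test-retest reliability
print(round(r, 3))
```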
What are some problems with test-retest reliability
- affected by factors associated with how the test is administered on each occasion
- carryover effects: examinees may remember their answers, and practice can improve scores
- should only be used for constructs expected to be stable over time
Internal consistency
The correlations between different items on the same test, or with the entire test
Kuder-Richardson reliability and coefficient alpha
- based on the intercorrelations among all comparable parts of the test
Kuder-Richardson formula 20 (KR-20)
- calculated from the proportion of people who pass and fail each item and the variance of the total test scores
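A rough sketch of the KR-20 calculation with made-up dichotomous (pass/fail) responses:

```python
import numpy as np

# Made-up pass/fail (1/0) responses: rows = examinees, columns = items
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

k = responses.shape[1]
p = responses.mean(axis=0)  # proportion passing each item
q = 1 - p                   # proportion failing each item
total_var = responses.sum(axis=1).var(ddof=1)

# KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 3))
```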
Inter-rater reliability
- agreement between multiple raters
- measured using a kappa statistic
Kappa statistic
Measures inter-rater agreement for qualitative (categorical) items
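A minimal sketch of Cohen's kappa for two hypothetical raters coding the same ten cases:

```python
import numpy as np

# Hypothetical categorical codes from two raters for the same 10 cases
rater1 = np.array(["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"])
rater2 = np.array(["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "no", "yes"])

observed_agreement = np.mean(rater1 == rater2)

# Expected chance agreement, from each rater's marginal proportions
categories = np.unique(np.concatenate([rater1, rater2]))
expected_agreement = sum(
    np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories
)

# kappa = (observed - expected) / (1 - expected)
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(round(kappa, 3))  # 0.6 for these made-up codes
```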
Parallel-forms reliability
Equivalent forms of the same test are administered to the same group
Types of reliability
- inter-rater
- test-retest
- split-half
- parallel forms
Validity
The extent to which a test measures what it is supposed to measure
What are the three types of validity
- content
- criterion related
- construct
Content validity
Degree to which the content (items) represents the behaviour/characteristics associated with that trait
What are the two types of criterion validity
Predictive and concurrent
What is criterion validity
The relationship between test scores and some type of criterion or outcome, such as ratings, classifications or other test scores
Concurrent validity
Refers to whether the test scores are related to some CURRENTLY AVAILABLE criterion measure
Predictive validity
The correlation between a test and criterion obtained at a FUTURE time e.g. ATAR scores predicting success at uni
Validity coefficient
Correlation between test scores and some criterion
What are the two types of construct validity?
Convergent and discriminant
Construct validity
The extent to which a test measures a psychological construct or trait
Convergent validity
Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.
Discriminant validity
Discriminant validity shows that two measures that are not supposed to be related are in fact, unrelated.
List the types of reliability
- test-retest
- internal consistency
- interrater
In test-retest reliability, what are some sources that might affect a result?
- time
- place
- mood
- temperature
- noise
What are some core issues with content validity?
- the appropriateness of the questions and domain relevance
- comprehensiveness
- level of mastery assessed
What are some procedures to ensure content validity?
- specialist panels to map content domain
- accurate test specifications
- communication of validation procedures in test manual
What are some applications of content validity?
- achievement and occupational tests
- usually not appropriate for personality or aptitude tests
What is standard error?
The standard deviation of the sampling distribution of a statistic, such as the sample mean (estimated as the sample standard deviation divided by √n)
Do we want a small or large SEM?
Small, because a larger SEM reflects lower reliability and produces wider confidence intervals
Which confidence level is most common?
z = 1.96 (95%)
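A small sketch of a 95% confidence interval for a sample mean, using the standard error (sample SD divided by √n) and z = 1.96, with made-up scores:

```python
import numpy as np

# Hypothetical sample of scores
sample = np.array([22, 25, 19, 30, 27, 24, 21, 26, 23, 28])

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

z = 1.96  # 95% confidence level
print(f"95% CI: {mean - z * se:.2f} to {mean + z * se:.2f}")
```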
Why are confidence intervals better than p-values
- the p < .05 cut-off is an arbitrary threshold
- p-values are heavily influenced by sample size (very large samples can make trivial effects significant)
- p-values can miss small effects that recur consistently, whereas confidence intervals show the size and precision of an effect
When is a confidence interval result significant
When the confidence interval does not include 0 (for a difference or correlation)
Why are effect sizes beneficial?
They address statistically significant effects that don't mean much in real life, e.g. does someone scoring .5 higher on a depression scale really have a worse time?
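As one common effect size, Cohen's d can be sketched with hypothetical group scores:

```python
import numpy as np

# Hypothetical depression scores for two groups
group_a = np.array([14, 16, 15, 18, 17, 15])
group_b = np.array([13, 15, 14, 16, 15, 14])

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(
    ((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2)
)
d = (group_a.mean() - group_b.mean()) / pooled_sd
print(round(d, 2))
```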
How can test scoring and interpretation be a source of random error (reliability)?
Because responses to projective tests vary widely and require judgement to score, there is a large role for inter-rater disagreement, e.g. the TAT and the Rorschach
What is the domain sampling model?
Test items represent a sample of all possible items
What is the reliability ratio
The ratio of true-score variance to observed-score variance; in the domain sampling model, the true score is the score on the "long test" made up of all possible items
How many items should you have for optimal reliability
10
List some examples of concurrent validity
- depression scale and clinical interview
- 2 measures at a similar time
- IQ and exam scores
To be concurrently valid, what kind of assessment should the measure be correlated with?
The gold standard
What kind of test do you use for predictive validity
Multivariate ANOVA
What test do you use for convergent validity
Factor analysis
The lower the reliability, the…
Higher the error in the test
The larger the standard error of measurement, the…
Less precise the measurements and the wider the confidence intervals