Week 3 - Reliability and Validity Flashcards
Classical Test Theory
Test scores are the result of
- Factors that contribute to consistency
- Factors that contribute to inconsistency (characteristics of the test taker, and factors unrelated to the attribute being measured, such as the situation or environment)
X = T + e
where X = obtained score, T = true score, and e = error of measurement
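A minimal simulation of the classical model (all numbers hypothetical): observed scores are generated as true score plus random error, and because the two are uncorrelated, the observed-score variance decomposes into true-score variance plus error variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: simulate X = T + e for 10,000 test takers.
T = rng.normal(loc=100, scale=15, size=10_000)  # true scores
e = rng.normal(loc=0, scale=5, size=10_000)     # random measurement error
X = T + e                                       # obtained scores

# var(X) is approximately var(T) + var(e), since T and e are uncorrelated.
print(X.var(), T.var() + e.var())

# Reliability is the ratio of true-score to observed-score variance,
# here approximately 15^2 / (15^2 + 5^2) = 0.9.
print(T.var() / X.var())
```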
Sources of Error
- Item selection
- Test administration
- Test scoring
- Systematic measurement error
Domain-sampling model
a way of thinking that sees the test as a representative sample of a large domain of possible items that could be included on the test
- considers the problem of using only a sample of items to represent a construct
- as the test gets longer, it should represent the construct better, increasing reliability (see the sketch below)
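One standard way to quantify this is the Spearman-Brown prophecy formula, which projects the reliability of a test lengthened by a factor of n. A sketch with hypothetical numbers; it assumes the added items are comparable to the existing ones:

```python
def spearman_brown(r: float, n: float) -> float:
    """Projected reliability of a test lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical: a test with reliability .60 is doubled in length.
print(spearman_brown(0.60, 2))  # 0.75 - the longer test is more reliable
```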
Inter-rater reliability
the extent to which different raters agree in their assessments
Method variance
the variability among scores that arises from the form of the test (the method of administration) as distinct from its content
Reliability
the consistency of a test: the extent to which it gives the same result each time it is used to measure the same thing
Stability over time
the extent to which test scores remain stable when a test is administered on more than one occasion
Internal consistency
the extent to which the items of a psychological test are homogeneous (measure the same thing) or heterogeneous
Social desirability bias
a form of method variance that arises when people respond to questions in ways that place them in a favourable light
Test-Retest Stability
The same test is administered to the same group on two different occasions
- scores may not be identical due to practice effects, maturation, treatment effects, or changes in setting
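In practice, test-retest stability is usually reported as the Pearson correlation between the two administrations; a minimal sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores for the same eight people on two occasions.
time1 = np.array([12, 15, 19, 22, 25, 28, 30, 33])
time2 = np.array([14, 14, 20, 21, 27, 27, 31, 35])

# Test-retest reliability = correlation between the two occasions.
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 2))
```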
Parallel or alternate forms of reliability
Two forms of the same test are developed, with different items selected according to the same rules
Parallel - same distribution of scores (mean and variance equal)
Alternate - different distribution of scores (mean and variance may not be equal)
Both matched for content and difficulty
Split half method
Test is divided into halves that are compared
- useful in overcoming the logistical difficulties of test-retest reliability (only one administration is needed)
- the correlation between halves underestimates full-test reliability, so it is stepped up with the Spearman-Brown formula (see the sketch below)
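A minimal sketch with hypothetical item responses: correlate odd- and even-item half scores, then apply the Spearman-Brown step-up.

```python
import numpy as np

# Hypothetical responses: rows = test takers, columns = 6 items.
items = np.array([
    [4, 3, 4, 5, 3, 4],
    [2, 2, 3, 2, 1, 2],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 2, 2, 1],
])

# Split into odd- and even-numbered items and correlate the half scores.
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# Each half is only half the test's length, so the half-test correlation
# understates full-test reliability; Spearman-Brown corrects for this.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```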
Measuring Internal consistency
Cronbach’s Alpha
Cronbach’s alpha - a generalised reliability coefficient for scoring systems that are graded (e.g. agree-disagree scales)
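A minimal sketch of computing alpha from an item-response matrix (hypothetical data), using the standard formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; rows = test takers, columns = items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point agree-disagree responses (rows = people).
data = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 1, 1],
])
print(round(cronbach_alpha(data), 2))
```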
Acceptable levels of reliability
- .70-.80 acceptable or good
- greater than .91 may indicate redundancy
Standard error of measurement (SEM)
allows estimation of the precision of an individual test score
- the larger the SEM, the less certain we are that the test score represents the true score
Reliability coefficient (r) - an index of the ratio of true-score variance to total observed-score variance in a test
SEM = SD √(1 − r), where SD is the standard deviation of the test scores
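A minimal sketch with hypothetical numbers, including the usual ~95% confidence band of ±1.96 SEM around an obtained score:

```python
import math

sd = 15    # hypothetical standard deviation of the test's scores
r = 0.91   # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - r)  # standard error of measurement = 4.5
print(round(sem, 1))

# ~95% confidence interval around an obtained score of 110.
x = 110
print(round(x - 1.96 * sem, 1), round(x + 1.96 * sem, 1))
```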
Validity
the extent to which a test measures the construct it is intended to measure
- inferences from test must be appropriate, meaningful, and useful
Face validity
does the test look like it measures the relevant construct
Content validity
the extent to which items on a test represent the universe of behaviour the test was designed to measure
- established by logical deduction rather than statistical analysis
Construct underrepresentation
failure to capture important components of a construct
Construct-irrelevant variance
measuring things other than the construct of interest
Criterion related validity
the extent to which a measure is related to an outcome
Good criteria are reliable and appropriate
The relationship between test and criterion is usually expressed as a correlation (a validity coefficient)
Predictive evidence
criterion related validity
how well the test predicts performance on a criterion measured at a later time
Concurrent evidence
criterion related validity
refers to a comparison between the measure in question and an outcome assessed at the same time
Incremental validity
the extent to which knowledge of a score adds to that obtained by a pre-existing test score or psychological characteristic
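A minimal sketch of checking incremental validity (all data hypothetical): fit the criterion on the pre-existing predictor alone, then add the new test score and see how much R² rises.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data: an existing predictor, a new test, and a criterion.
existing = rng.normal(size=n)
new_test = rng.normal(size=n)
criterion = 0.5 * existing + 0.3 * new_test + rng.normal(size=n)

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(existing.reshape(-1, 1), criterion)
r2_full = r_squared(np.column_stack([existing, new_test]), criterion)
print(round(r2_full - r2_base, 3))  # gain in R^2 = incremental validity
```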
Construct validity
concerned with establishing how well a test measures a psychological construct
Convergent evidence (construct validity)
refers to the degree to which measures of constructs that should be related are in fact related
- identify relationships we would expect if the test is actually measuring the construct
Discriminant (divergent) evidence
construct validity
aims to demonstrate that the test is unique
- low correlations should be observed with constructs that are unrelated to what the test is trying to measure
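A minimal sketch of both kinds of evidence (all variable names and data are hypothetical): a new anxiety measure should correlate highly with an established anxiety measure (convergent) and weakly with an unrelated construct (discriminant).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

anxiety_true = rng.normal(size=n)
new_anxiety_test = anxiety_true + 0.5 * rng.normal(size=n)
established_anxiety = anxiety_true + 0.5 * rng.normal(size=n)
vocabulary = rng.normal(size=n)  # construct unrelated to anxiety

# Convergent evidence: high correlation with a related measure (~ .8).
print(np.corrcoef(new_anxiety_test, established_anxiety)[0, 1])
# Discriminant evidence: low correlation with an unrelated measure (~ 0).
print(np.corrcoef(new_anxiety_test, vocabulary)[0, 1])
```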
Factor analysis (construct validity)
- a statistical technique used to observe patterns of correlation among test items
- items may cluster together, which is attributed to the action of latent (unobserved) variables, or factors
- Exploratory factor analysis
- Confirmatory factor analysis
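A minimal exploratory sketch (hypothetical data; assumes scikit-learn is installed): six items are generated from two latent factors, and a two-factor model should recover the clustering in its loadings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 500

# Hypothetical: two latent factors each drive three observed items.
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

def noise():
    return 0.5 * rng.normal(size=n)

X = np.column_stack([f1 + noise(), f1 + noise(), f1 + noise(),
                     f2 + noise(), f2 + noise(), f2 + noise()])

# Exploratory factor analysis: the loadings should show two item clusters.
fa = FactorAnalysis(n_components=2).fit(X)
print(np.round(fa.components_, 2))
```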
Decision-theoretic approach to predictive validity
includes
- cutting point
- valid positive and negative decisions
- false positive and negative decisions
- base rate
- selection ratio
Cutting point
the test score, or point on the scale, used to split those being assessed into two groups: those predicted to show the behaviour of interest and those predicted not to show it
Valid positive and negative decisions
Positive - where the person is predicted to show the behaviour and shows it
Negative - where the person is not predicted to show the behaviour and does not show it
False positive and negative decisions
Positive - the prediction is that the person has the characteristic but does not
Negative - the prediction is that the person does not have the characteristic but does
Base rate
the proportion of individuals in the population who show the behaviour of interest
Selection ratio
the proportion of those assessed who are allocated to the category predicted to show the behaviour (i.e. who fall at or above the cutting point)
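A minimal sketch (hypothetical scores and outcomes) computing all of the decision-theoretic quantities above for a given cutting point:

```python
import numpy as np

# Hypothetical: test scores and whether each person actually showed
# the behaviour of interest (1 = showed it, 0 = did not).
scores = np.array([55, 62, 48, 71, 66, 59, 80, 45, 68, 52])
showed = np.array([ 0,  1,  0,  1,  1,  0,  1,  0,  0,  0])

cutting_point = 60                # scores at/above predict the behaviour
predicted = scores >= cutting_point

valid_pos = np.sum(predicted & (showed == 1))   # predicted and shown
valid_neg = np.sum(~predicted & (showed == 0))  # not predicted, not shown
false_pos = np.sum(predicted & (showed == 0))   # predicted, not shown
false_neg = np.sum(~predicted & (showed == 1))  # not predicted, but shown

base_rate = showed.mean()           # proportion who show the behaviour
selection_ratio = predicted.mean()  # proportion predicted to show it

print(valid_pos, valid_neg, false_pos, false_neg)
print(base_rate, selection_ratio)
```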