Lecture 4.2 Reliability Flashcards
Reliability
• The consistency with which a test measures what it purports to measure in any given set of circumstances
True
True or False: A reliable test will result in the same score every time it is used to measure the same thing under the same conditions
Reliability coefficient
An index of reliability that indicates the ratio between the true score variance on a test and the total variance (SD²)
> .90
Reliability coefficient of _______ is excellent for research purposes and appropriate for individual assessment
> .80
Reliability coefficient of _______ is good for research purposes, marginal for individual assessment
Reliability coefficient
- higher scores = higher reliability
- > .6 is marginal for research purposes
- > .70 is adequate for research purposes
Classical Test Theory
assumes that each person has an innate true score. It can be summed up with an equation:
X = T + E
where the observed score (X) equals the true score (T) plus error (E)
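Under this model, the reliability coefficient is the ratio of true-score variance to total (true + error) variance. A minimal sketch, using hypothetical variance values chosen only to illustrate the cut-offs above:

```python
# Classical Test Theory sketch: reliability as the ratio of
# true-score variance to total (observed) variance.
# The variance values below are hypothetical, for illustration only.

def reliability(true_variance, error_variance):
    """r_xx = true variance / (true variance + error variance)."""
    return true_variance / (true_variance + error_variance)

# Mostly true variance -> high reliability; mostly error -> low reliability.
print(reliability(90, 10))  # 0.9 -> excellent
print(reliability(60, 40))  # 0.6 -> marginal
```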
more reliable
higher proportion of true variance
less reliable
higher proportion of error variance
increase or decrease
error variance may ______ or ______ a test score by varying amounts, leading to lower reliability
Systematic and unsystematic error
Two types of testing error
Systematic error
Testing error that doesn’t affect reliability. Consistent and predictable (when you are aware of it) – e.g., a slowly leaking tyre
Unsystematic error
Testing error that affects reliability. Inconsistent and unpredictable – e.g., an intermittent electrical problem
Test construction
Sources of Error Variance T_______ C_______
The content covered by test items, the way questions are asked, and the response format all add to the error variance of a test
Test administration
Sources of Error Variance T_______ A_______
• Test environment (including test materials), test-taker variables (e.g., alertness, wellbeing, mistakes) & administrator-related variables (e.g., presence or absence, demeanour, departure from procedure, unconscious cues, etc.)
Test scoring & interpretation
Sources of Error Variance T_______ s_______ & i_______
Human error - data entry, transcription, coding, calculation, timing, etc.
Level of objectivity/subjectivity
Human fallibility
Sources of Error Variance h_______ f_______
• Forgetting or misremembering
• Failing to notice or not being aware
• Not understanding or following instructions
• Under- and over-reporting
• Differences of opinion
• Lying or misleading
Time and practice effects
Sources of Error Variance t_______ and p_______ e_______
Domain Sampling Model
This model assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items. Error that occurs in the development of a test.
Domain Sampling Model
• Seeks to determine how precisely the test score assesses the domain from which the test draws a sample
True score
The score you would get if you answered every item that could conceivably be drawn from the domain.
Standard Error of Measurement (SEM)
• Measures the precision of an observed score & provides an estimate of the amount of error inherent in an observed score or measurement
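The SEM is computed from the test's standard deviation and its reliability coefficient: SEM = SD × √(1 − r). A minimal sketch, with hypothetical values (SD = 15, r = 0.91) loosely echoing an IQ-style scale:

```python
import math

# Standard Error of Measurement sketch: SEM = SD * sqrt(1 - r_xx),
# where SD is the test's standard deviation and r_xx its reliability.
# The values below are hypothetical, for illustration only.

def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

print(round(sem(15, 0.91), 2))  # 4.5
```

A higher reliability coefficient shrinks the SEM, so observed scores cluster more tightly around the true score.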
Standard Error of Difference (SED)
Can be used to compare:
• an individual’s scores on two different tests
• two different people’s scores on the same test
• two different people’s scores on two different tests
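The SED is commonly obtained by combining the SEMs of the two scores being compared: SED = √(SEM₁² + SEM₂²). A minimal sketch with hypothetical SEM values:

```python
import math

# Standard Error of Difference sketch: combines the SEMs of the two
# scores being compared. The SEM values below are hypothetical.

def sed(sem_1, sem_2):
    """SED = sqrt(SEM_1^2 + SEM_2^2)."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

print(sed(3.0, 4.0))  # 5.0
```

A score difference is only interpreted as meaningful when it clearly exceeds the SED.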
Test-Retest Reliability
- Calculated by correlating scores from the same people on two different administrations of the same test
- Used for measuring characteristics that are thought to be stable (e.g. personality traits or intelligence)
amount of time between administrations
Any interventions, treatment or trauma, taking place between test administrations;
Test-retest reliability will be affected by
Parallel & Alternate Forms Reliability
Different versions of a test, matched for content and difficulty
Split-Half Reliability
Scores from one half of a test are correlated with the other half of the test, using equivalent halves
• Random, odds & evens, content & difficulty
Inter-Rater Reliability
The degree of agreement between two or more scorers. Improved by appropriate training.
Test-retest
correlate scores from 2 administrations of the same test
Parallel forms
correlate scores from 2 versions of the same test
Split-half
correlate scores from 2 equivalent halves of the same test
Internal consistency
correlate items within the same test
Inter-rater
correlate scores from 2 scorers for one test taker
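The internal-consistency entry above is most often operationalised as Cronbach's alpha, α = (k / (k − 1)) × (1 − Σ item variances / variance of totals). A minimal sketch with hypothetical item data:

```python
from statistics import pvariance

# Cronbach's alpha sketch for internal consistency:
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals).
# Item data are hypothetical (rows = people, columns = items).
items = [
    [3, 4, 3, 5],
    [4, 4, 5, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

k = len(items[0])                                    # number of items
item_vars = [pvariance(col) for col in zip(*items)]  # variance of each item
total_var = pvariance([sum(row) for row in items])   # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

When the items all tap the same construct, item responses covary, the total-score variance dominates, and alpha approaches 1.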
reliability coefficients
Indicates the ratio between the true score variance on a test and the total variance
Range from 0 to 1: closer to 1, the higher the reliability
Homogenous
A __________ test is unifactorial, so consists of items measuring a single trait or factor
Heterogenous
A __________ test is multifactorial, so measures more than one trait or factor
static
a characteristic, trait, or ability that is presumed to be relatively unchanging
dynamic
a characteristic, state, or ability that is presumed to be ever changing as a function of situational and cognitive experiences
Restricted range or variance
sampling procedure used to gather the test scores does not result in a full spread of scores (e.g., having only university students complete an IQ test)
Inflated range or variance
when the sample includes people who are outside of the range of the test so the scoring range is inflated (e.g., adults completing a test designed for children)
speed test
all items of equal difficulty, and time limited so that no-one is likely to be able to answer all items
power test
time limit is long enough for all items to be attempted, but some items are so difficult that no-one is likely to get them all right
Criterion-Referenced
Designed to provide an indication of where a test taker stands with respect to some criterion (e.g., pass/fail tests)
Validity
The extent to which evidence supports the meaning and use of a psychological test (or other assessment device)
The validity coefficient
A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure
Validity
How well a test or measurement tool measures what it purports to measure in a particular context
Classic (trinitarian) Model
focuses on three categories of validity
Content Validity
Type of validity - scrutinizing the test’s content
Criterion-related validity
Type of validity - relating scores obtained on the test to other test scores or other measures
Construct validity
Type of validity - ‘umbrella validity’; comprehensive analysis of how test scores relate to scores on other tests/measures & how test scores relate to the construct that the test was designed to measure
Unitary Model of validity
_____________ view takes everything into account, from implications of test scores in terms of societal values to the consequences of use
Test validation
- The process of gathering and evaluating validity evidence.
- Test developer is responsible for supplying validity information in the test manual and/or through a ‘test validation’ journal article
Content Validity
• Describes a judgement of how adequately a test samples behaviour representative of the universe of behaviour that the test was designed to sample
Face Validity
Type of content validity
A judgement concerning how relevant the test items appear to be to the test-taker
Quantifying content validity
Important in employment settings, where tests are used to hire & promote
• Tests must be shown to include relevant items in terms of job skills required for the position
• Lawshe (1975):
• Is the skill or knowledge measured by this item: 1) Essential; 2) Useful but not essential; 3) Not necessary to the performance of the job?
Culture
C____________ has an impact on judgements concerning the validity of tests and test items
Criterion-Related Validity
C__________-r__________ v__________
A judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest – the measure of interest being the criterion
criterion
A _____________ is the standard against which a test or test score is evaluated -can be almost anything:
- RELEVANT
- VALID
- UNCONTAMINATED
A criterion should be:
- R___________ – pertinent or applicable to the matter at hand
- V___________ for the purpose for which it is being used
- U____________ – not based on predictor measures
Predictive Validity
P___________ v___________ is the degree to which a test score predicts a criterion measure at a future time
Concurrent Validity
C___________ v_________ is the degree to which a test score is related to a criterion measure that is obtained at (about) the same time
Incremental Validity
I___________ V__________
The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
False negatives
test takers predicted not to show the characteristic, but who do
False positives
test takers predicted to show the characteristic, but who don’t
Miss rate
M_____ r_______ - the proportion of people incorrectly classified
Hit rate
H_____ r_______ - the proportion of people correctly identified
Base rate
B_____ r_______ - the extent to which a particular trait, behaviour, characteristic or attribute exists in the population
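These rates can be computed by comparing predicted against actual classifications. A minimal sketch with hypothetical data (True = shows the characteristic):

```python
# Hit/miss-rate sketch: compare predicted vs. actual classifications.
# The data below are hypothetical, for illustration only.
predicted = [True, True, False, False, True, False, True, False]
actual    = [True, False, False, True, True, False, True, False]

pairs = list(zip(predicted, actual))
false_positives = sum(1 for p, a in pairs if p and not a)  # predicted yes, actually no
false_negatives = sum(1 for p, a in pairs if not p and a)  # predicted no, actually yes
hits = sum(1 for p, a in pairs if p == a)                  # correct classifications

hit_rate = hits / len(pairs)
miss_rate = 1 - hit_rate
base_rate = sum(actual) / len(actual)  # prevalence of the characteristic

print(hit_rate, miss_rate, base_rate)  # 0.75 0.25 0.5
```

Note that when the base rate is very low or very high, even a test with a good hit rate can produce many false positives or false negatives.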
Construct validity
C_________ v___________
A judgement about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct
Homogeneity of items; changes with age; pre-test to post-test changes; group differences; convergent evidence; divergent evidence; factor analysis
Evidence of construct validity: H__________ of items; changes with a____; pre-test to p________ changes; g_______ differences; c__________ evidence; d__________ evidence; f________ analysis
Evidence of homogeneity
E__________ of h___________ - How uniform the test is in measuring a single concept
Evidence of changes with age
Some constructs are expected to change with age, particularly during childhood/adolescence
Evidence of pre-test/post-test changes
Evidence that scores change as the result of some experience between a pre-test and a post-test can be evidence of construct validity
Evidence from distinct groups
Demonstrating that scores on the test vary in a predictable way as a function of membership in some group
Convergent evidence
When test scores on a new test are found to correlate highly in the predicted direction with scores on an older, more established and validated test designed to measure the same construct
Discriminant evidence
Shown when test scores are found to have little or no relationship with test scores or variables with which, theoretically, there should be no relationship
Factor Analysis
Can be used to determine both convergent and discriminant evidence of construct validity
Confirmatory Factor Analysis
A factor structure is explicitly hypothesised and is tested for its fit with the observed covariance structure of the measured variables
Exploratory Factor Analysis
Estimating or extracting factors, deciding how many factors to retain, rotating factors to an interpretable orientation