Standardized Assessment Tools: Reliability and Validity Flashcards
What is a standardized assessment tool?
A tool used for the purpose of assessment that has been developed following a careful standardization procedure
- Specific instructions for how to administer it, how to apply it, and how to interpret the results
What is reliability?
Related to the consistency or repeatability of your measures; degree to which observed scores are free of measurement error
What is validity?
Related to the soundness and relevance of the interpretation of the measures
- the appropriateness, meaningfulness and usefulness of the specific inferences made from the test scores
- do the scores actually inform us about what we want to know
What is the classical true score theory of measurement?
Observed score (measurement) = true score + error, i.e., X = T + e
Every measurement has error in it
You cannot directly measure either the true score or the error
The smaller the error, the more certain you can be of the true score
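A minimal Python sketch of the X = T + e model (all numbers are hypothetical): the true score is fixed at an assumed value so you can watch random error spread the observed scores around it, and see the mean of repeated observations approach T.

```python
# Minimal sketch of X = T + e with a made-up true score.
import numpy as np

rng = np.random.default_rng(42)

T = 75.0                                           # the (normally unknowable) true score
errors = rng.normal(loc=0.0, scale=5.0, size=10)   # random measurement error
X = T + errors                                     # observed scores

print("Observed scores:", np.round(X, 1))
print("Mean observed score:", round(X.mean(), 2))  # approaches T as n grows
```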
What are assumptions about the classical true score theory of measurement?
The true score is real, and the observed score by itself cannot be trusted (you must consider the error present)
If the error is small, a large proportion of the observed score is accounted for by the true score
Need to estimate both true score and error
What is the true score?
Considered the average of all the observed scores that you collect
What is error?
Synonymous with variability or variance
Computed as the difference between the observed and the true score
What is variance?
Variance = Σ(x − x̄)² / (n − 1), where x̄ is the mean of the observed scores and n is the sample size
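A quick sketch of this formula in plain Python, using made-up scores:

```python
# Sample variance from the definition above: Σ(x − x̄)² / (n − 1).
# The scores are illustrative values only.
scores = [82, 75, 91, 68, 79]

n = len(scores)
mean = sum(scores) / n
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

print(f"mean = {mean:.2f}, variance = {variance:.2f}")  # mean = 79.00, variance = 72.50
```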
What is random error?
Caused by any factor that randomly affects measurement of a variable across a sample.
- adds variability to the data but does not affect the average score
- Example: the amount of sleep each student got before a midterm
What is systematic error?
Caused by factors that systematically affect measurement of the variable across the sample
- does not change the variability of the data but does affect the average score
- Example: a party the night before the exam that everyone attended
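A small simulation (hypothetical parameters) contrasting the two error types: random error inflates the variance but leaves the mean roughly unchanged, while systematic error shifts the mean but leaves the variance alone.

```python
# Sketch contrasting random vs. systematic error with simulated data.
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(70, 8, size=1000)                # assumed true scores

random_err = true_scores + rng.normal(0, 5, size=1000)    # e.g. varied sleep
systematic = true_scores - 4                              # e.g. everyone partied

for label, x in [("true", true_scores),
                 ("random error", random_err),
                 ("systematic error", systematic)]:
    print(f"{label:>16}: mean={x.mean():6.2f}  var={x.var(ddof=1):7.2f}")
```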
What is test-retest reliability?
The same test is administered to the same sample (or a very similar one) on two different occasions
Investigating the consistency of the repeated measures over time
Assumption: there is no substantial change in the construct being measured between the two occasions
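Test-retest reliability is commonly quantified as a correlation between the two administrations; a minimal sketch with illustrative scores:

```python
# Test-retest reliability as a Pearson correlation between two
# administrations of the same test (scores are made up).
import numpy as np

test   = np.array([12, 18, 25, 9, 30, 22, 15])
retest = np.array([14, 17, 27, 10, 28, 21, 16])

r = np.corrcoef(test, retest)[0, 1]
print(f"test-retest reliability r = {r:.2f}")  # closer to 1 = more stable over time
```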
What is inter-rater reliability?
Interested in the consistency of measurements taken by different people (raters)
Interested in the consistency of the interpretation of the measurement done by different people
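A minimal sketch with made-up ratings, using simple percent agreement and a correlation as stand-ins; published studies more often report Cohen's kappa or an intraclass correlation.

```python
# Inter-rater reliability sketch: two raters score the same ten
# performances on a 1-5 scale (hypothetical ratings).
import numpy as np

rater_a = np.array([3, 4, 2, 5, 4, 3, 1, 4, 5, 2])
rater_b = np.array([3, 4, 3, 5, 4, 2, 1, 4, 4, 2])

agreement = np.mean(rater_a == rater_b)       # proportion of exact matches
r = np.corrcoef(rater_a, rater_b)[0, 1]       # linear agreement between raters

print(f"exact agreement = {agreement:.0%}, correlation r = {r:.2f}")
```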
What is intra-rater reliability?
Interested in the consistency of measurements taken by one person at different times
How consistent you are as a rater
What is alternate-forms reliability?
Also called parallel-forms reliability
Allows you to determine if the scores from two different versions of the same test are equivalent
A strong relationship between the scores from the two versions increases the reliability of the scores from each instrument
What is internal consistency reliability?
Reliability is assessed by estimating how well the items that represent the same construct yield similar results
Consistency of results for different items representing the same construct
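One common estimate of internal consistency is Cronbach's alpha; a minimal sketch with fabricated item responses, using the standard formula alpha = k/(k−1) · (1 − Σ item variances / variance of total score):

```python
# Internal consistency via Cronbach's alpha (made-up item data).
import numpy as np

# rows = respondents, columns = items meant to tap the same construct
items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```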
What are common problems with reporting reliability?
Reliability is based on previous studies (not actually measured in the study you are reading)
Time between test and retest sometimes not mentioned
Reliability often based on a small sample
What are the basic elements of a theory?
Construct - things that exist but are not directly measurable or observable
Hypothesis - specific ways in which variation in one construct is assumed to be related to variation in another
Operational definitions - specifications of how each construct is actually measured
What is construct validity?
The degree to which the measures actually reflect or represent the construct they were designed to measure
-How things actually work corresponds with how you think they work
What is content validity?
The degree to which a measurement is judged to reflect the meaningful elements of a construct and not to reflect any extraneous elements
All about determining if you have a good definition of your construct and if your operational measures align well with your definition of the construct
What are the requirements for content validity?
A thorough description of your construct
Ability to distinguish your construct from others
The items included in the measurement should be representative of many different elements of your construct
What is criterion validity?
You check the performance of your operationalization/measure against some criteria
You make predictions about relationships between measures based on your theory of the construct
The degree to which your measures behave the way they should, given the theory of the construct
What is predictive validity?
The extent to which one measure can correctly predict another measure that, according to your theory, it should be able to predict
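A minimal sketch with hypothetical admissions-test and GPA data: predictive validity is often reported as the correlation between the predictor and the later criterion, optionally with a regression line for making predictions.

```python
# Predictive validity sketch: does a test score (predictor) predict a
# later outcome such as first-year GPA (criterion)? All data fabricated.
import numpy as np

test_score = np.array([55, 62, 70, 48, 80, 66, 74])
later_gpa  = np.array([2.6, 2.9, 3.3, 2.4, 3.7, 3.0, 3.4])

r = np.corrcoef(test_score, later_gpa)[0, 1]
slope, intercept = np.polyfit(test_score, later_gpa, 1)  # simple regression line

print(f"predictive validity r = {r:.2f}")
print(f"predicted GPA for a score of 60: {slope * 60 + intercept:.2f}")
```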
What is convergent validity?
The degree to which two tests measuring the same construct are highly correlated
What is discriminant validity?
The degree to which two tests measuring different constructs are, as expected, uncorrelated
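A sketch of both ideas at once, with simulated data: two tests of the same construct should correlate highly (convergent), while a test of an unrelated construct should correlate near zero (discriminant).

```python
# Convergent vs. discriminant validity with simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(7)
anxiety = rng.normal(50, 10, size=200)                # latent construct

anxiety_test_a = anxiety + rng.normal(0, 3, size=200)  # two tests of the
anxiety_test_b = anxiety + rng.normal(0, 3, size=200)  # same construct
height_cm      = rng.normal(170, 9, size=200)          # unrelated construct

r_conv = np.corrcoef(anxiety_test_a, anxiety_test_b)[0, 1]
r_disc = np.corrcoef(anxiety_test_a, height_cm)[0, 1]

print(f"convergent r = {r_conv:.2f} (high), discriminant r = {r_disc:.2f} (near 0)")
```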
What is face validity?
The degree to which the instrument appears to or looks like it measures the construct
Main function: increases acceptability of a test
The presence of face validity does not mean that other forms of validity are present (and vice versa)