Validity and Reliability Flashcards
What are the three aspects of reliability?
stability, internal consistency and interrater agreement
What is stability?
Test-retest reliability: the extent to which the same results are obtained on repeated applications
What is internal consistency?
the extent to which all of the items of the measure address the same underlying concept. Items that propose to measure the same general construct should produce similar scores.
What is interrater agreement?
The extent to which the results are in agreement when different individuals administer the same instrument to the same individuals/groups
What are the two types of rater reliability?
- intra-rater: indicates how consistently a rater administers and scores a measure
- inter-rater: how well more than one rater agree in the way they administer and score the measure
What is validity?
The degree to which an instrument measures what it intends to measure
What are the types of validity?
face, content, consensual, criterion, construct and predictive
What is face validity?
The relevance of the measurement - do the questions yield information relevant to the topic being investigated? The perceived relevance to the test taker.
What may indicate poor face validity?
many ‘don’t know’ answers in a questionnaire
What is content validity?
The extent to which a measure represents all facets of a given phenomenon, i.e. covers the whole concept
What is consensual validity?
a number of experts agree that the measure is valid
What is criterion validity?
The extent to which the test agrees with a gold-standard test known to represent the phenomenon accurately; it comprises concurrent and predictive validity.
What is predictive validity?
The extent to which the measurement can predict what may occur in the future
What is construct validity?
The extent to which an assessment measures a theoretical construct
What are the types of construct validity?
- convergent: scale should be related to variables and other measures of the same construct
- discriminative: demonstrates that the measure discriminates between groups or individuals
- factorial: items go together to create factors (correlation between test and major factors)
- discriminant: new test should not correlate with dissimilar, unrelated constructs
What is external validity?
The tool’s generalisability to other settings
What research evidence is needed for construct validity?
- form a hypothesis about the relationship between variables
- select test items or behaviors that represent the construct
- collect data to test the hypothesis
- determine whether the data support the hypothesis
What is a Rasch analysis used for?
determines whether test items are unidimensional, i.e. measure a single underlying construct
What are the two types of criterion validity?
- concurrent - with established measure
- predictive - with future outcome
What research evidence is needed for predictive validity?
- identify the criterion behavior and a population sample
- administer the test and retain the scores until criterion data become available
- obtain a measure of performance on each criterion
- determine the strength of the relationship between test scores and criterion performance
When is a measure reliable?
If it is stable over time, across different examiners and across different forms of the measure
What are the different types of reliability?
- test-retest (stability)
- internal consistency (homogeneity)
- inter-rater (agreement)
- intra-rater (agreement)
- parallel form (agreement)
What are correlation statistics?
descriptive measures that show the direction (positive or negative) and degree (how strong) of the relationship between two variables
What does an r value between 0 and +1 mean?
positive relationship
What does an r value between -1 and 0 mean?
negative relationship
Outline the strength of a relationship when the value lies between 0.0 - 0.25
little or no relationship
Outline the strength of a relationship when the value lies between 0.25 - 0.50
fair degree of relationship
Outline the strength of a relationship when the value lies between 0.5 - 0.75
moderate to good relationship
Outline the strength of a relationship when the value >0.75
good to excellent relationship
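As a small illustrative sketch (not part of the original cards), the direction and strength bands above can be applied to a computed Pearson r; the score values below are invented.

```python
# Illustrative sketch: compute Pearson's r for two sets of scores and
# interpret it using the strength bands on the cards above. Data are made up.
from scipy.stats import pearsonr

rater_a = [12, 15, 18, 22, 25, 30, 31, 35]
rater_b = [14, 14, 19, 21, 27, 29, 33, 36]

r, p_value = pearsonr(rater_a, rater_b)

magnitude = abs(r)
if magnitude <= 0.25:
    strength = "little or no relationship"
elif magnitude <= 0.50:
    strength = "fair degree of relationship"
elif magnitude <= 0.75:
    strength = "moderate to good relationship"
else:
    strength = "good to excellent relationship"

direction = "positive" if r >= 0 else "negative"
print(f"r = {r:.2f} ({direction}, {strength}), p = {p_value:.3f}")
```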
What statistical test should be used for categorical data?
- kappa statistic
- weighted kappa
- Spearman's rho correlation
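A minimal sketch of computing kappa and weighted kappa in Python, assuming scikit-learn is available; the category ratings are invented for illustration.

```python
# Illustrative sketch: Cohen's kappa and weighted kappa for two raters'
# categorical ratings. Ratings here are made up.
from sklearn.metrics import cohen_kappa_score

rater_1 = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
rater_2 = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0]

kappa = cohen_kappa_score(rater_1, rater_2)                           # unweighted
weighted = cohen_kappa_score(rater_1, rater_2, weights="quadratic")   # weighted (ordinal categories)

print(f"kappa = {kappa:.2f}, weighted kappa = {weighted:.2f}")
```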
What statistical test should be used for continuous data?
- intraclass correlation (ICC)
- Pearson’s correlation
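A minimal sketch of one common intraclass correlation, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed directly from a subjects-by-raters score matrix; the scores are invented, and in practice a statistics package would usually be used.

```python
# Illustrative sketch: ICC(2,1) from an n-subjects x k-raters score matrix.
# Data values are made up.
import numpy as np

scores = np.array([  # rows = subjects, columns = raters
    [9, 10],
    [6,  7],
    [8,  8],
    [7,  6],
    [10, 9],
    [6,  5],
])
n, k = scores.shape
grand = scores.mean()
ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-subjects
ms_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-raters
ss_error = ((scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0, keepdims=True) + grand) ** 2).sum()
ms_error = ss_error / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)
print(f"ICC(2,1) = {icc_2_1:.2f}")
```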
How do you calculate the reliability parameter?
variability between study subjects divided by (variability between raters + variability between study subjects + measurement error)
When does the reliability parameter approach 1?
if measurement error is small compared to the variability between persons
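In symbols, the two cards above amount to the following; the illustrative numbers are invented to show the "approaches 1" case.

```latex
\text{reliability} = \frac{\sigma^{2}_{\text{subjects}}}{\sigma^{2}_{\text{subjects}} + \sigma^{2}_{\text{raters}} + \sigma^{2}_{\text{error}}}
\qquad \text{e.g. } \frac{100}{100 + 5 + 5} \approx 0.91
```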
What are agreement statistics?
statistics describing the degree to which scores taken on one occasion differ from scores taken on another occasion
What is the formula of absolute agreement?
the percentage of occasions on which score 1 equals score 2 (perfect agreement = 100%)
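Spelled out, restating the card above:

```latex
\%\,\text{agreement} = \frac{\text{number of occasions on which score 1} = \text{score 2}}{\text{total number of occasions}} \times 100
```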
What are limits of agreement?
assessment of the variability between scores
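One common way of calculating them is the Bland-Altman form (an assumption, since the card does not name a method): the mean difference between paired scores plus or minus 1.96 standard deviations of those differences.

```latex
\text{limits of agreement} = \bar{d} \pm 1.96 \times SD_{d}
```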
When are agreement statistics used?
when examining inter-rater and intra-rater reliability
What is the standard error of measurement?
estimated amount of error in a measurement
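A standard classical-test-theory formula (not stated on the card itself) estimates it from the standard deviation of observed scores and a reliability coefficient such as the test-retest ICC:

```latex
SEM = SD \times \sqrt{1 - r_{xx}}
```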
What is internal consistency?
degree to which all test items measure the same construct
What research is needed for internal consistency?
- a range of items in the test is administered to a sample
- correlation between items is assessed
- correlation between items and overall test score (item-total correlation)
What are acceptable levels of internal consistency?
- Cronbach's alpha between 0.7 and 0.9
- Rasch analysis showing unidimensionality
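A minimal sketch of Cronbach's alpha computed from a respondents-by-items score matrix using the standard variance-based formula; the item scores are invented.

```python
# Illustrative sketch: Cronbach's alpha from a respondents x items matrix.
# Data values are made up.
import numpy as np

items = np.array([  # rows = respondents, columns = test items
    [4, 5, 4, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
])
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)        # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)    # variance of total scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # acceptable range per the card: 0.7-0.9
```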
What is the importance of good test-retest reliability?
if the score changes when the person's ability has not, we cannot be confident the measure will detect real change when it occurs
What research evidence is needed for test-retest reliability?
- tests are administered on two or more occasions
- the interval between administrations is not too short (subjects may remember the test) and not too long (ability may have changed)
- same clients and raters are used each time
What are acceptable levels of test-retest reliability?
ICC >0.7
Kappa or weighted kappa >0.7
Why is inter-rater reliability important?
important when clients must change services or when more than one therapist sees the client
What research is needed for inter-rater reliability?
- studies of between 50 and 100 clients and more than 5 raters
- raters administer and score the same performances of the test independently of one another
What is an acceptable inter-rater reliability?
ICC >0.7
kappa or weighted kappa >0.7
What research is needed to determine intra-rater reliability?
- 50-100 clients and >1 rater
- the time between tests is usually brief, or, where possible, the same performance is rescored
What affects intra-rater reliability?
experience of the rater using the test
What are acceptable levels of intra-rater reliability?
ICC >0.7
Kappa or weighted kappa >0.7
What is parallel-form reliability?
correlation between scores for the same person on two or more forms of the test
What are acceptable levels of parallel forms?
ICC >0.8