lecture 5: standardized assessment and psychometrics (1) Flashcards
what is an outcome?
the end result of clinical activity/intervention
what is an outcome measure?
- instrument shown to measure desirable traits accurately
- any measurement system used to uncover or identify the health outcome of treatment
- the process by which change is measured over two or more points in time
why is it important to measure?
- help determine status at the start of intervention
- help determine if someone is actually improving during and at the end of intervention
- improves clinical decision-making, care and client outcomes
- component of EBP
- aid w/ objectivity -> concrete evidence
what is the instrument evaluation process (IEP)?
- used to guide process of appraising outcome collection
- if the answer to any question is no = need to select another instrument
- full IEP flowchart IN SLIDES**
what is step 1 of the IEP?
- is the assessment clinically useful?
- determine usefulness and usability for specific setting/purpose
what factors do you need to consider in step 1 of the IEP?
- clinical applicability
- specificity
- availability
- time/training demands
- acceptability to clients
- cost
what is step 2 of the IEP?
- is it standardized?
what are standardized assessments?
- questions/methods/conditions for the administration, scoring, and interpretation of the results are consistent
- allows for trustworthy comparison of scores (time to time, client to similar group)
- any deviation from standardized procedures may = invalid conclusions about test performance (ex. modification of test instructions)
** want to stay as close to the standardized instructions as possible - straying = losing standardization
what are the components of a standardized assessment?
- assessment manual
- instructions for administration
- standardized equipment/questions
- data on test construction, reliability, validity
- normative data
what is step 3 of the IEP?
- what is the instrument's purpose?
what are three possible purposes to standardized assessments?
- descriptive - describe the status of the person/group of interest
- predictive - predict the client's future status
- evaluative - evaluate the change in status of a client over time
- within each measure, need to look at construction, reliability, and validity
what are descriptive measures?
- describe 1+ aspects of a person’s status at one moment in time
- ex. occupational strengths/limitations, characteristics, behaviours
- often used to classify an individual via comparison with norms
- information collected can be used to identify problems and to evaluate the need for intervention
what are predictive measures?
- show/foretell 1+ aspects of the client’s future status
- used to make predictions about the potential of the client (ex. safety at home)
- often have norms
- used to screen individuals to determine eligibility for intervention/potential to benefit from a program
what are evaluative measures?
- evaluate change in status of a client over time
- used at more than one point in time (beginning and end)
- must be sensitive to change (i.e. responsive)
- ex. COPM
what is test construction?
- instrument development process
- item inclusion/exclusion - does it include the questions you’d expect to see?
- scaling/weighting - how does scoring work? how is it totaled?
descriptive construction = descriptive items
predictive construction = predictive items
evaluative construction = responsive items
what is a norm-referenced measure?
- measures of the average or typical performance form the basis of how scores are interpreted
- norm = reference point for test score
- norm-referenced interpretation = comparing examinee’s test score to scores obtained by others in normative sample
- IMPORTANT to consider…
> characteristics of the sample in which the norms were developed
> how they were obtained
> how much that group is representative of the population the measure is intended for
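norm-referenced interpretation usually amounts to standardizing the client's raw score against the normative sample's mean and SD; a minimal Python sketch (the normative values and client score here are made up for illustration):

```python
from scipy.stats import norm

# hypothetical normative data for the client's age group (illustrative values)
norm_mean, norm_sd = 50.0, 10.0
client_score = 35.0

# z-score: how many SDs the client sits from the normative mean
z = (client_score - norm_mean) / norm_sd

# percentile rank within the normative sample (assumes ~normal score distribution)
percentile = norm.cdf(z) * 100
print(f"z = {z:.2f}, percentile = {percentile:.1f}")  # z = -1.50, percentile = 6.7
```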
what is reliability?
- consistency
- trustworthiness of a measure and its results
- reliable measure yields dependable and consistent measurement of what you are trying to measure
- degree to which the measure yields results that are free from measurement error (i.e. works to decrease measurement error)
what is a measurement error?
- difference between the true value of a phenomenon and its measured value
- caused by factors that…
> are irrelevant to what is being measured by the test
> have an unpredictable effect on the test score
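- in classical test theory this is written as X = T + E (observed score = true score + error); a reliable measure is one where E contributes little to the variance in X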
what are sources of measurement error?
- test construction
- test administration
- test scoring
- test interpretation
what are test construction errors?
- test questions worded unclearly
- test administration instructions unclear
- scoring procedure not clear
what are test administration errors?
- test environment issues
- test-taker motivation and attention
- examiner-related variables
what are test scoring/interpretation errors?
- hand vs. computer scoring
- level of training
- subjectivity
what are ways to minimize measurement error?
- choose assessment with strong psychometric properties
- pilot-test assessments and instruments
- follow standardized instructions
- train interviewers or observers
- make observation/measurement as unobtrusive as possible
- keep test environment/equipment consistent with standardization
- double-check data
how is reliability usually scored?
- reported as a value from 0 to 1
> 1 = perfect reliability (no error)
> closer to 1 = better reliability and less measurement error
> ex. 0.1 = poor reliability, 0.9 = high reliability
- statistics used to measure it:
> Pearson product-moment correlation coefficient (r)
> intra-class correlation coefficient (ICC)
> Spearman rank-order correlation (rho)
> Kappa statistic (k)
> Cronbach’s alpha
what are the ways to establish reliability?
- test-retest
- inter-rater
- internal consistency
what is test-retest reliability?
- stability of the measure over time
- determined by calculating the agreement of scores at two different times for a characteristic that has NOT changed
> don't use with clients whose status is often variable
- ICC > .70 = acceptable test-retest reliability (under = poor)
- time interval can vary depending on what is being measured
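a minimal sketch of computing a test-retest ICC, assuming the third-party pingouin package and made-up scores for five clients at two time points:

```python
import pandas as pd
import pingouin as pg  # third-party stats package (pip install pingouin)

# made-up scores: 5 clients measured at two time points, status unchanged
df = pd.DataFrame({
    "client": [1, 2, 3, 4, 5] * 2,
    "time":   ["t1"] * 5 + ["t2"] * 5,
    "score":  [12, 18, 25, 9, 30,    # time 1
               13, 17, 26, 10, 29],  # time 2
})

icc = pg.intraclass_corr(data=df, targets="client", raters="time", ratings="score")
# ICC2 (two-way random effects, absolute agreement) is a common choice
# for test-retest; compare against the > .70 cut-off from the card above
print(icc.set_index("Type").loc["ICC2", "ICC"])
```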
what is inter-rater reliability?
- degree to which scores by different raters yield the same results
- applies to assessments where test administrator assesses result
- determined by having several raters measure the same phenomena
- acceptable inter-rater = > .70
- descriptive, evaluative and predictive assessments should have high inter-rater reliability
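for categorical ratings, the kappa statistic from the list above is a common inter-rater choice because it corrects raw agreement for chance; a minimal sketch with made-up ratings from two hypothetical raters:

```python
from sklearn.metrics import cohen_kappa_score

# toy ratings: two hypothetical raters scoring the same 10 clients
# on a 3-category item (0 = dependent, 1 = assisted, 2 = independent)
rater_a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
rater_b = [0, 1, 2, 1, 0, 2, 0, 1, 0, 1]

# kappa corrects the raw agreement rate for agreement expected by chance
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # ~0.70 here, right at the acceptability cut-off
```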
what is internal consistency?
- degree of the relatedness among the items of an instrument
- used to determine if items on test are consistent with one another
- an estimate of the homogeneity of the structure of the test
- high internal consistency = items closely related
- measured with Cronbach’s alpha
- acceptable = 0.8 - 0.9
- if too high (0.97) = item redundancy
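Cronbach's alpha has a simple closed form, so it can be computed directly; a sketch with made-up item scores:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)"""
    x = np.asarray(item_scores, dtype=float)  # rows = respondents, cols = items
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)         # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)     # variance of summed total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# made-up data: 4 respondents answering a 3-item scale
scores = [[4, 5, 4],
          [2, 3, 2],
          [5, 5, 4],
          [1, 2, 2]]
print(round(cronbach_alpha(scores), 2))  # 0.97 - per the card, likely item redundancy
```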
what are the most important reliabilities for each instrument purpose?
descriptive:
- internal consistency
- observer
predictive:
- test-retest
- observer
evaluative:
- test-retest
- observer
what is validity?
- accuracy
- extent to which assessment measures what it is intended to measure (ex. fatigue, balance, OP)
what are the types of validity?
- face
- content
- criterion (concurrent, predictive)
- construct (convergent, divergent, discriminative)
- responsiveness
what is face validity?
- assumption of validity based on a measure’s appearance
- subjective judgement - does the test appear to measure what it says?
- least rigorous form of validity - ONLY used as preliminary screening
- if the minimum requirement of face validity cannot be established, it's unlikely that it'll hold up against other validity measures
what is content validity?
- degree to which the instrument items are a comprehensive reflection of what the instrument purports to be measuring
- does measure include ALL elements of a given concept
- established based on theoretical frameworks, expert opinions, or literature review
what is criterion validity?
- extent to which scores of assessments relate to gold standard/valid external criterion
- assessed by correlating the scores of a sample of individuals on the predictor with the scores on the criterion
- test = predictor, gold standard = criterion
what are the two types of criterion validity?
- concurrent: criterion data collected at the same time as data on the predictor test
> ex. TB test (skin test = predictor, chest x-ray = criterion)
- predictive: criterion data collected after the predictor test was administered
> ex. scores on the MCAT (predictor) predict performance in medical school (criterion)
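criterion validity is typically quantified by correlating predictor scores with criterion scores; a minimal sketch with made-up scores for eight clients:

```python
from scipy.stats import pearsonr

# made-up scores: a hypothetical new short test (predictor) and the
# gold-standard assessment (criterion) given to the same 8 clients
predictor = [22, 35, 28, 40, 18, 31, 25, 37]
criterion = [48, 61, 55, 70, 40, 58, 50, 66]

r, p = pearsonr(predictor, criterion)
print(f"r = {r:.2f} (p = {p:.3f})")  # high r = the test tracks the gold standard
```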
what is construct validity?
- degree to which scores of an instrument are consistent with a hypothesis about how they should perform
- based on testing a measure against an idea based on theory
- involves…
1. developing theoretical hypotheses relevant to the construct being assessed
2. investigating whether these hypotheses are upheld when the assessment is used
- most difficult of all validities to establish
what are the types of construct validity?
- convergent: degree to which scores are consistent with a hypothesis that the instrument WILL CORRELATE with another measure
- divergent: degree to which the scores of an instrument are consistent with a hypothesis that the instrument WILL NOT CORRELATE with another measure
- discriminative: degree to which the scores of an instrument are consistent with a hypothesis concerning DIFFERENCES BETWEEN GROUPS
what is responsiveness validity?
- ability of an instrument to detect change over time in what it purports to be measuring
- aka. sensitivity to change
- evaluative assessments must have evidence of responsiveness
- done by taking a group that does change and seeing if the measure picks up that change
- results expressed as effect size/standardized response mean
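the standardized response mean (SRM) mentioned above is simply the mean change score divided by the SD of the change scores; a sketch with made-up before/after scores:

```python
import numpy as np

# made-up scores for 6 clients before and after intervention
before = np.array([10, 14, 9, 12, 11, 13], dtype=float)
after  = np.array([15, 18, 12, 17, 14, 19], dtype=float)

change = after - before
# standardized response mean: mean change / SD of the change scores
srm = change.mean() / change.std(ddof=1)
print(f"SRM = {srm:.2f}")  # larger SRM = measure is more responsive to change
```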
what are two characteristics of responsiveness validity?
- minimal detectable change (MDC): what amount of change, taking error into account, means that a change has actually occurred
> change by 1 point = probably error
> change by 3 points = change has actually occurred
- minimal clinically important difference (MCID): what a patient would notice to be a meaningful change
> ex. grip strength increased by 3, MCID = 2 (can assume a meaningful change occurred)
> ex. grip strength increased by 3, MCID = 5 (yes, grip strength improved BUT not enough to affect QOL)
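one common way the MDC is derived (a standard formulation, not necessarily the lecture's) uses the standard error of measurement (SEM), computed from a baseline SD and the test-retest reliability; illustrative numbers:

```python
import math

# illustrative values for some measure (not from the lecture)
sd_baseline = 8.0      # SD of scores in a reference sample
test_retest_r = 0.90   # test-retest reliability (e.g. an ICC)

# standard error of measurement
sem = sd_baseline * math.sqrt(1 - test_retest_r)

# MDC at 95% confidence: observed change must exceed this to be
# interpreted as real change rather than measurement error
mdc95 = 1.96 * math.sqrt(2) * sem
print(f"SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")  # SEM = 2.53, MDC95 = 7.01
```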
what are the most important validities for each instrument purpose?
descriptive:
- content
- construct
predictive:
- content
- criterion
evaluative:
- content
- construct
what is cross-cultural validity?
- degree to which the performance of items on a translated or culturally adapted measure is a reflection of the performance of items on the original version
what is ecological validity?
- degree to which a measure reflects real life
- ex. ability to memorize random words vs an address/name
what are tips for evaluating a measure?
- obtain a copy of the measure
> not always easy - takes detective work
- refer to books that evaluate the measure
- read literature (especially the first publication of a measure)
- check wide range of literature
- follow a template for evaluation and do your own evaluation
how do you find measures to use?
- search engine
- textbooks
- library database
- measurement cupboard