Chapter 5: Measurement Flashcards
How do you evaluate the accuracy of your measurement tool?
Reliability and construct validity
Reliability
the degree to which a measure (of behavior/personality/intelligence/psychological construct) is consistent, providing a stable form of measurement
Construct validity
the degree to which your measure actually measures what you want it to measure
True score theory
If you are simply measuring a variable, the true score is the person's actual score on that variable. If you are conducting an experiment, the true score is the person's score as affected by the condition they are in.
Errors
sources of variability in your measure caused by things other than your IV (if there is an IV)
Types of errors
random and systematic
True score
an individual’s actual level of the variable being measured, not the score they get on the measure of that variable
Measurement error
any contributor to the measure’s score that is not based on the actual level of the variable of interest (i.e. not the true score); responsible for the degree to which a measure’s score deviates from the true score
Random error
has no pattern; it is unavoidable, unpredictable, and cannot be replicated by repeating the experiment, e.g. misreading or misunderstanding questions, time of day
Systematic error
has a pattern, produces consistent errors, and affects a participant’s scores in all conditions e.g. response biases, individual differences, incorrectly calibrated measuring instruments
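The distinction between the two error types can be made concrete with a small simulation (all numbers below are invented for illustration, not from the chapter): systematic error shifts every observation the same way, while random error scatters observations unpredictably and averages out over repeated measurements.

```python
import random

random.seed(0)  # make the random error reproducible

TRUE_SCORE = 50        # the participant's actual level of the variable (assumed)
SYSTEMATIC_BIAS = 3    # e.g. a miscalibrated instrument adds 3 points every time

# 1000 repeated measurements of the same person:
# observed score = true score + systematic error + random error
observed = [TRUE_SCORE + SYSTEMATIC_BIAS + random.gauss(0, 2)
            for _ in range(1000)]

mean_obs = sum(observed) / len(observed)
# Averaging many measurements cancels the random error (it has no pattern)
# but not the systematic error: the mean sits near 53, not near the true 50.
print(round(mean_obs, 1))
```

This is why more measurement alone cannot fix systematic error: it biases every score in the same direction, so it survives averaging.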
Why is low reliability a problem?
Differences between conditions can be misleadingly inflated or deflated by unreliable measurement
Types of reliability for a measure using the correlation coefficient
Test-retest reliability, internal consistency reliability, inter-rater reliability
Test-retest reliability
degree of reliability assessed by administering the same measure on two different occasions, then calculating the correlation between the two sets of scores obtained
Alternate forms reliability (solution for practice effects in test-retest)
two different forms of the same test are administered on two separate occasions
Challenges of test-retest reliability
practice effects and demand characteristics
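Since test-retest reliability is simply the correlation between two administrations of the same measure, it can be sketched in a few lines. The scores below are made-up numbers for five hypothetical participants:

```python
# Hypothetical scores on the same scale, administered twice (invented data).
time1 = [12, 18, 9, 22, 15]
time2 = [13, 17, 10, 21, 16]

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)
# A correlation near 1 indicates high test-retest reliability.
print(round(r, 3))
```

In practice a library routine (e.g. scipy.stats.pearsonr) would be used, but the hand computation makes clear that test-retest reliability is nothing more than this correlation.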
Internal consistency reliability
form of reliability assessing the degree to which items in a scale are consistent in measuring the same construct or variable
Cronbach’s alpha
indicator of internal consistency reliability based on the average correlation of each item in a measure with every other item (the inter-item correlations); a higher alpha indicates greater reliability (maximum of 1)
Challenge of internal consistency reliability
makes the assumption that items actually measure the same construct
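Cronbach's alpha is usually computed from item and total-score variances: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch with fabricated responses (4 items, 5 respondents):

```python
# Hypothetical 4-item scale answered by 5 respondents (invented data).
# Rows = respondents, columns = items.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(data[0])
    item_vars = [variance([row[i] for row in data]) for i in range(k)]
    total_var = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

When items covary strongly (as in this made-up data), the total-score variance is large relative to the item variances and alpha approaches 1; uncorrelated items drive alpha toward 0.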
Inter-rater reliability
an indicator of reliability that examines the degree to which two or more raters agree on an observation (score), i.e. make the same or similar judgments for a set of stimuli
Intraclass correlation coefficient (ICC)
higher ICC = greater inter-rater reliability (max of 1)
Challenge of inter-rater reliability
raters must be trained and must remain independent of one another, which can be expensive
Uses of inter-rater reliability
behavioral coding, personality measures, thematic/content coding
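One common version of the ICC is the one-way random-effects ICC(1), computed from between-target and within-target mean squares. This sketch uses invented ratings: two raters each score four targets.

```python
# Hypothetical ratings: 4 targets (rows), each scored by 2 raters (invented data).
ratings = [[9, 8], [6, 5], [8, 9], [4, 5]]

def icc1(data):
    """One-way random-effects ICC(1) = (MSB - MSW) / (MSB + (k-1)*MSW)."""
    n = len(data)      # number of targets rated
    k = len(data[0])   # raters per target
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # between-targets mean square
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # within-target mean square (disagreement among raters)
    msw = sum((x - means[i]) ** 2
              for i, row in enumerate(data) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(round(icc1(ratings), 3))
```

Here the raters disagree only slightly on each target, so the ICC is close to 1; large within-target disagreement (a large MSW) would pull it toward 0.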
Construct validity
the degree to which a measure accurately measures the theoretical construct it is designed to measure; the quality of operationalization
Construct or conceptual variable
abstract variable that, in its natural form, can’t be quantified; needs an operational definition
Face validity
the degree to which a measure appears to measure the intended variable; a subjective process
Content validity
form of construct validity evaluated by comparing the content of the measure to the theoretical definition of the construct, ensuring that all aspects of the construct are measured and that no extraneous elements are included
Predictive validity
aspect of construct validity that involves examining if a measure can predict a theoretically relevant FUTURE behavior or criterion
Concurrent validity
type of construct validity that examines whether the measure can predict a criterion measured at the same time the measure is administered
Convergent validity
aspect of construct validity assessed by examining the extent to which scores on the measure are related to other measures of the same or similar constructs
Discriminant/divergent validity
aspect of construct validity in which scores on a measure are not related to scores on conceptually unrelated measures
Indicators of construct validity of a measure
face validity, content validity, predictive validity, concurrent validity, convergent validity, discriminant validity
When are reliability and validity necessary?
(1) reliability alone is necessary (but not sufficient) to establish validity; (2) construct validity is not necessary to establish reliability; (3) reliability and indicators of construct validity are both necessary to establish construct validity
Reactivity
when the act of measuring or observing something changes a person's behavior; minimized by using nonreactive or unobtrusive operationalizations
Scales of measurement
nominal, ordinal, interval, ratio
Nominal scale
scale of measurement with 2 or more categories that have no numerical properties; a.k.a. categorical variables
Ordinal scale
scale of measurement in which the measurement categories form a rank order along a continuum but the distance is unknown
Interval scale
a scale of measurement in which the intervals between numbers on the scale are all equal but zero is arbitrary (i.e. does not indicate a complete absence of the quantity), e.g. temperature in Fahrenheit or Celsius
Ratio scale
numeric scale of measurement with equal intervals and has a meaningful zero indicating total absence of the variable measured e.g. temperature in Kelvin