Chapter 5 Flashcards
Reliability (def)
consistency in measurement
Reliability coefficient
A statistic, ranging from 0 to 1, that indexes reliability.
4 types of reliability coefficients
1) test-retest reliability
2) alternate-forms reliability
3) split-half reliability
4) inter-scorer reliability
Measurement error (textbook def)
The inherent uncertainty associated with any measurement, even after preventable mistakes have been minimized
2 influences that interfere with repeated measurement (in psych)
1) changes in the object of measurement (e.g., constant flux in mood, alertness, motivation)
2) the act of measurement itself (i.e., carryover effects such as fatigue and practice)
“True Score”
Not actually "true" in the conceptual sense; a true score is tied to the specific measurement instrument used.
Which 'score' reflects the truth independent of the measurement instrument?
Construct score.
The underlying level of some construct (e.g., depression)
Total variance is made up of what two subtypes of variance?
True variance (variance from actual differences between people) + error variance (random variance irrelevant to the construct)
Define reliability in terms of variance
The proportion of total variance attributable to true variance.
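As a sketch in standard CTT notation (symbols assumed here, not from the card): total variance decomposes into true plus error variance, and reliability is the true-variance share:
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}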
Random vs. Systematic Error
Random: unpredictable, inconsistent, without pattern
Systematic: predictable, constant, can be adjusted for
Bias (in error)
The degree of systematic error that influences measurement
How does item/content sampling contribute to error variance?
The specific items included on a test can affect results (e.g., a testtaker thinking "I hope they ask this question and not that one").
What test administration effects contribute to error variance?
Environment: war, heat, someone chewing gum, pencil quality, etc.
Testtaker variables: lack of sleep, emotions, drugs, etc.
Examiner-related variables: physical appearance, presence or absence of the examiner
How does test scoring and interpretation contribute to error variance?
Subjectivity in scoring certain tests (e.g., essays, creativity tasks) can influence measurement.
test-retest reliability coefficient is also called what?
Coefficient of stability
What might affect test-retest reliability estimates?
Experience, practice, memory, fatigue, etc. may intervene.
What is the alternate-forms/parallel-forms reliability coefficient called?
Coefficient of equivalence
Parallel vs. Alternate forms reliability
Parallel forms: for each form, the means and variances of observed test scores are equal
Alternate forms: different versions of the same test that don't meet the strict requirements of parallel forms
2 similarities between parallel/alternate and test-retest reliability
1) both involve two test administrations with the same group
2) test scores can be affected by factors such as fatigue, practice, and learning
What additional source of error variance is present in alternate/parallel-forms reliability?
Item/Content sampling
Split-half reliability
Correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Compute a Pearson r between one half of the test and the other half, then adjust upward with the Spearman-Brown formula (sketched below).
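A minimal sketch of that adjustment for two halves, assuming the standard Spearman-Brown form (r_{hh} = correlation between the two halves):
r_{SB} = \frac{2\,r_{hh}}{1 + r_{hh}}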
Odd-even reliability
Split-half reliability computed by splitting the test into odd- vs. even-numbered items
How does the number of items affect the reliability coefficient? What method estimates how many items are needed?
Spearman-Brown formula.
More items (of comparable quality) means higher reliability.
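In its general form (notation assumed: n = factor by which the test is lengthened or shortened, r_{xy} = reliability of the existing test), the formula can also be solved for n to estimate how many items a target reliability requires:
r_{SB} = \frac{n\,r_{xy}}{1 + (n - 1)\,r_{xy}}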
Which coefficient is used for inter-item consistency?
Coefficient alpha (Cronbach's alpha)
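For reference, the standard formula (notation assumed: k = number of items, \sigma_i^2 = variance of item i, \sigma_X^2 = total test score variance):
\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)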
Inter-scorer reliability
The degree of consistency between two or more scorers.
What is its coefficient called?
Coefficient of inter-scorer reliability
DSM-5 Inter-rater reliability
Kappa = 0.44 (a "fair" level of agreement, moderately greater than chance)
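For reference, Cohen's kappa in its standard form (notation assumed: p_o = observed proportion of agreement, p_e = proportion of agreement expected by chance):
\kappa = \frac{p_o - p_e}{1 - p_e}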
Transient error
Error due to the testtaker's feelings, moods, or mental state varying over time
Homogeneity vs. Heterogeneity of test items
Homogeneous: functionally uniform items measuring a single factor (e.g., one ability or trait); high internal consistency is expected
Heterogeneous: the test measures more than one factor
Does high internal consistency mean homogeneity of items?
Not necessarily.
A longer test tends to yield a high internal-consistency coefficient as long as its items are positively correlated, even if they measure more than one factor.
Dynamic vs. static characteristics
Dynamic: Presumed to be relatively situational and changing
Static: presumed to be relatively unchanging
Restriction/Inflation of range
When a sampled subgroup's scores span a narrower (restricted) or wider (inflated) range than the full population, the correlation coefficient, and with it the reliability estimate, is correspondingly deflated or inflated.
Power Test
Testtakers have enough time to attempt all items, but the items are so difficult that no one earns a perfect score.
Speed test
Items are of uniform difficulty (typically easy); given unlimited time, testtakers should answer everything correctly.
But under the time limit, only some testtakers will be able to complete the whole test.
How do the assumptions of CTT and IRT differ (broadly speaking)?
CTT assumptions are weak and easily met; IRT assumptions are more rigorous.
Domain Sampling Theory
A test's reliability reflects how well its score assesses the domain from which the test's items were sampled.
What is universe score in generalizability theory?
The analogue of the true score: under the same conditions, the same score would be obtained.
Generalizability study
Examines how generalizable scores from a particular test are if the test is administered in different situations.
What coefficient does it yield?
Coefficient of generalizability
Decision study
Examines the usefulness of test scores in helping the test user make decisions; follows a generalizability study.
Another way to say Item response theory
Latent-trait theory
Within CTT, what is the weight assigned to each item on a test?
Equal weight for every item; IRT assigns differential weight.
Dichotomous test items
Items that can be answered with only one of two responses (e.g., true/false)
Polytomous test items
Items with three or more alternative responses
Rasch Model
A type of IRT model with very specific assumptions about the underlying distribution
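A sketch of the one-parameter logistic form usually associated with the Rasch model (notation assumed: \theta = person ability, b = item difficulty):
P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}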
Which measure is used to evaluate whether the difference between two scores is significant?
Standard error of the difference
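One standard form, assuming the two scores share standard deviation \sigma and have reliabilities r_1 and r_2 (notation assumed):
\sigma_{\mathrm{diff}} = \sqrt{\sigma_{E_1}^2 + \sigma_{E_2}^2} = \sigma\sqrt{2 - r_1 - r_2}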