Stats & Test Construction Flashcards
Type I error
Mistakenly rejecting the null hypothesis when it’s true
Alpha
Type II error
Mistakenly retaining the null hypothesis when it is false
Beta
Discriminant analysis
Technique in multivariate statistics that describes differences between 2+ groups on a set of measures or that classifies subjects into groups based on a set of measures
Threats to internal validity
Maturation, history, instrumentation, statistical regression, selection, attrition/mortality, interaction w/ selection
Ways to control threats to internal validity
Random assignment, within-subjects designs, blocking, matching subjects, ANCOVA
Threats to external validity
Interaction b/t testing & treatment, interaction b/t selection & tx, reactivity, multiple tx interference (order/carryover effects)
Ways to control threats to external validity
Random sampling, naturalistic/field research, single or double-blind designs, counterbalancing
What are some ways to increase power?
Increase alpha, increase N, increase effect size, decrease error, use powerful statistics, one-tailed if possible
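A minimal sketch of how these levers move power, assuming a one-sample z-test with a standardized effect size d (the function name and the example values are hypothetical, and the two-tailed case ignores the negligible rejection area in the opposite tail). In this setup, decreasing error variance shows up as a larger d, since d is the mean difference divided by the SD.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05, one_tailed=False):
    """Approximate power of a one-sample z-test for standardized effect size d."""
    std_normal = NormalDist()  # mean 0, SD 1
    # A two-tailed test splits alpha across both tails, raising the critical value.
    tail_alpha = alpha if one_tailed else alpha / 2
    z_crit = std_normal.inv_cdf(1 - tail_alpha)
    # Under the alternative, the test statistic is centered at d * sqrt(n).
    shift = effect_size * n ** 0.5
    # Power = probability the statistic exceeds the critical value.
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical comparisons: change one lever at a time from the baseline.
baseline = z_test_power(0.5, 30)
print("baseline:         ", round(baseline, 3))
print("alpha = .10:      ", round(z_test_power(0.5, 30, alpha=0.10), 3))
print("N = 60:           ", round(z_test_power(0.5, 60), 3))
print("effect size = 0.8:", round(z_test_power(0.8, 30), 3))
print("one-tailed:       ", round(z_test_power(0.5, 30, one_tailed=True), 3))
```

Each comparison line should print a larger value than the baseline, which matches the list on the card.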
What percentage of scores on the normal curve fall between +/- 1 SD, +/- 2 SD, +/- 3 SD?
68%
95%
99.7%
What percentiles are equivalent to the following z-scores? -3 -2 -1 1 2 3
z = -3: 0.1; z = -2: 2; z = -1: 16; z = 1: 84; z = 2: 98; z = 3: 99.9
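The normal-curve figures on these two cards can be checked with the standard normal CDF. A short sketch using Python's standard library (the helper names are made up for illustration); the card values are rounded versions of what it prints.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def percentile_for_z(z):
    """Percentage of the distribution falling at or below a z-score."""
    return 100 * std_normal.cdf(z)


def percent_within(k):
    """Percentage of scores falling between -k and +k standard deviations."""
    return 100 * (std_normal.cdf(k) - std_normal.cdf(-k))


for z in (-3, -2, -1, 1, 2, 3):
    print(f"z = {z:+d}: percentile {percentile_for_z(z):.1f}")

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {percent_within(k):.1f}%")
```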
Factors affecting test reliability
Test characteristics (length, item type, item homogeneity, influence of guessing), sample characteristics (sample size, range, variability), extent of test clarity
Sources of error in internal reliability
Content sampling, heterogeneity of content domain
Sources of error in test-retest reliability
Time-sampling factors
Which type of reliability is best for speed tests?
Alternate forms
Sources of error in inter-rater reliability
Factors related to raters (motivation, biases), characteristics of measuring device, consensual observer drift
Dimensions of relevance in item analysis
1) Content appropriateness (item assesses bx domain the test is intended to evaluate)
2) Taxonomic level (does item reflect appropriate cognitive or ability level of population intended for)
3) Extraneous abilities (to what extent are knowledge or skills needed that are outside the domain being evaluated)
Item difficulty
The percentage of people who get an item correct
Item discrimination
Extent to which an item differentiates between those who get a high vs. low total score
.35 or more is acceptable
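A rough sketch of both item statistics, assuming items are scored 1 (correct) or 0 (incorrect) and that discrimination is computed as the D index: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. The 27% group cutoff is a common convention rather than something stated on the card, and the data and function names are hypothetical. Difficulty is returned as a proportion; multiply by 100 for a percentage.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """D index: proportion correct in the upper group minus proportion correct in the lower group."""
    # Pair each examinee's total score with their response, then rank by total score.
    ranked = sorted(zip(total_scores, item_responses), key=lambda pair: pair[0])
    group_size = max(1, round(len(ranked) * fraction))
    lower = [item for _, item in ranked[:group_size]]
    upper = [item for _, item in ranked[-group_size:]]
    return item_difficulty(upper) - item_difficulty(lower)


# Hypothetical data: 10 examinees, their total test scores and responses to one item.
total_scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
item_responses = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

p = item_difficulty(item_responses)
d = item_discrimination(item_responses, total_scores)
print(f"difficulty p = {p:.2f}, discrimination D = {d:.2f}")
print("acceptable discrimination" if d >= 0.35 else "weak discrimination")
```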
Item response theory
Approach in which item performance is related to the examinee’s level on the trait being measured rather than to the total test score
Reliability coefficient
Proportion of variability in obtained test scores that reflects true score variability
Interpreted directly as a proportion; never squared
Standard error of measurement (SEM)
An index of the amount of error that can be expected in a person’s obtained scores due to the unreliability of the test
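The SEM is usually computed from the test's standard deviation and its reliability coefficient as SEM = SD * sqrt(1 - r). A small sketch under that formula; the function names and the example values (SD of 15, reliability of .91, obtained score of 100) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(obtained_score, sd, reliability, z=1.96):
    """Approximate band around an obtained score (z = 1.96 gives roughly 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained_score - z * sem, obtained_score + z * sem


# Hypothetical test: SD = 15, reliability = .91, obtained score = 100.
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_band(100, 15, 0.91)
print(f"SEM = {sem:.2f}; 95% band = {low:.1f} to {high:.1f}")
```

Higher reliability shrinks the SEM toward 0, and a reliability of 0 makes the SEM equal to the test's SD.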
What qualitative evidence do you look for in a test that has good content validity?
Coefficient of internal consistency will be large
Test will correlate highly with other tests of the same domain
Pre- and post-test evals of a program designed to increase familiarity with the domain will indicate appropriate changes
Orthogonal rotation
Resulting factors are uncorrelated; attribute measured by one factor is independent from the attributes measured by the other factor
Oblique rotation
Resulting factors are correlated & attributes measured by the factors are not independent