Test Construction Flashcards
Alternate Forms Reliability
yields a coefficient of equivalence
Coefficient Alpha (Cronbach’s Alpha)
method for assessing internal consistency reliability when items are not answered dichotomously
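For reference, the standard formula (k = number of items, \sigma_i^2 = variance of item i, \sigma_X^2 = variance of total scores):
    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)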
KR-20
method for assessing internal consistency reliability when items are answered dichotomously (they are either correct or not correct)
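For reference, the usual formula (p_i = proportion answering item i correctly, q_i = 1 - p_i, \sigma_X^2 = total-score variance):
    KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_i p_i q_i}{\sigma_X^2}\right)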
Kappa Statistic
used to measure inter-rater reliability when data are nominal or ordinal (discontinuous)
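For reference (p_o = observed proportion of agreement, p_e = proportion of agreement expected by chance):
    \kappa = \frac{p_o - p_e}{1 - p_e}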
Test-Retest Reliability
yields a coefficient of stability
Spearman Brown Formula
corrects the artificially low reliability coefficient obtained with the split-half method (the coefficient is low because each half is only half as long as the full test)
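For reference, the general formula, where n is the factor by which the test is lengthened (n = 2 for the split-half correction) and r_{hh} is the correlation between the two halves:
    r_{new} = \frac{n\,r}{1 + (n-1)r} \qquad \text{split-half: } r_{xx} = \frac{2\,r_{hh}}{1 + r_{hh}}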
Size of reliability coefficient
smaller when it is easy to get the correct answer by random guessing
Difficulty Index
ranges from 0 (no one answers the item correctly) to 1 (everyone answers it correctly)
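For reference:
    p = \frac{\text{number of examinees answering the item correctly}}{\text{total number of examinees}}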
orthogonal factors v. oblique factors
orthogonal=uncorrelated (independent), oblique=correlated (dependent)
Concurrent Validity
type of criterion-related validity; the extent to which test scores correlate with an external criterion measured at about the same time
Divergent (Discriminant) Validity
demonstrated when scores on a measure have low correlations with measures of unrelated traits; a large coefficient indicates poor discriminant validity
cross-validation
re-checking a test’s criterion-related validity in a new sample during test revision; the validity coefficient typically “shrinks” on cross-validation
external validity
researcher’s ability to generalize the results of the study to other individuals, settings, conditions
internal validity
researcher’s ability to determine whether there is a causal relationship between variables
pearson r
used to measure inter-rater reliability and to calculate criterion-related validity when both variables are measured on a continuous scale
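For reference, one common computational form (z_x, z_y = standard scores on X and Y; N = number of score pairs):
    r = \frac{\sum z_x z_y}{N}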
methods of assessing internal consistency reliability
split-half (must be corrected with the Spearman-Brown formula), KR-20, Cronbach’s alpha
4 methods of assessing reliability
inter-rater, internal consistency, alternate forms, test-retest
standard error of measurement
the standard deviation of a theoretically normal distribution of test scores acquired by one individual on equivalent tests (related to the reliability coefficient and the SD of the test)
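For reference (SD_x = standard deviation of test scores, r_{xx} = reliability coefficient):
    SEM = SD_x \sqrt{1 - r_{xx}}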
calculating confidence interval of a true test score
person’s score + or - one or two standard errors of measurement (68% vs 95%)
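A hypothetical worked example: if an examinee’s obtained score is 100 and the SEM is 5, the 68% confidence interval is 100 ± 5 = 95 to 105, and the 95% confidence interval is 100 ± 10 = 90 to 110.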
standard error of estimate
standard deviation of the distribution of obtained criterion scores around a predicted criterion score; indexes the error made when estimating criterion scores from predictor scores
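For reference (SD_y = standard deviation of criterion scores, r_{xy} = criterion-related validity coefficient):
    SEE = SD_y \sqrt{1 - r_{xy}^2}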
Taylor-Russell Tables
estimate the proportion of successful selection decisions (the improvement over the base rate) when a predictor is introduced, given its validity coefficient, the selection ratio, and the base rate
incremental validity is optimized when
base rate is moderate (.5), and selection ratio is low
item response theory
used to estimate the extent to which an examinee possesses a certain trait based on his or her responses to individual items
factors affecting criterion-related validity
range of scores (a more heterogeneous sample of examinees yields a higher validity coefficient), reliability of the predictor and of the criterion, and criterion contamination (which usually inflates the validity coefficient)
relationship between reliability of the predictor and criterion related validity
criterion-related validity cannot be higher than the square root of the reliability of the predictor
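For reference, the usual bound (r_{xy} = validity coefficient, r_{xx} and r_{yy} = reliabilities of predictor and criterion):
    r_{xy} \le \sqrt{r_{xx}\,r_{yy}} \le \sqrt{r_{xx}}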
percentile corresponding with one standard deviation above the mean, and 2 standard deviations above
approximately 84 and 98
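Derivation: about 34.1% of scores fall between the mean and +1 SD and about 13.6% between +1 and +2 SD, so 50 + 34.1 ≈ 84th percentile and 50 + 34.1 + 13.6 ≈ 98th percentile.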
type 1 error
the null hypothesis is falsely rejected
Spearman rank order (rho) correlation coefficient
used when both variables are ranks
phi correlation coefficient
used when both variables are true dichotomies
biserial correlation coefficient
when one variable is continuous and one is an artificial dichotomy
contingency correlation coefficient
when both variables are nominal
when there is a moderator variable, make sure the test has
differential validity
in a positively skewed distribution, the measures of central tendency from greatest to lowest
mean, median, mode
MST
“mean square total”=estimate of total variability, reflecting treatment effects plus error; note that the sums of squares, not the mean squares, are additive (SST = SSB + SSW)
MSW
“mean square within”=estimate of variability that is due purely to error
MSB
“mean square between”=estimate of variability due to treatment effects plus error
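For reference (k = number of groups, N = total number of subjects):
    MSB = \frac{SSB}{k-1}, \qquad MSW = \frac{SSW}{N-k}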
degrees of freedom for t-test for independent samples
N-2
where item characteristic curve hits the y axis
probability of getting answer right by guessing
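For reference, in the three-parameter logistic (3PL) IRT model the lower asymptote c_i is the guessing parameter (a_i = item discrimination, b_i = item difficulty, \theta = examinee ability):
    P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}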
statistical regression
tendency for extreme test scores to move toward the mean on retesting (a threat to internal validity when participants are selected because of their extreme scores on a pretest)
purpose of rotation in factor analysis
makes pattern of factor loadings easier to interpret
Solomon Four Group Design
controls for testing/test practice (which is a threat to internal validity)
F ratio
mean square between groups divided by mean square within groups (MSB/MSW)
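A hypothetical worked example: with 3 groups of 10 subjects each, if MSB = 40 and MSW = 10, then F = 40/10 = 4.0, evaluated with df = (k - 1, N - k) = (2, 27).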
eta
used to calculate the correlation between X and Y when the relationship is thought to be curvilinear (nonlinear)