Test Construction Flashcards
Item Difficulty
• The proportion of examinees in the tryout sample who answer the item correctly
• Used to measure examinees' knowledge or skill level
Range: 0-1
“p”
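The calculation can be sketched in Python with hypothetical response data (1 = correct, 0 = incorrect):

```python
# Hypothetical responses to one item across 10 examinees.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Item difficulty (p) = proportion of examinees answering correctly.
p = sum(responses) / len(responses)
print(p)  # 0.7
```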
Item Discrimination
• refers to how much an item discriminates between examinees who obtain low or high scores on the test or an external criterion
• Calculated by subtracting the percent of examinees in the lower scoring group who answered the item correctly from the percent of examinees in the upper-scoring group who answered the item correctly
“D”; Range: -1 to +1; .35 is acceptable
Item Discrimination:
-1, 0, +1
+1: everyone in the upper-scoring group answered the item correctly and everyone in the lower-scoring group answered it incorrectly
0: the same percentage of both groups answered the item correctly
-1: everyone in the lower-scoring group answered the item correctly and everyone in the upper-scoring group answered it incorrectly
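The subtraction can be sketched with hypothetical group proportions:

```python
# Hypothetical proportions answering the item correctly in each group.
upper_correct = 0.80  # upper-scoring group
lower_correct = 0.45  # lower-scoring group

# D = p(upper) - p(lower); ranges from -1 to +1.
D = upper_correct - lower_correct
print(round(D, 2))  # 0.35
```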
Classical Test Theory
• variability in test scores reflects a combination of true score variability and variability due to measurement (random) error: X = T + E, i.e., Total Variability (X) = True Score Variability (T) + Measurement Error Variability (E)
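The decomposition can be illustrated with hypothetical true scores and errors; when T and E are uncorrelated, the variances add:

```python
from statistics import pvariance

# Hypothetical true scores and random errors (chosen so their covariance is 0).
true_scores = [95, 100, 105, 110]
errors = [2, -2, -2, 2]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Total variability equals true score variability plus error variability.
total_var = pvariance(observed)
parts_var = pvariance(true_scores) + pvariance(errors)
print(total_var, parts_var)  # 35.25 35.25
```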
True Score Variability
Actual test knowledge
Measurement Error
environment or guessing
Reliability (4 things)
ensures an examinee's score reflects their true score; symbol: rxx; range is 0-1; .80 is acceptable
Reliability Coefficient
measure of true score variability
Reliability 4 methods (4)
- test-retest
- alternate forms
- internal consistency
- inter-rater
Test-Retest Reliability (3)
Consistency over time; also known as the coefficient of stability; NOT appropriate for characteristics that fluctuate over time or are affected by random effects
Alternate Forms Reliability (3)
Consistency across two forms of a test; aka parallel forms reliability; appropriate when the attribute is stable over time; NOT for fluctuating characteristics or random effects
Internal Consistency (3)
Degree of consistency across different test items; appropriate for a single content or behavior domain; measured by split-half reliability and Cronbach's coefficient alpha
Split-Half Reliability (6)
Associated with internal consistency; the test is split in half and the halves are correlated; the coefficient is corrected with the Spearman-Brown prophecy formula; NOT for speeded tests; related: Cronbach's coefficient alpha, Kuder-Richardson Formula 20
Spearman-Brown Formula
Associated with internal consistency; used with split-half reliability and, more generally, to estimate the effect of shortening or lengthening a test on its reliability coefficient
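The formula can be sketched in Python; r is the obtained reliability and k the factor by which test length changes (k = 2 gives the classic split-half correction):

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# Correcting a hypothetical split-half correlation of .60 (k = 2):
corrected = spearman_brown(0.60, 2)
print(round(corrected, 2))  # 0.75
```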
Cronbach’s Coefficient Alpha
Used with Split-Half Reliability
“mean of all possible split-half correlation coefficients”; can't be used with “forced choice” items; Cronbach's α is used with continuous variables
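A minimal sketch of the coefficient alpha computation, using hypothetical data (one score list per item, aligned across examinees):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned across examinees."""
    k = len(items)
    item_var_sum = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each examinee's total
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Three hypothetical continuous items administered to four examinees:
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 2, 4]])
print(round(alpha, 2))  # 0.95
```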
Kuder-Richardson Formula 20 (KR-20) (3)
Associated with split-half reliability; can be used as a substitute for coefficient alpha when test items are scored dichotomously; used for true/false and multiple-choice questions with a right or wrong answer
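A sketch with hypothetical dichotomous (0/1) items; KR-20 is coefficient alpha specialized to right/wrong scoring:

```python
from statistics import pvariance

def kr20(items):
    """items: one list of 0/1 scores per item, aligned across examinees."""
    k = len(items)
    pq_sum = 0.0
    for item in items:
        p = sum(item) / len(item)   # proportion answering this item correctly
        pq_sum += p * (1 - p)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

# Three hypothetical true/false items, four examinees:
reliability = kr20([[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]])
print(round(reliability, 2))  # 0.6
```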
Inter-rater Reliability (5)
Important for measures that are subjectively scored (e.g., essays); ensures examinees obtain the same score no matter who does the scoring; measured using percent agreement (which overestimates inter-rater reliability), Cohen's kappa statistic, or Kendall's coefficient of concordance
Cohen’s kappa statistic
Associated with inter-rater reliability; used to measure agreement between two raters when scores represent a nominal scale
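A minimal sketch with two hypothetical raters assigning nominal categories:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (nominal scores)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum((count_a[c] / n) * (count_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of four cases:
kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(kappa)  # 0.5
```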
Kendall’s coefficient of concordance
Associated with inter-rater reliability; used to measure agreement between three or more raters when scores are reported as ranks
Factors that Affect Reliability (4)
Test length: longer tests are more reliable
Range of scores: a wider (unrestricted) range of scores increases the size of the reliability coefficient
Content of the test: more homogenous, more reliable
Likelihood of items can be answered by guessing: less choice/guessing, more reliable
Confidence Interval
- indicates the range within which an examinee’s true score is likely to fall given his/her obtained score
- derived using the standard error of measurement (SEM)
Standard Error of Measurement (SEM)
o used to obtain a confidence interval around obtained test score
• 68% confidence interval: one SEM is added to and subtracted from the obtained score
• 95% confidence interval: two SEMs are added to and subtracted from the obtained score
• 99% confidence interval: three SEMs are added to and subtracted from the obtained score
SEM = standard deviation × √(1 − rxx)
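The formula and the confidence-interval bands can be sketched with hypothetical numbers:

```python
import math

sd, rxx = 15, 0.91      # hypothetical test SD and reliability coefficient
sem = sd * math.sqrt(1 - rxx)
print(round(sem, 1))    # 4.5

obtained = 100
# 95% confidence interval: obtained score +/- 2 SEM
low, high = obtained - 2 * sem, obtained + 2 * sem
print(round(low, 1), round(high, 1))  # 91.0 109.0
```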
Consensual observer drift
Associated with inter-rater reliability; occurs when two or more observers working together influence each other's ratings on a behavioral rating scale so that they assign ratings in a similar idiosyncratic way.
Coefficient of concordance:
is another measure of inter-rater reliability.
Validity
Refers to a test’s accuracy in terms of the extent to which the test measures what it was designed to measure (main ones: content, construct, criterion);
Content Validity
measures a specific content or behavior domain
Construct Validity
measures a theoretical hypothetical trait or construct
Criterion-related Validity
used to predict or estimate an examinee’s status on an external criterion
Face Validity
Refers to whether or not test items “look like” they're measuring what the test is designed to measure; not an actual type of validity
multitrait-multimethod matrix
Used with construct validity; a table of correlation coefficients that provide information about a test’s convergent and divergent (discriminant) validity
Discriminant Validity
Assessed in terms of convergent and divergent validity
Multitrait-Multimethod Matrix (4 measures)
- Measure being validated
- Measure of the same trait using a different method
- Measure of an unrelated trait using the same method
- Measure of an unrelated trait using a different method
Convergent Validity
Associated with discriminant validity; the correlation between the test we're validating and a measure of the same trait using a different method
Divergent Validity
Associated with discriminant validity; the correlations between the test we're validating and measures of unrelated traits
Factor Analysis
A more complex way to assess construct validity as well as discriminant validity:
1. Administer tests to a sample of examinees
2. Derive and interpret the correlation matrix
3. Extract the initial factor matrix (difficult to interpret)
4. Rotate the factor matrix (makes it easier to interpret)
communality
Associated with factor analysis; the proportion of variance in a single variable that is explained by all of the factors combined
Factor Matrix
Orthogonal means uncorrelated,
Oblique means correlated
Criterion-Related Validity
Important when test scores will be used to predict or estimate status on a criterion (a different measure); evaluated by correlating scores on the test (predictor) with scores on the criterion for a sample of examinees to obtain a criterion-related validity coefficient, which is always less than ±1
Concurrent Validity:
Involves obtaining scores on the predictor and criterion at about the same time (current status); contrast with predictive validity
Predictive Validity
Involves obtaining predictor scores prior to obtaining criterion scores
Standard Error of Estimate:
Used to construct a confidence interval around a predicted criterion score (vs. the SEM, which is used for an obtained test score); SEest = SDy × √(1 − rxy²)
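The formula with hypothetical numbers:

```python
import math

sd_y, rxy = 10, 0.60    # hypothetical criterion SD and validity coefficient
se_est = sd_y * math.sqrt(1 - rxy ** 2)
print(round(se_est, 1))  # 8.0

predicted = 50
# 68% confidence interval: predicted criterion score +/- 1 SEest
low, high = predicted - se_est, predicted + se_est
print(round(low, 1), round(high, 1))  # 42.0 58.0
```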
SEM vs Sest
SEM : confidence interval around a measure or obtained score
SEest: confidence interval around a predicted score
Validity vs. Reliability
Reliability is a necessary but not sufficient condition for validity; e.g., a valid test must be reliable, but reliability doesn't guarantee validity. The validity coefficient cannot exceed the square root of the reliability coefficient:
rxy ≤ √rxx
Steps in Validating a Predictor:
- Conduct a job analysis
- Select/develop the predictor and criterion
- Obtain and correlate scores on the predictor and criterion
- Check for adverse impact
- Evaluate incremental validity
- Cross-validate
Incremental Validity:
Refers to the increase in decision-making accuracy that use of a predictor provides
Incremental Validity Scatterplot:
Criterion: Y-axis; Predictor: X-axis
Incremental Validity Calculation
Calculated by subtracting the base rate from the positive hit rate
Positive Hit Rate
Associated with incremental validity; the proportion of people hired using the predictor who are successful on the criterion (true positives / total positives); incremental validity = positive hit rate − base rate
Base Rate
Associated with incremental validity; the proportion of people hired without the predictor who are successful (number of successful individuals / total number of individuals); incremental validity = positive hit rate − base rate
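The positive hit rate, base rate, and incremental validity can be sketched with hypothetical selection numbers:

```python
# Hypothetical selection outcomes:
true_positives = 30     # hired using the predictor AND successful on the criterion
total_positives = 40    # everyone hired using the predictor
successful = 50         # successful individuals without the predictor
total = 100             # total number of individuals

positive_hit_rate = true_positives / total_positives   # 0.75
base_rate = successful / total                         # 0.5
incremental_validity = positive_hit_rate - base_rate
print(incremental_validity)  # 0.25
```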
Specificity
Associated with incremental validity; refers to the identification of true negatives (the percentage of cases in the validation sample who do not have the disorder and were accurately classified by the test as not having it).
Sensitivity
Associated with incremental validity; refers to the probability that a predictor will correctly identify people with the disorder from the pool of people with the disorder; calculated as true positives / (true positives + false negatives).
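Both can be computed from hypothetical confusion-matrix counts:

```python
# Hypothetical validation-sample counts:
tp, fn = 40, 10   # have the disorder: correctly vs. incorrectly classified
tn, fp = 85, 15   # do not have the disorder: correctly vs. incorrectly classified

sensitivity = tp / (tp + fn)   # true positives / all who have the disorder
specificity = tn / (tn + fp)   # true negatives / all who do not
print(sensitivity, specificity)  # 0.8 0.85
```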
Norm-Referenced Interpretation:
Compares an examinee's test score to scores obtained by a standardization sample or other comparison group; the raw score is converted to a score that indicates his/her relative standing in the comparison group, e.g., standard score, percentile rank, z-score, T score, deviation IQ
Percentile Rank (4)
- Ranges from 1-99
- Expresses an examinee's score in terms of the percentage of examinees who achieved lower scores
- Distribution is always flat (rectangular) regardless of the shape of the raw score distribution
- Maximizes differences in the middle of the raw score distribution and minimizes differences at the extremes
Nonlinear Transformation:
Changes the shape of the original raw score distribution
Limitation: such scores indicate an examinee's relative position in a distribution but do not provide information about differences between examinees' raw scores
Standard Scores:
Indicates the examinee’s relative standing in the comparison group in terms of standard deviations from the mean
Z-scores, T-scores, and deviation IQ
• Z-score distribution has a mean of 0 and standard deviation of 1
o if an examinee obtains a score of 110 on a test that has a mean of 100 and standard deviation of 10, his/her z-score is +1.0
•T score: mean of 50 and SD of 10
• Deviation IQ score: mean of 100 and SD of 15
Z Scores
o calculated by subtracting the mean of the distribution from the examinee’s score to obtain a deviation score and dividing the deviation score by the distribution’s standard deviation
o if an examinee obtains a score of 110 on a test that has a mean of 100 and standard deviation of 10, his/her z-score is +1.0
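The example on the card, carried through to T and deviation IQ scores:

```python
raw, mean, sd = 110, 100, 10      # example from the card
z = (raw - mean) / sd             # z-score: mean 0, SD 1
t_score = 50 + 10 * z             # T score: mean 50, SD 10
deviation_iq = 100 + 15 * z       # deviation IQ: mean 100, SD 15
print(z, t_score, deviation_iq)   # 1.0 60.0 115.0
```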
Criterion-Referenced Interpretation
Involves interpreting an examinee's score in terms of a predefined standard, e.g., a percent correct (percentage) score; a cutoff score is usually set; also used to interpret likely status on an external criterion using a regression equation or expectancy table
Ex. Pass or fail test
Leptokurtic
Distribution of scores that is more pointed than normal distribution
Platykurtic
distribution of scores that is more flat than the normal distribution
Eigenvalue
Indicates the total amount of variability in a set of tests or other variables that is explained by an identified component or factor