Test Construction Flashcards

1
Q

Item Difficulty

A

• The proportion of examinees in the tryout sample who answer the item correctly
• Used to measure examinees’ knowledge or skill level
• Range: 0 to 1
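The calculation above can be sketched in a few lines (a hypothetical helper with made-up data, not from any testing package):

```python
# Item difficulty (p): proportion of the tryout sample answering the item
# correctly. Higher p = easier item. (Hypothetical example data.)
def item_difficulty(responses):
    return sum(responses) / len(responses)

responses = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]  # 1 = correct, 0 = incorrect
print(item_difficulty(responses))  # 0.8
```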

2
Q

“p”

A

Item Difficulty

3
Q

Item Discrimination

A

• refers to how well an item discriminates between examinees who obtain low versus high scores on the test or an external criterion
• Calculated by subtracting the percent of examinees in the lower-scoring group who answered the item correctly from the percent of examinees in the upper-scoring group who answered the item correctly
• Symbol: “D”; range: -1 to +1; D ≥ .35 is considered acceptable
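The subtraction described above can be sketched as follows (hypothetical helper and data):

```python
# Item discrimination (D): proportion correct in the upper-scoring group
# minus proportion correct in the lower-scoring group. (Hypothetical data.)
def item_discrimination(upper, lower):
    return sum(upper) / len(upper) - sum(lower) / len(lower)

upper = [1, 1, 1, 0]  # 75% of high scorers answered the item correctly
lower = [1, 0, 0, 0]  # 25% of low scorers answered the item correctly
print(item_discrimination(upper, lower))  # 0.5
```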

4
Q

Item Discrimination:

-1, 0, +1

A

+1: all examinees in the upper-scoring group answered the item correctly and all in the lower-scoring group answered it incorrectly
0: both groups answered the item correctly at the same rate
-1: all examinees in the lower-scoring group answered the item correctly and all in the upper-scoring group answered it incorrectly

5
Q

Classical Test Theory

A
• variability in test scores reflects a combination of true score variability and variability due to measurement (random) error
X = T + E
Total Variability (X) = True Score Variability (T) + Measurement Error (E)
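A quick simulation can illustrate the decomposition (all values below are made up; this is a sketch, not a standard implementation):

```python
import random

# Simulated X = T + E: observed-score variance should approximately equal
# true-score variance plus error variance. (Hypothetical parameters.)
random.seed(0)
true_scores = [random.gauss(100, 15) for _ in range(10000)]
errors = [random.gauss(0, 5) for _ in range(10000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# var(X) is close to var(T) + var(E): roughly 225 + 25
print(variance(observed), variance(true_scores) + variance(errors))
```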
6
Q

True Score Variability

A

Variability due to examinees’ actual knowledge (what the test is designed to measure)

7
Q

Measurement Error

A

Variability due to random factors such as the testing environment or guessing

8
Q

Reliability (4 things)

A

Ensures an examinee’s obtained score reflects his/her true score; symbol: rxx; range: 0 to 1; .80 is acceptable

9
Q

Reliability Coefficient

A

Measure of true score variability; indicates the proportion of observed score variability that is attributable to true score variability

10
Q

Reliability Methods (4)

A
  1. test-retest
  2. alternate forms
  3. internal consistency
  4. inter-rater
11
Q

Test-Retest Reliability (3)

A

Consistency of scores over time; also known as the coefficient of stability; NOT appropriate for characteristics that fluctuate over time or are easily affected by random factors

12
Q

Alternate Forms Reliability (3)

A

Consistency of scores across two forms of a test; aka parallel test reliability; appropriate for attributes that are stable over time; NOT appropriate for fluctuating characteristics or attributes easily affected by random factors

13
Q

Internal Consistency (3)

A

Degree of consistency across different test items; appropriate for a single content or behavior domain; measured by split-half reliability and Cronbach’s coefficient alpha

14
Q

Split-Half Reliability (6)

A

Associated with internal consistency; splits the test in half and correlates the halves; the coefficient is corrected with the Spearman-Brown prophecy formula; NOT appropriate for speeded tests; related methods: Cronbach’s coefficient alpha and the Kuder-Richardson Formula 20

15
Q

Spearman-Brown Formula

A

Associated with internal consistency; used along with split-half reliability and, more generally, to estimate the effect of shortening or lengthening a test on its reliability coefficient
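The standard prophecy formula, rₙ = k·r / (1 + (k−1)·r), can be sketched as a small function (hypothetical helper name):

```python
# Spearman-Brown prophecy formula: predicted reliability when a test's
# length is multiplied by a factor k (k = 2 corrects a split-half r).
def spearman_brown(r, k):
    return k * r / (1 + (k - 1) * r)

# a split-half correlation of .60 corrects to a full-length reliability of ~.75
print(spearman_brown(0.60, 2))
```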

16
Q

Cronbach’s Coefficient Alpha

A

Used with Split-Half Reliability
“Mean of all possible split-half correlation coefficients”; can’t be used with “forced choice” items; Cronbach’s α is used with continuous variables

17
Q

Kuder-Richardson Formula 20 (KR-20) (3)

A

Associated with split-half reliability; can be used as a substitute for coefficient alpha when test items are scored dichotomously; used for true/false and multiple-choice questions where there is a right or wrong answer
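Under the usual formula, KR-20 = (k/(k−1))·(1 − Σpq/σ²ₜₒₜₐₗ), a sketch with a hypothetical 0/1 response matrix:

```python
# KR-20 for dichotomously scored items. 'data' is a hypothetical matrix:
# one row per examinee, one 0/1 entry per item.
def kr20(data):
    k = len(data[0])
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / len(totals)
    var_t = sum((t - mean_t) ** 2 for t in totals) / len(totals)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in data) / len(data)  # item difficulty
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_t)

# perfectly consistent responses yield KR-20 of 1.0
print(kr20([[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]))
```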

18
Q

Inter-rater Reliability (5)

A

Important for measures that are subjectively scored (e.g., essays); ensures examinees obtain the same score no matter who does the scoring; measured using percent agreement (which overestimates inter-rater reliability), Cohen’s kappa statistic, or Kendall’s coefficient of concordance

19
Q

Cohen’s kappa statistic

A

Associated with inter-rater reliability; used to measure agreement between two raters when scores represent a nominal scale
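Kappa's usual form, κ = (pₒ − pₑ)/(1 − pₑ), where pₒ is observed agreement and pₑ is chance agreement, can be sketched with made-up nominal ratings:

```python
# Cohen's kappa: chance-corrected agreement between two raters assigning
# nominal categories. (Hypothetical ratings.)
def cohens_kappa(r1, r2):
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    p_chance = sum((r1.count(c) / n) * (r2.count(c) / n)
                   for c in set(r1) | set(r2))
    return (p_observed - p_chance) / (1 - p_chance)

rater1 = ["yes", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "no", "yes", "yes"]
print(cohens_kappa(rater1, rater2))  # ~0.33: fair agreement beyond chance
```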

20
Q

Kendall’s coefficient of concordance

A

Associated with inter-rater reliability; used to measure agreement between three or more raters when scores are reported as ranks

21
Q

Factors that Affect Reliability (4)

A

Test length: longer tests are more reliable
Range of scores: a wider, unrestricted range of scores increases the size of the reliability coefficient
Content of the test: more homogenous, more reliable
Likelihood of items can be answered by guessing: less choice/guessing, more reliable

22
Q

Confidence Interval

A
  • indicates the range within which an examinee’s true score is likely to fall given his/her obtained score
  • derived using the standard error of measurement (SEM)
23
Q

Standard Error of Measurement (SEM)

A

• used to obtain a confidence interval around an obtained test score
• 68% confidence interval: one SEM is added to and subtracted from the obtained score
• 95% confidence interval: two SEMs are added to and subtracted from the obtained score
• 99% confidence interval: three SEMs are added to and subtracted from the obtained score
SEM = standard deviation × √(1 − rxx)
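A sketch of the SEM formula and a resulting confidence interval (hypothetical helper and values):

```python
import math

# SEM = SD * sqrt(1 - rxx); confidence interval = obtained score +/- z * SEM.
def sem(sd, rxx):
    return sd * math.sqrt(1 - rxx)

s = sem(15, 0.91)                    # 15 * sqrt(.09) = ~4.5
score = 100
print(score - 2 * s, score + 2 * s)  # 95% CI: roughly 91 to 109
```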

24
Q

Consensual observer drift

A

Associated with inter-rater reliability; occurs when two or more observers working together influence each other’s ratings on a behavioral rating scale so that they assign ratings in a similarly idiosyncratic way

25
Q

Coefficient of concordance:

A

is another measure of inter-rater reliability.

26
Q

Validity

A

Refers to a test’s accuracy in terms of the extent to which the test measures what it was designed to measure (main ones: content, construct, criterion);

27
Q

Content Validity

A

measures a specific content or behavior domain

28
Q

Construct Validity

A

measures a theoretical hypothetical trait or construct

29
Q

Criterion-related Validity

A

used to predict or estimate an examinee’s status on an external criterion

30
Q

Face Validity

A

Refers to whether or not test items “look like” they’re measuring what the test is designed to measure; not an actual type of validity

31
Q

multitrait-multimethod matrix

A

Used with construct validity; a table of correlation coefficients that provide information about a test’s convergent and divergent (discriminant) validity

32
Q

Discriminant Validity

A

Encompasses convergent and divergent validity

33
Q

Multitrait-Multimethod Matrix (4 measures)

A
  1. Measure being validated
  2. Measure of the same trait using a different method
  3. Measure of an unrelated trait using the same method
  4. Measure of an unrelated trait using a different method
34
Q

Convergent Validity

A

Associated with discriminant validity; correlation between the test we’re validating and the measure of the same trait using a different method

35
Q

Divergent Validity

A

Associated with discriminant validity; correlations between the test we’re validating and the measures of unrelated traits

36
Q

Factor Analysis

A

more complex way to measure construct validity as well as discriminant validity:

  1. Administer tests to a sample of examinees
  2. Derive and interpret the correlation matrix
  3. Extract the initial factor matrix (difficult to interpret)
  4. Rotate the factor matrix (makes it easier to interpret)
37
Q

communality

A

Associated with factor analysis; the amount of variability in a single variable that is explained by all of the factors combined

38
Q

Factor Matrix

A

Orthogonal rotation: factors are uncorrelated

Oblique rotation: factors are correlated

39
Q

Criterion-Related Validity

A

Important when test scores will be used to predict or estimate status on a criterion (on a different measure). Coefficient is always less than ±1; Evaluated by correlating scores on the test (predictor) with scores on the criterion for a sample of examinees to obtain a criterion-related validity coefficient

40
Q

Concurrent Validity:

A

involves obtaining scores on the predictor and criterion at about the same time (current status) vs. predictive Validity

41
Q

Predictive Validity

A

Involves obtaining predictor scores prior to obtaining criterion scores

42
Q

Standard Error of Estimate:

A

is used to construct a confidence interval around a predicted criterion score (vs. SEM for an obtained test score): SEest = SDy × √(1 − rxy²)
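The formula above can be sketched as a small function (hypothetical helper and values):

```python
import math

# Standard error of estimate: SDy * sqrt(1 - rxy**2). Used to build a
# confidence interval around a PREDICTED criterion score.
def se_est(sd_y, r_xy):
    return sd_y * math.sqrt(1 - r_xy ** 2)

# criterion SD of 10 and validity coefficient of .60 give SEest of ~8
print(se_est(10, 0.60))
```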

43
Q

SEM vs Sest

A

SEM : confidence interval around a measure or obtained score

SEest: confidence interval around a predicted score

44
Q

Validity vs. Reliability

A

reliability is a necessary but not sufficient condition for validity, e.g., a valid test must be reliable but reliability doesn’t guarantee validity; the validity coefficient cannot exceed the square root of the reliability coefficient
rxy ≤ √rxx

45
Q

Steps in Validating a Predictor:

A
  1. Conduct a job analysis
  2. Select/develop the predictor and criterion
  3. Obtain and correlate scores on the predictor and criterion
  4. Check for adverse impact
  5. Evaluate incremental validity
  6. Cross-validate
46
Q

Incremental Validity:

A

Refers to the increase in decision-making accuracy that use of a predictor provides

47
Q

Incremental Validity Scatterplot:

A

Criterion on the Y-axis; predictor on the X-axis

48
Q

Incremental Validity Calculation

A

Calculated by subtracting the base rate from the positive hit rate
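The subtraction can be sketched with hypothetical hiring counts (all names and numbers below are made up):

```python
# Incremental validity = positive hit rate - base rate. (Hypothetical counts.)
def incremental_validity(true_positives, total_positives,
                         successful_without_predictor, total_people):
    positive_hit_rate = true_positives / total_positives
    base_rate = successful_without_predictor / total_people
    return positive_hit_rate - base_rate

# 30 of 40 hired with the predictor succeed (.75); 50 of 100 succeed
# without it (.50), so the predictor adds .25 to decision-making accuracy
print(incremental_validity(30, 40, 50, 100))  # 0.25
```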

49
Q

Positive Hit Rate

A

Associated with incremental validity; the number of people hired using the predictor who are successful on the criterion (true positives / total positives); incremental validity is calculated by subtracting the base rate from the positive hit rate

50
Q

Base Rate

A

Associated with incremental validity; the number of people hired without the predictor who are successful on the criterion (number of successful individuals / total number of individuals); incremental validity is calculated by subtracting the base rate from the positive hit rate

51
Q

Specificity

A

Associated with incremental validity; refers to the identification of true negatives (percent of cases in the validation sample who do not have the disorder and were accurately classified by the test as not having the disorder)

52
Q

Sensitivity

A

Associated with incremental validity; refers to the probability that a predictor will correctly identify people with the disorder from the pool of people with the disorder. It is calculated using the following formula: true positives / (true positives + false negatives)

53
Q

Norm-Referenced Interpretation:

A

Compares an examinee’s obtained test score to scores obtained by a standardization sample or other comparison group; the raw score is converted to a score that indicates his/her relative standing in the comparison group, e.g., standard score, percentile rank, z-score, T-score, IQ

54
Q

Percentile Rank (4)

A
  1. Ranges from 1-99
  2. Indicates an examinee’s score in terms of the percentage of examinees who achieved lower scores
  3. Distribution is always flat (rectangular) regardless of the shape of the raw score distribution
  4. Maximizes differences in the middle of the raw score distribution and minimizes differences at the extremes
55
Q

Nonlinear Transformation:

A

Changes the shape of the original raw score distribution
Limitation: indicates an examinee’s relative position in a distribution but does not provide information about differences between examinees on raw scores

56
Q

Standard Scores:

A

Indicates the examinee’s relative standing in the comparison group in terms of standard deviations from the mean

57
Q

Z-scores, T-scores, and deviation IQ

A

• Z-score distribution has a mean of 0 and standard deviation of 1
o if an examinee obtains a score of 110 on a test that has a mean of 100 and standard deviation of 10, his/her z-score is +1.0
• T-score: mean of 50 and SD of 10
• Deviation IQ score: mean of 100 and SD of 15

58
Q

Z Scores

A

o calculated by subtracting the mean of the distribution from the examinee’s score to obtain a deviation score and dividing the deviation score by the distribution’s standard deviation
o if an examinee obtains a score of 110 on a test that has a mean of 100 and standard deviation of 10, his/her z-score is +1.0
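The conversions on these two cards can be sketched together (hypothetical helper names):

```python
# Converting a raw score to z, T, and deviation IQ standard scores.
def z_score(x, mean, sd):
    return (x - mean) / sd   # deviation score divided by the SD

def t_score(z):
    return 50 + 10 * z       # T distribution: mean 50, SD 10

def deviation_iq(z):
    return 100 + 15 * z      # deviation IQ: mean 100, SD 15

z = z_score(110, 100, 10)
print(z, t_score(z), deviation_iq(z))  # 1.0 60.0 115.0
```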

59
Q

Criterion-Referenced Interpretation

A

Involves interpreting an examinee’s score in terms of a predefined standard, e.g., a percent correct (percentage) score; a cutoff score is usually set; also used to interpret likely status on an external criterion using a regression equation or expectancy table
Ex. passing or failing a test

60
Q

Leptokurtic

A

Distribution of scores that is more peaked than the normal distribution

61
Q

Platykurtic

A

Distribution of scores that is flatter than the normal distribution

62
Q

Eigenvalue

A

Indicates the total amount of variability in a set of tests or other variables that is explained by an identified component or factor