Test construction Flashcards

1
Q

What is the item discrimination index?

A

The item discrimination index (D) indicates the difference between the percentage of examinees with high total test scores who answered the item correctly and the percentage of examinees with low total test scores who answered the item correctly. When the same percentage of examinees in the two groups answered the item correctly, D equals 0.
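
a minimal Python sketch with made-up item responses (how the upper and lower scoring groups are formed, e.g., top and bottom 27% of total scores, is a common convention not stated on this card):

# 1 = answered the item correctly, 0 = answered incorrectly (made-up data)
upper_group = [1, 1, 1, 0, 1, 1, 1, 1]   # examinees with high total test scores
lower_group = [0, 1, 0, 0, 1, 0, 0, 1]   # examinees with low total test scores

p_upper = sum(upper_group) / len(upper_group)   # 0.875
p_lower = sum(lower_group) / len(lower_group)   # 0.375
D = p_upper - p_lower                           # D = 0 when the two percentages are equal
print(D)                                        # 0.5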

2
Q

How can you increase the reliability coefficient?

A

Reliability coefficients tend to be larger for longer tests than for shorter tests, as long as the added items address content similar to the original items. They are also larger when the tryout sample is heterogeneous with regard to the content measured by the test, so that the range of scores is unrestricted.

-reliability is maximized when the range of scores is unrestricted; heterogeneous examinees produce a wider range of scores

-item difficulty also affects the range: if all items are very easy or very hard, test scores will all be high or all be low, so the average item difficulty should be in the mid-range

3
Q

Explain classical test theory

A

Classical test theory is also known as true score theory and predicts that an obtained test score (X) reflects a combination of true score (T) and measurement error (E), with measurement error referring to random factors that affect test performance in unpredictable ways.
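
a tiny simulation sketch (all numbers made up) of the X = T + E idea; it also shows that the share of obtained-score variance that is true-score variance is the reliability coefficient (the interpretation given on the next card):

import random

random.seed(0)
# obtained score = true score + random error (X = T + E)
true_scores = [random.gauss(100, 15) for _ in range(10000)]   # T
errors = [random.gauss(0, 5) for _ in range(10000)]           # E, random with mean 0
obtained = [t + e for t, e in zip(true_scores, errors)]       # X

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

reliability = variance(true_scores) / variance(obtained)   # about .90 = 15**2 / (15**2 + 5**2)
print(round(reliability, 2))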

4
Q

How do you interpret a reliability coefficient?

A

A reliability coefficient is interpreted directly as the amount of variability in test scores that’s due to true score variability. When a test’s reliability coefficient is .90, this means that 90% of variability in test scores is due to true score variability and the remaining 10% is due to measurement error.

5
Q

What is the Spearman-Brown formula used for?

A

Test length is one of the factors that affects the size of the reliability coefficient, and the Spearman-Brown formula is often used to estimate the effects of lengthening or shortening a test on its reliability coefficient. This formula is especially useful for correcting the split-half reliability coefficient: assessing split-half reliability involves splitting the test in half and correlating the two halves, each of which is only half as long as the full test. Therefore, split-half reliability tends to underestimate a test’s actual reliability, and the Spearman-Brown formula is used to estimate the reliability coefficient for the full-length test.
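
a sketch of the general Spearman-Brown formula, new r = (n)(old r) / [1 + (n - 1)(old r)], where n is the factor by which the test is lengthened (n = 2 corrects a split-half coefficient); the numbers below are made up:

def spearman_brown(r, n):
    """Estimate the reliability coefficient after lengthening a test by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.70, 2), 2))   # .82: split-half r of .70 corrected to full length
print(round(spearman_brown(0.50, 3), 2))   # .75: estimated r if a test with r = .50 is tripled in length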

6
Q

When is Cohen’s Kappa coefficient used?

A

The kappa coefficient is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale (e.g., when a rating scale classifies children as either meeting or not meeting the diagnostic criteria for ADHD).

used to evaluate inter-rater reliability

corrected for chance agreement between raters
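
a minimal sketch for two raters making a yes/no (nominal) classification, with made-up counts; kappa = (observed agreement - chance agreement) / (1 - chance agreement):

# made-up 2 x 2 table of ratings (rows = Rater 1, columns = Rater 2)
both_meet = 20    # both raters: "meets criteria"
r1_only = 5       # Rater 1 "meets", Rater 2 "does not"
r2_only = 10      # Rater 1 "does not", Rater 2 "meets"
neither = 65      # both raters: "does not meet"
n = both_meet + r1_only + r2_only + neither

p_observed = (both_meet + neither) / n            # .85 = simple percent agreement
p_chance = (((both_meet + r1_only) / n) * ((both_meet + r2_only) / n)
            + ((r2_only + neither) / n) * ((r1_only + neither) / n))
kappa = (p_observed - p_chance) / (1 - p_chance)  # about .62 after removing chance agreement
print(round(p_observed, 2), round(kappa, 2))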

7
Q

When do you use the Kuder-Richardson 20?

A

Kuder-Richardson 20 (KR-20) can be used to assess a test’s internal consistency reliability when test items are dichotomously scored (e.g., as correct or incorrect); it is an alternative to Cronbach’s alpha for dichotomous items
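
a sketch of the KR-20 computation, KR-20 = [k / (k - 1)][1 - (sum of pq) / (variance of total scores)], where p and q are the proportions passing and failing each item; the score matrix below is made up:

# rows = examinees, columns = items (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
]
n_examinees = len(scores)
k = len(scores[0])   # number of items

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

total_scores = [sum(row) for row in scores]
sum_pq = 0.0
for item in range(k):
    p = sum(row[item] for row in scores) / n_examinees   # proportion answering the item correctly
    sum_pq += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - sum_pq / variance(total_scores))
print(round(kr20, 2))   # about .69 for this made-up data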

8
Q

What is test reliability?

A

extent to which a test provides consistent info

r = reliability coefficient (a correlation coefficient)
-ranges from 0 to 1
-interpreted as the amount of variability in test scores that’s due to true score variability

-do NOT square this; interpret it as is

9
Q

formula to calculate standard error of measurement

A

SEM = (SD)(square root of 1 - r)

where SD = standard deviation of the test scores and r = reliability coefficient
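
a quick worked example with made-up numbers (SD = 15, r = .91):

sem = 15 * (1 - 0.91) ** 0.5   # SEM = (SD)(square root of 1 - r)
print(round(sem, 2))           # 4.5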

10
Q

how do you construct 68%, 95%, and 99% CIs?

A

from the person’s score add/subtract 1 SEM for 68% CI, 2 SEM for 95% CI, and 3 SEM for 99% CI
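
continuing the made-up SEM example from the previous card (obtained score = 100, SEM = 4.5):

score, sem = 100, 4.5
for n_sem, level in [(1, 68), (2, 95), (3, 99)]:
    print(f"{level}% CI: {score - n_sem * sem} to {score + n_sem * sem}")
# 68% CI: 95.5 to 104.5 / 95% CI: 91.0 to 109.0 / 99% CI: 86.5 to 113.5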

11
Q

what does squaring a correlation coefficient tell you?

A

a correlation coefficient can only be squared when it represents the correlation between two different tests (a reliability coefficient is interpreted as is, not squared)

when squared, it provides a measure of shared variability, i.e., the proportion of variability in one measure that is “accounted for by” or “explained by” the other

12
Q

What does Cronbach’s alpha measure?

A

internal consistency reliability

13
Q

What is the problem with split-half reliability?

A

for split-half reliability, you split the test in half, administer it, and then look at the correlation between the two halves

the problem is that shorter tests are less reliable than longer tests, so the split-half reliability coefficient underestimates the full test’s true reliability

this is corrected with the Spearman-Brown prophecy formula

14
Q

what is percent agreement?

A

used to assess inter-rater reliability for 2 or more raters; it does not take chance agreement into account and can overestimate reliability

Cohen’s kappa is preferred because it is corrected for chance agreement

15
Q

What are factors that affect the reliability coefficient?

A

content homogeneity- homogeneous item content leads to larger reliability coefficients

range of scores- reliability coefficients are larger when the range of test scores is unrestricted

guessing- the easier it is to guess an item’s correct answer, the lower the reliability coefficient

16
Q

What is item analysis used for in test construction?

A

to determine which items to include based on difficulty level and ability to discriminate between examinees who obtain high and low scores

17
Q

how is item difficulty determined

A

for dichotomous items, item difficulty (p) is the proportion of examinees who answered the item correctly; it ranges from 0 to 1, and smaller values indicate more difficult items
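
a tiny sketch with made-up responses to a single item:

responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # 1 = correct, 0 = incorrect
p = sum(responses) / len(responses)          # item difficulty (proportion correct)
print(p)                                     # 0.7; a smaller p means a more difficult item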

18
Q

What is item response theory?

A

an alternative to classical test theory: CTT is test-based, while IRT is item-based

IRT overcomes limitations of CTT and is better suited for developing computerized adaptive tests

19
Q

What is an item characteristic curve and what does it tell you?

A

tells you about the relationship between each item and the latent trait being measured by the test

x-axis = level of the latent trait (often estimated from total test scores)
y-axis = probability of answering the item correctly

location of the curve = difficulty parameter; items that are more likely to be answered correctly (easier items) have curves toward the left side of the graph, and items that are less likely to be answered correctly have curves toward the right side

slope of the curve = discrimination parameter; how well the item discriminates between individuals with high and low levels of the trait, with a steeper slope indicating better discrimination

point at which the curve crosses the y-axis = probability of guessing correctly; the closer it is to 0, the more difficult the item is to guess
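
a sketch of a three-parameter logistic (3PL) item characteristic curve, a standard IRT model that includes the difficulty (b), discrimination (a), and guessing (c) parameters this card describes; the parameter values below are made up:

import math

def icc_3pl(theta, a, b, c):
    """Probability of answering the item correctly at trait level theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# a = discrimination (slope), b = difficulty (location), c = guessing (lower asymptote)
for theta in (-2, 0, 2):
    print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.20), 2))
# the probability rises from near c at low trait levels toward 1.0 at high trait levels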

20
Q

What is content validity?

A

items of the test are a clearly representative sample of the domain being tested

21
Q

What is construct validity?

A

important for tests designed to measure a hypothetical trait that cannot be directly observed but is inferred from behavior

includes convergent and divergent (discriminant) validity

convergent- degree to which scores on the test have high correlations with scores on other measures designed to assess the same or a related construct

divergent- degree to which test scores have low correlations with measures of unrelated constructs

22
Q

What is the multitrait-multimethod matrix used for?

A

provides info about a test’s reliability and its convergent and divergent validity

the test and 3 other measures are administered: 1) a test assessing the same trait with a different method, 2) a test of an unrelated trait using the same method, and 3) a test of an unrelated trait using a different method

correlate all pairs of test scores and interpret

23
Q

how do you interpret the correlations from a multitrait-multimethod matrix

A

monotrait-monomethod- this is the reliability coefficient or coefficient alpha (correlating the test with itself)

monotrait-heteromethod- correlation between the new test and a test that measures the same trait with a different method; when this coefficient is large, it provides evidence for convergent validity

heterotrait-monomethod- correlation between the new test and a test of a different trait using the same method; a small correlation demonstrates divergent validity

heterotrait-heteromethod- correlation between the new test and a test that assesses an unrelated trait with a different method; a small correlation is evidence for divergent validity

24
Q

What is factor analysis used for?

A

to assess a test’s convergent and divergent validity

administer the test being developed as well as tests of similar and unrelated traits; correlate all pairs of scores, put them in a correlation matrix, derive a factor matrix from it, rotate the factor matrix, and then name and interpret the factors

the factor matrix has to be rotated so it can be interpreted more easily

25
Q

how do you interpret the factor loadings of a factor analysis?

A

factor loadings are correlation coefficients between each test and each factor identified by the statistical procedure

square each coefficient to determine how much variability in the test is explained by variability in the factor, and look at which factor each test loads on: a factor loading of .80 for Test A on Factor I means that .64 (64%) of the variance in Test A is accounted for by Factor I

communality column = amount of variability in each test that is explained by all of the identified factors. calculate this by squaring each of the test’s factor loadings and adding them: loadings of .80 and .10 give .64 + .01 = .65, so 65% of the variability in Test A scores is explained by Factors I and II
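
the card’s arithmetic as a short sketch (the .80 and .10 loadings belong to the hypothetical Test A in the example above):

loadings = {"Factor I": 0.80, "Factor II": 0.10}   # factor loadings for Test A

explained = {factor: round(r ** 2, 2) for factor, r in loadings.items()}
communality = sum(explained.values())
print(explained)              # {'Factor I': 0.64, 'Factor II': 0.01}
print(round(communality, 2))  # 0.65: 65% of the variability in Test A is explained by Factors I and II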

26
Q

What is criterion-related validity and what are concurrent and predictive validity?

A

criterion-related validity is important for tests that will be used to predict or estimate scores on another measure; for example, predictor = job knowledge test, criterion = measure of job performance

concurrent= scores on both the predictor and the criterion are obtained at around the same time; used when the predictor will be used to predict current status on the criterion

predictive validity= criterion scores are obtained later; used when the predictor will be used to predict future performance on the criterion (e.g., a test used to predict future job performance if the person is hired)

27
Q

interpreting criterion-related validity coefficient

A

-ranges from -1 to +1
-the closer it is to +/-1, the more accurately predictor scores predict criterion scores
-squaring the coefficient tells you the amount of variability shared by the two measures

28
Q

What are cross-validation and shrinkage?

A

the initial correlation coefficient between the predictor and criterion is likely an overestimate of the true correlation

when a test is cross-validated (validated on a new sample), the correlation coefficient is likely to be smaller because the chance factors that inflated the original coefficient are not likely to be present again; this decrease is called shrinkage

shrinkage is greatest when the initial sample is small and, for multiple correlation, when the number of predictors is large

29
Q

Interpreting standard error of the estimate

A

the SEE can be used to calculate a CI around a person’s predicted criterion score (just like the CI constructed with the SEM): SEE = (SD of criterion scores)(square root of 1 - r squared), where r = the criterion-related validity coefficient

the SEE ranges from 0 to the size of the criterion’s SD
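
a small worked example with made-up values (criterion SD = 10, criterion-related validity coefficient = .60):

see = 10 * (1 - 0.60 ** 2) ** 0.5   # SEE = (SD of criterion)(square root of 1 - r squared)
print(round(see, 1))                # 8.0
# approximate 95% CI around a predicted criterion score of 50: 50 +/- 2(SEE)
print(50 - 2 * see, 50 + 2 * see)   # roughly 34.0 and 66.0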

30
Q

What is incremental validity and how do you calculate it?

A

the increase in the accuracy of predictions about criterion performance that occurs when the new predictor is added to the existing prediction method

conduct a criterion-related validity study to see how many more accurate predictions are made using the new predictor compared to the old method

31
Q

how do you perform a criterion-related validity study?

A

administer the new predictor along with the old measure that is used to make hiring decisions; then, 3 months later, obtain criterion scores, set cutoff scores for the predictor and the criterion, and count how many employees fall into each category:

true positives- high scores on both the predictor and the criterion

false positives- high scores on the predictor, low scores on the criterion

true negatives- low scores on both the predictor and the criterion

false negatives- low scores on the predictor, high scores on the criterion

calculate incremental validity by subtracting the base rate from the positive hit rate:

positive hit rate (proportion of employees with high predictor scores who also have high criterion scores) MINUS base rate (number of employees with high criterion scores divided by the total number of employees)
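
the incremental validity arithmetic with made-up counts from such a study:

tp = 30   # true positives: high predictor, high criterion
fp = 10   # false positives: high predictor, low criterion
tn = 40   # true negatives: low predictor, low criterion
fn = 20   # false negatives: low predictor, high criterion
total = tp + fp + tn + fn

positive_hit_rate = tp / (tp + fp)     # .75: proportion of those selected by the predictor who succeed
base_rate = (tp + fn) / total          # .50: proportion of all employees with high criterion scores
print(positive_hit_rate - base_rate)   # 0.25 = incremental validity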

32
Q

how does changing the predictor cutoff score in a criterion-related validity study affect the number of true and false positives and negatives?

A

raising the predictor cutoff score will result in fewer people being hired and in fewer true and false positives and more true and false negatives

lowering the cutoff score will result in more people being hired and in more true and false positives and fewer true and false negatives

33
Q

What is diagnostic efficiency?

A

aka diagnostic validity or diagnostic accuracy; the ability of a test to accurately distinguish between people who do and do not have a disorder

34
Q

What are sensitivity and specificity? Hit rate?

A

sensitivity- proportion of people with the disorder who are correctly identified as having the disorder: TP / (TP + FN)

specificity- proportion of people without the disorder who are correctly identified as not having the disorder: TN / (TN + FP)

hit rate- overall correct classification rate; the proportion of all people correctly classified by the test: (TP + TN) / total
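
a short sketch with a made-up classification table (TP = true positive, FN = false negative, etc.):

tp, fn = 45, 5     # people with the disorder: correctly vs. incorrectly classified
tn, fp = 80, 20    # people without the disorder: correctly vs. incorrectly classified

sensitivity = tp / (tp + fn)                  # 0.9
specificity = tn / (tn + fp)                  # 0.8
hit_rate = (tp + tn) / (tp + fn + tn + fp)    # about .83 of all people correctly classified
print(sensitivity, specificity, round(hit_rate, 2))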

35
Q

what are positive predictive value and negative predictive value?

A

positive predictive value = probability that a person who tests positive for a disorder actually has the disorder
TP / (TP + FP)

negative predictive value = probability that a person who tests negative for a disorder actually does not have the disorder
TN / (TN + FN)

sensitivity and specificity do not vary from setting to setting, but positive and negative predictive values depend on the prevalence of the disorder in each setting
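
continuing the same made-up table from the previous card:

tp, fp, tn, fn = 45, 20, 80, 5

ppv = tp / (tp + fp)   # about .69: probability of actually having the disorder given a positive test
npv = tn / (tn + fn)   # about .94: probability of not having the disorder given a negative test
print(round(ppv, 2), round(npv, 2))
# with the same sensitivity and specificity but a lower prevalence in a new setting,
# PPV would decrease and NPV would increase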

36
Q

relationship between reliability and validity

A

a predictor’s reliability places a ceiling on its validity: the criterion-related validity coefficient can be no greater than the predictor’s reliability index (which is the square root of the predictor’s reliability coefficient)

if the reliability coefficient = .81, then the criterion-related validity coefficient can be no greater than the square root of .81, which is .90

37
Q

What are the SD equivalents for percentile ranks

A

2nd %ile = -2 SD
16th %ile = -1 SD
50th %ile = 0 SD
84th %ile = +1 SD
98th %ile = +2 SD

T score = mean of 50, SD of 10
z score = mean of 0, SD of 1
full scale IQ score = mean of 100, SD of 15
stanines = mean of 5, SD of 2
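
a sketch converting a z score to the other scales listed above (rounding stanines and capping them at 1-9 is the standard stanine convention, not stated on this card):

def convert(z):
    """Convert a z score to a T score, deviation IQ, and stanine."""
    t_score = 50 + 10 * z
    iq = 100 + 15 * z
    stanine = min(9, max(1, round(5 + 2 * z)))
    return t_score, iq, stanine

print(convert(0))      # (50, 100, 5)     -> 50th percentile
print(convert(1.0))    # (60.0, 115.0, 7) -> about the 84th percentile
print(convert(-2.0))   # (30.0, 70.0, 1)  -> about the 2nd percentile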