Test Construction Flashcards

1
Q

Classical Test Theory

A

*Assumes that variability in obtained test scores is due to (1) true score variability and (2) measurement error.

2
Q

Reliability Coefficient

A

Ranges from 0 to 1.0.
Indicates the proportion of variability in obtained test scores that is due to true score variability.

r = .80 (80% of the variability in scores is due to true score variability and 20% is due to measurement error).

.70 or higher is seen as minimally acceptable for most tests.

3
Q

Test-retest reliability

A

Provides information on the consistency of scores over time.

Administer the test at baseline and again at a later time, then correlate the two sets of scores.

*Useful for tests that measure characteristics that are stable over time.
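
As a rough illustration of correlating the two administrations, the sketch below computes a Pearson correlation by hand. The function name `pearson_r` and the score lists are made up for this example and are not part of the card.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for five examinees at baseline and at retest
baseline = [10, 12, 15, 18, 20]
retest = [11, 13, 14, 19, 21]
test_retest_r = pearson_r(baseline, retest)
print(round(test_retest_r, 2))
```

A value close to 1.0 would suggest that examinees kept roughly the same relative standing across the two administrations.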

4
Q

Alternate Forms Reliability

A

Provides information on the consistency of scores across different forms of the test and, when the second form is administered at a later time, over time as well.

*Used when a measure has more than one form.

5
Q

Internal Consistency Reliability

A

Provides information on the consistency of scores across different test items.

*Useful for tests that measure a single content domain.

Methods
-Coefficient alpha (administer the test to a sample of examinees and calculate the average inter-item consistency)
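
The card does not give a formula for coefficient alpha. The sketch below uses the standard variance-based form, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), which is an assumption added here; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one row per examinee, one column per item."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    sum_item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical data: three examinees, three items that agree perfectly
scores = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

With perfectly consistent items, as in this toy data, alpha reaches its maximum of 1.0; real tests fall below that.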

6
Q

Inter-Rater Reliability

A

*Useful for subjectively scored tests.

Provides information on the consistency of scores assigned by different raters.

Methods
-Percent agreement
-Cohen's kappa coefficient
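
The two methods can be sketched as follows. The kappa formula used here, (observed agreement - chance agreement) / (1 - chance agreement), is the standard definition and is not spelled out on the card; the function names and ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_1, rater_2):
    """Proportion of cases on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)

def cohens_kappa(rater_1, rater_2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    # Chance agreement: product of each rater's marginal proportions
    p_chance = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of ten cases by two raters
rater_a = ["Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N"]
rater_b = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "Y"]
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 2))
```

Here the raters agree on 8 of 10 cases, but half of that agreement would be expected by chance, so kappa is lower than the raw percent agreement.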

7
Q

Content Homogeneity

A

*Affects the size of the reliability coefficient

-Tests whose content is homogeneous tend to have larger reliability coefficients than tests whose content is heterogeneous.

8
Q

Range of scores

A

*Affects the size of the reliability coefficient

-Reliability coefficients are larger when the range of test scores is unrestricted.
-An unrestricted range occurs when the examinees included in the sample are heterogeneous with regard to the characteristic measured by the test (high, moderate, and low levels).

9
Q

Guessing

A

*Affects the size of the reliability coefficient

-The greater the likelihood that a test item can be answered correctly by guessing, the lower the reliability coefficient.

10
Q

Reliability index

A

*Derived from the reliability coefficient

-Theoretical correlation between true test scores and obtained test scores.

*Calculated as the square root of the reliability coefficient: when the reliability coefficient is .81, the reliability index is the square root of .81 = .90.

11
Q

Item Analysis

A

Used to determine which items to include in the test.

Used to determine each item's difficulty level and its ability to discriminate between examinees with high and low total test scores.

12
Q

Item Difficulty

A

P = proportion of examinees who answer the item correctly
-Calculated by dividing the number of correct responses by the total number of responses.

The value of P ranges from 0 to 1.0.

*Moderately difficult items (P = .30 to .70) are preferred for most tests.
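
A minimal sketch of the calculation, assuming responses are coded 1 for correct and 0 for incorrect; the function name `item_difficulty` and the response list are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (1 = correct)."""
    return sum(responses) / len(responses)

# Hypothetical responses from ten examinees to a single item
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = item_difficulty(responses)
print(p)  # 6 of 10 correct
```

A value of .60 falls within the moderate range of .30 to .70 noted above.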

13
Q

Item Discrimination

A

D ranges from -1.0 to +1.0.

Difference between the proportion of examinees with high total test scores who answered the item correctly and the proportion of examinees with low total test scores who answered it correctly.
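
The calculation can be sketched as below, assuming the examinees have already been split into a high-scoring and a low-scoring group (the card does not say how the groups are formed) and that responses are coded 1 for correct and 0 for incorrect. The function name and data are hypothetical.

```python
def item_discrimination(upper_group, lower_group):
    """D = proportion correct in the high group minus proportion correct in the low group."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Hypothetical responses to one item (1 = correct, 0 = incorrect)
high_scorers = [1, 1, 1, 1, 0]   # 80% correct
low_scorers = [1, 0, 0, 0, 0]    # 20% correct
d = item_discrimination(high_scorers, low_scorers)
print(round(d, 2))
```

A positive D means high scorers answered the item correctly more often than low scorers; a negative D means the opposite.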

14
Q

Standard error of measurement & confidence intervals

A

The standard error of measurement is used to construct a confidence interval around an examinee's obtained score.

-Calculated by multiplying the test's standard deviation by the square root of 1 minus the reliability coefficient.

68% confidence interval = add and subtract 1 standard error of measurement to and from the obtained score.

95% confidence interval = add and subtract 2 standard errors of measurement.

99% confidence interval = add and subtract 3 standard errors of measurement.

(Example: an examinee obtains a score of 90 on a test that has a standard error of measurement of 5, and you are asked to identify the 95% confidence interval for this score. To do so, you add and subtract 10 (two standard errors) to and from 90, which gives a 95% confidence interval of 80 to 100.)
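
The two steps can be sketched as follows. The function names are made up for this example, and the SD of 10 with a reliability of .75 is a hypothetical combination chosen only because it yields the standard error of 5 used in the example above.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

def confidence_interval(score, sem, n_sems):
    """Add and subtract n_sems standard errors (1 = 68%, 2 = 95%, 3 = 99%)."""
    return (score - n_sems * sem, score + n_sems * sem)

# Hypothetical test with SD = 10 and reliability = .75
sem = standard_error_of_measurement(10, 0.75)
print(sem)

# Obtained score of 90 with a 95% confidence interval (2 standard errors)
ci_95 = confidence_interval(90, sem, 2)
print(ci_95)
```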

15
Q

Item Response Theory

A

Item based
Focuses on responses to individual test items.

*Used to determine the probability of answering a test item correctly.

*Better suited than classical test theory for developing computerized tests.

16
Q

Content Validity

A

*Important for tests that are designed to measure one or more content or behavior domains.

*Asks whether the test items adequately sample all aspects of the domain.

17
Q

Construct Validity

A

*Important for tests designed to measure a hypothetical trait (motivation, extraversion, etc.).

*Established by obtaining evidence of convergent and divergent validity.

18
Q

Convergent Validity

A

Degree to which scores on the test have high correlations with scores on other similar measures.

19
Q

Divergent Validity

A

Degree to which scores on the test have low correlation with scores on measures of unrelated constructs.

20
Q

Multitrait-Multimethod Matrix

A

A table of correlation coefficients that provides information about a test's reliability and its convergent and divergent validity.

21
Q

Concurrent Validity

A

*A type of criterion-related validity

Evaluated by obtaining scores on the predictor and the criterion at about the same time.

22
Q

Predictive Validity

A

A type of criterion-related validity that is evaluated by obtaining scores on the predictor before obtaining scores on the criterion.

23
Q

Criterion-Related Validity Coefficient

A

Ranges from -1 to +1
(the closer the coefficient is to -1 or +1, the more accurately predictor scores will predict criterion scores)

-Can be squared to determine the proportion of variability in one measure that is explained by or shared with the other measure.

24
Q

Sensitivity

A

The proportion of people with the disorder who are identified by the test as having the disorder. It's calculated by dividing the true positives by the true positives plus the false negatives: TP / (TP + FN).

25
Q

Specificity

A

The proportion of people without the disorder who are identified by the test as not having the disorder. It's calculated by dividing the true negatives by the true negatives plus the false positives: TN / (TN + FP).

26
Q

Hit Rate

A

The proportion of people who are correctly categorized by the test. It's calculated by dividing the true positives plus the true negatives by the sample size: (TP + TN) / (TP + TN + FP + FN).
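
Sensitivity, specificity, and hit rate all come from the same four counts, so they can be sketched together. The function names and the counts below are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of people with the disorder whom the test identifies: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people without the disorder whom the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)

def hit_rate(tp, tn, fp, fn):
    """Proportion of all people the test categorizes correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical sample of 100 people: 50 with the disorder, 50 without
tp, fn = 40, 10   # people with the disorder: identified / missed
tn, fp = 45, 5    # people without the disorder: cleared / misidentified
print(sensitivity(tp, fn), specificity(tn, fp), hit_rate(tp, tn, fp, fn))
```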

27
Q

Z score

A

The distribution has a mean of 0 and a standard deviation of 1.0.

(Calculated by subtracting the mean score from the raw score and dividing by the SD.)

X (raw score) = 110
M (mean score) = 100
SD = 5

(110 - 100) / 5 = 2.0 (z score)
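
The same calculation as a short sketch; the function name `z_score` is made up for this example. Note that the subtraction has to be done before the division.

```python
def z_score(raw_score, mean, sd):
    """Number of standard deviations the raw score lies above (+) or below (-) the mean."""
    return (raw_score - mean) / sd

# Values from the card: raw score 110, mean 100, SD 5
z = z_score(110, 100, 5)
print(z)
```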

28
Q

T score

A

The distribution has a mean of 50 and a standard deviation of 10. When an examinee's raw score is equivalent to a T-score of 40, this means that his/her raw score is one standard deviation below the mean.
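
Because a T-score distribution has a mean of 50 and a standard deviation of 10, a z score converts to a T-score as T = 50 + 10z. The conversion below is a sketch; the function name `t_score` is hypothetical.

```python
def t_score(z):
    """Convert a z score to a T-score (mean 50, SD 10)."""
    return 50 + 10 * z

# One standard deviation below the mean
t = t_score(-1.0)
print(t)
```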

29
Q

Stanines

A

The distribution has a mean of 5 and a standard deviation of approximately 2; scores range from 1 to 9.

30
Q

Percentile rank (PR) equivalents in a normal distribution

A

Percentile Rank of 2 = -2 SD
Percentile Rank of 16 = -1 SD
Percentile Rank of 84 = 1 SD
Percentile Rank of 98 = 2 SD
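
These equivalents are rounded values of the cumulative normal distribution. A quick way to check them, using `NormalDist` from Python's standard library (the helper name `percentile_rank` is made up for this sketch):

```python
from statistics import NormalDist

def percentile_rank(z):
    """Percentage of a normal distribution falling below z, rounded to a whole number."""
    return round(100 * NormalDist().cdf(z))

for z in (-2, -1, 1, 2):
    print(z, percentile_rank(z))
```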