Test Construction - Domain Quiz Flashcards
Content appropriateness, taxonomic level, and extraneous abilities are factors that are considered when evaluating:
Select one:
a. a test’s factorial validity.
b. a test’s incremental validity.
c. the relevance of test items.
d. the adequacy of the “actual criterion.”
In the context of test construction, relevance refers to the extent to which test items contribute to achieving the goals of testing.
Answer C is correct: Content appropriateness, taxonomic level, and extraneous abilities are three factors that may be considered when determining the relevance of test items.
Answer A is incorrect: Factorial validity refers to the extent to which a test has high correlations with factors it is expected to correlate with and low correlations with factors it is not expected to correlate with.
Answer B is incorrect: Incremental validity refers to the degree to which a test improves decision-making accuracy.
Answer D is incorrect: The actual criterion refers to the actual (versus ultimate) measure of performance.
The correct answer is: the relevance of test items.
For an achievement test item that has an item discrimination index (D) of +1.0, you would expect:
Select one:
a. high achievers to be more likely than low achievers to answer the item correctly.
b. low achievers to be more likely than high achievers to answer the item correctly.
c. moderate achievers to be more likely than high and low achievers to answer the item correctly.
d. low and high achievers to be equally likely to answer the item correctly.
The item discrimination index (D) is calculated by subtracting the percent of examinees in the lower-scoring group who answered the item correctly from the percent in the upper-scoring group who did so. It ranges in value from -1.0 to +1.0.
Answer A is correct: When all examinees in the upper-scoring group and none in the lower-scoring group answered the item correctly, D is equal to +1.0.
The correct answer is: high achievers to be more likely than low achievers to answer the item correctly.
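As a quick arithmetic check, here is a minimal Python sketch of the D calculation described above (the example proportions are hypothetical):

```python
# Item discrimination index: proportion correct in the upper-scoring
# group minus proportion correct in the lower-scoring group.
def discrimination_index(p_upper: float, p_lower: float) -> float:
    return p_upper - p_lower  # ranges from -1.0 to +1.0

print(discrimination_index(1.0, 0.0))  # +1.0: all high, no low achievers correct
print(discrimination_index(0.5, 0.5))  # 0.0: the item does not discriminate
```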
The item difficulty index (p) ranges in value from:
Select one:
a. -1.0 to +1.0.
b. -.50 to +.50.
c. 0 to +1.0.
d. 0 to 50.
The item difficulty index (p) indicates the proportion of examinees in the tryout sample who answered the item correctly.
Answer C is correct: The item difficulty index ranges in value from 0 to +1.0, with 0 indicating that none of the examinees answered the item correctly and +1.0 indicating that all examinees answered the item correctly.
The correct answer is: 0 to +1.0.
The optimal item difficulty index (p) for items included in a true or false test is:
Select one:
a. .25.
b. .50.
c. .75.
d. 1.0.
One factor that affects the optimal difficulty level of an item is the likelihood that an examinee can choose the correct answer by guessing, with the preferred level being halfway between 100% and the level of success expected by chance alone.
Answer C is correct: For true or false items, the probability of obtaining a correct answer by chance alone is .50. Therefore, the optimal difficulty level for true or false items is .75, which is halfway between 1.0 and .50.
The correct answer is: .75.
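A minimal sketch of the rule described above; the only assumption is that the chance success rate for the item format is known:

```python
# Optimal item difficulty: halfway between 1.0 and the chance success rate.
def optimal_difficulty(chance_rate: float) -> float:
    return (1.0 + chance_rate) / 2.0

print(optimal_difficulty(0.50))  # true/false items: 0.75
print(optimal_difficulty(0.25))  # 4-option multiple-choice items: 0.625
```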
The slope (steepness) of an item characteristic curve indicates the item's:
Select one:
a. difficulty level.
b. discrimination.
c. reliability.
d. validity.
The various item response theory models produce item characteristic curves that provide information on one, two, or three parameters – i.e., difficulty level, discrimination, and probability of guessing correctly. Additional information on the item characteristic curve is provided in the Test Construction chapter of the written study materials.
Answer B is correct: An item’s ability to discriminate between high and low achievers is indicated by the slope of the item characteristic curve – the steeper the slope, the greater the discrimination.
The correct answer is: discrimination.
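To see why the slope reflects discrimination, here is an illustrative sketch of a three-parameter logistic item characteristic curve; the parameter values (a = discrimination, b = difficulty, c = guessing) are hypothetical:

```python
import math

# Three-parameter logistic ICC: probability of a correct answer at ability theta.
def icc(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A larger discrimination parameter (a) produces a steeper curve around b:
for a in (0.5, 2.0):
    p_lo, p_hi = icc(-0.5, a, 0.0, 0.2), icc(0.5, a, 0.0, 0.2)
    print(f"a={a}: P rises from {p_lo:.2f} to {p_hi:.2f} across theta = -0.5 to 0.5")
```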
According to classical test theory, total variability in obtained test scores is composed of:
Select one:
a. true score variability plus random error
b. true score variability plus systematic error
c. a combination of communality and specificity
d. a combination of specificity and error
Answer A is correct: As defined by classical test theory, total variability in test scores is due to a combination of true score variability plus measurement (random) error - i.e., X = T + E.
The correct answer is: true score variability plus random error
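A minimal simulation of X = T + E with hypothetical parameters; it also shows that the reliability coefficient is the ratio of true score variance to observed score variance:

```python
import random
import statistics

random.seed(0)

# True scores and random (unsystematic) error are independent.
true_scores = [random.gauss(50, 10) for _ in range(100_000)]
errors      = [random.gauss(0, 5) for _ in range(100_000)]
observed    = [t + e for t, e in zip(true_scores, errors)]

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)
print(var_x, var_t + var_e)  # approximately equal: X-variance = T-variance + E-variance
print(var_t / var_x)         # about .80: the reliability coefficient
```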
A problem with using percent agreement as a measure of inter-rater reliability is that it doesn’t take into account the effects of:
Select one:
a. sample heterogeneity.
b. test length.
c. chance agreement among raters.
d. inter-item inconsistency.
Inter-rater reliability can be assessed using percent agreement or by calculating the kappa statistic.
Answer C is correct: A disadvantage of percent agreement is that it doesn’t take into account the amount of agreement that could have occurred among raters by chance alone, which can provide an inflated estimate of the measure’s reliability. The kappa statistic is more accurate because it adjusts the reliability coefficient for the effects of chance agreement.
The correct answer is: chance agreement among raters.
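A minimal sketch of the chance correction using Cohen's kappa; the ratings are hypothetical:

```python
from collections import Counter

# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
def cohens_kappa(r1: list, r2: list) -> float:
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (p_obs - p_chance) / (1 - p_chance)

r1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
r2 = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no"]
print(sum(a == b for a, b in zip(r1, r2)) / len(r1))  # percent agreement: .875
print(cohens_kappa(r1, r2))                           # kappa is lower: .75
```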
A researcher correlates scores on two alternate forms of an achievement test and obtains a reliability coefficient of .80. This means that ___% of observed test score variability reflects true score variability.
Select one:
a. 80
b. 64
c. 36
d. 20
Answer A is correct: A reliability coefficient is interpreted directly as the proportion of observed test score variability that reflects true score variability.
The correct answer is: 80
A test has a standard deviation of 12, a mean of 60, a reliability coefficient of .91, and a validity coefficient of .60. The test’s standard error of measurement is equal to:
Select one:
a. 12
b. 9.6.
c. 3.6.
d. 2.8.
To calculate the standard error of measurement, you need to know the standard deviation of the test scores and the test’s reliability coefficient.
Answer C is correct: The standard deviation of the test scores is 12 and the reliability coefficient is .91. To calculate the standard error, you multiply the standard deviation by the square root of one minus the reliability coefficient: 1 minus .91 is .09; the square root of .09 is .3; and .3 times 12 is 3.6. Additional information about the calculation and use of the standard error of measurement is provided in the Test Construction chapter of the written study materials.
The correct answer is: 3.6.
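The same calculation as a minimal Python sketch (the mean and the validity coefficient in the question are distractors):

```python
import math

# SEM = SD * sqrt(1 - reliability)
def standard_error_of_measurement(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1.0 - reliability)

print(round(standard_error_of_measurement(12, 0.91), 2))  # 3.6
```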
Consensual observer drift tends to:
Select one:
a. increase the probability of answering a test item correctly by chance alone.
b. decrease the probability of answering a test item correctly by chance alone.
c. produce an overestimate of a test’s inter-rater reliability.
d. produce an underestimate of a test’s inter-rater reliability.
Consensual observer drift occurs when two or more observers working together influence each other’s ratings on a behavioral rating scale so that they assign ratings in a similar idiosyncratic way.
Answer C is correct: Consensual observer drift makes the ratings of different raters more similar, which artificially increases inter-rater reliability.
The correct answer is: produce an overestimate of a test’s inter-rater reliability.
For a newly developed test of cognitive flexibility, coefficient alpha is .55. Which of the following would be useful for increasing the size of this coefficient?
Select one:
a. adding more items that are similar in terms of content and quality
b. adding more items that are similar in terms of quality but different in terms of content
c. reducing the heterogeneity of the tryout sample
d. using a true or false format for the items rather than a multiple-choice format
For the exam, you want to be familiar with the methods for increasing reliability that are described in the Test Construction chapter of the written study materials.
Answer A is correct: A test’s reliability is increased when the test is lengthened by adding items of similar content and quality, the range of scores is unrestricted (i.e., the tryout sample heterogeneity is maximized), and the ability to choose the correct answer by guessing is reduced.
The correct answer is: adding more items that are similar in terms of content and quality
Sally Student receives a score of 450 on a college aptitude test that has a mean of 500 and standard error of measurement of 50. The 68% confidence interval for Sally’s score is:
Select one:
a. 400 to 450.
b. 400 to 500.
c. 450 to 550.
d. 350 to 550.
The standard error of measurement is used to construct a confidence interval around an obtained test score.
Answer B is correct: To construct the 68% confidence interval, one standard error of measurement is added to and subtracted from the obtained score. Since Sally obtained a score of 450 on the test, the 68% confidence interval for her score is 400 to 500. Additional information on constructing confidence intervals is provided in the Test Construction chapter of the written study materials.
The correct answer is: 400 to 500.
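A minimal sketch of the interval arithmetic; the 1.96 multiplier for a 95% interval is shown for comparison:

```python
# 68% CI = obtained score +/- 1 SEM
score, sem = 450, 50
print((score - sem, score + sem))                # (400, 500)
print((score - 1.96 * sem, score + 1.96 * sem))  # 95% CI: (352.0, 548.0)
```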
The kappa statistic for a test is .95. This means that the test has:
Select one:
a. adequate inter-rater reliability.
b. adequate internal consistency reliability.
c. inadequate intra-rater reliability.
d. inadequate alternate forms reliability.
The kappa statistic (coefficient) is a measure of inter-rater reliability.
Answer A is correct: The reliability coefficient ranges in value from 0 to +1.0. Therefore, a kappa statistic of .95 indicates a high degree of inter-rater reliability.
The correct answer is: adequate inter-rater reliability.
To assess the internal consistency reliability of a test that contains 50 items that are each scored as either “correct” or “incorrect,” you would use which of the following?
Select one:
a. KR-20
b. Spearman-Brown
c. kappa statistic
d. coefficient of concordance
For the exam, you want to be familiar with all of the measures listed in the answers to this question.
Answer A is correct: The Kuder-Richardson Formula 20 (KR-20) is a measure of internal consistency reliability that can be used when test items are scored dichotomously (correct or incorrect).
Answer B is incorrect: The Spearman-Brown formula is used to estimate the effects of lengthening or shortening a test on its reliability.
Answer C is incorrect: The kappa statistic (also known as the kappa coefficient) is a measure of inter-rater reliability.
Answer D is incorrect: The coefficient of concordance is another measure of inter-rater reliability.
The correct answer is: KR-20
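For illustration, here is a minimal KR-20 sketch; the 0/1 response matrix (rows = examinees, columns = items) is hypothetical:

```python
import statistics

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
def kr20(responses):
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = statistics.pvariance(totals)
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / len(responses)
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

data = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(round(kr20(data), 3))  # 0.8 for this hypothetical sample
```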
To determine a test’s internal consistency reliability by calculating coefficient alpha, you would:
Select one:
a. administer the test to a single sample of examinees two times.
b. administer two alternate forms of the test to a single sample of examinees.
c. administer the test to a single sample of examinees and have the tests scored by two raters.
d. administer the test to a single sample of examinees one time.
Knowing that coefficient alpha is a measure of internal consistency reliability would have helped you identify the correct answer to this question.
Answer D is correct: Determining internal consistency reliability with coefficient alpha involves administering the test once to a single sample of examinees and using the formula to determine the degree of inter-item consistency.
Answer A is incorrect: Administering the same test to a single sample of examinees on two occasions would be the procedure for assessing test-retest reliability.
Answer B is incorrect: Administering two alternate forms of the test to a single sample of examinees is the procedure for assessing alternate (equivalent) forms reliability.
Answer C is incorrect: Having a test that was administered to a single sample of examinees scored by two raters is the procedure for assessing inter-rater reliability.
The correct answer is: administer the test to a single sample of examinees one time.
To estimate the effects of lengthening a 50-item test to 100 items on the test’s reliability, you would use which of the following?
Select one:
a. eta
b. KR-20
c. kappa coefficient
d. Spearman-Brown formula
For the exam, you want to be familiar with the measures listed in the answers to this question. These are described in the Test Construction chapter of the written study materials.
Answer D is correct: The Spearman-Brown prophecy formula is used to estimate the effects of lengthening or shortening a test on its reliability coefficient.
The correct answer is: Spearman-Brown formula
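A minimal sketch of the Spearman-Brown prophecy formula; the starting reliability of .60 is hypothetical:

```python
# r_new = (n * r) / (1 + (n - 1) * r), where n is the factor by which
# the test is lengthened (e.g., 100 items / 50 items = 2).
def spearman_brown(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

print(spearman_brown(0.60, 2))    # doubling the test: about .75
print(spearman_brown(0.60, 0.5))  # halving the test: about .43
```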
Which of the following methods for evaluating reliability is most appropriate for speed tests?
Select one:
a. split-half
b. coefficient alpha
c. kappa statistic
d. coefficient of equivalence
Answer D is correct: Of the methods for evaluating reliability, the coefficient of equivalence (also known as alternate or equivalent forms reliability) is most appropriate for speed tests. Split-half reliability and coefficient alpha are types of internal consistency reliability, and measures of internal consistency reliability overestimate the reliability of speed tests. The kappa statistic is a measure of inter-rater reliability.
The correct answer is: coefficient of equivalence
You administer a test to a group of examinees on April 1st and then re-administer the same test to the same group of examinees on May 1st. When you correlate the two sets of scores, you will have obtained a coefficient of:
Select one:
a. internal consistency.
b. determination.
c. equivalence.
d. stability.
Correlating two sets of scores obtained by the same group of examinees produces a test-retest reliability coefficient.
Answer D is correct: Test-retest reliability indicates the stability of scores over time, and the test-retest reliability coefficient is also known as the coefficient of stability.
The correct answer is: stability.
A test developer uses a sample of 50 current employees to identify items for and then validate a new selection test (predictor). When she correlates scores on the test with scores on a measure of job performance (criterion) for this sample, she obtains a criterion-related validity coefficient of .63. When the test developer administers the test and the measure of job performance to a new sample of 50 employees, she will most likely obtain a validity coefficient that is:
Select one:
a. greater than .63.
b. less than .63.
c. about .63.
d. negative in value.
This question is asking about “shrinkage,” which occurs when a test is cross-validated on another sample.
Answer B is correct: The validity coefficient tends to “shrink” (be smaller) on the second sample because the test was tailor-made for the initial sample and the chance factors that contributed to the validity coefficient in the initial sample will not all be present in the second sample.
The correct answer is: less than .63.
A test’s content validity is established primarily by which of the following?
Select one:
a. conducting a factor analysis
b. assessing the test’s convergent and divergent validity
c. having subject matter experts systematically review the test’s items
d. testing hypotheses about the attribute(s) measured by the test
For the exam, you want to be familiar with the differences between content, construct, and criterion-related validity.
Answer C is correct: Content validity refers to the degree to which test items are an adequate sample of the content domain and is determined primarily by the judgment of subject matter experts. The methods listed in the other answers are used to establish a test’s construct validity.
The correct answer is: having subject matter experts systematically review the test’s items
A test’s specificity refers to the number of __________ that were identified by the test.
Select one:
a. true positives
b. false positives
c. true negatives
d. false negatives
For the exam, you want to know the difference between specificity and sensitivity, which are terms that are used to describe a test’s accuracy.
Answer C is correct: Specificity refers to the identification of true negatives (percent of cases in the validation sample who do not have the disorder and were accurately classified by the test as not having the disorder). Additional information on sensitivity and specificity is provided in the Test Construction chapter of the written study materials.
Answer A is incorrect: Sensitivity refers to the number of true positives.
The correct answer is: true negatives
In a multitrait-multimethod matrix, a test’s construct validity would be confirmed when:
Select one:
a. monotrait-monomethod coefficients are low and heterotrait-heteromethod coefficients are high.
b. monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.
c. monotrait-monomethod coefficients are high and monotrait-heteromethod coefficients are low.
d. heterotrait-monomethod coefficients and heterotrait-heteromethod coefficients are low.
This question is asking about the pattern of correlation coefficients in a multitrait-multimethod matrix that provide evidence of a test’s construct validity.
Answer B is correct: When monotrait-heteromethod (same trait-different methods) coefficients are large, this provides evidence of the test’s convergent validity – i.e., it shows that the test is measuring the trait it was designed to measure. Conversely, when heterotrait-monomethod (different traits-same method) coefficients are small, this provides evidence of the test’s discriminant validity – i.e., it shows that the test is not measuring a different trait. Additional information on the correlation coefficients contained in a multitrait-multimethod matrix is provided in the Test Construction chapter of the written study materials.
The correct answer is: monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.
In a scatterplot constructed from data collected in a concurrent validity study, the number of “false negatives” is likely to increase if:
Select one:
a. the predictor and criterion cutoff scores are both raised.
b. the predictor and criterion cutoff scores are both lowered.
c. the predictor cutoff score is raised and/or the criterion cutoff score is lowered.
d. the predictor cutoff score is lowered and/or the criterion cutoff score is raised.
An illustration is provided in the Test Construction materials that can help you visualize what happens when the predictor and/or criterion cutoff scores are changed.
Answer C is correct: The number of false negatives increases when the predictor cutoff score is raised (moved to the right in a scatterplot) and/or when the criterion cutoff score is lowered (moved toward the bottom of the scatterplot).
The correct answer is: the predictor cutoff score is raised and/or the criterion cutoff score is lowered.
____________ refers to the percent of examinees who have the condition being assessed by a predictor who are identified by the predictor as having the condition.
Select one:
a. Specificity
b. Sensitivity
c. Positive predictive value
d. Negative predictive value
Answer B is correct: Sensitivity refers to the probability that a predictor will correctly identify people with the disorder from the pool of people with the disorder. It is calculated using the following formula: true positives / (true positives + false negatives).
The correct answer is: Sensitivity
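A minimal sketch of these accuracy indices using hypothetical validation-sample counts:

```python
# True/false positives and negatives from a hypothetical validation sample.
tp, fp, tn, fn = 40, 10, 35, 15

sensitivity = tp / (tp + fn)  # ~.73: cases with the condition flagged by the test
specificity = tn / (tn + fp)  # ~.78: cases without the condition cleared by the test
ppv = tp / (tp + fp)          # .80: positive predictive value
npv = tn / (tn + fn)          # .70: negative predictive value
print(sensitivity, specificity, ppv, npv)
```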
The results of a factor analysis indicate that Test A has a factor loading of .70 for Factor I and a factor loading of .20 for Factor II. Assuming that only two factors were extracted and that the factors are orthogonal, you can conclude that the communality for Test A scores is:
Select one:
a. 90%.
b. 53%.
c. 49%.
d. 4%.
Factor loadings are interpreted like correlation coefficients between a variable and a factor and are squared to obtain a measure of shared variability. When the factors are orthogonal (uncorrelated), the squared factor loadings can be added to obtain the communality.
Answer B is correct: The factor loading for Factor I is .70 and the factor loading for Factor II is .20: .70 squared is 49% and .20 squared is 4%, so the communality is 49% plus 4%, which equals 53%. This means that the total amount of variability in Test A scores explained by the factor analysis is 53%.
The correct answer is: 53%.
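The same arithmetic as a one-line check:

```python
# Communality for orthogonal factors = sum of squared factor loadings.
loadings = [0.70, 0.20]  # Test A's loadings on Factors I and II
print(round(sum(l ** 2 for l in loadings), 2))  # 0.49 + 0.04 = 0.53
```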
The standard error of estimate is used to:
Select one:
a. estimate the difference between an examinee’s obtained test score and his or her true test score.
b. estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.
c. determine the maximum a predictor’s validity coefficient can be given the reliabilities of the predictor and the criterion.
d. predict the probability that an examinee will obtain a particular score on one or more predictors.
For the exam, you want to be sure you know the difference between the standard error of measurement and the standard error of estimate.
Answer B is correct: The standard error of estimate is used to estimate the range within which an examinee’s true criterion score is likely to fall given his or her predicted score on the criterion.
The correct answer is: estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.
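A minimal sketch of the usual formula for the standard error of estimate; the criterion SD and the validity coefficient are hypothetical:

```python
import math

# SEE = criterion SD * sqrt(1 - validity coefficient squared)
def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    return sd_criterion * math.sqrt(1.0 - validity ** 2)

print(standard_error_of_estimate(10, 0.60))  # 10 * sqrt(.64) = 8.0
```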
To ascertain if the test you have developed is valid as a screening test for determining whether a person has an anxiety or affective disorder, you would be most interested in evaluating the test’s:
Select one:
a. content validity.
b. external validity.
c. concurrent validity.
d. differential validity.
This situation is analogous to using a predictor to estimate current performance on a criterion. The predictor, in this case, is the screening test, while the criterion is a diagnosis made using a method that is known to be accurate.
Answer C is correct: Concurrent validity is a type of criterion-related validity. It is used to establish validity when the purpose of the test is to estimate current status on a criterion. In this case, the criterion would be some method of diagnosis that is known to be accurate.
Answer A is incorrect: Content validity would be of interest when a test is designed to be a sample of a particular content domain.
Answer B is incorrect: External validity refers to the generalizability of research results and does not apply to this situation.
Answer D is incorrect: A test has differential validity when it has different validity coefficients for different groups. Differential validity is not relevant to this situation.
The correct answer is: concurrent validity.
To evaluate the concurrent validity of a new selection test for computer programmers, you would:
Select one:
a. use factor analysis to determine if the test measures the abilities it was designed to measure.
b. have subject matter experts review test items to ensure they are relevant to success as a computer programmer.
c. administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings.
d. administer the test to applicants for computer programmer jobs, hire all applicants regardless of their scores on the test, and correlate their test scores with job performance ratings they receive six months later.
Answer C is correct: Concurrent validity is a type of criterion-related validity and involves correlating scores on the predictor and criterion when both measures have been administered to examinees at about the same time.
The correct answer is: administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings
Validity is best described as:
Select one:
a. consistency.
b. accuracy.
c. distinctiveness.
d. stability.
A test is valid to the degree that it measures what it was designed to measure.
Answer B is correct: When a test is valid, it accurately measures the attribute(s) it was designed to measure.
Answer A is incorrect: Reliability is a measure of consistency.
The correct answer is: accuracy.
When conducting a factor analysis, you would choose an oblique rotation of the factors if:
Select one:
a. you are assessing the construct validity of a test designed to measure a single trait.
b. you believe that each test included in the analysis measures a different construct.
c. you believe the constructs measured by the tests included in the analysis are correlated.
d. you want to determine if a test has an adequate level of incremental validity.
For the exam, you want to know what “orthogonal” and “oblique” mean in the context of factor analysis.
Answer C is correct: In factor analysis, orthogonal means uncorrelated, while oblique means correlated. Therefore, you would conduct an oblique rotation if you believe the test you are validating measures constructs that correlate with the constructs measured by the other tests you’ve included in the analysis.
The correct answer is: you believe the constructs measured by the tests included in the analysis are correlated.
When determining a predictor’s incremental validity, the positive hit rate is calculated by:
Select one:
a. dividing the number of true positives by the total number of positives.
b. dividing the total number of positives by the number of people in the sample.
c. dividing the base rate by the number of true positives.
d. dividing the total number of positives by the base rate.
The positive hit rate is the proportion of people selected on the basis of their predictor scores who are successful on the criterion.
Answer A is correct: The positive hit rate is calculated by dividing the number of true positives by the total number of positives. The result indicates the percent of positives who were actually successful on the criterion.
The correct answer is: dividing the number of true positives by the total number of positives.
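A one-line check with hypothetical counts (this is the same ratio as the positive predictive value):

```python
# Positive hit rate = true positives / total positives.
true_positives, false_positives = 30, 10
print(true_positives / (true_positives + false_positives))  # 0.75
```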
Which of the following best defines the relationship between a predictor’s reliability coefficient and its criterion-related validity coefficient?
Select one:
a. A test’s validity coefficient cannot exceed its reliability coefficient.
b. A test’s validity coefficient cannot exceed the square root of its reliability coefficient.
c. A test’s validity coefficient cannot exceed the square of its reliability coefficient.
d. A test’s reliability coefficient cannot exceed its validity coefficient.
For the exam, you want to know that reliability places a ceiling on validity and be familiar with the formula for the relationship between reliability and validity so that you can answer questions like this one.
Answer B is correct: This answer describes the formula that defines the relationship between reliability and validity – i.e., a test’s validity coefficient cannot be greater than the square root of its reliability coefficient.
The correct answer is: A test’s validity coefficient cannot exceed the square root of its reliability coefficient.
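A minimal check of the ceiling, assuming a hypothetical reliability coefficient of .81:

```python
import math

# Maximum possible validity coefficient = sqrt(reliability coefficient).
reliability = 0.81
print(math.sqrt(reliability))  # 0.9: validity cannot exceed .90
```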
Your newly developed measure of integrity correlates highly with a well-known and widely used measure of integrity. This correlation provides evidence of your measure’s ________ validity.
Select one:
a. incremental
b. internal
c. discriminant
d. convergent
In this situation, one measure of a specific construct correlates highly with another measure of the same construct.
Answer D is correct: A high correlation between a new and an established measure of the same construct provides evidence of the new measure’s convergent validity (which, in turn, provides evidence of its construct validity).
Answer A is incorrect: Incremental validity is a measure of decision-making accuracy and is associated with criterion-related validity.
Answer B is incorrect: Internal validity is one of the standards used to evaluate research designs and is not relevant to the situation described in this question.
Answer C is incorrect: Discriminant validity (also known as divergent validity) refers to the extent to which a test does not correlate with measures of different constructs.
For additional information on convergent and discriminant validity, see the section on construct validity in the Test Construction chapter of the written study materials.
The correct answer is: convergent
Assuming a normal distribution, which of the following represents the highest score?
Select one:
a. a Z score of 1.5
b. a T score of 70
c. a WAIS Full Scale IQ score of 120
d. a percentile rank of 88
For the exam, you want to be familiar with the relationship of z scores, T scores, WAIS IQ scores, and percentile ranks in a normal distribution so that you can answer questions like this one.
Answer B is correct: A T score of 70 is two standard deviations above the mean.
Answer A is incorrect: A Z score of 1.5 is one and one-half standard deviations above the mean.
Answer C is incorrect: A WAIS IQ score of 120 is slightly over one standard deviation above the mean.
Answer D is incorrect: A percentile rank of 88 is slightly over one standard deviation above the mean.
The correct answer is: a T score of 70
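A minimal sketch that converts each option to a z score, using the standard scales (T scores: mean 50, SD 10; WAIS IQ: mean 100, SD 15):

```python
from statistics import NormalDist

scores = {
    "Z score of 1.5":     1.5,
    "T score of 70":      (70 - 50) / 10,
    "WAIS IQ of 120":     (120 - 100) / 15,
    "percentile rank 88": NormalDist().inv_cdf(0.88),
}
for label, z in scores.items():
    print(f"{label}: z = {z:.2f}")
# T = 70 gives the largest z (2.00), so it represents the highest score.
```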
Dina received a percentile rank of 48 on a test, while her twin brother, Dino, received a percentile rank of 98. Their teacher realizes she made an error in scoring their tests and adds four points to Dina’s and Dino’s raw scores. (The other students’ tests were scored correctly.) When she recalculates Dina’s and Dino’s percentile ranks, the teacher will find that:
Select one:
a. Dina’s percentile rank will change by more points than Dino’s.
b. Dino’s percentile rank will change by more points than Dina’s.
c. Dina’s and Dino’s percentile ranks will change by the same number of points.
d. Dina and Dino’s percentile ranks will not change.
As described in the Test Construction chapter of the written study materials, percentile ranks maximize differences in the middle of the raw score distribution and minimize differences at the extremes.
Answer A is correct: This general rule means that Dina’s percentile rank (which is near the middle of the distribution) will be affected more by the four-point addition to her raw score than Dino’s percentile rank (which is extremely high).
The correct answer is: Dina’s percentile rank will change by more points than Dino’s.
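A minimal sketch of this compression effect, assuming normally distributed raw scores with a hypothetical mean of 100 and SD of 10:

```python
from statistics import NormalDist

dist = NormalDist(100, 10)
for raw in (99.5, 120):  # roughly the 48th and 98th percentiles
    before = dist.cdf(raw) * 100
    after = dist.cdf(raw + 4) * 100
    print(f"raw {raw} -> {raw + 4}: percentile {before:.0f} -> {after:.0f}")
# The mid-range score gains about 16 percentile points; the extreme score gains about 1.
```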
Eigenvalues are associated with:
Select one:
a. internal consistency reliability.
b. criterion-referenced interpretation.
c. the multitrait-multimethod matrix.
d. principal components analysis.
An eigenvalue indicates the total amount of variability in a set of tests or other variables that is explained by an identified component or factor.
Answer D is correct: Eigenvalues can be calculated for each component “extracted” in a principal components analysis. Additional information on eigenvalues and principal components analysis is provided in the Test Construction chapter of the written study materials.
The correct answer is: principal components analysis.
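An illustrative sketch using a hypothetical correlation matrix (requires numpy); each eigenvalue is the amount of variance accounted for by one component:

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted largest first
print(eigenvalues)        # the first component explains the most variance
print(eigenvalues.sum())  # eigenvalues sum to the number of variables (3.0)
```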
Stanford-Binet and Wechsler IQ scores are:
Select one:
a. percentile ranks.
b. ipsative scores.
c. standard scores.
d. stanine scores.
Standard scores report an examinee’s performance in terms of standard deviations from the mean.
Answer C is correct: Stanford-Binet and Wechsler IQ scores are standard scores that indicate an examinee’s performance in terms of standard deviations from the mean obtained by examinees in the norm group.
The correct answer is: standard scores.
Which of the following scores is NOT a norm-referenced score?
Select one:
a. percentile rank
b. T-score
c. pass or fail
d. grade-equivalent scores
When using norm-referenced interpretation, an examinee’s score indicates how well he or she did on the test relative to examinees in the norm group.
Answer C is correct: Pass or fail is a criterion-referenced score. It indicates whether a person has or has not mastered the test content and does not measure performance in terms of other examinees. A “pass” score obtained by one examinee does not indicate how many other examinees passed or failed.
Answer A is incorrect: Percentile ranks are norm-referenced scores. A percentile rank indicates the percent of examinees in the norm group who obtained a lower score.
Answer B is incorrect: A T-score is a type of standard score, and standard scores are norm-referenced scores that indicate how well an examinee did in terms of standard deviation units from the mean score of the norm group.
Answer D is incorrect: A grade-equivalent score is a norm-referenced score. It allows a test user to compare an examinee’s test performance to that of students in different grade levels.
The correct answer is: pass or fail
Zelda Z. obtains a score of 41 on a test that has a mean of 50 and a standard deviation of 6. If all of the scores in the distribution are transformed so that the test now has a mean of 100 and a standard deviation of 12, Zelda’s score in the new distribution would be:
Select one:
a. 91
b. 82
c. 41
d. 20.5.
To identify the correct answer to this question, you have to recognize that Zelda’s original score was 1-1/2 standard deviations below the mean.
Answer B is correct: A score of 82 is 1-1/2 standard deviations below the mean of the new distribution and, therefore, equivalent to a score of 41 in the original distribution.
The correct answer is: 82
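The same transformation as a minimal sketch: convert the score to a z score, then re-express it in the new metric:

```python
def rescale(score, old_mean, old_sd, new_mean, new_sd):
    z = (score - old_mean) / old_sd  # 41 is 1.5 SDs below the mean: z = -1.5
    return new_mean + z * new_sd

print(rescale(41, 50, 6, 100, 12))  # 82.0
```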
A test developer is concerned that her newly developed test of academic achievement has a limited floor. Therefore, she would be best advised to increase the proportion of items in the test that have an item difficulty index (p) of:
a. .80 to .95
b. .15 to .30
c. 0.
d. -.95 to -1.0
Correct answer: .80 to .95
a. .80 to .95: easy items.
b. .15 to .30: hard items.
c. 0: no one answered the item correctly.
An item discrimination index (D) of ____ indicates that the item was answered correctly by more low achieving examinees than high-achieving examinees.
a. +1.0
b. +.50
c. 0
d. -.50
b. +.50: answered correctly by more high-achieving than low-achieving examinees.
c. 0: answered correctly equally often by high- and low-achieving examinees (doesn't discriminate between the two groups).
d. -.50: answered correctly by more low-achieving than high-achieving examinees.
Correct answer: D
A kappa coefficient of .94 indicates:
a. a low level of alternate forms reliability
b. a low level of item discrimination
c. an acceptable level of internal consistency reliability
d. an acceptable level of inter-rater reliability
Correct Answer: D
It indicates a HIGH LEVEL of inter-rater reliability.
A test designed to measure knowledge of test construction is likely to have the (LOWEST) reliability coefficient when the test consists of ____ items and the tryout sample consists of examinees who are ______ with regard to their knowledge of test construction.
a. 40; heterogeneous
b. 40; homogeneous
c. 80; heterogeneous
d. 80; homogeneous
Correct Answer: B
b. A shorter test and a homogeneous tryout sample produce the lowest reliability coefficient.
When a test’s reliability coefficient is equal to 0, the standard error of measurement for the test:
a. is equal to the test’s mean.
b. is equal to 1.
c. is equal to the test’s standard deviation.
d. cannot be determined.
Correct Answer: C
MEMORISE FORMULA: SEM = SD x square root of (1 - reliability coefficient).
Ex. What is the SEM when the reliability coefficient is equal to 1?
Answer: 0 (a perfectly reliable test has no measurement error, just as a reliability of 0 makes the SEM equal to the test's standard deviation).
In a multitrait-multimethod matrix, a large monotrait heteromethod coefficient provides evidence that the test being validated has:
a. adequate divergent validity
b. adequate convergent validity
c. inadequate divergent validity
d. inadequate convergent validity
Correct answer: B
b. adequate convergent validity
Same Trait / Different Methods
You want this coefficient to be large.
In factor analysis, a communality indicates the proportion of variance accounted for in:
a. a single variable by a single factor
b. multiple variables by a single factor
c. a single variable by all of the identified factors.
d. multiple variables by common error.
Correct answer: C
An educational psychologist designs a screening test to identify underachieving first- and second-grade students who may have a learning disability. The psychologist will be most concerned that her test has adequate _____ validity.
a. content
b. construct
c. concurrent
d. predictive
Correct answer: C (concurrent)
If they were asking about learning disability in the future, the correct answer would have been “predictive.”
A personnel director uses an assertiveness test to hire salespeople. However, several of the people who are hired based on their test results turn out to be less than adequate performers. These individuals are:
a. false positives
b. false negatives
c. true positives
d. true negatives
Correct answer: A
a. False positives are individuals who are predicted by the test to do well but who do poorly on the criterion.