CH 5, 6 Flashcards
A criterion-referenced achievement test would be least useful for
(A) planning classroom instruction.
(B) analyzing individual achievement.
(C) comparing individuals’ performance in a group.
(D) determining minimum competency in a content area.
(C) comparing individuals’ performance in a group.
*A criterion-referenced achievement test determines whether students have mastered the subject matter; it is not designed for comparing individuals’ performance within a group, which is the purpose of a norm-referenced test.
What measurement scale is ethnicity?
(A) Nominal scale
(B) Ordinal scale
(C) Interval scale
(D) Ratio scale
(A) Nominal scale
*Ethnicity categorizes individuals into mutually exclusive groups. Thus, it is a nominal scale.
What measurement scale is household income in the unit of U.S. dollars?
(A) Nominal scale
(B) Ordinal scale
(C) Interval scale
(D) Ratio scale
(D) Ratio scale
*Household income is a ratio-scale variable because it has equal intervals and an absolute zero.
Scholastic aptitude tests are useful in schools because they
(A) can be used to predict achievement.
(B) provide a measure of academic achievement.
(C) provide a measure of ability uninfluenced by academic experience.
(D) are not influenced by subject’s motivation, home background, and so on.
(A) can be used to predict achievement.
*Scholastic aptitude tests measure the potential for learning a body of knowledge and can be used to predict future achievement.
Standardized and researcher-made tests share some of the same characteristics. Which of the following is not usually characteristic of a researcher-made test?
(A) the minimal influence of random errors of measurement
(B) the use of objective-type items
(C) the availability of norms for comparison
(D) the availability of raw scores which can be converted to percentile rank
(C) the availability of norms for comparison
*Norms for comparison are available for standardized tests, but not for researcher-made tests.
The ratings that three teachers made of the leadership ability of a particular high school senior agreed closely. This agreement among raters is referred to as interrater
(A) validity.
(B) reliability.
(C) objectivity.
(D) convergence.
(B) reliability.
*The close agreement or high correlation between raters is called interrater reliability.
A teacher reports having a kindergarten child who is withdrawn and does not interact with other students. What measurement tool would a researcher use to get a better grasp of the problem before suggesting behavior modification therapy?
(A) attitude scale
(B) direct observation
(C) personality inventory
(D) semantic differential
(B) direct observation
*Direct observation is the best measurement tool when we want to assess the degree to which the child interacts with the people around him or her.
Which one of the following statements would be most suitable for a Likert-type scale measuring students’ attitudes toward math?
(A) Math is easy for some students and difficult for others.
(B) Math is fun.
(C) Math is one of the basic skills.
(D) Some students like math.
(B) Math is fun.
*For measuring students’ attitudes toward math, the most suitable item among the four is “Math is fun.” A Likert item must state an opinion with which respondents can agree or disagree; the other options are statements of fact.
Predictive validity evidence ________.
(A) is the relationship between scores on a measure and criterion scores available at a future time.
(B) is evidence-based on internal structure
(C) is evidence-based on relationship to other variables
(D) is the relationship between two scores that measure the construct at the same time.
(A) is the relationship between scores on a measure and criterion scores available at a future time.
*Predictive validity evidence is the correlation between scores on a measure and criterion scores obtained at a later time, for example, aptitude test scores and subsequent course grades. When the criterion scores are collected at the same time, the evidence is called concurrent validity.
The standard error of measurement is based on the test’s
(A) validity.
(B) difficulty.
(C) reliability.
(D) discriminability.
(C) reliability.
*The standard error of measurement (SEM) is computed from the test’s reliability and standard deviation: SEM = SD√(1 − r). The higher the reliability, the smaller the SEM.
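The SEM formula can be sketched in Python; the standard deviation of 15 and reliability of 0.84 below are hypothetical values chosen for illustration:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD = 15, reliability = 0.84
print(round(standard_error_of_measurement(15.0, 0.84), 1))  # 6.0
```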
A test has a reliability coefficient of 0.84. What percent of test variance is error?
(A) 4%
(B) 16%
(C) 32%
(D) 84%
(B) 16%
*The proportion of error variance equals 1 minus the reliability coefficient: 1 − 0.84 = 0.16, or 16%.
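The arithmetic behind this card can be sketched in a few lines of Python:

```python
def error_variance_percent(reliability: float) -> int:
    """Percent of observed-score variance attributable to error:
    (1 - reliability) * 100, rounded to a whole percent."""
    return round((1.0 - reliability) * 100.0)

print(error_variance_percent(0.84))  # 16
```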
Adding 10 items similar to those already in the test would
(A) raise reliability.
(B) lower reliability.
(C) neither raise nor lower reliability.
(D) have an effect that cannot be determined.
(A) raise reliability.
*Usually a longer test with good items yields higher reliability.
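The effect of lengthening a test is usually estimated with the Spearman-Brown prophecy formula; a sketch, assuming a hypothetical 20-item test with reliability 0.80 extended by 10 similar items:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when a test is lengthened by `length_factor`
    using items of comparable quality (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

# Hypothetical: 20 items at r = 0.80, extended to 30 items (factor 1.5)
print(round(spearman_brown(0.80, 30 / 20), 3))  # 0.857
```

Because the projected reliability rises whenever the length factor exceeds 1, adding similar items raises reliability.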
Reliability is defined as the ratio between the variance of
(A) the true scores and observed scores.
(B) error scores and observed scores.
(C) two sets of scores from identical or equivalent tests.
(D) error scores and true scores.
(A) the true scores and observed scores.
*Reliability is defined as the ratio of true-score variance to observed-score variance.
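In classical test theory, observed-score variance is the sum of true-score and error variance, so the ratio can be sketched as follows (the variances of 84 and 16 are hypothetical):

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """Reliability = true-score variance / observed-score variance,
    where observed variance = true variance + error variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance

# Hypothetical variances: 84 (true) + 16 (error) = 100 (observed)
print(reliability(84.0, 16.0))  # 0.84
```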
The score validity of a test is related highly to the
(A) test format.
(B) number of items.
(C) purpose of the test.
(D) availability of equivalent forms.
(C) purpose of the test.
*Validity is highly related to what the test intends to measure, that is, the purpose of the test.
Which of the following would contribute the best evidence for the score validity of a new group intelligence test?
(A) The correlation between Form A and Form B of the test.
(B) The correlation between test scores and grades in reading.
(C) The correlation between test scores and scores from the Stanford-Binet intelligence test.
(D) An examination of the homogeneity of scores on the test.
(C) The correlation between test scores and scores from the Stanford-Binet intelligence test.
*The validity of an intelligence test can be established by the high correlation between the new test and an established IQ test. This is also referred to as criterion-related validity evidence.