Assessment Flashcards
If the mean of a set of test scores is 50 and the standard deviation is 5, which of the following has the highest magnitude (assuming that the scores are normally distributed)? A. A raw score of 60 B. A stanine score of 8 C. A z-score of +3.0 D. A T-score of 70
C. A z-score of +3.0
Explain z-scores.
A z-score directly indicates how many standard deviation units a score falls above or below the mean. For example, a z-score of +1.0 means that the score is one standard deviation above the mean and a z-score of -1.0 indicates that the score is one standard deviation below the mean.
Explain stanines.
Stanines have a mean of 5 and a standard deviation of approximately 2. Thus, a stanine score of 8 is 1.5 standard deviations above the mean.
Explain T-scores.
T-scores have a mean of 50 and a standard deviation of 10. Thus, a T-score of 70 is 2 standard deviations above the mean.
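The comparison in the first card can be checked by putting every option on the z-score scale. The sketch below is a minimal illustration, assuming the means and standard deviations given in these cards (the stanine standard deviation is taken as exactly 2, although the cards say "approximately"); the helper name `to_z` is made up for this example.

```python
def to_z(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

# Each answer option as (score, mean, sd) on its own scale.
options = {
    "raw score of 60": (60, 50, 5),    # test mean 50, SD 5
    "stanine of 8": (8, 5, 2),         # assumption: stanine SD treated as exactly 2
    "z-score of +3.0": (3.0, 0, 1),    # already a z-score
    "T-score of 70": (70, 50, 10),
}

z_scores = {name: to_z(*params) for name, params in options.items()}
highest = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:+.1f}")
print("highest:", highest)
```

On this common scale the raw score and the T-score both sit 2 standard deviations above the mean, the stanine sits 1.5 above, and the z-score of +3.0 is the largest.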
What has a mean of 5 and a standard deviation of approximately 2?
Stanines
What has a mean of 50 and a standard deviation of 10?
T-score
Aptitude tests are to achievement tests as _______ is to ability to perform a task. A. General ability B. Ability to learn a task C. Previous learning D. Any of the above
B. Ability to learn a task
What does an aptitude test measure?
An individual’s potential capacity for future learning
What does an achievement test measure?
What an individual has already learned (i.e., her/his developed capacity)
A person got a score of 85 on a norm-referenced test. This means that the person:
A. Insufficient information
B. Mastered 85% of the material covered in the test
C. Achieved a score better than that of 83% of those taking the test
D. Answered 83 questions correctly
A. Insufficient information
Person A and Person B both took the same test. Person A got a score of 100 while Person B got a score of 75. In order for a counselor to determine whether the difference between their scores was because of "chance," the counselor would need to know which of the following characteristics of the test: A. Standard error of measurement B. Standard error of the mean C. Mean D. Standard deviation
A. Standard error of measurement
If internal consistency is of concern, what reliability coefficient will most likely be used? A. A coefficient of equivalence B. A coefficient of stability C. A coefficient of determination D. Coefficient alpha
D. Coefficient alpha
When is a coefficient alpha used?
To determine a test’s internal consistency reliability by giving a test once to a single group of examinees. A special formula is used to determine the degree of inter-item consistency.
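The "special formula" is usually written as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is one way to compute it, assuming population variances and a small made-up response matrix; the function name `coefficient_alpha` and the data are illustrative only.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows (one score per item)."""
    k = len(scores[0])                     # number of items
    items = list(zip(*scores))             # transpose: one tuple per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 examinees x 4 items, scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = coefficient_alpha(responses)
print(round(alpha, 2))
```

Because these example items are scored correct/incorrect and population variances are used, the same calculation matches what KR-20 would give for this data.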
Define a coefficient of equivalence.
An alternate-forms reliability coefficient, obtained by administering 2 equivalent forms of a test to the same group of examinees at about the same time and correlating the two sets of scores
Define a coefficient of stability.
A test-retest reliability coefficient, obtained by administering the same test to the same group of examinees on 2 different occasions and correlating the two sets of scores
Define the coefficient of determination.
The squared correlation between X and Y; it indicates how much of the variability in Y is accounted for by the variability in X
To determine a test's split-half reliability for items that are scored on a correct/incorrect scale, one would use which formula: A. Coefficient alpha B. Spearman-Brown prophecy formula C. Kuder-Richardson Formula 20 (KR-20) D. Pearson r
C. Kuder-Richardson Formula 20 (KR-20)
What is the Spearman-Brown prophecy formula used for?
To correct the split-half reliability coefficient by estimating what the reliability coefficient would have been had it been based on the full length of the test
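The correction is commonly written as r_full = (n * r_half) / (1 + (n - 1) * r_half), where n is the factor by which the test is lengthened (n = 2 when going from a half-test to the full test). A minimal sketch, assuming a made-up half-test correlation of 0.60; the function and parameter names are illustrative.

```python
def spearman_brown(r_half, length_factor=2):
    """Estimate reliability when test length is multiplied by `length_factor`."""
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)

r_half = 0.60                      # hypothetical correlation between two half-tests
r_full = spearman_brown(r_half)    # estimate for the full-length test
print(round(r_full, 2))            # 0.75
```

The estimate for the full test is higher than the half-test correlation, which reflects the general rule that longer tests tend to be more reliable.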
What is the Pearson r and when is it used?
It is a correlation coefficient that can be used when both variables have been measured on an interval or ratio scale
A test of computer skills has a reliability coefficient of 0.75, a mean of 100, and a variance of 16. What is the test's standard error of measurement? A. 2 B. 4 C. 8 D. 16
A. 2
What is the formula for calculating standard error of measurement?
Standard error of measurement (SEM) = (standard deviation of test scores) multiplied by (the square root of [1 minus the reliability coefficient])
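Applied to the computer skills card above: the variance is 16, so the standard deviation is 4, and the SEM is 4 times the square root of (1 - 0.75). A short sketch of that arithmetic, with an illustrative function name:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

variance = 16
sd = math.sqrt(variance)                        # standard deviation = 4.0
sem = standard_error_of_measurement(sd, 0.75)   # 4.0 * sqrt(0.25)
print(sem)                                      # 2.0
```

The mean of 100 plays no part in the calculation; it is a distractor in the question.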
You develop a test of “common-sense intelligence” and correlate scores on it with another test that purports to measure academic intelligence. Apparently, you wish to:
A. Obtain a high correlation to gain evidence of the test’s convergent validity
B. Obtain a low correlation to gain evidence of the test’s convergent validity
C. Obtain a high correlation to gain evidence of the test’s divergent validity
D. Obtain a low correlation to gain evidence of the test’s divergent validity
D. Obtain a low correlation to gain evidence of the test’s divergent validity
Which item difficulty level is associated with the greatest differentiation between examinees of high and low ability? A. +1.0 B. 0.50 C. 0.01 D. -0.50
B. 0.50
What does an item difficulty level of 0.50 mean?
It means that 50% of examinees answered the question correctly. Ideally, it will be those examinees with the greatest ability who answered the question correctly and those with the lowest ability who answered it incorrectly. You are best able to differentiate those with low ability from those with high ability with a difficulty level of 0.50.
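One common way to see why 0.50 differentiates best: the difficulty index p is the proportion answering correctly, and the variance of a correct/incorrect item is p(1 - p), which peaks at p = 0.50. The sketch below assumes made-up responses, and the function names are illustrative.

```python
def item_difficulty(responses):
    """Proportion of examinees answering correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def item_variance(p):
    """Variance of a dichotomous item with difficulty level p."""
    return p * (1 - p)

# Hypothetical responses of 8 examinees to one item.
responses = [1, 0, 1, 1, 0, 0, 1, 0]
p = item_difficulty(responses)

# Compare how much score spread different difficulty levels can produce.
variances = {level: item_variance(level) for level in (0.01, 0.25, 0.50, 0.75, 1.0)}
best = max(variances, key=variances.get)

print(p)      # 0.5
print(best)   # 0.5
```

An item everyone answers correctly (p = 1.0) has zero variance and so cannot separate examinees at all.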
Appraisal can be defined as
a. the process of assessing or estimating attributes.
b. testing which is always performed in a group setting.
c. testing which is always performed on a single individual.
d. a pencil and paper measurement of assessing attributes.
a. the process of assessing or estimating attributes.
A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) _______ format.
a. subjective.
b. objective.
c. very precise.
d. concise.
a. subjective.
The National Counselor Exam (NCE) is a(n) _______ test because the scoring procedure is specific.
a. subjective.
b. objective.
c. projective.
d. subtest.
b. objective.
A short answer test is a(n) _______ test.
a. objective.
b. culture free.
c. forced choice.
d. free choice.
d. free choice.
The NCE is a(n) _______ test.
a. free choice.
b. forced choice.
c. projective.
d. intelligence.
b. forced choice.
The _______ index indicates the percentage of individuals who answered each item correctly.
a. difficulty.
b. critical.
c. intelligence.
d. personal.
a. difficulty.
Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called _______ items.
a. vague.
b. subjective.
c. recognition.
d. numerical.
c. recognition.
A true/false test has _______ recognition items.
a. similar.
b. free choice.
c. dichotomous.
d. no.
c. dichotomous.
A test format could be normative or ipsative. In the normative format
a. each item depends on the item before it.
b. each item depends on the item after it.
c. the client must possess an IQ within the normal range.
d. each item is independent of all other items.
d. each item is independent of all other items.
A client who takes a normative test
a. cannot legitimately be compared to others who have taken the test.
b. can legitimately be compared to others who have taken the test.
c. could not have taken an IQ test.
d. could not have taken a personality test.
b. can legitimately be compared to others who have taken the test.
In an ipsative measure the person taking the test must compare items to one another. The result is that
a. an ipsative measure cannot be utilized for career guidance.
b. you cannot legitimately compare two or more people who have taken an ipsative test.
c. an ipsative measure is never valid.
d. an ipsative measure is never reliable.
b. you cannot legitimately compare two or more people who have taken an ipsative test.
Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be
a. a power test.
b. neither a speed test nor a power test.
c. a speed test.
d. a fine example of an ipsative measure.
c. a speed test.
A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer the questions.
a. This is most likely a projective measure.
b. This is most likely a speed test.
c. This is most likely a power test.
d. This is most likely an invalid measure.
c. This is most likely a power test.
An achievement test measures maximum performance while a personality test or interest inventory measures
a. typical performance.
b. minimum performance.
c. unconscious traits.
d. self-esteem by always relying on a Q-Sort design.
a. typical performance.
In a spiral test
a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. the client must answer each question in a specified period of time.
d. the items get progressively more difficult.
d. the items get progressively more difficult.
In a cyclical test
a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. you have several sections which are spiral in nature.
d. the client must answer each question in a specified period of time.
c. you have several sections which are spiral in nature.
A test battery is considered
a. a horizontal test.
b. a vertical test.
c. a valid test.
d. a reliable test.
a. a horizontal test.
In a counseling research study two groups of subjects took a test with the same name. However, when they talked with each other they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible?
a. The researcher is not telling the truth. The groups could not possibly have taken the same test.
b. The test was horizontal.
c. The test was not a power test.
d. The researcher gave parallel forms of the same test.
d. The researcher gave parallel forms of the same test.
The most critical factors in test selection are
a. the length of the test and the number of people who took the test in the norming process.
b. horizontal versus vertical.
c. validity and reliability.
d. spiral versus cyclical format.
c. validity and reliability.
Which is more important, validity or reliability?
a. Reliability.
b. They are equally important.
c. Validity.
d. It depends on the test in question.
c. Validity.
In the field of testing, validity refers to
a. whether the test really measures what it purports to measure.
b. whether the same test gives consistent measurement.
c. the degree of cultural bias in a test.
d. the fact that numerous tests measure the same traits.
a. whether the test really measures what it purports to measure.
A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor
a. is interested in reliability.
b. is interested in validity.
c. is looking for information which is not available.
d. is magnifying an unimportant issue.
a. is interested in reliability.
Which measure would yield the highest level of reliability?
a. The TAT, a projective test popular with psychodynamic helpers.
b. The WAIS-III, a popular IQ test.
c. The MMPI-2, a popular personality test.
d. A very accurate scale.
d. A very accurate scale.
Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be
a. height.
b. weight.
c. ego strength.
d. the ability to name all men who have served as U.S. presidents.
c. ego strength.
Face validity refers to the extent that a test
a. looks or appears to measure the intended attribute.
b. measures a theoretical construct.
c. appears to be constructed in an artistic fashion.
d. can be compared to job performance.
a. looks or appears to measure the intended attribute.
A job test which predicted future performance on a job very well would
a. have high criterion/predictive validity.
b. have excellent face validity.
c. have excellent construct validity.
d. not have incremental validity or synthetic validity.
a. have high criterion/predictive validity.
A new IQ test which yielded results nearly identical to other standardized measures would be said to have
a. good concurrent validity.
b. good face validity.
c. superb internal consistency.
d. all of the above.
a. good concurrent validity.
When a counselor tells a client that the Graduate Record Examination (GRE) will predict her ability to handle graduate work, the counselor is referring to
a. good concurrent validity.
b. construct validity.
c. face validity.
d. predictive validity.
d. predictive validity.
A reliable test is _______ valid.
a. always.
b. 90%.
c. not always.
d. 80%.
c. not always.
A valid test is _______ reliable.
a. not always.
b. always.
c. never.
d. 80%.
b. always.
One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called
a. test–retest reliability.
b. equivalent forms reliability.
c. alternate forms reliability.
d. the split-half method.
a. test–retest reliability.
One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as
a. test–retest reliability.
b. equivalent or alternate forms reliability.
c. the split-half method.
d. internal consistency.
b. equivalent or alternate forms reliability.
A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor
a. used an invalid procedure to test reliability.
b. was testing reliability via the split-half method.
c. was testing reliability via the equivalent forms method.
d. was testing reliability via the inter-rater method.
b. was testing reliability via the split-half method.
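The odd/even procedure in this card can be sketched as follows: score the odd-numbered and even-numbered items separately for each examinee, then correlate the two sets of half-test scores. The response matrix is made up for illustration, and `pearson_r` and `split_half_scores` are hypothetical helper names.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def split_half_scores(row):
    """Return (odd-item total, even-item total) for one examinee."""
    odd = sum(row[0::2])    # items 1, 3, 5, ...
    even = sum(row[1::2])   # items 2, 4, 6, ...
    return odd, even

# Hypothetical data: 5 examinees x 6 items, scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

halves = [split_half_scores(row) for row in responses]
odd_scores = [odd for odd, _ in halves]
even_scores = [even for _, even in halves]

r_half = pearson_r(odd_scores, even_scores)
print(round(r_half, 2))
```

The resulting correlation describes a test only half as long as the original, which is why the Spearman-Brown prophecy formula is applied afterward.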
Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?
a. test–retest.
b. alternate forms.
c. split-half.
d. interrater/interobserver.
d. interrater/interobserver.
A reliability coefficient of 1.00 indicates
a. a lot of variance in the test.
b. a score with a high level of error.
c. a perfect score which has no error.
d. a typical correlation on most psychological and counseling tests.
c. a perfect score which has no error.
An excellent psychological or counseling test would have a reliability coefficient of
a. 50.
b. .90.
c. 1.00.
d. −.90.
b. .90.
A researcher working with a personality test discovers that the test has a reliability coefficient of .70 which is somewhat typical. This indicates that
a. 70% of the score is accurate while 30% is inaccurate.
b. 30% of the people who are tested will receive accurate scores.
c. 70% of the people who are tested will receive accurate scores.
d. 30% of the score is accurate while 70% is inaccurate.
a. 70% of the score is accurate while 30% is inaccurate.
A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be _______ or higher.
a. .20.
b. .55.
c. .80.
d. .70.
c. .80.
The same test is given to the same group of people using the test-retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is
a. 70%.
b. 100%.
c. 50%.
d. 49%.
d. 49%.
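The 49% comes from squaring the correlation, which is the coefficient of determination defined in an earlier card. A minimal sketch of that arithmetic, following this card's reasoning; the function name is illustrative.

```python
def shared_variance_percent(r):
    """Square the correlation and express it as a percentage."""
    return round(r ** 2 * 100, 1)

print(shared_variance_percent(0.70))   # 49.0
```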
IQ means
a. a query of intelligence.
b. indication of intelligence.
c. intelligence quotient.
d. intelligence questions for test construction.
c. intelligence quotient.
_______ did research and concluded that intelligence was normally distributed like height or weight and that it was primarily genetic.
a. Spearman.
b. Guilford.
c. Williamson.
d. Francis Galton.
d. Francis Galton.
Francis Galton felt intelligence was
a. a unitary faculty.
b. best explained via a two factor theory.
c. best explained via the person’s environment.
d. fluid and crystallized in nature.
a. a unitary faculty.
J. P. Guilford isolated 120 factors which added up to intelligence. He also is remembered for his
a. thoughts on convergent and divergent thinking.
b. work on cognitive therapy.
c. work on behavior therapy.
d. work to create the first standardized IQ test.
a. thoughts on convergent and divergent thinking.
A counselor is told by his supervisor to measure the internal consistency reliability (i.e., homogeneity) of a test but not to divide the test in halves. The counselor would need to utilize
a. the split-half method.
b. the test–retest method.
c. the Kuder-Richardson coefficients of equivalence.
d. cross-validation.
c. the Kuder-Richardson coefficients of equivalence.
The first intelligence test was created by
a. David Wechsler.
b. J. P. Guilford.
c. Francis Galton.
d. Alfred Binet and Theodore Simon.
d. Alfred Binet and Theodore Simon.
IQ stands for intelligence quotient, which is expressed by
a. CA/MA×100.
b. CA/MA×50.
c. MA/CA×50.
d. MA/CA×100.
d. MA/CA×100.
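The ratio IQ formula divides mental age (MA) by chronological age (CA) and multiplies by 100. A short sketch with made-up ages; the function name `ratio_iq` is illustrative.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = MA / CA x 100."""
    return mental_age / chronological_age * 100

print(ratio_iq(10, 8))   # 125.0: performing like a 10-year-old at age 8
print(ratio_iq(8, 8))    # 100.0: mental age equals chronological age
print(ratio_iq(6, 8))    # 75.0: performing like a 6-year-old at age 8
```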
The Binet stressed age-related tasks. Utilizing this method, a 9-year-old task would be one which
a. only a 10-year-old child could answer.
b. only an 8-year-old child could answer.
c. 50% of the 9-year-olds could answer correctly.
d. 75% of the 9-year-olds could answer correctly.
c. 50% of the 9-year-olds could answer correctly.
Simon and Binet pioneered the first IQ test around 1905. The test was created to
a. assess high school seniors in America.
b. assess U.S. military recruits.
c. discriminate normal from retarded Parisian children.
d. measure genius in the college population.
c. discriminate normal from retarded Parisian children.
Today the Stanford-Binet is used from ages 2 to adulthood. The IQ formula has been replaced by the
a. SAS.
b. SUDS.
c. entropy.
d. ACPA.
a. SAS.
Most experts would agree that the Wechsler IQ tests gained popularity, as the Binet
a. must be administered in a group.
b. favored the geriatric population.
c. didn’t seem to be the best test for adults.
d. was biased toward women.
c. didn’t seem to be the best test for adults.
The best IQ test for a 22-year-old single male would be the
a. WPPSI-III.
b. WAIS-III.
c. WISC-IV.
d. Computer-based testing.
b. WAIS-III.
The best intelligence test for a sixth-grade girl would be the
a. WPPSI-III.
b. WAIS-III.
c. WISC-IV.
d. Merrill-Palmer.
c. WISC-IV.
The best intelligence test for a kindergartner would be the
a. WPPSI-III.
b. WAIS-III.
c. WISC-IV.
d. Myers-Briggs Type Indicator.
a. WPPSI-III.
The mean on the Wechsler and the Binet is _______ and the standard deviation is _______.
a. 100; 100.
b. 100; 15 Wechsler, 16 Stanford-Binet.
c. 100; 20.
d. 100; 1.
b. 100; 15 Wechsler, 16 Stanford-Binet.
Group IQ tests like the Otis Lennon, the Lorge-Thorndike, and the California Test of Mental Abilities are popular in school settings. The advantage is that
a. group tests are quicker to administer.
b. group tests are superior in terms of predicting school performance.
c. group tests always have a higher degree of reliability.
d. individual IQ tests are not appropriate for school children.
a. group tests are quicker to administer.
The group IQ test movement began
a. in 1905.
b. with the work of Binet.
c. with the Army Alpha and Army Beta in World War I.
d. with the AGCT in World War II.
c. with the Army Alpha and Army Beta in World War I.
In a culture-fair test
a. items are known to the subject regardless of his or her culture.
b. the test is not standardized.
c. culture-free items cannot be utilized.
d. African Americans generally score higher than Whites.
a. items are known to the subject regardless of his or her culture.
The Black versus White IQ controversy was sparked mainly by a 1969 article written by _______.
a. John Ertl.
b. Raymond B. Cattell.
c. Arthur Jensen.
d. Robert Williams.
c. Arthur Jensen.
The MMPI-2 is
a. an IQ test.
b. a neurological test.
c. a projective personality test.
d. a standardized personality test.
d. a standardized personality test.
The word psychometric means
a. a form of measurement used by a neurologist.
b. any form of mental testing.
c. a mental trait which cannot be measured.
d. the test relies on a summated or linear rating scale.
b. any form of mental testing.
In a projective test the client is shown
a. something which is highly reinforcing.
b. something which is highly charged from an emotional standpoint.
c. a and b.
d. neutral stimuli.
d. neutral stimuli.
The 16 PF reflects the work of
a. Raymond B. Cattell.
b. Carl Jung.
c. James McKeen Cattell.
d. Oscar K. Buros.
a. Raymond B. Cattell.