Assessment & Testing Flashcards

1
Q

Appraisal can be defined as

a. the process of assessing or estimating attributes.
b. testing which is always performed in a group setting.
c. testing which is always performed on a single individual.
d. a pencil and paper measurement of assessing attributes.

A

a. the process of assessing or estimating attributes.

2
Q

A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) ________ format.

a. subjective
b. objective
c. very precise
d. concise

A

a. subjective

3
Q

The National Counselor Exam (NCE) is a(n) ________ test because the scoring procedure is specific.

a. subjective
b. objective
c. projective
d. subtest

A

b. objective

4
Q

A short answer test is a(n) ________ test.

a. objective
b. culture-free
c. forced choice
d. free choice

A

d. free choice

5
Q

The ________ index indicates the percentage of individuals who answered each item correctly.

a. difficulty
b. critical
c. intelligence
d. personal

A

a. difficulty
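
As a minimal sketch of how the difficulty index is computed, the function below returns the proportion of test takers who answered one item correctly. The function name and the list-of-booleans input format are assumptions made for illustration, not part of the card.

```python
def item_difficulty(responses):
    """Return the proportion of test takers who answered the item correctly.

    `responses` is a list of booleans, True meaning a correct answer
    (a hypothetical input format chosen for this sketch).
    """
    if not responses:
        raise ValueError("need at least one response")
    # True counts as 1 and False as 0, so sum() gives the number correct.
    return sum(responses) / len(responses)


# 30 of 40 people answer the item correctly.
print(item_difficulty([True] * 30 + [False] * 10))  # 0.75
```

A value near 1.0 means almost everyone got the item right (an easy item), while a value near 0.0 means almost no one did.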

6
Q

A test format could be normative or ipsative. In the normative format

a. each item depends on the item before it.
b. each item depends on the item after it.
c. the client must possess an IQ within the normal range.
d. each item is independent of all other items.

A

d. each item is independent of all other items.

7
Q

A client who takes a normative test

a. cannot legitimately be compared to others who have taken the test.
b. can legitimately be compared to others who have taken the test.
c. could not have taken an IQ test.
d. could not have taken a personality test.

A

b. can legitimately be compared to others who have taken the test.

8
Q

In an ipsative measure the person taking the test must compare items to one another. The result is that

a. an ipsative measure cannot be utilized for career guidance.
b. you cannot legitimately compare two or more people who have taken an ipsative test.
c. an ipsative measure is never a forced choice format.
d. an ipsative measure is never reliable.

A

b. you cannot legitimately compare two or more people who have taken an ipsative test.

Since the ipsative measure does not reveal absolute strengths, comparing one person’s score to another is relatively meaningless.

The person is measured in response to his or her own standard of behavior.

The ipsative measure points out the highs and lows that exist within a single individual.

Hence, when a colleague tells you that Mr. Johnson’s anxiety is improving, she has given you an ipsative description. This description, however, would not lend itself to comparing, say, Mr. Johnson’s anxiety to Mrs. McBee’s.

9
Q

Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be

a. a power test.
b. neither a speed test nor a power test.
c. a speed test.
d. a fine example of an ipsative measure.

A

c. a speed test.

10
Q

An achievement test measures maximum performance or present level of skill. Tests of this nature are also called attainment tests, while a personality test or interest inventory measures

a. typical performance.
b. minimum performance.
c. unconscious traits.
d. self-esteem by always relying on a Q-Sort design.

A

a. typical performance.

11
Q

In a spiral test

a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. the client must answer each question in a specified period of time.
d. the items get progressively more difficult.

A

d. the items get progressively more difficult.

12
Q

In a cyclical test

a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. you have several sections which are spiral in nature.
d. the client must answer each question in a specified period of time.

A

c. you have several sections which are spiral in nature.

13
Q

A test battery is considered

a. a horizontal test.
b. a vertical test.
c. a valid test.
d. a reliable test.

A

a. a horizontal test.

In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source. This can get confusing. Remember that the section on group processes talked about vertical and horizontal interventions.

In testing, a vertical test would have versions for various age brackets or levels of education (e.g., a math achievement test for preschoolers and a version for middle school children).

A horizontal test measures various factors (e.g., math and science) during the same testing procedure.

14
Q

Which is more important, validity or reliability?

a. Reliability.
b. They are equally important.
c. Validity.
d. It depends on the test in question.

A

c. Validity.

Experts nearly always consider validity the number one factor in the construction of a test. A test must measure what it purports to measure.

15
Q

In the field of testing, validity refers to

a. whether the test really measures what it purports to measure.
b. whether the same test gives consistent measurement.
c. the degree of cultural bias in a test.
d. the fact that numerous tests measure the same traits.

A

a. whether the test really measures what it purports to measure.

16
Q

Which measure would yield the highest level of reliability?

a. A TAT, projective test popular with psychodynamic helpers.
b. The WAIS-IV, a popular IQ test.
c. The MMPI-2, a popular personality test.
d. A very accurate postage scale.

A

d. A very accurate postage scale.

In the real world physical measurements are more reliable than psychological ones.

17
Q

Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be

a. height.
b. weight.
c. ego strength.
d. the ability to name all men who have served as U.S. presidents.

A

c. ego strength.

Any trait you cannot “directly” measure or observe can be considered a construct.

18
Q

Face validity refers to the extent that a test

a. looks or appears to measure the intended attribute.
b. measures a theoretical construct.
c. appears to be constructed in an artistic fashion.
d. can be compared to job performance.

A

a. looks or appears to measure the intended attribute.

19
Q

A job test which predicted future performance on a job very well would

a. have high criterion/predictive validity.
b. have excellent face validity.
c. have excellent construct validity.
d. not have incremental validity or synthetic validity.

A

a. have high criterion/predictive validity.

20
Q

A new IQ test which yielded results nearly identical to other standardized measures would be said to have

a. good concurrent validity.
b. good face validity.
c. superb internal consistency.
d. all of the above.

A

a. good concurrent validity.

Criterion validity could be “concurrent” or “predictive.” Concurrent validity answers the question of how well your test stacks up against a well-established instrument that measures the same behavior, construct, or trait.

Evidence for reliability and validity is expressed via correlation coefficients. Suffice it to say that the closer they are to 1.00, the better.

You also should be familiar with the terms convergent and discriminant validity. These terms relate to both criterion validity and construct validity.

The relationship or correlation of a test to an independent measure or trait is known as convergent validity.

Convergent validity is actually a method used to assess a test’s construct/criterion validity by correlating test scores with an outside source. Say, for example, that a measure purports to measure phobic responses.

A client, who has a snake phobia, is then exposed to a snake and experiences extreme panic. If the client scores higher on the test than he would in a relaxed state, then this would display convergent validity.

The test also should show discriminant validity. This means the test will not reflect unrelated variables. Hence, if phobias are unrelated to IQ, then when one correlates clients’ IQ scores to their scores on the test for phobias, this should produce a near zero correlation.

Similarly, if discriminant validity is evident, a counselor who is genuinely qualified to sit for a state licensing exam should score higher on the exam than a student who flunked an introductory counseling course.

When a researcher is engaged in test validation, both convergent and discriminant validity should be thoroughly examined.

21
Q

A valid test is ________ reliable.

a. not always
b. always
c. never
d. 80%

A

b. always

22
Q

One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called

a. test–retest reliability.
b. equivalent forms reliability.
c. alternate forms reliability.
d. the split-half method.

A

a. test–retest reliability.

23
Q

One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as

a. test–retest reliability.
b. equivalent or alternate forms reliability.
c. the split-half method.
d. internal consistency.

A

b. equivalent or alternate forms reliability.

24
Q

A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor

a. used an invalid procedure to test reliability.
b. was testing reliability via the split-half correlation method.
c. was testing reliability via the equivalent forms method.
d. was testing reliability via the inter-rater method.

A

b. was testing reliability via the split-half correlation method.
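
The odd/even procedure described in this card can be sketched as follows. This is only an illustration: the data layout (one list of 0/1 item scores per examinee) and the function names are assumptions, and the Pearson correlation is written out by hand so the block needs nothing beyond the standard library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def split_half_r(item_scores):
    """Correlate odd-item totals with even-item totals.

    `item_scores` holds one list of 0/1 item scores per examinee
    (a hypothetical layout chosen for this sketch).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Four examinees, four items each (made-up data).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_r(scores), 2))  # 0.82
```

The result describes the consistency of two half-length tests, so it is computed from a single administration rather than from two sittings.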

25
Q

Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?

a. Test–retest.
b. Alternate forms.
c. Split-half.
d. Inter-rater/inter-observer.

A

d. Inter-rater/inter-observer.

26
Q

A reliability coefficient of 1.00 indicates

a. a lot of variance in the test.
b. a score with a high level of error.
c. a perfect score which has no error.
d. a typical correlation on most psychological and counseling tests.

A

c. a perfect score which has no error.

27
Q

An excellent psychological or counseling test would have a reliability coefficient of

a. .50.
b. .90.
c. 1.00.
d. –.90.

A

b. .90.

Ninety percent of the score measures the attribute in question, while 10% of the score is indicative of error.

28
Q

A researcher working with a personality test discovers that the test has a reliability coefficient of .70 which is somewhat typical. This indicates that

a. 70% of the score is accurate while 30% is inaccurate.
b. 30% of the people who are tested will receive accurate scores.
c. 70% of the people who are tested will receive accurate scores.
d. 30% of the score is accurate while 70% is inaccurate.

A

a. 70% of the score is accurate while 30% is inaccurate.

Seventy percent of the obtained score on the test represented the true score on the personality attribute, while 30% of the obtained score could be accounted for by error. Seventy percent is true variance while 30% constitutes error variance.

29
Q

A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be ________ or higher.

a. .20
b. .55
c. .80
d. .70

A

c. .80

30
Q

The same test is given to the same group of people using the test–retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is

a. 70%.
b. 100%.
c. 50%.
d. 49%.

A

d. 49%.

To demonstrate the variance of one factor accounted for by another you merely square the correlation (i.e., reliability coefficient).

So .70 × .70 = .49 and .49 × 100 = 49%.

Your exam could refer to this principle as the coefficient of determination.

31
Q

IQ means

a. a query of intelligence.
b. indication of intelligence.
c. intelligence quotient.
d. intelligence questions for test construction.

A

c. intelligence quotient.

32
Q

________ did research and concluded that intelligence was normally distributed like height or weight and that it was primarily genetic.

a. Spearman
b. Guilford
c. Williamson
d. Galton

A

d. Galton

Francis Galton felt intelligence was a single or so-called unitary factor.

33
Q

Francis Galton felt intelligence was

a. a unitary faculty.
b. best explained via a two factor theory.
c. best explained via the person’s environment.
d. fluid and crystallized in nature.

A

a. a unitary faculty.

34
Q

J. P. Guilford isolated 120 factors which added up to intelligence. He also is remembered for his

a. thoughts on convergent and divergent thinking.
b. work on cognitive therapy.
c. work on behavior therapy.
d. work to create the first standardized IQ test.

A

a. thoughts on convergent and divergent thinking.

35
Q

A counselor is told by his supervisor to measure the internal consistency reliability (i.e., homogeneity) of a test but not to divide the test in halves. The counselor would need to utilize

a. the split-half method.
b. the test–retest method.
c. the Kuder–Richardson coefficients of equivalence.
d. cross-validation.

A

c. the Kuder–Richardson coefficients of equivalence.
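
For readers who want to see what a Kuder–Richardson calculation looks like, here is a minimal sketch of the KR-20 formula for items scored 0 or 1. The card does not give the formula, so the details are assumptions: the function name, the data layout, and the use of the population variance of total scores (some texts use the sample variance instead).

```python
from statistics import pvariance


def kr20(item_scores):
    """KR-20 internal consistency for dichotomous (0/1) items.

    `item_scores` holds one list of 0/1 item scores per examinee
    (a hypothetical layout chosen for this sketch).
    """
    n_people = len(item_scores)
    k = len(item_scores[0])  # number of items
    # p = proportion passing each item, q = 1 - p.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people
        sum_pq += p * (1 - p)
    # Variance of total scores (population variance is an assumption here).
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Four examinees, three items each (made-up data).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # 0.75
```

Unlike the split-half method, nothing here divides the test into halves; every item contributes to the estimate through its own p and q.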

36
Q

The first intelligence test was created by

a. David Wechsler.
b. J. P. Guilford.
c. Francis Galton.
d. Alfred Binet and Theodore Simon.

A

d. Alfred Binet and Theodore Simon.

The year was 1904 and the French government appointed a commission to ferret out feeble-minded Parisian children from those who were normal.

Alfred Binet led the committee and the rest is history. By 1905, Binet, along with his coworker Theodore Simon, created a 30-question test with school-related items of increased difficulty.

37
Q

Today, the Stanford–Binet IQ test is

a. a nonstandardized measure.
b. a standardized measure.
c. a projective measure.
d. b and c.

A

b. a standardized measure.

38
Q

IQ stands for intelligence quotient, which is expressed by

a. CA/MA × 100.
b. CA/MA × 50.
c. MA/CA × 50.
d. MA/CA × 100.

A

d. MA/CA × 100.
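
The ratio formula, mental age (MA) divided by chronological age (CA) times 100, can be sketched in a few lines. The function name and the sample ages are assumptions made for illustration.

```python
def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: MA / CA x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# A hypothetical 8-year-old who performs at a 10-year-old level.
print(ratio_iq(10, 8))  # 125.0
# Mental age equal to chronological age gives the average score.
print(ratio_iq(9, 9))   # 100.0
```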

39
Q

The Binet stressed age-related tasks. Utilizing this method, a 9-year-old task would be one which

a. only a 10-year-old child could answer.
b. only an 8-year-old child could answer.
c. 50% of the 9-year-olds could answer correctly.
d. 75% of the 9-year-olds could answer correctly.

A

c. 50% of the 9-year-olds could answer correctly.

40
Q

Today the Stanford–Binet is used from age 2 to adulthood. The IQ formula has been replaced by the

a. SAS.
b. SUDS.
c. entropy.
d. KR-20 formula.

A

a. SAS.

The Binet today actually relies on a standard age score (SAS) with a mean of 100 and a standard deviation of 16.

41
Q

Most experts would agree that the Wechsler IQ tests gained popularity, as the Binet

a. must be administered in a group.
b. favored the geriatric population.
c. didn’t seem to be the best test for adults.
d. was biased toward women.

A

c. didn’t seem to be the best test for adults.

42
Q

The best IQ test for a 22-year-old single male would be the

a. WPPSI-III.
b. WAIS-IV.
c. WISC-IV.
d. any computer-based IQ test.

A

b. WAIS-IV.

Choice “a,” the WPPSI (Wechsler Preschool and Primary Scale of Intelligence), is suitable for children ages 2 years and 6 months to 7 years and 7 months.

Choice “b,” the WAIS-IV (Wechsler Adult Intelligence Scale), is intended for ages 16–90 years.

Choice “c,” the WISC-IV (Wechsler Intelligence Scale for Children), is appropriate for kids ages 6–16 years and 11 months.

43
Q

The mean on the Wechsler and the Stanford–Binet Intelligence scales (SB5) is ________ and the standard deviation is ________.

a. 100; 100
b. 100; 15 Wechsler, 16 Stanford–Binet
c. 100; 20
d. 100; 1

A

b. 100; 15 Wechsler, 16 Stanford–Binet

44
Q

The black versus white IQ controversy was sparked mainly by a 1969 article written by ________.

a. John Ertl
b. Raymond B. Cattell
c. Arthur Jensen
d. Robert Williams

A

c. Arthur Jensen

Jensen, choice “c” mentioned earlier, sparked tremendous controversy—actually that’s putting it mildly—when he suggested in a 1969 Harvard Educational Review article (“How Much Can We Boost IQ and Scholastic Performance?”) that the closer people are genetically, the more alike their IQ scores.

Adopted children, for example, will sport IQs closer to their biological parents than to their adoptive ones. Jensen then leveled the charge that whites score 11 to 15 IQ points higher than African Americans (regardless of social class).

His theory stated that due to slavery it was possible that African Americans were bred for strength rather than intelligence. He estimated that heredity contributed 80%, while environment influenced 20% of the IQ.

45
Q

The MMPI-2 is

a. an IQ test.
b. a neurological test.
c. a projective personality test.
d. a standardized personality test.

A

d. a standardized personality test.

46
Q

The word psychometric means

a. a form of measurement used by a neurologist.
b. any form of mental testing.
c. a mental trait which cannot be measured.
d. the test relies on a summated or linear rating scale.

A

b. any form of mental testing.

47
Q

The 16 PF reflects the work of

a. Raymond B. Cattell.
b. Carl Jung.
c. James McKeen Cattell.
d. Oscar K. Buros.

A

a. Raymond B. Cattell.

The 16 PF (16 Personality Factor Questionnaire), developed by Raymond B. Cattell, is suitable for persons age 16 and above and has been the subject of over 2,000 papers or other communications! The test measures key personality factors such as assertiveness, emotional maturity, and shrewdness.

48
Q

The Myers–Briggs Type Indicator reflects the work of

a. Raymond B. Cattell.
b. Carl Jung.
c. William Glasser.
d. Oscar K. Buros.

A

b. Carl Jung.

49
Q

The counselor who favors projective measures would most likely be a

a. Rogerian.
b. strict behaviorist.
c. TA therapist.
d. psychodynamic clinician.

A

d. psychodynamic clinician.

50
Q

An aptitude test is to ________ as an achievement test is to ________.

a. what has been learned; potential
b. potential; what has been learned
c. profit from learning; potential
d. a measurement of current skills; potential

A

b. potential; what has been learned

51
Q

Both the Rorschach and the Thematic Apperception Test (TAT) are projective tests. The Rorschach uses 10 inkblot cards while the TAT uses

a. a dozen inkblot cards.
b. verbal and performance IQ scales.
c. pictures.
d. incomplete sentences.

A

c. pictures.

52
Q

A counselor who fears the client has an organic, neurological, or motoric difficulty would most likely use the

a. Bender Gestalt II.
b. Rorschach.
c. Minnesota Multiphasic Personality Inventory-2.
d. Thematic Apperception Test.

A

a. Bender Gestalt II.

The Bender Visual Motor Gestalt Test (named after psychiatrist Lauretta Bender) is actually an expressive projective measure, though first and foremost it is known for its ability to discern whether brain damage is evident.

53
Q

An interest inventory would be least valid when used with

a. a first-year college student majoring in philosophy.
b. a third-year college student majoring in physics.
c. an eighth-grade male with an IQ of 136.
d. a 46-year-old white male construction worker.

A

c. an eighth-grade male with an IQ of 136.

54
Q

One major criticism of interest inventories is that

a. they have far too many questions.
b. they are most appropriate for very young children.
c. they emphasize professional positions and minimize blue-collar jobs.
d. they favor jobs that will require a bachelor’s degree or higher.

A

c. they emphasize professional positions and minimize blue-collar jobs.

55
Q

Interest inventories are positive in the sense that

a. they are reliable and not threatening to the test taker.
b. they are always graded by the test taker.
c. they require little or no reading skills.
d. they have high validity in nearly all age brackets.

A

a. they are reliable and not threatening to the test taker.

56
Q

A counselor who had an interest primarily in testing would most likely be a member of

a. HS-BCP.
b. AARC.
c. NASW.
d. ACES.

A

b. AARC.

AARC stands for the Association for Assessment and Research in Counseling.

57
Q

The ________ are examples of aptitude tests.

a. O*NET Ability Profiler and the MCAT
b. GZTS and the MMPI-2
c. CPI and the MMPI-2
d. Strong and the LSAT

A

a. O*NET Ability Profiler and the MCAT

58
Q

An aptitude test predicts future behavior while an achievement test measures what you have mastered or learned. In the case of a test like the ________ the distinction is unclear.

a. Binet
b. Wechsler
c. GRE
d. Bender

A

c. GRE

The GRE attempts to predict graduate school performance, but it also tests your level of knowledge.

59
Q

The standard error of measurement tells you

a. how accurate or inaccurate a test score is.
b. what population responds best to the test.
c. something about social loafing.
d. the number of people used in norming the test.

A

a. how accurate or inaccurate a test score is.

60
Q

A new IQ test has a standard error of measurement (SEM) of 3. Tom scores 106 on the test. If he takes the test a lot, we can predict that about 68% of the time

a. Tom will score between 100 and 103.
b. Tom will score between 100 and 106.
c. Tom will score between 103 and 109.
d. Tom will score higher than Betty who scored 139.

A

c. Tom will score between 103 and 109.
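
The arithmetic behind this answer is the obtained score plus and minus one SEM. A minimal sketch, with a hypothetical function name and an optional multiplier for wider bands:

```python
def score_band(observed, sem, n_sem=1):
    """Return (low, high) for a band of n_sem standard errors around a score."""
    return observed - n_sem * sem, observed + n_sem * sem


# Tom's score of 106 with an SEM of 3: the band expected about 68% of the time.
print(score_band(106, 3))           # (103, 109)
# Two SEMs give the wider band expected about 95% of the time.
print(score_band(106, 3, n_sem=2))  # (100, 112)
```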

61
Q

A counselor created an achievement test with a reliability coefficient of .82. The test is shortened since many clients felt it was too long. The counselor shortened the test but logically assumed that the reliability coefficient would now

a. be approximately .88.
b. remain at .82.
c. be at least 10 points higher or lower.
d. be lower than .82.

A

d. be lower than .82.
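
The card does not say how the drop is estimated. One common tool is the Spearman-Brown prophecy formula, sketched below as an illustration. The function name is hypothetical, and the assumption that the test was cut to half its length is made up for the example, since the card does not state how much was removed.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length.

    `length_factor` is new length divided by old length
    (0.5 = half as long, 2 = twice as long).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Assumed scenario: the .82 test is cut to half its original length.
print(round(spearman_brown(0.82, 0.5), 2))  # 0.69
```

Any length factor below 1 yields a value below the original coefficient, which matches the card's answer that shortening a test lowers its reliability.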

62
Q

A colleague of yours invents a new projective test. Seventeen counselors rated the same client using the measure and came up with nearly identical assessments. This would indicate

a. high validity.
b. high reliability.
c. excellent norming studies.
d. culture fairness.

A

b. high reliability.

This is known as “inter-rater” reliability.

63
Q

The WAIS-IV is given to 100,000 individuals in the United States who are picked at random. A counselor would expect that

a. approximately 68% would score between 85 and 115.
b. approximately 68% would score between 70 and 130.
c. the mean IQ would be 112.
d. 50% of those tested would score 112 or above.

A

a. approximately 68% would score between 85 and 115.
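
The 68% figure follows from the normal curve: about 68% of scores fall within one standard deviation of the mean. A quick check, assuming IQ scores are normally distributed with the Wechsler mean of 100 and standard deviation of 15 (the function name is hypothetical):

```python
from statistics import NormalDist


def share_between(low, high, mean=100, sd=15):
    """Proportion of a normal distribution falling between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# One standard deviation either side of the mean.
print(round(share_between(85, 115), 3))  # 0.683
# Two standard deviations either side of the mean.
print(round(share_between(70, 130), 3))  # 0.954
```

The second line shows why choice "b" is wrong: the 70 to 130 range covers roughly 95% of scores, not 68%.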

64
Q

You want to admit only 25% of all counselors to an advanced training program in psychodynamic group therapy. The item difficulty on the entrance exam for applicants would be best set at

a. 0.0.
b. .5 regardless of the admission requirement.
c. 1.0.
d. .25.

A

d. .25.

65
Q

Lewis Terman

a. constructed the Wechsler tests.
b. constructed the initial Binet prior to 1910.
c. constructed the Rorschach.
d. Americanized the Binet.

A

d. Americanized the Binet.

66
Q

In constructing a test you notice that all 75 people correctly answered item number 12. This gives you an item difficulty of

a. 1.2.
b. .75.
c. 1.0.
d. 0.0.

A

c. 1.0.