Assessment & Testing (Appraisal) Flashcards
- *Appraisal can be defined as**
- *a. the process of assessing or estimating attributes.**
- *b. testing which is always performed in a group setting.**
- *c. testing which is always performed on a single individual.**
- *d. a pencil and paper measurement of assessing attributes.**
a. the process of assessing or estimating attributes.
Appraisal is a broad term which includes more than merely
“testing clients.” Appraisal could include a survey, observations,
or even clinical interviews.
score has been assigned to the per-
son’s attribute or performance. An effective counselor will al-
ways inform clients about the limitations of any test that
he or she administers. Some evidence indicates that neo-
phyte counselors are sometimes tempted to administer
tests merely to boost their credibility. I think it is safe to
say this is not a desirable practice.
- *A test can be defined as a systematic method of measuring a**
- *sample of behavior. Test format refers to the manner in which**
- *test items are presented. The format of an essay test is consid-**
- *ered a(n) _______ format.**
- *a. subjective.**
- *b. objective.**
- *c. very precise.**
- *d. concise.**
a. subjective.
A “subjective” paradigm relies mainly on the scorer’s opinion. If
the rater knows the test taker’s attributes, the rater’s “personal
bias” can significantly affect the rating.
- *The National Counselor Exam (NCE) is a(n) _______ test be-**
- *cause the scoring procedure is specific.**
- *a. subjective.**
- *b. objective.**
- *c. projective.**
- *d. subtest.**
b. objective.
Since the NCE uses an a, b, c, d alternative format, the rater’s
“subjective” feelings and thoughts would not be an issue.
- *A short answer test is a(n) _______ test.**
- *a. objective.**
- *b. culture free.**
- *c. forced choice.**
- *d. free choice.**
d. free choice.
Some exams will call this a “free response” format. In any case,
the salient point is that the person taking the test can respond
in any manner he or she chooses. Although free choice response
patterns can yield more information, they often take more time
to score and increase subjectivity (i.e., there is more than one
correct answer). I should mention that although testing is
often controversial, schools now employ psychoeduca-
tional tests more than at any time in history.
- *The NCE is a(n) _______ test.**
- *a. free choice.**
- *b. forced choice.**
- *c. projective.**
- *d. intelligence.**
b. forced choice.
“Forced choice” items are sometimes known as “recognition
items.” This book is composed of forced choice/recognition
items. On some tests this format is used to control for the “social
desirability phenomenon” which asserts that the person puts the
answer he or she feels is socially acceptable. The
MMPI-2, or Minnesota Multiphasic Personality Inventory, for
example, uses forced choices to create a “lie scale” composed of
human frailties we all possess.
- *The _______ index indicates the percentage of individuals who**
- *answered each item correctly.**
- *a. difficulty.**
- *b. critical.**
- *c. intelligence.**
- *d. personal.**
a. difficulty.
The higher the number of people who answer a question correctly,
the easier the item is, and vice versa. A .5 difficulty index
(also called a difficulty value) would suggest that 50% of those
tested answered the question correctly, while 50% did not.
Most theorists agree that a “good measure” provides a wide range of
items that even a poor performer will answer correctly.
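The difficulty index described above is simply the proportion of correct answers for an item. A minimal sketch, assuming responses are recorded as True/False values; the function name `difficulty_index` and the sample data are hypothetical, chosen only for illustration:

```python
def difficulty_index(item_responses):
    """Return the proportion of examinees who answered one item correctly.

    item_responses: list of booleans, where True means a correct answer.
    """
    if not item_responses:
        raise ValueError("need at least one response")
    return sum(item_responses) / len(item_responses)


# Hypothetical data: 10 examinees, 5 of whom answered the item correctly.
responses = [True, False, True, True, False, False, True, False, True, False]
print(difficulty_index(responses))  # 0.5
```

A value near 1.0 would indicate an easy item and a value near 0.0 a hard one.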
- *Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called**
- *_______ items.**
- *a. vague.**
- *b. subjective.**
- *c. recognition.**
- *d. numerical.**
c. recognition.
Recognition items give the examinee two or more alternatives.
- *A true/false test has _______ recognition items.**
- *a. similar.**
- *b. free choice.**
- *c. dichotomous.**
- *d. no.**
c. dichotomous.
“Dichotomy” simply means that you are presented with two
opposing choices.
- *A test format could be normative or ipsative. In the normative**
- *format**
- *a. each item depends on the item before it.**
- *b. each item depends on the item after it.**
- *c. the client must possess an IQ within the normal range.**
- *d. each item is independent of all other items.**
d. each item is independent of all other items.
Ipsative measures compare traits within the same individual;
they do not compare a person to other persons who
took the instrument. The Kuder Occupational Interest Survey
(KOIS), now called the Kuder Career Search with Person
Match, is one such example. The ipsative test allows the person
being tested to compare items.
- *A client who takes a normative test**
- *a. cannot legitimately be compared to others who have tak-**
- *en the test.**
- *b. can legitimately be compared to others who have taken**
- *the test.**
- *c. could not have taken an IQ test.**
- *d. could not have taken a personality test.**
b. can legitimately be compared to others who have taken
the test.
Technically, a normative interpretation is one in which the individual’s score is
evaluated by comparing it to others who took the same test. A
percentile rank is an excellent example. Say your client scores an
82 on a nationally normed test and this score corresponds to the
percentile rank of 60. This tells you that 60% of the individuals
who took the test scored 82 or less.
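The percentile rank example above can be sketched in a few lines. This is an illustration only: it assumes the "scored at or below" definition used in the explanation, and the function name `percentile_rank` and the ten-person norm group are made up for the example.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall at or below `score`."""
    if not norm_scores:
        raise ValueError("need at least one norm-group score")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


# Hypothetical norm group of 10 people; 6 of them scored 82 or less.
norm_group = [55, 60, 68, 74, 79, 82, 85, 88, 91, 97]
print(percentile_rank(82, norm_group))  # 60.0
```

Note that the raw score (82) and the percentile rank (60) are different numbers; only the latter compares the client to others.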
- *In an ipsative measure the person taking the test must compare items to one another. The result is that**
- *a. an ipsative measure cannot be utilized for career guidance.**
- *b. you cannot legitimately compare two or more people who**
- *have taken an ipsative test.**
- *c. an ipsative measure is never valid.**
- *d. an ipsative measure is never reliable.**
b. you cannot legitimately compare two or more people who
have taken an ipsative test.
Since the ipsative measure does not reveal absolute
strengths, comparing one person’s score to another is relatively
meaningless. The person is measured in response to his or her
own standard of behavior. The ipsative measure points out the
highs and lows that exist within a single individual. Hence, when
a colleague tells you that Mr. Johnson’s anxiety is improving, she
has given you an ipsative description. This description, however,
would not lend itself to comparing Mr. Johnson’s anxiety to Mrs.
McBee’s.
- *Tests are often classified as speed tests versus power tests. A**
- *timed typing test used to hire secretaries would be**
- *a. a power test.**
- *b. neither a speed test nor a power test.**
- *c. a speed test.**
- *d. a fine example of an ipsative measure.**
c. a speed test.
A good timed/speed test is purposely set up so that nobody finishes it. A
“power test” (see choice “a”) is designed to evaluate the level of
mastery without a time limit. A timed test is really a type of
speed test, but a high percentage of the test takers complete
it and it is usually more difficult and has a time limit
(think NCE).
- *A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer**
- *the questions.**
- *a. This is most likely a projective measure.**
- *b. This is most likely a speed test.**
- *c. This is most likely a power test.**
- *d. This is most likely an invalid measure.**
c. This is most likely a power test.
Like the speed test, it will ideally be designed so that nobody
receives a perfect score. Choice “a,” projective measure, is
incorrect since projective tests rely on a “free response” format.
- *An achievement test measures maximum performance while a**
- *personality test or interest inventory measures**
- *a. typical performance.**
- *b. minimum performance.**
- *c. unconscious traits.**
- *d. self-esteem by always relying on a Q-Sort design.**
a. typical performance.
Interest inventories are
popular with career counselors because such measures focus on
what the client likes or dislikes. The Strong Interest Inventory
(SII) is an excellent example. Choice “d,” the Q-Sort, often used
to investigate personality traits, involves a procedure in which
an individual is given cards with statements and asked to place
them in piles of “most like me” to “least like me.” Then the sub-
ject compiles them to create the “ideal self.” The ideal self can
then be compared to his or her current self-perception in order
to assess self-esteem.
- *In a spiral test**
- *a. the items get progressively easier.**
- *b. the difficulty of the items remains constant.**
- *c. the client must answer each question in a specified period**
- *of time.**
- *d. the items get progressively more difficult.**
d. the items get progressively more difficult.
A spiral test is a type of intelligence assessment in which the themes being evaluated are distributed throughout the test, instead of being grouped together, and the items become increasingly difficult as the test progresses.
Just remember that a spiral staircase seems to get more difficult
to climb as you walk up higher.
- *In a cyclical test**
- *a. the items get progressively easier.**
- *b. the difficulty of the items remains constant.**
- *c. you have several sections which are spiral in nature.**
- *d. the client must answer each question in a specified period**
- *of time.**
c. you have several sections which are spiral in nature.
In each section the questions would go from easy ones to those
which are more difficult.
In usability engineering, by contrast, cyclical testing means that subsequent design iterations take advantage of usability findings from previous rounds, with the design refined until good usability is confirmed with users.
- *A test battery is considered**
- *a. a horizontal test.**
- *b. a vertical test.**
- *c. a valid test.**
- *d. a reliable test.**
a. a horizontal test.
In a test battery, several measures are used to produce
results that could be more accurate than those derived
from merely using a single source. Say, this can get confusing.
Remember that in the section on group processes I talked
about vertical and horizontal interventions. In testing, a vertical
test would have versions for various age brackets or levels of
education (e.g., a math achievement test for preschoolers and a
version for middle-school children). A horizontal test measures
various factors (e.g., math and science) during the same testing
procedure.
- *In a counseling research study two groups of subjects took a test**
- *with the same name. However, when they talked with each other**
- *they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible?**
- *a. The researcher is not telling the truth. The groups could**
- *not possibly have taken the same test.**
- *b. The test was horizontal.**
- *c. The test was not a power test.**
- *d. The researcher gave parallel forms of the same test.**
d. The researcher gave parallel forms of the same test.
When a test has two versions or forms that are interchangeable
they are termed parallel forms or equivalent forms of the same
test. From a statistical/psychometric standpoint each form must
have the same mean, standard error, and other statistical compo-
nents.
- *The most critical factors in test selection are**
- *a. the length of the test and the number of people who took**
- *the test in the norming process.**
- *b. horizontal versus vertical.**
- *c. validity and reliability.**
- *d. spiral versus cyclical format.**
c. validity and reliability.
Validity refers to whether the test measures what it says it
measures, while reliability tells how consistently a test measures an
attribute.
- *Which is more important, validity or reliability?**
- *a. Reliability.**
- *b. They are equally important.**
- *c. Validity.**
- *d. It depends on the test in question.**
c. Validity.
Experts nearly always consider validity the number one
factor in the construction of a test. A test must measure
what it purports to measure. Reliability, choice “a,” is the
second most important concern. A scale, for example, must
measure body weight accurately if it is a valid instrument.
In order to be reliable, it will need to give repeated read-
ings which are nearly identical for the same person if the
person keeps stepping on and off the scale.
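The scale analogy can be made concrete with a small sketch. Everything here is hypothetical: the readings are invented, and `spread` and `bias` are informal stand-ins for consistency (reliability) and accuracy (validity), not formal psychometric coefficients.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 150.0  # hypothetical person's actual weight in pounds

# Hypothetical repeated readings as the person steps on and off each scale.
scale_a = [155.1, 154.9, 155.0, 155.0]  # nearly identical readings, about 5 lb too high
scale_b = [150.2, 149.8, 150.1, 149.9]  # nearly identical readings, close to the true weight


def spread(readings):
    """Standard deviation of repeated readings: small means consistent."""
    return pstdev(readings)


def bias(readings, true_value):
    """Average error of the readings: near zero means accurate."""
    return mean(readings) - true_value


for name, readings in (("A", scale_a), ("B", scale_b)):
    print(name, round(spread(readings), 2), round(bias(readings, TRUE_WEIGHT), 2))
```

Scale A is consistent but wrong (reliable without being valid), while scale B is both consistent and accurate.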
- *In the field of testing, validity refers to**
- *a. whether the test really measures what it purports to measure.**
- *b. whether the same test gives consistent measurement.**
- *c. the degree of cultural bias in a test.**
- *d. the fact that numerous tests measure the same traits.**
a. whether the test really measures what it purports to measure.
There are five basic
types of validity you should familiarize yourself with for
your exam: First, content validity or what is sometimes called
rational or logical validity. Second,
construct validity, which refers to a test’s ability to measure a
theoretical construct like intelligence, self-esteem, artistic talent,
mechanical ability, or managerial potential. Third is concurrent
validity, which deals with how well the test compares
to other instruments that are intended for the same purpose.
Fourth, predictive validity, also known as empirical validity,
which reflects the test’s ability to predict future behavior according
to established criteria. On some exams, concurrent validity
and predictive validity are lumped under the umbrella
of “criterion validity,” since concurrent validity and predictive
validity are actually different types of criterion-related validity.
Fifth, a small body of literature speaks of consequential validity,
which simply tries to ascertain the social implications of using
tests.
- *A counselor peruses a testing catalog in search of a test which**
- *will repeatedly give consistent results. The counselor**
- *a. is interested in reliability.**
- *b. is interested in validity.**
- *c. is looking for information which is not available.**
- *d. is magnifying an unimportant issue.**
a. is interested in reliability.
Thus, a test can have a high
reliability coefficient but still have a low validity coefficient.
Reliability places a ceiling on validity, but validity
does not set the limits on reliability.
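The "ceiling" idea can be illustrated with a result from classical test theory that is not stated in the explanation above and is brought in here as an assumption: a validity coefficient cannot exceed the square root of the test's reliability coefficient. The function name `max_validity` is hypothetical.

```python
import math


def max_validity(reliability):
    """Upper bound on a validity coefficient, assuming classical test theory.

    reliability: reliability coefficient between 0.0 and 1.0.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(reliability)


# A test with a reliability of .81 could have a validity of about .90 at most.
print(round(max_validity(0.81), 2))
```

The bound runs in one direction only: a highly reliable test can still have a validity far below this ceiling.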
- *Which measure would yield the highest level of reliability?**
- *a. A TAT, a projective test popular with psychodynamic helpers.**
- *b. The WAIS-III, a popular IQ test.**
- *c. The MMPI-2, a popular personality test.**
- *d. A very accurate scale.**
d. A very accurate scale.
In the real world physical measurements are more reliable than
psychological ones.
- *Construct validity refers to the extent that a test measures an**
- *abstract trait or psychological notion. An example would be**
- *a. height.**
- *b. weight.**
- *c. ego strength.**
- *d. the ability to name all men who have served as U.S. presi-**
- *dents.**
c. ego strength.
Any trait you cannot “directly” measure or observe can be
considered a construct.