Assessment & Testing (Appraisal) Flashcards
- **Appraisal can be defined as**
- **a. the process of assessing or estimating attributes.**
- **b. testing which is always performed in a group setting.**
- **c. testing which is always performed on a single individual.**
- **d. a pencil and paper measurement of assessing attributes.**
a. the process of assessing or estimating attributes.
Appraisal is a broad term which includes more than merely
“testing clients.” Appraisal could include a survey, observations,
or even clinical interviews.
score has been assigned to the person's attribute or performance. An effective counselor will always inform clients about the limitations of any test that he or she administers. Some evidence indicates that neophyte counselors are sometimes tempted to administer tests merely to boost their credibility. I think it is safe to say this is not a desirable practice.
- **A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) _______ format.**
- **a. subjective.**
- **b. objective.**
- **c. very precise.**
- **d. concise.**
a. subjective.
A "subjective" paradigm relies mainly on the scorer's opinion. If the rater knows the test taker's attributes, the rater's "personal bias" can significantly impact the rating.
- **The National Counselor Exam (NCE) is a(n) _______ test because the scoring procedure is specific.**
- **a. subjective.**
- **b. objective.**
- **c. projective.**
- **d. subtest.**
b. objective.
Since the NCE uses an a, b, c, d alternative format, the rater's "subjective" feelings and thoughts would not be an issue.
- **A short answer test is a(n) _______ test.**
- **a. objective.**
- **b. culture free.**
- **c. forced choice.**
- **d. free choice.**
d. free choice.
Some exams will call this a "free response" format. In any case, the salient point is that the person taking the test can respond in any manner he or she chooses. Although free choice response patterns can yield more information, they often take more time to score and increase subjectivity (i.e., there is more than one correct answer). I should mention that although testing is often controversial, schools now employ psychoeducational tests more than at any time in history.
- **The NCE is a(n) _______ test.**
- **a. free choice.**
- **b. forced choice.**
- **c. projective.**
- **d. intelligence.**
b. forced choice.
"Forced choice" items are sometimes known as "recognition items." This book is composed of forced choice/recognition items. On some tests this format is used to control for the "social desirability phenomenon," in which the person marks the answer he or she feels is socially acceptable. The MMPI-2, or Minnesota Multiphasic Personality Inventory, for example, uses forced choices to create a "lie scale" composed of human frailties we all possess.
- **The _______ index indicates the percentage of individuals who answered each item correctly.**
- **a. difficulty.**
- **b. critical.**
- **c. intelligence.**
- **d. personal.**
a. difficulty.
The higher the number of people who answer a question correctly, the easier the item is, and vice versa. A .5 difficulty index (also called a difficulty value) would suggest that 50% of those tested answered the question correctly, while 50% did not. Most theorists agree that a "good measure" provides a wide range of item difficulties, including some easy items that even a poor performer will answer correctly.
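Since the difficulty index is just the proportion of examinees who answered an item correctly, the arithmetic can be checked in a couple of lines (the response data below are made up for illustration):

```python
# Difficulty index (difficulty value): the proportion of examinees
# who answered an item correctly. Hypothetical responses:
# 1 = correct, 0 = incorrect.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 10 examinees

difficulty = sum(responses) / len(responses)
print(difficulty)  # 0.6 -> 60% answered correctly, a fairly easy item
```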
- **Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called _______ items.**
- **a. vague.**
- **b. subjective.**
- **c. recognition.**
- **d. numerical.**
c. recognition.
Recognition items give the examinee two or more alternatives.
- **A true/false test has _______ recognition items.**
- **a. similar.**
- **b. free choice.**
- **c. dichotomous.**
- **d. no.**
c. dichotomous.
“Dichotomy” simply means that you are presented with two
opposing choices.
- **A test format could be normative or ipsative. In the normative format**
- **a. each item depends on the item before it.**
- **b. each item depends on the item after it.**
- **c. the client must possess an IQ within the normal range.**
- **d. each item is independent of all other items.**
d. each item is independent of all other items.
Ipsative measures compare traits within the same individual; they do not compare a person to other persons who took the instrument. The Kuder Occupational Interest Survey (KOIS), now called the Kuder Career Search with Person Match, is one such example. The ipsative test allows the person being tested to compare items.
- **A client who takes a normative test**
- **a. cannot legitimately be compared to others who have taken the test.**
- **b. can legitimately be compared to others who have taken the test.**
- **c. could not have taken an IQ test.**
- **d. could not have taken a personality test.**
b. can legitimately be compared to others who have taken the test.
Technically, a normative interpretation is one in which the individual's score is evaluated by comparing it to others who took the same test. A percentile rank is an excellent example. Say your client scores an 82 on a nationally normed test and this score corresponds to the percentile rank of 60. This tells you that 60% of the individuals who took the test scored 82 or less.
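The percentile-rank logic above can be sketched as a short function (the norm-group scores here are hypothetical):

```python
# Percentile rank: the percentage of examinees scoring at or below a
# given score. The norm-group scores below are hypothetical.
norm_scores = [55, 60, 70, 74, 78, 80, 82, 85, 88, 95]

def percentile_rank(scores, score):
    """Percentage of examinees who scored at or below `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

print(percentile_rank(norm_scores, 82))  # 70.0 -> 70% scored 82 or less
```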
- **In an ipsative measure the person taking the test must compare items to one another. The result is that**
- **a. an ipsative measure cannot be utilized for career guidance.**
- **b. you cannot legitimately compare two or more people who have taken an ipsative test.**
- **c. an ipsative measure is never valid.**
- **d. an ipsative measure is never reliable.**
b. you cannot legitimately compare two or more people who have taken an ipsative test.
Since the ipsative measure does not reveal absolute
strengths, comparing one person’s score to another is relatively
meaningless. The person is measured in response to his or her
own standard of behavior. The ipsative measure points out the
highs and lows that exist within a single individual. Hence, when
a colleague tells you that Mr. Johnson’s anxiety is improving, she
has given you an ipsative description. This description, however,
would not lend itself to comparing Mr. Johnson’s anxiety to Mrs.
McBee’s.
- **Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be**
- **a. a power test.**
- **b. neither a speed test nor a power test.**
- **c. a speed test.**
- **d. a fine example of an ipsative measure.**
c. a speed test.
A good timed/speed test is purposely set up so that nobody finishes it. A "power test" (see choice "a") is designed to evaluate the level of mastery without a time limit. A timed test is really a type of speed test, but one that a high percentage of the test takers complete; it is usually more difficult and has a time limit (think NCE).
- **A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer the questions.**
- **a. This is most likely a projective measure.**
- **b. This is most likely a speed test.**
- **c. This is most likely a power test.**
- **d. This is most likely an invalid measure.**
c. This is most likely a power test.
Like the speed test, it will ideally be designed so that nobody receives a perfect score. Choice "a," projective measure, is incorrect since projective tests rely on a "free response" format.
- **An achievement test measures maximum performance while a personality test or interest inventory measures**
- **a. typical performance.**
- **b. minimum performance.**
- **c. unconscious traits.**
- **d. self-esteem by always relying on a Q-Sort design.**
a. typical performance.
Interest inventories are popular with career counselors because such measures focus on what the client likes or dislikes. The Strong Interest Inventory (SII) is an excellent example. Choice "d," the Q-Sort, often used to investigate personality traits, involves a procedure in which an individual is given cards with statements and asked to place them in piles from "most like me" to "least like me." The subject then completes another sort to describe the "ideal self." The ideal self can then be compared to his or her current self-perception in order to assess self-esteem.
- **In a spiral test**
- **a. the items get progressively easier.**
- **b. the difficulty of the items remains constant.**
- **c. the client must answer each question in a specified period of time.**
- **d. the items get progressively more difficult.**
d. the items get progressively more difficult.
A spiral test is one in which the themes being evaluated are distributed throughout the test, instead of being grouped together, and the items become increasingly difficult as the test progresses. Just remember that a spiral staircase seems to get more difficult to climb as you walk up higher.
- **In a cyclical test**
- **a. the items get progressively easier.**
- **b. the difficulty of the items remains constant.**
- **c. you have several sections which are spiral in nature.**
- **d. the client must answer each question in a specified period of time.**
c. you have several sections which are spiral in nature.
In each section the questions would go from easy ones to those which are more difficult.
- **A test battery is considered**
- **a. a horizontal test.**
- **b. a vertical test.**
- **c. a valid test.**
- **d. a reliable test.**
a. a horizontal test.
In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source. Say, this can get confusing. Remember that in the section on group processes I talked about vertical and horizontal interventions. In testing, a vertical test would have versions for various age brackets or levels of education (e.g., a math achievement test for preschoolers and a version for middle-school children). A horizontal test measures various factors (e.g., math and science) during the same testing procedure.
- **In a counseling research study two groups of subjects took a test with the same name. However, when they talked with each other they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible?**
- **a. The researcher is not telling the truth. The groups could not possibly have taken the same test.**
- **b. The test was horizontal.**
- **c. The test was not a power test.**
- **d. The researcher gave parallel forms of the same test.**
d. The researcher gave parallel forms of the same test.
When a test has two versions or forms that are interchangeable, they are termed parallel forms or equivalent forms of the same test. From a statistical/psychometric standpoint each form must have the same mean, standard error, and other statistical components.
- **The most critical factors in test selection are**
- **a. the length of the test and the number of people who took the test in the norming process.**
- **b. horizontal versus vertical.**
- **c. validity and reliability.**
- **d. spiral versus cyclical format.**
c. validity and reliability.
Validity refers to whether the test measures what it says it measures, while reliability tells how consistently a test measures an attribute.
- **Which is more important, validity or reliability?**
- **a. Reliability.**
- **b. They are equally important.**
- **c. Validity.**
- **d. It depends on the test in question.**
c. Validity.
Experts nearly always consider validity the number one factor in the construction of a test. A test must measure what it purports to measure. Reliability, choice "a," is the second most important concern. A scale, for example, must measure body weight accurately if it is a valid instrument. In order to be reliable, it will need to give repeated readings which are nearly identical for the same person if the person keeps stepping on and off the scale.
- **In the field of testing, validity refers to**
- **a. whether the test really measures what it purports to measure.**
- **b. whether the same test gives consistent measurement.**
- **c. the degree of cultural bias in a test.**
- **d. the fact that numerous tests measure the same traits.**
a. whether the test really measures what it purports to measure.
There are five basic types of validity you should familiarize yourself with for your exam. First, content validity, or what is sometimes called rational or logical validity. Second, construct validity, which refers to a test's ability to measure a theoretical construct like intelligence, self-esteem, artistic talent, mechanical ability, or managerial potential. Third is concurrent validity, which deals with how well the test compares to other instruments that are intended for the same purpose. Fourth, predictive validity, also known as empirical validity, which reflects the test's ability to predict future behavior according to established criteria. On some exams, concurrent validity and predictive validity are lumped under the umbrella of "criterion validity," since both are actually different types of criterion-related validity. Fifth, a small body of literature speaks of consequential validity, which simply tries to ascertain the social implications of using tests.
- **A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor**
- **a. is interested in reliability.**
- **b. is interested in validity.**
- **c. is looking for information which is not available.**
- **d. is magnifying an unimportant issue.**
a. is interested in reliability.
Thus, a test can have a high reliability coefficient but still have a low validity coefficient. Reliability places a ceiling on validity, but validity does not set the limits on reliability.
- **Which measure would yield the highest level of reliability?**
- **a. The TAT, a projective test popular with psychodynamic helpers.**
- **b. The WAIS-III, a popular IQ test.**
- **c. The MMPI-2, a popular personality test.**
- **d. A very accurate scale.**
d. A very accurate scale.
In the real world physical measurements are more reliable than
psychological ones.
- **Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be**
- **a. height.**
- **b. weight.**
- **c. ego strength.**
- **d. the ability to name all men who have served as U.S. presidents.**
c. ego strength.
Any trait you cannot “directly” measure or observe can be
considered a construct.
- **Face validity refers to the extent that a test**
- **a. looks or appears to measure the intended attribute.**
- **b. measures a theoretical construct.**
- **c. appears to be constructed in an artistic fashion.**
- **d. can be compared to job performance.**
a. looks or appears to measure the intended attribute.
Most experts technically no longer list "face validity" as a sixth type of validity. Face validity, like a person's face, merely tells you whether the test looks like it measures the intended trait. Face validity is not required test information according to the 1974 committee that drafted Standards for Educational and Psychological Tests.
- **A job test which predicted future performance on a job very well would**
- **a. have high criterion/predictive validity.**
- **b. have excellent face validity.**
- **c. have excellent construct validity.**
- **d. not have incremental validity or synthetic validity.**
a. have high criterion/predictive validity.
Here you are concerned that the test will measure an independent or external outside "criterion," in this case the "future prediction" of the job performance. Choice "d" introduces you to the terms incremental validity and synthetic validity. Although incremental validity and synthetic validity are not considered two of the five or six major types of validity, don't be too surprised if they pop up on an advanced exam question.
- **A new IQ test which yielded results nearly identical to other standardized measures would be said to have**
- **a. good concurrent validity.**
- **b. good face validity.**
- **c. superb internal consistency.**
- **d. all of the above.**
a. good concurrent validity.
Criterion validity could be "concurrent" or "predictive." Concurrent validity answers the question of how well your test stacks up against a well-established test that measures the same behavior, construct, or trait. Evidence for reliability and validity is expressed via correlation coefficients. Suffice it to say that the closer they are to 1.00 the better. The relationship or correlation of a test to an independent measure or trait is known as convergent validity. Convergent validity is actually a method used to assess a test's construct/criterion validity by correlating test scores with an outside source. The test also should show discriminant validity. This means the test will not reflect unrelated variables.
- **When a counselor tells a client that the Graduate Record Examination (GRE) will predict her ability to handle graduate work, the counselor is referring to**
- **a. good concurrent validity.**
- **b. construct validity.**
- **c. face validity.**
- **d. predictive validity.**
d. predictive validity.
The Graduate Record Examination (GRE), the Scholastic Aptitude Test (SAT), the American College Test (ACT), and public opinion polls are effective only if they have high predictive validity, which is the power to accurately describe future behavior or events. Again, the subtypes of criterion validity are concurrent and predictive.
- **A reliable test is _______ valid.**
- **a. always.**
- **b. 90%.**
- **c. not always.**
- **d. 80%.**
c. not always.
A reliable test is not always valid. Reliability, nonetheless, determines the upper level of validity.
- **A valid test is _______ reliable.**
- **a. not always.**
- **b. always.**
- **c. never.**
- **d. 80%.**
b. always.
A valid test is always reliable.
- **One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called**
- **a. test–retest reliability.**
- **b. equivalent forms reliability.**
- **c. alternate forms reliability.**
- **d. the split-half method.**
a. test–retest reliability.
The well-known test–retest method discussed here tests for "stability," which is the ability of a test score to remain stable or fluctuate over time when the client takes the test again. When using the test–retest paradigm the client generally takes the same test after waiting at least seven days. The test–retest procedure is only appropriate for traits such as IQ which remain stable over time and are not altered by mood, memory, or practice effects.
- **One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as**
- **a. test–retest reliability.**
- **b. equivalent or alternate forms reliability.**
- **c. the split-half method.**
- **d. internal consistency.**
b. equivalent or alternate forms reliability.
Here a single group of examinees takes parallel forms of a test and a reliability correlation coefficient is figured on the two sets of scores. Counterbalancing is necessary when testing reliability in this fashion. That is to say, half of the individuals get parallel form A first and half get form B initially. This controls for variables such as fatigue, practice, and motivation.
- **A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor**
- **a. used an invalid procedure to test reliability.**
- **b. was testing reliability via the split-half method.**
- **c. was testing reliability via the equivalent forms method.**
- **d. was testing reliability via the inter-rater method.**
b. was testing reliability via the split-half method.
In this situation the individual takes the entire test as a whole and then the test is divided into halves. The correlation between the half scores yields a reliability coefficient.
- **Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?**
- **a. test–retest.**
- **b. alternate forms.**
- **c. split-half.**
- **d. interrater/interobserver.**
d. interrater/interobserver.
Interscorer/interrater/interobserver reliability is an assessment of the correlation between two or more raters, observers, or scorers: the degree to which they agree on the scoring of a test or the interpretation of observed behaviors. In choice "d," several raters assess the same performance. This method has been called "scorer reliability" and is utilized with subjective tests such as projectives to ascertain whether the scoring criteria are such that two persons who grade or assess the responses will produce roughly the same score.
- **A reliability coefficient of 1.00 indicates**
- **a. a lot of variance in the test.**
- **b. a score with a high level of error.**
- **c. a perfect score which has no error.**
- **d. a typical correlation on most psychological and counseling tests.**
c. a perfect score which has no error.
As stated earlier, this generally occurs only in physical measurement.
- **An excellent psychological or counseling test would have a reliability coefficient of**
- **a. 50.**
- **b. .90.**
- **c. 1.00.**
- **d. −.90.**
b. .90.
Ninety percent of the score measures the attribute in question, while 10% of the score is indicative of error.
- **A researcher working with a personality test discovers that the test has a reliability coefficient of .70, which is somewhat typical. This indicates that**
- **a. 70% of the score is accurate while 30% is inaccurate.**
- **b. 30% of the people who are tested will receive accurate scores.**
- **c. 70% of the people who are tested will receive accurate scores.**
- **d. 30% of the score is accurate while 70% is inaccurate.**
a. 70% of the score is accurate while 30% is inaccurate.
Seventy percent of the obtained score on the test represented the true score on the personality attribute, while 30% of the obtained score could be accounted for by error. Seventy percent is true variance while 30% constitutes error variance.
- **A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be _______ or higher.**
- **a. .20.**
- **b. .55.**
- **c. .80.**
- **d. .70.**
c. .80.
This is a tricky question. Although .70 is generally acceptable for
most psychological attributes, for admissions for jobs, schools,
and so on, it should be at least .80 and some experts will not
settle for less than .90.
- **The same test is given to the same group of people using the test–retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is**
- **a. 70%.**
- **b. 100%.**
- **c. 50%.**
- **d. 49%.**
d. 49%.
To demonstrate the variance of one factor accounted for by another, you merely square the correlation (i.e., the reliability coefficient). So .70 × .70 = .49, and .49 × 100 = 49%. Your exam could refer to this principle as the coefficient of determination.
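The squaring step can be verified directly:

```python
# Coefficient of determination: squaring the correlation gives the
# proportion of shared variance between the two administrations.
r = 0.70                      # test-retest correlation
shared_variance = r ** 2      # approximately 0.49
print(f"{shared_variance:.0%}")  # 49%
```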
- **IQ means**
- **a. a query of intelligence.**
- **b. indication of intelligence.**
- **c. intelligence quotient.**
- **d. intelligence questions for test construction.**
c. intelligence quotient.
IQ testing has been the center of more heated
debates among experts than any other type of testing.