Assessment And Test Flashcards
Appraisal can be defined as
a. the process of assessing or estimating attributes.
b. testing which is always performed in a group setting.
c. testing which is always performed on a single individual.
d. a pencil and paper measurement of assessing attributes.
The process of assessing or estimating attributes.
Appraisal is a broad term which includes more than merely “testing clients.” Appraisal could include a survey, observations, or even clinical interviews.
A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) ________ format.
a. subjective
b. objective
c. very precise
d. concise
Subjective.
A “subjective” paradigm relies mainly on the scorer’s opinion. If the rater knows the test taker’s attributes, the rater’s “personal bias” can significantly affect the rating. For example, an attractive examinee might be given a higher rating. (This is the so-called halo effect.)
The National Counselor Exam (NCE) is a(n) ________ test because the scoring procedure is specific.
a. subjective
b. objective
c. projective
d. subtest
Objective.
A short answer test is a(n) ________ test.
a. objective
b. culture-free
c. forced choice
d. free choice
Free Choice.
Some exams will call this a “free response” format. In any case, the salient point is that the person taking the test can respond in any manner he or she chooses. Although free choice response patterns can yield more information, they often take more time to score and increase subjectivity (e.g., more than one answer may be acceptable).
The NCE and the CPCE would be examples of a(n) ________ test.
a. free choice
b. forced choice
c. projective
d. intelligence
Forced Choice.
“Forced choice” items are sometimes known as “recognition items.” This book is composed of forced choice/recognition items. On some tests this format is used to control for the “social desirability phenomenon,” the tendency of test takers to choose the answer they feel is socially acceptable (i.e., the test provides alternatives that are all equal in terms of social desirability). The MMPI-2 (Minnesota Multiphasic Personality Inventory-2), for example, uses forced choice items to create a “lie scale” composed of human frailties we all possess. This scale, therefore, ferrets out individuals who try to make themselves look good (i.e., the way they believe they “should” be).
The ________ index indicates the percentage of individuals who answered each item correctly.
a. difficulty
b. critical
c. intelligence
d. personal
Difficulty.
The more people who answer an item correctly, the easier the item is, and vice versa. A difficulty index (also called a difficulty value) of 0.5 indicates that 50% of those tested answered the item correctly, while 50% did not. Most theorists agree that a “good measure” includes items spanning a wide range of difficulty, including some that even a poor performer will answer correctly.
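The difficulty index is simple arithmetic: the number of examinees who answered an item correctly divided by the number tested. The minimal Python sketch below uses made-up response data (1 = correct, 0 = incorrect); the function and item names are hypothetical, chosen only for illustration.

```python
# Minimal sketch: item difficulty index p = (number answering correctly) / (number tested).
# The response data below are hypothetical, for illustration only.

def difficulty_index(responses: list[int]) -> float:
    """Return the proportion of examinees who answered an item correctly.

    `responses` holds one entry per examinee: 1 = correct, 0 = incorrect.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical scored responses (1 = correct) for three items, ten examinees each.
items = {
    "item_1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # 9 of 10 correct
    "item_2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # 5 of 10 correct
    "item_3": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],  # 2 of 10 correct
}

for name, scored in items.items():
    p = difficulty_index(scored)
    print(f"{name}: p = {p:.2f} ({p:.0%} answered correctly)")
```

Here item_1 (p = .90) is the easiest item and item_3 (p = .20) the hardest; item_2 has the 0.5 difficulty value described above.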
Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called ________ items.
a. vague
b. subjective
c. recognition
d. numerical
Recognition.
Recognition items give the examinee two or more alternatives.
A true/false test has ________ recognition items.
a. similar
b. free choice
c. dichotomous
d. no
Dichotomous.
“Dichotomy” simply means that you are presented with two opposing choices, which is why choice “a” (similar) is incorrect. When a test gives the person taking the exam three or more forced choices (e.g., the NCE, the CPCE, or this book), psychometricians call it a “multipoint item.”
A test format could be normative or ipsative. In the normative format
a. each item depends on the item before it.
b. each item depends on the item after it.
c. the client must possess an IQ within the normal range.
d. each item is independent of all other items.
Each item is independent of all other items.
Ipsative measures compare traits within the same individual; they do not compare a person to other persons who took the instrument. The Kuder Career Planning instruments are often cited as falling into this category. The ipsative measure allows the person being tested to compare items.
A client who takes a normative test
a. cannot legitimately be compared to others who have taken the test.
b. can legitimately be compared to others who have taken the test.
c. could not have taken an IQ test.
d. could not have taken a personality test.
Can legitimately be compared to others who have taken the test.
In an ipsative measure the person taking the test must compare items to one another. The result is that
a. an ipsative measure cannot be utilized for career guidance.
b. you cannot legitimately compare two or more people who have taken an ipsative test.
c. an ipsative measure is never a forced choice format.
d. an ipsative measure is never reliable.
You cannot legitimately compare two or more people who have taken an ipsative test.
Since the ipsative measure does not reveal absolute strengths, comparing one person’s score to another is relatively meaningless.
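A small contrast makes the point concrete. The sketch below assumes, purely for illustration, that each person rates three hypothetical interest areas on a 1 to 10 scale (normative data) and that the ipsative score is simply each area's rank within that person. Two people with very different absolute interest levels end up with identical ipsative profiles.

```python
# Minimal sketch contrasting normative and ipsative scoring.
# Assumption: each person rates interest in three hypothetical areas on a 1-10 scale
# (normative data), and the ipsative score is simply each area's rank within that person.

AREAS = ["outdoor", "mechanical", "artistic"]  # hypothetical interest areas


def ipsative_ranks(ratings: dict[str, int]) -> dict[str, int]:
    """Rank areas within one person: 1 = that person's strongest interest."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {area: rank for rank, area in enumerate(ordered, start=1)}


# Hypothetical normative ratings: Person A is enthusiastic about everything,
# Person B is lukewarm about everything.
person_a = {"outdoor": 10, "mechanical": 9, "artistic": 8}
person_b = {"outdoor": 3, "mechanical": 2, "artistic": 1}

ranks_a = ipsative_ranks(person_a)
ranks_b = ipsative_ranks(person_b)

# Normative comparison: A's raw interest in every area exceeds B's.
print(all(person_a[a] > person_b[a] for a in AREAS))  # True

# Ipsative comparison: both people get identical profiles, and every
# profile's ranks sum to the same constant (1 + 2 + 3 = 6).
print(ranks_a == ranks_b)  # True
print(sum(ranks_a.values()), sum(ranks_b.values()))  # 6 6
```

Because every ipsative profile sums to the same total, the ranks show only relative strengths within each person, which is why comparing one person's ipsative results to another's says little.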
Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be
a. a power test.
b. neither a speed test nor a power test.
c. a speed test.
d. a fine example of an ipsative measure.
A speed test.
In terms of difficulty, a speed test is really intended to be fairly easy. The difficulty is induced by time limitations, not the difficulty of the tasks or the questions themselves.
A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer the questions.
a. This is most likely a projective measure.
b. This is most likely a speed test.
c. This is most likely a power test.
d. This is most likely an invalid measure.
This is most likely a power test.
An achievement test measures maximum performance or present level of skill. Tests of this nature are also called attainment tests, while a personality test or interest inventory measures
a. typical performance.
b. minimum performance.
c. unconscious traits.
d. self-esteem by always relying on a Q-Sort design.
Typical performance.
In a spiral test
a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. the client must answer each question in a specified period of time.
d. the items get progressively more difficult.
The items get progressively more difficult.
Just remember that a spiral staircase seems to get more difficult to climb as you walk up higher.
In a cyclical test
a. the items get progressively easier.
b. the difficulty of the items remains constant.
c. you have several sections which are spiral in nature.
d. the client must answer each question in a specified period of time.
You have several sections which are spiral in nature.
In each section the questions would go from easy ones to those which are more difficult.
A test battery is considered
a. a horizontal test.
b. a vertical test.
c. a valid test.
d. a reliable test.
A horizontal test.
In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source.
In a counseling research study, two groups of subjects took a test with the same name. However, when they talked with each other they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible?
a. The researcher is not telling the truth. The groups could not possibly have taken the same test.
b. The test was horizontal.
c. The test was not a power test.
d. The researcher gave parallel forms of the same test.
The researcher gave parallel forms of the same test.
When a test has two versions or forms that are interchangeable, they are termed parallel forms or equivalent forms of the same test. From a statistical/psychometric standpoint, each form must have essentially the same mean, standard deviation, standard error of measurement, and other statistical properties.
The most critical factors in test selection are
a. the length of the test and the number of people who took the test in the norming process.
b. horizontal versus vertical.
c. validity and reliability.
d. spiral versus cyclical format.
Validity and reliability.
Validity refers to whether the test measures what it says it measures, while reliability tells how consistently a test measures an attribute.
Which is more important, validity or reliability?
a. Reliability.
b. They are equally important.
c. Validity.
d. It depends on the test in question.
Validity.
Experts nearly always consider validity the number one factor in the construction of a test. A test must measure what it purports to measure.
In the field of testing, validity refers to
a. whether the test really measures what it purports to measure.
b. whether the same test gives consistent measurement.
c. the degree of cultural bias in a test.
d. the fact that numerous tests measure the same traits.
Whether the test really measures what it purports to measure.
A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor
a. is interested in reliability.
b. is interested in validity.
c. is looking for information which is not available.
d. is magnifying an unimportant issue.
Is interested in reliability.
Which measure would yield the highest level of reliability?
a. The TAT (Thematic Apperception Test), a projective test popular with psychodynamic helpers.
b. The WAIS-IV, a popular IQ test.
c. The MMPI-2, a popular personality test.
d. A very accurate postage scale.
A very accurate postage scale.
In practice, physical measurements are more reliable than psychological ones.
Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be
a. height.
b. weight.
c. ego strength.
d. the ability to name all men who have served as U.S. presidents.
Ego Strength.
Any trait you cannot “directly” measure or observe can be considered a construct.
Face validity refers to the extent that a test
a. looks or appears to measure the intended attribute.
b. measures a theoretical construct.
c. appears to be constructed in an artistic fashion.
d. can be compared to job performance.
Looks or appears to measure the intended attribute.
Face validity—like a person’s face—merely tells you whether the test looks like it measures the intended trait.
A job test which predicted future performance on a job very well would
a. have high criterion/predictive validity.
b. have excellent face validity.
c. have excellent construct validity.
d. not have incremental validity or synthetic validity.
Have high criterion/predictive validity.
A new IQ test which yielded results nearly identical to other standardized measures would be said to have
a. good concurrent validity.
b. good face validity.
c. superb internal consistency.
d. all of the above.
Good concurrent validity.
Concurrent validity answers the question of how well your test stacks up against a well-established instrument that measures the same behavior, construct, or trait.
When a counselor tells a client that the Graduate Record Examination (GRE) will predict her ability to handle graduate work, the counselor is referring to
a. good concurrent validity.
b. construct validity.
c. face validity.
d. predictive validity.
Predictive validity.
The Graduate Record Examination (GRE), the Scholastic Aptitude Test (SAT), the American College Test (ACT), and public opinion polls are effective only if they have high predictive validity, which is the power to accurately predict future behavior or events. Again, the two subtypes of criterion validity are concurrent and predictive.
A reliable test is ________ valid.
a. always
b. 90%
c. not always
d. 80%
Not always.
A valid test is ________ reliable.
a. not always
b. always
c. never
d. 80%
Always.
One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called
a. test–retest reliability.
b. equivalent forms reliability.
c. alternate forms reliability.
d. the split-half method.
Test–retest reliability.
One method of testing reliability is to give the same group of people alternate forms of the same test. Each form should have the same psychometric/statistical properties as the original instrument. This is known as
a. test–retest reliability.
b. equivalent or alternate forms reliability.
c. the split-half method.
d. internal consistency.
Equivalent or alternate forms reliability.
A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor
a. used an invalid procedure to test reliability.
b. was testing reliability via the split-half correlation method.
c. was testing reliability via the equivalent forms method.
d. was testing reliability via the inter-rater method.
Was testing reliability via the split-half correlation method.
Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?
a. Test–retest.
b. Alternate forms.
c. Split-half.
d. Inter-rater/inter-observer.
Inter-rater/inter-observer.
This method is also called “scorer reliability” and is used with subjective tests, such as projectives, to determine whether the scoring criteria are such that two people who grade or assess the responses will produce roughly the same score.
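One simple way to check whether two scorers "produce roughly the same score" is the proportion of responses on which their scores agree, either exactly or within a tolerance. The sketch below is a minimal illustration; the essay scores, the 1 to 6 scale, and the one-point tolerance are all hypothetical choices, and published tests may instead report correlations or chance-corrected agreement indices.

```python
# Minimal sketch of inter-rater (scorer) agreement on essay scores.
# The scores and the one-point tolerance are hypothetical choices for illustration;
# published tests may use correlations or chance-corrected indices instead.

def agreement_rate(rater_a: list[int], rater_b: list[int], tolerance: int = 0) -> float:
    """Proportion of responses on which two raters' scores differ by at most `tolerance`."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of responses")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical 1-6 scores two graders gave the same ten essays.
grader_1 = [4, 5, 3, 6, 2, 4, 5, 3, 4, 6]
grader_2 = [4, 4, 3, 6, 3, 4, 5, 2, 4, 5]

print(f"exact agreement: {agreement_rate(grader_1, grader_2):.0%}")
print(f"agreement within 1 point: {agreement_rate(grader_1, grader_2, tolerance=1):.0%}")
```

With these invented scores the graders agree exactly on 6 of 10 essays (60%) and are never more than one point apart (100%).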
A reliability coefficient of 1.00 indicates
a. a lot of variance in the test.
b. a score with a high level of error.
c. a perfect score which has no error.
d. a typical correlation on most psychological and counseling tests.
A perfect score which has no error.
This generally occurs only in physical measurement.
An excellent psychological or counseling test would have a reliability coefficient of
a. .50.
b. .90.
c. 1.00.
d. –.90.
.90.
Ninety percent of the score variance reflects the attribute in question (true variance), while 10% is indicative of error (error variance).
A researcher working with a personality test discovers that the test has a reliability coefficient of .70, which is fairly typical. This indicates that
a. 70% of the score is accurate while 30% is inaccurate.
b. 30% of the people who are tested will receive accurate scores.
c. 70% of the people who are tested will receive accurate scores.
d. 30% of the score is accurate while 70% is inaccurate.
70% of the score is accurate while 30% is inaccurate.
Seventy percent of the obtained score on the test represents the true score on the personality attribute, while 30% of the obtained score can be accounted for by error. Seventy percent is true variance, while 30% constitutes error variance.
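The reading used in these cards treats a reliability coefficient as the share of observed-score variance that is true variance, with the remainder as error variance. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
# Minimal sketch of the classical test theory reading used above: a reliability
# coefficient r_xx is treated as the proportion of observed-score variance that is
# true variance, and 1 - r_xx as the proportion that is error variance.

def variance_split(reliability: float) -> tuple[float, float]:
    """Return (true_variance_share, error_variance_share) for a reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return reliability, 1.0 - reliability


for r_xx in (0.70, 0.90):
    true_share, error_share = variance_split(r_xx)
    print(f"r = {r_xx:.2f}: {true_share:.0%} true variance, {error_share:.0%} error variance")
```

For .70 this gives the 70%/30% split described above, and for .90 the 90%/10% split of an excellent test.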
A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be ________ or higher.
a. .20
b. .55
c. .80
d. .70
.80.
Although .70 is generally acceptable for most psychological attributes, for selection decisions such as job hiring or school admissions the coefficient should be at least .80, and some experts will not settle for less than .90.
The same test is given to the same group of people using the test–retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is
a. 70%.
b. 100%.
c. 50%.
d. 49%.
49%.
Here’s the key to simplifying a question such as this: to find the variance of one factor accounted for by another, you merely square the correlation (i.e., the reliability coefficient). So .70 × .70 = .49, and .49 × 100 = 49%. Your exam could refer to this principle as the coefficient of determination.
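The squaring step is easy to check in code. A minimal Python sketch (the function name is hypothetical) that converts a correlation into the coefficient of determination:

```python
# Minimal sketch: the coefficient of determination is the correlation squared,
# read as the proportion of variance the two sets of scores share.

def coefficient_of_determination(r: float) -> float:
    """Return r squared for a correlation (e.g., a test-retest coefficient) r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2


shared = coefficient_of_determination(0.70)
print(f"r = .70 -> r^2 = {shared:.2f} -> {shared:.0%} shared variance")
```

Note that squaring also turns a negative correlation into a positive proportion: r = -.50 and r = .50 both share 25% of the variance.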
IQ means
a. a query of intelligence.
b. indication of intelligence.
c. intelligence quotient.
d. intelligence questions for test construction.
Intelligence quotient.