Assessment and Testing Flashcards
Appraisal is?
the process of assessing or estimating attributes
A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered what format?
subjective
The NCE is what kind of test based on the specific scoring procedure?
objective
A short answer test is what kind of test?
free-choice
The NCE and the CPCE would be examples of what kind of test?
forced choice
The _____ index indicates the percentage of individuals who answered each item correctly.
difficulty
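As a rough illustration of the difficulty index, the sketch below computes the proportion of test takers who answered each item correctly. The function name `item_difficulty` and the response data are hypothetical and not taken from any published instrument.

```python
def item_difficulty(responses):
    """Return the proportion of test takers who answered each item correctly.

    `responses` holds one row per test taker; each row contains
    1 (correct) or 0 (incorrect) for every item.
    """
    n_takers = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


# Hypothetical data: 4 test takers, 3 items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

difficulty = item_difficulty(responses)
print(difficulty)  # [1.0, 0.5, 0.25]
```

A value of 1.0 means everyone answered the item correctly, so a higher index indicates an easier item.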
Short answer tests and projective measures utilize free response items. The NCE and CPCE use forced choice or so-called ____ items.
recognition
A true/false test has ____ recognition items.
dichotomous
A test format could be normative or ipsative. In the normative format…
each item is independent of all other items
What do ipsative measures compare?
traits within the same individual, not a person to other persons who took the instrument
A client who takes a normative test…
can legitimately be compared to others who have taken the test
In an ipsative measure, the person taking the test must compare items to one another. The result is that…
you can’t legitimately compare 2 or more people who have taken an ipsative test
Tests are often classified as speed tests vs. power tests. A timed typing test used to hire secretaries would be…
a speed test
A counseling test consists of 300 forced response items. The person taking the test can take as long as they want to answer the questions. This is most likely…
a power test
T/F: In a power test, time is not an issue.
true
An achievement test measures maximum performance or present level of skill. Tests of this nature are also called attainment tests, while a personality test or interest inventory measures…
typical performance
What is an interest inventory?
popular with career counselors; focuses on what the client likes and dislikes
In a spiral test…
the items get progressively more difficult
In a cyclical test…
you have several sections which are spiral in nature
A test battery is considered…
a horizontal test
In a test battery…
several measures are used to produce results that could be more accurate than those derived from merely using a single source
What does a horizontal test measure?
various factors during the same testing procedure
In a counseling research study, 2 groups of subjects took a test with the same name. However, when they talked with each other, they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is that possible?
The researcher gave parallel forms of the same test
When a test has 2 versions or forms that are interchangeable, they’re called…
parallel forms or equivalent forms
The most critical factors in test selection are…
validity and reliability
Validity refers to…
whether the test measures what it says it measures
Reliability refers to…
how consistently a test measures an attribute
Which is more important, validity or reliability? Why?
validity because a test must measure what it claims to measure
There are ___ basic types of validity
5
What are the basic types of validity?
content, construct, concurrent, predictive, consequential
Content validity is?
“rational or logical validity”; does the test examine or sample the behavior under scrutiny?
Construct validity is?
refers to a test’s ability to measure a theoretical construct; refers to the extent that a test measures an abstract trait or psychological notion
Concurrent validity is?
deals with how well the test compares to other instruments that are intended for the same purpose
Predictive validity is?
“empirical validity”; reflects the test’s ability to predict future behavior according to established criteria
Consequential validity is?
tries to find the social implications of using tests
A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor…
is interested in reliability
Which measure would yield the highest level of reliability?
a very accurate postage scale
Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be:
ego strength
Face validity refers to the extent that a test…
appears to measure the intended attribute
A job test which predicted future performance on a job very well would…
have high criterion/predictive validity
A new IQ test which yielded results nearly identical to other standardized measures would be said to have…
good concurrent validity
When a counselor tells a client that the GRE will predict their ability to handle graduate work, the counselor is referring to…
predictive validity
A reliable test is ___ valid.
not always
A valid test is ___ reliable.
always
One method of testing reliability is to give the same test to the same group of people 2 times and then correlate the scores. This is called:
test-retest reliability
One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as:
equivalent or alternate forms reliability
A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor…
was testing reliability via the split-half correlation method
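The odd-even split described in this card can be sketched as follows. This is a minimal example under stated assumptions: the helper names `pearson_r` and `split_half_r` and the response data are hypothetical, and the correlation is computed by hand rather than with a statistics library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(responses):
    """Correlate each person's score on odd items with their score on even items."""
    odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(odd_scores, even_scores)


# Hypothetical data: 5 test takers, 6 items scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r = split_half_r(responses)
print(round(r, 2))
```

The closer the correlation between the two halves is to 1.00, the more internally consistent the test appears.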
Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?
inter-rater/inter-observer
A reliability coefficient of 1.00 indicates…
perfect reliability; the scores contain no error
An excellent psychological or counseling test would have a reliability coefficient of…
.90
A researcher working with a personality test discovers that the test has a reliability coefficient of .70 which is somewhat typical. This indicates that…
70% of the score is accurate while 30% is inaccurate
A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be ___ or higher.
.80
The same test is given to the same group of people using the test-retest reliability method. The correlation between the first and second administration is .70. The true variance is…
49%
To demonstrate the variance of one factor accounted for by another…
you square the correlation
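The arithmetic behind the 49% answer can be checked in a couple of lines. The function name `variance_accounted_for` is a hypothetical label used only for this illustration.

```python
def variance_accounted_for(r):
    """Square a correlation to get the proportion of shared variance."""
    return r ** 2


# A correlation of .70 squared gives .49, i.e. 49%.
print(f"{variance_accounted_for(0.70):.0%}")  # 49%
```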
IQ means?
intelligence quotient
____ did research and concluded that intelligence was normally distributed like height or weight and that it was primarily genetic.
Galton
Francis Galton felt intelligence was…
a unitary faculty
J. P. Guilford isolated 120 factors which added up to intelligence. He is also remembered for his…
thoughts on convergent and divergent thinking
Convergent thinking occurs when…
divergent thoughts and ideas are combined into a singular concept
Divergent thinking is…
the ability to generate a novel idea
A counselor is told by his supervisor to measure the internal consistency reliability of a test but not to divide the test in halves. The counselor would need to use…
the Kuder-Richardson coefficients of equivalence
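One of the Kuder-Richardson formulas, KR-20, estimates internal consistency for items scored right or wrong without splitting the test. The sketch below is an illustration only: the function name `kr20` and the data are hypothetical, and it assumes the population variance of total scores (some texts use the sample variance instead).

```python
def kr20(responses):
    """KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores).

    `responses` holds one row per test taker; each row contains
    1 (correct) or 0 (incorrect) for every item.
    """
    n_takers = len(responses)
    k = len(responses[0])  # number of items

    # Sum of p * q, where p is the proportion correct on an item and q = 1 - p.
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_takers
        pq_sum += p * (1 - p)

    # Population variance of the total scores (an assumption of this sketch).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_takers
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_takers

    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical data: 5 test takers, 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(round(kr20(responses), 2))
```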
The first intelligence test was created by…
Alfred Binet and Theodore Simon
Today, the Stanford-Binet IQ test is…
a standardized measure
IQ (intelligence quotient) is expressed by…
(MA/CA) x 100, where MA is mental age and CA is chronological age
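A quick worked example of the ratio formula, where MA is mental age and CA is chronological age. The function name `ratio_iq` and the ages are hypothetical values chosen for illustration.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


# A child with a mental age of 10 and a chronological age of 8.
print(ratio_iq(10, 8))  # 125.0
```

When mental age equals chronological age, the formula returns 100.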
The Binet stressed age-related tasks. Utilizing this method, a 9 y/o task would be one which
50% of the 9 y/o’s could answer correctly
Simon and Binet pioneered the first IQ test around 1905; it was created to…
discriminate children without an intellectual disability from children with an intellectual disability
Today the Stanford-Binet is used from 2 y/o to adulthood. The IQ formula has been replaced by the…
SAS (standard age score)
Most experts would agree that the Wechsler IQ tests gained popularity, as the Binet…
didn’t seem to be the best test for adults
The best IQ test for a 22 year old single male would be the…
WAIS-IV
Describe the WAIS-IV
based on neurocognitive research and the Cattell-Horn-Carroll leading theory of human intelligence
can be administered and scored online, takes about 60-90 minutes
10 subject areas that make up 4 index scores: verbal comprehension index (VCI), perceptual reasoning index (PRI), working memory index (WMI), and processing speed index (PSI)
FSIQ = full scale IQ; scores have a mean of 100 and a standard deviation of 15
less emphasis than the previous version on crystallized intelligence
can measure IQ from 40-160
Appropriate age range for the Wechsler Preschool and Primary Scale of Intelligence (WPPSI)?
2 years and 6 months to 7 years and 7 months
Appropriate age range for the Wechsler Adult Intelligence Scale (WAIS-IV)?
16-90 years old
Appropriate age range for the Wechsler Intelligence Scale for Children (WISC-IV)?
6 years to 16 years 11 months
The best intelligence test for a 6th grade girl would be the?
WISC-IV
The best intelligence test for a kindergartner would be the…
WPPSI-IV
When a test is guided via a theory, it is known as…
theory-based test or inventory
The mean on the Wechsler and the Stanford-Binet Intelligence scales (SB5) is ____ and the standard deviation is ______
100; 15 for both (editions of the Stanford-Binet before the SB5 used a standard deviation of 16)
Group IQ tests like the Otis-Lennon, the Lorge-Thorndike, and the California Test of Mental Abilities are popular in school settings. The advantage is that…
group tests are quicker to administer
The group IQ test movement began…
with the Army Alpha and Army Beta in World War I
In a culture-fair test…
items are known to the subject regardless of their culture
The black versus white IQ controversy was sparked mainly by a 1969 article written by
Arthur Jensen
Who was John Ertl?
claimed to have invented an electronic machine that analyzes neural efficiency and could take the place of paper-and-pencil IQ tests; the device relies on a computer, an EEG, a strobe light, and an electrode helmet
the theory is that the faster one processes a perception, the more intelligent one is
Who was Raymond B. Cattell?
responsible for the concepts of fluid intelligence (inherited, neurological intelligence that decreases with age and isn’t very dependent on culture) and crystallized intelligence (intelligence gained from experiential, cultural, and educational interaction)
Who was Arthur Jensen?
he suggested in a 1969 article that the closer people are genetically, the more alike their IQ scores
claimed that whites score 11-15 IQ points higher than African Americans, regardless of social class
his theory stated that due to slavery, it was possible that African Americans were bred for strength rather than intelligence
he estimated that heredity contributed 80% while environment influenced 20% of the IQ
Who was Robert Williams?
created the Black Intelligence Test of Cultural Homogeneity (BITCH) to demonstrate that African Americans often excelled when given a test laden with questions whose answers would be familiar to members of the African American community
charged that tests like the Binet and Wechsler were part of “scientific racism”
The MMPI-2 is
a standardized personality test; intended to help clinicians diagnose and treat patients; said to have retained the best factors of its original version while updating the test and eliminating sexist wording
The word psychometric means
any form of mental testing
The Myers-Briggs Type Indicator reflects the work of
Carl Jung
The counselor who favors projective measures would most likely be a
psychodynamic clinician
An aptitude test is to ___ as an achievement test is to ____
potential; what has been learned
Both the Rorschach and the Thematic Apperception Test (TAT) are projective tests. The Rorschach uses 10 inkblot cards while the TAT uses…
pictures
Describe the TAT
consists of 31 cards; intended for ages 4+; uses up to 20 cards when administered to any given individual; pictures on cards are intentionally ambiguous; client is asked to make up a story for each card
Test bias primarily results from
a test being normed solely on white middle-class clients
A counselor who fears the client has an organic, neurological, or motoric difficulty would most likely use the
Bender Gestalt II
Describe the Bender Gestalt II
an expressive projective measure; suitable for ages 4+; the client is instructed to copy 16 geometric figures, which they can look at while drawing
An interest inventory would be least valid when used with
an 8th grade male with an IQ of 136
One major criticism of interest inventories is that
they have far too many questions
Interest inventories are positive in the sense that
they’re reliable and not threatening to the test taker
A counselor who had an interest primarily in testing would most likely be a member of
AARC, or Association for Assessment and Research in Counseling
The Hamilton Rating Scale for Depression is intended to?
determine the severity of diagnosed depression
A t-score has a mean of ___ and a standard deviation of ____
50; 10
An IQ score has a mean of ____ and a standard deviation of _____
100; 15
What is a z score?
expresses the number of standard deviations that a raw score is from the mean
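The standard scores in these cards are related through the z score. The sketch below converts a raw score to z and then to a T-score, using the means and standard deviations given above (100 and 15 for IQ, 50 and 10 for T). The function names `z_score` and `from_z` are hypothetical.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd


def from_z(z, mean, sd):
    """Convert a z score to a scale with the given mean and standard deviation."""
    return mean + z * sd


# An IQ of 115 is one standard deviation above the mean of 100.
z = z_score(115, 100, 15)
print(z)  # 1.0

# The same standing expressed as a T-score (mean 50, standard deviation 10).
print(from_z(z, 50, 10))  # 60.0
```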
A scatter plot depicts…
pairs of scores
x-axis shows one variable, y-axis shows another variable
When should testing not be used?
when the purpose is to label individuals
What’s meant by the term “regression toward the mean”?
most people who score very high or very low on a pretest will score nearer the mean on a posttest
In “rater bias”, a supervisor who rates an employee negatively overall simply because of one very negative attribute is using what type of rater bias?
horns
With the horns type of bias…
one negative attribute of someone causes another to rate everything about them more negatively
What is not a reason to use nonparametric statistics?
the scores are normally distributed
If a counselor is conducting an experiment and chooses a significance level of .01, what does this mean?
that the counselor is willing to accept the possibility of erring in accepting or rejecting the null hypothesis one time out of 100
Which confounding variable, that can threaten an experiment’s validity, is most likely to threaten both internal validity and external validity?
selection of subjects