Assessment and Testing Flashcards
general process of determining the dimensions of an attribute or trait
processes and procedures for collecting info about human behavior
- assessment tools include tests/inventories, rating scales, observation, interview data, etc.
implies going beyond measurement to making judgments about human attributes and behaviors; used interchangeably with evaluation
Measures of Central Tendency
a distribution of scores (measures on a number of individuals) can be examined using:
!! All three of these fall in the same place (are identical) when the distribution of scores is normally distributed (not skewed) !!
making a statement about the meaning or usefulness of measurement data according to the professional counselor’s knowledge and judgment
the arithmetic average (M)
the middle score in a distribution of scores
1, 2, (3), 4, 5
the most frequent score in a distribution of scores
1, (2, 2), 3, 4, 5
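A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The score list is made up for illustration and is not from the cards.

```python
from statistics import mean, median, mode

# Hypothetical distribution of scores (illustration only)
scores = [1, 2, 2, 3, 4, 5]

central = {
    "mean": mean(scores),      # arithmetic average
    "median": median(scores),  # middle score (average of the two middle scores here)
    "mode": mode(scores),      # most frequent score
}
print(central)
```

With an even number of scores, the median is the average of the two middle values.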
the degree to which a distribution of scores is not normally distributed
Positive Skew
The bulk of the scores falls on the left (positive skew = the tail goes out to the more positive values)
Mode, median, mean
Negative Skew
The bulk of the scores falls on the right (negative skew = the tail goes out to the left)
Mean, median, mode
Relationship between mean, median, mode in skewed distributions
- the mode is the top of the curve (most frequent scores)
- the mean is pulled in the direction of the extreme scores represented by the tail of a skewed distribution
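The pull of the tail on the mean can be checked with a small, invented set of positively skewed scores. This is only a sketch; the numbers are assumptions chosen to show the pattern.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, a few are very high
skewed = [1, 2, 2, 2, 3, 3, 4, 10, 15]

skew_mode = mode(skewed)      # top of the curve
skew_median = median(skewed)  # middle score
skew_mean = mean(skewed)      # pulled toward the high tail

print(skew_mode, skew_median, round(skew_mean, 2))
```

For these scores the order is mode, then median, then mean, which matches the positive-skew card.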
Measures of Variability
the highest score minus the lowest score
Measures of Variability
Inclusive range
the high score minus the low score, adding one (1)
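A short sketch of both range calculations, using made-up scores:

```python
# Hypothetical raw scores (illustration only)
scores = [55, 60, 72, 88, 95]

score_range = max(scores) - min(scores)  # highest score minus lowest score
inclusive_range = score_range + 1        # same difference, adding one

print(score_range, inclusive_range)
```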
Measures of Variability
Standard Deviation (SD)
describes the variability within a distribution of scores
the square root of the mean of the squared deviations from the mean (roughly, the typical distance of scores from the mean)
Excellent measure of the dispersion of scores
(SD = standard deviation within a sample
sigma = population’s variability)
!! It is NOT equal to variance!! SD is the square root of variance!!!
Measures of Variability
the square of the standard deviation (SD^2)
does not describe the dispersion of scores as well as SD
- see analysis of variance
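The relationship between variance and SD can be sketched with the standard `statistics` module. The scores are invented, and the population formulas (`pvariance`, `pstdev`) are used here as an assumption; the sample versions (`variance`, `stdev`) divide by n - 1 instead of n.

```python
from statistics import pstdev, pvariance

# Hypothetical scores with a mean of 5 (illustration only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

var = pvariance(scores)  # mean of the squared deviations from the mean
sd = pstdev(scores)      # square root of the variance

print(var, sd)
```

The SD stays in the original score units, while the variance is in squared units.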
Normal Curve
Normal curve
essentially distributes the scores (individuals) into six parts of equal width (one SD each) - three above the mean and three below the mean
Normal Curve
Normal curve distributions
2%, 13.5%, 34%, 34%, 13.5%, 2%
68% of scores fall within 1 SD of the mean
95% of scores fall within 2 SD of the mean
about 99.7% of scores (often rounded to 99%) fall within 3 SD of the mean
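The percentages for the normal curve can be reproduced from the error function in Python's `math` module. This is a sketch; the helper name `proportion_within` is made up for illustration.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The results are roughly 68.3%, 95.4%, and 99.7%, which the cards round to 68%, 95%, and 99%.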
a value below which a specified percentage of cases falls
- for a score at the 75th percentile: this score is higher than 74% of the scores; 25% of the scores are higher than this score
from standard nine
converts a distribution of scores into nine parts (1 to 9) with five in the middle and a SD of about 2
Standardized Scores
creates a common language of scores to compare several different test scores for the same individual
- occur by converting raw score distributions
- these derived scores provide for constant normative/relative meaning allowing for comparisons between individuals
- express the person’s distance from the mean in terms of the standard deviation of that standard score distribution
- are continuous and have equality of units
- two most commonly used standardized scores: z-scores, t-scores
Standardized Scores
mean is 0, SD is 1.0
- z-scores typically range from about -3.0 to +3.0 (three SDs below to three SDs above the mean)
Study tip: Z-score, Zero is the mean of the distribution
Standardized Scores
mean of this standardized score is 50 and SD is 10
by Transforming this standard score, negative scores are eliminated (unlike z-score)
Study tip: T-score, Ten is SD
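A minimal sketch of converting a raw score to a z-score and then to a T-score. The raw score, mean, and SD are hypothetical values chosen for illustration.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in SD units (mean 0, SD 1)."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z-score to a distribution with mean 50 and SD 10."""
    return 50 + 10 * z

# Hypothetical test: mean of 70, SD of 10, raw score of 85
z = z_score(85, 70, 10)
t = t_score(z)
print(z, t)

# A below-average raw score gives a negative z but a positive T
low_z = z_score(60, 70, 10)
low_t = t_score(low_z)
print(low_z, low_t)
```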
Correlation coefficient
Pearson Product-Moment Correlation Coefficient (r) is most common
Correlation coefficient
- ranges from -1.0 (perfect negative correlation) to +1.0 (perfect positive correlation)
- statistical index which shows the relationships between two sets of numbers
- when a very strong correlation exists, if you know one score of an individual, you can predict (to a large degree) the other score of that person
- tells nothing about cause and effect!!! Only degree of relationship!!!
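The Pearson r can be computed by hand as the summed cross-products of deviations divided by the square root of the product of the summed squared deviations. The sketch below assumes two short, invented score lists; `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five people on two tests
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]

r = pearson_r(test_a, test_b)
print(round(r, 2))
```

A value near +1.0 or -1.0 describes the strength and direction of the relationship only; it does not show that one variable causes the other.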
correlation between two variables
correlation between three or more variables
the consistency of a test or measure
- the degree to which the test can be expected to provide similar results for the same subjects on repeated administrations
- can be viewed as the extent to which a measure is free from error (if instrument has little error, it is reliable)
- correlation coefficient is used to determine reliability
- if reliability coefficient is high (about .70 or higher), test scores have little error and the instrument is reliable
- a test can have high reliability but low validity… reliability places a ceiling on validity, but validity does not set limits on reliability
- ex. a scale could read 20 lbs every time you weigh a box, but the box actually weighs 40 lbs.
Types of Reliability
Test-Retest reliability
(AKA stability reliability)
- obtained using the same instrument on both occasions
- same group tested twice
- results of the two administrations are correlated
- length of time and intervening experiences may influence test-retest reliability
- two weeks is a good time between test administrations
Types of Reliability
Alternate-Forms reliability
(AKA Equivalence reliability)
- alternate forms of the same test are administered to the same group and the correlation between them is calculated
- how comparable the forms of the tests are will influence this reliability
- intervening events/experiences may also influence reliability
Types of Reliability
Split-half reliability
(AKA Internal consistency)
- test is divided into two halves
- The correlation between these two halves is calculated
- because you reduce the length of the test (1/2 vs. 1/2), you may apply Spearman-Brown formula (called prophecy formula) to see how reliable the test would be had you not split it in two
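The Spearman-Brown correction can be sketched as follows. The half-test correlation of .70 is a made-up value, and the `length_factor` parameter is an illustrative name for how many times longer the full test is than the part that was correlated (2 for a split-half design).

```python
def spearman_brown(r_part, length_factor=2):
    """Estimate full-test reliability from the correlation between parts."""
    return (length_factor * r_part) / (1 + (length_factor - 1) * r_part)

# Hypothetical correlation between the two halves of a test
r_half = 0.70
r_full = spearman_brown(r_half)
print(round(r_full, 2))
```

The corrected estimate is higher than the half-test correlation because longer tests tend to be more reliable.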
True and error variance
tests measure true and error variance
- you want to measure TRUE variance, the actual psychological trait or characteristic that the test is measuring
Types of Reliability
Internal consistency (split-half)
can also be determined by measuring interitem consistency
- the more homogeneous the items, the more reliable the test
- Kuder-Richardson formulas (two formulas) are used if test contains dichotomous items (T/F, Y/N)
- Cronbach alpha coefficient is used if instrument contains nondichotomous items (multiple choice, essays)
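A rough sketch of an interitem consistency calculation using the usual coefficient alpha formula: k / (k - 1) times (1 minus the sum of item variances divided by the variance of total scores). The response matrix is invented. Because its items are scored 0/1, the result here also equals what the KR-20 formula would give.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha; rows are respondents, columns are items."""
    k = len(responses[0])  # number of items
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical answers from five people on four true/false items (1 = correct)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```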
Coefficient of Determination
the degree of common variance
- the index (81%) that results from squaring the correlation (.90)
True/Error variance example
Venn Diagram:
(E1 ( T1 T2) E2)
Two tests are administered. Each one measures true variance (T1 and T2) and error variance (E1 and E2)
- if the correlation between two tests or two forms of the same test is .90, then the amount of true variance measured in common is the correlation squared (.90^2 = 81%)
Coefficient of Nondetermination
the unique variance, not common
- for above example, it would be 19% and represents error variance
- 100 - coefficient of determination
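The arithmetic for the two coefficients, using the .90 correlation from the example above:

```python
r = 0.90  # correlation between the two tests, from the card's example

determination = round(r ** 2 * 100, 1)            # common (shared) variance, in percent
nondetermination = round(100 - determination, 1)  # unique variance, in percent

print(determination, nondetermination)
```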
Standard error of measurement (SEM)
another measure of reliability; useful in interpreting the test scores of an individual
- may be referred to as the Confidence Band/Limits
- helps determine the range within which an individual’s test score probably falls
Standard error of measurement (SEM) Example
a person scores a 92 on a test. The test’s SEM = 5.0. Chances are about 2 in 3 (68%) that the person’s score falls between 87 and 97 (refer to the normal curve: 34% and 34% of the cases fall within one standard deviation (+/-), for a total of 68%)
- for the same test with the same SEM of 5.0, you can say that 95% of the time, the person’s score would fall within the range of 82 and 102
- every test has its own unique SEM which is calculated in advance and may be reported on the test’s score profile
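The confidence bands in the example can be sketched with a small helper. The function name and the `n_sem` parameter are illustrative assumptions; the score of 92 and SEM of 5.0 come from the example.

```python
def confidence_band(score, sem, n_sem=1):
    """Range of the observed score plus or minus n_sem standard errors."""
    return (score - n_sem * sem, score + n_sem * sem)

band_68 = confidence_band(92, 5.0)           # about 68% band (1 SEM)
band_95 = confidence_band(92, 5.0, n_sem=2)  # about 95% band (2 SEM)

print(band_68, band_95)
```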
the degree to which a test measures what it purports to measure for the specific purpose for which it is used
- situation specific - depending on purpose and population
- an instrument may be valid for some purposes and not others
Validity is considered to be more important than reliability
Types of Validity
Face validity
instrument looks valid
- ex. a math test has math items. This validity could be important from the test-taker’s perspective
Types of Validity
Content validity
the instrument contains items drawn from the domain of items which could be included
- ex. two professors of Psyc 101 devise a final exam which covers the important content that they both teach
Types of Validity
Predictive validity
also called empirical validity
the predictions made by the test are confirmed by later behavior (criterion)
- Ex. the scores on the GRE predict later grade point average
Types of Validity
Concurrent validity
the results of the test are compared with other tests’ results or behaviors (criteria) at or about the same time
- ex. scores of an art aptitude test may be compared to grades already assigned to students in an art class
Types of Validity
Construct Validity
measures some hypothetical construct such as anxiety, creativity, etc.
- usually several tests/instruments are used to measure different components of the construct or of the hypothesized relationships between the construct and other constructs
- best when multiple traits are being measured using a variety of methods
Types of Validity
Convergent validity
a type of construct validity
- occurs when there is high correlation between the construct under investigation and others
Types of Validity
Discriminant validity
a type of construct validity
- occurs when there is no significant correlation between the construct under investigation and others
Types of Validity
Consequential validity
tries to ascertain the social implications of using tests
Tests may be reliable but not valid
Valid tests are reliable unless there is a change in the underlying trait or characteristic which might occur through maturation, training, development
Speed-Based Tests
timed, and the emphasis is on speed/accuracy
ex. measurements of intelligence, ability, aptitude
Power-Based Tests
no time limits or very generous ones (focuses on number right)
Assessments may be:
Norm referenced
comparing individuals to others who have taken the test before
- norms may be national, state, local
- how you compare with others is more important than what you know
Assessments may be:
Criterion referenced
comparing an individual’s performance to some predetermined criterion which has been established as important
- Ex. NCE cut-off score
Assessments may be:
Ipsatively interpreted
comparing the results on the test within the individual
- ex. looking at individual’s highs/lows on an aptitude battery which measures several aptitudes. There is no comparison with others
- ex. when individual’s score on a second test is compared to the score on the first test
Assessments may be:
Maximal performance test
may generate a person’s best performance on an aptitude or achievement test
Assessments may be:
Typical performance
may occur on an interest or personality test
Purposes/Rationale for using tests
- help counselor decide if the client’s needs are within the range of their services
- help client gain self-understanding
- help counselor gain a better understanding of the client
- assist the counselor in determining which counseling methods, approaches, techniques will be suitable
- assist the counselee to predict future performance in education, training, work
- help counselees make decisions about their educational or work futures
- help identify interests not previously known
- help evaluate the outcomes of counseling
Circumstances under which testing may be useful
- placement - in education/work
- admissions - schooling
- diagnosis
- counseling
- educational planning
- evaluation
- licensure and certification
- self-understanding
Regression toward the mean
if one earns a very low score (at 15% or lower) or a very high score (at 85% or higher) on a pretest, the individual will probably earn a score closer to the mean on the post-test
- this is because of the error occurring due to chance, personal, environmental factors
- these factors can reliably be expected to be different on the post-test
Standardized assessment
the instruments are administered in a formal, structured procedure, and the scoring is specified
Nonstandardized assessment
there are no formal or routine instructions for administration or for scoring
- ex. checklists or rating scales
Tests/Inventories - details will not be the focus!
ability to think in abstract terms; to learn
- some also believe it is the ability to adapt to the environment and adjust to it
- called general ability/cognitive ability
Tests/Inventories - details will not be the focus!
Intelligence tests
- Stanford-Binet Intelligence Scales
- Wechsler Adult Intelligence Scales (WAIS-IV)
- Wechsler Intelligence Scale for Children (WISC-V)
- Cognitive Abilities Test
Tests/Inventories - details will not be the focus!
Specialized Ability Test
- Kaufman Assessment Battery for Children - II
- System of Multicultural Pluralistic Assessment (SOMPA) - measures medical, social systems and pluralistic factors
- Miller Analogies Test (MAT)
- Graduate Record Exam (GRE)
Tests/Inventories - details will not be the focus!
measures effects of learning or a set of experiences
- may be used diagnostically
- many states have own K - 12 achievement tests
Tests/Inventories - details will not be the focus!
Achievement Tests
- National Assessment of Education Progress (NAEP)
- California Achievement Tests
- Iowa Test of Basic Skills
- Stanford Achievement Test
Tests/Inventories - details will not be the focus!
Specialized Achievement Tests
- General Education Development (GED)
- College Board’s AP Program
- College-Level Examination Program (CLEP)
Tests/Inventories - details will not be the focus!
AKA ability tests
measure the effects of general learning and are used to predict future performance
Tests/Inventories - details will not be the focus!
Aptitude Tests
- Differential Aptitude Test (DAT)
- O*Net Ability Profiler
- Armed Services Vocational Aptitude Battery (ASVAB)
- Career Ability Placement Survey (CAPS)
Tests/Inventories - details will not be the focus!
the dynamic product of genetic factors, environmental experiences, and learning; includes traits and characteristics
Tests/Inventories - details will not be the focus!
Projective Personality Assessments
these tests present a relatively unstructured task or stimulus. The person projects thought processes, needs, anxieties, etc.
- Rorschach
- Thematic Apperception Test (TAT)
- Rotter Incomplete Sentences Blank (2nd Ed.)
- Draw-A-Person Test
Tests/Inventories - details will not be the focus!
Personality Inventories
- Minnesota Multiphasic Personality Inventory (MMPI)
- California Psychological Inventory (CPI)
- NEO Personality Inventory - Revised
- Beck Depression Inventory (BDI)
- Myers-Briggs Type Indicator (MBTI)
Myers-Briggs Type Indicator (MBTI)
a personality inventory based on Carl Jung’s analytic psychology (so it is a theory based test)
uses dichotomous types: Extraversion vs introversion, sensing vs. intuition, thinking vs. feeling, judging vs. perceiving
- results in a 4 letter type score
Tests/Inventories - details will not be the focus!
Specialized Personality Assessments
- Tennessee Self Concept Scale
- Bender Visual-Motor Gestalt Test
Tests/Inventories - details will not be the focus!
preferences, likes, dislikes of an individual; more broadly includes values
Interests often are not stable in the teen years (become stable around 25)
- interest inventories often emphasize professional positions and minimize blue collar jobs
- generally reliable
Tests/Inventories - details will not be the focus!
Interest Inventories
- Strong Interest Inventory
- Self-Directed Search
- Career Assessment Inventory
- Campbell Interest and Skill Survey
- O*Net Interest Profiler
Semantic differential
this scale asks respondents to report where they are on a dichotomous range between two affective polar opposites
- ex. “Think about the value of this study guide”
Very bad ——————————————— very good
- responses can be codified and added to those of others. The adjective pairs selected can usually be classified as having an evaluative, potency, activity underlying structure thus providing for a second level of analysis
Intrusive (reactive) measurement
the participant knows they are being watched/questioned and this knowledge may affect their performance
- ex. questionnaires, interviews, observation
Unobtrusive (nonreactive) measurement
data is collected without the awareness of the individual, or without changing the natural course of events
- ex. reviewing existing records; unobtrusive observation
Observation as appraisal technique
you observe samples from a stream of behavior
- in observation, you may use schedules, coding systems, record forms
Case or historical study
this may be analytical and/or diagnostic investigation of a person/group
Rating scales
these may be used to report the degree to which an attribute or characteristic is present
used to identify isolates, rejectees, stars (popular individuals)
- can measure the structure/organization of social groups (ex. class of 4th graders; work unit)
- requires revealing personal feelings about others
a figure or map showing the interrelationships or structure of the group
Social desirability
the tendency for test takers to respond in ways that are perceived to be socially desirable
Using and Interpreting Test Scores
Training in test theory/background info
about the tests you use
- must study test’s technical manual
Using and Interpreting Test Scores
Prepare for test interpretation
understand the scores, profiles, implications of the results before you counsel the individual
Using and Interpreting Test Scores
Describe the test in nontechnical terms
and explain what was being measured
Using and Interpreting Test Scores
Describe the nature of the scores you are reporting
explain percentiles, stanines, any other technical term
Using and Interpreting Test Scores
Organize the data
to help it make sense to the client
- show profiles, charts, comparative data if appropriate
- consider and explain interrelationships between scores and between tests if more than one was used
Using and Interpreting Test Scores
Provide an interpretation to the client and ask for reactions/feelings
help the client integrate test results with existing info
Using and Interpreting Test Scores
Remind client that test scores are additional data and are not infallible
test data may be useful in decision making or obtaining some objective
Using and Interpreting Test Scores
Go slowly
you may have used similar words in test interpretations before. It may be the first time the client is hearing those words
Grade equivalent scores
- often used to report scores on an achievement test
- if a student correctly completes the number of items on a test that the average 6th grader completes, that student has a grade equivalent score of 6
Age equivalent scores
an individual’s score is compared to the average score of others at that same age
- ex. if a 7.6 year old student earned a score equivalent to 8.0 year old students, that would be their age equivalent score
Percentile rank
individual’s score can be compared to a group (norm group) already examined
- percentile rank indicates what percentage of individuals in that group have scores above/below this individual
ex. a percentile rank of 35 means that this individual’s score is higher than 34% of the individuals in the norm group. 65% of individuals in that norm group scored higher than that individual
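A rough sketch of locating a score within a norm group by counting the percentage of scores that fall below it. The norm group is invented, and `percent_below` is an illustrative helper. Published tests define percentile rank in slightly different ways (for example, in how tied scores are counted), so this shows the idea rather than any one test's scoring rule.

```python
def percent_below(score, norm_group):
    """Percentage of norm-group scores that fall below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# Hypothetical norm group of 20 scores
norm_group = list(range(1, 21))

pct = percent_below(8, norm_group)
print(pct)
```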
Advantages to computer-based assessment
- standardizes administration and scoring
- feedback and results may be available immediately
- assuming computers are available, costs will be reduced
- profiles of results/reports can be generated
Disadvantages to computer-based assessment
- not all assessments are available on computer
- testing by computer may be scary for some test takers
- if not available, have to buy expensive computers
- personal contact with a test administrator or proctor may not be available
Ethical issues in testing
- tests may be biased against non-whites, females, those of other cultures (most tests normed on white middle-class males)
- counselors must be trained/competent to select/administer tests and to interpret test results/info
- test results should be released only to competent professionals/with consent of test taker
- tests may be used to label and stereotype; may invade privacy
- confidentiality of test results may be an issue especially with computerization
- computerized testing may raise issues of validity (is the test the same as on paper?)
- results should be interpreted
- review the measurement and eval section of ethical code
Mental Measurements Yearbook
contains critical reviews of tests and lists published references of test
- latest ed. is 2017
- created by Oscar Buros
Tests in Print IX
has info on approx. 3,000 testing instruments
A Comprehensive Guide to Career Assessment (7th ed.)
published by National Career Development Association
Association for Assessment and Research in Counseling
one of the 18 divisions of the ACA
Normative measures
each item is independent of all other items
- a normative interpretation compares the individual’s scores to others who took the same test
Ipsative measures
compare traits within the same individual (not compared to other individuals who took the instrument)
- person is measured in response to their own standard of behavior
- generally consists of forced choice format
Spiral test
the items get progressively harder
(the same way it gets harder the further up you go on a set of spiral stairs)
Cyclical test
you have several sections that are spiral in nature (each section has questions that go from easy to more difficult)
Horizontal test
a test battery is a horizontal test because several measures are used to produce results that could be more accurate than those derived from merely using a single score
horizontal tests measure various factors (e.g., math and science) during the same testing procedure
Vertical test
a test that has versions for various age brackets or levels in education
ex. the Kuder Career Planning Instruments have scales for all ages
Francis Galton
did research and concluded that intelligence was normally distributed like height/weight and was primarily genetic
- felt intelligence was a single/unitary factor
Wechsler Intelligence test versions
WPPSI (Wechsler Preschool and Primary Scale of Intelligence - 2.5 years-7 years, 7 months)
WISC-IV (Wechsler Intelligence Scale for Children - 6-16 years and 11 months)
WAIS-IV (Wechsler Adult Intelligence Scale - 16-90 YO)
- based on neurocognitive research and Cattell-Horn-Carroll leading theory of human intelligence
- can be administered/scored online
- takes 60-90 minutes to complete
- object assembly and picture arrangement have been dropped in this version
- ten subject areas (subtests) make up 4 index scores: verbal comprehension index (VCI), perceptual reasoning index (PRI), working memory index (WMI), processing speed index (PSI); subtest scaled scores have a mean of 10 and SD of 3
- FSIQ = full scale IQ (mean of 100 with SD of 15)
- less emphasis than previous version on crystallized intelligence
- Can measure IQ from 40 to 160. The Stanford-Binet has a wider range (up to 180), so it is better for measuring extremely low IQ or giftedness
16 Personality Factor Questionnaire (16 PF)
developed by Raymond B. Cattell
- for ages 16+
- measures key personality factors such as assertiveness, emotional maturity, shrewdness
- a type of factor-analytic test because it analyzes data outside of a given theory
Bender Visual Motor Gestalt Test
expressive projective measure
known for ability to discern whether brain damage is evident
client is instructed to copy 16 geometric figures which they can look at while constructing their drawing
Association for Assessment and Research in Counseling
IQ Curve - standard deviations and equivalent percentiles
(google normal curve with percentile to see image)
Barnum effect
clients will often accept a general psychological test report, horoscope, palm reading, etc. and believe it applies specifically to them