Flashcards
4 typical evaluative procedures?
Clinical interview, informal assessment, personality assessment, ability assessment.
Number of assessments needed for Dx & Txt?
Multiple is best.
Assessment vs test?
Test is a subset of assessment, which could include interviews, observations, etc
Interpretation vs evaluation?
Interpretation assigns meaning. Evaluation assigns value/worth, eg progress or effectiveness.
Types of assessments? (5 pairs)
Individual/group, standardized/nonstandardized, power/speed, maximal/typical performance, objective/subjective.
Standardized/nonstandardized tests?
Standardized - consistent administration, validity & reliability, comparison to norms. Nonstandardized - flexible, variable use, use of judgement by administrator eg Rorschach, TAT.
Purposes of assessment? (6)
Dx & Txt planning, placement services, admission (ed), selection (job), monitoring progress, evaluation of overall outcomes.
Ethics of Appraisal: competence & assessment?
Use only instruments trained in and competent to use.
Ethics of Appraisal: assessment & informed consent
Explain in advance the nature & purpose of the assessment & intended use.
Ethics of Appraisal: release of results?
To professionals qualified to interpret results. With consent.
Ethics of Appraisal: conditions of administration?
Conditions that facilitate optimal results.
Ethics of Appraisal: instrument selection? (5)
Current, valid, reliable, multiculturally appropriate, w consideration for psychometric limitations.
Ethics of Appraisal: scoring and interpretation?
Document any concerns about the tests, the administration, and how they will be used in counseling.
Ethics of Appraisal: assessment construction?
Use scientific methodology & current knowledge; inform users of benefits & limits; encourage use of multiple sources of info.
Civil Rights Act of 1964 plus amendments?
Assessments for employability must relate strictly to the job description, and must not discriminate.
FERPA?
Family Educational Rights & Privacy Act, 1974. Provides confidentiality for test results, but access for both student & parent.
IDEA?
Individuals w Disabilities Education Improvement Act, 2004. Right to testing at expense of school system. Right to IEP w accommodations.
Vocational and Technical Education Act? (7)
- Vocational assistance for those: w disabilities, the economically disadvantaged, entering nontraditional occupations, w limited English, the incarcerated, adults needing voc training, & single parents.
ADA?
Americans w Disabilities Act, 1990. Employment testing must measure ability to do job tasks without confounding results with a disability. Ensures accommodations.
HIPAA?
Health Insurance Portability & Accountability Act, 1996. Obtain consent to release. CTs have access to their records.
NCLB?
No Child Left Behind Act, 2001. Improve accountability. Requires states to assess basic skills.
Larry P. v. Riles
Document use of nondiscriminatory & valid assessments.
Diana v. California State Board of Educ.
Counselors must provide testing information in the CT’s 1st language, as well as English.
Regents of the University of California v. Bakke.
Barred use of quotas for admissions.
Soroka v. Dayton Hudson Corp.
Psychological screening tests for hiring are an invasion of privacy. Controversial.
Sources of info on assessments?
Mental Measurements Yearbook (Buros). Tests in Print. Tests. Test Critiques.
MMY?
Mental Measurements Yearbook. Details for commercially available assessments, including reliability, validity.
TIP?
Tests in Print. Lists all published, commercially available tests in psych & educ. No critiques or psychometric data.
Test Critiques?
Comprehensive reviews, about 8 pages each, written for both professionals & laypersons.
Definition of Validity?
Property of?
To increase credibility?
- How accurately does an instrument measure what it purports to measure?
- A property of the scores of an instrument. Will vary according to intended purpose and intended test takers.
- More types of validity means greater credibility.
8 Types of validity?
Content, criterion (concurrent/predictive), construct, experimental design validity, convergent/discriminant.
Face validity is not considered a true type of validity.
Content validity?
Content is appropriate to purpose, w all major content areas covered w an appropriate number of items for an area’s importance.
Criterion validity?
Effectiveness relative to a specific criterion. Can be concurrent or predictive validity.
Concurrent validity?
Comparison to a criterion collected at the same time.
Ex: depression scores & data collected on hospitalizations for SI in the last 6 months.
Predictive validity?
Predicts performance on a criterion collected in the future.
Ex: can a depression score predict hospitalizations for SI in the future?
Construct validity?
Extent to which an instrument measures a theoretical construct,
esp. an abstract one.
Experimental design validity?
Implementation of a design to show an instrument measures a specific construct.
Statistical technique used to check construct validity?
Factor analysis - looks for statistical relationships between subscales and with the construct.
Convergent validity?
A relationship is found with other constructs where, theoretically, a relationship should exist.
Discriminant validity?
No relationship is found w constructs where no relationship should be found.
Validity coefficient?
A correlation between a test score and the criterion measure.
A test of predictive validity?
Regression equation used to predict an individual’s future criterion score.
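In its simplest (single-predictor) form: $\hat{Y} = a + bX$, where $X$ is the test score, $b$ the slope, $a$ the intercept, and $\hat{Y}$ the predicted criterion score.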
Standard error of estimate?
The expected margin of error in a predicted criterion score. Predictive validity is never 100%.
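One common formula (single predictor with validity coefficient $r_{XY}$): $SE_{est} = SD_Y \sqrt{1 - r_{XY}^2}$ — the higher the validity coefficient, the smaller the expected error around the predicted score.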
Decision accuracy-
Definition?
6 types?
Assesses the accuracy of an instrument in supporting counseling decisions.
Sensitivity, specificity, false positive, false negative, efficiency, incremental validity.
Decision accuracy - sensitivity?
Instrument’s ability to identify the presence of a phenomenon.
Decision accuracy - specificity?
Instrument’s ability to identify the absence of a phenomenon.
Decision accuracy - false positive?
Instrument wrongly identifies the presence of a phenomenon.
Decision accuracy - false negative?
Instrument inaccurately identifies the absence of a phenomenon.
Decision accuracy - efficiency?
Ratio of correct counseling decisions indicated by the instrument over total decisions.
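Worked sketch (hypothetical numbers): if an instrument yields 40 true positives, 50 true negatives, 5 false positives, and 5 false negatives, then sensitivity = 40/45 ≈ .89, specificity = 50/55 ≈ .91, and efficiency = (40 + 50)/100 = .90.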
Decision accuracy - incremental validity?
Concerned w the extent an instrument enhances the accuracy of prediction of a criterion, eg job performance or GPA.
Reliability?
Consistency of scores obtained by the same person over different administrations of the same test. Reliability is concerned w the error found in instruments.
Reliability - test-retest?
AKA temporal stability. Consistency of scores across time.
Alternative form reliability?
AKA parallel form or equivalent form reliability. Consistency of scores across alternative, equivalent tests.
Reliability - Internal consistency?
Consistency of responses from 1 item to another during a single administration of the test.
Split half reliability?
Correlates one half of the test with the other half.
Spearman-Brown Prophecy formula?
Used to correct for the shortened test length when estimating reliability from split halves (projects full-length reliability).
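For split halves the correction is $r_{full} = \dfrac{2\,r_{half}}{1 + r_{half}}$; e.g., a half-test correlation of .60 projects a full-length reliability of $\frac{2(.60)}{1.60} = .75$.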
A test for inter-item reliability?
Correlate all possible split half combinations in a test.
Kuder Richardson Formula 20?
Estimate of reliability in inter-item consistency when items are dichotomous eg true/false.
Cronbach’s coefficient alpha?
Estimate of reliability in inter-item consistency when items have multipoint responses, eg Likert scales.
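In formula form: $\alpha = \dfrac{k}{k-1}\left(1 - \dfrac{\sum \sigma_i^2}{\sigma_X^2}\right)$, where $k$ is the number of items, $\sigma_i^2$ the item variances, and $\sigma_X^2$ the total-score variance; KR-20 is the same formula with $\sum p_i q_i$ in place of $\sum \sigma_i^2$ for items scored 0/1.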
Inter-scorer reliability?
AKA inter-rater reliability. Degree of consistency between scorers doing observation/assessment/interviews.
How is reliability reported?
As a correlation coefficient; the closer to 1.00, the more reliable. Nationally normed achievement and aptitude tests (e.g., the GRE) are typically .90 and above. Personality tests may be below .90.
Standard error of measurement?
The standard deviation of the distribution of repeated scores expected from the same test w the same person. The larger the SEM, the lower the reliability. SEM is often reported via confidence intervals. Ex: 95% of scores will fall within ±2 SEM of the obtained score; that is the 95% confidence level.
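Formula and worked example (illustrative numbers): $SEM = SD\sqrt{1 - r_{xx}}$. For a test with SD = 15 and reliability .91, $SEM = 15\sqrt{.09} = 4.5$, so a score of 100 gives an approximate 95% interval of $100 \pm 2(4.5)$, i.e., 91 to 109.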
5 factors influencing reliability?
Test length (longer is better).
Range restriction (restriction of range of possible scores is worse)
Homogeneity of items (more homogeneous is better).
Heterogeneity of test takers (a more heterogeneous group on the tested trait is better).
Speed tests (yield spuriously high internal-consistency estimates because nearly all attempted items are answered correctly).
Relationship between reliability and validity?
Test scores can be reliable but not valid. If valid, they’re reliable.
Item analysis?
Assess test items - eliminate too easy/difficult/confusing items.
Item difficulty?
P value = proportion of test takers who answer an item correctly. P = .5 is considered good item difficulty.
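Worked example (hypothetical): if 40 of 50 test takers answer an item correctly, $p = 40/50 = .80$, an easy item; values near .50 are considered ideal difficulty.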
Item discrimination?
How well an individual item discriminates between high & low scorers on the overall test.
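One common index: $D = p_{upper} - p_{lower}$, the proportion of high scorers answering the item correctly minus the proportion of low scorers; $D$ ranges from $-1$ to $+1$, with higher positive values indicating better discrimination.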
Test theory?
- Psychometric theory.
- Holds that constructs must be measurable in both quality & quantity to be considered empirical.
- Strives to enhance validity & reliability.
Classical test theory? (3)
- Most influential.
- Individual’s observed score = true score + error present during test administration.
- Aim - increase reliability.
Item Response Theory?
Aka Modern Test Theory. Applying mathematics to the data, eg to detect bias (eg different responses from males/females), or equating scores from 2 different tests.
Construct-based validity model? (3)
- A test theory.
- Validity is a holistic concept.
- Internal & external validity.
Scales of measurement?
Nominal, ordinal, interval, ratio.
Nominal scale?
Named classifications. Numbers as labels.
Ordinal scale?
Rank order.
Eg Likert scales, or first/second/third place.
Intervals aren’t necessarily equal.
Interval scale?
Equal intervals. No true zero point.
Ex: temperature; educational and psychological test scores are usually treated as interval.
Ratio scales?
Equal intervals, plus a true zero point.
Ex: height, weight, physical measurements in natural sciences.
Scale designs used? (4)
Likert. Semantic differential. Thurstone. Guttman.
Likert scale?
Assessing attitudes or opinions.
Eg Strongly agree to strongly disagree.
Eg Very satisfied to very dissatisfied.
Semantic differential scale? (2)
Aka self-anchored scale.
Place a mark between a pair of opposite (bipolar) adjectives.
Thurstone scale? (2)
Express beliefs by marking agree/disagree to a series of related statements of successively greater intensity.
Employs a paired-comparison method.
Guttman scale? (3)
Measures intensity of a variable.
Items are presented in a successive order from less to more extreme.
Check items you agree with.
Raw scores vs derived scores?
Raw is the original score.
Derived scores are converted and compared to a norm group.
Normal distribution? (3)
- Bell curve.
- Most scores fall near the mean, few fall at extremes.
- Permits comparisons to be made between CTs and across tests for 1 CT through derived scores.
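Approximate areas under the normal curve: about 68% of scores fall within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD.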
Norms
Typical performance against which other scores are compared.
Norm-referenced assessment?
Individual’s score is compared to the average score of the test-taking norm group.
Criterion-referenced assessment?
Comparing an individual’s score w a predetermined criterion.
Eg licensing exams.
Ipsative assessment?
Comparing a test taker’s score w his/her previous scores on the test.
Eg computer games.
Percentile, or percentile rank? (3)
- Percentage of scores falling at or below an individual score.
- Not equal units of measure.
- Percentiles tend to exaggerate differences near the mean and minimize differences at the tails.
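As a formula: $PR = \dfrac{\text{number of scores at or below } X}{N} \times 100$; e.g. (hypothetical), a score at or above 150 of 200 norm-group scores has $PR = 75$.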
Standardized score?
Compares individual scores to a norm group through conversion of the raw score to a score that specifies the number of standard deviations a score is from the mean.
Z score?
Mean = 0
1 SD = 1.00
2 SD = 2.00
-1 SD = -1.00
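Computed as $z = \dfrac{X - M}{SD}$; e.g., a raw score of 65 on a test with mean 50 and SD 10 gives $z = 1.5$.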
T score?
Mean of 50, SD of 10. Used on personality, interest, and aptitude tests.
1 SD = 60
2 SD = 70
-1 SD = 40
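Conversion: $T = 50 + 10z$; e.g., $z = 1$ converts to $T = 60$.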
Deviation IQ?
Aka standard score. Mean of 100, SD of 15.
1 SD = 115
2 SD = 130
-1 SD = 85
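Conversion: $IQ = 100 + 15z$; e.g., $z = 2$ converts to a deviation IQ of 130.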
Stanine scores?
Achievement tests.
Mean of 5, with the mean falling halfway through the 5th interval.
SD of 2. Always a whole number. Approx:
1 SD = 7
2 SD = 9
-1 SD = 3
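Approximate conversion: stanine $\approx 2z + 5$, rounded to the nearest whole number and capped at 1 and 9; e.g., $z = 1$ gives stanine 7.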
Normal curve equivalents?
A standardized score, range 1-99. Divide the normal curve into 100 equal parts.
Mean of 50, SD of 21.06.
Unlike percentiles, NCE units are equal; NCEs match percentile ranks only at 1, 50, & 99.
1 SD = 71.06
2 SD = 92.12
-1 SD = 28.94
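Conversion: $NCE = 50 + 21.06z$; e.g., $z = 1$ gives $50 + 21.06 = 71.06$.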
Developmental scores?
Age equivalent & grade equivalent scores.
Age equivalent scores?
Compares an individual’s score w the average score of those of the same age.
Reported in chronological yrs and months.
Grade equivalent scores?
Compares the individual’s score w the average score of those at the SAME grade level.
Reported in grade level & months in grade.
Does not indicate a need for a change in grade placement, does not assess specific skills, and does not mean the individual is performing at that grade level.
Ability assessment?
4 Types?
Instruments that measure the cognitive domain. Includes: achievement, aptitude, intelligence, & high-stakes testing.
Achievement tests -
Purpose?
5 types?
- Subset of ability tests.
- Assess what one has learned.
- Can include: standardized, norm-referenced tests; teacher-made criterion-referenced tests; tests to assess progress/at-risk status; tests for program evaluation.
Standardized achievement tests include? (3)
Acceptable reliability coefficient?
Survey batteries, diagnostic tests, readiness tests.
Reliability >= .80
Survey batteries?
Subset of?
Define?
Examples? (4)
Subset of standardized achievement tests.
Collection of tests - multiple content areas.
Examples: Iowa Test of Basic Skills, Metropolitan Achievement Tests, TerraNova, Stanford Achievement Test.
Diagnostic tests?
Subset of?
Purpose?
Examples?
Subset of standardized achievement tests.
Identify learning disabilities or strengths & difficulties in an academic area.
Eg Wide Range Achievement Test, KeyMath Diagnostic Test, Woodcock-Johnson, Peabody Individual Achievement Test, Test of Adult Basic Education.
Readiness tests - Subset of? Define? Purpose? Criticisms?
Subset of standardized achievement tests.
Definition: A group of criterion-referenced standardized achievement tests that indicate minimum skills needed to move to next grade.
Used in high stakes testing.
Criticized for their cultural and language biases.
Aptitude tests -
Assess?
2 types?
- Subset of ability tests.
- Assess what a person is capable of learning, predict future performance.
- Types: Cognitive ability tests
Vocational aptitude tests
Cognitive ability tests?
Subset of aptitude tests.
Predict ability to perform in school, up through grad school.
Eg ACT, SAT, GRE, LSAT, MCAT, Cognitive Abilities Test, Otis-Lennon School Ability Test, Miller Analogies Test.
Vocational aptitude testing - Subset of? Definition? Useful to whom? Examples?
Subset of aptitude tests.
Predictive tests of occupational success.
For career guidance for job seekers
For employers screening for competent, well-suited employees
Includes: multiple aptitude tests, the Armed Services Vocational Aptitude Battery, the Differential Aptitude Tests, special aptitude tests.
Multiple Aptitude Tests - Subset of? Assess? Predict? Example?
Type of vocational aptitude tests.
Assess several distinct aspects of ability at once.
Predict success in several occupations.
Ex: Armed Services Vocational Aptitude Battery (ASVAB) - most widely used, 10 tests.
Special aptitude tests?
Type of vocational aptitude tests.
Assess one homogenous area of aptitude, eg clerical, mechanical, musical, artistic.
Intelligence tests -
Scores?
Purposes?
Subset of ability tests.
Single summary score (IQ) and index scores derived from factor analysis.
Identify & classify intellectual developmental disabilities.
Detect giftedness & learning disabilities.
First intelligence test developed by?
Binet & Simon
IQ -
Formula?
Developed by?
IQ = (MA / CA) × 100
Developed by Stern.
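Worked example: a child with a mental age of 12 and a chronological age of 10 has a ratio IQ of $(12/10) \times 100 = 120$.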
Spearman on intelligence?
2 factor
G - general factor
S - specific factors - skills acquired in training
Cattell’s fluid and crystallized intelligence?
Fluid = innate ability - reasoning, memory, speed of processing. Crystallized = gained through learning & experience.
Howard Gardner’s multiple intelligences?
8 primary intelligences
Linguistic, logical-math, musical, spatial, bodily kinesthetic, intrapersonal, interpersonal, naturalistic
Cattell Horn Carroll on intelligence?
Most empirically validated.
Hierarchical w 3 strata: general,
broad cognitive abilities, & narrow cognitive abilities
Common intelligence tests?
Wechsler (WAIS, WISC, WPPSI), Stanford Binet, Kaufman
High stakes testing - Subset of? What is used? Purpose? Criticism?
Subset of ability tests.
What is used: criterion-referenced assessments; a single defined assessment.
Purpose: a clear line on pass/fail; has a direct consequence on major educ. decisions.
Criticisms: single scores, not addressing diversity.
Clinical assessment - purpose?
Types?
Purposes: CT’s self-awareness. Conceptualization and Txt of CT.
Types: Personality assessments
Informal assessments (eg observation, clinical interview)
Other assessments (eg MSE, performance, suicide)
Personality tests.
What do they assess?
2 types?
Type of clinical assessment.
Facets of character that remain stable - temperament, patterns of behavior.
Objective & projective.
Objective personality tests? Define? Assess? Purpose? Examples?
Subset of clinical assessment - personality.
Standardized self report instruments.
Assess personality types, traits, states, self-concept.
Identify psychopathology, assist Txt planning.
Ex: MMPI, Millon, Myers-Briggs, California Psychological Inventory, 16 Personality Factor Questionnaire (16PF), NEO, Coopersmith Self-Esteem Inventories (kids).
Projective personality tests - Define? Used by? Purpose? Examples?
Subset of clinical assessment - personality.
Interpreting CT’s response to ambiguous stimuli.
Used mainly by psychoanalytically oriented clinicians.
Identify psychopathology and for Txt planning.
Ex: Rorschach, TAT, House-Tree-Person, sentence completion tests.
Informal assessments -
Subset of? Type? Purpose? Includes?
Subset of clinical assessment. Subjective.
Purpose: to identify strengths & needs of CTs.
Includes: Observation, interviewing, rating scales, classification systems.
Observation?
2 types?
Type of clinical assessment - informal.
Direct - behavior, antecedents, consequences, usually in a naturalistic setting.
Indirect - through self-report or informants, via behavioral interviewing, checklists, rating scales.
Clinical interviewing?
Type of clinical assessment - informal.
Most common assessment in counseling.
Structured, semi structured, unstructured.
Structured clinical interview?
Type of clinical assessment - informal.
Pre-established questions in a set order.
Detailed, exhaustive.
Provide consistency, but no flexibility.
Semi structured clinical interview?
Type of clinical assessment - informal.
Pre-established questions and areas.
Can customize, flexible.
More prone to bias, error, less reliable.
Unstructured clinical interview?
Type of clinical assessment - informal.
Tend to follow CT’s lead w open-ended Qs and reflective skills.
Most flexible.
Least reliable.
Rating scales for informal clinical assessment?
Evaluate the quantity of an attribute.
Eg a scale from 1 - 5, from hardly at all to extremely
Can be broad-band or narrow-focus.
Classification systems for informal clinical assessment -
Define?
3 commonly used systems?
To assess presence/absence of an attribute.
1 Behavior & feeling word checklists.
2 Sociometric instruments - assess social dynamics.
3 Situational tests, eg role play to see how CT may do in real life.
Mental status exam?
12 parts?
Clinical assessment - other.
Snapshot of mental Sx & psychological state.
Appearance, attitude, mood, affect, psychomotor activity, thought process, thought content, perceptions, judgement, insight, intellectual functioning, & memory.
A performance assessment?
Clinical assessment - other.
Nonverbal assessments. For CTs who speak a foreign language or have disabilities.
Ex: Draw-A-Man Test, Cattell Culture Fair Intelligence Test, Test of Nonverbal Intelligence (TONI), Porteus mazes, Bayley scales, Gesell developmental scales.
Suicide assessment?
Clinical assessment - other.
Gather info to assess lethality & risk factors
through clinical interview or standardized assessments.
Suicide assessment acronyms?
PIMP
SAD PERSONS
PIMP suicide assessment?
Plan
Intent
Means
Prior attempt
SAD PERSONS suicide assessment?
Sex, age, depression, previous attempt, ethanol abuse, rational thought loss, social supports lacking, organized plan, no spouse, sickness.
Levels of suicide lethality?
Low - not suicidal at time of assessment
Low moderate - somewhat suicidal, but no risk factors
Moderate - suicidal w several risk factors
Moderate high - determined to die, may SA within 72 hrs without intervention
High - SA in process, needs medical intervention
Suicide risk factors?
Demographics - male, single, widowed, White, higher age
Psychosocial - lack of supports, unemployed/drop in SES
Psych Dx - mood or anxiety disorders, schizophrenia, substance use disorders, borderline, antisocial, narcissistic PDs.
Suicidal emotions - hopelessness, helplessness, worthlessness, loneliness, depression.
Hx - family Hx of suicide, abuse, MI. CT has Hx of SAs.
Individual factors - inability to problem solve, AOD use, low tolerance for psychological pain.
Suicidality & Sx - past and present SI, plans, behavior, intent; no reason for living; HI.
Standardized assessments for suicidal lethality?
Specific suicide assessments - eg Beck Scale for Suicide Ideation.
Reasons for living inventories.
Standardized personality tests - MMPI, Millon
Projective personality tests - TAT, Rorschach, Rotter Incomplete Sentences Blank.
Definition of bias in assessment? (3)
Bias in language or culture.
Prevents a person from demonstrating their true ability.
Can result in lower or higher scores.
Types of bias in assessment? (5)
Examiner - examiner’s beliefs/behavior influence the test.
Interpretive - interpretation provides unfair adv/disadvantage
Response - test taker uses a response set, eg always ‘yes’
Situational - testing conditions affect different cultures differently
Ecological - global systems - eg use of only Western tests
How to reduce bias in assessment? (7)
Use assessments appropriate for multicultural pops.
Provide appropriate norms.
Use the best language for the pop.
Consider how culture/group affects administration and interp.
Understand CT’s worldview & level of acculturation.
Be knowledgeable of the CT’s culture.
Avoid cultural stereotypes.
Test translation vs test adaptation?
Translation isn’t enough; need to adapt for culture, for familiarity of concepts, objects, values.
Adaptation includes empirically validating the cultural equivalence of the test.
Computer based testing -
Definition?
Pros?
Cons?
AKA computer based assessment.
Administering, analyzing, interpreting via computer.
Advantages: time & cost reduced, scoring accuracy, quick feedback, standardization, privacy.
Disadvantages: expense, less human contact, may not have standards or normative data, some assessments aren’t possible.
Computer adaptive testing?
Adjusts the test’s structure and items to the test taker’s abilities.
Eg GRE.