Study Guide Exam 1 (Assessment and Diagnosis) Flashcards
Earliest forms of testing
China: interview tests for civil servants
Han dynasty: written tests for people wanting to work in government
Physiognomy
Practiced by ancient Greeks
Belief that internal characteristics are shown in external features
Example: person with upturned nose is arrogant
Phrenology
Germany in 1700s
Franz Joseph Gall
Examined bumps on the skull, thought to reveal brain “organs” that had been exercised, contributing to personality
Psychophysics
Theorized that there is a difference between the physical world and the world that people experience
Tested absolute threshold and just noticeable difference
Absolute threshold
Related to psychophysics
Minimum amount of stimulus needed to be detected as present 50% of the time
Just noticeable difference
Related to psychophysics
Amount of change in a stimulus needed for the change to be detected 50% of the time
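Illustrative sketch (hypothetical data, not from the course): a 50% threshold can be estimated by presenting a stimulus at several intensities, recording the proportion of “detected” responses at each, and interpolating where detection crosses 50%.

```python
import numpy as np

# Hypothetical detection data: stimulus levels (arbitrary units) and the
# proportion of trials on which each was reported as present
intensities = np.array([1, 2, 3, 4, 5, 6])
p_detected = np.array([0.05, 0.20, 0.40, 0.60, 0.85, 0.95])

# Linear interpolation of the intensity detected 50% of the time
absolute_threshold = np.interp(0.5, p_detected, intensities)
print(absolute_threshold)  # 3.5 -> falls between intensities 3 and 4
```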
Darwin
Contributed to psychology a focus on individual differences within a species (intra-species variability)
Galton
Key person in “Brass instruments era”
Tested individual differences between people to estimate intelligence
Measures included sensory and motor tasks, questionnaires, and physical attributes
Cattell
Performed work similar to that of Galton (“Brass instruments era”)
Coined the term “mental tests”
Wissler
Brought about the end of the “Brass instruments era”
Found that physical attributes (particularly sensory and motor measures) didn’t actually correlate with intelligence, discrediting them as intelligence measures
Wundt
Father of modern psychology
Ran first psychology lab
Kraepelin
First person to classify mental illness
Tested for emotional handicaps (interview-based test of emotional regulation)
Esquirol
Developed test to determine degrees of mental retardation
Mental ability was classified according to verbal ability
Binet
First person to develop an intelligence test, which was used to determine placement of children in school
Army alpha/beta
Used to place people as officers or non-officers
Alpha- verbal test
Beta- non-verbal test
Wechsler scales
The most widely used intelligence tests in modern times
Woodworth
Created first personality test (Personal Data Sheet)
Examples of personality tests
Rorschach inkblot test
Thematic Apperception Test
Minnesota Multiphasic Personality Inventory
Test
Procedure in which a sample of an individual’s behavior is obtained, evaluated, and scored using standardized procedures
Measurement
A set of rules for assigning numbers to represent objects, traits, attributes, or behaviors
Assessment
Systematic procedure for collecting information that can be used to make inferences about the characteristics of people or objects
Reliability
Consistency
Validity
Accuracy
Maximum performance tests
Goal: best performance
Examples: classroom tests, intelligence tests
Achievement tests
Test specific skills
Aptitude tests
Assess abilities
Objective tests
Specified scoring (clear right and wrong answer)
Subjective tests
Require judgment to evaluate (no clear right and wrong answers)
Power tests
Unlimited (or generous) time
Goal: examinees give their best performance
Speed tests
Timed tests
Usually fairly easy
Typical response tests
Measure typical behavior or attitudes (ex- surveys)
No right or wrong answers
Objective typical response tests
Answers can be scored without subjective judgment (ex- surveys)
Subjective typical response tests
Examinee projects something about the self onto an ambiguous stimulus (ex- projective tests)
Norm referenced scores
Your score is dependent on others’ scores
Percentile ranks (ex- ACT, SAT, GRE)
Tests that have been curved fall into this category
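Illustrative sketch of a norm-referenced percentile rank (hypothetical norm sample; one common definition is the percentage of norm-sample scores at or below the examinee’s score):

```python
import numpy as np

# Hypothetical norm sample of 10 scores on some test
norm_sample = np.array([12, 15, 18, 20, 21, 23, 25, 27, 28, 30])
examinee_score = 25

# Percent of norm-sample scores at or below the examinee's score
percentile_rank = np.mean(norm_sample <= examinee_score) * 100
print(percentile_rank)  # 70.0 -> scored at or above 70% of the norm sample
```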
Criterion referenced scores
There is a set criterion for success
Typical of classroom tests
Doesn’t depend on others’ performance
Norm samples: what they need to be
Representative of the population taking the test
Consistent with that population
Current (must match current generation)
Large enough sample size
Types of norm samples
Nationally representative sample (reflects society as a whole)
Local sample
Clinical sample (compare to people with given diagnosis)
Criminal sample (utilizing criminals)
Employee sample (used in hiring decisions)
Flynn effect
Intelligence increases over successive generations
In order to stay accurate, intelligence tests must be renormed periodically
Raw scores
Number of questions answered correctly on a test
Only used to calculate other scores
Mean and standard deviation for z scores
M=0
SD=1
Mean and standard deviation for t scores
M=50
SD=10
Mean and standard deviation for IQ scores
M=100
SD=15
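Because z, t, and IQ metrics all express the same standing, converting between them is simple arithmetic: score = new M + new SD × z. A minimal sketch with a hypothetical raw-score distribution:

```python
# Hypothetical raw-score distribution: mean 28, SD 4
raw, raw_mean, raw_sd = 34, 28, 4

z = (raw - raw_mean) / raw_sd   # z score (M=0, SD=1) -> 1.5
t = 50 + 10 * z                 # t score (M=50, SD=10) -> 65.0
iq = 100 + 15 * z               # IQ metric (M=100, SD=15) -> 122.5
print(z, t, iq)
```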
Example of age equivalents
A 13-year-old performing at an 11-year-old level
Example of grade equivalents
In the 8th grade and performing at a 6.5 grade level
3 types of criterion-referenced interpretations
Percentage correct
Mastery testing (pass/fail)
Standard-based interpretation (assigning letter grades)
Classical test theory equation
Xi = T + E
Xi- obtained score
T- true score
E- error
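A quick simulation of the equation (hypothetical numbers): each obtained score is a true score plus random error. A standard CTT result (not stated above) is that reliability equals the ratio of true-score variance to obtained-score variance.

```python
import numpy as np
rng = np.random.default_rng(0)

T = rng.normal(100, 15, size=10_000)  # true scores (hypothetical IQ metric)
E = rng.normal(0, 5, size=10_000)     # random measurement error
X = T + E                             # obtained scores: Xi = T + E

# Reliability as true-score variance over obtained-score variance
print(T.var() / X.var())              # ~ 225 / (225 + 25) = 0.90
```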
Content sampling error
Difference between sample of items on test and total domain of items
Time sampling error
Random fluctuations in performance over time
Can be due to examinee (fatigue, illness, anxiety, maturation) or due to environment (distractions, temperature)
Inter-rater differences
When scoring is subjective, different scorers may score answers differently
Clerical error
Adding up points incorrectly
Test-retest reliability
Administer the same test on 2 occasions
Correlate the scores from both administrations
Sensitive to time sampling error
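A minimal sketch of the correlation step (hypothetical scores from 7 examinees):

```python
import numpy as np

# Scores from two administrations of the same test
time1 = np.array([88, 92, 75, 81, 95, 70, 84])
time2 = np.array([85, 94, 78, 80, 97, 72, 82])

# Test-retest reliability coefficient: Pearson r between administrations
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 3))  # high r -> scores are stable over time
```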
Alternate-form reliability
Develop two parallel forms of test
Administer both forms (simultaneously or delayed)
Correlate the scores of the different forms
Sensitive to content sampling error (simultaneous and delayed) and time sampling error (delayed only)
Split-half reliability
Administer the test
Divide it into 2 equivalent halves
Correlate the scores for the half tests
Sensitive to content sampling error
Kuder-Richardson and coefficient (Cronbach’s) alpha
Administer test
Compare each item to all other items
Use KR-20 for dichotomous answers and Cronbach’s alpha for any type of variable
Sensitive to content sampling error and item heterogeneity
Measures internal consistency
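A minimal sketch of Cronbach’s alpha using its standard formula, alpha = k/(k−1) × (1 − sum of item variances / total-score variance), on hypothetical data:

```python
import numpy as np

# Hypothetical scores: rows = 5 examinees, columns = 4 items
scores = np.array([[3, 4, 3, 5],
                   [2, 2, 3, 2],
                   [4, 5, 4, 4],
                   [1, 2, 1, 2],
                   [5, 4, 5, 5]])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))  # ~0.95 here: items hang together well
```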
Inter-rater reliability
Administer test
2 individuals score test
Calculate agreement between scores
Sensitive to differences between raters
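A minimal sketch of agreement between two raters (hypothetical pass/fail scores on 10 responses); simple percent agreement is the most basic index:

```python
import numpy as np

# Two raters' pass/fail scores for the same 10 responses
rater_a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
rater_b = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# Proportion of responses the raters scored identically
agreement = np.mean(rater_a == rater_b)
print(agreement)  # 0.8 -> agreed on 8 of 10 responses
```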
Composite scores
Multiple scores combined to form a single overall score
Reliability of these is usually better than their individual parts
Difference scores
Calculated difference between 2 scores
Reliability of these is usually lower than their individual parts (information is lost: only can see change, not initial baseline)
High-stake decision tests: reliability coefficient used
Greater than 0.9 or 0.95
General clinical use: reliability coefficient used
Greater than 0.8
Class tests and screening tests: reliability coefficient used
Greater than 0.7
How to improve reliability
Increase number of test items
Use composite scores
Develop better items
Standardize administration
Standard error of measurement (SEM)
Standard deviation of the distribution of scores that would result if the same test were administered to the same individual an infinite number of times
Useful when interpreting test scores
When reliability increases, this decreases
What are used to calculate confidence intervals?
Use SEM and SD (the SEM is derived from the SD and the reliability coefficient)
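A minimal sketch using the standard formulas SEM = SD × √(1 − reliability) and CI = obtained score ± z × SEM (hypothetical IQ-metric numbers):

```python
import math

sd, reliability, obtained = 15, 0.91, 108

sem = sd * math.sqrt(1 - reliability)  # SEM = 15 * 0.3 = 4.5
lower = obtained - 1.96 * sem          # 95% CI uses z = 1.96
upper = obtained + 1.96 * sem
print(round(sem, 2), (round(lower, 1), round(upper, 1)))  # 4.5 (99.2, 116.8)
```

Note how a higher reliability shrinks the SEM and therefore the confidence interval.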
Generalizability theory
Shows how much variance is associated with different sources of error
Construct underrepresentation
Test doesn’t measure important aspects of the specified construct
Similar to content sampling error
Construct-irrelevant variance
Test measures features that are unrelated to the specified construct
External threats to validity
Examinee characteristics (ex- anxiety, which hinders examinee)
Deviation from standard test administration and scoring
Instruction and coaching
Standardization sample isn’t representative of population taking test
Content validity
Degree to which the items on the test are representative of the behavior the test was designed to sample
How content validity is determined
Expert judges systematically review the test content
Evaluate item relevance and content coverage
Criterion-related validity
Degree to which the test is effective in estimating performance on an outcome measure
Predictive validity (form of criterion-related validity)
Time interval between test and criterion
Example: ACT and college performance
Concurrent validity (form of criterion-related validity)
Test and criterion are measured at same time
Example: language test and GPA
Construct validity
Degree to which test measures what it is designed to measure
Convergent validity
Correlate test scores with tests of same or similar construct: look for convergence
Divergent/discriminant validity
Correlate test scores with tests of dissimilar construct: look for divergence
Factor analysis
Used to determine if test is measuring factors related to the given construct
Assign factor loadings (similar to correlation coefficients): variables should have high loadings on only 1 factor
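An illustrative factor-analysis sketch (hypothetical simulated data; assumes scikit-learn is available): four items driven by one latent construct should all load highly on a single factor.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))  # one latent construct
items = latent @ np.ones((1, 4)) + rng.normal(scale=0.5, size=(200, 4))

fa = FactorAnalysis(n_components=1).fit(items)
print(fa.components_)  # loadings: rows = factors, columns = items
```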
Evidence based on internal structure
Examine internal structure to determine if it matches the construct being measured
Evidence based on response processes
Is the manner of responses consistent with the construct being assessed?
Evidence based on consequences of testing
If the test is thought to result in benefits, are those benefits being achieved?
Incremental validity
Determines if the test provides a gain over another test
Face validity
Determines if the test appears to measure what it is designed to measure
Not a true form of validity
Problem with tests high in face validity: examinees can fake their responses
Internal vs. external validity
Internal: Does the measure work in ideal conditions?
External: Does it work in the real world?
Multitrait-multimethod approach to determining construct validity
Use multiple measures for same constructs to check for convergence as well as measures for other constructs to check for divergence
Contrasted group study approach to determining construct validity
Administer the test to 2 groups known to differ on the construct and look for score differences between them
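A minimal sketch of the group comparison (hypothetical scores; an independent-samples t-test is one common choice, though the flashcards don’t specify a statistic):

```python
from scipy import stats

# Hypothetical scores for two groups expected to differ on the construct
group_with_trait = [21, 24, 19, 26, 23, 25, 22]
group_without_trait = [14, 17, 12, 15, 16, 13, 18]

t, p = stats.ttest_ind(group_with_trait, group_without_trait)
print(round(t, 2), round(p, 4))  # a clear difference supports construct validity
```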
6 steps of test construction
- Define the test (what are we testing and why)
- Select item format
- Construct test items
- Test the items (determine reliability and validity)
- Revise the test
- Publish the test
Answer choice formats: selected-response vs. constructed-response items
Selected items: pick from a number of answers (multiple choice, true/false, matching)
Constructed items: generate your own answers (short answer, essay)
Strengths of selected-response items
Can include more items (each question takes less time to answer)
Increased content sampling as well as reliability and validity
Reduction of construct-irrelevant factors
Scoring is efficient and reliable
Weaknesses of selected-response items
Developing items is time consuming (easier to write constructed items)
Unable to assess all abilities
Subject to random guessing (make it look like examinee knows more than he/she actually does)
Strengths of constructed-response items
Questions are relatively easy to write
Can assess higher-order cognitive abilities (have to show reasoning)
No random guessing
Weaknesses of constructed-response items
Test can include relatively few items (takes longer to answer each one)
Difficult to score reliably (even with good rubric, still hard)
Subject to misinterpretation (examinee might misconstrue question)
Construct-irrelevant factors can sneak in (ex- bad handwriting makes answers hard to read)
3 things on a test that should be clear
Clear directions (examinee should know how to answer each question)
Clear questions (each question should ask only 1 thing and be answerable decisively)
Clear print (should be easy to read)
5 things that should not be included on a test
Cues to answers (ex- including answer in a different question)
Items that cross pages (increases likelihood of examinee error)
Construct-irrelevant factors
Exact phrasing from materials (encourages rote memorization over understanding of concept)
Biased language and content
2 things to consider surrounding placement of items on a test
Item arrangement: placement should make sense
Number of items: if using a power test, should be able to complete questions in given time limit
Type of material that should be used on a matching test
Homogeneous material (all items should relate to a common theme)
Multiple choice tests: what kinds of stems should not be included?
Negatively-stated ones
Unclear ones
Multiple choice tests: how many alternatives should be given?
3-5
Multiple choice tests: what makes a bad alternative?
Long
Grammatically inconsistent with the stem
Implausible
Multiple choice tests: how many best/correct answers per question?
1
Multiple choice tests: how should placement of correct answer be determined?
Random (otherwise, examinees can detect pattern)
Multiple choice tests, true/false tests, and typical response tests: what kind of wording should be avoided?
“Never” or “always” for all 3
“Usually” for true/false
“All of the above” or “none of the above” for multiple choice
True/false tests: how many ideas per item?
1
True/false tests: what should be the ratio of true to false answers?
1:1
Matching tests: ratio of responses to stems?
More responses than stems (with equal numbers, it’s impossible to get only 1 wrong, and the last match can be made by elimination)
Matching tests: how long should responses and lists be?
Brief
Essay tests and short answer tests: what needs to be created?
Scoring rubric
Essay tests: what kinds of material should be covered?
Objectives that can’t be easily measured with selected-response items
Essay tests: how should grading be done?
Blindly
Short answer tests: how long should answers be?
Questions should be able to be answered in only a few words
Short answer tests: how many correct responses?
1
Short answer tests: for quantitative items, what should be specified?
Desired level of precision
Short answer tests: how many blanks should be included? How long should they be?
Only 1 blank included
Should be long enough to write out answer
Otherwise, blank length becomes a dead giveaway about the answer
Short answer tests: where should blanks be included?
At the end of the sentence
Typical response tests: what should be covered?
Focus items on experiences (thoughts, feelings, behaviors)
Limit items to a single experience
Typical response tests: what kinds of questions should be avoided?
Items that will be answered universally the same
Leading questions
Typical response tests: how should response scales be constructed?
If neutral option is desired, have odd numbered scale
High numbers shouldn’t always represent the same thing (reverse-key some items)
Options should be labeled (Likert-type scale, ex- ratings from 0-7)
Pilot testing
Test on a few people
Get feedback
Practice scoring
Assess problem areas
Large scale testing
Develop norm sets
Evaluate reliability, validity, factors
What hypotheses for planning assessment are based on
Referral question
Presenting concerns
Intake interview results
Behavioral observations
Typical intake interview
Presenting concerns (begin with what the client came in asking about)
Case history
Diagnostic questions
Mental status exam
2 sources of information gained from interviews
Content (what is said; thoughts and feelings)
Behavioral observations (what is displayed)
Things examined in behavioral observations
General appearance and behavior
Mood and affect
Sensorium (awareness of situation)
Perception (vision, hearing, etc.: influence what tests are administered)
General intelligence
Higher cognitive functions (speech and form of thought, insight and judgment, memory, attention and concentration)
How to build rapport with a client
Comfortable atmosphere
Collaborative stance
Acceptance, understanding, empathy, respect
2 types of questions in an interview
Close-ended (produce 1- or 2-word answers; used to gather specific information)
Open-ended (require longer answers; gather lots of information)
Clarification
Questioning client to gain additional understanding from an ambiguous answer or confirm accuracy of clinician’s perception
“Are you saying that…” “Could you describe for me…” “Say what you mean by…”
Reflection
Restating the feelings in the client’s message to encourage the client to continue expressing feelings, to feel the emotion more intensely, and to become more aware of and discriminate between feelings
Paraphrasing
Restating the content of the client’s message to give the client an opportunity to clarify, encourage the client to expand on thoughts, and provide an opportunity to redirect the client to the central topic
Summarizing
Two or more paraphrases/reflections that condense the client’s message to tie together multiple elements in a common theme, interrupt excessive talking, and review progress
Affirmations
Directly affirming and supporting the client through the interview process to acknowledge the client’s struggles and build rapport
Must be careful not to overuse (can sound disingenuous)