Study Guide Exam 2 (Assessment and Diagnosis) Flashcards
Norm samples: what they need to be
Representative of the population taking the test
Demographically consistent with that population (age, sex, ethnicity, etc.)
Current (must match current generation)
Large enough sample size
Flynn effect
Intelligence increases over successive generations
To stay accurate, intelligence tests must be renormed periodically (scores rise roughly 3 IQ points per decade, so norms drift out of date)
Types of norm samples
Nationally representative sample (reflects society as a whole)
Local sample
Clinical sample (compare to people with given diagnosis)
Criminal sample (norms drawn from offender populations)
Employee sample (used in hiring decisions)
Ungrouped frequency distributions
Each individual score is listed along with the number of people who obtained it
Grouped frequency distributions
Scores are grouped into intervals (e.g., 90-100), and the number of people whose scores fall in each interval is listed
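A minimal Python sketch of both kinds of distribution (the scores are invented for illustration):

```python
from collections import Counter

scores = [72, 85, 85, 90, 91, 95, 95, 95, 98, 100]  # hypothetical test scores

# Ungrouped: a count for each distinct score
for score, count in sorted(Counter(scores).items()):
    print(score, count)

# Grouped: counts within 10-point intervals (70-79, 80-89, ...)
grouped = Counter((s // 10) * 10 for s in scores)
for lower, count in sorted(grouped.items()):
    print(f"{lower}-{lower + 9}: {count}")
```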
Frequency graphs
Histograms
Mean
Arithmetic average
Median
Point that divides distribution in half
Mode
Most frequent score
Which measure of central tendency to pick
Normal distribution: mean
Skewed distribution: median
Nominal data: mode
Positions of mean and median in positively and negatively skewed distributions
Positively skewed (right-skewed): mean is higher than median
Negatively skewed (left-skewed): median is higher than mean
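A quick Python illustration of all three measures on made-up, positively skewed data (note the mean is pulled above the median by the extreme scores):

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 40, 55]  # hypothetical right-skewed data

print(statistics.mean(scores))    # ~13.4, dragged upward by the outliers
print(statistics.median(scores))  # 4, resistant to the outliers
print(statistics.mode(scores))    # 3, the most frequent score
```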
Standard deviations
Average distance of scores from the mean; indicates how much scores vary
Raw scores
Number of questions answered correctly on a test
Not interpreted directly; used only to calculate other scores
Percentile ranks
Percentage of people in the norm group scoring below a given score
z scores
M=0
SD=1
t scores
M=50
SD=10
IQ scores
M=100
SD=15
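A minimal Python sketch converting hypothetical raw scores to the three standard score scales above, plus a percentile rank (the percentile step assumes scores are normally distributed):

```python
from statistics import NormalDist, mean, stdev

raw_scores = [12, 15, 18, 20, 22, 25, 28]  # hypothetical raw scores
m, sd = mean(raw_scores), stdev(raw_scores)

for raw in raw_scores:
    z = (raw - m) / sd               # z score: M = 0, SD = 1
    t = 50 + 10 * z                  # T score: M = 50, SD = 10
    iq = 100 + 15 * z                # IQ scale: M = 100, SD = 15
    pct = NormalDist().cdf(z) * 100  # percent scoring below, if scores are normal
    print(f"raw={raw}  z={z:+.2f}  T={t:.0f}  IQ={iq:.0f}  percentile={pct:.0f}")
```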
Content sampling error
Difference between sample of items on test and total domain of items
Time sampling error
Random fluctuations in performance over time
Can be due to examinee (fatigue, illness, anxiety, maturation) or due to environment (distractions, temperature)
Interrater differences
When scoring is subjective, different scorers may score answers differently
Test-retest reliability
Administer the same test on 2 occasions
Correlate the scores from both administrations
Sensitive to time sampling error
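A minimal sketch of the procedure (hypothetical scores; statistics.correlation requires Python 3.10+):

```python
import statistics

# Scores from two administrations of the same test to the same people
time1 = [88, 92, 75, 81, 95, 70, 84]
time2 = [85, 94, 78, 80, 96, 72, 83]

# The Pearson correlation between administrations is the reliability estimate
r = statistics.correlation(time1, time2)
print(f"test-retest reliability: r = {r:.2f}")
```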
Things to consider surrounding test-retest reliability
Length of interval between testing
Activities during interval (distraction or not)
Carry-over effects from one test to next
Alternate-form reliability
Develop two parallel forms of test
Administer both forms (simultaneously or delayed)
Correlate the scores of the different forms
Sensitive to content sampling error (simultaneous and delayed) and time sampling error (delayed only)
Things to consider surrounding alternate-form reliability
Few tests have alternate forms
Reduction of carry-over effects
Split-half reliability
Administer the test
Divide it into 2 equivalent halves
Correlate the scores for the half tests
Sensitive to content sampling error
Things to consider surrounding split-half reliability
Only 1 administration (no time sampling error)
How to split test up
Halving the test shortens it, and shorter tests are less reliable, so the half-test correlation is stepped up with the Spearman-Brown formula (see the sketch below)
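A minimal Python sketch of an odd-even split with the Spearman-Brown step-up (0/1 item data invented for illustration):

```python
import statistics

# Rows = examinees, columns = items scored 1 (correct) or 0 (incorrect)
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
]

# Odd-even split: one common way to form two equivalent halves
odd_totals = [sum(row[0::2]) for row in items]
even_totals = [sum(row[1::2]) for row in items]

r_half = statistics.correlation(odd_totals, even_totals)

# Spearman-Brown: estimate full-length reliability from the half-test r
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```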
Kuder-Richardson and coefficient (Cronbach’s) alpha
Administer test
Compare each item to all other items
Use KR-20 for dichotomous answers and Cronbach’s alpha for any type of variable
Sensitive to content sampling error and item heterogeneity
Measures internal consistency
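A minimal Python sketch of coefficient alpha; with 0/1 items like these invented ones, the result equals KR-20:

```python
import statistics

# Rows = examinees, columns = items (hypothetical dichotomous data)
items = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]

k = len(items[0])  # number of items
item_vars = [statistics.pvariance(col) for col in zip(*items)]
total_var = statistics.pvariance([sum(row) for row in items])

# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"alpha = {alpha:.2f}")  # low here -- a toy dataset, not a real test
```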
Inter-rater reliability
Administer test
2 individuals score test
Calculate agreement between scores
Sensitive to differences between raters
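A minimal Python sketch of two common agreement indices, percent agreement and Cohen's kappa (kappa corrects agreement for chance; the ratings are hypothetical):

```python
from collections import Counter

# Categorical scores assigned by two raters to the same six answers
rater1 = ["pass", "fail", "pass", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

n = len(rater1)
p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement from each rater's marginal proportions
c1, c2 = Counter(rater1), Counter(rater2)
p_chance = sum(c1[cat] * c2[cat] for cat in c1) / n**2

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"percent agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```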
High-stake decision tests: reliability coefficient used
Greater than 0.9 or 0.95
General clinical use: reliability coefficient used
Greater than 0.8
Class tests and screening tests: reliability coefficient used
Greater than 0.7
Content validity
Degree to which the items on the test are representative of the behavior the test was designed to sample
How content validity is determined
Expert judges systematically review the test content
Evaluate item relevance and content coverage
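Expert review is often quantified; one common index (not named in the card above, shown only as an illustration) is Lawshe's content validity ratio:

```python
# Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of judges
# rating an item "essential" and N is the total number of judges
def cvr(n_essential: int, n_judges: int) -> float:
    return (n_essential - n_judges / 2) / (n_judges / 2)

print(cvr(9, 10))  # 0.8 -> strong agreement the item belongs
print(cvr(5, 10))  # 0.0 -> judges are evenly split
```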
Criterion-related validity
Degree to which the test is effective in estimating performance on an outcome measure
Predictive validity
Form of criterion-related validity
Test is administered first; the criterion is measured after a time interval
Example: ACT and college performance
Concurrent validity
Form of criterion-related validity
Test and criterion are measured at same time
Example: language test and GPA
Construct validity
Degree to which test measures what it is designed to measure
Convergent validity
Form of construct validity
Determined by correlating test scores with tests of the same or similar construct (high correlations expected)
Discriminant validity
Form of construct validity
Determined by correlating test scores with tests of a dissimilar construct (low correlations expected)
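A minimal Python sketch contrasting the two (all scores and construct labels are invented):

```python
import statistics

# A new anxiety test, an established anxiety test (similar construct),
# and a vocabulary test (dissimilar construct) -- hypothetical data
new_test = [12, 18, 25, 9, 30, 22, 15]
established = [14, 17, 27, 8, 28, 24, 13]
vocabulary = [55, 40, 60, 47, 42, 58, 49]

print(statistics.correlation(new_test, established))  # high -> convergent
print(statistics.correlation(new_test, vocabulary))   # near zero -> discriminant
```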
Incremental validity
Determines if the test provides a gain over another test
Face validity
Determines if the test appears to measure what it is designed to measure
Not a true form of validity
Problem with tests high in face validity: examinees can tell what is being measured and fake their responses
Type of material that should be used on a matching test
Homogeneous material (all items should relate to a common theme)
Multiple choice tests: what kinds of stems should not be included?
Negatively-stated ones
Unclear ones
Multiple choice tests: how many alternatives should be given?
3-5
Multiple choice tests: what makes a bad alternative?
Noticeably longer (or shorter) than the other alternatives
Grammatically inconsistent with the stem
Implausible
Multiple choice tests: how should placement of correct answer be determined?
Random (otherwise, examinees can detect pattern)
Multiple choice tests, true/false tests, and typical response tests: what kind of wording should be avoided?
“Never” or “always” for all 3
“Usually” for true/false
“All of the above” or “none of the above” for multiple choice
True/false tests: how many ideas per item?
1
True/false tests: what should be the ratio of true to false answers?
1:1
Matching tests: ratio of responses to stems?
More responses than stems (otherwise the final match can be made by elimination, and one error forces a second)
Matching tests: how long should responses and lists be?
Brief
Essay tests and short answer tests: what needs to be created?
Scoring rubric
Essay tests: what kinds of material should be covered?
Objectives that can’t be easily measured with selected-response items
Essay tests: how should grading be done?
Blindly
Short answer tests: how long should answers be?
Questions should be able to be answered in only a few words
Short answer tests: how many correct responses?
1
Short answer tests: for quantitative items, what should be specified?
Desired level of precision
Short answer tests: how many blanks should be included? How long should they be?
Only 1 blank included
Long enough to write out the answer, and uniform in length
Otherwise, blank length becomes a dead giveaway about the answer
Short answer tests: where should blanks be included?
At the end of the sentence
Typical response tests: what should be covered?
Focus items on experiences (thoughts, feelings, behaviors)
Limit items to a single experience
Typical response tests: what kinds of questions should be avoided?
Items that virtually everyone would answer the same way (they provide no information)
Leading questions
Typical response tests: how should response scales be constructed?
If neutral option is desired, have odd numbered scale
Reverse some items so that high numbers don't always represent the same response direction
Options should be labeled (e.g., a Likert-type scale with ratings from 0-7)
Spearman
Identified a general intelligence “G”
Underlies performance on all mental tasks
Cattell-Horn-Carroll
Theory positing 10 broad types of intelligence
3 abilities incorporated by most definitions of intelligence
Problem solving
Abstract reasoning
Ability to acquire knowledge
Original determination of IQ (used by Binet)
Mental age / chronological age × 100 (e.g., mental age 10 at chronological age 8 gives 10/8 × 100 = 125)
How IQ is currently determined
Raw score compared to age/grade appropriate norm sample
M=100, SD=15
Why professionals have a love/hate relationship with intelligence tests
Good: reliable and valid (psychometrically sound, predict academic success, fairly stable over time)
Bad: limited (make complex construct into 1 number), misunderstood and overused
Group administered tests: who administers and who scores?
Standardized: anyone can administer (teachers, etc.), but professionals interpret
Group administered tests: content focuses on which skills most?
Verbal skills
Examples of group-administered aptitude tests
Otis-Lennon School Ability Test
American College Test (ACT)
Individually administered tests: how standardized?
Very standardized
No feedback given during testing regarding performance or test
Additional queries only when specified (the examiner may say only "Tell me more about that.")
Answers are recorded verbatim
Individually administered tests: starting point
Starting point determined by age/grade
Reversals sometimes needed (if the examinee misses the first question, the examiner must back down to an easier level)
Individually administered tests: ending point
Testing ends when person answers 5 questions wrong in a row
Individually administered tests: skills tested
Verbal and performance
3 individually administered IQ tests for adults
Wechsler Adult Intelligence Scale (WAIS; most commonly used)
Stanford-Binet
Woodcock-Johnson Tests of Cognitive Abilities
Child version of Wechsler Adult Intelligence Scale
Wechsler Intelligence Scale for Children (WISC)
WAIS: subtests and index scores
15 subtests combine to make 4 index scores: Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI)
4 index scores combined to make Full Scale IQ score
WAIS: norm set
Older teenagers to elderly
WISC: basics
2-3 hours to administer and score
Administered by professionals
Normed for children elementary-aged to older adolescence
Stanford-Binet: norm set
Young children to elderly
Stanford-Binet: IQ scores
3 composite IQ scores: verbal IQ, nonverbal IQ, full scale IQ
Score range difference between WAIS/WISC and Stanford-Binet
Stanford-Binet: possible to score higher than 160 (not possible for WAIS or WISC)
Woodcock-Johnson: norm set
Young children to elderly
What Woodcock-Johnson is based on
Cattell-Horn-Carroll theory of 10 types of intelligence