final Flashcards
face validity
does a test appear to measure what it was designed to measure; lay-person judgement
how do content & face validity differ?
content involves systematic and technical analysis
face is more superficial
criterion validity
the extent to which a measure agrees with a gold standard; whether it matches a measure of some attribute or outcome that is of primary interest (criterion)
types of studies: criterion validity
predictive vs concurrent
predictive studies
take the test today and test the criterion some time down the road
drawbacks to predictive validity studies
time, money, issues from time lag
concurrent studies
test and criterion done at the same time
when should you use predictive vs concurrent studies?
if goal is prediction -> predictive
if goal is to determine current status -> concurrent
criterion contamination
when the criterion measures more dimensions than those measured by the test
do scores on the predictor influence criterion scores?
techniques for interpreting validity coefficients
(1) sig level; did not occur by chance (p value)
(2) coefficient of determination (R^2)
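A minimal sketch of the coefficient of determination; the correlation value is invented for illustration:

```python
# Sketch: interpreting a validity coefficient (value is illustrative).
r = 0.30  # hypothetical correlation between test and criterion

# Coefficient of determination: proportion of criterion variance
# accounted for by the test.
r_squared = r ** 2  # 9% of criterion variance explained
```

Even a modest r of .30 accounts for only about 9% of the variance in the criterion, which is why small coefficients must be interpreted carefully.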
what if your validity coefficient is small?
if a test provides info that helps predict criterion performance better than any other existing predictor the test may be useful even if coefficient is relatively small
linear regression
a mathematical procedure that allows us to predict values on one variable if we know values on the other
standard error of estimate
a stat that reflects the average amount of error in our predictions and that allows us to make confidence statements
decision theory models
when tests are used for making decisions such as personnel selection; factors other than the correlation between test and criterion are important
decision theory models: selection ratio
proportion of applicants needed to fill the position(s) (selected ÷ applicants)
decision theory models: base rate
proportion of applicants who can be successful candidates
model sensitivity
metric that evaluates the ability to detect true positives
A/(A+C)
A = true positive; C = false negative
model specificity
metric that evaluates the ability to detect true negatives
D/(B+D)
B = false positive; D = true negative
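The formulas above can be worked through on a small 2×2 table. The cell counts are invented:

```python
# Sketch: sensitivity and specificity from a 2x2 decision table.
# Cell labels follow the cards: A = true positive, B = false positive,
# C = false negative, D = true negative. Counts are made up.
A, B, C, D = 40, 10, 5, 45

sensitivity = A / (A + C)  # real positives the test catches
specificity = D / (B + D)  # real negatives the test catches

# The same table also yields the decision-theory quantities:
selection_ratio = (A + B) / (A + B + C + D)  # proportion selected
base_rate = (A + C) / (A + B + C + D)        # proportion truly successful
```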
evaluating validity coefficients
- look for changes in the cause of relationships
- what does the criterion mean?
- review the subject pop of validation study
- be sure sample size was adequate
- never confuse criterion with predictor
- check for restricted range on both predictor and criterion
- review evidence for validity generalization
- consider differential prediction
construct validity
extent to which evidence can be provided that test measures a theoretical construct
Campbell & Fiske’s types of validity evidence
convergent and discriminant
types of convergent evidence
(1) does test measure same thing as other tests used for same purpose
(2) does test correlate with specific variables that we can expect if it is doing its job
validation study
two or more constructs measured in two or more ways
what can validation studies tell us?
convergent and discriminant validity
homogeneity and unidimensionality
evidence of validity based on response process
involves an analysis of the fit between the performance and actions test takers actually engage in and the construct being assessed
e.g., interview, behavioural indicators (RT, eye gaze)
evidence based on consequences of testing
were the intended benefits of testing achieved?
ways of getting evidence of validity
(1) test content
(2) relations to other variables (criterion)
(3) internal structure
(4) response processes
(5) consequences of testing
factor analysis
any of several stat methods describing the interrelationships of a set of variables by statistically deriving new variables, called factors, that are fewer in number than the original set of variables
types of factor analysis
exploratory and confirmatory
if alpha is lower than expected, there might be ______ and you might want to do _____
heterogeneity
factor analysis
steps in factor analysis
(1) extraction (how many groups/factors?)
(2) rotation (average correlation between items and the factor itself)
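A sketch of the extraction step. One common heuristic (the Kaiser criterion) retains factors whose eigenvalue of the correlation matrix exceeds 1; the correlation matrix below is invented (three items all intercorrelating at r = .60, i.e. one underlying factor):

```python
import numpy as np

# Sketch of extraction: how many factors? (Kaiser criterion heuristic.)
# Invented correlation matrix: three items, pairwise r = .60.
r = 0.60
corr = np.array([[1.0, r,   r  ],
                 [r,   1.0, r  ],
                 [r,   r,   1.0]])

eigenvalues = np.linalg.eigvalsh(corr)        # ascending order
n_factors = int(np.sum(eigenvalues > 1.0))    # factors with eigenvalue > 1
```

Here only one eigenvalue exceeds 1, consistent with a single homogeneous factor; if a low alpha had signaled heterogeneity, more than one would typically emerge.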
purposes of assessment in education
- how well is a student learning?
- assess whether class, grade, school, district, region is learning content
- method to detect learning problems
- method for identifying giftedness
- determine if child is ready to move to next level
- assess teacher effectiveness
- determine readiness/placement in college, grad school, professional school
- credential exams
achievement test
assess learned information; evaluate the effects of a KNOWN or controlled set of experiences
what type of validity procedures does achievement testing rely on?
heavily on content validation procedures
aptitude test
assess ability to learn something; evaluate the effects of UNKNOWN or uncontrolled experiences
what type of validity procedures does aptitude testing rely on?
heavily on predictive criterion validation procedures
goal of classroom testing
measure the extent to which students have learned the facts, concepts, procedures, and skills that have been taught
effective classroom tests
students who have learned more will obtain higher scores and students who have learned less will obtain lower scores. to be an effective test, a test must consist of effective items
types of classroom achievement tests
constructed and selected
Bloom’s taxonomy: levels of understanding
(1) knowledge
(2) comprehension
(3) application
(4) analysis
(5) synthesis (becomes "create" and swaps places with evaluation in the revised taxonomy)
(6) evaluation
item difficulty index
right v wrong questions: percentage or proportion of test takers who correctly answer the item
item difficulty index: too hard
.0 -.2
item difficulty index: too easy
.9 - 1
item difficulty indices are:
sample dependent and after the fact
on constructed-response tests scored right/wrong, what is the optimal mean p value?
.50 (about half the class gets it right)
item discrimination: right and wrong Qs
Pt - Pb
item discrimination: good discriminatory
lower % of bottom quarter of class got it correct than top quarter of class
item discrimination: bad discriminator
bottom and top quarter of class did equally well on question
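The difficulty and discrimination cards above can be sketched with invented response data:

```python
# Sketch: item difficulty (p) and item discrimination (Pt - Pb) for one
# right/wrong item. Responses are invented: 1 = correct, 0 = wrong.
responses = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

# Difficulty index: proportion of all test takers answering correctly.
p = sum(responses) / len(responses)  # between .2 and .9: acceptable

# Discrimination index: proportion correct in the top-scoring group
# minus the bottom-scoring group (group p values assumed here).
p_top, p_bottom = 0.9, 0.4
discrimination = p_top - p_bottom  # positive -> discriminates well
```

A discrimination near zero would mean the top and bottom quarters did equally well, marking a bad discriminator.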
Examples of achievement tests
(1) Wechsler individual achievement test
(2) stanford achievement test
(3) Iowa test of basic skills
(4) metropolitan achievement test
Wechsler individual achievement test (WIAT)
z-scores, percentile ranks, stanines
norms for grades and age
all ages (above 4)
45 min - 2 hours
- longer for adults than kids
gifted? learning difficulties?
high reliability
Stanford Achievement test
group-administered test
1923
K-12
math, writing expression, understanding of patterns, reading comprehension
high reliability
evidence for construct validity
Iowa test of basic skills
general achievement tests
K-8?
better for lower end of distributions?
shorter than others
metropolitan achievement test
classified as achievement test, but has some aptitude components
examples of diagnostic tests
(1) wide range achievement test 4 (the WRAT)
(2) peabody individual achievement test
(3) woodcock reading mastery test
(4) kaufman test of educational achievement
(5) canada quick individual achievement test
(6) canada french immersion achievement test (C-FIAT)
wide range achievement test 4 (the WRAT)
diagnostic test
basic academic skills
good for 5-98
individual admin
longer time frame for older people
readiness tests
intended to assess a child’s readiness to enter school or move forward
issues with readiness tests
(1) children change rapidly
(2) predictive ability is weak
(3) cultural/language biases
range rule
standard deviation should be around: (max response-min response)/4
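A quick sketch of the range rule, using a 1-to-7 rating scale as an illustrative example:

```python
# Sketch of the range rule: a rough check that a scale's SD is plausible.
# Example scale is illustrative: a 1-to-7 rating item.
max_response, min_response = 7, 1

expected_sd = (max_response - min_response) / 4  # rough SD estimate

# An observed SD far from this estimate may signal a data-entry or
# scoring problem worth investigating.
```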
examples of aptitude tests: cognitive ability
(1) otis-lennon school ability test
(2) cogAT
(3) SAT-I
(4) ACT
(5) GRE; GMAT; LSAT; MCAT
issues with grad school tests
predict success poorly and predict differentially for different groups
advantages and disadvantages of intelligence testing
advantages: helps identify/define problem
disadvantages: cultural bias, limited info
three research traditions
(1) psychometric, (2) information processing, (3) cognitive
binet: intelligence
tendency to take and maintain a definite direction, the capacity to make adaptations for the purpose of attaining a desired end and the power of auto-criticism
binet: principles of test construction
(1) age differentiation
(2) general mental ability
binet’s age differentiation
we should be able to distinguish between people (especially children) of different ages
IQ = MA/CA * 100
max mental age was 19.5 (problem)
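The ratio-IQ formula and its ceiling problem, worked through with illustrative ages:

```python
# Sketch of Binet's ratio IQ. Ages are illustrative.
mental_age = 10        # MA, from test performance
chronological_age = 8  # CA

iq = mental_age / chronological_age * 100

# The ceiling problem: with a maximum mental age of 19.5, ratio IQs
# necessarily shrink as chronological age rises past ~19.5.
adult_iq = 19.5 / 40 * 100  # a 40-year-old cannot exceed ~49
```

This is why Wechsler replaced the ratio IQ with a deviation score that compares the attained score to the expected mean for one's age.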
routing procedure
start test based on chronological age, administrator moves to more challenging items as appropriate
Stanford-Binet 5
intelligence test
appropriate for a broad range of 2 to 85+ years, providing one assessment for all ages (recommend waiting until school age)
provides comprehensive coverage of five factors of cog ability
(1) fluid reasoning
(2) knowledge
(3) quantitative reasoning
(4) visual-spatial processing
(5) working memory
assessed verbally and non-verbally
scores: full scale IQ, verbal IQ, nonverbal IQ, routing score (start point), individual scores for each scale (verbal and non-verbal)
goodenough-harris drawing test (G-HDT)
non-verbal intelligence test
group or individually administered
standardized
reliability ranges from high .60s to low .90s
wechsler: intelligence
aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment
wanted to focus more on adults (unlike Binet)
factors that influence performance on intelligence tests
(1) general intelligence, (2) general, (3) specific, (4) influencing factors
performance on intelligence tests: general
comprehend, follow direction, respond verbally, understand english
performance on intelligence tests: specific
concentration, memory, reasoning
performance on intelligence tests: influencing factors (not measured directly)
interests, occupation, confidence, arithmetic skills/knowledge
differences between Binet and Wechsler
point scale concept
inclusion of performance scale
challenging age differentiation of Binet
IQ = attained or actual score/ expected mean score for age
doesn’t max out like binet
WAIS-IV
qualification level: C
completion time; 60-90 min for core subtests
ages 16-90
IQ mean = 100; SD = 15
full scale IQ, 4 indices, individual subtests (e.g., arithmetic) with intellective and non-intellective components
pattern analysis
strengths and weaknesses
normative sample = 2200 (US)
high reliability and good evidence of validity
raven progressive matrices
one of the best known and most popular
can be administered to group or individuals
from 5 years of age to elderly
used throughout the world
respectable reliability coefficients: high .70 - .90
last revisions to the manual 1998 with impressive set of norms
has been tested with various cultural groups shown to historically score lower on binet and wechsler scales
culture fair intelligence tests & an example
one purpose of nonverbal and performance tests is to remove factors related to cultural influences that often disadvantage test takers’ performance
RPM (Raven’s progressive matrices) comes close to being culture fair
IPAT culture fair intelligence test
Cattell; pencil and paper (fluid intelligence in children)
Gardner theory of multiple intelligences
intelligence is not unitary; it is the ability to solve problems or to create products that are valued within one or more cultural settings
Gardner’s types of intelligences
(1) linguistic
(2) logical-mathematical
(3) spatial intelligence
(4) bodily-kinesthetic
(5) musical intelligence
(6) interpersonal
(7) intrapersonal
(8) naturalistic
who developed idea of emotional intelligence (EQ)?
Peter Salovey (Yale); followed up by Goleman
clusters within emotional intelligence
(1) abstract
(2) concrete
(3) social
emotional intelligence has its roots in ____
social intelligence
EQ includes:
(1) being aware of one’s own emotions
(2) able to manage one’s own emotions
(3) sensitive to the emotions of others
(4) able to respond to and negotiate with other people emotionally
(5) use one’s own emotions to motivate oneself
EQ allows us to:
regulate emotions and problem solve
what did Goleman include in EQ?
conscientiousness, self-confidence, optimism, communication, leadership and initiative
Examples of reasons for neuropsychological testing
dementia, alzheimers, concussion, brain injury, ALS, parkinson’s, stroke, epilepsy, brain tumour, infection
quick-and-dirty neuropsych assessment tool
glasgow coma scale (GCS)
neuropsychological testing
application of a set of standardized procedures designed to assess and quantify brain function as expressed in overt behaviour
leads to additional inferences regarding the covert processes of the brain
difference between neuropsych testing and general intelligence measures
neuropsych tests tend to be more highly specific in what they measure
components of neuropsych testing
- all (or at least a sig majority) of a patient’s relevant cog skills or higher order info processing skills should be assessed
- testing should sample the relative efficiency of the right and left hemispheres of the brain
- testing should sample anterior and posterior regions of cortical function (posterior mostly receptive)
- testing should determine the presence of specific deficits
- should determine the acuteness versus the chronicity of any problems or weaknesses
- testing should locate intact complex functional systems
- testing should assess affect, personality, and behaviour
- test results should be presented in ways that are useful in a school or work environment, to acute care or intensive rehabilitation facilities or to physicians
two conceptual approaches: neuropsych testing
(1) fixed battery approach
(2) non-fixed
example of a fixed battery approach to neuropsych testing
halstead-reitan neuropsych test battery
focuses on key behavioural correlates of brain function
non-fixed battery approach to neuropsych testing
use of a flexible combo of traditional psych and educational tests
e.g., boston process approach
can include qualitative stuff
conceptual model of brain-behaviour relationships
- sensory input
- attention and concentration
- learning and memory
- language
- spatial and manipulatory ability
- executive functions (logic, concepts, reasoning, planning, flexibility)
- motor output
example of motor function tests
finger tapping
grip strength
grooved pegboard
four factors of mental processing
(1) focus/execute, (2) sustain, (3) encode, (4) shift
advantages of interviews
- get unique information
- participants can elaborate
- personal and meaningful experience
- rapport and relationship building
- rich info; detail
disadvantages of interviews
- harder to analyze
- possible discomfort of participant
- not honest, not best performance
- time and resources
- introduction to bias
- individualized/subjective
- limited generalizability
types of interviews
(1) structured (highly)
(2) guided/semi-structured
(3) non-directive or unguided
initial intake interview
- demographic data
- reason for referral
- past medical history
- present med condition
- familial medical history
- past psych history
- past history with medical or psych professionals
- current psych conditions
potential biases in interviews
(1) confirmation bias
(2) self-fulfilling prophecy
(3) ethnocentrism
ineffective interviewing
judgmental and evaluative statements, probing questions, false reassurance
effective interviewing
attitude is warm and authentic, open-ended questions, measuring understanding
interviews: measuring understanding
levels 1-5 ?
sources of error in interviews
interview validity; interview reliability (length of session)
personality
an individual’s unique constellation of psych traits that is relatively stable over time
personality traits
distinguishable, relatively enduring ways in which one individual varies from another
personality types
a constellation of traits
continuum thinking is in contrast to this
personality assessment methods
(1) objective measures, (2) projective measures, (3) behavioural assessment
MMPI
purpose: to aid in diagnosis of psychopathology for individuals 14 years and older
developed for abnormal personality
566 true/false items
originally criterion keyed items
criterion keyed
way of developing items by how well they discriminate between different groups (e.g., psych pops vs non-psych pop)
validity (Messick)
“an integrated eval judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment”
the appropriateness or accuracy of the interpretation of test scores
threats to validity
(1) construct (internal) underrepresentation, (2) construct-irrelevant variance (external), (3) examinee characteristics, (4) test admin and scoring, (5) instruction and coaching
construct underrepresentation
not all aspects of construct are represented
relationship between reliability and validity
reliability is necessary but not sufficient for validity
reliability restricts validity coefficients
√reliability = max validity coefficient
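The ceiling relationship can be worked through with an illustrative reliability value:

```python
import math

# Sketch: reliability caps the validity coefficient. Per the card,
# max validity = sqrt(test reliability). Value is illustrative.
reliability = 0.81

max_validity = math.sqrt(reliability)

# A test with reliability .81 can correlate at most .90 with any
# criterion: reliability is necessary but not sufficient for validity.
```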