Assessment Flashcards
Item difficulty index
percentage of people who got an item CORRECT
The lower the score, the more difficult the question is
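As a rough illustration of the item difficulty index, the sketch below computes the proportion of examinees who answered an item correctly. The function name and the response data are hypothetical, made up for this example.

```python
def item_difficulty(responses):
    """Proportion of examinees who got the item correct (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

# Hypothetical responses from 10 examinees to two items
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
hard_item = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

print(item_difficulty(easy_item))  # 0.9 -> high index, easy item
print(item_difficulty(hard_item))  # 0.2 -> low index, difficult item
```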
Who Americanized the Binet?
Lewis Terman
at Stanford University
thus, Stanford-Binet
The Buckley Amendment
AKA FERPA (Family Educational Rights and Privacy Act of 1974)
Those over 18 can view their school record (including test data)
Parents can view their minor children’s test data
Can demand corrections to their file
Educational testing information cannot be released without adult consent
Projective tests may also be called
self-expressive (e.g., sentence completion or word association)
reactivity
clients/participants monitoring their own behavior and thus giving inaccurate answers
How can you increase reliability?
Increase the test’s length
Spearman-Brown Prophecy formula
used to estimate the impact that lengthening or shortening a test will have on a test’s reliability coefficient
when estimating split-half reliability, the Spearman-Brown prophecy formula can be used to compensate mathematically for the shorter length
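The card does not spell out the formula, so here is a minimal sketch using the standard textbook form, r_new = (n * r) / (1 + (n - 1) * r), where n is the factor by which the test length changes. The function name and example values are assumptions for illustration.

```python
def spearman_brown(r, n):
    """Estimated reliability after changing test length by a factor of n.

    r: current reliability (or the split-half correlation)
    n: length factor (2 = doubled, 0.5 = halved)
    """
    return (n * r) / (1 + (n - 1) * r)

# Split-half correction: the half-test correlation is .60, full test is 2x as long
print(round(spearman_brown(0.60, 2), 2))    # 0.75

# Lengthening vs. shortening a test whose reliability is .70
print(round(spearman_brown(0.70, 2), 2))    # 0.82
print(round(spearman_brown(0.70, 0.5), 2))  # 0.54
```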
Aptitude-achievement tests
GRE, MAT, MCAT, SAT
Ex: GRE attempts to predict graduate school performance but also tests level of current knowledge
Generally, school selection tests assess
aptitude
The ACA division for testing
AMECD
Association for Measurement and Evaluation in Counseling and Development
Interests and abilities are ____ correlated.
not highly
Bender Visual Motor Gestalt Test
named after Lauretta Bender
expressive projective measure, though best known for its ability to discern whether brain damage is present. Suitable for ages 4+; the client copies 9 geometric figures
Interest inventories
Work best with high-school age and beyond
Interests are not stable until age 25
Aptitude tests
assess Potential and Predict (aPtitude)
Tests that analyze data outside of a given theory
factor-analytic tests
Raymond Cattell
developed 16 Personality Factors
Responsible for defining fluid and crystallized intelligence
James Cattell
coined the term “mental test”
Projective tests use one of 3 formats
Association (word)
Completion (sentence)
Construction (draw a person)
Projective tests use ___ stimuli
vague
MMPI-2 for adolescents
MMPI-A
suitable for 14 to 18 y.o.s
Arthur Jensen
tremendous controversy for his 1969 Harvard Educational Review article
Said Whites score 11-15 IQ points higher than Blacks; claimed that, because of slavery, Blacks were bred for strength rather than intelligence
Said that heredity contributes 80% to IQ and environment only 20%
Robert Williams
made the BITCH (Black Intelligence Test of Cultural Homogeneity) to demonstrate that Blacks often excel when given a test with questions familiar to their community. Argued that IQ tests were part of “scientific racism”
John Ertl
claimed to have invented an electronic machine to take the place of paper-and-pencil IQ tests. The device literally had a strobe light on it.
Group tests are ___ accurate and have __ reliability, compared to individual tests
less accurate and less reliable
Means and SDs of Wechsler and Binet IQ tests
Difference between the two
Wechsler: M = 100, SD = 15
Stanford-Binet: M = 100, SD = 16
The Binet is considered less suitable for adults, so the Wechsler is the more popular
Forms of the Wechsler IQ tests
WPPSI - preschool and primary; ages 2.6-7
WAIS - age 16+
WISC - for children; 6-16.11 years
A 9 year old task on the Binet is one that X of 9-year-olds could answer correctly
50%
Today’s Binet is scored…
with a Standard Age Score (SAS)
Mean of 100
SD of 16
IQ is calculated by
(Mental Age / Chronological Age) x 100 (the ratio IQ formula)
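A quick worked example of the mental age / chronological age calculation. The function name and the ages used are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """IQ as mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

# A hypothetical 8-year-old performing at the level of a typical 10-year-old
print(ratio_iq(10, 8))  # 125.0

# Mental age equal to chronological age gives the average score
print(ratio_iq(9, 9))   # 100.0
```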
Alternatives to the split-half method of measuring internal consistency (inter-item consistency) of a test
Cronbach’s alpha
Kuder-Richardson-20 or KR-21
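To make these alternatives concrete, here is a minimal sketch of Cronbach's alpha using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). With right/wrong (0/1) items and population variances, the same calculation gives KR-20. The function name and the score matrix are assumptions for illustration, and the sketch does not guard against a total-score variance of zero.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per examinee, one column per item."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # regroup the scores by item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 0/1 responses: 5 examinees x 4 items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # 0.8
```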
Cross-validation
When a researcher further examines a test’s criterion validity by administering the test to a new sample.
This helps ensure the test is applicable to other populations who will take the exam. Helps guard against error factors, which are likely to be present if the original sample was small.
The cross-validation coefficient will likely be smaller than the initial validity coefficient. This is called “shrinkage”
J. P. Guilford
isolated 120 factors that added up to intelligence
Two of the dimensions are divergent thinking (coming up with new ideas) and convergent thinking (when divergent thoughts and ideas are combined into a singular concept)
Charles Spearman
in 1904, said that two factors were applicable to any mental task:
G - general ability
S - specific ability
Francis Galton
cousin of Darwin!
first intelligence theory
believed intelligence was a single "unitary" factor and that exceptional abilities were genetic and ran in families
eugenics :/
Hereditary Genius (1869)
coefficient of determination
proportion of variance in one variable that is accounted for by another
square the correlation
ex: same test is given to the same group of people and the correlation between the administrations is .70. The % of shared variance is .70 squared, which is .70x.70 = .49 (49%)
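The same arithmetic as a short sketch. The function name is hypothetical; the .70 correlation is the one from the example on this card.

```python
def coefficient_of_determination(r):
    """Shared variance between two sets of scores: the correlation squared."""
    return r ** 2

r = 0.70
shared = coefficient_of_determination(r)
print(round(shared, 2))   # 0.49
print(f"{shared:.0%}")    # 49%
```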
For psychological tests, an acceptable reliability coefficient is X. For admissions to jobs/schools (achievement), it is X.
Psychological - .70 reliability is good
For admissions to jobs/schools (achievement) - .80 or even .90
A reliability coefficient of .70 means…
70% of the obtained score on the test represented the true score
30% of the obtained score could be accounted for by error
AKA 70% is true variance while 30% is error variance.
(NOT that 70% of ppl who are tested will get their true score)
A reliable test is ___ valid.
A valid test is ___ reliable.
A reliable test is not always valid.
A valid test is always reliable.
Incremental validity (2 definitions)
The process by which a test is refined and becomes more valid as contradictory items are dropped
ALSO refers to a test’s ability to improve predictions when compared to existing measures. When a test has incremental validity, it gives you additional good info that wasn’t available from other tests.
According to the 1974 committee that drafted Standards for Educational and Psychological Tests, face validity is ___
not required
A construct is any trait that ___
you cannot measure or observe directly
What is the #1 consideration in test construction
Validity
5 types of validity
content validity
construct validity
concurrent validity
predictive validity
consequential validity
content validity
AKA rational or logical validity
does the test examine the behavior under scrutiny?
construct validity
a test’s ability to measure a theoretical construct (like intelligence, self-esteem, etc)
predictive validity
AKA empirical validity
test’s ability to predict future behavior
concurrent and predictive validity may be lumped under ___ validity.
criterion validity, which is an estimate of the extent to which a measure agrees with a gold standard (i.e., an external criterion of the phenomenon being measured)
concurrent validity
relationship between an instrument’s results and another currently obtainable criterion (give a new depression assessment to people you already know are depressed)
consequential validity
tries to ascertain the social implications of using tests
horizontal vs vertical tests
horizontal - assess for different things (math, language)
vertical - versions for different age brackets or levels of education (preschooler, middle-school math assessments)
spiral test
the items get progressively more difficult
cyclical test
the test has several sections that are each spiral in nature; within each section, the items get progressively more difficult
ipsative test
does not measure absolute strengths
measures a person’s progress in relation to themselves
comparing their score to another person’s is meaningless
items are independent of one another
convergent validity
established when measures of constructs that theoretically should be related are observed to be related
e.g., scores on GAD are related to another anxiety measure
discriminant validity
established when measures of constructs that are not theoretically related are observed to have no relationship
standard error of estimate
a statistic that gives the expected margin of error in a predicted criterion score due to the imperfect validity of the test
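The card gives no formula, so the sketch below assumes the standard textbook form: SD of the criterion scores times the square root of (1 - validity coefficient squared). The function name and numbers are hypothetical.

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """Expected margin of error around a predicted criterion score."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Hypothetical criterion with SD = 10 and a validity coefficient of .60
print(round(standard_error_of_estimate(10, 0.60), 2))  # 8.0

# A perfectly valid test would predict the criterion with no error
print(standard_error_of_estimate(10, 1.0))             # 0.0
```

The lower the validity coefficient, the closer the error margin gets to the full SD of the criterion.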
validity coefficient
correlation between a test score and the criterion measure
a person’s observed score (X) = ?
true score + error
X = T + e
standard error of measurement
SEM
used to estimate how scores from repeated administrations of the same instrument to the same individual are distributed around the true score. SEM is computed using the SD and reliability coefficient:
SEM = SD x sqrt(1 - r)
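A small sketch of the SEM calculation, SD times the square root of (1 - reliability). The function name, the example values, and the plus-or-minus one SEM band at the end are assumptions for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's SD and its reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability = .84
sem = standard_error_of_measurement(15, 0.84)
print(round(sem, 2))  # 6.0

# Band of +/- 1 SEM around an observed score of 100
observed = 100
print(round(observed - sem, 1), round(observed + sem, 1))  # 94.0 106.0
```

Higher reliability shrinks the SEM, so the band around an observed score gets narrower.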
factors that influence reliability
test length
homogeneity of test items (reliability goes up when items are homogenous)
range expansion (reliability is lowered by a restriction of range)
heterogeneity of test group (higher reliability)
speed tests (high reliability because nearly everyone gets everything right)
reliability coefficient
reliability is expressed in this coefficient
closer to 1.00, the more reliable the scores
NOIR
nominal scale - no order or equal intervals
ordinal - order, but no equal intervals
interval - equal intervals, but no true 0
ratio - equal intervals, true 0
Semantic differential
Good _ _ _ _ _ _ Bad
place a mark between where they feel
Like a Likert scale but no #s?
Thurstone scale
Agree or Disagree only
Guttman scale
measures the intensity of a variable because items are presented in a progressive order so that a respondent who agrees with one statement will also agree with all previous, less extreme items
percentile rank
indicates the % of scores falling at or below a given score
range from 1 to 99+ and have a mean of 50
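A minimal sketch of the "at or below" definition given above. The function name and the score list are made up for illustration; published tests typically read percentile ranks from norm tables instead.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the group that fall at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

# Hypothetical norm group of 10 scores
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
print(percentile_rank(scores, 70))  # 50.0
print(percentile_rank(scores, 90))  # 90.0
```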
z-score
mean = 0 SD = 1
z = (X - M)/SD
T-score
mean = 50 SD = 10
T = 10(z) + 50
deviation IQ
also known simply as standard score (SS) because they are used to interpret scores from achievement and aptitude tests
mean = 100 SD = 15
SS = 15(z) + 100
stanine
mean = 5
SD = 2
range from 1 to 9
round to the nearest whole #
stanine = 2(z) + 5
normal curve equivalent
developed for US department of education and used to measure student achievement
1 to 99
mean = 50
SD = 21.06
NCE = 21.06(z) + 50
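The standard-score formulas on the cards above can be chained together: convert a raw score to z, then convert z to each scale. This is a rough sketch; the function names and the raw score are hypothetical, and the stanine helper assumes rounding to the nearest whole number and clamping to the 1 to 9 range.

```python
def z_score(x, mean, sd):
    return (x - mean) / sd

def t_score(z):
    return 10 * z + 50

def deviation_iq(z):
    return 15 * z + 100

def stanine(z):
    # Assumption: round to the nearest whole number, then keep within 1..9
    return min(9, max(1, round(2 * z + 5)))

def nce(z):
    return 21.06 * z + 50

# Hypothetical raw score of 60 on a test with mean 50 and SD 10
z = z_score(60, 50, 10)
print(z)                 # 1.0
print(t_score(z))        # 60.0
print(deviation_iq(z))   # 115.0
print(stanine(z))        # 7
print(round(nce(z), 2))  # 71.06
```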
___ tests are usually used in high stakes testing
criterion-referenced (have you learned X curriculum)
Mental Status Exam
AAMMTPTJI
Appearance
Attitude
Movement and behavior
Mood and affect
Thought content
Perceptions
Thought process
Judgment and insight
Intellectual functioning and memory
suicide assessment acronyms - 3
PIMP (Plan, Intent, Means, Prior attempts)
SLAP (Suicidal ideations, Lethality, Access, Plan)
SAD PERSONS (sex, age, depression, previous attempt, ethanol abuse, rational thought loss, social supports lacking, organized plan, no spouse, sickness)
types of test bias
examiner bias - examiner’s beliefs or behavior influence test administration
interpretive - interpretation of results is unfair
response - when clients answer one thing to all questions
situational - testing conditions
ecological - global systems affect (e.g., giving all students a test in English)
Army Alpha vs. Army Beta
Alpha - English speakers
Beta - non-English speakers
used to test intelligence of military recruits during WWI
Arthur Otis
developed the first scientifically reliable intelligence test for groups
Otis Group Intelligence Test
Frank Parsons
father of vocational guidance and counseling
NBCC and ACA ethical guidelines for assessment
- competence to use and interpret
- informed consent
- release of results to qualified professionals
- instrument selected
- conditions of administration
- scoring and interpretation of assessments
- obsolete assessments and outdated results
- assessment construction
the Joint Committee on Testing Practices (JCTP) developed…
Rights and Responsibilities of Test Takers
Test User Qualifications
Code for Fair Testing Practices in Education
IDEA
Individuals with Disabilities Education Improvement Act of 2004
right of students with disabilities to receive testing at the expense of the public school system
right to an IEP (Individualized Education Program)
ADA
Americans with Disabilities Act (1990)
employment testing must accurately measure a person’s ability to perform relevant job tasks
people with disabilities get appropriate accommodations for testing
Carl D. Perkins act
Vocational and Technical Education Act of 1984
provides vocational assessment, counseling, and placement for low SES, disabled, single parents, those with limited English proficiency, incarcerated individuals
Civil Rights Act of 1964 and 1972, 1978, and 1991 amendments
assessments used to determine employability must relate strictly to the duties outlined in the job description and cannot discriminate based on race, color, religion, pregnancy, gender, or national origin
criterion validity
effectiveness of an instrument in predicting an individual’s performance on a specific criterion
item discrimination
Proportion of the top quarter of total scorers who got the item correct minus the proportion of the bottom quarter who got it correct
An item has good discrimination when high-scorers get it right and low-scorers get it wrong (positive item discrimination)
items with 0 and negative item discrimination are poor
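One way to sketch the top-quarter-minus-bottom-quarter idea: rank examinees by total score, then compare how often each end of the ranking got the item right. The function name, the (total score, item correct) record layout, and the data are all hypothetical.

```python
def item_discrimination(records, fraction=0.25):
    """Proportion correct in the top group minus proportion correct in the bottom group.

    records: list of (total_score, item_correct) pairs, item_correct is 1 or 0
    fraction: share of examinees in each comparison group (a quarter here)
    """
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:n], ranked[-n:]
    p_top = sum(correct for _, correct in top) / n
    p_bottom = sum(correct for _, correct in bottom) / n
    return p_top - p_bottom

# Hypothetical results for 8 examinees on two items
good_item = [(95, 1), (90, 1), (80, 1), (75, 0), (70, 1), (60, 0), (50, 0), (40, 0)]
poor_item = [(95, 0), (90, 0), (80, 1), (75, 0), (70, 1), (60, 1), (50, 1), (40, 1)]

print(item_discrimination(good_item))  # 1.0  (positive: high scorers get it right)
print(item_discrimination(poor_item))  # -1.0 (negative: low scorers get it right)
```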
classical test theory
observed score = true score + error
item response theory
importance of applying mathematical models to the data collected from assessments to see how well individual items work
AKA modern test theory
construct-based validity model
AKA unified construct model
validity is a holistic construct; it doesn’t have the specific components that classical test theory assumes (e.g., the 3: content, criterion, and construct validity)
what are the 3 types of test theory
classical
item response (AKA modern)
construct-based validity model
criterion-referenced assessment
provide info about a person’s score by comparing it to a predetermined standard or set criterion
e.g., A = 90-100; B = 80-89, and so on
The National Counselor Examination (NCE) and the CPCE are criterion-referenced assessments
as opposed to norm-referenced tests which make meaning by comparing a person’s score to the norm group
achievement vs. aptitude tests
achievement - what one has learned at the time of testing
aptitude - what a person is capable of learning (GRE, SAT)
ASVAB
Armed Services Vocational Aptitude Battery
the most widely used multiple aptitude test in the world. Measures aptitude for military and civilian jobs
Louis Thurstone
unlike Charles Spearman’s two-factor approach to intelligence (g, s - general and specific factors), Louis Thurstone identified 7 primary mental abilities
Howard Gardner
theory of multiple intelligences — 8
Cattell-Horn-Carroll (CHC)
theory of cognitive abilities - the most empirically validated theoretical model of intelligence
intelligence is hierarchical and consists of 3 strata:
general intelligence “g”
broad cognitive abilities
narrow cognitive abilities
high-stakes testing usually uses ___ assessment
criterion-referenced
performance assessments
non-verbal form of assessment
client completes a task
good for foreign language speakers
ex: Draw-a-Man test; (Raymond) Cattell Culture Fair Intelligence Test; Test of Non-Verbal Intelligence (TONI)
computer-adaptive testing
the computer adapts the test structure and items to the examinee’s ability level
ex: GRE
the 3 main types of validity
content
criterion
construct
____ is the most widely used intelligence test
Wechsler scales