Test1 Flashcards
Don't Fail
Define Test
A measurement instrument that consists of a SAMPLE OF BEHAVIOR obtained under STANDARDIZED conditions and evaluated using established scoring rules
3 Test User Qualifications
Level A: Limited range; ex: achievement/educational tests; no specialized training required, except perhaps a bachelor's degree
Level B: Some specialized training required; ex: aptitude/personality tests; requires a master's degree and coursework on testing
Level C: Extensive training required; ex: intelligence/projective tests; requires advanced training and a doctoral degree plus licensure
2 Main Reasons for using Tests
- Efficiency
- Objectivity
3 Uses of Tests
- Classification
- Research
- Diagnosis and treatment planning
5 Major Categories of Tests
- Mental Ability- cognitive functioning
- Achievement- what they know
- Personality- normal or psychopathological
- Interests, attitudes, and values- career
- Neuropsychological- CNS/brain functioning
Major Sources of Info About Tests
Published: Tests in Print
Unpublished: Directory of Unpublished Experimental Mental Measures
ETS Test Collection: covers both published and unpublished tests
2 Systematic Reviews
- Mental Measurements Yearbook
- Test Critiques
5 Factors Affecting Responses to Assessment
- Motivation
- Anxiety
- Coaching
- Physical/Psychological Conditions
- Social Desirability
3 Levels of defining a variable
- Construct- general definition of a variable
- Measure- operational definition- often a test
- Raw Data- numbers resulting from the measure
Why do we convert raw scores to normed (z) scores?
Raw scores mean nothing on their own! Once converted to z-scores, they can be converted further to T-scores, percentiles, etc. to make meaning of the data
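A minimal sketch of the raw-to-z conversion, assuming the norm group is simply the list of scores itself and that the population standard deviation is the one wanted (both are assumptions; the raw scores are made up):

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to z-scores relative to their own group."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of the group (an assumption)
    return [(x - m) / sd for x in raw_scores]

raw = [10, 20, 30, 40, 50]  # hypothetical raw scores
z = z_scores(raw)
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

A z-score of 0 sits at the group mean; each unit is one standard deviation above or below it.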
Why is a normal curve important?
The normal curve is the assumed distribution for most test scores, so norms are built on it; it allows comparison of one score to others
Why do we convert z-scores to standardized scores?
Z-scores are hard to interpret (negative values, decimals), so we convert them to scales that make a score's standing relative to others easier to read
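A rough sketch of a few common conversions, assuming the usual conventions (T-score: mean 50, SD 10; stanine: mean 5, SD 2, limited to 1 through 9) and a normal distribution for the percentile. The function names are made up for illustration:

```python
from math import floor
from statistics import NormalDist

def t_score(z):
    """T-score scale: mean 50, standard deviation 10."""
    return 50 + 10 * z

def stanine(z):
    """Stanine scale: mean 5, SD 2, rounded and limited to 1..9."""
    return min(9, max(1, floor(5 + 2 * z + 0.5)))

def percentile(z):
    """Percent of a normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)

z = 1.0  # one standard deviation above the mean
print(t_score(z))               # 60.0
print(stanine(z))               # 7
print(round(percentile(z), 1))  # 84.1
```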
Age Norms
normal scores for a particular age
ex: normal height for a 10yo
Grade Norms
Normal scores for a particular grade
ex: typical reading level for 5th grade
National Norms
Represent the entire national population; ex: SAT/GRE or WISC
International Norms
Developed in the context of international studies of school achievement
Convenience Norms
Groups from a single geographic location- limited range
ex: Self-concept test normed on 250 8th graders from a Northeastern city
User Norms
based on groups who actually took the test
ex: SAT
Subgroup Norms
taken from total norm group: separate norms may be provided by sex, race, etc…
ex: Zac is in the 60th percentile nationally but 30th percentile in his group
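A small sketch of how the same score can land at different percentiles depending on the norm group. It assumes percentile rank means "percent of the group scoring below" (definitions vary), and both norm groups are invented numbers:

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# Hypothetical norm groups, made up for illustration
national = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
subgroup = [55, 60, 65, 68, 72, 75, 78, 80, 85, 90]

zac = 66
print(percentile_rank(zac, national))  # 60.0
print(percentile_rank(zac, subgroup))  # 30.0
```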
Local Norms
Scores can be compared nationally, but also in relation to the scores of other people within the local group
ex: how seniors this year compare to seniors of previous years
Difference between norm-referenced and criterion-referenced
Norm-Referenced: scores are compared to those of a representative sample of individuals (the norm group)
Criterion-Referenced: scores are compared to a predetermined criterion/standard; ex: licensure exam
X = T + E
From classical test theory
X= obtained score
T= True score
E= Measurement error
The consistent (true) part of the score plus the part due to inconsistency (error) equals the obtained score
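A toy illustration of X = T + E. True scores are never observed in practice, so the values below are invented, with errors chosen to average zero and to be uncorrelated with the true scores (the classical test theory assumptions):

```python
from statistics import pvariance

true_scores = [40, 40, 60, 60]  # T: hypothetical true scores
errors = [-2, 2, -2, 2]         # E: hypothetical errors, mean 0, uncorrelated with T

# X = T + E for each examinee
obtained = [t + e for t, e in zip(true_scores, errors)]
print(obtained)  # [38, 42, 58, 62]

# Under these assumptions the variances add up as well
var_t = pvariance(true_scores)
var_e = pvariance(errors)
var_x = pvariance(obtained)
print(var_x == var_t + var_e)  # True
```

Here the true-score variance is 100, the error variance is 4, and the obtained-score variance is 104.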
3 sources of unsystematic measurement errors
- Item selection (test content)- what questions should be included?
- Test administration- room temp, lighting, noise level
- Test Scoring- subjectivity in projective/essay tests
Reliability
consistency in measurement- NOT perfect or absolute but a matter of degree
Correlation
Quantitative magnitude and direction of a relationship
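A minimal sketch of the Pearson correlation coefficient, one common way to put a number on magnitude and direction. The data are invented:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: ranges from -1 to +1; the sign gives the direction."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

hours_studied = [1, 2, 3, 4, 5]  # hypothetical data
quiz_score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours_studied, quiz_score), 2))  # 0.77 (positive, fairly strong)
print(pearson_r(hours_studied, [5, 4, 3, 2, 1]))       # -1.0 (perfect negative)
```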
Test-retest reliability
Reliability coefficient is obtained by giving the same test to the same individuals on two separate occasions and correlating the two sets of scores
Inter-Scorer Reliability
AKA inter-observer or inter-rater reliability
2 or more scorers/raters score the same responses, working independently to keep from influencing each other
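One simple way to check how well two independent scorers agree is percent agreement. This is only a sketch with invented ratings; it is not the only index of inter-rater reliability and it does not correct for chance agreement:

```python
def percent_agreement(rater_a, rater_b):
    """Percent of cases where two raters gave exactly the same score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same set of responses")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)

# Hypothetical essay scores (1-5) from two raters working independently
rater_a = [3, 4, 2, 5, 4, 3, 1, 2]
rater_b = [3, 4, 3, 5, 4, 2, 1, 2]
print(percent_agreement(rater_a, rater_b))  # 75.0
```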
Alternate Form Reliability
2 or more forms of a test with a similar number of items, time limits, and content, given to the same examinees
Split-Half Reliability
Shows internal consistency by correlating scores on two halves of the same test (e.g., first half vs. second half, or odd vs. even items)
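A sketch of a first-half vs. second-half split, assuming each examinee's answers are stored as a list of item scores (1 = correct, 0 = incorrect). The data and function names are made up for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_r(item_scores):
    """Correlate each examinee's first-half total with their second-half total."""
    half = len(item_scores[0]) // 2
    first = [sum(row[:half]) for row in item_scores]
    second = [sum(row[half:]) for row in item_scores]
    return pearson_r(first, second)

# Hypothetical responses: one row per examinee, six items each
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
r_halves = split_half_r(responses)
print(round(r_halves, 2))
```

The higher the correlation between the two halves, the more consistently the items are measuring the same thing.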
What does reliability coefficient tell us?
Degree of reliability: the proportion of variance in obtained test scores that is accounted for by variability in true scores; for tests, it also indicates the consistency of obtained scores
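The "proportion of variance" idea can be sketched as a ratio. True scores cannot be observed directly, so this is purely illustrative with invented numbers; real reliability coefficients are estimated from observed scores instead:

```python
from statistics import pvariance

def reliability_ratio(true_scores, obtained_scores):
    """True-score variance divided by obtained-score variance."""
    return pvariance(true_scores) / pvariance(obtained_scores)

true_scores = [40, 40, 60, 60]      # hypothetical true scores
obtained_scores = [38, 42, 58, 62]  # the same people, with measurement error added

r_xx = reliability_ratio(true_scores, obtained_scores)
print(round(r_xx, 3))  # 0.962
```

A value of 0.962 would mean about 96% of the variance in obtained scores reflects real differences between people and about 4% reflects error.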
Examples of standardized scores
Percentiles, T-scores, stens, stanines
Validity
Measuring what we intend to measure- asking about the validity of the interpretation of the score for a particular purpose, as a matter of degree (not all or none)
ex: Rorschach scale for depression
Face validity
Does the test APPEAR to measure what it is intended to measure- not whether it actually does or not
Content Validity
Deals with the relationship between the content of the test and a well-developed domain of knowledge/behavior- does the test cover a representative sample of all possible content in the domain?
Application: education/employment
Criterion-Related Validity
Indicates the degree of relationship between the predictor (test) and criterion (level of performance the test is trying to predict). 2 kinds:
Concurrent: criterion data is collected before or at the same time as the test is given
Predictive: criterion data is collected after the test is given
Construct Validity
General validity of the measurement tool- does the instrument measure the construct it is intended to measure?
ex: altruism and 3 qualities- you have to test whether those three things are actually qualities of altruism
Which is more important- reliability or validity?
Validity- who cares if it is reliable if it doesn’t measure what you need it to measure?
Sir Francis Galton
Founder of psychological testing; measured sensory characteristics in a big makeshift lab to evaluate mental ability
James McKeen Cattell
Furthered Galton’s research- his battery of tests was the conceptual grandfather of the ACT/SAT- he coined the phrase “mental test”
Alfred Binet
Father of intelligence testing- first to actually use mental tests (rather than sensory) like word usage and connection
Lewis Terman
Brought Binet's test to the USA as the Stanford-Binet, which became the “benchmark” of intelligence testing; used with immigrants
Robert Yerkes
Created the (widely criticized) Army Alpha and Beta group intelligence tests for WWI placement