testing and measurement 2 Flashcards
6 Steps to Test Development
1) defining purpose
2) preliminary design issues
3) item prep
4) item analysis
5) standardizing & research
6) prep of final product
Step one of test development
A statement of purpose, usually one simple sentence
- Include the trait being measured and the target population
Preliminary Design Issues
Step two:
Mode of administration, length, item format, number of scores, score reports, administrator training and background research
Mode of Administration
Group or Individual
Item Format
multiple choice, true/false, agree or disagree, or constructed by the responder (written answers)
Number of Scores
How many scores the test yields; related to length, since more scores generally require a longer test
Score Reports
computer generated, hand written? total score, norms, subgroups
Administrator Training
Extensive professional training to administer, score and interpret? How will that be provided? Or no training?
Background Research
Standard literature search on the construct being measured, plus a study of the clinicians who would use the test
Anatomy of a Test Item
Stimulus, Response Format (Conditions Governing Response), Scoring Procedures
Stimulus
the question being asked
Response Format
How the person can respond: multiple choice, true/false, or constructed (meaning any way they want)
Constructed Response
The person taking the test responds in any way they choose, e.g., written responses or free responses
Conditions Governing the Response
what influences response, time limit, can the administrator ask for clarification, answer sheet or writing etc
Scoring Procedures
Partial credit, correct/incorrect, constructed response
Two Types of Test Items
Selected-Response Test Items, Constructed Test Items
Selected-Response Test Items
multiple choice, forced choice, Likert format, true/false items
Scoring Selected-Response Items
correct/incorrect, sometimes using weighted questions
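A minimal sketch of correct/incorrect scoring with optional item weights. The function name, the answer key, and the weights are hypothetical, chosen only for illustration.

```python
def score_selected_response(responses, key, weights=None):
    """Total score for one examinee on selected-response items."""
    if weights is None:
        weights = [1] * len(key)  # unweighted: one point per correct item
    # Add an item's weight only when the response matches the keyed answer.
    return sum(w for r, k, w in zip(responses, key, weights) if r == k)


key = ["B", "D", "A", "C"]        # hypothetical answer key
responses = ["B", "D", "C", "C"]  # one examinee's answers (item 3 is wrong)

unweighted_total = score_selected_response(responses, key)
weighted_total = score_selected_response(responses, key, weights=[1, 1, 2, 3])
print(unweighted_total, weighted_total)  # 3 5
```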
Constructed Response Example Items
Essay Test, Performance Assessment, Portfolio
Scoring Constructed-Response Items
Requires inter-rater reliability and a clearly conceptualized scoring scheme
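One way to check inter-rater reliability is to have two raters score the same responses and compare them. The sketch below computes simple percent agreement and Cohen's kappa (agreement corrected for chance). Kappa is one common choice rather than anything these notes prescribe, and the ratings are made up.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Note: undefined (division by zero) if expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Hypothetical holistic scores (1-3) from two raters on eight essays.
rater_a = [3, 2, 3, 1, 2, 3, 1, 2]
rater_b = [3, 2, 2, 1, 2, 3, 1, 3]

agreement = percent_agreement(rater_a, rater_b)  # 6 of 8 match -> 0.75
kappa = cohens_kappa(rater_a, rater_b)           # about 0.62
print(agreement, round(kappa, 2))
```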
Holistic Score
Scoring a constructed-response item by having the rater give the response a single overall score
Analytic Scoring
Scoring a constructed-response item by having the rater assess several different dimensions of the response separately (the dimensions might even be rated by different people)
Point System
Scoring a constructed-response item by awarding points for certain predetermined elements that should appear in the response
Automated Scoring of Constructed Response Items
Using sophisticated computer programs to judge free responses by simulating human judgement
Suggestions for Writing Selected-Response
The list is extensive, but in short: keep it simple, get to the point, don’t give away the answer
Suggestions for Writing Constructed-Response
Make the task clear, specify the scoring system when the item is written, and use a sufficient number of items
Pros of Selected Vs. Constructed
Scoring reliability, takes less time to get more information, the scoring is more efficient
Pros of Constructed Vs. Selected
Easier to understand how the test taker thinks, and they reveal more individual differences (oddities that wouldn’t come up in selected response)
Item Analysis
involves item tryout, statistical analysis and item selection (figuring out which items are ‘good’ or ‘bad’)
Item Tryout
two stages, formal and informal
Formal Item Tryout
administering test items to samples of target population
Informal Item Tryout
very small groups of the population asked what they think about the items, or think aloud as they complete them
Item Difficulty
Percent of examinees in the tryout sample who answer the item correctly
P-Values
Item difficulty levels are often called this: p is the percentage (or proportion) of examinees who got the item right. Not to be confused with the p-value of statistical significance
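A minimal sketch of computing item p-values from tryout data, assuming each item is scored 1 (right) or 0 (wrong). The data and the function name are hypothetical.

```python
def item_p_values(score_matrix):
    """Proportion of examinees answering each item correctly.

    score_matrix: one row per examinee, one 0/1 entry per item.
    """
    n_examinees = len(score_matrix)
    n_items = len(score_matrix[0])
    return [
        sum(row[j] for row in score_matrix) / n_examinees
        for j in range(n_items)
    ]


# Four examinees, three items (made-up tryout data).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

p_values = item_p_values(scores)
print(p_values)  # [1.0, 0.5, 0.25]
```

Here item 1 is very easy (everyone got it right) and item 3 is hard (only a quarter did).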
Item Discrimination
Item’s ability to differentiate statistically between groups of examinees
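One common way to quantify discrimination is the D index: the proportion of a high-scoring group that gets the item right minus the proportion of a low-scoring group that does. The sketch below assumes 0/1 item scores and forms the groups from the top and bottom 27% of total scores. The 27% cutoff is an assumed convention here (halves or thirds are also used), and the data are made up.

```python
def discrimination_index(score_matrix, item, fraction=0.27):
    """D = p(upper group) - p(lower group) for one item.

    Groups are the top and bottom `fraction` of examinees by total score.
    """
    ranked = sorted(score_matrix, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    p_upper = sum(row[item] for row in upper) / group_size
    p_lower = sum(row[item] for row in lower) / group_size
    return p_upper - p_lower


# Six examinees, four items (hypothetical tryout data).
scores = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

d_first = discrimination_index(scores, item=0)  # high scorers pass, low scorers fail
d_last = discrimination_index(scores, item=3)   # both groups pass at the same rate
print(d_first, d_last)  # 1.0 0.0
```

An item with D near 0 (like the last one) does not separate high from low scorers and is a candidate for revision or removal.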
Distractor Analysis
A distractor is an incorrect or non-preferred option; analyzing how often each one is chosen can reveal misinterpretation of the question, etc.
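A minimal sketch of a distractor tally for one multiple-choice item: count how often each wrong option was chosen. The option letters, responses, and function name are hypothetical.

```python
from collections import Counter


def distractor_counts(choices, correct_option):
    """Count how many examinees picked each incorrect option."""
    counts = Counter(choices)
    return {opt: n for opt, n in counts.items() if opt != correct_option}


# Eight examinees' choices on one item whose keyed answer is "A".
choices = ["A", "C", "C", "B", "C", "A", "C", "D"]

tally = distractor_counts(choices, correct_option="A")
print(tally)  # {'C': 4, 'B': 1, 'D': 1}
```

A distractor chosen far more often than the keyed answer (like "C" here) suggests the item may be ambiguous or misread. A distractor nobody chooses is not doing any work.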
Factor Analysis
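Used to identify clusters of items that correlate with one another (and so measure the same dimension), which helps determine which items will yield more meaningful scores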
Item Selection
Choosing which items to keep: 1) more items increase the reliability of the test, 2) find the right average difficulty, 3) prefer items that discriminate between groups, 4) the maximum possible D (discrimination) occurs when p (difficulty) is at its midpoint, 5) make sure the content is actually covered; don’t eliminate important items
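Point 4 can be illustrated with a small calculation. Assume D is computed from upper and lower halves of the sample, so that p is the average of the two groups' pass rates. Under that assumption, the largest D an item can reach is 2 × min(p, 1 − p), which peaks at p = 0.50. The function name is hypothetical.

```python
def max_discrimination(p):
    """Largest possible D for an item with difficulty p.

    Assumes upper/lower halves, so p = (p_upper + p_lower) / 2
    and D = p_upper - p_lower.
    """
    return 2 * min(p, 1 - p)


ceilings = {p: max_discrimination(p) for p in (0.25, 0.5, 0.75)}
print(ceilings)  # {0.25: 0.5, 0.5: 1.0, 0.75: 0.5}
```

Very easy or very hard items cannot discriminate well, no matter how well written they are.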
Standardizing Program
The research program that yields the norms for the test
Equating Programs
Making sure scores on different forms or levels of a test are comparable to one another
Publishing Test Materials
Technical Manuals, Score Reports, Supplementary Materials, Test Completed, Administrator Training
Continuing Research on New Tests
Updating norms and investigating the test’s applicability to new groups and uses
Two Classical Theories of Intelligence
Spearman’s g, and Thurstone’s primary mental abilities
Spearman’s Theory of Intelligence
g is general intelligence, the factor common to all mental tests. The s factors are specific to each individual test or subtest.
Two-factor theory (g and s)
Thurstone
Primary Mental Abilities theory of intelligence
Primary Mental Abilities Theory of Intelligence
Thurstone’s, originally 9 mental abilities, a multiple-factor theory
The Original Nine Primary Mental Abilities
Spatial, Perceptual (speed of perception), Numerical, Verbal, Memory, Words (word fluency), Induction (finding a rule or principle to solve a problem), Reasoning (arithmetic), Deduction (a weakly defined factor calling for application of a rule)
Hierarchical Model
A compromise between the one-factor and many-factor views: abilities are arranged in levels, with broad general abilities above narrower, more specific ones
Cattell
Fluid and Crystallized Intelligence
Hierarchical Characteristics
Based on complex factor analysis; separate abilities arranged in levels, some more general than others
One Vs. Many
argument of intelligences, Spearman says 1, Thurstone says many
Gc
Crystallized intelligence (Cattell): the sum of everything one has learned, including mental skills, education, relationships, etc.
Gf
General fluid intelligence is the raw mental power, potential for intelligence
Additional Factors for Cattell & Horn’s Model
short and long term memory, visual and auditory skills, processing speed on easy tasks, decision speed (problem solving tasks) and quantitative reasoning
Vernon’s Model
Hierarchy with g at the top, split into two major group factors, v:ed (verbal:educational) and k:m (spatial:mechanical); other, more specific skills then cluster under these (numbers, psychomotor, reading)
Carroll’s Summary
Three-stratum theory
Three Stratum Theory
g at the top; at the second level, broad abilities such as Gc and Gf (as well as others, some like Thurstone’s); at the third level, more specific abilities
Developmental Theories
1) stages, 2) stages happen in the same order for all people (if not the same time), 3) stages are cumulative and not reversible
Piaget Theory of Cognitive development
4 stages
Sensorimotor
Experiences the world through sensory input and motor responses; no object permanence at first (it develops during this stage)
birth-2 yrs
Preoperational
use words to symbolize, lacks principles of conservation
2-6 yrs
Concrete Operational
Uses principles of conservation and reversibility
7-12 yrs
Formal Operational
Mature Adult thinking in terms of hypotheses, cause and effect
12+ yrs
Information Processing Model
Theory of intelligence that focuses on how people process information, by analogy with computer processing
Biological Models
brain functioning, as the basis for understanding human intelligence
Assimilation
Fitting new information into your existing schemas, e.g., to a young child all four-legged animals are dogs
Accommodation
Changing your schemas to fit reality, e.g., learning that horses aren’t dogs
Howard Gardner
Theory of Multiple Intelligences, at least 8 intelligences
Gardner’s 8 Multiple Intelligences
Spatial, Linguistic, Logical-mathematical, Bodily-kinesthetic, Musical, Interpersonal, Intrapersonal, Naturalistic
3 Things to Remember about Group Differences
1) Distributions mostly overlap, even if averages are slightly different, 2) a difference doesn’t tell us why it exists, 3) differences are always changing, and may not last forever
Differences in Intelligence by Sex
Minimal in terms of total scores; some differences in verbal and spatial skills. Males are more variable: more males perform very high or very low
Group Age differences
steep increase: 0-12
Maximum: 16-20
Level: 25-60
Period of Decline: 60+