Chapter 13: Testing in Schools Flashcards
What are three things that preschool assessments determine?
Readiness of the child to enter school
Identification/diagnosis of conditions that may present special educational challenges
Assessment of the child’s abilities
What are the three objectives of preschool assessments?
- Screening of children at risk
- Diagnostic assessment to determine the presence or absence of a particular condition, often for the purpose of establishing eligibility for placement in special programs, as well as to formulate intervention and treatment recommendations
- Program evaluation – where the test results are used to document and evaluate specific programs
Review brief history
Public Law (PL) 94-142 – mandated the professional evaluation of children age 3 and older suspected of having physical or mental disabilities in order to determine their special educational needs (mid-1970s)
PL 99-457 – obligation extended downward to birth (1986)
* Starting with the 1990-1991 school year, all children with disabilities ages 3 to 5 were to be provided with a free, appropriate education.
PL 105-17 – gave greater attention to diversity issues (1997)
* Infants and toddlers with disabilities must receive services in the home or other natural settings, with services continuing in preschool programs
In 1999, ADHD was officially listed under “Other Health Impaired” as a disabling condition that can qualify a child for special services
What are some issues with testing preschoolers?
- Language and conceptual skills emerge but are not advanced enough to be assessed using traditional tests.
- The attention span of a preschooler is short.
- Motivation in the child may vary from one test session to the next.
Curriculum Based Assessment
- Observe and record a student’s performance on a set of activities
- There are several different ways to accomplish this
- Some methods take a general approach
o Determine the learning components of a learning construct
o Select a wide variety of tasks or items to assess the learning components
o Example:
Spelling is the learning construct
Select words from the list of words students are expected to learn during the course of the school year
Assess how well the student has mastered the skill
- Some methods take a more specific approach
o Determine whether a student has attained proficiency with one particular aspect of the curriculum
o Break down the global learning outcomes into a set of specific subskills
o Example:
Spelling is the learning construct
Select the specific skill of words ending with a silent e
Ask the students to spell words that fall within the specific skill
Assess how well the student has mastered the specific skill (a scoring sketch follows this card)
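A minimal sketch of the specific approach in code, assuming a hypothetical silent-e word list and a hypothetical 80% mastery threshold (neither comes from the chapter):

```python
# Minimal sketch of a curriculum-based probe for one specific subskill.
# The word list and the 80% mastery threshold are hypothetical examples.

SILENT_E_WORDS = ["cake", "bike", "hope", "cute", "plane", "smile", "stone", "tube"]
MASTERY_THRESHOLD = 0.80  # proportion correct needed to count the subskill as mastered


def score_spelling_probe(responses):
    """Compare a student's spellings against the target words.

    `responses` maps each target word to the student's attempt.
    Returns the proportion correct and whether it meets the mastery threshold.
    """
    correct = sum(
        1 for word in SILENT_E_WORDS
        if responses.get(word, "").strip().lower() == word
    )
    proportion = correct / len(SILENT_E_WORDS)
    return proportion, proportion >= MASTERY_THRESHOLD


# Example: a student who spells 6 of the 8 words correctly
attempts = {"cake": "cake", "bike": "bike", "hope": "hop", "cute": "cute",
            "plane": "plane", "smile": "smile", "stone": "stone", "tube": "tub"}
print(score_spelling_probe(attempts))  # (0.75, False)
```

The general approach would work the same way, except the probe would sample words from across the year's full spelling list rather than from one subskill.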
5 Assessment Approaches
- Interview parents and teachers (most widely used)
- Behavioral observation (most valuable)
- Rating scales completed by the parent and/or teacher (quick and inexpensive)
- Projective techniques (limited use with young children)
- Traditional tests normed on children of the same age
Psychometric tests
- Should be used for classification and placement decisions, but not to determine a child’s level of cognitive development and ability
Piagetian-Based Scales
- Measure a child’s cognitive level in accord with Piaget’s stages of development
Comprehensive Developmental Assessment Tools
- Checklists based on normal child development
Process-Oriented Assessment Approaches
- Assume that identification of cognitive strategies is necessary to understand cognitive performance
Other Issues
- How equivalent are the instruments?
- Reliability is low
- Need other test forms for children with special requirements
- Assess “readiness” in terms of social and emotional skills
California Achievement Tests (CAT)
Assessment in the Primary Grades
- Determine which specific skills students have or have not mastered
- Compare students’ performance with that of a national sample
Interrelationship of subtests
* High intercorrelations between subtests (.50 - .80)
Reliability is satisfactory
Validation work has focused only on content validity
Fall and Spring norms are provided
California Achievement Tests (CAT): Locator Tests
Use of locator tests – the student first takes a short, 20-item set of vocabulary and math items to estimate the level at which the student is performing; the full test at that level can then be administered
Scores are comparable across grades
To minimize boredom or discouragement, the child’s performance on the locator test is used as a guideline for which level of the full battery to administer (a routing sketch follows this card)
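A minimal routing sketch for the locator idea; the cut points below are placeholders for illustration, not the CAT’s actual rules:

```python
# Hypothetical routing rule: the CAT's actual level boundaries and cut points
# are not reproduced here; these numbers are placeholders for illustration.

def choose_test_level(locator_score, max_score=20):
    """Map a 20-item locator score to a suggested level of the full battery."""
    proportion = locator_score / max_score
    if proportion < 0.35:
        return "administer a lower level"
    if proportion < 0.75:
        return "administer the on-grade level"
    return "administer a higher level"


print(choose_test_level(8))   # administer the on-grade level
```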
California Achievement Tests (CAT): Lake Wobegon Effect
Schools report that their students are scoring above average
Schools are comparing results with old norms
Scores rise relative to the old norms over time because students learn more
HIGH SCHOOL: Social Competence
- Development of the scale
- Reliability was satisfactory
- Validity: scores were compared with popularity ratings
Cavell and Kelley (1994) developed a self-report measure of social competence for adolescents
Students described situations that did not go well, yielding 157 problem situations
Rated the situations on a 5-point Likert scale for frequency and difficulty
Factor analysis yielded 7 factors (example situations in parentheses):
1. Keep friends (a friend shares your secret)
2. Problem Behavior (want to drink alcohol)
3. Siblings (embarrassed you)
4. School (mean teachers)
5. Parents (nosy)
6. Work (dislike but need it)
7. Make Friends (peers dislike)
Each factor is scored on frequency and difficulty (a scoring sketch follows this card)
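A minimal scoring sketch, assuming each problem situation is keyed to one of the seven factors and factor scores are simple means of the 1-5 ratings; the item-to-factor key shown is hypothetical, not Cavell and Kelley’s actual key:

```python
# Sketch of scoring the seven factors on frequency and difficulty.
# Item-to-factor assignments and the use of simple means are assumptions;
# Cavell and Kelley's (1994) actual scoring key is not reproduced here.

from statistics import mean

# item_id -> factor label (hypothetical assignments)
ITEM_FACTORS = {1: "Keep friends", 2: "Keep friends", 3: "Problem Behavior", 4: "School"}


def score_factors(ratings):
    """`ratings` maps item_id -> (frequency, difficulty), each on a 1-5 Likert scale."""
    scores = {}
    for factor in set(ITEM_FACTORS.values()):
        items = [i for i, f in ITEM_FACTORS.items() if f == factor and i in ratings]
        if items:
            scores[factor] = {
                "frequency": mean(ratings[i][0] for i in items),
                "difficulty": mean(ratings[i][1] for i in items),
            }
    return scores


print(score_factors({1: (4, 3), 2: (2, 5), 3: (1, 1)}))
```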
Tests of General Educational Development (GED): five types of tests
High school equivalency test
Five tests
1. Writing skills
2. Social studies
3. Science
4. Reading skills
5. Mathematics
Tests of General Educational Development (GED)
Reliability and Validity
Reliability is satisfactory
Content validity is built in
Concurrent validity
* The test has also been given to high school students, which indicates that the standard is fairly stringent
* Fairly high intercorrelations between tests
Predictive validity
* Difficult to assess
* Graduates do report increases in pay, acceptance into training programs, and other benefits
National Assessment of Educational Progress (NAEP)
Designed to measure the distribution of proficiencies in national student populations
Wide range of school subjects
Given on a variable timetable
The NAEP is a Congressionally mandated survey of American students’ educational achievement; it was first conducted in 1969, annually through 1980, and biennially since then.
The goal of the NAEP is to estimate educational achievement, and changes in that achievement over time, for American students of specific ages, genders, and demographic characteristics
The NAEP covers a wide range of school subject areas, such as reading, mathematics, writing, science, social studies, music, and computer competence
Essay versus Multiple choice
Looking at AP exams, results showed that scores on the essay questions were less reliable and didn’t correlate very highly with the multiple choice questions
From a psychometric and a practical point of view, multiple-choice items are preferable because:
1. are easy to score by machine
2. do not involve the measurement error created by subjective scoring
3. are more reliable and more amenable to statistical analyses
4. well-written items can assess the more complicated and desirable aspects of cognitive functioning.
The KR-20 reliabilities for the multiple-choice section of the American History exam were .90 and .89
The correlations between the multiple-choice and essay sections were .48 and .53
In other words, scores on the essay sections are less reliable and do not correlate highly with scores on the multiple-choice sections.
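For reference, the KR-20 coefficient mentioned above can be computed from dichotomous item responses as KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / var(total)). A minimal sketch with a made-up response matrix:

```python
# KR-20 from a matrix of dichotomous (0/1) item responses.
# The tiny response matrix below is made up for illustration only.

from statistics import pvariance


def kr20(responses):
    """responses: list of examinee rows, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    # p = proportion answering each item correctly, q = 1 - p
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / len(responses)
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


data = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))
```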
Scholastic Aptitude Test (SAT)
Two content areas
* Verbal (antonyms, analogies, sentence completion, and reading comprehension)
* Quantitative (unidimensional)
Outcome oriented test – total score based on correct answers
Always being revised
Test sophistication
* Items and directions are easy to read
* Teach everyone test taking strategies
Scholastic Aptitude Test (SAT): Gender gap
- Men obtain higher scores than women
- Women obtain higher freshman year college GPAs
- Men typically do better on the quantitative section
- Women typically do better on the verbal section
Scholastic Aptitude Test (SAT): Minority bias
Problem with false negatives
High school GPA is a better predictor by itself than SAT scores
Use a regression equation developed for Mexican-Americans to assess predictive validity (a sketch follows this card)
Is the SAT redundant with HS GPA or HS rank?
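One common way to examine this kind of predictive bias is differential prediction: fit a common regression of freshman GPA on SAT score and check whether it systematically over- or under-predicts a subgroup’s grades. A minimal sketch with fabricated numbers (not data from any cited study):

```python
# Differential-prediction check: fit a common regression of freshman GPA on
# SAT total score, then compare a subgroup's actual GPAs with the predictions.
# All numbers below are fabricated purely for illustration.

import numpy as np

sat = np.array([900, 1000, 1100, 1200, 1300, 1400])
gpa = np.array([2.1, 2.4, 2.7, 2.9, 3.2, 3.4])

slope, intercept = np.polyfit(sat, gpa, 1)  # common prediction equation

# Hypothetical subgroup data
subgroup_sat = np.array([950, 1050, 1150])
subgroup_gpa = np.array([2.5, 2.8, 3.0])

predicted = intercept + slope * subgroup_sat
mean_residual = float(np.mean(subgroup_gpa - predicted))

# A positive mean residual means the common equation under-predicts the
# subgroup's grades (consistent with the "false negatives" concern above).
print(round(mean_residual, 2))
```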
Scholastic Aptitude Test (SAT)
Coaching – can impact the validity of a test if it changes scores
Studies have not found much evidence that coaching substantially changes scores
Criterion problem – is first-year grades what we want to predict?
Reliability is satisfactory
Validity generalization may be used
Is SAT score a measure of family income?
Is the SAT fair?
Graduate Record Examination
Global measures of verbal, quantitative, and analytical reasoning abilities
Develop new items and add them as trial questions in administered tests
Reliability is satisfactory
Studies report low levels of validity
Subject tests do a better job in some departments
The GRE does an okay job predicting graduate GPA, but undergraduate GPA does a better job
Criterion problem – how do you operationally define graduate school success?
Range restriction
* Some studies have found higher validity coefficients between GRE scores and graduate school performance when the GRE was not used to select the applicants (a range-restriction correction sketch follows this card)
Results have been mixed when using the GRE as a predictor of success in psychology
Is graduate school GPA an appropriate criterion?
We need to define what we are trying to predict
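One standard adjustment for range restriction is Thorndike’s Case 2 correction, which inflates an observed validity coefficient when the selected group’s predictor spread is smaller than the applicant pool’s. Whether the studies summarized above used this exact correction is not stated; the sketch below is illustrative only:

```python
# Thorndike's Case 2 correction for direct range restriction on the predictor:
# adjusts an observed validity coefficient upward when the selected group has
# a smaller spread of GRE scores than the applicant pool.

import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    ratio = sd_unrestricted / sd_restricted
    return (r_restricted * ratio) / math.sqrt(
        1 - r_restricted**2 + (r_restricted**2) * ratio**2
    )


# Example with made-up values: observed r = .20 among admitted students,
# applicant-pool SD = 120, admitted-student SD = 60.
print(round(correct_for_range_restriction(0.20, 120, 60), 2))
```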
Tests for Licensure and Certification
Licensure – government gives permission to an individual to engage in an occupation
* In order to obtain a license, an individual must meet a minimal level of competency
* Usually there are rules about what a licensed practitioner may do
Certification – an individual meets the qualifications set by a credentialing agency
* May use a designated title
Tests can be developed nationally or locally
Formats
* Multiple choice
* Work samples
Purpose
* To protect the public’s welfare and safety
* To assess a minimal level of competency
Validity
* Face validity is usually built in
* Criterion validity is more difficult because there is a diverse set of criteria
Tests for Licensure and Certification: Cutoff scores
- Have to determine what the minimal level of competency is
- Should be consistent with a job analysis
Methods
* Human resources planning approach – takes into account the following information to determine how many applicants are needed:
* Projected personnel needs
* Past history of proportion of offers accepted
* Distribution of applicant test scores
* Based on applicants’ test scores
Criterion-referenced
* Experts provide judgments
* Angoff method – judges estimate, for each item, the probability that a minimally competent examinee would answer it correctly; the sum of these estimates gives the minimum raw score for passing
* Ebel procedure – judges rate the relative importance of each item
* Nedelsky method – judges identify the distractors that a “minimally competent” person would recognize as incorrect
* Contrasted-groups method
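A minimal sketch of how the Angoff and Nedelsky cut scores are typically assembled from expert judgments; the judgment values below are made up:

```python
# Angoff method: each judge estimates, for every item, the probability that a
# minimally competent examinee answers correctly; the cut score is the sum of
# the per-item mean estimates. Judgment values below are made up.

from statistics import mean

# rows = judges, columns = items
angoff_judgments = [
    [0.60, 0.80, 0.45, 0.90],
    [0.55, 0.75, 0.50, 0.85],
    [0.65, 0.70, 0.40, 0.95],
]

item_means = [mean(col) for col in zip(*angoff_judgments)]
cut_score = sum(item_means)  # minimum raw score for passing
print(round(cut_score, 2))

# Nedelsky variant: for each item, the judgment is 1 / (number of options the
# minimally competent examinee cannot rule out as incorrect).
options_remaining = [2, 4, 3, 2]  # after eliminating recognizable distractors
nedelsky_cut = sum(1 / k for k in options_remaining)
print(round(nedelsky_cut, 2))
```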