Week 8: Educational Testing Flashcards
Educational assessment and testing should have at their heart the same purposes as other psychological assessment (e.g., mental health/clinical):
Measure and observe behaviour
Gauge student ability and competencies (fair, objective)
Diagnose (if conducting an assessment)
Guide treatment/educational interventions
Main purposes of Educational Testing
Masters (2011) argued only one fundamental purpose
To establish where learners are in their progress at the time of the assessment
Information can be interpreted and used in a variety of ways:
by reference to the performances of other students nationally or internationally (normative approach)
by reference to achievement expectations or standards (criterion approach)
by reference to past performances (ipsative approach)
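The three interpretations above can be illustrated with a small sketch. The function name, the cohort scores, the standard and the previous score are all hypothetical values chosen for illustration; they do not come from Masters (2011).

```python
from statistics import mean, pstdev

def interpret_score(score, cohort_scores, standard, previous_score):
    """Interpret one score three ways (all inputs are hypothetical)."""
    # Normative: position relative to other students (z-score within the cohort)
    z = (score - mean(cohort_scores)) / pstdev(cohort_scores)
    # Criterion: has the expected standard been reached?
    meets_standard = score >= standard
    # Ipsative: change relative to the student's own past performance
    gain = score - previous_score
    return {"normative_z": z, "criterion_met": meets_standard, "ipsative_gain": gain}

# Hypothetical student: scored 60 in a cohort of three, standard is 65, last result was 52
result = interpret_score(60, [40, 50, 60], standard=65, previous_score=52)
print(result)
```

The same raw score can look strong normatively, fall short of a criterion, and still show ipsative progress, which is why the reference point needs to be stated.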
Uses
Screening
- Wide-scale, mass standardized testing
- Identify individuals needing assistance or diagnostic assessment (e.g., literacy proficiency)
Diagnosis/Service Eligibility
- Formal assessment of strengths and weaknesses of individuals
- Diagnosis: specific learning disorder, intellectual impairment, or giftedness
Program Planning
- Instruction, intervention
USES:
Progress monitoring
Through a lesson, module, course, year, or intervention (is it working?)
Frequent monitoring may be useful (for student, for educator)
USES:
Evaluate outcomes
For individual: after special education, learning assistance, remediation, etc.
At end of course, grade, class (how is my child progressing?)
For whole school, district and/or country after change in curricula, policy
Types
Formative Assessment: achievement during instruction
- Guides further instruction
- Role in fostering motivation and learning
- e.g., questions in class, practice test, take-home exam, assignment
Summative Assessment: achievement after instruction completed
- e.g., formal exam, final grade in course
Considerations in Classroom Testing
Criticisms of one-off, end-of-course examinations
–Designed to judge and compare students on the amount of course content they have learnt
Promote ‘performance’ rather than ‘learning’ culture
“learning” driven more by external pressure for results than by curiosity and intrinsic motivation
Encourage cramming
Educational Assessment versus Testing
Generally a psychological assessment is by referral, to address a specific question (e.g. why is my child struggling at school?)
VS. educational testing, which is more widely adopted. Standardized tests used to gauge student ability/proficiency
More objective than an individual teacher’s assessment of a written task
Provides data for education and policy makers
Educational Assessment versus Testing
Standardized testing is not without its critics (p. 318).
Too much emphasis on test performance (a snapshot of one point in time)
School/teacher concerns can lead to teaching to the test, especially if employment/promotion is tied to test outcomes.
Parents may see it as undue pressure on children; scores may underestimate a child’s ability (e.g., because of test anxiety).
Standardized testing is not without its critics (p. 318).
principals and school data
Principals can be forced to account when their school is underperforming (as assessed through standardized tests)
In Australia, school data is made publicly available (ranking), so an underperforming school will be seen as less attractive (enrolment drops, and good students head elsewhere)
Assessment for policy decisions: Large-scale International Testing
Standardized assessment can also be used to inform policy decisions about curriculum emphasis, teaching methods, and funding decisions.
When we want to know how our students are performing, we can look to large-scale international student assessments
For younger students, we have the Trends in International Mathematics and Science Study (TIMSS), for Grade 4 and Grade 8 students, and its equivalent for reading, the Progress in International Reading Literacy Study (PIRLS)
Nearing the endpoint of education, we have the Programme for International Student Assessment (PISA), which assesses 15-year-olds
Large-scale International Testing
Programme for International Student Assessment (PISA)
Conducted every 3 years since 2000, with 15-year-olds (near the end of compulsory schooling)
All OECD nations participate in PISA, but in recent waves an increasing number of “partner” nations are included
Including China, Hong Kong, Taiwan, Indonesia, Thailand from Asia
Including Russia; UAE and Qatar from the Middle East; and Colombia, Brazil, Peru, Chile from South America
In 2015, over 540,000 students drawn from 72 different countries
Each country must sample at least 5,000 students.
https://www.youtube.com/watch?v=q1I9tuScLUA#t=34
Programme for International Student Assessment (PISA)
Under PISA, students are assessed in three core areas
Reading literacy
Mathematical literacy
Scientific literacy
Goes far beyond other standardized tests to measure deep learning and problem solving skills.
For example, doesn’t just test basic reading skills but also comprehension, reasoning skills, analysis of texts.
Mathematics and science tests are problem-solving focused, rather than rote memory of facts/equations.
How are students tested?
Students sit a 2-hour test of reading, maths and science
This is a computerized test, with a mixture of multiple choice (selected response) and short answer (constructed response) questions.
Uses computerised adaptive testing to reduce testing time and allow for broader content
Scoring for the original PISA 2000 was normed so that reading, mathematics and science would have a mean of 500, and a SD of 100.
Similar to an IQ test, where scores are transformed to have a mean of 100 and an SD of 15
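Both metrics are linear transformations of a z-score, so a position on one scale can be re-expressed on the other. The sketch below is illustrative only: the function name `rescale` and the example scores are assumptions, and lining up the two metrics says nothing about whether PISA and IQ tests measure the same construct.

```python
def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Re-express a standard score on another scale via its z-score."""
    z = (score - from_mean) / from_sd   # distance from the mean in SD units
    return to_mean + z * to_sd          # same z-score on the target metric

# A PISA-style score of 600 is 1 SD above the mean (z = 1.0),
# which corresponds to 115 on an IQ-style metric (mean 100, SD 15).
print(rescale(600, from_mean=500, from_sd=100, to_mean=100, to_sd=15))
```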
Programme for International Student Assessment (PISA)
How do Australian students perform?
reading: Significantly higher than OECD average, but declining since 2003
math: Started higher than OECD average, but declining since 2003
science: Significantly higher than OECD average, with a slight decline that is now increasing
Results in context: OECD average has been steadily declining (so a general trend). For example OECD math is now 493, so we are still (just) above average! Non-OECD countries steadily rising (so our relative ranking is dropping compared to Asian neighbours – Singapore 564, China 544, Japan 532 for mathematics)
Programme for International Student Assessment (PISA)
What are the practical implications of studies like PISA?
PISA analyses help shape public policy on educational practices and funding for schools.
In Australia, we’ve seen a greater emphasis placed on science, technology, engineering and mathematics (STEM)
e.g. considering adding computer programming to curriculum, raising proficiency standard of maths and science, and hiring of specialists.
Every time a PISA wave is released, it puts pressure on state and federal education ministers to increase funding
Large-Scale National Testing
National Assessment Program – Literacy and Numeracy (NAPLAN)
Annually for all children in Years 3, 5, 7, and 9
Assessment of reading, writing, language conventions (spelling, grammar, punctuation), and numeracy
Does not test curriculum content
Tests skills in literacy and numeracy that are developed over time through the school curriculum
http://www.nap.edu.au/naplan-understanding-scale.html
Formative or summative assessment?
What kind of norming is this?
NAPLAN Sample Items
see slide 20
NAPLAN Online Testing
https://www.nap.edu.au/online-assessment
From 2018, NAPLAN is delivered online and is a “tailored test”, i.e., using computerised adaptive testing (CAT)
Tailored testing: https://youtu.be/oGFseJAM3Ew
Computerised Adaptive Testing (CAT)
responses are?
next item offered does what?
traditional tests do vs CAT?
reduces testing time how?
Responses continuously monitored, estimate of trait / characteristic continually refined
Next item offered tailored to give maximal additional information; i.e., maximises discrimination
Traditional graded tests have items from very easy to very hard that everyone takes in the same order until they get too many in a row wrong
Reduces testing time by adapting to each individual and delivering appropriately graded items
See Chapter 8 pp. 243-246 for further information
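The adaptive loop described above (monitor responses, refine the trait estimate, offer the most informative next item) can be sketched as follows. This is a minimal illustration under stated assumptions, not how NAPLAN or PISA implement tailored testing: it assumes a Rasch (one-parameter) model, a tiny hypothetical item bank described only by difficulty values, and an expected a posteriori (EAP) estimate over a coarse grid with a standard normal prior.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response (ability theta, item difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate abilities from -4 to 4

def estimate_theta(responses):
    """EAP ability estimate; responses is a list of (difficulty, correct) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised standard normal prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

def run_cat(item_bank, answer_fn, n_items=5):
    """Administer up to n_items adaptively; answer_fn(b) returns True or False."""
    remaining = list(item_bank)
    responses = []
    theta = 0.0  # start at the prior mean
    for _ in range(min(n_items, len(remaining))):
        # Offer the unused item that is most informative at the current estimate
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        responses.append((b, answer_fn(b)))
        theta = estimate_theta(responses)  # refine the estimate after each response
    return theta, [b for b, _ in responses]

# Hypothetical test-taker who answers correctly whenever difficulty is 1 or below
theta, administered = run_cat([-2, -1, 0, 1, 2], lambda b: b <= 1, n_items=3)
print(administered, round(theta, 2))
```

Under the Rasch model an item is most informative when its difficulty is close to the current ability estimate, so the loop skips items that would be far too easy or far too hard for this test-taker.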
Challenges of CAT NAPLAN
Relies on all items being unidimensional, i.e., same underlying construct
Different test-takers get different items tailored to their level
Requires 100s of items initially tested on 1000s of examinees to determine the item characteristic curves (pp. 253-255)
–i.e., Item Response Theory (IRT) approach (pp. 166-7)
–Makes it initially expensive and time-consuming to develop
–A 3–4 year period was spent converting and trialling NAPLAN for online delivery
Benefits of NAPLAN CAT
More precise measurement of student ability
–Greater differentiation by using a wider range of question difficulty, without adding to the length of the test for each student
Greater test-taker engagement
- Less frustration at lower ability end
- Less boredom at higher ability end
Potential to reduce anxiety as challenge better tailored to ability
Having a larger initial item pool means a wider range of aspects of the curriculum can be tested
Assessment for Credentialing Example: National Psychology Exam
Introduced from July 2014
Pass required for registration
3.5 hours with 150 MCQs
see slide 25
“Assessment” Domain of National Psychology Exam
Understanding of issues in test selection, use, interpretation, acceptability and appropriateness
-including test reliability, validity, utility, and standardisation
Ability to administer, score, interpret and write reports using current editions of psychometric tests (using relevant Australian norms where available).
“Assessment” Domain of National Psychology Exam
Competent in administration, scoring, & interpretation of:
1. WAIS-IV (Wechsler Adult Intelligence Scale)
2. WISC-IV (Wechsler Intelligence Scale for Children)*
3. PAI, 2007 (Personality Assessment Inventory)
4. DASS (Depression, Anxiety and Stress Scale)
5. K-10 (Kessler-10)
6. SDQ (Strengths and Difficulties Questionnaire)
*Note: a new edition (WISC-V) is out, but the list has not yet been updated
Terminology: Aptitude versus Achievement
Aptitude: assessment of future learning potential
Specific aptitude tests:
E.g., Differential Aptitude Tests
–Used for career guidance and school-to-career transition
Specific vocational normative samples for individual subtests
E.g., GAMSAT, UMAT used for selection for medicine and related degrees (e.g., dentistry)
Aptitude tests tend to focus on informal learning or life experience (as opposed to achievement tests which focus on learning acquired through formal instruction)
General Aptitude Tests
General aptitude tests = intelligence tests
- Tap fluid abilities more than crystallised
- Used to assess intellectual impairment (II) & giftedness
- Diagnosis of II also requires significant interference with adaptive functioning
Achievement Tests
Assessment of past learning
- Taps crystallised abilities more than fluid
- Used to assess and diagnose learning disorders
- Often combined with aptitude (e.g., IQ) assessment using Patterns of Strengths and Weaknesses (PSW) assessment
Common Aptitude Tests
Intelligence tests covered last week
WPPSI
WISC
Stanford-Binet
Both Aptitude and Achievement: Woodcock-Johnson-IV
- Cognitive Abilities (Aptitude)
- Achievement (Achievement)
- Oral Language (useful for reading assessments)
Woodcock Johnson- IV Cognitive Abilities (WJ-IV COG)
More closely aligned with C-H-C model than Wechsler tests
Provides scores on broad stratum abilities, based on subtest scores of underlying narrow abilities
Provides more comprehensive assessment than WISC-V
10 subtests in standard battery, 18 subtests in extended battery
Approx. 5 mins per subtest (approx. 60 mins for standard battery)
Scoring and norms:
M = 100, SD = 15
US norms: 2 – 99 years (not all subtests start at 2 years)
Australian norms: 5 – 99 years
General Achievement Tests
Used for assessment of learning disabilities
To be useful need
–Good psychometric properties
  –Reliabilities > .90 for individual diagnosis
Alignment with the DSM-5 (or ICD-10) diagnostic criteria
Alignment with theoretical models of achievement in those domains, e.g., theories of reading and reading development, theories of reading disability
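One way to see why reliabilities above .90 matter for individual diagnosis is the classical test theory standard error of measurement, SEM = SD × √(1 − reliability). The flashcards do not state this formula; it is brought in here as a standard result, and the function names and example score are assumptions for illustration.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z = 1.96 by default)."""
    half_width = z * sem(sd, reliability)
    return (score - half_width, score + half_width)

# On a mean-100, SD-15 metric, compare reliabilities of .90 and .80
for r in (0.90, 0.80):
    low, high = confidence_interval(85, sd=15, reliability=r)
    print(r, round(sem(15, r), 2), round(low, 1), round(high, 1))
```

With a reliability of .90 the band around an observed score is already roughly ±9 points on this metric, and it widens as reliability drops, which makes individual cut-off decisions less defensible.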
Norms and Achievement Tests
Ideal if co-normed with general aptitude tests
Wechsler Individual Achievement Test (WIAT-III)
-Co-normed with WISC-V
Woodcock Johnson IV- Test of Achievement (WJ-IV-ACH)
–Co-normed with WJ-IV-COG
Both WIAT-III and WJ-IV-Ach have Australian norms from 5 through to adulthood
Woodcock Johnson IV- Test of Achievement
11 Subtests in standard battery across three domains (Reading, Mathematics & Writing)
20 Subtests in extended battery
A separate WJ-IV test covers oral language (e.g., picture vocabulary)
Can provide meaningful information to assist with diagnosis, eligibility for services, placement, and intervention decisions
For university students/adults: used to better understand their achievement levels; can form part of an assessment program for students with learning difficulties
Wechsler Individual Achievement Test- WIAT-III
16 Subtests across four domains (Reading, Mathematics, Written Language & Oral Language)
Total achievement
Oral Language
Reading: total reading, basic reading, reading comprehension and fluency
Written Expression
Mathematics: Mathematics & maths fluency
WIAT-III can provide meaningful information to assist with diagnosis, eligibility for services, placement, and intervention decisions
For university students/adults: used to better understand their achievement levels; can form part of an assessment program for students with learning difficulties
WIAT-III Subtests and Composites
Reading:
Word Reading
* assess pre-reading (phonological awareness) and decoding skills
Pseudoword Decoding
*assess the ability to apply phonetic decoding skills
Reading Comprehension
*understanding of what is read
Oral Reading Fluency
* fluency and prosody
WIAT-III Subtests and Composites
Mathematics
Numerical Operations
*evaluate the ability to identify and write numbers
Problem Solving
*assess the ability to reason mathematically
Fluency
* fluency in solving addition, subtraction, and multiplication
WIAT-III Subtests and Composites
Written Language
Spelling
* evaluate the ability to spell
Alphabet Writing Fluency
* ability to write letters of the alphabet
Sentence Composition
* Measure the examinee’s writing skills
Essay Composition
* Measure the examinee’s writing skills at all levels of language
WIAT-III Subtests and Composites
Oral Language
Listening Comprehension
* measure the ability to listen to details
Oral Expression
* reflect a broad range of oral language activities
Diagnosis & Verification
Important purpose of educational assessment
Entry to gifted and talented programs, extension
Entry to schools for specific needs
E.g., Special Education (ID), Glenleighden (DLD), Sycamore School (ASD)
Access to additional supports
Verification for Educational Adjustment Program (http://education.qld.gov.au/students/disabilities/adjustment/verification/index.html)
Intellectual impairment
Educational accommodations, adjustments, accelerations in mainstream settings
What is a gifted student?
as per:
http://education.qld.gov.au/parents/school-life/support-services/gifted.html
Gifted students are those whose potential is distinctly above average in one or more of the following domains of human ability: intellectual, creative, social and physical. Giftedness designates the possession and the use of outstanding natural abilities, called aptitudes, in at least one ability domain, to a degree that places an individual at least among the top 10% of age peers in the school.
What is a talented student?
Talented students are those whose skills are above average in one or more areas of performance. Talent designates the outstanding mastery of abilities over a significant period of time. These are called competencies (knowledge and skills). Outstanding mastery is evident in at least one field of human activity to a degree that places an individual at least among the top 10% of age peers in the school who are or have been active in that field.
Identification of Gifted & Talented
http://education.qld.gov.au/curriculum/framework/p-12/docs/supporting-info-gifted-talented.pdf
- School-based screening & assessment
  - screening tests
  - standardised tests
  - teacher created tests
  - NAPLAN
- Checklists
  - Completed by parents, teachers, peers and the students themselves (e.g., Sayler’s checklist of characteristics)
- Achievement Tests
  - Standardised tests as in school-based screening (Step 1) but at a level above the current grade of the student
  - Achievement tests: e.g., WIAT, WJ-IV
- Aptitude Tests
  - IQ or cognitive assessment to provide information on a student’s potential to perform well academically
  - establish level of giftedness and talent for appropriate provision
  - determine suitability for accelerated or special placement.
Intellectual Impairment (DSM-5)
A. Deficits in intellectual functions, such as reasoning, problem solving, planning, abstract thinking, judgment, academic learning, and learning from experience, confirmed by both clinical assessment and individualised, standardised intelligence testing
Usually use two SDs below the mean as a cut-off (i.e. 70 on typical IQ tests)
B. Deficits in adaptive functioning … across multiple environments, such as home, school, work, and community.
Assessment thus requires assessment of intellectual functioning (e.g., WISC-V) and adaptive behaviour across contexts (e.g., parent/teacher questionnaire measures such as VABS, ABAS)
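The two-SD cut-off mentioned under criterion A is simple arithmetic on the test metric. The sketch below assumes normally distributed scores on a mean-100, SD-15 scale; the function names are illustrative, and a score below the cut-off is not a diagnosis on its own, since criterion B (adaptive functioning) must also be met.

```python
from statistics import NormalDist

def cutoff_score(mean, sd, n_sds_below=2.0):
    """Score lying n_sds_below standard deviations under the mean."""
    return mean - n_sds_below * sd

def proportion_below(score, mean, sd):
    """Expected proportion of a normal population scoring below `score`."""
    return NormalDist(mu=mean, sigma=sd).cdf(score)

cut = cutoff_score(100, 15)                  # 100 - 2 * 15 = 70
share = proportion_below(cut, 100, 15)       # a little over 2% of the population
print(cut, round(share * 100, 2))
```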
Specific Learning Disorder (DSM-5)
Persistent difficulties with learning key academic skills e.g. reading, spelling, mathematics
Difficulties are substantially and quantifiably below those expected for the individual’s chronological age
Difficulties are not better accounted for by intellectual disabilities
Not associated with normal developmental milestones and brain maturation
Onset generally within the years of formal schooling, and disrupts the normal pattern of acquiring skills
Difficulties are not transitory
Not associated with lack of opportunity or inadequate instruction
Persistence of > 6 months despite evidence-based intervention
DSM-5 Specific Descriptors for Learning Disorders
Reading:
- Inaccurate/ slow & effortful word reading
- Difficulty understanding the meaning of what is read
- Poor spelling
Written Expression:
- Poor written expression (grammatical or punctuation errors, ideas lack clarity, poor paragraph organization, or excessively poor handwriting)
Mathematics:
- Difficulties remembering number facts
- Inaccurate/ slow arithmetic calculation
- Ineffective/inaccurate mathematical reasoning
Assessment for Learning Disorder
Structured interview
Informant interviews (e.g., teacher)
Cognitive Assessment (e.g., WISC-V) to assess whether difficulties are accounted for by intellectual impairment or low cognitive ability (e.g., bottom 10% although not ID)
Achievement Test of specific areas of difficulty, e.g.:
Wechsler Individual Achievement Test (WIAT-III): co-normed with WISC-V
Woodcock Johnson-IV Tests of Achievement: specific tests for reading, maths, comprehension and expression (co-normed with WJ-IV Cognitive Abilities)
Intervention Issues
Need to understand how strengths and weaknesses may manifest in the classroom
Draw upon student’s normative or personal strengths to compensate for weaker areas
Weakness may be relative to other areas (personal weakness) or the population (normative weakness)
Modifications: changes in age-appropriate grade level expectations
Accommodations: special teaching and classroom assessment strategies, human supports or individualised equipment, required to enable a student to learn or demonstrate learning.
Accommodations
Instructional: Adjustments to teaching strategies required to enable the student to learn and to progress through the curriculum
E.g., use of voice-to-text software
Environmental: Changes or supports in the physical environment of the classroom or school, or both.
E.g., Quiet study area
Assessment: adjustments to assessment activities to enable student to demonstrate learning
E.g., Extra time
Counselling and Guiding Students: e.g., Assessment of Vocational Interests
Used by career counsellors, guidance officers, psychologists
Holland (1992) hypothesises that interests are more an expression of personality than abilities
And that work environments also have “personalities”
Aim to find a good person-environment fit
Hypothesised good fit = high job satisfaction
Model of Vocational Interests
Holland developed a hexagonal model of 6 related “ideal” types
Realistic, Investigative, Artistic, Social, Enterprising, Conventional
Understand the personality of the person and the job based on their profile on these 6 types
The distance between types indicates how theoretically similar they are
Research supports either a circular or a hexagonal structure for vocational interests
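The idea that distance on the hexagon reflects similarity can be made concrete by counting steps around the R-I-A-S-E-C ordering. This is a simple sketch of the geometric idea, with a hypothetical function name, and not a scoring procedure from any published inventory.

```python
RIASEC = "RIASEC"  # order of the six types around Holland's hexagon

def hexagon_distance(type_a, type_b):
    """Steps between two types around the hexagon (0 = same, 3 = opposite)."""
    i, j = RIASEC.index(type_a), RIASEC.index(type_b)
    steps = abs(i - j)
    # Going the other way around the hexagon may be shorter
    return min(steps, len(RIASEC) - steps)

print(hexagon_distance("R", "I"))  # adjacent types: most similar
print(hexagon_distance("R", "S"))  # opposite types: least similar
```

Adjacent types (e.g., Realistic and Investigative, or Realistic and Conventional) sit one step apart, while opposite types (e.g., Realistic and Social) sit three steps apart.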
RIASEC Types
Realistic: tend to be materialistic, value tangible assets
- Occupations like trades, business owner, farming
- Around 50% of occupations
Investigative: like analysing and solving problems and abstract concepts; do not like business activities
- Occupations like STEM
Artistic: value creativity, nonconformist, don’t like routine
- Occupations in fashion, media
Social: like interacting with others, strong sense of ethics and social responsibility, impractical and don’t like manual labour
- Occupations like teaching, counselling, helping professions
Enterprising: strong business orientation, like to organise & persuade others, value political & economic power, don’t like abstract ideas
- Occupations like government and industry leadership
Conventional: like routine and structure, dislike ambiguity & vagary
- Occupations like accountants, secretaries, clerks
Self-Directed Search (SDS)
Holland developed the SDS to assess a person’s profile on the RIASEC types
- Clients indicate which occupations they are interested in
- Also asks about occupational daydreams (ideal occupations) and perceived competencies and abilities
- Comes with work environment profiles for occupations to allow person-environment matching
Strong Vocational Interest Inventory (SVII)
Holland developed the SDS based on his theory
Strong used empirical approach to develop SVII
-He obtained interest statements from people in various occupations
-RIASEC forms the most abstract level of scoring
-25 basic interests
-211 Occupational Scales
-Considered the best measure of RIASEC types
Example 1
Jane is a 6-year-old girl. Her school has recently queried whether she may be gifted, but has observed difficulties with spelling and anxiety
What tests or assessments would I consider for Jane?
Intellectual functioning:
-WISC
Academic achievement: especially spelling (e.g., WIAT subscale)
Anxiety: e.g., teacher and/or parent checklist (e.g., Spence Anxiety Scale, BASC-III)
Diagnosis of an intellectual impairment typically includes tests of:
A) Aptitude and achievement
B) Aptitude and adaptive functioning
C) Achievement and adaptive functioning
D) Achievement and NAPLAN comparison
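B) Aptitude and adaptive functioning (consistent with the DSM-5 criteria: standardised intelligence testing plus deficits in adaptive functioning)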
The Woodcock-Johnson-IV is based MOST on which theory of cognitive abilities?
a) Cattell-Horn
b) Gardner
c) Woodcock and Johnson
d) Cattell-Horn-Carroll (CHC)
D) Cattell-Horn-Carroll (CHC): the WJ-IV COG is more closely aligned with the C-H-C model than the Wechsler tests