week 8 educational testing Flashcards
Educational assessment and testing should have at their heart the same purposes as other psychological assessment (e.g., mental health/clinical):
Measure and observe behaviour
Gauge student ability and competencies (fair, objective)
Diagnose (if conducting an assessment)
Guide treatment/educational interventions
Main purposes of Educational Testing
Masters (2011) argued only one fundamental purpose
To establish where learners are in their progress at the time of the assessment
Information can be interpreted and used in a variety of ways:
by reference to the performances of other students nationally or internationally (normative approach)
by reference to achievement expectations or standards (criterion approach)
by reference to past performances (ipsative approach)
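The three ways of interpreting the same result can be sketched in a few lines. This is only an illustration: the cohort scores, the standard of 65, and the function names are made up for the example and do not come from Masters (2011).

```python
def normative(score, cohort_scores):
    """Percentile rank: percentage of the cohort scoring at or below this score."""
    at_or_below = sum(s <= score for s in cohort_scores)
    return 100 * at_or_below / len(cohort_scores)

def criterion(score, standard):
    """True if the score meets the expected standard."""
    return score >= standard

def ipsative(score, past_scores):
    """Change relative to the student's own most recent past score."""
    return score - past_scores[-1]

# Hypothetical data for one student and a small comparison cohort
cohort = [40, 55, 60, 62, 70, 75, 80, 85, 90, 95]
student_score = 70

print("Normative (percentile rank):", normative(student_score, cohort))
print("Criterion (meets standard of 65):", criterion(student_score, standard=65))
print("Ipsative (change since last test):", ipsative(student_score, past_scores=[58, 64]))
```

The same score of 70 is "middle of the cohort" normatively, "meets the standard" by criterion, and "improved by 6 points" ipsatively.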
Uses
Screening
–Wide-scale, mass standardized testing
–Identify individuals needing assistance or diagnostic assessment (e.g. literacy proficiency)
Diagnosis/Service Eligibility
–Formal assessment of strengths and weaknesses of individuals
–Diagnosis: specific learning disorder, intellectual impairment, or giftedness
Program Planning
–Instruction, intervention
USES:
Progress monitoring
Through a lesson, module, course, year, or intervention (is it working?)
Frequent monitoring may be useful (for student, for educator)
USES:
Evaluate outcomes
For individual: after special education, learning assistance, remediation, etc.
At the end of a course, grade, or class (how is my child progressing?)
For whole school, district and/or country after change in curricula, policy
Types
Formative Assessment: achievement during instruction
=Guides further instruction
=Role in fostering motivation and learning
=e.g. questions in class, practice test, take-home exam, assignment
Summative Assessment: achievement after instruction completed
=e.g. formal exam, final grade in course
Considerations in Classroom Testing
Criticisms of one-off, end-of-course examinations
–Designed to judge and compare students on the amount of course content they have learnt
Promote ‘performance’ rather than ‘learning’ culture
“learning” driven more by external pressure for results than by curiosity and intrinsic motivation
Encourage cramming
Educational Assessment versus Testing
Generally a psychological assessment is by referral, to address a specific question (e.g. why is my child struggling at school?)
VS. educational testing, which is more widely adopted. Standardized tests used to gauge student ability/proficiency
More objective than an individual teacher’s assessment of a written task
Provides data for education and policy makers
Educational Assessment versus Testing
Standardized testing is not without its critics (p 318).
Too much emphasis on test performance (a snapshot of one point in time).
School/teacher concerns can lead to teaching to the test, especially if employment/promotion may be tied to test outcomes.
Parents may see it as undue pressure on children; results may underestimate a child's ability (e.g. because of test anxiety).
Standardized testing is not without its critics (p 318).
principals and school data
Principals can be forced to account when their school is underperforming (as assessed through standardized tests)
In Australia, school data is made publicly available (ranking), so an underperforming school will be seen as less attractive (enrolment drops, and good students head elsewhere)
Assessment for policy decisions
Large-scale International Testing
Standardized assessment can also be used to inform policy decisions about curriculum emphasis, teaching methods, and funding decisions.
When we want to know how our students are performing, we can look to large-scale international student assessments
For younger students, we have the Trends in International Mathematics and Science Study (TIMSS): Grade 4 and Grade 8 students, and its equivalent for reading, the Progress in International Reading Literacy Study (PIRLS)
Nearing the endpoint of education, we have the Programme for International Student Assessment (PISA): assesses students at age 15
Large-scale International Testing
Programme for International Student Assessment (PISA)
Conducted every 3 years since 2000, with 15-year-olds (near the end of compulsory schooling)
All OECD nations participate in PISA, but in recent waves an increasing number of “partner” nations are included
Including China, Hong Kong, Taiwan, Indonesia, Thailand from Asia
Including Russia; UAE and Qatar from the Middle East; and Colombia, Brazil, Peru, Chile from South America
In 2015, over 540,000 students drawn from 72 different countries
Each country must sample at least 5,000 students.
https://www.youtube.com/watch?v=q1I9tuScLUA#t=34
Programme for International Student Assessment (PISA)
Under PISA, students are assessed in three core areas
Reading literacy
Mathematical literacy
Scientific literacy
Goes far beyond other standardized tests to measure deep learning and problem solving skills.
For example, doesn’t just test basic reading skills but also comprehension, reasoning skills, analysis of texts.
Mathematics and science tests are problem-solving focused, rather than rote memory of facts/equations.
How are students tested?
Students sit a 2-hour test of reading, maths and science
This is a computerized test, with a mixture of multiple choice (selected format) and short answer (constructed response) questions.
Uses computerised adaptive testing to reduce testing time and allow for broader content
Scoring for the original PISA 2000 was normed so that reading, mathematics and science would have a mean of 500, and a SD of 100.
Similar to an IQ test, where scores are transformed to have a mean of 100 and an SD of 15
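Both scales are linear transformations of a standard (z) score, so a position on one scale can be re-expressed on the other. A minimal sketch of that arithmetic, using made-up scores; the function name `rescale` is hypothetical, and the conversion only illustrates how the scales work, not that PISA and IQ tests measure the same thing.

```python
def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Convert a score between two scales via its z-score."""
    z = (score - from_mean) / from_sd   # distance from the mean in SD units
    return to_mean + z * to_sd

# PISA-style scale: mean 500, SD 100. IQ-style scale: mean 100, SD 15.
for pisa_score in (450, 500, 600):      # illustrative scores only
    iq_style = rescale(pisa_score, 500, 100, 100, 15)
    print(pisa_score, "on the PISA scale sits at", iq_style, "on an IQ-style scale")
```

A score one SD above the mean is 600 on the PISA scale and 115 on the IQ-style scale; the relative standing is identical.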
Programme for International Student Assessment (PISA)
How do Australian students perform?
reading: Significantly higher than OECD average, but declining since 2003
math: Started higher than OECD average, but declining since 2003
science: Significantly higher than OECD average, with a slight decline that is now increasing
Results in context: the OECD average has been steadily declining (so this is a general trend). For example, the OECD math average is now 493, so we are still (just) above average. Non-OECD countries are steadily rising, so our relative ranking is dropping compared with Asian neighbours (mathematics: Singapore 564, China 544, Japan 532)
Programme for International Student Assessment (PISA)
What are the practical implications of studies like PISA?
PISA analyses help shape public policy on educational practices and funding for schools.
In Australia, we’ve seen a greater emphasis placed on science, technology, engineering and mathematics (STEM)
e.g. considering adding computer programming to curriculum, raising proficiency standard of maths and science, and hiring of specialists.
Every release of PISA results puts pressure on state and federal education ministers to increase funding
Large-Scale National Testing
National Assessment Program – Literacy and Numeracy (NAPLAN)
Annually for all children in Years 3, 5, 7, and 9
Assessment of reading, writing, language conventions (spelling, grammar, punctuation), and numeracy
Does not test curriculum content
Tests skills in literacy and numeracy that are developed over time through the school curriculum
http://www.nap.edu.au/naplan-understanding-scale.html
Formative or summative assessment?
What kind of norming is this?
NAPLAN Sample Items
see slide 20
NAPLAN Online Testing
https://www.nap.edu.au/online-assessment
From 2018, NAPLAN is delivered online and is a “tailored test”, i.e., using computerised adaptive testing (CAT)
Tailored testing: https://youtu.be/oGFseJAM3Ew
Computerised Adaptive Testing (CAT)
responses are?
next item offered does what?
traditional tests do vs CAT?
reduces testing time how?
Responses continuously monitored, estimate of trait / characteristic continually refined
Next item offered tailored to give maximal additional information; i.e., maximises discrimination
Traditional graded tests have items from very easy to very hard that everyone takes in the same order until they get too many in a row wrong
Reduces testing time by adapting to each individual and delivering appropriately graded items
See Chapter 8 pp. 243-246 for further information
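The adaptive loop described above can be sketched roughly as follows. This is a simplified illustration, not NAPLAN's or any operational CAT algorithm: it assumes a Rasch (one-parameter) model, a tiny hypothetical item bank, a crude halving-step update of the ability estimate instead of maximum-likelihood estimation, and a deterministic made-up test-taker.

```python
import math

def p_correct(theta, difficulty):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta, difficulty):
    """Information an item gives at theta; highest when difficulty is near theta."""
    p = p_correct(theta, difficulty)
    return p * (1.0 - p)

def run_cat(item_difficulties, answer_fn, n_items=5):
    """Administer n_items adaptively; return final estimate and a log."""
    theta = 0.0                      # start at average ability
    step = 1.0                       # crude update size, halved after each item
    remaining = list(item_difficulties)
    log = []
    for _ in range(min(n_items, len(remaining))):
        # Offer the remaining item that is most informative at the current estimate
        item = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(item)
        correct = answer_fn(item)    # response is monitored...
        theta += step if correct else -step   # ...and the estimate refined
        step /= 2
        log.append((item, correct, theta))
    return theta, log

# Hypothetical bank (difficulties in logits) and a test-taker who answers
# correctly whenever the item difficulty is below 1.2
bank = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
estimate, log = run_cat(bank, lambda difficulty: difficulty < 1.2)
for item, correct, theta in log:
    print(f"item difficulty {item:+.1f}  correct={correct}  estimate={theta:+.4f}")
```

After only five items the estimate has moved close to the simulated test-taker's threshold, without the very easy items ever being administered. That is the sense in which tailoring reduces testing time.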
Challenges of CAT NAPLAN
Relies on all items being unidimensional, i.e., same underlying construct
Different test-takers get different items tailored to their level
Requires hundreds of items initially tested on thousands of examinees to determine the item characteristic curves (pp. 253-255)
–i.e., Item Response Theory (IRT) approach (pp. 166-7)
–Makes it initially expensive and time-consuming to develop
–Converting and trialling NAPLAN took a 3 to 4 year period
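An item characteristic curve gives the probability of a correct response as a function of ability. A minimal sketch, assuming the common two-parameter logistic (2PL) form; the notes do not say which IRT model is used, and the parameter values below are invented for illustration.

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve.

    theta: test-taker ability
    a: discrimination (how steeply the curve rises)
    b: difficulty (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items of equal difficulty but different discrimination
for theta in (-2, -1, 0, 1, 2):
    low = icc(theta, a=0.5, b=0.0)
    high = icc(theta, a=2.0, b=0.0)
    print(f"ability {theta:+d}: low-discrimination item {low:.2f}, "
          f"high-discrimination item {high:.2f}")
```

Estimating `a` and `b` for every item is what requires the large trial samples: each curve has to be fitted from many examinees' responses across the ability range.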
Benefits of NAPLAN CAT
More precise measurement of student ability
–Greater differentiation by using a wider range of question difficulty, without adding to the length of the test for each student
Greater test-taker engagement
–Less frustration at lower ability end
–Less boredom at higher ability end
Potential to reduce anxiety as challenge better tailored to ability
Having a larger initial item pool means a wider range of aspects of the curriculum can be tested