Assessment Flashcards
Practicality
Time-efficient
Not excessively expensive
★ Reliability
No errors in scoring
Consistent and dependable: a reliable test should yield similar results when given to the same Ss on different occasions.
Subjectivity
Inter-rater reliability
Two Ts evaluate by using the same rating scale.
Failure stems from lack of scoring criteria.
Subjectivity of the raters
Subjectivity doesn’t enter into the scoring process.
Intra-rater reliability: why it is violated, and the solution
Violation of such reliability can occur in cases of unclear scoring criteria, fatigue, or bias.
*Solution: careful specification of an analytic scoring instrument can increase both inter- and intra-rater reliability.
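One rough way to check rater reliability is simple percent agreement: the share of scripts on which two ratings match. This is only a sketch under assumptions. The function name, the 1-5 scale, and the scores are hypothetical, and percent agreement is just one of several possible measures.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of scripts on which two sets of ratings are identical."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both rating lists must cover the same non-empty set of scripts")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical scores (1-5 scale) from two Ts on the same six essays.
t1_scores = [4, 3, 5, 2, 4, 3]
t2_scores = [4, 3, 4, 2, 4, 2]

agreement = percent_agreement(t1_scores, t2_scores)  # 4 of 6 essays match
```

For inter-rater reliability, pass the scores of two different Ts. For intra-rater reliability, pass the same T's scores from two separate scoring sessions.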
test reliability
items that have more than one correct answer
student-related reliability
temporary illness, fatigue
Validity
Test measures exactly what it is supposed to measure.
Authenticity
Lg is natural; items are contextualized
includes meaningful, relevant, interesting topics
simulates real-world tasks
provides some thematic organization to items through episode.
e.g.) reading passages selected from real-world sources that test-takers are likely to encounter;
listening comprehension sections feature natural lg with hesitations, white noise, and interruptions.
Topics and situations are interesting and relevant to my life.
Tasks replicate, or closely approximate, real-world tasks.
Washback
formative
Give learners feedback that enhances their lg development.
How test influences both teaching and learning
Ts can provide information that washes back to Ss in the form of useful dialogues of strengths and weaknesses.
I expected the teacher to go over the test and give “advice” on what I should focus on in the near future.
No “feedback or comments” from the teacher were given.
How to increase washback?
Comment generously and specifically on test performance.
“comments and feedback”
Letter grades and numerical scores give no information of intrinsic interest to the S.
Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.
Informal assessment: T provides interactive feedback -> washback increases
Formal assessment: T provides information on Ss’ progress toward goals -> washback increases
Criterion validity: definition and examples of its two types
Validity is measured by comparing a new test with an existing test: the extent to which the criterion of the test has actually been reached.
1) Predictive validity: e.g.) placement tests, admissions assessment batteries, achievement tests designed to determine Ss’ readiness to move on to another unit.
2) Concurrent validity: e.g.) a high score is matched by actual proficiency in the lg.
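Comparing a new test with an existing one is often done by correlating the two sets of scores for the same Ss. The sketch below computes a Pearson correlation by hand. The scores are hypothetical, and using a correlation coefficient as the comparison is an assumption here, not something these notes specify.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five Ss on a new test and an established test.
new_test = [55, 62, 70, 48, 81]
established_test = [58, 60, 75, 50, 85]

r = pearson_r(new_test, established_test)
```

A value of r close to 1 would suggest that the new test ranks Ss much as the established test does.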
Formative test
Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.
Evaluating Ss in the process of forming their competencies and skills. The delivery (by the T) and internalization
All kinds of informal assessment are formative.
Gather information on the developmental “process” of their speaking process
Assess their performance regularly
Summative test
Measure what Ss have grasped at the end of a course or unit of instruction.
* Evaluates only the product, not the process
Summative test fails to provide crucial info.
(cf. a formative test provides information)
One major test at the end of semester
Norm-referenced
Purpose: to place test-takers along a mathematical continuum in rank order
Primary concern: practicality, reliability, validity
Such tests must have fixed, predetermined responses.
Use the test results to award scholarships to the top 10%.
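A norm-referenced decision depends on where a S stands relative to the others, not on an absolute score. The sketch below ranks Ss and selects the top 10%. The names and scores are hypothetical, and ties at the cutoff are not handled specially.

```python
def top_percent(scores, percent=10):
    """Return the names in the top `percent` of Ss by rank order (at least one S)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Integer ceiling of len * percent / 100, so 10% of 10 Ss is exactly 1.
    n_awards = max(1, (len(ranked) * percent + 99) // 100)
    return [name for name, _ in ranked[:n_awards]]

# Hypothetical scores for ten test-takers.
scores = {
    "S01": 72, "S02": 88, "S03": 64, "S04": 91, "S05": 59,
    "S06": 77, "S07": 95, "S08": 81, "S09": 68, "S10": 74,
}

scholarship_winners = top_percent(scores, percent=10)  # only the top-ranked S
```

The same raw score could win or lose a scholarship depending on how the rest of the group performs.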
Criterion-referenced
The test is criterion-referenced, assessing the extent to which the students achieved the goals of the class.
Primary concern: authenticity, washback
(The goal is to use the ability in real life. That is, authenticity concerns the degree of match between the test and real life; washback concerns feedback.)
Give test-takers feedback in the form of grades.
The distribution of Ss’ scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives.
The Ss who get over 10 out of 16 will pass the conversation course.
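A criterion-referenced decision compares each S with a fixed cutoff, so every S can pass (or fail) regardless of rank. A minimal sketch, assuming "over 10 out of 16" means a score strictly greater than 10. The names and scores are hypothetical.

```python
PASS_MARK = 10   # assumption: "over 10" is read as score > 10
MAX_SCORE = 16

def passes(score):
    """True if the score clears the fixed criterion."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score must be between 0 and 16")
    return score > PASS_MARK

# Hypothetical conversation-course scores.
scores = {"S01": 12, "S02": 10, "S03": 16, "S04": 11}
results = {name: passes(score) for name, score in scores.items()}
```

Here three of the four Ss pass. The distribution of scores plays no role in the decision.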
Test administration reliability
Classroom conditions for the test are equal for all students.
ex) aural comprehension test -> street noise
Content validity
1) The tests assess real course objectives, direct testing
2) It requires test-takers to perform the behavior that is being measured.
Items focus on previously practiced in-class reading skills.
Construct validity
e.g.) conducting an oral interview
major components of oral proficiency: pronunciation, fluency, grammatical accuracy, vocab use, socio-linguistic appropriateness
e.g.) a simple written vocab quiz, covering the content of recent unit -> have Ss correctly define a set of words.
However, if the objective is communicative use of words, the writing of definitions certainly fails to match a construct of communicative lg use.
Face validity
Whether the test looks as if it is measuring what it is supposed to measure.
Tests that relate to their course work / familiar tasks / clear directions
The print was too small. I had to read five pages in one hour.
Lots of tasks were unfamiliar
I’ve never done those kinds of tasks in class.
material that she had not dealt with in class
It seemed like a writing test rather than a listening test.
The exam “looks like” one that high school Ss normally take.
needs analysis (needs assessment)
process of assessing the needs of Ss
Before designing course, it is necessary to make decisions about what would be taught and how it would be taught.
survey and interview
Info about what my Ss needed to learn or change, their learning styles, interests, proficiency levels, etc.
Based on the info, I decided on the course objectives, contents and activities.
a proficiency test/ standardized test
Not linked to any particular textbook or specific course of study. (Not limited to a single skill in the lg. Rather, it tests “overall proficiency”.)
Summative and norm-referenced: provide results in the form of a single score, measure performance against a norm (w/ equated scores and percentile ranks)
Does not provide diagnostic feedback
summative feedback
Ss will receive a total score for the reading section
constructed-response item
-
Item Response Distribution
- a certain wrong alternative was chosen by a greater number of high group students than low group students.
- more students chose the wrong alternative than those who chose the correct answer.
- A certain wrong alternative did not work as a distracter.
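The three patterns above can be spotted by tabulating which option each S chose, split by high and low group. The sketch below is illustrative only. The responses, the answer key "B", and the function names are hypothetical.

```python
from collections import Counter

def response_distribution(high_group, low_group, options="ABCD"):
    """Count how many high- and low-group Ss chose each option."""
    high = Counter(high_group)
    low = Counter(low_group)
    return {opt: (high[opt], low[opt]) for opt in options}

def flag_distractors(dist, key):
    """Note problem patterns for each wrong alternative."""
    key_total = sum(dist[key])
    flags = {}
    for opt, (high, low) in dist.items():
        if opt == key:
            continue
        notes = []
        if high + low == 0:
            notes.append("chosen by nobody")
        if high > low:
            notes.append("more high-group than low-group choosers")
        if high + low > key_total:
            notes.append("chosen more often than the key")
        if notes:
            flags[opt] = notes
    return flags

# Hypothetical responses to one item from ten high-group and ten low-group Ss.
high_group = list("BBBCCCCCCC")
low_group = list("BBBBBAAACC")

dist = response_distribution(high_group, low_group)
flags = flag_distractors(dist, key="B")
```

In this made-up data, option C attracts mostly high-group Ss and is chosen more often than the key, while option D attracts nobody and so does not work as a distractor.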
the reliability of the test ***
Item 18 deteriorates the internal consistency of the test.
When low-ability-group Ss answer the item correctly more often than high-ability-group Ss
Item Facility
Item Difficulty
The extent to which an item is easy or difficult for the proposed group of test-takers
Shows the proportion of Ss who chose the correct answer
Mr. Park divided the number of Ss who correctly answered a particular item by the total number of Ss who took the test.
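Mr. Park's calculation can be written as a one-line function. The figures below (13 correct out of 20 Ss) are hypothetical.

```python
def item_facility(n_correct, n_total):
    """Proportion of Ss who answered the item correctly (0.0 to 1.0)."""
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need 0 <= n_correct <= n_total and n_total > 0")
    return n_correct / n_total

# Hypothetical item: 13 of the 20 Ss who took the test answered correctly.
if_value = item_facility(13, 20)
```

A value near 1.0 means the item was easy for this group. A value near 0.0 means it was difficult.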
Item Discrimination
The extent to which an item differentiates btw high- and low-ability test-takers
Item 20 shows the highest discrimination among the five items.
Item 2 does not distinguish the upper level Ss from the lower level Ss.
e.g.) On a certain item, high-ability and low-ability Ss got the same score -> the item has poor ID, because it didn’t discriminate btw the two groups. INTERNAL CONSISTENCY
Many upper group Ss incorrectly chose option C. (Item 2 does not distinguish the upper level Ss from the lower level Ss.)
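These notes give no formula for ID. One commonly used version is the upper-lower index: the number of correct answers in the high group minus the number in the low group, divided by the number of Ss in one group. The sketch below assumes two equally sized groups. All figures are hypothetical.

```python
def item_discrimination(high_correct, low_correct, group_size):
    """Upper-lower index: (high correct - low correct) / Ss per group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for n_correct in (high_correct, low_correct):
        if not 0 <= n_correct <= group_size:
            raise ValueError("correct counts must be between 0 and group_size")
    return (high_correct - low_correct) / group_size

# Hypothetical items, each answered by ten high-group and ten low-group Ss.
good_item = item_discrimination(7, 2, 10)      # high group clearly does better
flat_item = item_discrimination(5, 5, 10)      # no difference btw the groups
reversed_item = item_discrimination(3, 6, 10)  # low group does better
```

A value near 0 means the item does not separate the two groups. A negative value means low-group Ss outperformed high-group Ss, the pattern that hurts internal consistency.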