Assessment Flashcards

1
Q

Practicality

A

Time-efficient

Not excessively expensive

2
Q

★ Reliability

A

No errors in scoring
Consistent and dependable: a reliable test should yield similar results.
Threatened by subjectivity in scoring

3
Q

Inter-rater reliability

A

Two Ts evaluate by using the same rating scale.
Failure stems from lack of scoring criteria.

Subjectivity of the raters

Subjectivity doesn’t enter into the scoring process.
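
One rough way to check whether two Ts are scoring consistently is to count how often they give the same rating to the same S. The sketch below is only an illustration: the function name `percent_agreement` and the sample ratings are hypothetical, and exact agreement is just one simple index that the card itself does not prescribe.

```python
def percent_agreement(rater_a, rater_b):
    """Share of Ss to whom both raters gave exactly the same rating (0.0 to 1.0)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of Ss")
    # Count the positions where the two ratings match.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical ratings of five Ss on the same 1-5 scale.
teacher_1 = [4, 3, 5, 2, 4]
teacher_2 = [4, 3, 4, 2, 4]
print(percent_agreement(teacher_1, teacher_2))  # 0.8
```

A low value would suggest that the scoring criteria need to be specified more carefully.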

4
Q

Why intra-rater reliability is violated, and the solution

A

Violations of this reliability can occur because of unclear scoring criteria, fatigue, bias, etc.

*Solution: careful specification of an analytic scoring instrument can increase both inter- and intra-rater reliability.

5
Q

test reliability

A

items that have more than one correct answer

6
Q

student-related reliability

A

temporary illness, fatigue

7
Q

Validity

A

Test measures exactly what it is supposed to measure.

8
Q

Authenticity

A

lg is natural; items are contextualized
includes meaningful, relevant, interesting topics
simulates real-world tasks
provides some thematic organization to items, e.g. through an episode.
e.g.) reading passages selected from real-world sources that test-takers are likely to encounter /
listening comprehension sections that feature natural lg with hesitations, white noise, and interruptions.

Topics and situations are interesting and relevant to my life.
Tasks replicate, or clearly approximate, real-world tasks.

9
Q

Washback

A

formative
Give learners feedback that enhances their lg development.
How a test influences both teaching and learning

Ts can provide information that washes back to Ss in the form of useful diagnoses of strengths and weaknesses.

I expected the teacher to go over the test and give “advice” on what I should focus on in the near future.

No “feedback or comments” from the teacher were given.

10
Q

How to increase washback?

A

Comment generously and specifically on test performance.
“comments and feedback”
Letter grades and numerical scores give no information of intrinsic interest to the S.

Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.

Informal assessment: T provides interactive feedback -> washback increases
Formal assessment: T provides information on Ss’ progress toward goals -> washback increases

11
Q

Criterion validity: definition and examples of its two types

A

Validity is measured by comparing a new test with an existing test: the extent to which the criterion of the test has actually been reached.

1) Predictive validity: e.g.) placement tests, admissions assessment batteries, and achievement tests designed to determine Ss’ readiness to move on to another unit.
2) Concurrent validity: e.g.) a high score is matched by actual proficiency in the lg.

12
Q

Formative test

A

Formative tests, by definition, provide washback in the form of information to the learner on progress and goals.

Evaluating Ss in the process of forming their competencies and skills.
The delivery (by the T) and internalization (by the S) of feedback on performance

All kinds of informal assessment are formative.

Gather information on the developmental “process” of their speaking
Assess their performance regularly

13
Q

Summative test

A

Measure what Ss have grasped at the end of a course or unit of instruction.
* Evaluates only the product, not the process

A summative test fails to provide crucial info.
(cf. a formative test does provide information)

One major test at the end of the semester

14
Q

Norm-referenced

A

Purpose: to place test-takers along a mathematical continuum in rank order
Primary concern: practicality, reliability, validity

Such tests must have fixed, predetermined responses.

Use the test results to award scholarships to the top 10%.
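
The scholarship example can be sketched as a rank-ordering step. This is a minimal illustration under stated assumptions: the function name `top_percent`, the student labels, and the scores are all hypothetical, and ties at the cut-off are not handled specially.

```python
def top_percent(scores: dict, percent: int) -> list:
    """Return the names of the test-takers in the top `percent` of the rank order."""
    # Place test-takers in rank order, highest score first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Integer arithmetic decides how many places the top slice contains.
    n_awards = len(ranked) * percent // 100
    return ranked[:n_awards]


# Hypothetical scores for ten Ss.
scores = {"S1": 88, "S2": 95, "S3": 70, "S4": 61, "S5": 79,
          "S6": 83, "S7": 55, "S8": 91, "S9": 67, "S10": 74}
print(top_percent(scores, 10))  # ['S2']
```

The decision depends only on where a S stands relative to the others, not on any fixed score.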

15
Q

Criterion-referenced

A

The test is criterion-referenced, assessing the extent to which the students achieved the goals of the class.
Primary concern: authenticity, washback
(The goal is to use the ability in real life. That is, authenticity is the degree of match between the test and real life; washback concerns feedback.)

Give test-takers feedback in the form of grades.
The distribution of Ss’ scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives.

The Ss who get over 10 out of 16 will pass the conversation course.
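
The pass rule in this example compares each S with a fixed criterion rather than with other Ss. A minimal sketch, assuming "over 10" means strictly more than 10; the names `CUT_SCORE`, `MAX_SCORE`, and `passes_course` and the sample scores are hypothetical.

```python
CUT_SCORE = 10   # from the card: Ss must get over 10
MAX_SCORE = 16   # from the card: the test is out of 16


def passes_course(score: int) -> bool:
    """True if the score is above the fixed criterion, regardless of other Ss' scores."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score must be between 0 and MAX_SCORE")
    return score > CUT_SCORE


# Hypothetical class: any number of Ss may pass, since no ranking is involved.
class_scores = [16, 12, 11, 10, 7]
print([passes_course(s) for s in class_scores])  # [True, True, True, False, False]
```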

16
Q

Test administration reliability

A

Classroom conditions for the test are equal for all students.

ex) aural comprehension test -> street noise

17
Q

Content validity

A

1) The tests assess real course objectives, direct testing
2) It requires test-takers to perform the behavior that is being measured.

Items focus on previously practiced in-class reading skills.

18
Q

Construct validity

A

e.g.) conducting an oral interview
major components of oral proficiency: pronunciation, fluency, grammatical accuracy, vocab use, sociolinguistic appropriateness

e.g.) a simple written vocab quiz covering the content of a recent unit -> have Ss correctly define a set of words.
However, if the objective is the communicative use of words, the writing of definitions certainly fails to match a construct of communicative lg use.

19
Q

Face validity

A

Whether the test looks as if it is measuring what it is supposed to measure.

Tests that relate to their course work./ familiar task/ directions are clear

The print was too small. I had to read five pages in one hour.

Lots of tasks were unfamiliar.
I’ve never done those kinds of tasks in class.
Material that she had not dealt with in class.
It seemed like a writing test rather than a listening test.

The exam “looks like” one that high school Ss normally take.

20
Q

needs analysis (needs assessment)

A

process of assessing the needs of Ss

Before designing a course, it is necessary to make decisions about what would be taught and how it would be taught.

survey and interview

Info about what my Ss needed to learn or change, their learning styles, interests, proficiency levels, etc.

Based on the info, I decided on the course objectives, contents and activities.

21
Q

a proficiency test/ standardized test

A

not linked to any particular textbook or specific course of study. (not limited to single skill in the lg. Rather, it tests “overall proficiency”.)

Summative and norm-referenced: provides results in the form of a single score, measures performance against a norm (w/ equated scores and percentile rank)

Does not provide diagnostic feedback

22
Q

summative feedback

A

Ss will receive a total score for the reading section

23
Q

constructed-response item

A

-

24
Q

Item Response Distribution

A
  1. A certain wrong alternative was chosen by a greater number of high group students than low group students.
  2. More students chose the wrong alternative than chose the correct answer.
  3. A certain wrong alternative did not work as a distractor.
25
Q

the reliability of the test ***

A

Item 18 deteriorates the internal consistency of the test.

This happens when more low-ability group Ss than high-ability group Ss answer the item correctly.

26
Q

Item Facility

A

Item Difficulty
The extent to which an item is easy or difficult for the proposed group of test-takers
Shows the proportion of Ss who chose the correct answer

Mr. Park divided the number of Ss who correctly answered a particular item by the total number of Ss who took the test.
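
Mr. Park's calculation can be written out as a short sketch. The function name `item_facility` and the class numbers are hypothetical; only the division itself comes from the card.

```python
def item_facility(num_correct: int, num_test_takers: int) -> float:
    """Proportion of test-takers who answered the item correctly (0.0 to 1.0)."""
    if num_test_takers <= 0:
        raise ValueError("num_test_takers must be positive")
    if not 0 <= num_correct <= num_test_takers:
        raise ValueError("num_correct must be between 0 and num_test_takers")
    # IF = number of Ss answering correctly / number of Ss who took the test
    return num_correct / num_test_takers


# Hypothetical class: 13 of 20 Ss answered the item correctly.
print(item_facility(13, 20))  # 0.65
```

A value near 1.0 means the item was easy for this group; a value near 0.0 means it was difficult.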

27
Q

Item Discrimination

A

The extent to which an item differentiates between high- and low-ability test-takers

Item 20 shows the highest discrimination among the five items.

Item 2 does not distinguish the upper level Ss from the lower level Ss.

e.g.) If a strong S and a weak S get the same score on an item, the item has poor ID because it didn’t discriminate between the two groups. (INTERNAL CONSISTENCY)

Many upper group students incorrectly chose option C. (Item 2 does not distinguish the upper level Ss from the lower level Ss.)
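
The idea of discrimination can be made concrete with a small calculation. The card gives no formula, so the sketch below assumes a commonly used upper/lower-group index (correct answers in the high group minus correct answers in the low group, divided by the size of one group, with equal group sizes). The function name and the numbers are hypothetical.

```python
def item_discrimination(high_correct: int, low_correct: int, group_size: int) -> float:
    """Upper/lower-group discrimination index, ranging from -1.0 to 1.0.

    Assumes the high and low groups each contain `group_size` Ss.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (high_correct, low_correct):
        if not 0 <= count <= group_size:
            raise ValueError("correct counts must be between 0 and group_size")
    return (high_correct - low_correct) / group_size


# Hypothetical groups of 10 Ss each.
print(item_discrimination(7, 2, 10))  # 0.5  -> separates the groups reasonably well
print(item_discrimination(5, 5, 10))  # 0.0  -> does not discriminate at all
print(item_discrimination(2, 6, 10))  # -0.4 -> low group outperforms high group
```

A negative value corresponds to the situation in which an item works against the internal consistency of the test.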

28
Q

Distractor

A

No one from the upper group or the lower group chose option B.

Distractors a and b seem to be fulfilling their function of attracting some attention from lower-ability Ss.
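
Judgements like these come from tallying which option each group chose. The following is a rough sketch, not a standard procedure: the function name `distractor_report`, the verdict labels, and the response data are all hypothetical.

```python
from collections import Counter


def distractor_report(upper, lower, options, key):
    """Tally wrong options by group and attach a rough verdict to each distractor."""
    upper_counts = Counter(upper)
    lower_counts = Counter(lower)
    report = {}
    for option in options:
        if option == key:
            continue  # the key is the correct answer, not a distractor
        u, l = upper_counts[option], lower_counts[option]
        if u + l == 0:
            verdict = "not working: nobody chose it"
        elif u > l:
            verdict = "suspect: attracts more upper-group Ss"
        elif l > u:
            verdict = "working: attracts lower-group Ss"
        else:
            verdict = "neutral: attracts both groups equally"
        report[option] = (u, l, verdict)
    return report


# Hypothetical responses from 10 upper-group and 10 lower-group Ss; the key is D.
upper = list("DDDDDDCCCA")
lower = list("DDDAAAAACC")
for option, row in distractor_report(upper, lower, "ABCD", "D").items():
    print(option, row)
```

In this invented data, A behaves as a distractor should, B attracts no one, and C draws more upper-group than lower-group Ss.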

29
Q

portfolios

A

collections of Ss’ work

useful for assessing student performance: 1. Ss have ownership over the process of learning, 2. Portfolios allow the T to pay attention to Ss’ progress as well as achievement.

30
Q

Alternatives

A
portfolios
conference
Journals
self-assessment/ peer-assessment
observation
31
Q

Alternatives

A
portfolios
performance-based assessment
conference
Journals
self-assessment/ peer-assessment
observation
32
Q

performance-based assessment

A

The T observes the performance

The task is evaluated through “direct observation” by the T.

33
Q

performance-based assessment

A

The T observes the performance
The task is evaluated through “direct observation” by the T.

The task calls for the integration of language skills.

34
Q

analytic rating scales

A

Provide diagnostic information

35
Q

holistic rating scales (holistic scoring method)

A

-

36
Q

discrete point test

A

assessing one point at a time
On the assumption that lg can be broken down into component parts and that those parts can be tested successfully.

e.g.) grammar and vocab items in multiple choice format / large-scale standardized entrance exams

37
Q

Types of integrative test and what integrative tests emphasize

A

Cloze test
Dictation

emphasizing communication and authenticity / communicative competence

38
Q

Cloze test: types / characteristics

A

Fixed-ratio cloze: Every nth word is deleted in a text

Rational-deletion cloze: Words are deleted in a text on a rational basis (eg. prepositions, sentence connectors) to assess specified grammatical or rhetorical categories.
Rational deletion gives more washback and draws on expectancy grammar (the ability to predict the next item).

Characteristics) an integrative test; indirect testing that measures reading ability.
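
The fixed-ratio procedure is mechanical enough to sketch in code. This is a simplified illustration: the function name `fixed_ratio_cloze` and the sample sentence are hypothetical, words are split on whitespace only, and any punctuation attached to a deleted word would be deleted with it.

```python
def fixed_ratio_cloze(text: str, n: int):
    """Blank out every nth word; return the gapped text and the deleted words."""
    if n < 2:
        raise ValueError("n must be at least 2")
    gapped, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)   # keep the original word as the answer key
            gapped.append("____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


passage = "The quick brown fox jumps over the lazy dog near the river bank today"
cloze_text, answer_key = fixed_ratio_cloze(passage, 5)
print(cloze_text)   # The quick brown fox ____ over the lazy dog ____ the river bank today
print(answer_key)   # ['jumps', 'near']
```

A rational-deletion cloze cannot be generated this way, because the T chooses the deleted words by category rather than by position.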

39
Q

Rational deletion cloze

A

specific content words are chosen to be deleted
-> more washback, expectancy grammar (ability to predict the next item)

Scoring is more difficult in a rational-deletion cloze than in a C-test.

40
Q

Cloze test scoring methods: acceptable word method

A

a scoring method that accepts any suitable, grammatically and rhetorically acceptable word that fits the blank in the original text.
(high face validity)

41
Q

C-test: definition and characteristics

A

The second half of every other word is deleted.
It has higher scoring reliability / lower validity.
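
The deletion rule can be sketched as follows. Several details are assumptions made for illustration and are not stated on the card: deletion starts with the second word, one-letter words are left intact, odd-length words keep the larger first part, and the sample sentence has no punctuation. The function name `c_test` is hypothetical.

```python
def c_test(text: str) -> str:
    """Replace the second half of every other word with underscores."""
    mutilated = []
    for position, word in enumerate(text.split(), start=1):
        if position % 2 == 0 and len(word) > 1:
            keep = (len(word) + 1) // 2          # first half, rounded up (assumption)
            mutilated.append(word[:keep] + "_" * (len(word) - keep))
        else:
            mutilated.append(word)               # odd-numbered words stay whole
    return " ".join(mutilated)


print(c_test("Language tests should measure what learners can do"))
# Language tes__ should meas___ what lear____ can d_
```

Because the first half of each word is given, there is usually only one possible answer, which fits the card's point about higher scoring reliability.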

42
Q

Cloze test: definition / which competences Ss use / types

A

an integrative measure not only of reading ability but of other lg abilities

  • Ss use linguistic competence (formal schemata) / background experience (content schemata) / strategic competence

Fixed-ratio deletion
Rational deletion

43
Q

Cloze test scoring methods: exact word method

A

a scoring method that is limited to accepting the same word found in the original text
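
The difference between exact word scoring and the acceptable word method from the earlier card can be shown in one small scoring function. This is a sketch under assumptions: the function name `score_cloze`, the case-insensitive comparison, and the sample answers are hypothetical, and the list of acceptable alternatives would have to be decided by the T.

```python
def score_cloze(responses, original_words, acceptable=None):
    """Count correct blanks.

    With acceptable=None only the original word counts (exact word method).
    Otherwise `acceptable` maps a blank index to extra words that also count
    (acceptable word method).
    """
    score = 0
    for index, (response, original) in enumerate(zip(responses, original_words)):
        allowed = {original.lower()}
        if acceptable is not None:
            allowed |= {word.lower() for word in acceptable.get(index, ())}
        if response.strip().lower() in allowed:
            score += 1
    return score


original = ["big", "quickly"]
student = ["large", "quickly"]
print(score_cloze(student, original))                          # 1 (exact word)
print(score_cloze(student, original, {0: {"large", "huge"}}))  # 2 (acceptable word)
```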

44
Q

dictation

A

It taps into grammatical and discourse competence

45
Q

Subjective testing

A

Low reliability/ high validity
Constructed response items

e.g.) open-ended response*

46
Q

Objective testing

A

It has predetermined fixed responses
High test reliability, Low validity
Selected response

e.g.) T/F, multiple choice items

47
Q

Direct testing

A

It involves the test-taker in actually performing the target task.
High content validity

e.g.) Oral presentation, to test performance directly

48
Q

Indirect testing

A

Learners are not performing the task itself but rather a task that is related in some way

49
Q

Achievement tests

A

Limited to particular material and are offered after a course has focused on the objectives in question
Determine whether the course objectives have been met by the end of a given period of instruction

Summative: administered at the end of a lesson, unit, or term of study
Formative: when offering feedback about the quality of a learner’s performance

50
Q

Placement tests

A

to place a student into a particular level of a lg curriculum or school
Diagnostic
Formative (correct/incorrect responses provide Ts with useful information on what may or may not be emphasized in the weeks to come)

51
Q

Diagnostic tests

A

To diagnose aspects of a lg that a S needs to develop or that a course should include
-> Should elicit info on what Ss need to work on in the future. Therefore, a diagnostic test will typically offer more detailed, subcategorized information on the learner.

52
Q

Constructed response items

A

A type of test item or task that requires test-takers to respond to a series of open-ended questions by writing, speaking, or doing something rather than choosing answers from a ready-made list.

53
Q

computer adaptive testing

A

computer testing software that adjusts the questions depending on Ss’ performance on previous test items.
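
The adjustment idea can be illustrated with a toy rule: a correct answer leads to a harder item, an incorrect one to an easier item. This is a deliberately simplified sketch and an assumption on my part; real adaptive tests typically rely on statistical models rather than a one-step rule. The names `next_difficulty` and `difficulty_path` and the 1-5 difficulty scale are hypothetical.

```python
def next_difficulty(current: int, was_correct: bool, lowest: int = 1, highest: int = 5) -> int:
    """Move one level up after a correct answer, one level down after an incorrect one."""
    step = 1 if was_correct else -1
    # Keep the level inside the available range of item difficulties.
    return max(lowest, min(highest, current + step))


def difficulty_path(responses, start: int = 3):
    """Difficulty level of each item presented, given a sequence of right/wrong answers."""
    path = [start]
    for was_correct in responses:
        path.append(next_difficulty(path[-1], was_correct))
    return path


# Hypothetical S: right, right, wrong, right.
print(difficulty_path([True, True, False, True]))  # [3, 4, 5, 4, 5]
```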

54
Q

Alternative tests (Performance-based assessment )

A

It requires Ss to perform, create, produce, or do something.
use real-world contexts.
focus on process as well as products
tap into higher level thinking and problem-solving skills
provide info about both strengths and weaknesses of Ss
involve “an integration of lg skills”

55
Q

Performance-based assessment: points for the T to keep in mind

A
  • state the overall goal of the performance
  • specify the objectives (criteria) of the performance in detail
  • prepare Ss for performance in stepwise progress
  • use a reliable evaluation form, checklist.
  • treat performances as opportunities for giving feedback and provide that feedback systematically
  • if possible, utilize self- and peer- assessment judiciously.
56
Q

Rubrics

A

validity ↑, reliability ↑
A rubric is a device used to evaluate open-ended, oral and written responses of learners
- usually composed of a set of criteria or competencies, each with descriptions of levels of expectation
- some rubrics involve scaling

57
Q

Rubric-based assessment

A

Not only were rubrics beneficial for teachers, but Ss were also able to better focus their efforts, produce work of higher quality, earn better grades, and feel less anxious about assignments.

Pro) rubrics provide points for Ss to focus on and goals to pursue
Con) simplicity (marking a few points on a chart and considering our job done!) may mask the depth and breadth of a S’s attainment.

58
Q

Portfolios

A

a purposeful collection of Ss’ work that demonstrates their efforts, progress and achievements.

Advantages) foster intrinsic motivation, responsibility and ownership

  • promote S-T interaction w/ the T as a facilitator
  • facilitate critical thinking, self-assessment and revision process
  • offer opportunities for collaborative work w/peers
59
Q

Portfolios: points to keep in mind

A

- State objectives clearly
- Give guidelines on what materials to include
(a sample portfolio from a previous S can help stimulate some thoughts on what to include)
- Communicate assessment criteria to Ss. (self-assessment: formative)
- Provide positive washback when giving final assessments
e.g.) a holistic scoring scale ranging from 1 to 6.
narrative evaluation of perceived strengths and weaknesses by the T

60
Q

Journals

A

the most formative of all the alternatives in assessment
CONTENT VALIDITY ↑, WASHBACK ↑ ↑

a log of one’s thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written w/ little attention to structure, form, or correctness.
“written conversation between T and Ss”

61
Q

Dialogue journals

A

They imply an interaction between the T and the S through dialogues or responses
Advantages) practice in writing fluently, using writing as a thinking process, emphasizing a student’s own voice; they afford a unique opportunity for a teacher to offer various kinds of feedback
* Ts become better acquainted with their Ss in terms of both their learning progress and their affective states
: meet Ss’ individual needs

Disadvantage) It’s difficult to set up criteria for evaluation

Points to keep in mind) The T should provide optimal feedback in responses.
- cheerleading feedback; instructional feedback, in which you suggest strategies or materials; reality-check feedback -> help Ss set more realistic expectations for their lg abilities

62
Q

self-assessment

/peer-assessment

A

autonomy, develop motivation

/ cooperative learning

63
Q

Observation

A

observe Ss in the classroom
assess Ss w/o their awareness
naturalness of their linguistic performance is maximized
Can take the form of recordings, checklists, rating scales

64
Q

Holistic scoring

A

an approach that uses a “single general scale” to give a global rating for a test-taker’s lg production

Pro) fast evaluation
Con) no diagnostic info is available (no washback potential); raters need to be extensively trained to use the scale accurately

65
Q

Analytic scoring

A

An approach that separately rates a number of predetermined aspects (e.g. grammar, content, organization) of a test-taker’s lg production (e.g. writing)
=> enabling learners to home in on weaknesses and capitalize on strengths

PRACTICALITY ↓, in that more time is required for the T to attend to details, but ultimately Ss receive more information about their writing
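
The extra information an analytic scale gives can be seen by keeping the separate ratings instead of collapsing them into one number. A minimal sketch: the function name `analytic_report`, the 0-5 range per aspect, and the sample ratings are hypothetical.

```python
def analytic_report(ratings: dict, max_per_aspect: int = 5) -> dict:
    """Summarize separate ratings per aspect, keeping the diagnostic detail."""
    if not ratings:
        raise ValueError("at least one aspect must be rated")
    for aspect, rating in ratings.items():
        if not 0 <= rating <= max_per_aspect:
            raise ValueError(f"rating for {aspect} is out of range")
    return {
        "total": sum(ratings.values()),
        "max_total": max_per_aspect * len(ratings),
        "weakest": min(ratings, key=ratings.get),     # what the S should work on
        "strongest": max(ratings, key=ratings.get),   # what the S can build on
    }


# Hypothetical ratings for one piece of writing.
print(analytic_report({"grammar": 3, "content": 5, "organization": 4}))
# {'total': 12, 'max_total': 15, 'weakest': 'grammar', 'strongest': 'content'}
```

A single holistic rating for the same piece of writing would not show which aspect is the weakest.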

66
Q

Primary trait scoring

A

e.g.) persuasive writing -> score by focusing only on the persuasive aspect

It allows both writer and evaluator to focus on function

67
Q

Multiple choice items

A

Practicality ↑: time-saving scoring procedures, Reliability ↑: pre-determined correct responses

Multiple choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses.

STEM: the body of the item that presents a stimulus
Options / Alternatives - including the KEY (the correct response)

68
Q

Guidelines for designing multiple choice items.

A
  1. Design each item to measure a single objective.
    e.g.) if WH-questions are the objective, measure only that
    +) Do not provide inadvertent (unintentional) clues

  2. State both stem and options as simply and directly as possible - remove needless redundancy from options and stem
  3. Make certain that the intended answer is clearly the only correct one (only one correct answer)

From past exams) make sure the distractors are the same grammatical class as the key
/ make sure the key cannot be selected based on Ss' world knowledge.