Chapter 2 Principles of Language Testing Flashcards
Authenticity
degree of correspondence of the characteristics of a given language test task to the features of a target language task; an authentic test uses language that is as natural as possible, meaningful topics, and real-world tasks
Washback Effect
Effect of testing on teaching and learning, including the effects of an assessment on teaching and learning prior to the assessment itself
Practicality
whether a test should be implemented in a specific context in consideration of time, cost, and available resources
Reliability
stability and dependability of a test: whether the test measures individual competence in a consistent and dependable manner
Classical Test Theory (Reliability)
X (observed score) = T (true score) + E (error)
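A minimal Python sketch of this decomposition. The true scores, the error spread, and the function names are hypothetical, chosen only for illustration. It also computes the ratio of true-score variance to observed-score variance, which is the usual classical-test-theory reading of reliability when error is assumed to be unrelated to the true score; the card itself does not state this ratio.

```python
import random
import statistics

def simulate_observed_scores(true_scores, error_sd, seed=0):
    """Return observed scores X = T + E, with E drawn from a normal
    distribution with mean 0 and standard deviation error_sd."""
    rng = random.Random(seed)
    return [t + rng.gauss(0, error_sd) for t in true_scores]

def reliability_ratio(true_scores, observed_scores):
    """Share of observed-score variance that is true-score variance.
    In a small sample this is only a rough estimate."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed_scores)

# Hypothetical true scores for eight test-takers.
true_scores = [55, 60, 65, 70, 75, 80, 85, 90]
observed = simulate_observed_scores(true_scores, error_sd=5.0)

# Recover the error term for each test-taker: E = X - T.
errors = [x - t for x, t in zip(observed, true_scores)]
print(reliability_ratio(true_scores, observed))
```

With error_sd set to 0 the observed scores equal the true scores and the ratio is 1; larger error spreads tend to pull it toward 0.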
Student-Related Reliability
The most common learner-related issue in reliability is caused by temporary illness, fatigue, a bad day, anxiety, and other physical factors, which may make an observed score deviate from one’s true score
Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered (room lighting, classroom conditions)
Rater Reliability
Human error, subjectivity, and bias may enter into the scoring process
Inter-rater reliability
consistency of scores across two or more scorers. Unreliability occurs when the scorers yield inconsistent scores of the same test, possibly for lack of attention to scoring criteria, inattention, or even preconceived biases
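One way to put a number on agreement between two scorers is sketched below. The ratings are hypothetical, and Cohen's kappa is brought in here as a common agreement statistic; it is not named on the card.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of papers on which both raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).
    Undefined (division by zero) if chance agreement p_e equals 1."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each scored independently at their own category rates.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores (1-5 scale) given by two raters to the same eight essays.
rater_a = [3, 4, 2, 5, 3, 4, 1, 2]
rater_b = [3, 4, 3, 5, 3, 3, 1, 2]

print(percent_agreement(rater_a, rater_b))  # 0.75
print(cohens_kappa(rater_a, rater_b))
```

Here the raters agree on 6 of 8 essays, and kappa comes out lower (about 0.68) because some of that agreement would be expected by chance.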
Intra-rater reliability
consistency of a single scorer across papers or scoring occasions. Unreliability here is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness
Test-Retest reliability
A reliability estimate that is based on testing the same examinees twice with the same test and then correlating the results
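The correlation step can be sketched in Python as follows. The two score lists are hypothetical, and Pearson's r is assumed as the correlation coefficient, since the card does not say which one is meant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores of the same six examinees on two administrations
# of the same test.
first_administration = [62, 75, 81, 58, 90, 70]
second_administration = [65, 73, 84, 60, 88, 72]

test_retest_r = pearson_r(first_administration, second_administration)
print(test_retest_r)
```

A coefficient close to 1 suggests the test ranks examinees consistently across the two sittings; a low coefficient points to unstable scores.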
Validity
extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment
Face validity
the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it
Content Validity
A conceptual or non-statistical validity based on a systematic analysis of the test content to determine whether it includes an adequate sample of the target domain to be measured
Criterion-related validity
refers to the extent to which the “criterion” of the test has actually been reached
Concurrent validity
A type of validity which is concerned with the relationship between what is measured by a test (usually a newly developed test) and another existing criterion measure, which may be a well-established standardized test, a set of judgments, or some other quantifiable variable
Predictive validity
measures how well a test predicts performance on an external criterion. Because the main purpose of a test is to provide information about likely behavior in the real world, prediction of criterion performance is basic to test validation (e.g., placement tests, language aptitude tests)
Consequential Validity
It encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test’s interpretation and use. Similar to the washback effect
Construct Validity
whether constructs (theories, hypotheses, or models that attempt to explain observed phenomena) are measured in the exams