Reliability Comprehensive Exam Flashcards
The relationship between reliability and validity
What is reliability?
What is validity?
What is the relationship between them?
How do we measure reliability?
- Test-retest Method; stability
- Split-half Method; internal consistency
- Multiple-Forms Method (parallel form); equivalence
- Rater reliability (inter-rater, intra-rater)
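The split-half method above can be sketched numerically: split the items into halves, correlate the half scores, then apply the Spearman-Brown correction to estimate full-length reliability. The item scores below are invented for illustration.

```python
# Split-half reliability sketch: correlate odd- and even-item half scores,
# then apply the Spearman-Brown correction. Data are made up.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_matrix):
    # item_matrix: one row per test taker, one column per item (0/1 scores)
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    r_half = pearson(odd, even)
    # Spearman-Brown: estimate the reliability of the full-length test
    return 2 * r_half / (1 + r_half)

scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 3))
```

Note that the corrected coefficient exceeds the raw half-test correlation, reflecting the fact that longer tests are more reliable.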
What are the strengths and weaknesses of each method of measuring reliability?
- Test-retest Method
strength:
weakness:
- Split-half Method
strength:
weakness:
- Multiple-forms Method
strength:
weakness:
What are some key statistical figures to know in reliability measures? What do they mean?
Correlations
Cronbach alpha
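Cronbach's alpha can be computed directly from its defining formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The rating data below are invented for illustration.

```python
# Cronbach's alpha sketch. Item scores are made-up illustration data.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # population variance

def cronbach_alpha(item_matrix):
    # item_matrix: one row per test taker, one column per item
    k = len(item_matrix[0])
    items = list(zip(*item_matrix))            # transpose to per-item columns
    totals = [sum(row) for row in item_matrix]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

scores = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [4, 3, 4, 4],
    [2, 2, 3, 2],
]
print(round(cronbach_alpha(scores), 3))
```

High alpha indicates the items covary strongly, i.e. internal consistency, which is why alpha belongs with the split-half family of estimates.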
How do we measure validity?
Content validity
Criterion-related validity (predictive, concurrent)
Construct validity (discriminant, convergent)
correlations
factor analysis
MTMM
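The MTMM logic above can be sketched as a check on a correlation matrix: convergent validity requires that same-trait/different-method correlations exceed the different-trait correlations. The traits, methods, and correlation values below are all hypothetical.

```python
# MTMM sketch: monotrait-heteromethod (validity) correlations should exceed
# heterotrait correlations. All entries below are invented for illustration.

# corr[(trait_a, method_a)][(trait_b, method_b)] — symmetric, made-up values
corr = {
    ("reading", "mcq"): {
        ("reading", "interview"): 0.72,    # monotrait-heteromethod
        ("listening", "mcq"): 0.40,        # heterotrait-monomethod
        ("listening", "interview"): 0.25,  # heterotrait-heteromethod
    },
    ("listening", "mcq"): {
        ("listening", "interview"): 0.68,
        ("reading", "interview"): 0.22,
    },
}

def convergent_ok(corr):
    # Each same-trait correlation should beat every different-trait one.
    validity = [corr[("reading", "mcq")][("reading", "interview")],
                corr[("listening", "mcq")][("listening", "interview")]]
    others = [corr[("reading", "mcq")][("listening", "mcq")],
              corr[("reading", "mcq")][("listening", "interview")],
              corr[("listening", "mcq")][("reading", "interview")]]
    return min(validity) > max(others)

print(convergent_ok(corr))
```

In this illustrative matrix the validity diagonal (0.72, 0.68) dominates the off-trait entries, so the convergent pattern holds.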
Things to consider in understanding the relationship between reliability and validity in language testing
consistency of measurement
the degree to which accumulated evidence supports the inferences that are made from the scores
observed score, true score, error score
the agreement between two efforts to measure the same trait through maximally similar methods
the agreement between two attempts to measure the same trait through maximally different methods
performance assessment
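The observed/true/error decomposition above (X = T + E) implies that reliability equals var(T)/var(X) and also equals the expected correlation between two parallel measurements. A small simulation, with arbitrary distribution choices, makes both facts concrete:

```python
# Classical test theory sketch: X = T + E. Reliability is var(T) / var(X),
# which equals the correlation between two parallel forms.
import random

random.seed(0)
true_scores = [random.gauss(50, 10) for _ in range(5000)]

def observe(t):
    return t + random.gauss(0, 5)   # error: mean 0, sd 5 (arbitrary choice)

form_a = [observe(t) for t in true_scores]
form_b = [observe(t) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (variance(x) ** 0.5 * variance(y) ** 0.5)

# Theoretical reliability: 100 / (100 + 25) = 0.8
print(round(variance(true_scores) / variance(form_a), 2))
print(round(pearson(form_a, form_b), 2))
```

Both printed values land near the theoretical 0.8, illustrating why parallel-forms correlation is used as a reliability estimate.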
Describe the features of (a) purpose, (b) content, (c) frame of reference, (d) scoring procedure and (e) testing method on test development.
- Educational purpose - used for a wide variety of decisions. Classify based on type of decision to be made.
(a) admission - selection, entrance, readiness
(b) identify appropriate level/areas of instruction - placement tests, diagnostic tests
(c) learning progress - progress reports, achievements, attainments (mastery)
- Research purpose - test results are used for comparing the performance of individuals with different characteristics, under different conditions of acquisition or instruction. Language tests are also used to test hypotheses about the nature of language proficiency.
What are the different types of tests?
- For diagnostic tests, …
- For placement tests, …
- For selection tests, …
- For formative tests, …
- For proficiency tests, …
- For achievement tests, …
Difference between subjective and objective scoring methods
Factors that influence test performance
- Communicative language ability
- Test method facets
- Personal attributes
- Random factors
Test method facets:
testing environment, test rubric, input, expected response, relationship between input and expected response
Criterion referenced (CR) test vs. Norm-referenced (NR) test
Results of a language test can be interpreted in two ways depending on frame of reference.
NR interpretation - interpreted in relation to the performance of a group (or norm). The norm group is a large group of individuals who are similar to those for whom the test is designed; in practice, results are usually interpreted relative to the group taking the test rather than a separate norm group (mean, median, sd, percentile rank). Scores are distributed on a normal distribution and ranked; the aim is to maximize distinctions among individuals in a given group. (Standardized tests have fixed content, standard procedures for administering and scoring, and are rigorously tested and empirically validated.)
CR interpretation - interpret a score based on a criterion level/ability/content (mastery of subject). Must specify reference points (criterion level of ability/domain). Items are selected based on how adequately they represent these ability levels. Need good coverage of content domain. Subject matter experts evaluate test items against test specifications. Mastery would result in an A, regardless of how many get an A.
Keep in mind that it is harder to implement a cut-off score when most scores cluster in one area (high, middle, or low): the clustering produces unnaturally low variance and therefore a low reliability estimate.
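The variance problem above can be simulated: with the same error of measurement, a clustered (low-variance) group yields a much lower test-retest correlation than a widely spread group. All numbers below are illustrative.

```python
# Range-restriction sketch: identical measurement error, but clustered
# true scores produce a much lower reliability estimate. Data are simulated.
import random

random.seed(1)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def retest_r(true_sd):
    # Simulate a test-retest correlation for a group with the given spread.
    true = [random.gauss(50, true_sd) for _ in range(5000)]
    t1 = [t + random.gauss(0, 5) for t in true]
    t2 = [t + random.gauss(0, 5) for t in true]
    return pearson(t1, t2)

print(round(retest_r(10), 2))  # wide spread of ability
print(round(retest_r(3), 2))   # clustered scores: reliability estimate drops
```

The drop is purely a variance effect: the test and its error are unchanged, only the spread of the group differs.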