RELIABILITY Flashcards
involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements.
RELIABILITY
is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
reliability coefficient
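The ratio on this card can be sketched in a few lines. This is a minimal illustration that assumes total variance is the sum of true score variance and error variance; the function name and the numbers are hypothetical.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Proportion of total variance that is true score variance."""
    # Assumption: total variance = true variance + error variance.
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical values: 80 units of true variance, 20 units of error variance.
print(reliability_coefficient(80.0, 20.0))  # 0.8
```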
Sources of Error Variance
- Test construction
- Test administration
- Test scoring and interpretation
Sources of Error Variance
item sampling or content sampling
Test construction
Sources of Error Variance
Test administration:
- test environment
- test-taker variables
- examiner-related variables
Test Administration:
The room temperature, the level of lighting, and the amount of ventilation and noise.
Test Environment
Test Administration:
Pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication.
Test-taker variables.
Test Administration:
Physical appearance and demeanor
Examiner-related variables
Sources of Error Variance
Scorers and scoring systems
Test scoring and interpretation
Reliability Estimates
- Test-retest reliability
- Parallel-Forms and Alternate-Forms Reliability Estimates (coefficient of equivalence)
- Split-Half Reliability Estimates
- KR-20 and Coefficient Alpha Formula
- Measures of Inter-Scorer Reliability
Reliability Estimates
Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Test-retest reliability
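As a rough sketch of the correlation described on this card, the example below computes a Pearson r between two administrations of the same test. The helper `pearson_r` and the scores are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores from the same five people on two administrations.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
print(round(pearson_r(first_administration, second_administration), 3))
```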
tendencies to act, think, or feel in a certain manner in any given circumstance
TRAIT
Test-retest reliability
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the __
coefficient of stability
Parallel-Forms and Alternate-Forms Reliability Estimates
What type of reliability estimate is obtained when two different versions of a test are constructed to be parallel?
Alternate-Forms Reliability
Parallel-Forms and Alternate-Forms Reliability Estimates
What type of test reliability exists when the means and variances of observed test scores are equal for each test form?
Parallel-Forms Reliability
Parallel-Forms and Alternate-Forms Reliability Estimates
What are the two ways to obtain an estimate of parallel-forms reliability?
- Administering two test forms to the same group
- Considering factors like motivation, fatigue, and practice effects
Parallel-Forms and Alternate-Forms Reliability Estimates
What is the primary source of error variance in alternate- or parallel-forms reliability?
Item Sampling
Parallel-Forms and Alternate-Forms Reliability Estimates
What is one drawback of parallel-forms reliability testing due to its complexity and cost?
It is time-consuming and expensive
Parallel-Forms and Alternate-Forms Reliability Estimates
What type of reliability can be obtained without developing an alternate test form or administering a test twice?
Internal Consistency Reliability
Reliability Estimates
What reliability estimate is obtained by correlating two pairs of scores from equivalent halves of a single test?
Split-Half Reliability
Split-Half Reliability
What is the first step in obtaining a split-half reliability estimate?
Divide the test into equivalent halves
Split-Half Reliability
What statistical method is used to calculate the correlation between two halves of a test?
Pearson r
Split-Half Reliability
Which formula is used to adjust the half-test reliability in split-half reliability estimates?
Spearman-Brown Formula
Split-Half Reliability
Why is dividing a test in the middle not recommended for split-half reliability testing?
It may not create equivalent halves
Split-Half Reliability
What are three acceptable ways to split a test for split-half reliability estimation?
- Randomly assign items
- Assign odd-numbered items to one half and even-numbered items to the other
- Divide the test by content and difficulty
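The split-half steps on these cards (divide the test, correlate the halves with Pearson r, adjust with Spearman-Brown) could be sketched as follows. This assumes an odd-even split and uses hypothetical right/wrong item scores; the function names are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("half-test scores must vary")
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per test taker."""
    # Step 1: odd-even split (items 1, 3, 5, ... versus items 2, 4, 6, ...).
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    # Step 2: correlate the two half-test scores.
    r_half = pearson_r(odd_totals, even_totals)
    # Step 3: Spearman-Brown adjustment to estimate whole-test reliability.
    return 2 * r_half / (1 + r_half)


# Hypothetical data: five test takers, six items scored 1 (right) or 0 (wrong).
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 3))
```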
Split-Half Reliability
Which formula estimates the reliability of a test when its length is changed?
Spearman-Brown Formula
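A small sketch of the general Spearman-Brown formula, r_new = n * r / (1 + (n - 1) * r), where n is the factor by which test length changes. The function name and example values are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimated reliability after changing test length by length_factor."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Doubling a test whose reliability is 0.60 gives roughly 0.75.
print(round(spearman_brown(0.60, 2), 2))
# Halving a test whose reliability is 0.80 gives roughly 0.67.
print(round(spearman_brown(0.80, 0.5), 2))
```

With a length factor of 2, this reduces to the split-half adjustment 2r / (1 + r).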
KR-20 and Coefficient Alpha Formula
What does inter-item consistency measure in a test?
The degree of correlation among all test items
KR-20 and Coefficient Alpha Formula
What term refers to a test that measures a single trait, leading to higher inter-item consistency?
Homogeneity
KR-20 and Coefficient Alpha Formula
What term describes a test that measures different factors, leading to lower inter-item consistency?
Heterogeneity
KR-20 and Coefficient Alpha Formula
Which formula is used to determine the inter-item consistency of dichotomous items?
Kuder-Richardson Formula 20 (KR-20)
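One way to sketch KR-20 is shown below: r = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the proportion passing an item, and q = 1 - p. Using the population variance of total scores is an assumption here, and the data are made up.

```python
from statistics import pvariance


def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items; one list of items per test taker."""
    n_people = len(item_scores)
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(row) for row in item_scores]
    total_variance = pvariance(totals)  # assumption: population variance
    if total_variance == 0:
        raise ValueError("total scores must vary")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people  # proportion passing
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: five test takers, six right/wrong items.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(kr20(scores), 3))
```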
KR-20 and Coefficient Alpha Formula
Which formula is used to assess the internal consistency of tests with non-dichotomous items, such as Likert scales?
Coefficient Alpha
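Coefficient alpha can be sketched as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The example assumes population variances throughout and uses hypothetical 5-point Likert responses.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha; one list of item scores per respondent."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of respondents' total scores.
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, three Likert items rated 1 to 5.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 1, 2],
]
print(round(coefficient_alpha(responses), 3))
```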
KR-20 and Coefficient Alpha Formula
Which formula provides the mean of all possible split-half correlations?
Coefficient Alpha
Reliability Estimates
What is another term for scorer reliability, which measures the consistency between different judges or raters?
Inter-Rater Reliability
Measures of Inter-Scorer Reliability
What does inter-scorer reliability assess in a test?
The degree of agreement between two or more scorers, judges, or raters
What statistical value is used to measure inter-scorer reliability?
Coefficient of Inter-Scorer Reliability
Nature of Tests
What type of test items measure a single ability or trait and have a high degree of internal consistency?
Homogeneous Items
Nature of Tests
What type of test items measure multiple abilities or traits, leading to lower internal consistency estimates?
Heterogeneous Items
Nature of Tests
Which type of test items typically result in a high internal consistency reliability estimate?
Homogeneous Items
Nature of Tests
For which type of test items is test-retest reliability a more appropriate measure than internal consistency?
Heterogeneous Items
Nature of Tests
What type of characteristic is presumed to be ever-changing due to situational and cognitive experiences?
Dynamic Characteristic
Nature of Tests
What type of characteristic is presumed to be relatively unchanging over time?
Static Characteristic
Nature of Tests
A person’s mood, which changes depending on experiences and situations, is an example of what kind of characteristic?
Dynamic Characteristic
Nature of Tests
A person’s fingerprint, which remains consistent throughout life, is an example of what kind of characteristic?
Static Characteristic
Nature of Tests
What occurs when a test limits the variability of scores, potentially underestimating the true relationship between variables?
Restriction of Range
Nature of Tests
What happens when a test expands the variability of scores, potentially exaggerating the true relationship between variables?
Inflation of Range
Nature of Tests
- Homogeneity vs Heterogeneity of test items
- Dynamic vs Static characteristics
- Restriction or inflation of range
- Speed tests versus power tests
- Criterion-referenced tests
it provides an estimate of the amount of error inherent in an observed score or measurement
Standard Error of Measurement (SEM)
Also known as the standard error of a score
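A common formula for the standard error of measurement is SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient. The sketch below uses that formula with hypothetical values; the 1.96 multiplier for an approximate 95% band assumes normally distributed error.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement from score SD and reliability."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate band around an observed score (assumes normal error)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin


# Hypothetical test: SD = 15, reliability = 0.91, observed score = 100.
print(round(standard_error_of_measurement(15, 0.91), 2))  # about 4.5
low, high = score_band(100, 15, 0.91)
print(round(low, 1), round(high, 1))
```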
is a judgment or estimate of how well a test measures what it purports to measure in a particular context
VALIDITY
VALIDITY
Three Categories of Validity:
- Content Validity
- Criterion-Related Validity
- Construct Validity
VALIDITY
Which type of validity is concerned with evaluating the actual content of a test to ensure it represents what it is supposed to measure?
Content Validity
VALIDITY
Which validity category involves comparing test scores with other established measures to determine its accuracy?
Criterion-Related Validity
VALIDITY
Which type of validity focuses on analyzing how test scores relate to a theoretical framework or concept?
Construct Validity
VALIDITY
If a math test is checked to ensure it includes all necessary topics, which type of validity is being assessed?
Content Validity
VALIDITY
If a new depression scale is compared to an existing clinical measure of depression, which type of validity is being examined?
Criterion-Related Validity
VALIDITY
A psychologist analyzes whether a personality test aligns with established personality theories. What type of validity is this?
Construct Validity
VALIDITY
What type of validity refers to how much a test appears to measure what it is supposed to, from the perspective of the test taker?
Face Validity
VALIDITY
What is the term for a judgment about how relevant and appropriate test items seem to be?
Face Validity
VALIDITY
Which type of validity can influence a test taker’s motivation and cooperation based on their perception of the test’s effectiveness?
Face Validity
VALIDITY
What type of validity assesses how well a test represents the entire domain of behavior it is designed to measure?
Content Validity
VALIDITY
What is a method developed by C. H. Lawshe to measure agreement among judges on the importance of test items?
Content Validity Ratio (CVR)
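Lawshe's ratio is usually written CVR = (n_e - N/2) / (N/2), where n_e is the number of judges rating an item "essential" and N is the total number of judges. A minimal sketch with hypothetical panel counts:

```python
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    """Content validity ratio for one item, from -1.0 to 1.0."""
    if n_judges <= 0 or not 0 <= n_essential <= n_judges:
        raise ValueError("need 0 <= n_essential <= n_judges and n_judges > 0")
    half = n_judges / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 judges.
print(content_validity_ratio(9, 10))  # 0.8
print(content_validity_ratio(5, 10))  # 0.0 (exactly half say "essential")
```

A negative value means fewer than half of the judges rated the item essential.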
VALIDITY
What type of validity assesses how well test scores can predict an individual’s performance on a related measure of interest?
Criterion-Related Validity
VALIDITY
What is the standard against which test scores are evaluated in criterion-related validity?
Criterion
VALIDITY
What error occurs when a rater’s knowledge of test scores influences their ratings?
Criterion Contamination
VALIDITY
What are the characteristics of a good criterion?
a. relevant
b. valid
c. uncontaminated
VALIDITY
Two Types of Criterion-Related Validity
a. Concurrent Validity
b. Predictive Validity
VALIDITY
Which type of validity measures the relationship between test scores and a criterion measure obtained at the same time?
Concurrent Validity
VALIDITY
Which type of validity assesses how well a test predicts future performance on a criterion measure?
Predictive Validity
VALIDITY
What is the correlation coefficient that indicates the relationship between test scores and the criterion measure?
Validity Coefficient
VALIDITY
What type of validity evaluates whether test scores accurately represent an abstract concept or theoretical idea?
Construct Validity
VALIDITY
What is an informed, scientific concept developed to describe or explain a behavior, such as motivation or depression?
Construct
VALIDITY
Which type of validity requires formulating hypotheses about how high and low scorers should behave?
Construct Validity
Construct Validity
Evidence of Construct Validity
- Evidence of homogeneity
- Evidence of changes with age
- Evidence of pretest–posttest changes
- Evidence from distinct groups
- Convergent evidence
- Discriminant evidence
Evidence of Construct Validity
What type of evidence supports construct validity by showing that a test measures a single concept?
Evidence of Homogeneity
Evidence of Construct Validity
What type of evidence supports construct validity when test scores change with age as expected?
Evidence of Changes with Age
Evidence of Construct Validity
Which type of evidence is based on changes in test scores after an intervention or over time?
Evidence of Pretest–Posttest Changes
Evidence of Construct Validity
What type of construct validity evidence shows that test scores vary as expected among distinct groups?
Evidence from Distinct Groups
Evidence of Construct Validity
Which type of evidence is shown when test scores correlate well with other measures of the same construct?
Convergent Evidence
Evidence of Construct Validity
What type of evidence demonstrates that a test does not correlate with measures of unrelated constructs?
Discriminant Evidence
TEST BIAS
Different Kinds of Test Bias
- Rating Error
- Halo Effect
- Horn Effect
- Contrast Error
- Recency Bias
TEST BIAS
3 TYPES OF RATING ERROR
a. Leniency error or generosity error
b. Severity error
c. Central tendency error
RATING ERROR
What type of rating error occurs when a rater is overly forgiving in scoring, marking, or grading?
Leniency Error
(Generosity Error)
RATING ERROR
A teacher gives almost all students high grades, even if some did poorly on their tests. What kind of rating error is this?
Leniency Error
(Generosity Error)
RATING ERROR
What type of rating error occurs when a rater is overly harsh in scoring?
Severity Error
RATING ERROR
A supervisor consistently gives employees low performance scores, even if they meet expectations. What kind of rating error is this?
Severity Error
RATING ERROR
What type of rating error happens when a rater avoids extreme ratings and tends to score in the middle of the rating scale?
Central Tendency Error
RATING ERROR
A teacher grades all students between 80-85, even though some deserve a 95 and others a 70. What kind of rating error is this?
Central Tendency Error
TEST BIAS
What is the tendency to give a higher rating than deserved due to the failure to distinguish between different aspects of a person’s behavior?
Halo Effect
TEST BIAS
A manager gives an employee excellent ratings in all areas because they have a friendly personality, even though their actual work performance is average. What kind of bias is this?
Halo Effect
TEST BIAS
What bias occurs when one negative aspect of performance influences all other ratings, resulting in an overall lower score?
Horn Effect
TEST BIAS
A teacher gives a student low marks in all subjects because the student misbehaved in class, even though they perform well academically. What kind of bias is this?
Horn Effect
TEST BIAS
What type of rating error occurs when raters compare individuals to each other instead of evaluating them against performance standards?
Contrast Error
TEST BIAS
A recruiter rates an average candidate poorly because they interviewed right after an exceptional candidate. What type of error is this?
Contrast Error
TEST BIAS
What bias occurs when a leader bases their evaluation on an employee’s most recent performance rather than their overall performance?
Recency Bias
TEST BIAS
A manager rates an employee poorly because they made a mistake last week, even though they performed well throughout the year. What kind of bias is this?
Recency Bias