Chapter 5: Reliability Flashcards
this term refers to the consistency of a measurement; it indicates whether a test yields stable and consistent results over time and across different contexts
reliability
what is reliability determined by?
Reliability is determined by the proportion of total variance in test scores that can be attributed to true variance (i.e., actual differences in the ability being measured). The higher this proportion, the more reliable the test.
this type of variance represents actual differences in ability.
For example, differences in math skills among friends taking a test reflect ______ ______
true variance
this term refers to an index that quantifies reliability, expressed as the ratio of true score variance to total variance.
reliability coefficient
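As a worked equation (standard classical-test-theory notation; the symbols are a gloss, not from the cards):

```latex
r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}}
       = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}
```

For instance, a test whose total score variance is 50, of which 40 is true variance, has r_xx = 40/50 = .80.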
this type of variance represents variability due to irrelevant or random factors, such as distractions or fatigue, that affect scores even when the true ability is constant
Error Variance
this term encompasses all factors affecting the measurement process that are not related to the variable being assessed
Measurement Error
this type of measurement error is caused by unpredictable fluctuations, resulting in inconsistencies without a discernible pattern
This type of error can lead to score variability without biasing the results systematically.
Random Error
this type of measurement error refers to a consistent error that may affect scores in a predictable manner, making it possible to identify and correct
This type of error does not compromise the consistency of scores
Systematic Error
this source of error variance is seen during test construction and arises from differences among test items, both within a single test and across different tests
Item Sampling/Content Sampling
what are the three main sources of error variance?
- Test Construction
- Test Administration
- Test Scoring and Interpretation
this source of error variance refers to the conditions under which a test is administered, which can impact performance
Test Environment
this source of error variance refers to subjectivity in scoring, technical issues, or glitches that can lead to inconsistent results
Scorers and Scoring Systems
this source of error variance refers to factors such as emotional distress, physical discomfort, lack of sleep, or the influence of drugs and medication that can affect the test-taker’s performance and focus
Test-Taker Variables
this source of error variance refers to the physical appearance and demeanor of the examiner, their presence or absence, and their level of professionalism, all of which can influence the test-taker’s experience
Examiner-Related Variables
this source of error variance arises when the sample does not accurately represent the broader population.
Discrepancies in demographics, political affiliation, or other relevant factors can lead to biased results.
Sampling Error
this type of reliability estimate measures the consistency of results between different versions of a test designed to assess the same construct
it assesses how well two different forms of a test yield similar scores when administered to the same individuals.
Parallel-Forms and Alternate-Forms Reliability Estimates
this source of error refers to issues such as ambiguous wording in questionnaires or biased items that can skew results, favoring one response or candidate over another
Methodological Error
this type of reliability estimate is defined as a method for estimating the reliability of a measuring instrument by administering the same test to the same group at two different points in time
Test-Retest Reliability Estimate
when do we apply test-retest reliability estimate?
it is best used for measuring stable traits (e.g., personality) rather than fluctuating characteristics
this term refers to different versions of a test that are designed to be equivalent but may not meet the strict criteria of parallel forms
Alternate Forms
this term refers to two forms of a test that are statistically equivalent in terms of means and variances
Parallel Forms
what is the index for parallel and alternate-forms reliability?
coefficient of equivalence
what is the method for test-retest reliability?
correlate scores from the same individuals across two test administrations.
what is the index for test-retest reliability?
coefficient of stability
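A minimal sketch of that method in Python (the scores and variable names are invented for illustration):

```python
import numpy as np

# Invented scores for the same ten test-takers at two administrations.
time_1 = np.array([12, 15, 11, 18, 14, 16, 10, 17, 13, 15])
time_2 = np.array([13, 14, 12, 19, 13, 17, 11, 16, 14, 16])

# The coefficient of stability is the Pearson correlation
# between the first and second administrations.
coefficient_of_stability = np.corrcoef(time_1, time_2)[0, 1]
print(f"coefficient of stability = {coefficient_of_stability:.2f}")
```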
what are the reliability estimates categorized as external consistency estimates?
test-retest reliability
parallel-forms and alternate-forms reliability estimates
what are the reliability estimates categorized as internal consistency estimates?
internal consistency reliability
split-half reliability
this type of reliability can be assessed without creating alternate forms or re-administering tests; it involves evaluating the consistency of test items
Internal Consistency Reliability
this type of reliability estimate is obtained by correlating scores from two equivalent halves of a single test
Split-Half Reliability
what are the acceptable splitting methods for split-half reliability?
- Randomly assigning items to halves
- Using odd and even item assignments (odd-even reliability)
- Dividing by content to ensure both halves measure equivalent constructs
what are the steps in conducting split-half reliability?
- Divide the test into two equivalent halves.
- Calculate the Pearson correlation between the two halves.
- Adjust the reliability upward using the Spearman-Brown formula (sketched in code below).
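A minimal Python sketch of those three steps, using an odd-even split; the data and array names are invented for illustration:

```python
import numpy as np

# Invented dichotomous item scores: rows = test-takers, columns = items.
scores = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

# Step 1: split the test into halves with an odd-even item assignment.
half_a = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
half_b = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

# Step 2: Pearson correlation between the two half-test scores.
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Step 3: Spearman-Brown correction estimates full-length reliability:
#   r_sb = 2 * r_half / (1 + r_half)
r_sb = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown adjusted = {r_sb:.2f}")
```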
what are the methods for estimating internal consistency reliability?
- KR-20
- KR-21
- Cronbach’s Alpha
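As worked equations (standard forms; k = number of items, p_i = proportion of test-takers passing item i, q_i = 1 − p_i, σ² = variance of total test scores, M = mean total score, σ_i² = variance of item i):

```latex
\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i} p_i q_i}{\sigma^2}\right), \qquad
\text{KR-21} = \frac{k}{k-1}\left(1 - \frac{M(k - M)}{k\,\sigma^2}\right), \qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i} \sigma_i^2}{\sigma^2}\right)
```

KR-21 is a shortcut that additionally assumes all items are of equal difficulty; Cronbach's alpha generalizes KR-20 to items that are not scored dichotomously.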
when is KR-20 appropriate?
when it is a true dichotomous test (true or false)
when is KR-21 appropriate?
when it is an artificial dichotomy (multiple choice, binary scoring)
this reliability estimate measures the degree of difference between item scores rather than their similarity; it is calculated from the absolute differences between item scores and is less affected by the number of items on a test
Average Proportional Distance
when is Cronbach’s alpha appropriate?
when items have more than two possible responses (e.g., Likert-scale items)
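A minimal sketch of the alpha computation in Python (the function name and the Likert responses are invented for illustration):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 5-point Likert responses: rows = respondents, columns = items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```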
this reliability estimate measures the correlation among all items on a scale from a single test administration, assessing the homogeneity of the test
Inter-Item consistency
when is inter-scorer reliability used?
Frequently used in coding nonverbal behavior.
For example, a researcher may create a checklist of behaviors (like looking downward or moving slowly) to quantify aspects of nonverbal cues indicating depressed mood.
this term refers to the degree of agreement or consistency between two or more scorers regarding a particular measure
Inter-Scorer Reliability
how is inter-scorer reliability calculated?
it is calculated using a correlation coefficient, referred to as the coefficient of inter-scorer reliability
what should the reliability coefficient be for a test to be considered highly reliable?
Considered excellent (grade A); crucial for high-stakes decisions.
0.90s
what should the reliability coefficient be for a test to be considered moderately reliable?
0.80s
what should the reliability coefficient be for a test to be considered to have low reliability?
Weak, indicating potential issues with the test’s effectiveness.
0.65-0.70s
what should the reliability coefficient be for a test to be considered unacceptable?
below 0.50
when is a test considered homogeneous?
A test is considered homogeneous if it is functionally uniform, measuring a single factor (e.g., one ability or trait).
these tests are designed to indicate how a test-taker performs relative to a specific criterion or standard (e.g., educational or vocational objectives)
they focus on measuring whether test-takers meet predetermined criteria rather than comparing their scores to those of others
criterion-referenced tests
when is a test considered heterogeneous?
A test is heterogeneous if it measures multiple factors or traits.
In such cases, internal consistency estimates may be lower, whereas test-retest reliability might provide a more appropriate measure of reliability.
what reliability estimates are appropriate for static characteristics?
test-retest or alternate-forms methods
these tests consist of items of uniform (typically low) difficulty, such that with generous time limits all test-takers could complete every item correctly; in practice, the time limit is set so that few, if any, test-takers can finish
Speed Tests
these tests are administered with a time limit long enough for test-takers to attempt all items
they contain difficult items, with the expectation that no test-taker can achieve a perfect score
Power Tests
this theory, also known as the true score model, is the most widely used model of measurement in psychology due to its simplicity relative to more complex models
Classical Test Theory
this term represents the value that genuinely reflects an individual’s ability or trait level as measured by a particular test.
This value is highly dependent on the specific test used
True Score
this theory posits that a test’s reliability is determined by how accurately the test score reflects the domain from which it samples
Domain Sampling Theory
this theory suggests that test scores can vary across different testing situations due to various situational factors
Generalizability Theory
this term refers to the complete range of items that could measure a specific behavior, viewed as a hypothetical construct
Domain of Behavior
this term assesses how well scores from a specific test can be generalized across different contexts.
Coefficients of generalizability represent the influence of particular facets on test scores.
Generalizability Study
this term evaluates the utility of test scores in assisting users in making informed decisions
Decision Study
this theory is an alternative to CTT that models the probability of an individual with a certain level of ability performing at a specific level
Often referred to as latent-trait theory because it measures constructs that are not directly observable (latent).
Item Response Theory (IRT)
what does discrimination mean for IRT?
discrimination measures the extent to which an item can differentiate between individuals with higher or lower levels of the trait or ability being assessed.
what are the key concepts in IRT?
Difficulty and Discrimination
what does difficulty mean for IRT?
The attribute of an item indicating how challenging it is to accomplish, solve, or comprehend.
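The cards do not name a specific IRT model, but a common formalization of these two concepts is the two-parameter logistic (2PL) model, in which the probability of a correct response depends on ability θ, item difficulty b, and item discrimination a:

```latex
P(X = 1 \mid \theta) = \frac{1}{1 + e^{-a(\theta - b)}}
```

Here b shifts the curve along the ability scale (harder items require higher θ for the same probability of success), and a controls its steepness (how sharply the item separates lower from higher ability levels).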
what are the two types of test items?
- Dichotomous Test Items
- Polytomous Test Items
this type of test item has two possible responses, such as true/false or yes/no
Dichotomous Test Items
this type of test item has three or more possible responses, where only one response is correct or aligned with the targeted trait or construct
Polytomous Test Items
this term refers to the range or band of test scores that is likely to contain the true score
confidence interval
this statistical tool is used to estimate or infer the extent to which an observed score deviates from a true score
standard error of measurement
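As a worked equation (standard form; σ is the standard deviation of the test scores and r_xx the test's reliability coefficient):

```latex
\mathrm{SEM} = \sigma \sqrt{1 - r_{xx}}
```

For example (numbers invented), a test with σ = 10 and r_xx = .84 has SEM = 10 × √.16 = 4, so an observed score of 50 yields an approximate 95% confidence interval of 50 ± 1.96 × 4, roughly 42 to 58. This is the kind of score band described in the confidence-interval card above.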
this statistical measure can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant
standard error of the difference
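As a worked equation (standard form; SEM₁ and SEM₂ are the standard errors of measurement of the two scores being compared, and the second equality assumes both scores are on scales with the same standard deviation σ and reliabilities r₁ and r₂):

```latex
\sigma_{\text{diff}} = \sqrt{\mathrm{SEM}_1^2 + \mathrm{SEM}_2^2}
                     = \sigma \sqrt{2 - r_1 - r_2}
```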
how does restricted variance influence correlation coefficients?
it leads to lower correlation coefficients since the diversity of scores is limited
how does inflated variance influence correlation coefficients?
it results in higher correlation coefficients due to a broader spread of scores.