Chapter 5 Reliability Flashcards
It refers to consistency in measurement; something that is consistent is not necessarily good or bad, but simply consistent
Reliability
an index of reliability, a proportion that indicates the ratio between true score variance on a test and the total variance.
Reliability Coefficient
An index describing the consistency of scores across contexts
Reliability Coefficient
This theory states that a score on an ability test is presumed to reflect not only the test taker’s true score but also the error
Classical Test Theory
the portion of the observed score that reflects the test taker's actual ability, characteristics, and behavior
True score
the component of the observed test score that does not have to do with the test taker’s ability
Error
It is a statistic useful in describing sources of test score variability. It is useful because it can be broken down into components
Variance
What are the two components of variance?
True variance and Error Variance
It is a variance from true differences
true variance
Variance from irrelevant various sources
error variance
The greater the proportion of the total variance attributed to true variance, the more ____ the test is
reliable
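The variance decomposition in the cards above can be sketched numerically. A minimal sketch, with made-up variance figures (not from the text):

```python
# Hypothetical variance components (illustrative figures, not from the text).
true_variance = 80.0    # variance from true differences in the trait
error_variance = 20.0   # variance from irrelevant sources ("noise")
total_variance = true_variance + error_variance

# Reliability coefficient: proportion of total variance that is true variance.
reliability = true_variance / total_variance
print(reliability)  # 0.8 -> 80% of score variance reflects true differences
```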
all factors associated with the process of measuring some variable, other than the variable being measured
measurement error
It is a source of error caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. It is also called “noise”
Random error
A type of error that is typically constant, or proportionate to what is presumed to be the true value of the variable being measured
systematic error
True or False. Systematic error can be fixed once it is discovered.
TRUE
True or False. Systematic error affects score consistency.
FALSE
According to this theory, we can estimate the true score by finding the mean of the observed scores from repeated administrations
Basic Sampling theory
Which source of error is this situation? The Extraversion personality test constructed by the students of Ms. Salinas has variation among items within a test and variation among items between tests
Test construction under item/content sampling
What are the three sources of error under test administration?
Test-environment, test-taker variables, and examiner-related-variables
A source of error in which the sample does represent the population, but the sample size is not large enough
Sampling error
A source of measurement error arising from the variability inherent in test scores as a function of their being obtained at one point in time rather than another. The same test given at different points in time may produce different scores, even from the same test takers; even tests of relatively stable traits or behaviors may be prone to this error
Time Sampling
A source of measurement error that results from selecting test items that inadequately cover the content area the test is supposed to evaluate
Item/content sampling
A source of measurement error that is concerned with the intercorrelations between items in a test.
If the test is designed to measure a single construct and all items are equally good candidates to measure that attribute, then there should be a high correspondence among items
Inter item inconsistency
A source of measurement error in which different judges observing the same event may record different numbers
Observer Differences
A reliability estimate obtained by correlating pairs of scores from the SAME people who are administered the SAME test at two DIFFERENT times.
Test-Retest Reliability Estimate
What is the effect of time sampling error on scores in test-retest reliability?
scores are likely to fluctuate as a result of time sampling error
longer interval = lower correlation
What is the ideal interval of re-administering a test in test-retest reliability?
The ideal interval between tests is 2-4 weeks
What are the statistical procedures used in test-retest reliability?
statistics: Pearson r -> interval and ratio
Spearman rho -> ordinal
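As a sketch of the Pearson r computation for test-retest data: the scores below are made up for five test takers, and `pearson_r` is a hypothetical helper written from the standard product-moment formula (not from the text):

```python
import statistics

# Hypothetical interval-level scores: same five test takers, two administrations.
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]

def pearson_r(x, y):
    """Pearson product-moment correlation for interval/ratio scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# The test-retest reliability estimate is the correlation between the two sets.
print(round(pearson_r(time1, time2), 3))  # 0.962
```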
What intervening factors may occur if we do not follow the 2-4 week interval when re-administering a test in test-retest reliability?
carry over effect
Practice effect
Mortality
Changes in Participants
Combination of all these factors
Explain the carry-over effect in test-retest reliability
Occurs when the first testing session affects the second (e.g., the test taker remembers test items and may review them between sessions)
Only a concern when it is random or unpredictable and affects only some respondents
If it is a systematic error that affects all respondents, then reliability is not affected
Explain the practice effect in test-retest reliability
Occurs when test takers score better because they have sharpened their abilities with the passing of time (development shows plasticity)
Explain the coefficient of stability
When the interval between administrations is very long (e.g., 6 months), test takers no longer remember much of the test; the resulting test-retest estimate is called the coefficient of stability
Explain Mortality in test-retest reliability
Test takers dropping out of the study.
Explain changes in participants in test-retest reliability
Non-normative changes and normative history-graded influences
It is the degree of the relationship between various forms of a test, which can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability
Coefficient of equivalence
In this reliability estimate, the means and variances of observed test scores are equal across forms
Parallel forms
In theory, the mean scores obtained on ______ correlate equally with true score and with other measures
Parallel Forms
A reliability estimate based on different versions of a test that have been constructed in an attempt to be parallel. They are merely different versions, without the equality of observed-score means and variances required of parallel forms
Alternate Forms
They are typically designed to be equivalent with respect to variables such as content and level of difficulty
Alternate Forms
How are reliability estimates obtained in parallel/alternate forms?
- two administrations with the same group are required (Form A at one point and Form B at the other)
- the equivalent form is administered either immediately or fairly soon after
- after administration, a correlation coefficient is obtained between the results of the two forms
What statistical procedures can we use in alternate or parallel forms?
- statistics Pearson r for scale measurements (interval/ratio)
- Spearman rho for ordinal
Source of error in immediate Alternate/Parallel forms
content sampling
Source of error in delayed Alternate/Parallel forms
content sampling and time sampling
True or False. Alternate forms are INDEPENDENTLY CONSTRUCTED tests.
True
What should alternate and parallel forms have in common?
- the same number of items
- items expressed in the same form
- items that cover the same type of content
- same difficulty
- same instructions
-same time limits, format, and all other aspects of the test
degree of correlation among all the items on a scale calculated from a single administration of a single form of a test
Internal Consistency
Explain the term “assess homogeneity” in Internal consistency
extent to which items measure a single trait
True or False. It is possible for heterogeneous items to form a homogeneous test.
True
Explain why it is possible for heterogeneous items to form a homogeneous test
It is still possible even if the items are heterogeneous, as long as the items are measured per subscale.
A test with subscales is still homogeneous as long as each subscale measures only a single construct.
In this reliability estimate, two scores are obtained for each person by dividing the test into equivalent halves
Split-Half Reliability Estimate
What are the steps of Split-Half Reliability Estimate?
- divide the test into two equivalent halves
- calculate Pearson r between scores on the two halves of the test
- adjust half-test reliability using the Spearman-Brown formula
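The three steps above can be sketched in Python using an odd-even split. All item scores below are made up for illustration, and `pearson_r` is a hypothetical helper (not from the text):

```python
import statistics

# Rows = test takers, columns = six hypothetical items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
]

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Step 1: divide the test into two equivalent halves (odd-even split).
odd_half = [sum(row[0::2]) for row in scores]
even_half = [sum(row[1::2]) for row in scores]

# Step 2: calculate Pearson r between the two half-scores.
r_half = pearson_r(odd_half, even_half)

# Step 3: adjust the half-test reliability with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 3), round(r_full, 2))  # 0.923 0.96
```

Note that the adjusted coefficient (0.96) is higher than the raw half-test correlation (0.923), reflecting the longer full-length test.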
What are the acceptable ways to divide the test in Split Half Reliability?
- random assignment
- odd-even
- dividing the test by content so that each half contains equivalent items with respect to content and difficulty
Explain the disadvantage of split-half reliability
The reliability of a test is directly related to its length, so when you use split-half the correlation between the two half-scores may fall below .70. The Spearman-Brown formula is applied to correct for this. Rule of thumb: the greater the number of items, the higher the reliability
What is the Spearman-Brown formula?
- it is used in split-half reliability to correct the half-test correlation
- estimates the internal consistency reliability from two halves of a test
- can be used to estimate the reliability of a test once it is shortened or lengthened
What are the other functions of the Spearman-Brown formula?
- it can determine the number of items needed to attain a desired level of reliability
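A minimal sketch of the general Spearman-Brown prophecy formula, r_new = n·r / (1 + (n-1)·r), and of the same formula solved for the length factor n; the function names and the .60/.80 figures are illustrative assumptions, not from the text:

```python
def spearman_brown(r_original, n):
    """Predicted reliability when test length is multiplied by a factor n."""
    return (n * r_original) / (1 + (n - 1) * r_original)

def length_factor_needed(r_original, r_desired):
    """Factor by which to lengthen a test to reach a desired reliability
    (the Spearman-Brown formula solved for n)."""
    return (r_desired * (1 - r_original)) / (r_original * (1 - r_desired))

# Hypothetical: doubling a test whose reliability is .60.
print(round(spearman_brown(0.60, 2), 2))           # 0.75
# Factor needed to raise reliability from .60 to .80:
print(round(length_factor_needed(0.60, 0.80), 2))  # 2.67
```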
When adding new items using the Spearman-Brown formula, what factors should be considered?
The new items must be equivalent in content and difficulty, so that the lengthened test still measures what the original test measures
If reliability is low in the context of split-half and Spearman-Brown, what can we do?
- abandon the instrument
- locate or develop a suitable alternative
- create new items, clarify the test instructions, or simplify the scoring rules
used for non-dichotomous items (tests with no wrong answers, such as personality tests)
Cronbach’s Coefficient Alpha
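Coefficient alpha can be sketched from its standard formula, alpha = k/(k-1) · (1 - sum of item variances / total score variance). The Likert-type responses below are made up for illustration:

```python
import statistics

# Hypothetical Likert-type responses (rows = test takers, columns = items).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
]

def cronbach_alpha(rows):
    """Cronbach's alpha from a single administration of a k-item test."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # one tuple of scores per item
    item_vars = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(cronbach_alpha(scores), 3))  # 0.923
```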
True or False. It is appropriate to use split-half reliability for heterogeneous tests and speed tests.
False
According to Streiner, an alpha value above ____ may be too high and indicate redundant items
.90
Cronbach’s alpha will be higher when a measure has more than ___items
25
used for tests with dichotomous items of varying difficulty
Tests with answer choices and only one correct answer per item, where the level of difficulty differs from item to item
KR-20 (Kuder-Richardson formula 20)
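KR-20 can be sketched from its standard formula, KR-20 = k/(k-1) · (1 - sum of p·q / total score variance), where p is the proportion passing each item. The right/wrong responses below are made up for illustration:

```python
import statistics

# Hypothetical dichotomous responses (rows = test takers, columns = items).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def kr20(rows):
    """Kuder-Richardson formula 20 for dichotomously scored items."""
    k = len(rows[0])                      # number of items
    n = len(rows)                         # number of test takers
    pq = 0.0
    for col in zip(*rows):
        p = sum(col) / n                  # proportion answering the item correctly
        pq += p * (1 - p)
    total_var = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq / total_var)

print(round(kr20(scores), 2))  # 0.8
```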
Who are the proponents of the KR-20 (Kuder-Richardson formula 20)?
Kuder and Richardson
used for tests with dichotomous items with same difficulty or average 50% difficulty
KR-21
A relatively new measure for evaluating the internal consistency of a test. It focuses on the degree of difference that exists between item scores
Average Proportional Distance (APD)
Interpretation of the APD
.2 or lower: excellent internal consistency
.25 to .2: acceptable range
What does an APD above .25 mean?
It suggests a problem with the internal consistency of the test
What is one potential advantage of the APD over Cronbach's alpha?
APD is not connected to the number of items on a measure
A reliability estimate that focuses on the degree of agreement, or consistency, between two or more raters with regard to a particular measure
Inter-rater reliability
Inter-rater reliability is often used in _____
evaluating non-verbal behavior
What is the source of error in inter-rater reliability?
Differences between raters in how they assess/score
What are the statistical procedures used in inter-rater reliability?
Cohen's kappa: 2 raters
Fleiss' kappa: 3 or more raters
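A minimal sketch of Cohen's kappa for two raters, kappa = (p_observed - p_chance) / (1 - p_chance); the ratings below are made-up categorical judgments ("y"/"n") over ten observations:

```python
from collections import Counter

# Hypothetical ratings from two observers over ten observations.
rater1 = ["y", "y", "n", "y", "n", "y", "y", "n", "y", "n"]
rater2 = ["y", "n", "n", "y", "n", "y", "y", "n", "y", "y"]

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    ca, cb = Counter(a), Counter(b)
    p_chance = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_observed - p_chance) / (1 - p_chance)

print(round(cohens_kappa(rater1, rater2), 3))  # 0.583
```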
If the test is designed for use at various times or measures stable traits (e.g., employee performance, personality traits), then what reliability estimate should we use?
Test-retest reliability
If the test is for a single administration only, then what reliability estimate should we use?
Internal Consistency
____ is a source of error attributable to variations in the test taker's feelings, moods, or mental state over time
Transient Errors