Test 2 Flashcards
Define Reliability
the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications
reliability coefficient
the results obtained from the statistical evaluation of reliability
define systematic error
when a single source of error always increases or decreases the true score by the same amount
define true score
the amount of the observed score that truly represents what you are intending to measure
define error component
the portion of the observed score produced by other variables that can impact the score (measurement error)
what is internal consistency
an estimate of reliability based on the number of items on the test and the intercorrelations among the items; it compares each item to every other item
- How related the items (or groups of items) on the test are to one another. This is whether knowing how a person answered one item on the test would help you correctly predict how he or she answered another item on the test
what is the benchmark number for internal consistency
.70 or above; a coefficient of .70 implies roughly 70% true-score variance and 30% error
what is item-total correlations
the correlation of each item with the total of the remaining items
define average intercorrelation
the extent to which each item represents an observation of the same thing (the connection between the items)
what is a split half
refers to computing the correlation between scores on the first half of the measurement and scores on the second half
- divide the test into two halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half
what is the odd even method
refers to the correlation between even items and odd items of a measurement tool
advantages and disadvantages of the split half/odd-even method
Advantages:
- simplest method- easy to perform
- time and cost effective
- because you only need one administration
Disadvantages
- many ways of splitting (odd-even, 1st vs 2nd half, random)
- each split yields a somewhat different reliability estimate
- which split yields the real reliability of the test?
what is test-retest reliability
measured by computing the correlation coefficient between the scores from the two administrations
the same test is administered to the same group of people but there is a certain amount of time in between each test administration
what is the benchmark number for test-retest reliability
.50 and above
define practice effects
occurs when test takers benefit from taking the test the first time (practice) which enables them to solve problems more quickly and correctly the second time they do the test
define memory effects
occurs when a respondent recalls answers from the original test, thereby inflating the reliability estimate
what is interrater reliability
- Interrater reliability means that if two different raters score the scale using the scoring rules, they should obtain the same result
how is interrater reliability measured?
measured by the percentage of agreement between raters, or by computing the correlation coefficient between the scores of two raters for the same set of respondents (the raters’ scoring is the source of error)
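The percentage-of-agreement calculation can be sketched in Python; the ratings below are hypothetical.

```python
# Hypothetical ratings by two raters on 10 respondents
rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

# Count the respondents the two raters scored identically
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
pct_agreement = agreements / len(rater_a) * 100
print(pct_agreement)  # 80.0
```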
intrascorer reliability
whether each clinician was consistent in the way he or she assigned scores from test to test
what is the benchmark score for interrater reliability
- Here the criterion of acceptability is pretty high (ex. a correlation of at least .80 or agreement above 75%), but what is considered acceptable will vary from situation to situation
.80 and above
define parallel/alternative forms method
refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
- Both forms of the tests are given to the same person and then you compare the scores
advantages and disadvantages of parallel/alternative forms method
Advantages
- eliminates the problem of memory effect
- reactivity effects (i.e., the experience of taking the test) are also partially controlled
- can address a wider array of sampling of the entire domain than the test-retest method
possible disadvantages
- are the two forms of the test actually measuring the same thing (same construct)
- more expensive and more work, because two measurement tools have to be developed
what is generalizability theory
- theory of measurement that attempts to determine the multiple sources of consistency and inconsistency- known as factors or facets
- Identifies both systematic and random sources of inconsistency, allowing for the evaluation of interactions among different types of error sources
- Looks at all possible sources of errors and then separates each source of error and evaluates its impact on reliability
what are the limitations of generalizability theory
- you cannot measure every single source of error
- tougher to carry out because a lot of the work has to be done up front: deciding what data to collect, how much data to collect, and which measures to use. All these sources of error have to be thought about in advance. With CTT you can administer the test first and then examine the factors that affect reliability.
what is standard error of measurement (SEM)
an estimate of how much the observed test score might differ from the true test score
a statistic used to build a confidence interval around an obtained score. It represents the hypothetical distribution of scores we would see if someone took the test an infinite number of times
how to calculate SEM
SEM = SD × √(1 − reliability coefficient)
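A sketch of the SEM formula in Python, using made-up values (a test with SD = 15 and reliability = .91):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(15, 0.91), 2))  # 4.5
```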
define confidence interval
Gives an estimate of how much error is likely to exist in an individual’s observed score, that is, how big the difference between the individual’s observed score and his or her true score is likely to be
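A common way to build such an interval is observed score ± z × SEM; the sketch below assumes a 95% interval (z = 1.96) and made-up values (observed = 100, SD = 15, reliability = .91).

```python
import math

def confidence_interval(observed, sd, reliability, z=1.96):
    """95% confidence interval (by default) around an observed score via the SEM."""
    sem = sd * math.sqrt(1 - reliability)
    return (observed - z * sem, observed + z * sem)

low, high = confidence_interval(100, 15, 0.91)
print(round(low, 2), round(high, 2))  # 91.18 108.82
```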
what is cronbachs alpha
a coefficient of internal consistency, and the most commonly used one. Works with interval-scale items and determines how interrelated the questions on the scale are. Used for test questions, such as rating scales, that have more than one possible answer
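The standard alpha formula, k/(k−1) × (1 − Σ item variances / total-score variance), can be sketched in a few lines of Python; the ratings below are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])
    item_vars = [pvariance(col) for col in zip(*rows)]     # variance of each item
    total_var = pvariance([sum(row) for row in rows])      # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 4 test takers (rows) x 3 interval-scale items (columns)
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```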
what is Kuder Richardson (KR-20)
used for dichotomous items (e.g., 0 or 1, true or false). Applied when there is either a right or a wrong answer, with only one correct answer per item
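KR-20 is k/(k−1) × (1 − Σ pq / total-score variance), where p is the proportion passing each item and q = 1 − p. A sketch with made-up right/wrong scores:

```python
def kr20(rows):
    """KR-20 for 0/1 items: k/(k-1) * (1 - sum of p*q / total-score variance)."""
    k = len(rows[0])
    n = len(rows)
    # p = proportion passing each item, q = 1 - p
    pq = sum((sum(col) / n) * (1 - sum(col) / n) for col in zip(*rows))
    totals = [sum(row) for row in rows]
    mean = sum(totals) / n
    total_var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return k / (k - 1) * (1 - pq / total_var)

# Hypothetical right/wrong scores: 5 test takers (rows) x 4 items (columns)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
r_kr20 = kr20(scores)
print(round(r_kr20, 3))
```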
what is spearman brown
used in split-half analysis to adjust the reliability coefficient. It is designed to estimate what the reliability would be if the test had not been cut in half
what is cohens kappa
a measure of interrater reliability for categorical ratings that corrects for chance agreement
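Kappa is (observed agreement − chance agreement) / (1 − chance agreement). A sketch with hypothetical ratings from two raters:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of respondents both raters scored identically
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected (chance) agreement from each rater's marginal proportions
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical categorical ratings on 10 respondents
rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

Note that kappa here (about .70) is lower than the raw 80% agreement on the same data, because kappa removes the agreement expected by chance alone.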