Reliability Flashcards

1
Q

TEST-RETEST

A

We consider the consistency of the test results when the test is administered on different occasions

only applies to stable traits

2
Q

Sources of difference between test and retest?

A

Systematic carryover - everyone's score improves by the same number of points - does not harm reliability

Random carryover - changes are not predictable from earlier scores, or something affects some but not all test takers

Practice effects - skills improve with practice (e.g., taking the same midterm exam twice - you would expect to do better the second time)

3
Q

Time before re-administration must be carefully evaluated

A

Short interval: carryover and practice effects

Long interval: a low correlation may reflect poor reliability, real change in the characteristic with age, or a combination of the two

4
Q

Well-evaluated test: test-retest

A

A well-evaluated test reports many retest correlations for different time intervals between testing sessions; events occurring between sessions should also be considered

5
Q

PARALLEL FORMS

A

We evaluate consistency across different forms of the same test

use different items; however, the rules used to select items of a particular difficulty level are the same.

Give two different forms to the same person (same day), calculate the correlation
Reduces learning effect

CON: not always practical - hard to come up with two forms that you expect to behave identically

6
Q

SPLIT HALF/Internal Consistency

A

Administer the whole test - split it in half and calculate the correlation between halves

If items get progressively more difficult, use an odd-even split

CON: how do you decide which halves? (e.g., on a midterm you don't expect all questions to behave the same)

7
Q

SPLIT HALF: Spearman-Brown Correction

A

Allows you to estimate what the correlation between the two halves would have been if each half had been the length of the whole test:

corrected r = 2r / (1 + r)

Corrected r = the estimated correlation between the two halves of the test if each had the total number of items

increases the estimate of reliability

r = the correlation between the two halves of the test

Assumes the variances of the two halves are similar
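
A minimal Python sketch of the correction (the example correlation is made up):

```python
def spearman_brown(r_half: float) -> float:
    """Estimate full-length reliability from the correlation between the two halves."""
    return (2 * r_half) / (1 + r_half)

# Example: the two halves correlate at .70
print(spearman_brown(0.70))  # ~0.82 - the corrected estimate is higher than r
```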

8
Q

SPLIT HALF: Cronbach’s Alpha

A

The coefficient alpha used for estimating split-half reliability when the variances of the two halves are unequal

Provides the lowest boundary (lower-bound) estimate of reliability

α = 2[σ²x − (σ²y1 + σ²y2)] / σ²x

α = the coefficient alpha for estimating split-half reliability
σ²x = the variance for scores on the whole test
σ²y1, σ²y2 = the variances for the two separate halves of the test
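
A minimal Python sketch of this split-half alpha, assuming you have each person's two half-test scores (the data are made up):

```python
import numpy as np

def split_half_alpha(half1, half2):
    """Coefficient alpha from two half-test scores (allows unequal half variances)."""
    half1, half2 = np.asarray(half1, float), np.asarray(half2, float)
    total = half1 + half2                                    # whole-test score
    var_total = total.var(ddof=1)                            # sigma^2_x
    var_halves = half1.var(ddof=1) + half2.var(ddof=1)       # sigma^2_y1 + sigma^2_y2
    return 2 * (var_total - var_halves) / var_total

# Hypothetical half-test scores for five people
print(split_half_alpha([10, 12, 9, 14, 11], [11, 13, 8, 15, 10]))
```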

9
Q

SPLIT HALF: KR20 formula

A

Reliability estimate that uses mathematics to solve the problem of all possible split halves at once

KR20 = [N / (N − 1)] × [(S² − Σpq) / S²]

N = the number of items on the test
S² = the variance of the total test scores
p = the proportion of people getting each item correct (found separately for each item)
q = the proportion of people getting each item incorrect; for each item, q = 1 − p
Σpq = the sum of the products p × q for each item on the test

to have nonzero reliability, the variance for the total test score must be greater than the sum of the variances for the individual items.
This will happen only when the items are measuring the same trait.

The total test score variance is the sum of the item variances and the covariances between items
The only situation that will make the sum of the item variances less than the total test-score variance is when there is covariance between the items

The greater the covariance among items, the smaller the Σpq term will be relative to the total test-score variance.
When the items covary, they can be assumed to measure the same general trait, and the reliability of the test will be high.
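
A minimal Python sketch of the KR20 computation on a made-up matrix of 0/1 item scores:

```python
import numpy as np

def kr20(items):
    """KR-20 from a people x items matrix of 0/1 (incorrect/correct) scores."""
    items = np.asarray(items, float)
    n_items = items.shape[1]
    p = items.mean(axis=0)                        # proportion correct per item
    q = 1 - p                                     # proportion incorrect per item
    total_var = items.sum(axis=1).var(ddof=1)     # S^2, variance of total scores
    return (n_items / (n_items - 1)) * (total_var - np.sum(p * q)) / total_var

# Hypothetical responses: 5 people x 4 dichotomous items
scores = [[1, 1, 0, 1],
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 1],
          [1, 1, 1, 0]]
print(kr20(scores))
```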

10
Q

SPLIT HALF: KR20 formula - cons

A
Applies only to items that are scored dichotomously (right/wrong); it cannot be used when there is no single correct answer (e.g., Likert-type items)
11
Q

SPLIT HALF: KR21 Formula

A

Similar to the KR20 - a simplified version

Does not require the calculation of p and q for every item; instead, the KR21 uses an approximation of the sum of the pq products based on the mean test score

Assumptions need to be met:
most important is that all the items are of equal difficulty, or that the average difficulty level is 50%.

Difficulty is defined as the percentage of test takers who pass the item. In practice, these assumptions are rarely met, and it is usually found that the KR21 formula underestimates the split-half reliability
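
A minimal Python sketch, assuming the usual KR21 approximation built from these quantities (numbers are made up):

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """KR-21 approximation: needs only the number of items, the mean, and the
    variance of total scores (assumes items are of roughly equal difficulty)."""
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * variance))

# Hypothetical 20-item test with mean total score 14 and total-score variance 16
print(kr21(20, 14.0, 16.0))  # ~0.78
```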

12
Q

SPLIT HALF: Coefficient Alpha

A

Compares the variance of the individual items with the variance of the total test score

Used for tests where there is no single correct answer (e.g., Likert-scale items)

Similar to the KR20, except that Σpq is replaced by Σs²ᵢ, the sum of the variances of the individual items:

α = [N / (N − 1)] × [(S² − Σs²ᵢ) / S²]
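
A minimal Python sketch of coefficient alpha on made-up Likert-type responses:

```python
import numpy as np

def coefficient_alpha(items):
    """Cronbach's alpha from a people x items matrix (items need not be 0/1)."""
    items = np.asarray(items, float)
    n_items = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()      # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (n_items / (n_items - 1)) * (total_var - item_vars) / total_var

# Hypothetical Likert responses (1-5): 5 people x 3 items
ratings = [[4, 5, 4],
           [2, 2, 3],
           [5, 4, 5],
           [3, 3, 2],
           [4, 4, 4]]
print(coefficient_alpha(ratings))
```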

13
Q

Factor Analysis

A

Can be used to divide the items into subgroups that are each internally consistent; the subgroups of items will not be related to one another

Helps a test constructor build a test that has submeasures for several different traits

14
Q

Classical test theory - why testers are turning away from it

A
  1. Requires that exactly the same test be administered to each person
  2. Some items are too easy and some are too hard - so few of the items concentrate on a person's exact ability level
  3. Assumes behavioral dispositions are constant over time
15
Q

Item Response Theory

A

The basis of computer adaptive tests; focuses on the range of item difficulty that best assesses an individual's ability level.

Testers are turning away from classical test theory toward IRT for a variety of reasons.

In IRT, the computer focuses on the range of item difficulty that helps assess the individual's ability level. For example, if the person gets several easy items correct, the computer might quickly move to more difficult items.

A more reliable estimate of ability is obtained using a shorter test with fewer items.

16
Q

Item Response Theory - Difficulties

A

1. The method requires a bank of items that have been systematically evaluated for level of difficulty

2. Considerable effort must go into test development, and complex computer software is required.

17
Q

Reliability of a Difference Score

A

When might we want a difference score? e.g., the difference between performance at two points in time, such as before and after a training program

In a difference score, E is expected to be larger than either the observed score or T because E absorbs error from both of the scores used to create the difference score.

T might be expected to be smaller than E because whatever is common to both measures is canceled out when the difference score is created

18
Q

The low reliability of a difference score should concern

A

The low reliability of a difference score should concern the practicing psychologist and education researcher. Because of their poor reliabilities, difference scores cannot be depended on for interpreting patterns.

19
Q

Interrater Reliability
Kappa statistic

A

introduced by J. Cohen (1960) as a measure of agreement between two judges who each rate a set of objects using nominal scales.

Kappa indicates the actual agreement as a proportion of the potential agreement following correction for chance agreement.

Values of kappa may vary between 1 (perfect agreement) and −1 (less agreement than can be expected on the basis of chance alone).
Greater than .75 = excellent
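
A minimal Python sketch of kappa for two hypothetical judges rating the same cases:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    observed = np.mean(r1 == r2)                                   # actual agreement
    expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)  # chance
    return (observed - expected) / (1 - expected)

# Hypothetical nominal ratings of eight cases by two judges
print(cohens_kappa(["yes", "no", "yes", "yes", "no", "yes", "no", "no"],
                   ["yes", "no", "yes", "no", "no", "yes", "yes", "no"]))  # 0.5
```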

20
Q

Interrater Reliability - Nominal scores

A

−1 = less agreement than expected by chance alone
1 = perfect agreement
Greater than .75 = excellent
.40 to .75 = fair to good
Less than .40 = poor

21
Q

Sources of Error
Time Sampling issues

A

e.g., a state-anxiety score may differ when the test is given a week later simply because the person's state has changed

This source of error is typically assessed using the test-retest method

22
Q

Sources of Error - Item sampling

A

Some items may behave strangely; the same construct or attribute may be assessed using a wide pool of items.

Typically, two forms of a test are created by randomly sampling items from a large pool of items believed to assess a particular construct.

The correlation between the two forms is used as an estimate of this type of reliability

23
Q

Sources of Error - Internal Consistency

A

we examine how people perform on similar subsets of items selected from the same form of the measure
intercorrelations among items within the same test

If the test is designed to measure a single construct and all items are equally good candidates to measure that attribute, then there should be a high correspondence among the items.

24
Q

determine extent of internal consistency error by

A

evaluated using split-half reliability, the KR20 method, or coefficient alpha

25
Q

Observer Differences - sources of error

A

Error introduced by the observer (e.g., an untrained observer); reduced by training observers and reconciling independent observations

Even though they have the same instructions, different judges observing the same event may record different numbers.

To determine the extent of this type of error, researchers can use an adjusted index of agreement such as the kappa statistic.

26
Q

Standard error of measurement

A

Because we usually assume that the distribution of random errors will be the same for all people, classical test theory uses the standard deviation of errors as the basic measure of error.

tells us, on average, how much a score varies from the true score.

In practice, the standard deviation of the observed scores and the reliability of the test are used to estimate the standard error of measurement:

SEM = SD × √(1 − r)

Not the same as the standard error of the mean; it incorporates information about the reliability of the test.
Use this measure to construct a confidence interval around a specific observed score:

95% CI = observed score ± 1.96 × SEM

The bounds are placed around the observed score; if X is the observed score, we can be 95% confident that the true score falls within those boundaries.
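
A minimal Python sketch with made-up test values:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Confidence interval around an observed score (z = 1.96 gives ~95%)."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical test: SD = 15, reliability = .90, observed score = 110
print(sem(15, 0.90))                       # ~4.74
print(confidence_interval(110, 15, 0.90))  # ~(100.7, 119.3)
```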

27
Q

How much reliability is good enough?

A

Depends on what you are using it for

High-stakes consequences - need to have a good idea of how reliable it is

Reliability estimates in range of .7-.8 are good enough for most purposes in basic research - Some people have argued that it would be a waste of time and effort to refine research instruments beyond a reliability of .90.
In fact, it has even been suggested that reliabilities greater than .95 are not very useful because they suggest that all of the items are testing essentially the same thing and that the measure could easily be shortened.

For a test used to make a decision that affects some person’s future, evaluators should attempt to find a test with a reliability greater than .95.

28
Q

What to do about low reliability?

A

Add items

Adding items makes a test more reliable; e.g., a single multiple-choice question would be far less reliable than a 40-item test.
The larger the sample of items, the more likely the test is to represent the true characteristic

Item analysis

  • go in and test how all individual items are doing - which ones are doing well
    Each item in a test is an independent sample of the trait or ability being measured
29
Q

Length Needed for any Desired Level of Reliability

A

N = [rd (1 − r0)] / [r0 (1 − rd)]

N = the number of tests of the current length that would be needed to reach the desired reliability
rd = the desired reliability
r0 = the observed reliability of the current version of the test
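
A minimal Python sketch, assuming the standard prophecy-style formula implied by these definitions (numbers are made up):

```python
def tests_needed(desired_r: float, observed_r: float) -> float:
    """How many tests of the current length are needed to reach the desired reliability."""
    return (desired_r * (1 - observed_r)) / (observed_r * (1 - desired_r))

# Current 20-item test with reliability .70; we want .90
n = tests_needed(0.90, 0.70)
print(n)        # ~3.86 tests of the current length
print(n * 20)   # ~77 items in total
```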

30
Q

Correction for Attenuation

A

If a test is unreliable, information obtained with it is of little or no value. Thus, we say that potential correlations are attenuated, or diminished, by measurement error.

Estimates the true correlation between tests 1 and 2 - the correlation we would observe if we could obtain everyone's true scores; the observed correlation will be an underestimate

r̂12 = r12 / √(r11 × r22)

r̂12 = the estimated true correlation between tests 1 and 2
r12 = the observed correlation between tests 1 and 2
r11 = the reliability of test 1
r22 = the reliability of test 2

A variant of the formula can be used if you are concerned about the unreliability of only one of the tests
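
A minimal Python sketch of the correction (numbers are made up):

```python
import math

def correct_attenuation(r12: float, r11: float, r22: float) -> float:
    """Estimated true correlation between two tests, given each test's reliability."""
    return r12 / math.sqrt(r11 * r22)

# Observed correlation .40 between tests with reliabilities .75 and .80
print(correct_attenuation(0.40, 0.75, 0.80))  # ~0.52
```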

31
Q

Domain Sampling Model

A

considers the problems created by using a limited number of items to represent a larger and more complicated construct
Use a sample

task in reliability analysis is to estimate how much error we would make by using the score from the shorter test as an estimate of your true ability.

Conceptualizes reliability as the ratio of the variance of the observed scores on the shorter test to the variance of the long-run true scores.
the greater the number of items, the higher the reliability.

Because true scores are not available, our only alternative is to estimate what they would be. Given that items are randomly drawn from a given domain, each test or group of items should yield an unbiased estimate of the true score.

Different random samples of items might give different estimates of the true score
To estimate reliability, we can create many randomly parallel tests by drawing repeated random samples of items from the same domain

32
Q

Variance of Scores

A

Imagine a bunch of people taking the same test
Everyone has their own TRUE score (theoretical)
Everyone has their own observed scores

Variance = square of SD

We can calculate the variance of the observed scores
Theoretically, we could also imagine the variance of the true scores
Which would be bigger? - observed or true
Observed-score variance is larger - error variance is added to the true-score variance

33
Q

Which would be bigger? - observed or true variance

A

Observed-score variance is larger - error variance is added to the true-score variance

34
Q

Random vs. Systematic error

A

No error - bullseye

Random error - scattered spots in middle - we have accuracy but not precision

Systematic error - not a lot of variance but error is not randomly distributed - cluster somewhere else - precision but not accuracy

Practice effect: we expect the score to be different (better) the second time - this is not random error

A test that underestimates the ability of women gives a systematically lower score: the observed score tends to be lower than the true score

35
Q

Reliability Coefficient

A

Ratio of the variance of the true scores on a test to the variance of the observed scores

r = σ²T / σ²X

r = the theoretical reliability of the test
σ²T = the variance of the true scores
σ²X = the variance of the observed scores
σ is used because these are theoretical values in a population rather than values obtained from a sample
r = the proportion of the observed variation attributable to variation in the true scores; 1 − r = the proportion attributable to random error
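
A small simulation sketch (made-up population values) showing that the ratio recovers the proportion of true-score variance:

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(100, 10, size=10_000)   # simulated true scores (SD = 10)
errors = rng.normal(0, 5, size=10_000)           # random measurement error (SD = 5)
observed = true_scores + errors                  # observed = true + error

# Reliability = true-score variance / observed-score variance
print(true_scores.var() / observed.var())        # ~ 10^2 / (10^2 + 5^2) = 0.80
```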

36
Q

Behavioral Observation

A

frequently unreliable because of discrepancies between true scores and the scores recorded by the observer

problem of error associated with different observers presents unique difficulties
estimate the reliability of the observers - interrater

37
Q

record the percentage of times that two or more observers agree.
Not the best for 2 reasons

A

The percentage does not consider the level of agreement that would be expected by chance alone; e.g., if two observers are recording whether a particular behavior either occurred or did not occur, they would have a 50% likelihood of agreeing by chance alone.
A method for assessing such reliability should include an adjustment for chance agreement

percentages should not be mathematically manipulated.
For example, it is not technically appropriate to average percentages. Indexes such as Z scores are manipulable and thus better suited to the task of reliability assessment.

38
Q

To ensure that items measure the same thing:

A

Factor analysis - tests are most reliable if they are unidimensional; one factor should account for considerably more of the variance than any other factor

Discriminability analysis - when the correlation between performance on a single item and the total test score is low, the item is probably measuring something different from the other items on the test
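
A minimal Python sketch of an item-total (discriminability) check on made-up 0/1 scores:

```python
import numpy as np

def item_total_correlations(items):
    """Correlation of each item with the total test score (low values flag odd items)."""
    items = np.asarray(items, float)
    total = items.sum(axis=1)
    return [np.corrcoef(items[:, j], total)[0, 1] for j in range(items.shape[1])]

# Hypothetical 6 people x 3 items; the third item behaves differently from the others
scores = [[1, 1, 0],
          [0, 0, 1],
          [1, 1, 1],
          [1, 0, 0],
          [0, 0, 1],
          [1, 1, 0]]
print(item_total_correlations(scores))  # third correlation is near zero
```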