Reliability Flashcards
Reliability
Red always hits the same spot, and it is the correct spot. This is both reliable and valid.
Green hits the same spot every time, but not the correct spot. This is reliable but not valid: consistent, but consistently off target.
The other two are neither reliable nor valid.
Shots scattered all over the place are neither valid nor reliable.
Reliability
Refers to the degree to which observed scores are “free of error of measurement for a given group” (Standards). It refers to consistency of scores obtained by the same person given the same test.
Reliability reflects the precision of measurement.
Error
Measurement free of error is an ideal that is basically never achieved, especially in the social sciences; a measurement free of error = true score (T)
Realistically, there will always be some amount of error (e) in the observed scores (X)
X = T + e
2 main sources of e (error):
1. Unsystematic (random) error
2. Systematic error
Unsystematic (Random) Error
= the fluctuations in test scores that occur when the same person takes the test several times due to:
Administration of test (standard procedures not followed)
Recording or reading (computation errors)
Instrumentation
Personal variation (fatigue, moods, etc.)
Environmental fluctuations (temperature, setting, noise, etc.)
The less random error, the more precise the measurement
Noise, temperature, and other things in life that you aren't in control of.
There is no pattern to these things.
Systematic Error
Random vs Systematic Error
What is a confound? An extraneous variable in an experimental design that correlates with both the dependent and the independent variable.
Example: the weather is a variable that confounds the relationship between ice cream sales and murder rates (both rise in hot weather, so they correlate without one causing the other).
Systematic error: There is a pattern. For example, one of the blocks was missing from the test. There is something fundamentally wrong with the test itself, so the error comes from the system itself, e.g., ethnic or cultural differences not accounted for. It is a fault of the test. It is consistent: you can see it happening every time. Once you figure out the bias, you can predict it and make changes to control it.
Comes from:
Instrument bias (different groups test differently)
Tester bias
Co-variation (you are measuring more than one thing at once): two things are married together, so the test measures both at the same time. Anxiety and depression are the classic example: they are hard to separate because they share many physical similarities, which act as confounds for the test.
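The difference between the two kinds of error can be sketched with a small simulation. This is only an illustration under assumed values: the true score of 100, the random error SD of 5, the bias of -4 points, and the function name `simulate_observed_scores` are all hypothetical, and random error is assumed to be normally distributed.

```python
import random
import statistics


def simulate_observed_scores(true_score, n_tests, random_sd, bias=0.0, seed=0):
    """Simulate X = T + e for one person tested n_tests times.

    random_sd: spread of the unsystematic (random) error, assumed normal.
    bias: constant systematic error added to every score (hypothetical).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + bias + rng.gauss(0.0, random_sd) for _ in range(n_tests)]


# Hypothetical person with a true score of 100, tested 10,000 times.
random_only = simulate_observed_scores(100, 10_000, random_sd=5)
with_bias = simulate_observed_scores(100, 10_000, random_sd=5, bias=-4)

# Random error averages out: the mean lands close to the true score.
print(round(statistics.mean(random_only), 1))
# Systematic error does not average out: the mean stays about 4 points low.
print(round(statistics.mean(with_bias), 1))
```

Averaging many administrations removes the random part of the error but leaves the bias untouched, which is why systematic error has to be found and fixed in the test itself.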
Reliability Coefficient
Reliability coefficient (rxx) = the ratio of true variance to observed variance
Question: How large is the error variance in relation to the true variance?
Rule: The larger the reliability coefficient, the smaller the error
Reminder: Variance is the square of the standard deviation (S), the common measure of spread
Reliability under Classical Test Theory
X = T + e
Assumption: Random error is the same for everyone who takes a test.
The standard deviation of this random error is called the Standard Error of Measurement.
In statistics, the true score is the mean score a person would get if the person took the same test measuring the same thing over and over again.
The closer together the observed scores are on a test taken multiple times, the less error there is in the test
How reliable is a measure?
Reliability can be defined as the ratio of true score variance to observed score variance:
Reliability = True Score Variance / Observed Score Variance
Error: variance occurring by random chance.
With reliability, we estimate the reliability coefficient. We never have a single value that is the truth; there is always uncertainty, so we work with ranges of probabilities. It is a useful tool.
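The variance ratio can be illustrated with simulated data. In real testing the true scores are never observed, so this is only a sketch under assumed values: true scores with SD 15, random error with SD 5, and a hypothetical helper named `simulated_reliability`.

```python
import random
import statistics


def simulated_reliability(n_people=20_000, true_sd=15.0, error_sd=5.0, seed=1):
    """Estimate r_xx = var(T) / var(X) from simulated scores X = T + e."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


# Theoretical value under these assumptions: 15**2 / (15**2 + 5**2) = 0.90
r_xx = simulated_reliability()
print(round(r_xx, 2))
```

The estimate only lands near the theoretical value, never exactly on it, which matches the point that a reliability coefficient is always an estimate.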
Sampling Distribution
Sample distribution: all of the scores in one sample plotted on a graph. A sampling distribution is different. Suppose we take a sample of 26 people (the same size as our class), repeat this for 1000 such samples, and calculate the mean of each one; the distribution of those 1000 means is the sampling distribution. It approximates the distribution of all possible sample means for that sample size. It is the norm of a test.
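The "1000 samples of 26 people" idea can be sketched as a simulation. The population values (mean 100, SD 15, normal shape) and the function name `sampling_distribution_of_mean` are assumptions made for illustration only.

```python
import math
import random
import statistics


def sampling_distribution_of_mean(n_samples=1000, sample_size=26,
                                  pop_mean=100.0, pop_sd=15.0, seed=2):
    """Draw n_samples samples and return the list of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = sampling_distribution_of_mean()
empirical_se = statistics.stdev(means)   # SD of the 1000 sample means
formula_se = 15.0 / math.sqrt(26)        # SD / sqrt(n), about 2.94

print(round(empirical_se, 2), round(formula_se, 2))
```

The two printed numbers should be close: the spread of the simulated means is what the standard error of the mean describes.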
Standard Error of the Mean
Standard Error of the Mean is the standard deviation of the sampling distribution (= the distribution of all possible sample means)
Standard deviation of original distribution = 9.3
Sample size = 16
Standard error of the mean = 9.3 / square root of 16 = 9.3 / 4 = 2.325 (about 2.3)
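The same arithmetic as a small helper, in case you want to try other sample sizes. The function name `standard_error_of_mean` is just a label chosen here.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: SD of the original distribution / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


se = standard_error_of_mean(9.3, 16)
print(se)  # 9.3 / 4
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.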
Standard Error of Measurement
Standard Error of Measurement: imagine one person being tested many times. Compute the average squared distance of the scores from their mean and take the square root. That is the standard deviation of this person's distribution of scores, and it is called the Standard Error of Measurement.
If you plot those scores, you get the distribution of that person's observed scores.
The SEM is the standard deviation of the scores obtained from one person tested repeatedly.
SD: the square root of the average squared distance from the mean, here for one person's distribution of scores.
We need the standard error of measurement to get a better idea of what the true score is, because the true score can never be observed directly.
It gives a range of possible scores around the true score. It takes into consideration error within tests.
Real scores are affected by error, so your score falls in a range of scores around the true score.
There is always a range of scores. You don't look for one score; you look for a range of scores. Things affect your score, e.g., how much sleep you got, anxiety levels, the temperature of the room. Your true score is likely to fall within this range.
Standard Error of Measurement
Take the observed score
Compute the standard error of measurement
Build an interval to the left and the right of the observed score using the Standard Error of Measurement.
SEM = SD of observed scores × square root of (1 - reliability coefficient)
The standard error of measurement:
Smallest possible value = 0, i.e., when reliability is 1
Reliability can range from 0 to 1
Largest possible value = the standard deviation of the test scores, i.e., when reliability is 0
SEM and reliability are inversely related.
When error goes up, reliability goes down.
When reliability goes up, standard error goes down.
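The limits and the inverse relationship can be checked with the usual classical test theory formula, SEM = SD × square root of (1 - reliability). A minimal sketch; the SD of 15 and the function name `standard_error_of_measurement` are illustrative choices.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), with r_xx between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# SD of 15 is an illustrative value; SEM shrinks as reliability rises.
for r in (0.0, 0.5, 0.9, 1.0):
    print(r, round(standard_error_of_measurement(15, r), 2))
```

At reliability 0 the SEM equals the full SD of the test, and at reliability 1 it drops to 0.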
SEM and Individual Scores
SEM is important in interpreting individual scores: use SEM to estimate the person’s true score and their range (“you’re a band, not a point”)
SD of an IQ test is 15
Reliability coefficient is .90
SEM = 15 × square root of (1 - .90) = 15 × 0.316 = 4.74 (rounded up to 5)
SEM ≈ 5
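To see the "band, not a point" idea with numbers, the sketch below builds an interval around an observed score. The observed score of 110 and the function name `true_score_band` are hypothetical, and centering the band on the observed score with z = 1 and z = 1.96 is the common simplification, assuming normally distributed error.

```python
import math


def true_score_band(observed, sd, reliability, z=1.0):
    """Return (low, high) = observed -/+ z * SEM."""
    sem = sd * math.sqrt(1.0 - reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical observed IQ score of 110 on a test with SD 15 and r_xx = .90.
low68, high68 = true_score_band(110, 15, 0.90, z=1.0)    # roughly 68% band
low95, high95 = true_score_band(110, 15, 0.90, z=1.96)   # roughly 95% band

print(round(low68, 1), round(high68, 1))
print(round(low95, 1), round(high95, 1))
```

With the unrounded SEM of 4.74, the 68% band is about 105 to 115 and the 95% band is about 101 to 119.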
Normal Distribution
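Standard deviations: -3 -2 -1 0 1 2 3
Z scores on the normal distribution
68 - 95 - 99.7 %: about 68% of scores fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3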
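The 68 - 95 - 99.7 figures can be reproduced from the normal curve itself. A minimal sketch using the error function from Python's standard library; the helper names and the example score of 130 (mean 100, SD 15) are illustrative assumptions.

```python
import math


def proportion_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def z_score(x, mean, sd):
    """Convert a raw score to standard deviation units."""
    return (x - mean) / sd


for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))

# Hypothetical IQ-style example: a score of 130 with mean 100 and SD 15.
print(z_score(130, 100, 15))
```

A z score simply says how many standard deviations a raw score sits above or below the mean, so the same three percentages apply to any normally distributed test.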