Quiz 2 Flashcards
didn’t study, anxious, late to test
trait error
error that resides in the testing situation (noisy, hot, etc.)
Method errors
What is the formula for test score theory?
X (observed score) = T (true score) + E (error)
What is the formula for Gregory’s intelligence scale?
(Lf + Ll)x100 divided by
10
___are used to estimate reliability.
Correlation coefficients
ranges from -1.0 to +1.0
Correlation coefficients
is expressed from 0.0 (no reliability) to +1.0 (perfect reliability)
reliability coefficient
indicates the proportion of variance in a group of obtained scores that is attributable to true individual differences.
A reliability coefficient
A reliability coefficient is directly interpretable. A test with a reliability of .90 has a ______ error.
.10
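A minimal sketch of the arithmetic behind this card, assuming the reliability coefficient is read as the share of observed-score variance due to true differences. The variable names are illustrative only.

```python
# Hypothetical reliability coefficient, matching the card above
reliability = 0.90

# Share of observed-score variance attributed to true individual differences
true_variance_share = reliability

# The remainder is attributed to error
error_variance_share = round(1 - reliability, 2)

print(true_variance_share, error_variance_share)
```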
What are the 4 types of reliability?
test-retest
parallel form
internal consistency
interrater reliability
-Stability of test scores over time
-It requires two administrations of the same test with the same group of individuals
-Correlate scores from one test taken at two different times.
Describes what type of reliability?
test-retest reliability
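As a rough illustration of "correlate scores from one test taken at two different times", the sketch below computes a Pearson correlation by hand. The scores and the `pearson_r` helper are made up for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people tested at two times
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A value near +1.0 would suggest scores are stable over time for this group.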
-Used when there is more than one form of a test
EX – SAT, ACT, GRE, MCAT, Make-up Exams
-Reduces the possibility of coaching or cheating and memory or practice effects (minimizes but does not eliminate)
-measures degree that two forms of a test are measuring the same thing.
Describes what type of reliability?
Parallel Form Reliability
When you want to know if the items on a test assess one, and only one, dimension. The majority of psychological tests have only one form.
-correlate each individual item with the total score
Describes what type of reliability?
internal consistency reliability
A single test is administered to a group of people
Items are divided into equal halves, typically odd-even (good especially b/c many tests get harder as test progresses)
Correlation between the two halves is the split-half reliability
Have to be careful with speed tests, especially if first items are easier
Split-half reliability
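A small sketch of the odd-even split described above, assuming right/wrong items scored 1 or 0. The response data and helper names are hypothetical; the last step applies the Spearman-Brown correction for half-length tests.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data: rows are people, columns are items 1..6
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Odd-numbered items (1, 3, 5) sit at indices 0, 2, 4; even-numbered at 1, 3, 5
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

# Correlation between the two half-test scores
r_half = pearson_r(odd_totals, even_totals)

# Spearman-Brown correction estimates reliability at full test length
r_full = (2 * r_half) / (1 + r_half)

print(round(r_half, 3), round(r_full, 3))
```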
When you want to know whether there is consistency in the rating of some outcome. Describes what type of reliability?
examine agreement between raters
interrater reliability
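One simple way to "examine agreement between raters" is percent agreement, sketched below with made-up ratings. This is only one possible index; it does not correct for agreement expected by chance.

```python
# Hypothetical ratings of the same ten cases by two raters
rater_a = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

# Count the cases where both raters gave the same rating
agreements = sum(a == b for a, b in zip(rater_a, rater_b))

# Proportion of cases on which the raters agree
percent_agreement = agreements / len(rater_a)

print(agreements, percent_agreement)
```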
How do we increase reliability?
One way is to increase the number of items (typically works), which increases the range of test scores
EX – think of luck involved on a 5 item test vs. 100 item test.
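The effect of adding items can be sketched with the general Spearman-Brown formula, assuming the added items are comparable to the existing ones. The starting reliability of .60 and the function name are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test is made n times as long.

    r is the current reliability; n is the length factor (2 = twice as many items).
    """
    return (n * r) / (1 + (n - 1) * r)

current_r = 0.60  # hypothetical reliability of a short test

doubled = spearman_brown(current_r, 2)      # twice as many items
quadrupled = spearman_brown(current_r, 4)   # four times as many items

print(round(doubled, 3), round(quadrupled, 3))
```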
The average amount of variability in a set of scores (average distance from the mean) is called
standard deviation (s or sd)
If s=____, there is no variability, numbers are identical in nature
0
_____is sensitive to extreme scores, just like the Mean.
Standard deviation
The Standard Deviation squared (or don’t compute last step of SD) is the ____
variance
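A short sketch of the standard deviation and variance cards using Python's `statistics` module. The scores are invented; note that the population formulas divide by n while the sample formulas (s) divide by n - 1.

```python
import statistics

# Hypothetical set of scores
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population versions (divide by n)
pop_variance = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Sample versions (divide by n - 1)
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Identical scores have no variability, so the standard deviation is 0
no_spread_sd = statistics.pstdev([7.0, 7.0, 7.0])

print(pop_variance, pop_sd, no_spread_sd)
```

In both versions the variance is the standard deviation squared.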
Second measure of internal consistency takes it a step further than split-half
Split-half has been criticized for lack of precision – reliability changes based on how items are split
Why not take a more typical value such as the mean of the split-half coefficients for all possible splitting of a test?
Used with dichotomous data- items scored as right or wrong dichotomous (0 or 1)
Kuder-Richardson Reliability
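A minimal sketch of the commonly used KR-20 form of Kuder-Richardson reliability for right/wrong items. The cards do not state the formula, so treat the one below as an assumption: (k / (k - 1)) * (1 - sum of p*q / variance of total scores). The data are hypothetical.

```python
import statistics

# Hypothetical data: rows are people, columns are items (1 = right, 0 = wrong)
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def kr20(rows):
    """KR-20 for dichotomous items, using population variance of total scores."""
    n_people = len(rows)
    k = len(rows[0])  # number of items
    # For each item: p = proportion answering correctly, q = 1 - p
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n_people
        sum_pq += p * (1 - p)
    total_variance = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

kr20_value = kr20(responses)
print(round(kr20_value, 3))
```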
mean of all possible split half coefficients, corrected by the Spearman-Brown formula.
Used for tests with continuum such as LIKERT
Must have high reliability coefficient (items must be homogenous – measure the same trait)
cronbach’s alpha (coefficient alpha)
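For items scored on a continuum, coefficient alpha is usually computed from item variances rather than by averaging every split. A rough sketch with invented 1-to-5 Likert responses follows; the formula used is (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
import statistics

# Hypothetical Likert data: rows are people, columns are four items rated 1-5
ratings = [
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]

def cronbach_alpha(rows):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(rows[0])  # number of items
    # zip(*rows) turns the rows (people) into columns (items)
    item_variances = [statistics.pvariance(item) for item in zip(*rows)]
    total_variance = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Population variances are used throughout; using sample variances for both the items and the totals gives the same alpha because the ratio is unchanged.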
An advantage of the median over the mean is:
A) it is sensitive to extreme scores
B) it is less influenced by extreme scores
C) it more accurately reflects the central tendency
B) it is less influenced by extreme scores
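A quick check of this answer with made-up scores: replacing one score with an extreme value shifts the mean a lot while the median stays put.

```python
import statistics

# Hypothetical scores, then the same scores with one extreme value
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)

mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```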
The mean is: A) the most frequently occurring score B) the midpoint of a distribution of scores C) least affected by extreme scores D) the arithmetic average
D) the arithmetic average
What is the mode of the following set of values?
13, 5, 17, 18, 5, 9, 9, 13, 5, 17, 5, 16
5
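The mode for this card can be confirmed by counting how often each value occurs, for example with `collections.Counter`:

```python
from collections import Counter

# Values from the card above
values = [13, 5, 17, 18, 5, 9, 9, 13, 5, 17, 5, 16]

# most_common(1) returns the value with the highest count and that count
mode_value, mode_count = Counter(values).most_common(1)[0]

print(mode_value, mode_count)
```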
In “descriptive statistics” you reduce the mass of the data, whereas with inferential statistics you:
A) measure a single characteristic
B) estimate a population’s characteristics
C) generalize information about a single person
B) estimate a population’s characteristics
Reliability within a set of observations measuring homogenous traits of a construct (e.g., personality) is referred to as: A) parallel forms reliability B) unified reliability C) internal consistency D) all of the above
C) internal consistency
A test has a reliability coefficient of .77. This coefficient means that:
A) 77% of the variance in test scores is true score variance, and 23% is error variance.
B) 77% of items on this test are reliable and 23% of the items are unreliable.
C) 23% of the variance in test scores is true score variance, and 77% is error variance.
A) 77% of the variance in test scores is true score variance, and 23% is error variance.
Test constructors can improve reliability by:
A) decreasing the number of items on a test
B) increasing the number of items on a test
C) retaining items that measure sources of error variation
B) increasing the number of items on a test
The Spearman-Brown formula corrects for deflated reliability due to: A) errors in validity B) small sample size C) systematic error D) half-length tests
D) half-length tests
Administering two supposedly equivalent forms of a test (e.g. form A and form B) to the same group of individuals yields a correlation coefficient indicating:
A) test-retest reliability
B) split-half reliability
C) parallel forms reliability
C) parallel forms reliability
Approximately what value must a reliability coefficient have if a test is being used to make decisions about an individual’s life?
A) .90
B) .70
C) .50
A) .90
A reliability coefficient can range from:
A) 0 to 1.00
B) 0 to 100
C) -1.0 to 1.00
A) 0 to 1.00
The measure of how spread out numbers are in a group of numbers is the: A) rooted variance B) mean C) internal consistency D) standard deviation
D) standard deviation
Explain the difference between reliability and validity.
Reliability is the ability of a test to be repeated with similar results, whereas validity is the extent to which the test measures what it claims to measure.