Stats - Test Construction Flashcards
For a test to be good, what two things does it need?
Reliability and validity
Does validity or reliability involve consistency, repeatability and dependability in scores?
Reliability
What’s the formula of the True Score Model or Classical Test Theory (i.e., total variability = ______ + _______)?
total variability = true score variability + error variability (a combo of truth & error)
If true score variability is 80% then what percentage is error variability?
20%
total variability is always 100%
If true score variability is 80% then what is the reliability score?
0.80 (always expressed as a coefficient)
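A minimal sketch of the relationship across the cards above (all numbers illustrative), assuming the variabilities are expressed as proportions of total variability:

```python
# Classical Test Theory: total variability = true score variability + error variability
true_var = 0.80                 # true score variability (80%)
error_var = 1.0 - true_var      # error variability (20%); the two always sum to 100%
reliability = true_var / (true_var + error_var)  # 0.80, expressed as a coefficient
print(reliability)              # 0.8
```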
What’s the minimum acceptable reliability?
0.80
What are 3 common sources of error in tests, and what does each involve?
- content sampling: items that do or don’t tap into a test-taker’s knowledge
- time sampling: scores are different at two points in time due to factors related to the passage of time
- test heterogeneity: when a test has heterogeneous items tapping more than one domain
How’s reliability increased in relation to number of items?
Reliability increases when the number of items increases
How’s reliability increased in relation to homogeneity of items?
the more homogeneous the items are, the greater the reliability
How’s reliability increased in relation to range of scores?
An unrestricted range of scores increases the reliability
Does the ability to guess increase or decrease reliability?
It decreases reliability
What is test-retest reliability?
When you administer the identical test to the same sample of people at two points in time
Coefficient of stability is another name for what?
Test-retest reliability
What’s parallel forms reliability?
It’s calculated from the scores obtained by the same group of people on two roughly equivalent but not identical forms of the same test, administered at two different points in time
Coefficient of equivalence is another name for what?
Parallel forms reliability
What type of reliability looks at the consistency of scores within the test itself, meaning the test is administered only once to one group of people?
Internal consistency reliability
What are the two subtypes of internal consistency?
- Split-half reliability
- Kuder-Richardson (KR-20/21) or Coefficient Alpha
Which of the 4 reliability estimates is associated with the Spearman-Brown prophecy formula?
Internal consistency reliability, specifically the split-half subtype
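The cards don’t show the formula itself; a minimal sketch of the Spearman-Brown prophecy formula as it’s commonly stated (the 0.70 input is an illustrative half-test correlation):

```python
def spearman_brown(r, k=2.0):
    """Projected reliability of a test k times as long as the current one."""
    return k * r / (1 + (k - 1) * r)

# Split-half use: step a half-test correlation of 0.70 up to full length (k = 2)
print(round(spearman_brown(0.70), 2))  # 0.82
```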
What is the subtype of internal consistency reliability that finds the average of every possible way of splitting the test in half?
Kuder-Richardson or Coefficient Alpha
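The formula isn’t on the card; a minimal sketch of coefficient alpha under its standard definition (the item and total variances are made-up illustrative numbers):

```python
def coefficient_alpha(item_variances, total_variance):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)."""
    k = len(item_variances)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print(round(coefficient_alpha([1.2, 0.9, 1.1, 1.0], total_variance=12.0), 2))  # 0.87
```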
What reliability estimate looks at the degree of agreement between two or more scorers when a test is subjectively scored?
Interrater (interscorer) reliability
What reliability estimate connects to percent agreement, Pearson r, the kappa statistic, or Yule’s Y?
Interrater reliability
If the test were perfectly reliable, what would the standard error of measurement be?
0.0
What is the maximum possible value of the standard error of measurement?
It would be equal to the standard deviation of the test
What are the 3 confidence intervals (in SEM units around the obtained score)?
-1 & +1 = 68%
-2 & +2 = 95%
-3 & +3 = 99%
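The SEM formula isn’t shown on the cards; the standard form, SEM = SD × √(1 − reliability), reproduces both boundary answers above as well as the confidence intervals (the SD of 10 and the score of 70 are illustrative):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(sem(10, 1.0))  # 0.0  -> a perfectly reliable test has an SEM of 0
print(sem(10, 0.0))  # 10.0 -> the SEM can never exceed the test's SD

# Confidence intervals around an obtained score of 70 with SEM = 5
score, s = 70, 5
for z, pct in [(1, 68), (2, 95), (3, 99)]:
    print(f"{pct}%: {score - z * s} to {score + z * s}")
```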
What connects to the meaningfulness, usefulness, or accuracy of a measure in terms of prediction (i.e., is it a good predictor)?
Validity
Are the items on my test really measuring the right knowledge or skills to predict if someone’s good for a job?
Is this content, criterion or construct validity?
Content validity
Whether the test is valid or accurate in terms of predicting: can we use our test to predict something (e.g., the sales potential of the salespeople on my team)?
Is this content, criterion, or construct validity?
Criterion validity
Whether my instrument is measuring the trait I think it’s measuring (e.g., I make a scale to measure assertiveness, but is it actually measuring assertiveness or is it measuring aggression?)
Is this content, criterion, or construct validity?
Construct validity
Between content and construct validity, which measures skills/knowledge and which measures a trait?
Content= skills and knowledge
Construct = always a trait
What are the two subtypes of criterion-related validity?
- concurrent (measuring the predictor & outcome at the same time)
- predictive (delay between the measurement of predictor and the criterion)
The criterion-related validity coefficient is Rxy; what does x represent, and what does y represent?
x = test/predictor scores
y = outcome/criterion scores
How do you find out how much of the outcome (criterion) variability can be explained by the predictor if you have the criterion-related validity coefficient (e.g., 0.50)?
You square it (0.50 squared = 0.25 = 25%)
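A one-line sketch of that squaring step (the coefficient of determination):

```python
r_xy = 0.50                # criterion-related validity coefficient
print(f"{r_xy ** 2:.0%}")  # 25% of criterion variability explained by the predictor
```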
What’s the maximum value of the standard error of the estimate?
It’s the same as the standard deviation of the criterion
What are the 3 applications of the criterion-related validity coefficient?
- Expectancy tables
- Taylor-Russell tables
- Decision making theory
Which of the 3 applications lists the probability that a person’s criterion score will fall in a specific range based on the range in which that person’s predictor score fell?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision making theory
- Expectancy Tables
Which of the 3 applications numerically shows how much more accurate selection decisions are when using a particular predictor test as opposed to not using one?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision making theory
- Taylor-Russell Tables
What’s the definition of a base rate? (Taylor-Russell Tables)
the rate of selecting successful employees without using a predictor test
What’s the definition of a selection ratio? (Taylor-Russell Tables)
the proportion of available openings to available applicants (e.g., 3 openings with 10 applicants = a selection ratio of 0.30)
What’s the definition of incremental validity? (Taylor-Russell Tables)
it’s the amount of improvement in the success rate that results from using a predictor test
e.g., the success rate improves from a base rate of 0.40 (40%) without the test to 0.65 (65%) with the test:
incremental validity = 65% - 40% = 25%
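A minimal sketch of that arithmetic, using the example numbers above:

```python
base_rate = 0.40                      # success rate without the predictor test
rate_with_test = 0.65                 # success rate when the predictor test is used
incremental_validity = rate_with_test - base_rate
print(f"{incremental_validity:.0%}")  # 25%
```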
When is incremental validity optimized? (in connection to base rate & selection ratio)
- Moderate base rate (around 0.5)
- Low selection ratio (around 0.1)
Which of the 3 applications takes performance predictions based on predictor tests and compares them with the actual criterion outcome?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision-making theory
- Decision-making theory
Draw out a decision-making theory table: where does criterion/outcome go, where does predictor/test go, & where do false negative, true negative, false positive & true positive go?
Criterion/outcome = Y (vertical axis)
Predictor/test = X (horizontal axis)
false negative = top left (low predictor, high criterion)
true negative = bottom left (low predictor, low criterion)
false positive = bottom right (high predictor, low criterion)
true positive = top right (high predictor, high criterion)
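A minimal sketch of how a single case lands in one of the four cells, assuming hypothetical cutoff values of 50 on each axis:

```python
def classify(predictor, criterion, pred_cutoff=50, crit_cutoff=50):
    """Place one case into a cell of the decision-making theory table."""
    predicted_success = predictor >= pred_cutoff  # right of the predictor cutoff
    actual_success = criterion >= crit_cutoff     # above the criterion cutoff
    if predicted_success:
        return "true positive" if actual_success else "false positive"
    return "false negative" if actual_success else "true negative"

print(classify(predictor=70, criterion=30))  # false positive (predicted success, didn't succeed)
```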
If a company wanted to reduce the false positives of a pregnancy test, what would you do to the predictor & criterion cutoffs?
1. Increase both
2. Decrease both
3. Increase predictor & decrease criterion
4. Decrease predictor & increase criterion
- Increase predictor & decrease criterion
i.e., make the false positive box smaller in the decision-making theory table
What does cross-validation do to the criterion-related validity coefficient?
Results in shrinkage of the criterion-related validity coefficient
What does correction for attenuation calculate concerning validity and the reliability of the predictor (X) & criterion (Y)?
How much higher validity would be if the predictor (X) and criterion (Y) were both perfectly reliable
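The formula isn’t given on the card; a minimal sketch of the standard correction-for-attenuation form, with illustrative reliabilities:

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Validity if the predictor (X) and criterion (Y) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity 0.30; predictor reliability 0.80; criterion reliability 0.70
print(round(correct_for_attenuation(0.30, 0.80, 0.70), 2))  # 0.40
```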
What’s it called when the criterion is subjectively scored and the rater knows how people did on the predictor before assigning criterion scores? (e.g., a teacher knows students’ IQ scores before grading an assignment and then gives students with higher IQs higher grades)
Criterion Contamination
What does criterion contamination result in? (in connection to criterion-related validity coefficient scores)
inflated/spuriously high criterion-related validity coefficient
What type of validity looks at how adequately a new test measures a trait?
Construct validity
Two subtypes of construct validity
- Convergent validity - come together
- Divergent (discriminant) validity - go apart
With convergent vs divergent validity, which measures the same trait and which measures different traits?
Convergent = comparing measures of the same trait
Divergent = comparing measures of different traits
With convergent vs divergent validity, which one wants high correlations, and which one wants low correlations?
Convergent = high correlations
Divergent = low correlations
With convergent vs divergent validity, which relates to high correlations of monotrait-heteromethod (the same trait measured by different methods) and which relates to low correlations of heterotrait-monomethod (different traits measured by the same method)?
- Convergent = high correlations of monotrait-heteromethod
- Divergent = low correlations of heterotrait-monomethod
With the standard error of measurement, where do you plot the person’s score on a bell-shaped distribution, and how do you figure out the values at +1, +2, +3 and -1, -2, -3?
The person’s score goes in the middle (0) of the distribution, & then you add or subtract the standard error of measurement to/from the person’s score (e.g., person’s score = 70 & standard error of measurement = 5, then -1 would be 65 and +1 would be 75)
Cronbach’s Alpha is a measure of…
Internal Consistency Reliability
Reliability is always highest when test items are (heterogeneous/homogeneous), subjects are (heterogeneous/homogeneous), and the number of items is (high/low)
test items = homogeneous
subjects = heterogeneous
# of items = high