Stats - Test Construction Flashcards
For a test to be good, what two things does it need?
Reliability and validity
Does validity or reliability involve consistency, repeatability and dependability in scores?
Reliability
What’s the formula of the True Score Model or Classical Test Theory (i.e., total variability = ______ + _______)?
total variability = true score variability + error variability (a combo of truth & error)
If true score variability is 80% then what percentage is error variability?
20%
total variability is always 100%
If true score variability is 80% then what is the reliability score?
0.80 (always expressed as a coefficient)
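A minimal sketch of the relationship across the cards above (all numbers illustrative), assuming the variabilities are expressed as proportions of total variability:

```python
# Classical Test Theory: total variability = true score variability + error variability
true_var = 0.80                 # true score variability (80%)
error_var = 1.0 - true_var      # error variability (20%); the two always sum to 100%
reliability = true_var / (true_var + error_var)  # 0.80, expressed as a coefficient
print(reliability)              # 0.8
```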
What’s the minimum acceptable reliability?
0.80
What are 3 common sources of error in tests, and what does each involve?
- content sampling: items that do or don’t tap into a test-taker’s knowledge
- time sampling: scores are different at two points in time due to factors related to the passage of time
- test heterogeneity: when a test has heterogeneous items tapping more than one domain
How’s reliability increased in relation to number of items?
Reliability increases when the number of items increases
How’s reliability increased in relation to homogeneity of items?
the more homogeneous the items are, the greater the reliability
How’s reliability increased in relation to range of scores?
An unrestricted range of scores increases the reliability
Does the ability to guess increase or decrease reliability?
It decreases reliability
What is test-retest reliability?
When you administer the identical test to the same sample of people at two points in time
Coefficient of stability is another name for what?
Test-retest reliability
What’s parallel forms reliability?
It’s calculated from the scores obtained by the same group of people on two roughly equivalent but not identical forms of the same test, administered at two different points in time
Coefficient of equivalence is another name for what?
Parallel forms reliability
What type of reliability looks at the consistency of scores within the test itself, meaning the test is administered only once to one group of people?
Internal consistency reliability
What are the two subtypes of internal consistency?
- Split-half reliability
- Kuder-Richardson (KR-20/21) or Coefficient Alpha
Which of the 4 reliability estimates is associated with the Spearman-Brown prophecy formula?
Internal consistency reliability, specifically the split-half subtype
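The cards don’t show the formula itself; a minimal sketch of the Spearman-Brown prophecy formula as it’s commonly stated (the 0.70 input is an illustrative half-test correlation):

```python
def spearman_brown(r, k=2.0):
    """Projected reliability of a test k times as long as the current one."""
    return k * r / (1 + (k - 1) * r)

# Split-half use: step a half-test correlation of 0.70 up to full length (k = 2)
print(round(spearman_brown(0.70), 2))  # 0.82
```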
What is the subtype of internal consistency reliability that finds the average of every possible way of splitting the test in half?
Kuder-Richardson or Coefficient Alpha
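The formula isn’t on the card; a minimal sketch of coefficient alpha under its standard definition (the item and total variances are made-up illustrative numbers):

```python
def coefficient_alpha(item_variances, total_variance):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)."""
    k = len(item_variances)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print(round(coefficient_alpha([1.2, 0.9, 1.1, 1.0], total_variance=12.0), 2))  # 0.87
```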
What reliability estimate looks at the degree of agreement between two or more scorers when a test is subjectively scored?
Interrater (interscorer) reliability
What reliability estimate connects to percent agreement, Pearson r, the kappa statistic, or Yule’s Y?
Interrater reliability
If the test were perfectly reliable, what would the standard error of measurement be?
0.0
What is the maximum possible value of the standard error of measurement?
It would be equal to the standard deviation of the test
What are the 3 confidence intervals (in SEM units around the obtained score)?
-1 & +1 = 68%
-2 & +2 = 95%
-3 & +3 = 99%
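The SEM formula isn’t shown on the cards; the standard form, SEM = SD × √(1 − reliability), reproduces both boundary answers above as well as the confidence intervals (the SD of 10 and the score of 70 are illustrative):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(sem(10, 1.0))  # 0.0  -> a perfectly reliable test has an SEM of 0
print(sem(10, 0.0))  # 10.0 -> the SEM can never exceed the test's SD

# Confidence intervals around an obtained score of 70 with SEM = 5
score, s = 70, 5
for z, pct in [(1, 68), (2, 95), (3, 99)]:
    print(f"{pct}%: {score - z * s} to {score + z * s}")
```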
What connects to the meaningfulness, usefulness, or accuracy of a measure in terms of prediction (i.e., is it a good predictor)?
Validity
Are the items on my test really measuring the right knowledge or skills to predict if someone’s good for a job?
Is this content, criterion or construct validity?
Content validity
Whether the test is valid or accurate in terms of predicting: can we use our test to predict something (e.g., the sales potential of the salespeople on my team)?
Is this content, criterion, or construct validity?
Criterion validity
Whether my instrument is measuring the trait I think it’s measuring (e.g., I make a scale to measure assertiveness, but is it actually measuring assertiveness or is it measuring aggression?)
Is this content, criterion, or construct validity?
Construct validity
Between content and construct validity, which measures skills/knowledge and which measures a trait?
Content= skills and knowledge
Construct = always a trait
What are the two subtypes of criterion-related validity?
- concurrent (measuring the predictor & outcome at the same time)
- predictive (delay between the measurement of predictor and the criterion)
The criterion-related validity coefficient is Rxy; what does x represent, and what does y represent?
x = test/predictor scores
y = outcome/criterion scores
How do you find out how much of the outcome (criterion) variability can be explained by the predictor if you have the criterion-related validity coefficient (e.g., 0.50)?
You square it (0.50 squared = 0.25 = 25%)
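A one-line sketch of that squaring step (the coefficient of determination):

```python
r_xy = 0.50                # criterion-related validity coefficient
print(f"{r_xy ** 2:.0%}")  # 25% of criterion variability explained by the predictor
```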
What’s the maximum value of the standard error of the estimate?
It’s the same as the standard deviation of the criterion
What are the 3 applications of the criterion-related validity coefficient?
- Expectancy tables
- Taylor-Russell tables
- Decision making theory
Which of the 3 applications lists the probability that a person’s criterion score will fall in a specific range based on the range in which that person’s predictor score fell?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision making theory
- Expectancy Tables
Which of the 3 applications numerically shows how much more accurate selection decisions are when using a particular predictor test as opposed to not using one?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision making theory
- Taylor-Russell Tables
What’s the definition of a base rate? (Taylor-Russell Tables)
the rate of selecting successful employees without using a predictor test
What’s the definition of a selection ratio? (Taylor-Russell Tables)
the proportion of available openings to available applicants (e.g., 3 openings with 10 applicants = a selection ratio of 0.30)
What’s the definition of incremental validity? (Taylor-Russell Tables)
it’s the amount of improvement in the success rate that results from using a predictor test
e.g., the success rate improves from a base rate of 0.40 (40%) without the test to 0.65 (65%) with the test:
incremental validity = 65% - 40% = 25%
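A minimal sketch of that arithmetic, using the example numbers above:

```python
base_rate = 0.40                      # success rate without the predictor test
rate_with_test = 0.65                 # success rate when the predictor test is used
incremental_validity = rate_with_test - base_rate
print(f"{incremental_validity:.0%}")  # 25%
```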
When is incremental validity optimized? (in connection to base rate & selection ratio)
- Moderate base rate (around 0.5)
- Low selection ratio (around 0.1)
Which of the 3 applications takes performance predictions based on predictor tests and compares them with the actual criterion outcome?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision-making theory
- Decision-making theory
Draw out a decision-making theory table: where does criterion/outcome go, where does predictor/test go, & where do false negative, true negative, false positive & true positive go?
Criterion/outcome = Y (vertical axis)
Predictor/test = X (horizontal axis)
false negative = top left (low predictor, high criterion)
true negative = bottom left (low predictor, low criterion)
false positive = bottom right (high predictor, low criterion)
true positive = top right (high predictor, high criterion)
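A minimal sketch of how a single case lands in one of the four cells, assuming hypothetical cutoff values of 50 on each axis:

```python
def classify(predictor, criterion, pred_cutoff=50, crit_cutoff=50):
    """Place one case into a cell of the decision-making theory table."""
    predicted_success = predictor >= pred_cutoff  # right of the predictor cutoff
    actual_success = criterion >= crit_cutoff     # above the criterion cutoff
    if predicted_success:
        return "true positive" if actual_success else "false positive"
    return "false negative" if actual_success else "true negative"

print(classify(predictor=70, criterion=30))  # false positive (predicted success, didn't succeed)
```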
If a company wanted to reduce the false positives of a pregnancy test, what would you do to the predictor & criterion cutoffs?
1. Increase both
2. Decrease both
3. Increase predictor & decrease criterion
4. Decrease predictor & increase criterion
- Increase predictor & decrease criterion
i.e., make the false positive box smaller in the decision-making theory table
What does cross-validation do to the criterion-related validity coefficient?
Results in shrinkage of the criterion-related validity coefficient
What does correction for attenuation calculate concerning validity and the reliability of the predictor (X) & criterion (Y)?
How much higher validity would be if the predictor (X) and criterion (Y) were both perfectly reliable
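The formula isn’t given on the card; a minimal sketch of the standard correction-for-attenuation form, with illustrative reliabilities:

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Validity if the predictor (X) and criterion (Y) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity 0.30; predictor reliability 0.80; criterion reliability 0.70
print(round(correct_for_attenuation(0.30, 0.80, 0.70), 2))  # 0.40
```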
What’s it called when the criterion is subjectively scored and the rater knows how people did on the predictor before assigning criterion scores? (e.g., a teacher knows students’ IQ scores before grading an assignment and then gives students with higher IQs higher grades)
Criterion Contamination
What does criterion contamination result in? (in connection to criterion-related validity coefficient scores)
inflated/spuriously high criterion-related validity coefficient
What type of validity looks at how adequately a new test measures a trait?
Construct validity
Two subtypes of construct validity
- Convergent validity - come together
- Divergent (discriminant) validity - go apart
With convergent vs divergent validity, which measures the same trait and which measures different traits?
Convergent = comparing measures of the same trait
Divergent = comparing measures of different traits
With convergent vs divergent validity, which one wants high correlations, and which one wants low correlations?
Convergent = high correlations
Divergent = low correlations
With convergent vs divergent validity, which relates to high correlations of monotrait-heteromethod (the same trait measured by different methods) and which relates to low correlations of heterotrait-monomethod (different traits measured by the same method)?
- Convergent = high correlations of monotrait-heteromethod
- Divergent = low correlations of heterotrait-monomethod
With the standard error of measurement, where do you plot the person’s score on a bell-shaped distribution, and how do you figure out the values at +1, +2, +3 and -1, -2, -3?
The person’s score goes in the middle (0) of the distribution, & then you add or subtract the standard error of measurement to/from the person’s score (e.g., person’s score = 70 & standard error of measurement = 5, then -1 would be 65 and +1 would be 75)
Cronbach’s Alpha is a measure of…
Internal Consistency Reliability
Reliability is always highest when test items are (heterogeneous/homogeneous), subjects are (heterogeneous/homogeneous), and the number of items is (high/low)
test items = homogeneous
subjects = heterogeneous
# of items = high