Prelim 2 prep Flashcards
What are the differences between True Score Theory, Generalizability Theory, and Item Response Theory?
What is the standard error of measurement?
What are confidence intervals and what do they tell us?
When confidence intervals increase in terms of percentage (i.e., 90% vs 95%), what does that do
to the range of scores it comprises?
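A minimal sketch of how the SEM and score confidence intervals relate, assuming the standard formula SEM = SD * sqrt(1 - reliability) and hypothetical test values (SD = 15, reliability = .90, observed score = 110):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: the SD of the error distribution
    around an observed score."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z):
    """Observed score +/- z * SEM."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

# Hypothetical IQ-style test: SD = 15, reliability = .90, observed score 110.
lo90, hi90 = confidence_interval(110, 15, 0.90, 1.645)  # 90% CI
lo95, hi95 = confidence_interval(110, 15, 0.90, 1.96)   # 95% CI
# The 95% interval is wider than the 90% one: higher confidence
# requires a larger range of scores.
```

This answers the question above directly: raising the confidence level (90% to 95%) widens the interval, because a larger z multiplier is applied to the same SEM.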
What does it mean if a test is valid?
What are the three main categories of validity?
What is: content validity, criterion-related validity, construct validity, ecological validity, external
validity, face validity?
What is the content validity ratio and how is it used to determine content validity of test items?
If more than half of a panel of experts rates an item as essential, the item has some content validity. CVR is positive when more than half say "essential," 0 when exactly half do, and negative when fewer than half do.
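Lawshe's CVR formula can be sketched as follows (panel sizes here are hypothetical):

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2).
    +1 when all panelists rate the item essential, 0 when exactly
    half do, negative when fewer than half do."""
    half = n_panelists / 2
    return (n_essential - half) / half

# 8 of 10 panelists rate an item "essential":
content_validity_ratio(8, 10)   # 0.6
content_validity_ratio(5, 10)   # 0.0  (exactly half)
content_validity_ratio(3, 10)   # -0.4 (fewer than half)
```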
Name the three characteristics of a criterion
Relevant, valid, uncontaminated
What does it mean for a criterion to be uncontaminated?
The criterion must be independent of the test itself: e.g., an independent group of raters decides who performs well and who doesn't, and those ratings are then correlated with test scores. Contamination occurs when the criterion is itself based, in part, on the predictor test.
Define concurrent validity and predictive validity
Concurrent- degree to which a test score is related to some criterion measure obtained at the same time
Predictive- degree to which a test score predicts a criterion measure
What are false negatives, false positives, specificity, and sensitivity?
False negative- test indicates someone doesn't possess a trait when they actually do
False positive- test indicates someone possesses a trait when they actually don't
Specificity- a perfectly specific test never flags someone as having the trait when they don't (no false positives)
Sensitivity- a perfectly sensitive test identifies everyone who actually has the trait (no false negatives)
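A small sketch of computing the two indices from hypothetical screening counts:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people who have the trait that the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people without the trait that the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results: 40 true positives, 10 false negatives,
# 85 true negatives, 15 false positives.
sensitivity(40, 10)   # 0.8  -> 20% of trait-positive people are missed
specificity(85, 15)   # 0.85 -> 15% of trait-negative people are wrongly flagged
```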
What is incremental validity and what would be proof of its existence?
Extent to which adding a second or third predictor gives more information about a criterion
Evidence would be a meaningful increase in predictive power (e.g., a larger validity coefficient or R-squared) when the additional predictor is added to the existing predictors.
What is construct validity?
Extent to which a test measures a construct we are examining
Name and describe the several ways in which you can find evidence for construct validity.
- Test is homogeneous (items hang together, measuring a single construct)
- Test scores change with age, as the theory of the construct predicts
- Test scores change with experience or intervention (pretest-posttest changes)
- Distinct groups score differently (method of contrasted groups)
- Convergent evidence: scores correlate with other tests measuring the same construct
What is the difference between convergent and concurrent validity?
What is a factor analysis and how does an exploratory factor analysis differ from a confirmatory
one?
Factor analysis: a class of statistical procedures for identifying the underlying factors (dimensions) that account for the correlations among items or test scores.
Exploratory (EFA)- estimating or extracting factors, deciding how many to retain, and rotating them to an interpretable orientation, with no structure specified in advance
Confirmatory (CFA)- testing the degree to which a hypothesized factor model fits the data
Name and define the different types of rating error that can occur
Leniency error- the rater tends to rate everyone too generously
Severity error- the opposite; the rater rates everyone too harshly
Central tendency error- the rater avoids the extreme ends of the scale
Halo effect- a favorable overall impression of a person inflates the ratings the rater gives on all dimensions
What is test utility?
Usefulness or practical value of testing; whether using a test in a particular situation improves efficiency and helps us make better decisions
What are some of the costs of administering a test, and what are some costs of NOT
administering one?
Administering:
- buying
- supply of blank test protocols
- computer program to score the test
- paying to score the test
- hiring people to administer the test
- costs of doing business
Not administering:
- loss of confidence in the organization (an ultimate, reputational cost)
- missing a child abuser
- failing to diagnose a condition (e.g., when someone underreports symptoms in an interview)
Keep in mind the real-life example I discussed about how to think about the cost of testing when
doing evaluations.
????
What are the Taylor-Russell tables used for, and what three variables are considered when using them to decide if giving a test is "worth it"?
- They estimate the proportion of selected people who will be successful, given: the test's validity coefficient, the selection ratio (proportion of applicants hired), and the base rate (proportion who would succeed without the test)
Be able to name a few other tables (i.e., Naylor-Shine) and have a basic sense of how they work
(they could be multiple-choice option for instance)
Naylor-Shine tables give the expected increase in the mean criterion score of those selected, based on the test's validity coefficient and the selection ratio (no base rate needed). Expectancy tables/charts show the likelihood that someone in a given score range will reach a criterion level.
Name some different ways cut scores are determined.
Common methods include: the Angoff method (experts estimate the probability that a minimally competent person answers each item correctly), the known/contrasted groups method (set the cut where the score distributions of qualified and unqualified groups intersect), norm-referenced (relative) methods, and IRT-based methods.
What’s the difference between a fixed and relative cut score?
Relative (norm-referenced)- the actual score needed to meet the criterion changes with the group's performance (e.g., top 10%)
Fixed (absolute)- the required score is always the same, regardless of the group
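A hedged illustration of the difference, using a hypothetical top-half selection rule:

```python
def relative_cut(scores, top_fraction):
    """Relative cut score: pass the top `top_fraction` of examinees,
    so the required score shifts with each group's performance."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]

FIXED_CUT = 70  # fixed cut score: always 70, regardless of the group

weak_group = [50, 55, 60, 65, 72, 80]
strong_group = [70, 75, 80, 85, 90, 95]
relative_cut(weak_group, 0.5)    # 65 -> top half of a weaker group
relative_cut(strong_group, 0.5)  # 85 -> same rule, higher bar
```

With the fixed cut of 70, only one person in the weak group passes but everyone in the strong group does; the relative cut passes the same fraction of each group.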
What is pilot work and why is it used?
Preliminary research surrounding the creation of a test prototype; used to experiment with and refine test items before the final version
Name some ways in which scales can be categorized or graded.
Age based
Grade based
Unidimensional vs. multidimensional
Categorical vs. dimensional
What are some scaling methods – and remember, they can overlap – so a categorical scale can
be graded “summatively,” etc.
Rating scales
Summative (Likert) scaling
Method of paired comparisons
Sorting tasks (comparative and categorical)
Categorical scaling
Guttman scale (items range from weaker to stronger expressions of the attitude, so agreeing with a stronger item implies agreement with all weaker ones)
What is the empirical vs analytical way of writing test items?
Analytical- write test questions you think will measure the qualities you want to measure
Empirical- find people with a problem, ask different types of questions, see how they respond
Why would we want to find seemingly arbitrary items for distinguishing one group from another (in other words, non-face-valid ones)?
Non-face-valid, empirically keyed items are harder for test takers to fake, and an item can reliably distinguish groups even when the reason it does so is not obvious (the logic behind empirically keyed inventories such as the MMPI).
Name some different ways items can be formatted.
Selected response
Constructed response
Computerized adaptive testing (with item branching)
What is computerized adaptive testing, and how does item branching work?
Computerized adaptive testing: an interactive, computer-administered test in which the items presented are tailored to the test taker's estimated ability level.
Item branching: the next item (or set of items) presented depends on the response to the previous one, so the test branches to harder material after correct answers and to easier material after incorrect ones.
Name and describe a few different ways in which items can be scored
Cumulative scoring- the higher the score, the more of the trait; points accumulate across items
Class/category scoring- responses place the test taker in a particular class or category (e.g., diagnostic)
Ipsative scoring- comparing a test taker's score on one scale with their own scores on other scales within the same test (intra-individual rather than normative)
What are the following: item-difficulty index, item endorsement index, item reliability index,
item discrimination index
Item-difficulty index: the proportion of test takers who answer the item correctly (higher values = easier item).
Item-endorsement index: the analogous statistic for tests without right/wrong answers; the proportion of test takers who endorse the item.
Item-reliability index: an indication of an item's internal consistency; the item's standard deviation multiplied by its item-total correlation.
Item-discrimination index (d): how well an item separates high from low scorers; the proportion answering correctly in the upper-scoring group minus the proportion in the lower-scoring group.
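A minimal sketch of how the difficulty and discrimination indices are typically computed, assuming dichotomous (0/1) item responses and hypothetical data:

```python
def item_difficulty(responses):
    """Item-difficulty (or, for non-ability tests, item-endorsement) index:
    the proportion of test takers answering correctly / endorsing the item."""
    return sum(responses) / len(responses)

def item_discrimination(upper_responses, lower_responses):
    """Discrimination index d: proportion correct in the high-scoring group
    minus proportion correct in the low-scoring group."""
    return item_difficulty(upper_responses) - item_difficulty(lower_responses)

# Hypothetical 0/1 responses to one item:
item_difficulty([1, 1, 1, 0, 1, 0, 1, 0])        # 0.625
item_discrimination([1, 1, 1, 1], [1, 0, 0, 0])  # 0.75 -> item separates groups well
```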
How are item characteristic curves useful?
An item characteristic curve graphs the probability of answering an item correctly (or endorsing it) as a function of the underlying ability or trait level. The curve's location shows the item's difficulty, its slope shows its discrimination, and flat or non-monotonic curves flag items that aren't working as intended.
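As a hedged sketch, the common two-parameter logistic (2PL) form of an ICC from item response theory can be computed like this (the parameter values are hypothetical):

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter-logistic item characteristic curve: probability of a
    correct response as a function of ability (theta), with discrimination
    parameter a and difficulty parameter b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# At theta == b the probability is .5; a steeper curve (larger a)
# discriminates better around that point.
icc_2pl(0.0, a=1.5, b=0.0)   # 0.5
icc_2pl(1.0, a=1.5, b=0.0)   # higher ability -> higher probability
icc_2pl(-1.0, a=1.5, b=0.0)  # lower ability -> lower probability
```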