Chapter 2: Test Construction, Administration, and Interpretation Flashcards
How are tests constructed? (1) Identify the Need
This could mean measuring something not yet tested or improving an existing test.
How are tests constructed? (2) The Role of Theory
All tests are influenced or guided, implicitly or explicitly, by the theory or theories held by the test constructor.
A theory might yield some specific guidelines.
Example: if a researcher thought that depression was a disturbance in four specific areas (self-esteem, sleep quality, etc.), then that view would dictate the kind of test they build to measure depression.
The theory may also be less explicit and not well formalized. The creation of a test is intrinsically related to the person doing the creating and to their theoretical views.
Even tests said to be empirically developed (based on observations of real-life behaviors) can be influenced by theory.
How are tests constructed? (3) Practical Choices
What format will the items have?
Will they be true or false, multiple choice, or on a rating scale?
Will my instruments be designed for group administration?
How are tests constructed? (4) Pool of Items
The next step is to develop a table of specifications, much like the blueprint needed to construct a house. This table of specifications would indicate the subtopics to be covered by the proposed test.
The table of specifications may reflect the researcher's thinking, theoretical notions in the current literature, other tests on the topic, and the views of other experts.
The table of specifications can be formal, informal, or absent; when present, it guides the writing of the items.
The items on a test reflect the constructor's creativity or draw on other researchers and the literature. Writing good test questions is both a science and an art. Professionals know that they need to write an initial pool of items four or five times larger than the number they actually need.
How are tests constructed? (5) Tryouts and Refinement
The initial pool of items will probably be large and rather unrefined.
The intent of this step is to refine the pool of items to a smaller but usable pool.
Pilot testing is used: a preliminary form is administered to a sample of subjects to determine whether there are any glitches.
We may also do some preliminary statistical work and assemble the test for a trial run called a pretest.
Administer the test to two different groups and carry out item analyses to see which items in fact differentiate the two groups.
Retain the best items and then perform a content analysis, sorting them by category to determine which categories have too many questions and which have too few.
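One common way to carry out such an item analysis is to compare how often each item is passed by high scorers versus low scorers. The sketch below is a hypothetical illustration rather than the text's exact procedure: it simulates dichotomously scored items and contrasts the upper and lower 27% of total scorers; the simulated data, the 27% split, and the 0.20 cutoff are all illustrative assumptions.

```python
# A rough, hypothetical item-discrimination check on simulated 0/1 item responses.
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))                     # simulated examinee ability
difficulty = rng.normal(size=(1, 12))                   # simulated item difficulty
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))   # simple logistic response model
responses = (rng.random((200, 12)) < p_correct).astype(int)

total = responses.sum(axis=1)
cut_hi, cut_lo = np.quantile(total, [0.73, 0.27])       # classic upper/lower 27% split
upper = responses[total >= cut_hi]
lower = responses[total <= cut_lo]

for i in range(responses.shape[1]):
    # Discrimination index: proportion passing the item in the upper group
    # minus the proportion passing it in the lower group.
    d = upper[:, i].mean() - lower[:, i].mean()
    print(f"item {i + 1:2d}: discrimination = {d:+.2f} "
          f"({'keep' if d >= 0.20 else 'review'})")
```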
How are tests constructed? (6) Reliability and Validity
We need to establish that our measuring instrument is reliable (that is, consistent) and that it measures what we set out to measure (that is, the test is valid).
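Reliability is typically checked statistically. As one hedged example, the sketch below computes Cronbach's alpha, a common internal-consistency estimate, on invented data; the formula is alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
# A minimal sketch of one internal-consistency reliability estimate (Cronbach's alpha),
# computed on invented data; `scores` is a respondents-by-items array.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=(150, 1))                        # shared "trait" signal
scores = trait + rng.normal(scale=0.8, size=(150, 8))    # 8 items = trait + noise
print(f"alpha = {cronbach_alpha(scores):.2f}")           # high, since items share the trait
```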
How are tests constructed? (7) Standardization and Norms
We need to standardize the instrument and develop norms. To standardize means that the administration, time limits, scoring procedures, and so on are all carefully spelled out so that no matter who administers the test, the procedure is the same.
Raw scores in psychology are often meaningless. We need to give meaning to raw scores by transforming them into derived scores (see the sketch at the end of this card).
We also need to be able to compare an individual’s performance on a test with the performance of a group of individuals; that information is what we mean by norms.
Simply because a sample is large does not guarantee that it is representative. The sample should be representative of the population to which we generalize.
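For example, a raw score can be converted to a z-score or T-score relative to a normative sample. The sketch below uses invented numbers; in practice the normative mean and standard deviation come from the standardization sample.

```python
# A minimal sketch of deriving z- and T-scores from a raw score, using an
# invented normative sample; real norms come from the standardization sample.
import numpy as np

norm_sample = np.array([34, 41, 38, 45, 29, 40, 37, 44, 31, 39])  # hypothetical norm-group raw scores
norm_mean, norm_sd = norm_sample.mean(), norm_sample.std(ddof=1)

raw = 45                                   # one examinee's raw score
z = (raw - norm_mean) / norm_sd            # z-score: standing relative to the norm group
t = 50 + 10 * z                            # T-score: rescaled to mean 50, SD 10
print(f"z = {z:.2f}, T = {t:.1f}")
```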
How are tests constructed? (8) Further Refinements
Sometimes the changes reflect additional scientific knowledge, and sometimes societal changes, as in our greater awareness of gender bias in language.
One type of revision that often occurs is the development of a short form of the original test.
Typically, a different author takes the original test, administers it to a group of subjects, and shows by various statistical procedures that the test can be shortened without any substantial loss in reliability and validity.
Psychologists and others are always on the lookout for brief instruments, and so short forms often become popular, although as a general rule, the shorter the test the less reliable and valid it is.
Still another type of revision that occurs fairly frequently comes about through factor analysis.
The factor analysis will tell you whether all the items on the test are useful or whether some should be thrown out because their contribution is minimal. It will also tell you whether different aspects of the test should be scored together or separately (see the sketch at the end of this card).
Finally, there are a number of tests that are multivariate; that is, the test is composed of many scales.
The pool of items that comprises the entire test is considered to be an “open system” and additional scales are developed based upon arising needs.
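As an illustration of that factor-analytic check, here is a hypothetical sketch using scikit-learn's FactorAnalysis on simulated responses with a built-in two-factor structure; the sample size, noise level, and item groupings are assumptions, and the sketch omits the rotations that dedicated factor-analysis software usually applies.

```python
# A minimal sketch of inspecting item factor loadings with exploratory factor analysis;
# the data are simulated with a known two-factor structure for illustration only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
f1 = rng.normal(size=(300, 1))          # simulated underlying factor 1
f2 = rng.normal(size=(300, 1))          # simulated underlying factor 2
items = np.hstack([
    f1 + rng.normal(scale=0.7, size=(300, 4)),   # items 1-4 load on factor 1
    f2 + rng.normal(scale=0.7, size=(300, 4)),   # items 5-8 load on factor 2
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
loadings = fa.components_.T             # rows = items, columns = factors
for i, row in enumerate(loadings, start=1):
    # Items with uniformly small loadings contribute little (candidates for removal);
    # items loading on different factors may belong on separate scales.
    print(f"item {i}: loadings = {np.round(row, 2)}")
```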
What to avoid when writing test items
Biased Questions
Loaded Questions
Double-barreled Questions
Jargon
Double Negatives
Poor answer scale options
Biased Questions
Leading questions that sway people to answer one way or another.
Example: How great is our hard-working customer service team?
Loaded Questions
Contains an assumption about a person's habits or perceptions.
Example: Where do you like to go to happy hour after work?
Double-barreled Questions
Asks multiple questions within one item.
Example: Was the product easy to find and did you buy it?
Jargon
An item includes words, phrases, or acronyms that the person is not familiar with or doesn't understand.
Example: The product helped me meet my OKRs.
Double Negatives
Avoid phrasing that combines two negatives; it is confusing and easy to misread.
Example: I don’t scarcely buy items online.
Poor answer scale options
Make sure your answer scales match the content of your items. They should not be confusing or unbalanced.
Example: How easy was it for you to complete the exam on time?
Answer: Yes | No
Types of Items
- Multiple-choice
- True-false
- Analogies
- Odd-man-out
- Sequences
- Matching
- Completion
- Fill-in-the-blank
- Forced choice items
- Vignettes
- Rearrangement or continuity
What are the incorrect items on a multiple choice test called?
Distractors
What are the correct items on a multiple choice test called?
Keyed response
What is the keyed response on tests with no definitive answer?
In tests that assess mental health, where there is no correct answer, the keyed response is the response that reflects what the test assesses. If you are measuring depression, then the keyed response will be the choice that correlates with depression (e.g., "I feel withdrawn from others").
What are the advantages of a multiple choice test?
They can be answered quickly, so the test can include more items, and they can be scored quickly and inexpensively.
What are the disadvantages of a multiple choice test?
They are better at assessing factual knowledge than problem-solving.
When is the best time to use true or false?
when there is no right answer, as with self-report personality statements answered true or false
Where are analogies usually found?
in tests of intelligence
What are matching tests good at?
assessing factual knowledge
What is a disadvantage of matching tests?
mismatching one item can affect other items and thus the questions are not independent
Where are completion tests usually found?
on personality tests
Where are forced-choice tests usually found?
personality tests
The respondent has to pick one of a few options (e.g., "I would rather spend time alone" vs. "I would rather spend time with friends").
What is a vignette?
A brief scenario, like the synopsis of a play or novel.
The subject is asked to react in some way to the vignette, perhaps by providing a story completion, choosing from a set of alternatives, or making some type of judgment.
What are the two categories of items?
Constructed-response items: subject is presented with a stimulus and produces a response
Example: essay exams or sentence completion
Selected-response items: subject selects the correct or best response from a list of options
Example: multiple choice
Objective test formats
One single response is labeled as “correct.”
Subjective test formats
There is not one single answer or response that is labeled as "correct."
How to decide which Item Format to Use?
Try to increase variation
If it is multiple choice, then offer several response options, such as "strongly agree, agree, undecided, disagree, strongly disagree."
Use more items: a 10-item test scored 0 or 1 can yield scores ranging from 0 to 10; if each item is instead scored on a 1-to-5 scale, raw scores can range from 10 to 50.
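The range arithmetic above, spelled out with the same (purely illustrative) numbers:

```python
# Raw-score ranges for a 10-item test under two scoring schemes (numbers from the example).
n_items = 10
print("true-false, scored 0/1:", n_items * 0, "to", n_items * 1)    # 0 to 10
print("rating scale, scored 1-5:", n_items * 1, "to", n_items * 5)  # 10 to 50
```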
Sequencing of Items
One plan is to use a spiral omnibus format, which involves a series of items from easy to difficult, followed by another series of items from easy to difficult, and so on (see the sketch at the end of this card).
Some scales contain filler items that are not scored but are designed to “hide” the real intent of the scale
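A minimal sketch of building a spiral omnibus ordering, assuming each item already has a difficulty estimate; the item labels, difficulty values, and number of cycles are invented for illustration.

```python
# Arrange items in a spiral omnibus format: repeated cycles, each running easy -> difficult.
# Item difficulties (higher = harder) are invented for illustration.
difficulties = {"A": 0.2, "B": 0.5, "C": 0.8, "D": 0.3, "E": 0.6, "F": 0.9}

ordered = sorted(difficulties, key=difficulties.get)   # easiest to hardest
n_cycles = 2
# Deal every n_cycles-th item into a cycle so each cycle itself runs easy -> difficult.
cycles = [ordered[i::n_cycles] for i in range(n_cycles)]
spiral = [item for cycle in cycles for item in cycle]
print(spiral)   # ['A', 'B', 'C', 'D', 'E', 'F']: two easy-to-hard cycles
```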