Lecture 5 - Test Construction Flashcards

1
Q

What are the 11 steps for the construction of a test?

A
  1. Conceptual foundation
  2. Purpose of the test
  3. Table of specifications
  4. Select attributes of the construct
  5. Select the population of interest
  6. Develop administrative protocols
  7. Item construction
  8. Pilot testing
  9. Item analysis
  10. Creating normative scores
  11. Technical manual
2
Q

Explain step #1 of test construction

A

Conceptual Foundation
- asking questions
- why does the test need to exist?
- what is the purpose of the test?
- How will the material relate to the test?

3
Q

Define: domain of content

A

There should be a meaningful and logical connection between the test and its items.

4
Q

Explain step #2 of test construction

A

Purpose of the test

5
Q

Explain step #3 of test construction

A

Table of specifications
- Blueprint of the test
- Specifies what content should be covered
- Based on theory, expertise or observations

6
Q

Explain step #4 of test construction

A

Select attributes of the construct
- Decide on the items based on the table of specifications
- Often rely on subject-matter expertise
- Possible items and content can be identified through observations in a clinical setting
- Extreme behaviours can illustrate the endpoints of an item scale

7
Q

Define Content Analysis

A

The process of determining the content (subject matter) of the test

8
Q

What is step #5 of test construction?

A

Select the population of interest
- Sampling: selection of elements from a defined population following prescribed rules
- Population: the collection of elements sharing a defining characteristic
- Elements are the test takers
- Establish protocols (definitions of how to select individuals from the population)

1. Who should the sample consist of?
2. How credible is this group as representative of the population of interest?
3. What obstacles may we encounter when obtaining our sample?
4. How can we address or avoid the obstacles mentioned above?
9
Q

Non-probabilistic vs. probabilistic sampling

A

Non-probabilistic sampling: individuals are selected based on some criteria, with no defined probability of selecting any given person
e.g., student volunteers agreeing to take a test

Probabilistic sampling: each person has a known, nonzero chance of being selected, and the selection process is random
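
A minimal Python sketch contrasting the two approaches, assuming the population is simply a list of test-taker IDs (the function names, IDs and volunteer set are hypothetical). Under simple random sampling of n people from N, every person has the same known inclusion probability n/N; the convenience sample depends entirely on who happened to volunteer.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Probabilistic: every person has the same known chance (n/N) of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def convenience_sample(population, n, volunteers):
    """Non-probabilistic: take the first n volunteers; no defined selection probability."""
    return [person for person in population if person in volunteers][:n]


students = [f"S{i:03d}" for i in range(1, 201)]  # hypothetical population of 200 test takers
volunteers = {"S004", "S017", "S018", "S150"}      # hypothetical volunteers

print(simple_random_sample(students, 10, seed=42))
print(convenience_sample(students, 3, volunteers))
print("inclusion probability under simple random sampling:", 10 / len(students))
```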

10
Q

What is step #6 of test construction?

A

Develop administrative protocols
- Define how the test will be administered
- How much time will each person be given?
- What will the mode of delivery be?
  - Pencil and paper?
  - Online?

11
Q

What is step #7 of test construction?

A

Item construction
- A test is the sum of its parts, so good items make good tests
- Many decisions are made along the way in item construction

12
Q

Guidelines for writing items (7)

A
  1. Define clearly what you want to measure: be specific (theory is important)
  2. Generate an item pool: avoid redundant items; the larger the pool, the better
  3. Select from this pool based on item analysis (step #9) of early results
  4. Use unitary items (items that test only one trait) so the meaning of a response is clear
    (Avoid double-barrelled items that convey two or more ideas at the same time)
  5. Monitor the reading difficulty level: it should be appropriate for those completing the scale
  6. Avoid long items: shorter items are more likely to be unitary
    Remember the concept of internal-consistency reliability: if you're testing multiple traits, the test will be less reliable
  7. Break any response set: use reverse-scored items so test-takers don't give the same response to every item on a Likert scale
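
Reverse scoring can be sketched in a few lines of Python, assuming a 1-5 Likert scale and that reverse-keyed items are identified by their index (the function names and example responses are hypothetical). A reverse-keyed response r becomes (min + max) - r, so on a 1-5 scale a 1 becomes 5 and a 4 becomes 2.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a Likert response so that scale_min <-> scale_max."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


def total_score(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum responses, reverse-scoring items whose index is in reverse_keyed."""
    total = 0
    for i, r in enumerate(responses):
        total += reverse_score(r, scale_min, scale_max) if i in reverse_keyed else r
    return total


# Hypothetical 4-item calmness scale: items 1 and 3 are worded negatively ("I feel tense")
print(total_score([5, 1, 4, 2], reverse_keyed={1, 3}))
```

A respondent who answers "strongly agree" to everything no longer gets the maximum score, which is what exposes the response set.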
13
Q

Items should be sensitive to ethnic and cultural differences. What other guidelines apply when writing items?

A
  1. Avoid statements in the past tense
  2. Avoid double-negative wording
  3. Avoid words expressing absolutes (only, just, always, none)
  4. Avoid statements that would be endorsed by everyone
  5. Avoid statements with multiple interpretations
  6. Keep language clear, direct and simple
  7. Keep each question to fewer than 20 words
  8. Correct grammar is essential
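
Some of these guidelines are mechanical enough to screen automatically. The Python sketch below is a hypothetical, deliberately crude first pass: it flags items of 20 or more words, the absolute words listed above, and two or more negation words as a possible double negative. The negation word list is an assumption, and the check cannot judge tense, ambiguity or cultural sensitivity; those still need human review.

```python
import re

ABSOLUTES = {"only", "just", "always", "none"}                     # from the guideline above
NEGATIONS = {"not", "no", "never", "none", "nobody", "nothing"}    # hypothetical list


def screen_item(text, max_words=20):
    """Return a list of warnings for one item; an empty list means no flags."""
    words = re.findall(r"[a-z']+", text.lower())
    warnings = []
    if len(words) >= max_words:
        warnings.append(f"{len(words)} words (aim for fewer than {max_words})")
    found = sorted(ABSOLUTES.intersection(words))
    if found:
        warnings.append("absolute wording: " + ", ".join(found))
    negations = [w for w in words if w in NEGATIONS or w.endswith("n't")]
    if len(negations) >= 2:
        warnings.append("possible double negative")
    return warnings


for item in ["I enjoy meeting new people", "I always feel calm", "I do not never worry"]:
    print(item, "->", screen_item(item))
```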
14
Q

Define scaling

A

The process of transforming and modifying the mathematical properties of an item

15
Q

What are the types of selected-response item formats?

A
  • Dichotomous
  • Polytomous
  • Likert format
  • Category format
  • Checklists and Q-sorts
16
Q

What is the Dichotomous Format?

A

Test item format that offers two alternatives for each item
- Simplest format
- Most common example: true/false tests
- Requires absolute judgement
Advantages: simplicity, ease of administration, ease of scoring
Disadvantages: it tends to be easier to select the correct response
○ Misses nuances by requiring only absolute answers
○ Increased probability of selecting the correct answer by chance
- To reach a given level of reliability, many more dichotomous items are needed than with other item formats
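
To make the chance problem concrete, here is a Python sketch assuming pure, independent guessing (a binomial model, which is a simplifying assumption). On a 10-item true/false test, the chance of scoring 7/10 or better by guessing alone is 176/1024, about 0.17; with four options per item it drops to roughly 0.004. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k_correct, n_items, n_options):
    """Binomial probability of getting at least k_correct of n_items right by pure guessing."""
    p = 1.0 / n_options
    return sum(comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
               for k in range(k_correct, n_items + 1))


# Chance of scoring 7/10 or better by guessing alone
print("true/false:  ", prob_at_least(7, 10, 2))
print("4-option MC: ", prob_at_least(7, 10, 4))
```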

17
Q

What is absolute judgment?

A

Respondents must indicate with certainty that the item is 100% true or 100% false; there is no middle ground.

18
Q

What is the Polytomous Format?

A
  • Items have more than two response options (e.g., multiple-choice tests)
  • Good balance of ease of administration and scoring without too much risk of correct selection by chance or problems with reliability
  • More popular than dichotomous tests in educational settings
  • Wrong answers on a polytomous item are called distractors
  • Choosing good distractors is the main difficulty in writing polytomous items
  • Advantages: can cover a lot of information in a short amount of time, easy to administer, easy to score, and the probability of getting the right answer by chance alone is lower than on true/false tests
19
Q

What are distractors?

A

Alternatives on a multiple-choice item that aren't correct, or for which no credit is given
- Good distractors that enhance the reliability of a test are rare and difficult to write; distractors are often used ineffectively
- Well-chosen distractors are essential for a good test item
- Studies show that items with 3 alternatives and items with 5 alternatives have the same reliability and validity
- 'Cute' distractors: options that are obviously wrong and extremely unlikely to be chosen
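
A simple distractor analysis can spot options that do no work. The Python sketch below tabulates how often each option was chosen on one item and flags distractors chosen by fewer than 5% of examinees; the 5% cut-off is a common rule of thumb assumed here, not a figure from the lecture, and the responses are hypothetical.

```python
from collections import Counter


def option_proportions(choices, options):
    """Proportion of examinees choosing each option on one multiple-choice item."""
    counts = Counter(choices)
    n = len(choices)
    return {opt: counts[opt] / n for opt in options}


def weak_distractors(choices, options, key, threshold=0.05):
    """Distractors chosen by fewer than `threshold` of examinees (candidate 'cute' distractors)."""
    props = option_proportions(choices, options)
    return [opt for opt in options if opt != key and props[opt] < threshold]


# Hypothetical responses of 20 examinees to one item keyed "A"
options = ["A", "B", "C", "D"]
choices = ["A"] * 12 + ["B"] * 5 + ["C"] * 3
print(option_proportions(choices, options))
print(weak_distractors(choices, options, key="A"))
```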

20
Q

What is the Likert Format?

A

Used for attitude and personality scale items, where subjects indicate their degree of agreement with statements using categories
- Strongly disagree, disagree, neutral, agree, strongly agree
- Likert-like items use labels such as not at all, not often, neutral, often, very often

21
Q

What is the Category Format?

A

Rating-scale format that often uses categories from 1 to 10
- e.g., rate your pain on a scale of 1-10
- To avoid categorization problems caused by the context of other stimuli, define the scale clearly, especially at the endpoints (what counts as a 1, what counts as a 10), for example by showing videos that illustrate a 10
- Psychometric theory indicates that fewer than four response options will reduce the reliability of the item, and around seven response options is the point where gains in reliability begin to diminish

22
Q

Adjective Checklist

A
  • The adjective checklist is an alternative to dichotomous items
    • Common in personality measurement
    • A list of adjectives is provided and the respondent is asked to circle those that best describe them
    • Equivalent to asking a dichotomous question for each adjective
    • Responses across all adjectives are examined for patterns
      ○ Does the pattern suggest a certain prototypical personality?
    • Patterns are examined through latent class analysis
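
Since a checklist is equivalent to one dichotomous question per adjective, each respondent's circled words can be encoded as a 0/1 vector. The Python sketch below does that and tallies identical response patterns as a first look at the data; the adjectives and respondents are hypothetical, and a simple tally is not latent class analysis, which would model these patterns probabilistically.

```python
from collections import Counter


def encode_checklist(adjectives, circled):
    """One 0/1 (dichotomous) response per adjective: 1 if circled, else 0."""
    circled_set = {a.lower() for a in circled}
    return [1 if adj.lower() in circled_set else 0 for adj in adjectives]


def pattern_counts(adjectives, respondents):
    """Count how many respondents share each exact response pattern."""
    return Counter(tuple(encode_checklist(adjectives, c)) for c in respondents)


adjectives = ["calm", "anxious", "friendly", "shy"]  # hypothetical checklist
respondents = [["friendly", "calm"], ["anxious", "shy"], ["Calm", "Friendly"]]
print(encode_checklist(adjectives, respondents[0]))
print(pattern_counts(adjectives, respondents).most_common())
```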
23
Q

Q sorts

A
  • In a Q-sort, choices are listed on cards and individuals place the cards in piles
    • There can be anywhere from 5 to 10 piles (no official rules)
    • Piles reflect the degree to which the person agrees with the word on the card: does this card describe me, and by how much?
    • How many cards are placed in each pile, and which cards land where, paint a picture of the person
    • Usually, the end piles (extremes) are the most interesting
  • The distribution of cards across piles is normally bell-shaped, with most cards in the middle piles (e.g., 4, 5, 6)
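
A completed Q-sort can be represented as a mapping from card to pile number. The Python sketch below, assuming pile 1 means "least like me" and the last pile "most like me" (a hypothetical convention with hypothetical cards), counts cards per pile to check the bell shape and pulls out the extreme piles.

```python
from collections import Counter


def pile_counts(qsort, n_piles):
    """Number of cards placed in each pile, from pile 1 to pile n_piles."""
    counts = Counter(qsort.values())
    return [counts[p] for p in range(1, n_piles + 1)]


def extreme_cards(qsort, n_piles):
    """Cards in the two end piles, which are usually the most informative."""
    return {
        "least like me": sorted(c for c, p in qsort.items() if p == 1),
        "most like me": sorted(c for c, p in qsort.items() if p == n_piles),
    }


# Hypothetical 5-pile sort: pile 1 = "least like me", pile 5 = "most like me"
qsort = {"shy": 1, "careful": 2, "quiet": 2, "organized": 3, "curious": 3,
         "patient": 3, "friendly": 4, "talkative": 4, "outgoing": 5}
print(pile_counts(qsort, 5))
print(extreme_cards(qsort, 5))
```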
24
Q

Define Scaling

A

The process of measuring objects in a way that maximizes precision, objectivity and communication
- Allows us to have a final score on a test
- Provides an operational framework for assigning numbers to objects and transforms qualitative constructs into measurable metrics
- Provides a visual and mathematical interpretation of the test

25
Q

What are the two types of scaling?

A
  1. Psychological scaling - focus on people
  2. Psychophysical scaling - focus on the stimulus
26
Q

What are the two types of psychological scaling?

A
  1. Response-centered scaling
    - Responses are scaled to place a subject along a psychological continuum
    - Based on the strength of the psychological traits they possess
    - Used most often in item response theory
    - The total score is not determined by summing the items; it is inferred from the item responses and the probability of where the person falls on the scale
    - Scores are transformed from an ordinal scale to an interval or ratio scale
  2. Subject-centered scaling
    - The subject obtains a total score by summing or averaging all test items
    - A common form of scaling outside of item response theory, based on classical test theory
    - Requires all items to be additive so the scores can be summed
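
The contrast can be sketched in Python. The summed score is the subject-centered approach; for the IRT side this sketch swaps in one specific model, the Rasch (one-parameter logistic) model, chosen here only for illustration, and estimates ability by maximum likelihood with Newton-Raphson. The item difficulties are hypothetical. Note how equal steps in raw score map to unequal steps on the logit (interval) scale.

```python
import math


def sum_score(responses):
    """Subject-centered (classical) scaling: total = sum of item scores."""
    return sum(responses)


def rasch_prob(theta, b):
    """Rasch model: probability of a correct response at ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_theta_mle(responses, difficulties, max_iter=50, tol=1e-10):
    """Response-centered scaling: maximum-likelihood ability estimate via Newton-Raphson."""
    if all(r == 1 for r in responses) or all(r == 0 for r in responses):
        # The likelihood has no finite maximum for perfect or zero scores.
        raise ValueError("MLE is undefined for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)              # negative second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties (logits)
for pattern in ([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 0]):
    print(sum_score(pattern), round(rasch_theta_mle(pattern, difficulties), 3))
```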
27
Q

What is step #8 of test construction?

A

Pilot testing
- Used to collect data on the test before it is finalized
- A pilot test is a small-scale study in which the test is administered to a selected sample to try it out and validate it
- Two main objectives:
  1. Obtain statistical information on the items
  2. Obtain comments to improve the test
- Collect a representative sample from the target population and administer the test

28
Q

What is step #9 of test construction?

A

Item analysis
- Identification of the properties of items
- Difficulty: property indicating the probability of success or failure on each item
- For achievement tests, we can think of it in terms of the percentage of correct and incorrect responses
- For other tests, it is the probability of selecting a given response based on some unknown threshold. This is a critical concept for the psychometric function and item response theory
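
For achievement tests, item difficulty is just the proportion of examinees who answer the item correctly. A minimal Python sketch, assuming responses are stored as a persons-by-items matrix of 1 (correct) and 0 (incorrect); the data and function name are hypothetical.

```python
def item_difficulties(responses):
    """Proportion correct (p) for each item; rows are persons, columns are items."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[i] for person in responses) / n_people for i in range(n_items)]


# Hypothetical scored responses: 4 examinees x 3 items
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
print(item_difficulties(responses))
```

Despite the name, a higher p means an easier item.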

29
Q

What is the formula for optimal item difficulty?

A

Optimal difficulty = chance + (perfect - chance) / 2

Example:
A dichotomous item (2 response options) has a ½ = 0.5 probability of a correct answer by chance.
Then optimal difficulty = 0.5 + (1.0 - 0.5)/2 = 0.5 + 0.25 = 0.75

A four-option item has a ¼ = 0.25 probability of a correct answer by chance.
Then optimal difficulty = 0.25 + (1.0 - 0.25)/2 = 0.25 + 0.375 = 0.625
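
The same formula as a small Python function, assuming "perfect" is a proportion correct of 1.0 and chance is 1 divided by the number of options (the function name is hypothetical). It reproduces the worked examples above.

```python
def optimal_difficulty(n_options, perfect=1.0):
    """Optimal item difficulty = chance + (perfect - chance) / 2."""
    if n_options < 2:
        raise ValueError("an item needs at least two response options")
    chance = 1.0 / n_options
    return chance + (perfect - chance) / 2


for k in (2, 3, 4, 5):
    print(k, "options:", round(optimal_difficulty(k), 3))
```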

30
Q

Correction for chance formula

A
  • We can correct a score for guessing by subtracting from each score the number of correct answers expected from guessing alone
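
The standard version of this correction is corrected score = R - W / (k - 1), where R is the number right, W the number wrong (omitted items are not counted as wrong), and k the number of options per item. The reasoning: a pure guesser gets one item right for every k - 1 items wrong, so W / (k - 1) estimates the lucky guesses. A Python sketch with a hypothetical function name:

```python
def corrected_score(n_right, n_wrong, n_options):
    """Correction for guessing: R - W / (k - 1). Omitted items count as neither right nor wrong."""
    if n_options < 2:
        raise ValueError("need at least two options per item")
    return n_right - n_wrong / (n_options - 1)


# Hypothetical examinee: 40 right, 20 wrong on 5-option items
print(corrected_score(40, 20, 5))
```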
31
Q
A