Test Development Flashcards
Define
Test revision
Action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of assessment
Definition
a plan of the number and type of items that are required for a test, as indicated in the test specification
Plan for item writing
Which item would be the best to remove?

Item 3
How do you go about forming a test concept?
- Review existing tests
- Review literature regarding existing tests
- Review the need for a test
- Decide to develop/adapt a test
Define
Rasch model
a model that relates the probability of response of a particular sort (e.g. right/wrong) to the difference between a person’s standing on a latent variable and the difficulty of the item
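A minimal sketch of this relationship, assuming the usual logistic form of the Rasch model. The names theta (person's standing on the latent variable) and b (item difficulty) are illustrative and do not appear in the cards.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the logistic Rasch model.

    theta: person's standing on the latent variable (assumed name)
    b:     difficulty of the item (assumed name)
    Only the difference theta - b matters.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is one half.
p_equal = rasch_probability(0.0, 0.0)
```

Because only the difference enters the formula, a person at 2 facing an item at 1 has the same probability as a person at 1 facing an item at 0.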
What are the advantages of a Likert scale?
- Degree of trait can be measured
- Lots of information
- Easy to use and administer
- Works best with strong (but not extreme) statements
What sort of factor analysis would you use when number of factors is known?
Confirmatory factor analysis (CFA)
Define
Model of measurement
the formal statement of observations of objects mapped to numbers that represent relationships among the objects
Definition
The decrease in item validities that inevitably occurs after cross-validation
Validity shrinkage
Definition
the assignment of numbers to objects according to a set of rules for the purpose of quantifying an attribute
Measurement
Define
Plan for item writing
a plan of the number and type of items that are required for a test, as indicated in the test specification
Definition
Action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of assessment
Test revision
Definition
the extent to which the score on an item correlates with an external criterion relevant to the attribute or construct that is the subject of test construction
Item validity
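One way to put a number on this, sketched under the assumption that the item-criterion relationship is summarised with a Pearson correlation (for a right/wrong item this is the point-biserial correlation). The function name and the sample data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between item scores x and criterion scores y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: item scored 0/1, criterion on an arbitrary scale.
item_scores = [0, 0, 1, 1]
criterion = [1, 2, 3, 4]
r = pearson_r(item_scores, criterion)
```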
Definition
a way of constructing psychological tests that relies on collecting and evaluating data about how each of the items from a pool of items discriminated between groups of respondents who are thought to show or not show the attribute the test is to measure; also an approach to personality that relates the reports that people make about their characteristic behaviours to their social functioning and thereby provide tools for personality prediction
Empirical approach
Give an example of a double-barrelled item
e.g. I support civil rights because discrimination is a crime against God.
Define
Classical test theory
the set of ideas, expressed mathematically and statistically, that grew out of attempts in the first half of the twentieth century to measure psychological variables; and that turns on the central idea of a score on a psychological test comprising both true and error score components
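The central idea can be written compactly. The symbols are the conventional ones and are an assumption here, since the card does not name them: X is the observed score, T the true score and E the error score.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2
\quad (\text{assuming } \operatorname{Cov}(T, E) = 0)
```

The variance decomposition on the right holds only under the stated assumption that true and error scores are uncorrelated.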
Definition
a way of constructing psychological tests that relies on both reasoning from what is known about the psychological construct to be measured in the test, and collecting and evaluating data about how the test and the items that comprise it actually behave when administered to a sample of respondents
Rational-empirical approach
What are the disadvantages of a written/essay format test?
- Narrow content
- Bluffing possible
- Hiding behind good writing
- Time consuming scoring
- Inter-rater reliability issues
Definition
the percentage of the total group that got the item correct
Item-difficulty index
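A small sketch of the calculation this card describes, assuming responses are coded 1 for correct and 0 for incorrect. The function name is hypothetical, and the result is a proportion; multiply by 100 for a percentage.

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly.

    responses: sequence of 1 (correct) and 0 (incorrect), an assumed coding.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)

# Hypothetical item answered correctly by 3 of 4 test-takers.
p = item_difficulty([1, 1, 1, 0])
```

Note that a higher value means an easier item, since more of the group got it right.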
Which item is better? Why?
‘I get tired after soccer’ vs. ‘I get tired after exercise’
‘I get tired after exercise’
Define
Construct
a specific idea or concept about a psychological process or underlying trait that is hypothesised on the basis of a psychological theory
Define
Item-difficulty index
the percentage of the total group that got the item correct
Why is it often recommended to have the initial item pool reviewed by experts prior to administering the questionnaire to the target sample?
- Confirm or invalidate your definition of the construct (by asking how relevant is each item to what you intend to measure).
- Evaluate the items' clarity and conciseness
- Identify other items that you have failed to include
Define
Test manual
the document that accompanies a psychological test and that records the way in which the test was developed, how the test is to be administered (including the groups for which it is relevant), information on the reliability and validity of the test when used for specific purposes, and norms for test interpretation
Definition
a family of theories that specifies the functional relationship between a response to a single test item and the strength of the underlying latent trait
Item response theory (IRT)
Define
Item characteristic curve
the term for a trace line in item response theory
Definition
the formal statement of observations of objects mapped to numbers that represent relationships among the objects
Model of measurement
Definition
the various forms the content of a psychological test can take
Item
Definition
Test is administered to a representative sample of test-takers under conditions that simulate the conditions that the final version of the test will be administered under
Test tryout
These questions are based on which test assumption?
- What is the test designed to measure?
- Is there a need for this test?
- What content will the test cover?
Psychological traits exist
Definition
any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice
Cross validation
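One familiar technique from this family is k-fold cross-validation. The sketch below only shows how the data indices might be split. It assumes contiguous folds for readability (in practice the data are usually shuffled first), and the function name is hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal, contiguous folds.

    n: number of observations; k: number of folds (2 <= k <= n).
    Each observation appears in exactly one test fold.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra observation.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
```

A model would be fitted on each train set and evaluated on the matching test set, so every estimate of accuracy comes from data the model did not see.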
Define
Attribute
the consistent set of behaviours, thoughts or feelings that is the target of a psychological test
What sort of factor analysis would you use to identify a manageable number of factors to extract?
exploratory factor analysis (EFA)
Define
Item
the various forms the content of a psychological test can take
Definition
refers to how many attributes a dataset has
Dimensionality
What is a paired comparison?
- Test-taker has to choose one of two options (e.g., a statement, object, picture) on the basis of some rule
- The value (e.g., 1 or 0) of each option in each paired comparison is determined by judges prior to test administration
Define
Latent trait
the hypothesised continuously and normally distributed dimension of individual differences that is the sole source of a consistent set of observable behaviours, thoughts and feelings, which is the target of a psychological test
Define
Validity shrinkage
The decrease in item validities that inevitably occurs after cross-validation
These questions are based on which test assumption?
- Who benefits from this test?
- Is there any potential for harm?
Testing/assessment can be fair and benefit society
Definition
the consistent set of behaviours, thoughts or feelings that is the target of a psychological test
Attribute
What are some item writing guidelines for test construction?
- Write items using straightforward language that is appropriate for the reading level of the population
- Avoid double barrelled items
- Avoid slang and colloquial expressions that may quickly become obsolete
- Consider if using positively and negatively worded items is a good idea
- Write items that the majority of respondents can respond to appropriately
- Ask about sensitive issues using straightforward and nonjudgemental language
- Choose the item response carefully
Define
Item analysis
the process of studying behaviour of items when administered to a group of respondents, usually with a view to the selection of some of the items to form a psychological test
Definition
the set of ideas, expressed mathematically and statistically, that grew out of attempts in the first half of the twentieth century to measure psychological variables; and that turns on the central idea of a score on a psychological test comprising both true and error score components
Classical test theory
Definition
the extent to which items on a test represent the universe of behaviour the test was designed to measure
Content validity
What is the aim of item development during test construction?
The researcher aims to generate an item pool with good content validity
Define
Differential item functioning
the possibility that a psychological test item may behave differently for different groups of respondents
Definition
a model that relates the probability of response of a particular sort (e.g. right/wrong) to the difference between a person’s standing on a latent variable and the difficulty of the item
Rasch model
Define
Likert scale
a graphical scale originally with five points used by a respondent to represent the strength of an underlying attitude or emotion
Definition
a graphical scale originally with five points used by a respondent to represent the strength of an underlying attitude or emotion
Likert scale
Definition
a specific idea or concept about a psychological process or underlying trait that is hypothesised on the basis of a psychological theory
Construct
Define
Dimensionality
refers to how many attributes a dataset has
Define
Content validity
the extent to which items on a test represent the universe of behaviour the test was designed to measure
What does the item-discrimination index tell you?
Does the item separate ‘high’ and ‘low’ scorers?
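A sketch of one common way to compute such an index: compare the proportion answering the item correctly in the highest-scoring and lowest-scoring groups on the whole test. The 27% group size is a widely used convention rather than something stated in this deck, and the function and parameter names are hypothetical.

```python
def item_discrimination(item_correct, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct for one item.

    item_correct: 1/0 per test-taker for this item (assumed coding)
    total_scores: total test score per test-taker, in the same order
    fraction:     share of test-takers in each extreme group (assumed 0.27)
    Returns a value between -1 and 1; larger means better separation.
    """
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("inputs must be non-empty and the same length")
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data: only the three highest scorers get the item right.
d = item_discrimination([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6], fraction=0.5)
```

A negative value would mean low scorers do better on the item than high scorers, which usually marks the item for review.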
What are the 5 broad steps to test development?
Test conceptualisation
Test construction
Test tryout
Item analysis
Test revision
What does oblique rotation assume?
assumes factors are correlated
Definition
the document that accompanies a psychological test and that records the way in which the test was developed, how the test is to be administered (including the groups for which it is relevant), information on the reliability and validity of the test when used for specific purposes, and norms for test interpretation
Test manual
What are the disadvantages of a binary scale?
- Allows guessing (T/F)
- Only suits content where a dichotomous response can be made
- Content not as rich
What types of questions should you ask yourself during test conceptualisation?
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for the test?
- Who will use the test?
- Who will take the test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- Should more than one form of the test be developed?
- What special training will be required of test users?
- What types of responses will be required of test takers?
- Who benefits from this test?
- Is there any potential for harm in developing this test?
- How will meaning be attributed to scores on this test?
Define
Test tryout
Test is administered to a representative sample of test-takers under conditions that simulate the conditions that the final version of the test will be administered under
Define
Test construction
A stage in the process of test development that entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test
True or False
Over inclusion of items during test construction is recommended
True
What are the disadvantages of a Likert scale?
- Number of response options needs to be considered
- Odd vs even number of responses
What does orthogonal rotation assume?
assumes factors are uncorrelated
Definition
A stage in the process of test development that entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test
Test construction
Define
Empirical approach
a way of constructing psychological tests that relies on collecting and evaluating data about how each of the items from a pool of items discriminated between groups of respondents who are thought to show or not show the attribute the test is to measure; also an approach to personality that relates the reports that people make about their characteristic behaviours to their social functioning and thereby provide tools for personality prediction
Define
Rational-empirical approach
a way of constructing psychological tests that relies on both reasoning from what is known about the psychological construct to be measured in the test, and collecting and evaluating data about how the test and the items that comprise it actually behave when administered to a sample of respondents
Define
Test conceptualisation
the first stage of test development where the idea for a test begins
Definition
the use of factor analysis inductively to identify the factor structure of a set of variables
Exploratory factor analysis
These questions are based on which test assumption?
- What is the ideal format of the test?
- What types of responses will be required of test takers?
Traits/states can be measured
What item properties might be investigated during item analysis?
- Item difficulty/distribution
- Dimensionality (i.e. factor analysis)
- Item reliability
- Item validity
- Item discrimination
Definition
a graph of the probability of response to an item as a function of the strength of or position on a latent trait
Trace line
What can factor analysis provide?
- Determine the number of underlying latent variables or constructs
- Help condense information
- Define the content or meaning of the factors
- Helps identify items that are performing better or worse
- Items that do not fit into any factor, or those that fit into more than one can be considered for elimination
When looking at item distributions, what are the characteristics of items that should be flagged for removal?
- Consider removing items with a highly skewed distribution
- These are items that virtually everyone answers in the same way
- Item conveys little information
- Limited variability so will correlate weakly with other items (impacts on FA).
- Keep items with a high variance/distribution
- Likely to discriminate between the different levels of the construct
- Keep items with a mean close to the centre of the range of possible scores
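The checks above can be sketched as a per-item summary. The 0.9 cut-off for "virtually everyone answers in the same way" is an illustrative assumption, not a rule from this deck, and the function and key names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance

def summarise_item(responses, scale_min, scale_max, max_modal_share=0.9):
    """Summarise one item's response distribution and flag low-information items.

    responses:       numeric responses to a single item
    scale_min/max:   ends of the response scale (e.g. 1 and 5)
    max_modal_share: assumed threshold; flag the item when this share of
                     respondents chose the same option
    """
    if not responses:
        raise ValueError("responses must not be empty")
    modal_count = Counter(responses).most_common(1)[0][1]
    modal_share = modal_count / len(responses)
    return {
        "mean": mean(responses),
        "variance": pvariance(responses),
        "scale_midpoint": (scale_min + scale_max) / 2,
        "modal_share": modal_share,
        "flag": modal_share >= max_modal_share,
    }

# Hypothetical 5-point item where 19 of 20 respondents chose 5.
skewed = summarise_item([5] * 19 + [4], 1, 5)
```

Comparing "mean" with "scale_midpoint" gives a quick view of how far the item sits from the centre of the range.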
Define
Test specification
a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which it will be used
Definition
the first stage of test development where the idea for a test begins
Test conceptualisation
Definition
a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which it will be used
Test specification
In what ways do tests ‘age’?
- Domains change
- Interpretations change
- The stimuli age
- Certain words change in their meaning
- Test norms become outdated
- Theories behind the test change
What must be considered when deciding on the optimal item-difficulty index?
The probability of guessing correctly is taken into account when deciding the optimal item-difficulty index.
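One common textbook rule of thumb, offered here as an assumption rather than as this deck's method, places the optimal difficulty halfway between the chance success rate and 1.0. The function name is hypothetical.

```python
def optimal_item_difficulty(n_options):
    """Midpoint between the chance success rate and a perfect score.

    n_options: number of response options per item (2 for true/false).
    Assumes blind guessing succeeds with probability 1 / n_options.
    """
    if n_options < 1:
        raise ValueError("n_options must be at least 1")
    chance = 1.0 / n_options
    return (chance + 1.0) / 2.0

true_false = optimal_item_difficulty(2)       # 0.75
four_option = optimal_item_difficulty(4)      # 0.625
```

The fewer the options, the easier it is to guess correctly, so the target difficulty value moves upward.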

Definition
the possibility that a psychological test item may behave differently for different groups of respondents
Differential item functioning
Define
Item response theory (IRT)
a family of theories that specifies the functional relationship between a response to a single test item and the strength of the underlying latent trait
What are the advantages of a binary scale?
- Easy to construct
- Easy to score
- Quick to administer
- Large number of questions
Definition
the process of studying behaviour of items when administered to a group of respondents, usually with a view to the selection of some of the items to form a psychological test
Item analysis
Definition
the hypothesised continuously and normally distributed dimension of individual differences that is the sole source of a consistent set of observable behaviours, thoughts and feelings, which is the target of a psychological test
Latent trait
Definition
the term for a trace line in item response theory
Item characteristic curve
Define
Trace line
a graph of the probability of response to an item as a function of the strength of or position on a latent trait
Define
Exploratory factor analysis
the use of factor analysis inductively to identify the factor structure of a set of variables
Define
Item validity
the extent to which the score on an item correlates with an external criterion relevant to the attribute or construct that is the subject of test construction
These questions are based on which test assumption?
- How will meaning be attributed to scores on this test?
Test behaviour is predictive
Define
Cross validation
any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice
What are the advantages of a written/essay format test?
- Complex, imaginative or original knowledge
- Written communication
- Information must be generated, not merely recognised
Define
Measurement
the assignment of numbers to objects according to a set of rules for the purpose of quantifying an attribute
What does cross-validation tell you?
Is the test applicable to this population?
These questions are based on which test assumption?
- What is the ideal format of the test?
- What types of responses will be required of test takers?
Tests have strengths/weaknesses/error