Test Construction and Factor Analysis Flashcards
What are the 6 steps in test construction?
- Defining the Test’s Purpose
- Preliminary Design Issues
- Item Preparation
- Item Analysis
- Standardization and Ancillary Research
- Preparation of Final Materials and Publication
Polly Pocket Is A Silly Face
What is involved with defining a test’s purpose?
Statement of purpose: traits to be measured and target audience (e.g., “the WMS-R is an individually administered clinical instrument for appraising major dimensions of memory functions in adolescents and adults”)
What are the 6 components of preliminary design issues?
- Background research
- Mode of administration
- Length
- Number of scores
- Question and response format
- Administrator training
Rafael Ate Lard Sandwiches Timidly
What is involved in background research (preliminary design issues)?
Literature search (theoretical and empirical literature, books): ask yourself how others have defined the construct and which assessments are already available. Subject matter experts (SMEs): interview individuals who study the phenomenon you are interested in and people who have specific knowledge about the construct; include different constituent groups
Describe the mode of administration options (preliminary design issues).
group vs individual; pen and paper, scantron, online, oral
What is the tradeoff involved with test length (preliminary design issues)?
Tradeoff between reliability and efficiency; length also depends on the number of scores (constructs × dimensions within constructs)
What are the different types of response format (preliminary design issues)?
self-report, informant rated, expert rated
What are the different ways a question can be formatted (preliminary design issues)?
- Dichotomously scored
- Likert scored
- Forced/multiple choice
- Graded response options
- Ranking
- Visual Analog Scale
- Open format
- Performance assessment
- Semi-structured interview
What does it mean for a question to be dichotomously scored?
(Woodworth in 1915 with Army recruits; require little effort on the part of the person answering; do not distinguish differences among the people who answer yes, or among those who answer no; e.g., “Are you currently depressed?” yes vs. no)
What does it mean for a question to be Likert scored?
(Most widely used for assessing attitudes, beliefs, and feelings; respondents agree or disagree on a 5-, 7-, or 9-point scale; response options are typically symmetric or balanced because there are equal numbers of positive and negative positions; extended to assess frequency, importance, quality, and likelihood; a recent empirical study found that items with five or seven levels may produce slightly higher mean scores relative to the highest possible attainable score, compared to those produced from the use of 10 levels)
What does a multiple choice question consist of? What is another name for it?
(Also called forced choice; typically used to test knowledge and ability; usually 4 or 5 response options, only one of which is correct)
What is a graded response option question?
(Each successive response option is viewed as progressively more severe [e.g., agree vs. strongly agree]; some tests use options that are more explicit, such as the Beck Depression Inventory and IQ tests)
What is a ranking question?
(respondents order elements from a group of objects, activities, characteristics, or conditions; advantage of forcing people to choose one over another—prevents ties among items)
What is a visual analog scale?
(A VAS is a method of capturing the degree or amount of some condition or attitude an individual has without the use of explicit numbers; described in 1921 and referred to as the graphic rating method; the score is an actual measured distance, i.e., how far along the line the respondent’s mark falls)
What are components and concerns when using an open format question?
(benefit of not controlling response; risk of not asking about certain aspects of the construct being assessed that may be important; allows respondent to provide any information they wish; must be rated by researchers using a scoring key)
*constructed response item
What is a performance assessment and why would it be used?
(Critically important to specifically define the target behaviors and to identify the antecedent conditions that trigger them; SMEs review the behaviors to make sure you are measuring the construct; e.g., customer service: a sales associate must greet a customer within 30 seconds of the customer entering the store)
*constructed response item
Why do psychologists use semi-structured interviews? What are the two variance factors to consider when making these questions?
designed to help clinicians comprehensively assess the presence and absence of psychiatric symptoms; facilitate decisions about whether or not there were sufficient symptoms and impairment to render a formal diagnosis; designed to control information and streamline decision making:
- Information variance (i.e. what information is collected and considered in deciding whether or not someone was depressed)
- Criterion variance (i.e., how symptoms are combined to reach a diagnosis). The goal is to ensure that every person making a diagnosis asks exactly the same questions about all of the relevant symptoms. The assumption is that diagnoses are discrete entities: either you have the disorder or you don’t.
What are the two types of test items?
- Selected-response items (e.g., T/F, multiple choice, Likert format)
- Constructed-response items (open format [essay or oral responses] and performance assessments)
What are some benefits of selected-response test items?
- Scoring reliability
- Temporal efficiency
- Scoring efficiency
What are some benefits of constructed-response items?
- Behavioral observation
- Exploring/explaining responses
- Development of study habits
What are some considerations when training administrators (preliminary design issues)?
Scoring, Behavior assessment, Interviewing
What are the four parts of a test item considered in the item preparation step of test construction?
- Stimulus
- Response
- Conditions governing responses
- Scoring procedures
What are some guiding principles when writing items?
deal with ONE central thought in each item, be precise, be brief, avoid awkward wording or dangling constructs, avoid irrelevant information, present items in positive language, avoid double negatives, avoid terms like all or none, avoid indeterminate terms like frequently or sometimes
What are some item preparation recommendations?
Prepare at least 2-3 times as many items as you intend to keep. Review: spelling and grammar, content review, beware of stereotypes
What are the four ways to analyze an item?
- item tryout
- statistical analysis (item difficulty, item discrimination, distractor analysis)
- factor analysis
- item selection
What are the two characteristics a perfect item has?
- People who know the answer would always choose the right answer
- People who do not know the answer would choose randomly among the possible responses
What is item difficulty? How do you calculate it?
The proportion of test takers who answered the item correctly; Item Difficulty (p) = # people correct / total # people
a p value is a behavioral measure, difficulty is a characteristic of the item and the sample, extreme p values restrict variability
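A minimal Python sketch of the p calculation, assuming responses have already been scored 0/1 and stored as one row per test taker (the data layout, the `item_difficulty` name, and the scores are hypothetical). It also prints p(1 - p), the variance of a 0/1 item, which shrinks toward 0 as p approaches 0 or 1; this is one way to see why extreme p values restrict variability.

```python
def item_difficulty(scored):
    """p for each item: proportion of test takers who answered it correctly.

    `scored` is a list of rows, one per test taker, each holding 0/1 item
    scores (1 = correct). This layout is an assumption for illustration.
    """
    n_people = len(scored)
    n_items = len(scored[0])
    return [sum(row[j] for row in scored) / n_people for j in range(n_items)]


# Hypothetical scored responses: 5 test takers x 3 items
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
p = item_difficulty(scored)
print(p)                                     # [0.8, 0.6, 0.2]

# Variance of a 0/1 item is p(1 - p): largest at p = .5, near 0 at the extremes
print([round(x * (1 - x), 2) for x in p])    # [0.16, 0.24, 0.16]
```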
What is item discrimination? What are the two indices that determine discrimination?
Determines how well a single item measures the same thing as the test as a whole; you don’t want it too high, otherwise the item is redundant with other items; you don’t want it at 0 or negative, because then the item undermines the test
two indices to determine discrimination
1. Item discrimination index D
2. Discrimination coefficients
How do you calculate item discrimination?
Score the test and rank-order test takers by total score; take the top 27% and the bottom 27% of the distribution;
D=(# correct upper - # correct lower) / # people in largest group
What does a D score tell us?
- 0.40+ = Good; 0.30-0.39 = Reasonably good; 0.20-0.29 = Marginal; 0.19 and below = Poor
e.g., on an exam, the lowest-scoring 27% had 7 people and the highest-scoring 27% had 8 people. On one item, only 3 of the low scorers and 7 of the high scorers got it right: D = (7 - 3) / 8 = 0.50 (divide by the larger group, here 8), which is Good
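The D calculation can be sketched in Python as follows. `extreme_groups`, `d_index`, and `rate_d` are hypothetical helper names, ties at the 27% cutoff are not handled specially, and the last lines plug in the card's counts (7 of 8 upper scorers and 3 of 7 lower scorers correct) using the formula D = (# correct upper - # correct lower) / # people in the largest group.

```python
def extreme_groups(total_scores, fraction=0.27):
    """Indices of the top and bottom `fraction` of test takers, ranked by total score."""
    k = max(1, round(len(total_scores) * fraction))   # size of each extreme group
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    return ranked[:k], ranked[-k:]


def d_index(correct_upper, correct_lower, n_upper, n_lower):
    """D = (# correct in upper group - # correct in lower group) / size of the larger group."""
    return (correct_upper - correct_lower) / max(n_upper, n_lower)


def rate_d(d):
    """Label a D value using the cutoffs on this card."""
    if d >= 0.40:
        return "Good"
    if d >= 0.30:
        return "Reasonably good"
    if d >= 0.20:
        return "Marginal"
    return "Poor"


# The card's counts: 7 of 8 upper scorers and 3 of 7 lower scorers answered correctly
d = d_index(correct_upper=7, correct_lower=3, n_upper=8, n_lower=7)
print(d, rate_d(d))  # 0.5 Good
```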
How do you know if a distractor is good and what are the guidelines when writing them?
obtain discrimination index for each option to determine usefulness of distractor: should be low and preferably negative; be cautious of large D values as well
create distractors that are plausible; make all of the alternatives parallel in length and grammatical structure; keep the alternatives short; don’t write distractors that mean the same thing; alternate the position of the correct answer within the distractors; use the alternatives “all of the above” and “none of the above” as little as possible; make sure each alternative agrees with the stem
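A hypothetical distractor analysis in Python: each option gets its own D, computed like the item D but counting who chose that option. The responses are invented (keyed answer "A", 8 upper and 7 lower scorers), and `option_discrimination` and `weak_distractors` are illustrative names. Here a distractor is flagged when its D is zero or positive, i.e., it does not draw more low scorers than high scorers.

```python
from collections import Counter


def option_discrimination(upper_choices, lower_choices, options):
    """D for every option: (# upper choosing it - # lower choosing it) / larger group size."""
    upper, lower = Counter(upper_choices), Counter(lower_choices)
    n = max(len(upper_choices), len(lower_choices))
    # Counter returns 0 for options nobody in the group chose
    return {opt: (upper[opt] - lower[opt]) / n for opt in options}


def weak_distractors(d_by_option, key):
    """Distractors whose D is not negative (they fail to attract low scorers)."""
    return [opt for opt, d in d_by_option.items() if opt != key and d >= 0]


# Hypothetical choices on one item; "A" is the keyed (correct) answer
upper = ["A", "A", "A", "A", "A", "A", "A", "C"]   # 8 high scorers
lower = ["A", "A", "A", "B", "B", "C", "D"]        # 7 low scorers

result = option_discrimination(upper, lower, ["A", "B", "C", "D"])
print(result)                          # {'A': 0.5, 'B': -0.25, 'C': 0.0, 'D': -0.125}
print(weak_distractors(result, "A"))   # ['C']
```

Distractors B and D behave as intended (negative D), while C is chosen equally by both groups and would be a candidate for rewriting.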
What are the three core questions of dimensionality considered in factor analysis?
- How many psychological attributes (i.e., dimensions) are reflected in the test’s items?
- If a test’s items reflect more than one dimension, then are those dimensions correlated with each other?
- If a test’s items reflect more than one dimension, then what are the dimensions?
What are the three levels of dimensionality?
- Unidimensional tests: conceptual homogeneity; one single score;
- Multidimensional tests with correlated dimensions: tests with higher-order factors; yield a variety of subscale scores and also a total score that combines the subtests;
- Multidimensional tests with uncorrelated dimensions: no total score is computed
Why is dimensionality important?
implications for appropriate scoring, evaluation, and interpretation of test scores; for example, the number of dimensions (each dimension should be scored separately [one person might receive more than one score from a test]; each such score requires its own psychometric evaluation; each score would be interpretable in terms of the psychological dimensions underlying the score)
What is factor analysis? What are the two types?
A statistical procedure often used to evaluate a test’s dimensionality. Two types:
1. Exploratory factor analysis (EFA)
2. Confirmatory factor analysis (CFA)
What is the purpose of EFA?
EFA is generally used to discover the factor structure of a measure and to examine its internal reliability; EFA is often recommended when researchers have no hypotheses about the nature of the underlying factor structure of their measure
What are the three basic points in EFA?
1. Deciding the number of factors
2. Choosing an extraction method
3. Choosing a rotation method
What are eigenvalues? How are they produced?
The total variance explained by a particular factor; produced by a process called principal components analysis (PCA), they represent the variance accounted for by each underlying factor. They are not expressed as percentages but as values that sum to the number of items. For example, a 12-item scale theoretically has 12 possible underlying factors, and each factor has an eigenvalue indicating how much of the variation in the items it accounts for; if the first factor has an eigenvalue of 3.0, it accounts for 25% of the variance (3/12 = .25). The eigenvalues of all 12 factors sum to 12, so the remaining factors share the remaining 9.
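A quick numerical check of the 12-item example, sketched with NumPy. The correlation matrix is an assumption chosen so the first eigenvalue comes out at exactly 3.0: every pair of items correlates 2/11, which for an equal-correlation matrix gives a first eigenvalue of 1 + 11(2/11) = 3. The eigenvalues are computed directly from the correlation matrix (as PCA does) and sum to 12, the number of items.

```python
import numpy as np

k = 12                      # number of items
r = 2 / 11                  # hypothetical correlation between every pair of items
R = np.full((k, k), r)
np.fill_diagonal(R, 1.0)    # a correlation matrix has 1s on the diagonal

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
share = eigenvalues / k                               # proportion of total variance per factor

print(round(float(eigenvalues.sum()), 6))   # 12.0 -> eigenvalues sum to the number of items
print(round(float(eigenvalues[0]), 6))      # 3.0
print(round(float(share[0]), 2))            # 0.25 -> first factor explains 25%
```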
How do you decide the number of factors?
The most common approach to deciding the number of factors is to generate a scree plot (a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis, with the eigenvalues arranged in descending order) and retain the factors that come before the point where the curve levels off (the “elbow”)
What is a scree plot?
Two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis; eigenvalues are usually arranged in descending order
What is the Kaiser-Guttman Rule?
It states that the number of factors to retain equals the number of factors with eigenvalues greater than 1.0; this approach is often not recommended because it tends to retain too many factors
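A sketch of the scree-plot data and the Kaiser-Guttman count in NumPy, using a hypothetical 6-item correlation matrix with two clusters of three items (r = .60 within a cluster, 0 between clusters). Printing the eigenvalues in descending order gives the points a scree plot would draw; the plotting itself is left out. In this clean example the rule and the elbow agree on two factors, though with real data the rule tends to retain too many.

```python
import numpy as np


def kaiser_guttman(eigenvalues, cutoff=1.0):
    """Number of factors retained under the Kaiser-Guttman rule (eigenvalue > cutoff)."""
    return int(np.sum(np.asarray(eigenvalues) > cutoff))


# Hypothetical 6-item correlation matrix: items 1-3 and items 4-6 form two clusters
cluster = np.full((3, 3), 0.6)
np.fill_diagonal(cluster, 1.0)
zeros = np.zeros((3, 3))
R = np.block([[cluster, zeros], [zeros, cluster]])

# Descending order, as on a scree plot: each (factor, eigenvalue) pair is one point
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
for factor, ev in enumerate(eigenvalues, start=1):
    print(factor, round(float(ev), 2))

print("Kaiser-Guttman factors:", kaiser_guttman(eigenvalues))  # Kaiser-Guttman factors: 2
```

The eigenvalues drop from 2.2, 2.2 to 0.4 for every remaining factor, so the curve flattens right after factor 2.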
What is factor extraction (EFA)?
Once the number of factors is decided, the researcher runs another factor analysis to get the loadings for each of the factors; to do this, one has to decide which mathematical solution to use to find the loadings. There are about 5 basic extraction methods; regardless of the method chosen, factor extraction produces loadings for every item on every extracted factor. Researchers hope their results will show a simple structure, with most items having a large loading on one factor but small loadings on the other factors. Loading: how well an item correlates with a factor/dimension of the construct
What is a factor loading?
how well an item correlates with a factor/dimension of construct
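Loadings can be illustrated with principal components extraction, one of the extraction methods; other methods such as principal axis factoring or maximum likelihood would give somewhat different numbers. In this NumPy sketch (hypothetical data; `pca_loadings` and `block_corr` are illustrative names), each loading is an eigenvector element times the square root of its eigenvalue, which for PCA equals the correlation between the item and the component. The two item clusters produce simple structure: items 1-3 load about .86 on the first component and about 0 on the second, while items 4-6 load about .77 on the second.

```python
import numpy as np


def pca_loadings(R, n_factors):
    """Principal components loadings: eigenvector element * sqrt(eigenvalue)."""
    eigenvalues, eigenvectors = np.linalg.eigh(R)        # returned in ascending order
    idx = np.argsort(eigenvalues)[::-1][:n_factors]      # keep the largest components
    vecs = eigenvectors[:, idx]
    # Eigenvector signs are arbitrary; flip each column so its loadings sum to a positive value
    vecs = vecs * np.sign(vecs.sum(axis=0))
    return vecs * np.sqrt(eigenvalues[idx])


def block_corr(size, r):
    """Equal-correlation block: r off the diagonal, 1 on the diagonal."""
    b = np.full((size, size), r)
    np.fill_diagonal(b, 1.0)
    return b


# Hypothetical items: 1-3 intercorrelate .60, 4-6 intercorrelate .40, no cross-cluster correlation
zeros = np.zeros((3, 3))
R = np.block([[block_corr(3, 0.6), zeros], [zeros, block_corr(3, 0.4)]])

loadings = pca_loadings(R, 2)
print(np.round(loadings, 2))
```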
What is rotation? (EFA)
Once the initial solution is obtained, the loadings are rotated. Rotation is a way of maximizing high loadings and minimizing low loadings so that the simplest possible structure is achieved. There are two basic types: 1. Orthogonal (varimax, quartimax, equamax), which assumes the factors are uncorrelated; 2. Oblique (oblimin, promax, direct quartimin), which does not assume the factors are uncorrelated (it allows them to correlate)
What are the two basic types of factor rotation?
- Orthogonal (varimax, quartimax, equamax): assumes factors are not correlated
- Oblique (oblimin, promax, direct quartimin): does not assume factors are uncorrelated; allows them to correlate
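A sketch of orthogonal rotation with NumPy, using the SVD-based orthomax iteration (a common textbook algorithm, not necessarily what any particular statistics package does). `orthomax` is a hypothetical name; its `gamma` parameter selects the criterion: 1.0 gives varimax, 0.0 quartimax, and k/2 (k = number of factors) equamax. The demo is invented: a simple-structure loading matrix is turned 30 degrees to mimic an unrotated solution, and varimax recovers the simple structure, possibly with columns reordered or sign-flipped. Oblique rotations such as promax are not shown.

```python
import numpy as np


def orthomax(loadings, gamma=1.0, max_iter=1000, tol=1e-8):
    """Orthogonally rotate an (items x factors) loading matrix.

    gamma=1.0 -> varimax, gamma=0.0 -> quartimax, gamma=k/2 -> equamax.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)   # sum of squared loadings per factor
        # Gradient of the orthomax criterion with respect to the rotated loadings
        gradient = rotated ** 3 - (gamma / p) * rotated * col_ss
        u, _, vt = np.linalg.svd(loadings.T @ gradient)
        new_rotation = u @ vt                 # closest orthogonal matrix to the gradient term
        converged = np.max(np.abs(new_rotation - rotation)) < tol
        rotation = new_rotation
        if converged:
            break
    return loadings @ rotation, rotation


# Hypothetical simple structure, then "unrotated" by a 30-degree turn
simple = np.array([[.8, 0], [.8, 0], [.8, 0], [0, .5], [0, .5], [0, .5]])
theta = np.radians(30)
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn

rotated, rotation = orthomax(unrotated, gamma=1.0)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (row sum of squared loadings) is the same before and after rotation; only how the variance is distributed across factors changes.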
How are EFA and CFA similar?
Both EFA and CFA are used to investigate the theoretical constructs, or factors, that might be represented by a set of items; in either, the factors can be assumed to be uncorrelated, or orthogonal; both are used to assess the quality of individual items
How are EFA and CFA different?
- With EFA, researchers usually decide on the number of factors by examining output from a principal components analysis (i.e., eigenvalues are used). With CFA, researchers must specify the number of factors a priori
- CFA requires that a particular factor structure be specified, in which the researcher indicates which items load on which factor. EFA allows all items to load on all factors
- CFA provides an evaluation of how well the hypothesized factor structure fits the observed data
- In CFA, researchers typically use maximum likelihood to estimate factor loadings, whereas maximum likelihood is only one of a variety of estimators used with EFA