Test Construction and Factor Analysis Flashcards

1
Q

What are the 6 steps in test construction?

A
  1. Defining the Test’s Purpose
  2. Preliminary Design Issues
  3. Item Preparation
  4. Item Analysis
  5. Standardization and Ancillary Research
  6. Preparation of Final Materials and Publication

Polly Pocket Is A Silly Face

2
Q

What is involved with defining a test’s purpose?

A

Statement of purpose: traits to be measured and target audience (e.g., “the WMS-R is an individually administered clinical instrument for appraising major dimensions of memory functions in adolescents and adults”)

3
Q

What are the 6 components of preliminary design issues?

A
  1. Background research
  2. Mode of administration
  3. Length
  4. Number of scores
  5. Question and response format
  6. Administrator training

Rafael Ate Lard Sandwiches Timidly

4
Q

What is involved in background research (preliminary design issues)?

A

Literature search (theoretical and empirical), books. Ask yourself: how have others defined the construct? Which assessments are already available? Subject matter experts: interview individuals who study the phenomenon you are interested in and people who have specific knowledge about the construct; include different constituent groups

5
Q

Describe the mode of administration options (preliminary design issues).

A

group vs individual; pen and paper, scantron, online, oral

6
Q

What is the tradeoff involved with test length (preliminary design issues)?

A

tradeoff between reliability and efficiency: longer tests tend to be more reliable but take more time to administer; length also depends on the number of scores (constructs × dimensions within constructs)

7
Q

What are the different types of response format (preliminary design issues)?

A

self-report, informant rated, expert rated

8
Q

What are the different ways a question can be formatted (preliminary design issues)?

A
  1. Dichotomously scored
  2. Likert scored
  3. Forced/multiple choice
  4. Graded response options
  5. Ranking
  6. Visual Analog Scale
  7. Open format
  8. Performance assessment
  9. Semi-structured interview
9
Q

What does it mean for a question to be dichotomously scored?

A

(used by Woodworth in 1915 with Army recruits; requires little effort on the part of the person answering; does not distinguish differences among the people who answer yes or among those who answer no; e.g., “Are you currently depressed?” yes vs. no)

10
Q

What does it mean for a question to be Likert scored?

A

(most widely used for assessing attitudes, beliefs, and feelings; agree or disagree on a 5-, 7-, or 9-point scale; response options are typically symmetric or balanced because there are equal numbers of positive and negative positions; extended to assess frequency, importance, quality, and likelihood; a recent empirical study found that items with five or seven levels may produce slightly higher mean scores, relative to the highest attainable score, than items with 10 levels)

11
Q

What does a multiple choice question consist of? What is another name for it?

A

(typically used to test knowledge and ability; usually 4 or 5 response options, only one of which is correct; also called forced choice)

12
Q

What is a graded response option question?

A

(applies to any test question whose response options are viewed as progressively more severe [e.g., strongly agree vs. agree]; some tests use options that are more explicit, such as the Beck Depression Inventory and IQ tests)

13
Q

What is a ranking question?

A

(respondents order elements from a group of objects, activities, characteristics, or conditions; advantage of forcing people to choose one over another—prevents ties among items)

14
Q

What is a visual analog scale?

A

(a VAS is a method of capturing the degree or amount of some condition or attitude an individual has without the use of explicit numbers; described in 1921 and referred to as the graphic rating method; what is actually measured is a distance along the line)

15
Q

What are components and concerns when using an open format question?

A

(benefit of not controlling response; risk of not asking about certain aspects of the construct being assessed that may be important; allows respondent to provide any information they wish; must be rated by researchers using a scoring key)
*constructed response item

16
Q

What is a performance assessment and why would it be used?

A

(critically important to specifically define the target behaviors; identify the antecedent conditions that trigger the behavior; SMEs review the behaviors to make sure you are measuring the construct, e.g., customer service: a sales associate must greet the customer within 30 s of the customer entering the store)
*constructed response item

17
Q

Why do psychologists use semi-structured interviews? What are the two variance factors to consider when making these questions?

A

designed to help clinicians comprehensively assess the presence and absence of psychiatric symptoms; facilitate decisions about whether or not there were sufficient symptoms and impairment to render a formal diagnosis; designed to control information and streamline decision making:

  1. Information variance (i.e. what information is collected and considered in deciding whether or not someone was depressed)
  2. Criterion variance (i.e., how symptoms would be combined to reach a diagnosis). The goal is to ensure that every person making a diagnosis would ask exactly the same questions about all of the relevant symptoms. The assumption is that diagnoses are discrete entities: either you have it or you don’t.
18
Q

What are the two types of test items?

A
  1. Selected-response items (e.g., T/F, multiple choice, Likert format)
  2. Constructed-response items (open format [essay or oral responses] and performance assessments)
19
Q

What are some benefits of selected-response test items?

A
  1. Scoring reliability
  2. Temporal efficiency
  3. Scoring efficiency
20
Q

What are some benefits of constructed-response items?

A
  1. Behavioral observation
  2. Exploring/explaining responses
  3. Development of study habits
21
Q

What are some considerations when training administrators (preliminary design issues)?

A

Scoring, Behavior assessment, Interviewing

22
Q

What are the four parts of a test item considered in the item preparation step of test construction?

A
  1. Stimulus
  2. Response
  3. Conditions governing responses
  4. Scoring procedures
23
Q

What are some guiding principles when writing items?

A

deal with ONE central thought in each item, be precise, be brief, avoid awkward wording or dangling constructs, avoid irrelevant information, present items in positive language, avoid double negatives, avoid terms like all or none, avoid indeterminate terms like frequently or sometimes

24
Q

What are some item preparation recommendations?

A

prepare at least 2-3 times as many items as you intend to have. Review: spelling and grammar, content, and beware of stereotypes

25
Q

What are the four ways to analyze an item?

A
  1. item tryout
  2. statistical analysis (item difficulty, item discrimination, distractor analysis)
  3. factor analysis
  4. item selection
26
Q

What are the two characteristics a perfect item has?

A
  1. People who know the answer would always choose the right answer
  2. People who do not know the answer would choose randomly among the possible responses
27
Q

What is item difficulty? How do you calculate it?

A

the proportion of test takers who answered the item correctly; Item Difficulty (p) = # people correct / total # people

a p value is a behavioral measure; difficulty is a characteristic of both the item and the sample; extreme p values restrict variability
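
As a rough illustration of the p formula above, here is a minimal Python sketch. The function name `item_difficulty` and the score matrix are made up for this example; they are not from any particular scoring package.

```python
def item_difficulty(responses):
    """Return p (proportion correct) for each item.

    responses: list of rows, one per test taker; each row holds
    0/1 scores (1 = correct) for every item, in the same order.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_people for j in range(n_items)]


# Hypothetical data: 5 test takers, 3 items
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
p_values = item_difficulty(scores)
print(p_values)  # one p value per item
```

In this made-up sample, the first item is answered correctly by everyone and so cannot separate test takers at all, which is what "extreme p values restrict variability" refers to.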

28
Q

What is item discrimination? What are the two indices that determine discrimination?

A

determines how well a single item on the test measures the same thing as the test itself; you don’t want it too high, otherwise the item is redundant with other items; you don’t want it at 0 or negative, because then the item undermines the test
two indices to determine discrimination
1. Item discrimination index D
2. Discrimination coefficients

29
Q

How do you calculate item discrimination?

A

score and rank order; take 27% from both ends of distribution;
D=(# correct upper - # correct lower) / # people in largest group
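
The procedure above can be sketched in Python roughly as follows. The function name, the rounding rule for the 27% cut, and the data are assumptions made for illustration. Because this sketch forms two groups of equal size, "largest group" is simply that common size; with ties at the cut points the groups could differ.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Item discrimination index D from upper and lower scoring groups.

    total_scores: total test score for each person
    item_correct: 0/1 score on the item of interest, same person order
    fraction: share of people taken from each end (0.27 per the card)
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # assumed rounding rule for group size
    order = sorted(range(n), key=lambda i: total_scores[i])  # rank order
    lower, upper = order[:k], order[-k:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / max(len(upper), len(lower))


# Hypothetical data: 10 people, one item
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
d = discrimination_index(totals, item)
print(round(d, 2))
```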

30
Q

What does a D score tell us?

A
  0.40+ = Good; 0.30-0.39 = Reasonably good; 0.20-0.29 = Marginal; 0.19 and below = Poor
    e.g., on an exam, the lowest-scoring 27% contained 7 people and the highest-scoring 27% contained 8 people. On one item, 3 of the low scorers and 7 of the high scorers answered correctly: D = (7 - 3) / 8 = 0.50, dividing by the number of people in the larger group
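
A small Python sketch of the interpretation bands on this card. The function name `interpret_d` is hypothetical, and treating values that fall between the listed bands (e.g., 0.395) as belonging to the lower band is an assumption.

```python
def interpret_d(d):
    """Map a discrimination index D to the verbal labels on the card."""
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal"
    return "poor"  # includes zero and negative values


for value in (0.50, 0.35, 0.25, 0.10, -0.20):
    print(value, interpret_d(value))
```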
31
Q

How do you know if a distractor is good and what are the guidelines when writing them?

A

obtain discrimination index for each option to determine usefulness of distractor: should be low and preferably negative; be cautious of large D values as well

create distractors that are plausible; make all of the alternatives parallel in length and grammatical structure; keep the alternatives short; don’t write distractors that mean the same thing; alternate the position of the correct answer within the distractors; use the alternatives “all of the above” and “none of the above” as little as possible; make sure each alternative agrees with the stem
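
One way to picture a distractor analysis is to compute a discrimination index for every option, not just the keyed answer. The sketch below assumes made-up upper and lower groups of 8 people each and a hypothetical item whose correct answer is "B".

```python
def option_discrimination(upper_choices, lower_choices, options):
    """D for each response option: (# upper choosing - # lower choosing) / group size."""
    n = max(len(upper_choices), len(lower_choices))
    return {
        opt: (upper_choices.count(opt) - lower_choices.count(opt)) / n
        for opt in options
    }


# Hypothetical choices on one item; "B" is assumed to be the keyed answer
upper = ["B", "B", "B", "A", "B", "B", "C", "B"]
lower = ["A", "C", "B", "D", "A", "C", "D", "B"]
d_by_option = option_discrimination(upper, lower, "ABCD")
print(d_by_option)
```

In this made-up data the keyed option has a positive D, while each distractor has a negative D, meaning it attracts more low scorers than high scorers.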

32
Q

What are the three core questions of dimensionality considered in factor analysis?

A
  1. How many psychological attributes (i.e., dimensions) are reflected in the test’s items?
  2. If a test’s items reflect more than one dimension, then are those dimensions correlated with each other?
  3. If a test’s items reflect more than one dimension, then what are the dimensions?
33
Q

What are the three levels of dimensionality?

A
  1. Unidimensional tests: conceptual homogeneity; one single score;
  2. multidimensional tests with correlated dimensions: tests with higher-order factors; they yield a variety of subtest scores and also a total score, which combines the subtests;
  3. multidimensional tests with uncorrelated dimensions: no total score is computed
34
Q

Why is dimensionality important?

A

implications for appropriate scoring, evaluation, and interpretation of test scores; for example, the number of dimensions matters because each dimension should be scored separately (one person might receive more than one score from a test), each such score requires its own psychometric evaluation, and each score is interpreted in terms of the psychological dimension underlying it

35
Q

What is factor analysis? What are the two types?

A

a statistical procedure often used to evaluate a test’s dimensionality; the two types are 1. Exploratory FA (EFA) and 2. Confirmatory FA (CFA)

36
Q

What is the purpose of EFA?

A

EFA is generally used to discover the factor structure of a measure and to examine its internal reliability; EFA is often recommended when researchers have no hypotheses about the nature of the underlying factor structure of their measure

37
Q

What are the three basic points in EFA?

A
  1. Deciding the number of factors
  2. Choosing an extraction method
  3. Choosing a rotation method
38
Q

What are eigenvalues? How are they produced?

A

The total variance explained by a particular factor; produced by a process called principal components analysis (PCA), eigenvalues represent the variance accounted for by each underlying factor; they are not expressed as percentages but as values that sum to the number of items (a 12-item scale will theoretically have 12 possible underlying factors, and each factor will have an eigenvalue that indicates the amount of variation in the items accounted for by that factor; if the first factor has an eigenvalue of 3.0, it accounts for 25% of the variance [3/12 = 0.25]. The total of all the eigenvalues will be 12 if there are 12 items, so some factors will have smaller eigenvalues)
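
A minimal NumPy sketch of the point above: the eigenvalues of an item correlation matrix sum to the number of items, and each eigenvalue divided by that sum is the proportion of variance accounted for. The simulated 12-item data set is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 respondents answering 12 items
data = rng.normal(size=(200, 12))
common = rng.normal(size=(200, 1))
data[:, :4] += 1.5 * common  # make the first four items share one factor

# Eigenvalues of the item correlation matrix, largest first
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Proportion of total variance accounted for by each component
proportion = eigenvalues / eigenvalues.sum()

print(np.round(eigenvalues, 2))
print(round(float(eigenvalues.sum()), 2))  # equals the number of items
```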

39
Q

How do you decide the number of factors?

A

the most common approach to deciding the number of factors is to generate a scree plot (a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis); the eigenvalues are typically arranged in descending order

40
Q

What is a scree plot?

A

a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis; the eigenvalues are usually arranged in descending order

41
Q

What is the Kaiser-Guttman Rule?

A

it states that the number of factors to retain is equal to the number of factors with eigenvalues greater than 1.0; this approach is often not recommended because it tends to produce too many factors
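
The rule amounts to a simple count, sketched below in Python. The function name and the list of eigenvalues are invented for illustration (the values are chosen to sum to 12, as they would for a 12-item scale).

```python
def kaiser_guttman(eigenvalues, threshold=1.0):
    """Count the factors whose eigenvalue is strictly greater than the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


# Hypothetical eigenvalues for a 12-item scale
example_eigenvalues = [3.0, 2.1, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.5, 0.4, 0.2]
n_factors = kaiser_guttman(example_eigenvalues)
print(n_factors)
```

Note that an eigenvalue of exactly 1.0 is not counted here, since the rule as stated requires values greater than 1.0.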

42
Q

What is factor extraction (EFA)?

A

once the number of factors is decided, the researcher runs another factor analysis to get the loadings for each of the factors; to do this, one has to decide which mathematical solution to use to find the loadings. There are about 5 basic extraction methods; regardless of the method chosen, the factor extraction will produce factor loadings for every item on every extracted factor; researchers hope their results will show a simple structure, with most items having a large loading on one factor but small loadings on other factors. Loading: how well an item correlates with a factor/dimension of the construct

43
Q

What is a factor loading?

A

how well an item correlates with a factor/dimension of the construct
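
As a rough sketch, loadings can be computed with principal components extraction, which is only one of the several extraction methods and is used here as an assumption because it needs nothing beyond NumPy. Each loading is an eigenvector entry scaled by the square root of its eigenvalue; the simulated four-item data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 300 respondents, 4 items driven by one common factor
factor = rng.normal(size=(300, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(300, 4))

corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Sort components from largest to smallest eigenvalue
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: rows are items, columns are components
loadings = eigvecs * np.sqrt(eigvals)

print(np.round(loadings[:, 0], 2))  # loadings on the first component
```

With this extraction, the squared loadings in each column sum to that component's eigenvalue. The sign of a whole column is arbitrary, so only the absolute size of the loadings is meaningful here.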

44
Q

What is rotation? (EFA)

A

once an initial solution is obtained, the loadings are rotated; rotation is a way of maximizing high loadings and minimizing low loadings so that the simplest possible structure is achieved. There are two basic types: 1. Orthogonal (varimax, quartimax, equamax), which assumes factors are not correlated 2. Oblique (oblimin, promax, direct quartimin), which makes no assumption that factors are uncorrelated

45
Q

What are the two basic types of item rotation?

A
  1. Orthogonal (varimax, quartimax, equamax): assumes factors are not correlated
  2. Oblique (oblimin, promax, direct quartimin): makes no assumption that factors are uncorrelated
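
To make orthogonal rotation concrete, here is a minimal NumPy sketch of varimax using a commonly used SVD-based iteration. It is an illustration under stated assumptions, not the routine any particular statistics package runs: the function name, the stopping rule, and the example loadings (a clean two-factor pattern deliberately rotated by 30 degrees) are all made up.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward simple structure."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return L @ rotation, rotation


# Hypothetical simple structure, then deliberately "un-rotated" by 30 degrees
simple = np.array([[0.80, 0.00], [0.70, 0.10], [0.75, 0.05],
                   [0.10, 0.70], [0.00, 0.80], [0.05, 0.60]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the way that variance is split across factors changes. Column order and signs of the rotated solution are arbitrary.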
46
Q

How are EFA and CFA similar?

A

Both EFA and CFA are used to investigate the theoretical constructs, or factors, that might be represented by a set of items; in either, the factors can be assumed to be uncorrelated (orthogonal); both are used to assess the quality of individual items

47
Q

How are EFA and CFA different?

A
  1. With EFA, researchers usually decide on the number of factors by examining output from a principal components analysis (i.e., eigenvalues are used). With CFA, the researchers must specify the number of factors a priori
  2. CFA requires that a particular factor structure be specified, in which the researcher indicates which items load on which factor. EFA allows all items to load on all factors
  3. CFA provides a fit of the hypothesized factor structure to the observed data
  4. In CFA, researchers typically use maximum likelihood to estimate factor loadings, whereas maximum likelihood is only one of a variety of estimators used with EFA