Zumbo 2013 Flashcards
Define a construct:
A construct may be conceived of as a concept or a mental representation of shared attributes or characteristics, and it is assumed to exist because it gives rise to observable or measurable phenomena.
Name the 6 Measurement Theories
Observed Score approaches include: 1. classical test theory and 2. generalizability theory
Latent variable approaches include 3. factor analytic theory, 4. item response theory (IRT), 5. Rasch Theory and 6. Mixture Models
- CTT - True Score Theory
Based on the decomposition of observed scores (X) into true scores (T) and error scores (E): X = T + E. It produces only a single estimate of reliability and of the standard error of measurement.
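A minimal simulation sketch of X = T + E. The definitions used here (reliability as true-score variance over observed-score variance, and SEM as the observed SD times the square root of 1 minus reliability) are the standard CTT ones but are not stated on the card. The score distributions, sample size, and the function name `simulate_ctt` are assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_ctt(n_people=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate observed scores X = T + E and return (reliability, SEM).

    The normal distributions and their parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    observed = [t + e for t, e in zip(true_scores, errors)]

    # Reliability: share of observed-score variance that is true-score variance.
    reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    # Standard error of measurement: a single value for the whole score range.
    sem = statistics.pstdev(observed) * (1.0 - reliability) ** 0.5
    return reliability, sem

reliability, sem = simulate_ctt()
print(f"reliability = {reliability:.2f}, SEM = {sem:.2f}")
```

With these assumed SDs the theoretical reliability is 100 / 125 = 0.8 and the SEM is about 5, so the simulated values should land close to those figures.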
- Generalizability Theory
An outgrowth of CTT, often used to decompose the error E into different facets or sources, e.g., error resulting from the items selected, the raters used, or the gender of the test administrator. By unpacking the error E, one redefines the true score T.
- Factor Analytic Theory
It is based on computational tools and a statistical modeling strategy, model estimation methods, and fit statistics.
- IRT - Item Response Theory
Focuses on the range of latent ability (theta) and the characteristics of items at various points along the continuum of this ability. IRT produces estimates across the range of the latent variable measured by a test.
- Rasch Theory
Can be characterized as IRT with only one item parameter, item difficulty. The guessing parameter is fixed at zero and the item discrimination parameter at 1 for all items.
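The relationship between IRT and the Rasch model can be sketched with a logistic item response function. The three-parameter logistic form below is one common IRT parameterization and is an assumption here, since the cards do not give a formula. The function names are hypothetical.

```python
import math

def item_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct response under a three-parameter logistic model."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic

def rasch_probability(theta, difficulty):
    """Rasch case: discrimination fixed at 1 and guessing fixed at 0."""
    return item_probability(theta, difficulty, discrimination=1.0, guessing=0.0)

# Probability of success on an item of difficulty 0 at several ability levels.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(rasch_probability(theta, 0.0), 3))
```

When ability equals item difficulty, the Rasch probability is 0.5. A nonzero guessing parameter raises the lower bound of the curve.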
There are 4 primary approaches to the development of scales and measures
- Rational-Theoretical
- Factor Analytic
- Empirical Criterion-Keyed
- Projective
Rational Theoretical
- The researcher uses either theory or an intuitive, common-sense approach to develop items for a test.
- The most commonly used approach.
- Expert opinion forms the basis for the development and selection of items.
Factor Analytic Approach
Items are selected on the basis of whether they load on a factor and a statistical rule forms the basis for the development and selection of items. Many large personality inventories have been developed using this approach.
- Items are drawn from a large pool, which is then factor analyzed to determine which items to keep.
- Second most commonly used
- Most tests today use some combination of the rational-theoretical and factor analytic approaches.
Empirical Criterion Approach
- Items are selected if they can discriminate the group of interest from a control group. This approach is not frequently used today in the development of measures. Example: the MMPI.
Do these items discriminate between groups? Very effective in prediction.
Projective Approach
Not many new tests use this approach. The basic idea is to present ambiguous stimuli, such as inkblots or pictures, or to have individuals create their own drawings, on the assumption that they will project their own concerns, fears, attitudes and beliefs onto the stimulus or drawing.
Not popular in North America; used a lot in Europe.
2 Types of Tests
- By field of study, e.g., personality, intelligence, achievement, aptitude
- By general administration procedures: individual tests administered one on one, or group tests.
- By general type of information gathered: a) self-report (personality test), b) performance or task (IQ test, class exam), or c) observational
- By purpose and scoring/interpretation: norm-referenced and criterion-referenced tests
- Maximum Performance Tests measure how well an individual performs under standard conditions when exerting maximal effort and are presumed to include measures such as intelligence tests and achievement tests, i.e., performance or task-related tests.
- Typical Response Tests measure an individual’s typical response in a situation and are presumed to include measures such as personality tests and attitude scales, i.e., self-report and observational tests.
Another two types of tests
- Norm Referenced Tests: compare an individual’s performance on a test with a predefined population or normative group. Items are selected to have average difficulty levels and high discrimination between low and high scorers on the test. Interpretation of test scores is based on percentiles, standard scores and grade-equivalent scores. Scores show how an individual performed relative to the normative group but give little information on the person’s knowledge of, performance on, or level of the construct.
- Criterion Referenced Tests: evaluate performance in terms of mastery of a set of well-defined objectives, skills or competencies. Items are selected on the basis of how well they match the learning outcomes that are deemed most important. Scores are based on percentages or mastery/non-mastery categories. Outcomes give detailed information about how well a person has performed on each of the objectives, skills or competencies included in the test.
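The two interpretations can be contrasted with a small scoring sketch. The norm group, the 80 percent cut score, and the function names are hypothetical and used only for illustration. The percentile rank here is defined as the percentage of the norm group scoring below the individual, which is one of several conventions.

```python
import statistics

def norm_referenced(score, norm_scores):
    """Return (z-score, percentile rank) relative to a normative group."""
    mean = statistics.mean(norm_scores)
    sd = statistics.pstdev(norm_scores)
    z = (score - mean) / sd
    # Percentile rank: percentage of the norm group scoring strictly below.
    percentile_rank = 100.0 * sum(s < score for s in norm_scores) / len(norm_scores)
    return z, percentile_rank

def criterion_referenced(items_correct, items_total, cut_percent=80.0):
    """Return (percent correct, mastery category) against a fixed cut score."""
    percent = 100.0 * items_correct / items_total
    category = "mastery" if percent >= cut_percent else "non-mastery"
    return percent, category

norm_group = [40, 45, 50, 55, 60]          # hypothetical normative sample
print(norm_referenced(55, norm_group))     # position relative to other people
print(criterion_referenced(17, 20))        # position relative to the objectives
```

The first result depends entirely on who is in the norm group. The second depends only on the test content and the chosen cut score.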
A third type of test
Ipsative tests can be contrasted with norm-referenced tests. An individual’s performance is compared with his or her own performance, either in the same domain or construct over time or relative to his or her performance on other domains or constructs. This is referred to as profiling.
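One simple way to sketch a profile is to express each domain score as a deviation from the person's own mean across domains. This particular computation, the domain names, and the function name are assumptions for illustration; the card does not prescribe a formula.

```python
import statistics

def ipsative_profile(domain_scores):
    """Express each domain score relative to the person's own mean score."""
    own_mean = statistics.mean(domain_scores.values())
    return {domain: score - own_mean for domain, score in domain_scores.items()}

# Hypothetical scores for one person on three domains.
person = {"verbal": 60, "numeric": 50, "spatial": 40}
print(ipsative_profile(person))
```

Positive values mark relative strengths and negative values relative weaknesses for this person, with no reference to how anyone else scored.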
Types of Scaling
- Thurstone’s Equal Appearing Interval Scaling
- Likert’s Summative Scaling
- Guttman’s Scalogram Analysis
- Thurstone’s Equal Appearing Interval Scaling
- Selection of 100 - 200 Statements
- A number of judges are asked to sort them into a single pile, ordered from highest to lowest
- The median rank of each statement is computed and becomes the scale value of that statement
- Select a limited number of statements (about 25) having equal intervals between successive items and spanning the entire range of values
- Applying the scale to respondents: each respondent is asked to indicate the statements that apply to him or her
- The respondent’s score is the average scale value of the items selected
One selects items that not only reflect a range of attitudes but also cover that range at roughly equal intervals. To do this, judges rate each item on an 11-point scale according to the severity of response (or level of attitude) it represents, and the mean and SD of the ratings are used to select items at these intervals. Respondents are asked to agree or disagree with each item. The score for each item is equal to the mean rating assigned to it, and the overall score is obtained by averaging the ratings over all of the items with which the respondent agrees. The method is expensive and time consuming and is meant to produce scores on an interval scale.
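The scoring steps above can be sketched as follows. This version uses the median of the judges' ratings as each statement's scale value (the cards mention both the median and the mean). The statements, ratings, and function names are hypothetical.

```python
import statistics

def scale_values(judge_ratings):
    """Scale value of each statement: median of the judges' 11-point ratings."""
    return {stmt: statistics.median(ratings) for stmt, ratings in judge_ratings.items()}

def respondent_score(values, agreed_statements):
    """Respondent score: average scale value of the statements agreed with."""
    return statistics.mean([values[stmt] for stmt in agreed_statements])

# Hypothetical ratings from three judges for three statements.
ratings = {"A": [2, 3, 3], "B": [6, 5, 7], "C": [10, 9, 10]}
values = scale_values(ratings)
print(values)
print(respondent_score(values, ["A", "B"]))
```

A respondent who agrees with statements A and B gets the average of their two scale values. Statements the respondent does not endorse do not enter the score.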
- Likert’s Summative Scaling
Respondents used symbols to indicate the degree to which they agreed or disagreed with statements and these symbols were converted to a scale ranging from 1 - 5. The total score was obtained by summing the points assigned for each statement. The goal is to combine item responses for people in such a way that the obtained numbers represent reliable and valid individual differences among people.
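A minimal sketch of summative scoring. The response symbols and their mapping to points are assumptions, since the card only says that symbols were converted to a 1 - 5 scale. Reverse scoring of negatively worded items is not covered here.

```python
# Hypothetical symbols: strongly disagree ... strongly agree mapped to 1 ... 5.
POINTS = {"SD": 1, "D": 2, "U": 3, "A": 4, "SA": 5}

def likert_total(responses):
    """Total score: sum of the points assigned to each statement's response."""
    return sum(POINTS[symbol] for symbol in responses)

print(likert_total(["SA", "A", "U", "A"]))
```

Each response contributes its point value, so a respondent answering four statements can score anywhere from 4 to 20.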