Zumbo 2013 Flashcards

1
Q

Define a construct:

A

A construct may be conceived of as a concept or a mental representation of shared attributes or characteristics, and it is assumed to exist because it gives rise to observable or measurable phenomena.

2
Q

Name the 6 Measurement Theories

A

Observed Score approaches include: 1. classical test theory and 2. generalizability theory

Latent variable approaches include 3. factor analytic theory, 4. item response theory (IRT), 5. Rasch Theory and 6. Mixture Models

3
Q
  1. CTT - True Score Theory
A

is based on the decomposition of observed scores (X) into true scores (T) and error scores (E): X = T + E. It produces only a single estimate of reliability and of the standard error of measurement.
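To make the decomposition concrete, here is a minimal Python sketch (an editorial illustration, not from the source) simulating X = T + E and showing that reliability is the proportion of observed-score variance that is true-score variance:

```python
import random

random.seed(1)

# Simulate X = T + E for 1,000 examinees: true scores plus random error.
true_scores = [random.gauss(50, 10) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# In CTT, reliability = var(T) / var(X): a single estimate for the whole test.
print(round(variance(true_scores) / variance(observed), 2))  # ~100/(100+25) = 0.80
```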

4
Q
  2. Generalizability Theory
A

is an outgrowth of CTT because it is often used to decompose the error E into different facets or sources (e.g., error resulting from the items selected, the raters used, or the gender of the test administrator). By unpacking the error E, one redefines the true score T.

5
Q
  3. Factor Analytic Theory
A

It is based on computational tools and a statistical modeling strategy, model estimation methods, and fit statistics.

6
Q
  4. IRT - Item Response Theory
A

Focuses on the range of latent ability (theta) and the characteristics of items at various points along the continuum of this ability. IRT produces estimates across the range of the latent variable measured by a test.

7
Q
  5. Rasch Theory
A

Can be characterized as IRT with only one item parameter, item difficulty. It has a guessing parameter of zero and an item discrimination parameter value of 1 for all items.

8
Q

There are 4 primary approaches to the development of scales and measures

A
  1. Rational-Theoretical
  2. Factor Analytic
  3. Empirical Criterion-keyed
  4. Projective
9
Q

Rational-Theoretical

A
  1. The researcher uses either theory or an intuitive, common-sense approach to develop items for a test.
  2. The most commonly used approach.
  3. Expert opinion forms the basis for the development and selection of items.
10
Q

Factor Analytic Approach

A

Items are selected on the basis of whether they load on a factor and a statistical rule forms the basis for the development and selection of items. Many large personality inventories have been developed using this approach.

  • Items are derived from a large pool, with factor loadings determining which items are retained.
  • Second most commonly used
  • Most tests today use some combination of the rational-theoretical and factor analytic approaches.
11
Q

Empirical Criterion Approach

A
  • Items are selected if they can discriminate the group of interest from a control group. This approach is not frequently used today in the development of measures (e.g., the MMPI).

Do these items discriminate between groups? Very effective in prediction.

12
Q

Projective Approach

A

Not many new tests use this approach. The basic idea is to use ambiguous stimuli (inkblots, pictures) or have individuals create their own drawings; respondents project their own concerns, fears, attitudes, and beliefs onto the stimuli or drawing.

Not popular in North America; used a lot in Europe.

13
Q

2 Types of Tests

  1. By field of study (e.g., personality, intelligence, achievement, aptitude)
  2. By general administration procedures: individual tests administered one-on-one, or group tests
  3. By general type of information gathered: a) self-report personality tests, b) performance or task (IQ tests, class exams), or c) observational
  4. By purpose and scoring/interpretation: norm-referenced and criterion-referenced tests
A
  1. Maximum Performance Tests measure how well an individual performs under standard conditions when exerting maximal effort and are presumed to include measures such as intelligence tests and achievement tests (i.e., performance or task-related tests).
  2. Typical Response Tests measure an individual’s response in a situation and are presumed to include measures such as personality tests and attitude scales (i.e., self-report and observational tests).
14
Q

Another two types of tests

A
  1. Norm-Referenced Tests: compare an individual’s performance on a test with that of a predefined population or normative group. Items are selected to have average difficulty levels and high discrimination between low and high scorers on the test. Interpretation of test scores is based on percentiles, standard scores, and grade-equivalent scores. Scores reflect how an individual performed relative to the normative group but give little information about the person’s knowledge of, performance on, or level of the construct.
  2. Criterion-Referenced Tests: evaluate performance in terms of mastery of a set of well-defined objectives, skills, or competencies. Items are selected on the basis of how well they match the learning outcomes deemed most important. Scores are based on percentages or nonmastery-mastery categories. Outcomes give detailed information about how well a person has performed on each of the objectives, skills, or competencies included in the test.
15
Q

A third type of test

A

Ipsative tests can be contrasted with norm-referenced tests. An individual’s performance is compared with his or her own performance, either in the same domain or construct over time, or relative to his or her performance on other domains or constructs. This is referred to as profiling.

16
Q

Types of Scaling

A
  1. Thurstone’s Equal Appearing Interval Scaling
  2. Likert’s Summative Scaling
  3. Guttman’s Scalogram Analysis
17
Q
  1. Thurstone’s Equal Appearing Interval Scaling
  • Selection of 100-200 statements.
  • A number of judges are asked to sort them into a single pile from highest to lowest.
  • The median rank of each statement is computed and serves as the scale value of that statement.
  • Select a limited number of statements (about 25) having equal intervals between successive items and spanning the entire range of values.
  • Applying the scale to respondents: each is asked to indicate the statements that apply to him/her.
  • A respondent’s score is the average scale value of the items selected.
A

One selects items that not only reflect a range of attitudes but also cover that range at roughly equal intervals. To do this, judges rate each item according to the severity of response (or level of attitude) it represents on an 11-point scale, and the mean and SD are used to select items at these intervals. Respondents are asked to agree or disagree with each item; the score for each item is equal to the mean rating assigned to it, and the overall score is obtained by averaging the ratings over all of the items with which the respondent agrees. It is expensive and time consuming and is meant to produce scores on an interval scale.
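As a concrete illustration of this scoring logic, here is a minimal Python sketch (not from the source; the judge ratings and statement names are hypothetical) computing statement scale values from judges' ratings and a respondent's score as the mean scale value of endorsed statements:

```python
# Hypothetical data: judges' 11-point ratings for each candidate statement.
judge_ratings = {
    "stmt_a": [2, 3, 2, 3, 2],
    "stmt_b": [6, 5, 6, 6, 5],
    "stmt_c": [10, 9, 10, 10, 9],
}

def mean(xs):
    return sum(xs) / len(xs)

# Each statement's scale value is the mean (or median) of the judges' ratings.
scale_values = {stmt: mean(r) for stmt, r in judge_ratings.items()}

# A respondent's score is the average scale value of the statements endorsed.
endorsed = ["stmt_a", "stmt_b"]
score = mean([scale_values[s] for s in endorsed])
print(scale_values, round(score, 2))
```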

18
Q
  2. Likert’s Summative Scaling
A

Respondents used symbols to indicate the degree to which they agreed or disagreed with statements, and these symbols were converted to a scale ranging from 1 to 5. The total score was obtained by summing the points assigned for each statement. The goal is to combine item responses in such a way that the obtained numbers represent reliable and valid individual differences among people.

19
Q
  3. Guttman’s Scalogram Analysis
A

Consists of a unidimensional set of items that are ranked in order of difficulty from the least extreme position to the most extreme. If someone scores a 3 on a 5-item Guttman scale, then they would agree with items 1 to 3 and disagree with items 4 and 5. This perfect relationship is rarely achieved, and thus some degree of deviation from it is expected. Many achievement tests use a form of Guttman scaling to order questions on the basis of difficulty: if you can answer a difficulty-5 question, then one would assume you could answer questions 1 through 4.
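A minimal Python sketch (not from the source) of checking whether a 0/1 response pattern fits the perfect Guttman ordering described above:

```python
# A perfect Guttman response pattern: endorsing an item implies endorsing
# all easier (less extreme) items that precede it.
def is_perfect_guttman(responses):
    """responses: list of 0/1 item scores ordered from least to most extreme."""
    # After the first 0, no 1s may follow.
    seen_zero = False
    for r in responses:
        if r == 1 and seen_zero:
            return False
        if r == 0:
            seen_zero = True
    return True

print(is_perfect_guttman([1, 1, 1, 0, 0]))  # True: a score of 3 on 5 items
print(is_perfect_guttman([1, 0, 1, 0, 0]))  # False: a deviation from the scale
```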

20
Q

Response Formats

A
  1. Continuous
  2. Ordinal
    (Both can be broken down into direct estimation and comparative methods.)
  3. Dichotomous: yes or no, true or false, and agree/disagree
21
Q

Direct Estimation Methods

A

The respondent provides a direct estimation of the magnitude of an attribute or characteristic; the Likert scale is an example of this format. A semantic differential format presents bipolar adjectives (e.g., strong vs. weak) as anchors with a number of points between them.

22
Q

Comparative Methods

A

Compare the magnitude of an attribute or characteristic with something else, e.g., rate your quality of life relative to a) other people, b) your quality of life in the past, or c) your ideal way of life, on a Likert scale ranging from much worse to much better than others.

23
Q

Scoring

A
  1. Continuous: an infinite or large number of points.
  2. Ordinal: involves 3 to 10 possible values.
  3. Dichotomous: only 2 values, 0 or 1 (e.g., True/False).
  4. Reverse Scoring: use of positively and negatively worded items on a scale to identify when a respondent is displaying acquiescence (see the sketch below).
  5. Item Weighting: usually equal weight is assigned to each item.
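Here is a minimal Python sketch of reverse scoring and equal-weight summing on a 1-5 Likert scale; the response values and the set of negatively worded items are hypothetical:

```python
# Reverse-score negatively worded items on a 1-5 Likert scale, then sum.
SCALE_MIN, SCALE_MAX = 1, 5

def reverse(score):
    return SCALE_MIN + SCALE_MAX - score  # 1<->5, 2<->4, 3 stays 3

responses = [4, 2, 5, 1]          # raw item responses (hypothetical)
negatively_worded = {1, 3}        # hypothetical indices of reversed items

total = sum(reverse(r) if i in negatively_worded else r
            for i, r in enumerate(responses))
print(total)  # equal item weights: a simple sum of the (re)scored items
```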
24
Q

Item Analysis Procedures in CTT

A
  1. Item-total correlations consist of the correlation between the score on a single item and the composite or total score on the scale.
  2. Alpha if Item Deleted
  3. Corrected item-total correlations consist of the correlation between the score on a single item and the total score computed from the remaining items (see the sketch below).
  4. Inter-item correlations
  5. Item Difficulty Index
  6. Item Discrimination Index
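A minimal Python sketch of the corrected item-total correlation on hypothetical item data (the uncorrected version would simply correlate the item with the full total, including the item itself):

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical item scores: rows = respondents, columns = items.
data = [
    [1, 2, 2, 1],
    [3, 4, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [4, 5, 4, 3],
]

item = 0
item_scores = [row[item] for row in data]
# Corrected item-total: correlate the item with the total of the OTHER items,
# so the item's own variance does not inflate the correlation.
rest_totals = [sum(row) - row[item] for row in data]
print(round(pearson(item_scores, rest_totals), 2))
```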
25
Q

Item Difficulty and Item Discrimination

A
  1. The item difficulty index = the proportion of respondents who answer an item correctly.
  2. The item discrimination index: a measure of the effectiveness of an item in discriminating between high and low scorers on a test.
    Rule: the item discrimination index is maximized when the item difficulty index is close to .5. (A minimal computation is sketched below.)
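A minimal Python sketch of both indices under CTT; the responses and totals are hypothetical, and the discrimination index here uses the common upper-third/lower-third grouping:

```python
# Hypothetical 0/1 item responses and total test scores for ten examinees.
item = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
total = [9, 8, 3, 7, 2, 8, 6, 4, 1, 9]

# Item difficulty index: the proportion answering the item correctly.
p = sum(item) / len(item)

# A simple discrimination index: p(upper group) - p(lower group),
# using the top and bottom thirds on the total score.
ranked = sorted(zip(total, item), reverse=True)
k = len(ranked) // 3
upper = [i for _, i in ranked[:k]]
lower = [i for _, i in ranked[-k:]]
d = sum(upper) / len(upper) - sum(lower) / len(lower)

print(p, d)  # 0.6 1.0 for these hypothetical data
```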
26
Q

Item Analysis Procedures in IRT

A
  1. Item Difficulty (the b parameter)
  2. Item Discrimination (the a parameter)
  3. Guessing (the c parameter): the probability of selecting the correct answer at the lowest level of ability
  4. Conditional reliability
  5. Conditional Standard error of measurement
27
Q

Unlike CTT, which estimates item difficulty and discrimination in relation to the sample of respondents, IRT examines difficulty and discrimination across the range of the latent variable.

A

The item characteristic curve = the regression of the probability of endorsing the item (or, in achievement tests, the probability of getting the item correct) onto the latent variable score (commonly denoted as theta)
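As one common concrete form of the ICC, here is a Python sketch of the three-parameter logistic model, which uses the a (discrimination), b (difficulty), and c (guessing) parameters from the previous card; the parameter values chosen are hypothetical:

```python
import math

def icc_3pl(theta, a=1.2, b=0.0, c=0.2):
    """Three-parameter logistic ICC: the probability of endorsing (or
    correctly answering) an item given ability theta, with discrimination a,
    difficulty b, and guessing floor c (parameter values are hypothetical)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# The probability rises with theta and floors at c for very low ability.
for theta in [-3, -1, 0, 1, 3]:
    print(theta, round(icc_3pl(theta), 2))
```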

28
Q

Test Validation and Explanation

A

Validity is an integrated and evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment.

It involves presenting evidence and providing a compelling argument to support the intended inferences and show that alternative or competing inferences are not more viable.

29
Q

Messick’s Progressive Matrix

A

The consequential basis of Messick’s progressive matrix adds both value implications and social consequences. Value implications challenge researchers to reflect on the values that led to their interest in and labeling of the construct, the theory underlying the construct and its measurement, and the broader social ideologies that affected the development of the identified theory.

30
Q

Social Consequences:

A

Refers to the unanticipated or unintended consequences of legitimate test interpretation and use. Concerned with the relationship between score meaning and social consequences.

31
Q

Hubley and Zumbo put together a framework that links social and personal consequences with unintended social and personal side effects.

A

Psychometrics and other theories as well as values influence the construct, the measure, and validity and validation.

32
Q
Content-Related Evidence
A

Examines the degree to which elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose. This includes not just item content but also instructions, response formats, and scoring instructions.

33
Q
Score Structure
A

Factor analysis is used to a) discover how many factors are being tapped by the items in the test or b) confirm whether the test items measure the factors intended. Knowing the factor structure of a test is important because it greatly affects how one scores the test, how one assesses the reliability of scores, and the validity of inferences made from the test.

34
Q
Reliability
A

Reliability refers to the degree to which test scores are repeatable or consistent, i.e., free from measurement error.

  1. Alternate forms: two different forms of the same test are administered to the same group of people in the same session or on different occasions.
  2. Test-retest: the same test is administered on two different occasions.
35
Q

There are 3 different estimates of reliability:

A
  1. Split-Half Reliability: one correlates the scores from two halves of a test that has been administered only once and applies the Spearman-Brown formula to correct for the underestimation of reliability that results from treating the test as being only half its length.
  2. Coefficient Alpha: the most commonly reported estimate of reliability. It can be thought of as the mean of all possible split-half coefficients corrected by the Spearman-Brown formula.
  3. Kuder-Richardson Formula 20 (KR-20): the case in which coefficient alpha is used with dichotomous (yes/no, true/false) data. In each case, a reliability coefficient tells you the variability in the sample scores that is the result of individual differences rather than random (unsystematic) measurement error. (A minimal computation of the first two is sketched below.)
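A minimal Python sketch of the split-half estimate (with the Spearman-Brown correction) and coefficient alpha on hypothetical item data; with 0/1 items, the same alpha computation yields KR-20:

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical item scores: rows = respondents, columns = items.
data = [
    [1, 2, 2, 1],
    [3, 4, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [4, 5, 4, 3],
]
n_items = len(data[0])

# Split-half: correlate odd-item and even-item half scores, then apply
# the Spearman-Brown correction for a test of double the half length.
odd = [sum(row[::2]) for row in data]
even = [sum(row[1::2]) for row in data]
r_half = pearson(odd, even)
split_half = 2 * r_half / (1 + r_half)

# Coefficient alpha: (k/(k-1)) * (1 - sum(item variances) / total variance).
totals = [sum(row) for row in data]
item_vars = [variance([row[j] for row in data]) for j in range(n_items)]
alpha = (n_items / (n_items - 1)) * (1 - sum(item_vars) / variance(totals))

print(round(split_half, 2), round(alpha, 2))
```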
36
Q

Interrater (or interscorer) reliability

A

One is interested in how repeatable the scores are when two or more different people are scoring or observing the same behaviour. If interrater reliability is low, then one needs to consider whether the scoring criteria are clear or complete enough, whether training was sufficient, or whether some raters are not doing a good job.
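One common way to quantify interrater reliability for categorical ratings is Cohen's kappa, which corrects raw agreement for agreement expected by chance; kappa is not named on this card, so this is an editorial sketch with hypothetical ratings:

```python
# Two raters assign categories to the same ten behaviours (hypothetical data).
rater1 = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "A"]
rater2 = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "A"]

n = len(rater1)
observed = sum(r1 == r2 for r1, r2 in zip(rater1, rater2)) / n

# Chance-expected agreement from each rater's marginal category proportions.
categories = set(rater1) | set(rater2)
expected = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)

# Cohen's kappa corrects raw agreement for agreement expected by chance.
kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(kappa, 2))  # 0.8 raw agreement, 0.58 kappa
```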

37
Q

Reliability and Validity Coefficients

A

provide related but separate info. High reliability coefficients indicate a high proportion of true score variance in the observed scores and thus one is confident that one is measuring real individual differences.

38
Q
Criterion-Related Evidence
A

demonstrates the degree to which scores obtained on a measure are related to a criterion. A criterion is an outcome indicator that represents the construct, diagnosis or behaviour that one is attempting to predict using a measure.

A criterion can be thought of as being what one would really like to have (e.g., a diagnosis) but cannot obtain because of cost or time, and so the test acts as a substitute or shortcut.

Can be predictive or concurrent

39
Q

Predictive evidence

A

examines how well a score on a measure is related to or predicts a future criterion (behaviour, test performance, or diagnosis obtained at a later date)

40
Q

Concurrent evidence

A

examines how well a score on a measure is related to or predicts a current criterion.

41
Q
Convergent and Discriminant Evidence
A

Convergent measures may consist of measures of highly related constructs (e.g., depression and anxiety) or the same construct (depression); in the latter case, correlations of such scores are sometimes misidentified as criterion-related validity evidence.

42
Q

Correlations between convergent measures should be relatively high, whereas correlations between discriminant measures should be relatively low.

A

Discriminant validity coefficients should be significantly lower than convergent validity coefficients.

43
Q

The most common types of norms used are:

A
  1. Percentiles
  2. Standard Scores

44
Q

Percentile:

A

A percentile is a ranking that provides info about the relative position of a score within a distribution of scores. It indicates the percentage of people in the specific normative group whose scores fall below a given raw score.
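A minimal Python sketch using the simple percent-below definition given above (the norm scores are hypothetical; some conventions also credit half of any tied scores):

```python
# Percentile rank: the percentage of scores in the normative group
# that fall below a given raw score.
norm_scores = [42, 55, 60, 61, 63, 67, 70, 74, 78, 85]

def percentile_rank(raw, norms):
    below = sum(1 for s in norms if s < raw)
    return 100 * below / len(norms)

print(percentile_rank(67, norm_scores))  # 50.0: half the group scored lower
```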

45
Q

Standard Scores:

A

indicate where a raw score sits in the distribution relative to the mean. They indicate how far above or below the mean the raw score is.

Common standard scores are z scores, with a mean of 0 and a standard deviation of 1.

T scores, with a mean of 50 and an SD of 10.

They are linear transformations of raw scores. That is, the mean and the SD of the scores may change, but the shape of the distribution remains the same.
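A minimal Python sketch of these linear transformations (the norm-group mean and SD are hypothetical):

```python
# Linear transformations of a raw score using the norm group's mean and SD.
MEAN, SD = 100, 15  # hypothetical norm-group values
raw = 115

z = (raw - MEAN) / SD        # z score: mean 0, SD 1
t = 50 + 10 * z              # T score: mean 50, SD 10

print(z, t)  # 1.0 60.0: one SD above the mean on either metric
```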

46
Q

Norms consist of percentiles and standard scores and are often used to:

A

interpret an individual’s performance relative to that of other people in the same group.