Chapters 12 & 13 - Measurement of Variables Flashcards
Four levels of measurement
Nominal
Ordinal
Interval
Ratio
Categorical levels of measurement
Nominal
Ordinal
Continuous levels of measurement
Interval
Ratio
Nominal level of measurement
Identification; classification
Ordinal level of measurement
Categories can be rank-ordered in a meaningful way, but the ranks are not equidistant (at equal distances)
Interval level of measurement
- Equidistant (at equal distances) scale points, but the zero point is not fixed
- Likert-type scales: Ordinal or interval?
Ratio level of measurement
Possesses a unique origin (zero point)
Rating Scale Formats
– Dichotomous/binary scale
– Category scale
– Semantic differential scale
– Numerical scale
– Itemized rating scale
– Likert-type scale
– Fixed or constant sum scale
– Stapel scale
– Consensus scale
– Graphic rating scale
– Paired comparison scale
– Forced choice scale
– Comparative scale
Dichotomous/binary scale: example
Yes vs. No
Category scale: example
English; French; Other
Semantic differential scale: example
Good —– Bad; Emotionally stable —– Neurotic
Numerical scale: example
Responsive 1 2 3 4 5 6 7 Unresponsive
Itemized rating scale: example
Can be balanced or unbalanced; forced or unforced
Likert-type scale: example
Strongly Disagree; Disagree; Neither Agree nor Disagree; Agree; Strongly Agree
Fixed or constant sum scale: example
Distributing 100 points across several items (allocations must add up to 100)
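A quick screen for constant sum data is to verify that each respondent's allocations add up to the required total. A minimal sketch in Python, with made-up allocations:

    import numpy as np

    # Each row is one respondent's allocation of 100 points across four items
    allocations = np.array([
        [40, 30, 20, 10],
        [25, 25, 25, 25],
        [50, 30, 15, 10],  # sums to 105: violates the constant-sum constraint
    ])
    valid = allocations.sum(axis=1) == 100
    print(valid)  # [ True  True False]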
Stapel scale: example
-3 -2 -1 Interpersonal skills +1 +2 +3
Consensus scale: example
Developed by consensus by a panel of judges
Graphic rating scale: example
Marking a point on a continuous line between two extremes (the faces scale is one graphic variant)
Paired comparison scale: example
Respondents asked to choose between two objects at a time (among a small number of objects)
Forced choice scale: example
Ranking objects among the provided alternatives
Comparative scale: example
Provides a point of reference to assess attitudes toward a particular object/event/situation
Response-scale format considerations
– Measurement scale (nominal, ordinal, interval, ratio)
– Number of scale points/categories
* Need to be mutually exclusive and collectively exhaustive
– Balanced or unbalanced scales
* Equal # of favourable & unfavourable categories?
– Forced or non-forced choice
* “Neutral” category
* Odd or even number of categories
– Category labels for scales (anchors)
* Verbal; numerical; unlabeled (e.g., graphic)
– Number of items in a scale
Operationalization of variables
Breaking an abstract construct down into its measurable or tangible components
* Can tap into a construct by looking at the behavioural dimensions, facets, and properties denoted by the construct and translating them into observable/measurable elements
Delineating the antecedents, consequences, or correlates of the construct is not _______________
operationalization
Steps in the Operationalization of variables
- Clear definition of the construct (and possibly its dimensions)
- Develop a pool of items (indicators or elements) representing more concrete manifestations or operationalizations of the construct
- Choose a response format (e.g., Likert-type scale)
- Collect data from a (representative) sample
- Conduct item analyses and select items for the scale(s) (see the item-analysis sketch below)
- Test the reliability and validity of the scale(s)
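The slides do not prescribe a particular item analysis; one common screening statistic is the corrected item-total correlation, sketched below with a hypothetical helper. Items that correlate weakly with the rest of the pool are candidates for removal.

    import numpy as np

    def corrected_item_total(items):
        # items: (n_respondents, k_items) matrix of item scores
        items = np.asarray(items, dtype=float)
        total = items.sum(axis=1)
        # Correlate each item with the sum of the OTHER items,
        # so an item does not inflate its own correlation
        return np.array([
            np.corrcoef(items[:, j], total - items[:, j])[0, 1]
            for j in range(items.shape[1])
        ])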
Typical process in Developing Scales
– Define the concept to be measured
– Identify components/elements of the concept
– Specify sample of observable, measurable items representing the components/elements of the concept
– Select an appropriate response format to measure the items
– Combine the items into a composite (summated) scale
– Pretest the scale to assess respondent understanding
– Assess scale reliability and validity
– Revise the scale (if needed)
considerations for evaluating existing scales
– Title, author(s), publisher (if applicable)
– Language(s); equivalence of translated forms
– Construct(s) (allegedly) measured
– Characteristics of development/normative sample(s)
– Costs/permissions required
– User qualifications
– Format; administration method; scoring; length/time
– Psychometrics: Evidence of reliability, validity, fairness
– Independent reviews or peer-reviewed research
____________ Consist of a number of closely related items (questions or statements) whose responses are combined into a composite score to measure a construct
Multi-Item (Summated) Scales
Recommendations for Multi-Item (Summated) Scales
Items should be closely related, represent only a single construct, and represent the construct completely
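To illustrate how item responses become a composite score, here is a minimal sketch with hypothetical responses to a four-item, five-point scale; summing and averaging are both common conventions.

    import numpy as np

    responses = np.array([  # rows: respondents; columns: items scored 1-5
        [4, 5, 4, 3],
        [2, 1, 2, 2],
        [5, 5, 4, 5],
    ])
    summated = responses.sum(axis=1)   # summated (composite) score
    averaged = responses.mean(axis=1)  # mean score keeps the 1-5 metric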
Assessing Measurement Scales
Reliability
Validity (unitary)
Assessing Measurement Scales - Reliability
- Test-retest reliability
- Parallel-form reliability
- Inter-rater reliability
- Internal consistency reliability
 * Split-half reliability
Assessing Measurement Scales - Validity (unitary)
- Content validity
- Construct validity
 * Convergent validity
 * Discriminant validity
- Criterion-related validity
 * Concurrent validity
 * Predictive validity
Reliability
The stability and consistency of scores generated by a scale.
Stability of scores (derived from scales)
– Test-retest reliability (stability over time; see the sketch below)
– Parallel-form reliability (stability across forms)
 * Administer both forms to the same subjects
– Inter-rater reliability (stability across raters)
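A stability coefficient is just the correlation between two sets of scores from the same respondents. The sketch below uses invented test-retest data; the same logic applies to parallel forms (correlate scores on form A with scores on form B).

    import numpy as np

    time1 = np.array([12, 18, 15, 22, 9, 17])   # scores at first administration
    time2 = np.array([13, 17, 14, 23, 10, 18])  # same respondents, retested later
    r_test_retest = np.corrcoef(time1, time2)[0, 1]
    print(round(r_test_retest, 2))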
Inter-item (or internal) consistency reliability
– Consistency of answers to all the items in a measure
– If these items independently measure the same concept, they will be highly correlated
– Coefficient α (Cronbach’s α)
 * A different “alpha” than the one associated with Type I error
 * Rule of thumb: α > .70 is “acceptable” in our field
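Coefficient alpha can be computed directly from the score matrix as alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). A minimal sketch, assuming an n-respondents-by-k-items matrix:

    import numpy as np

    def cronbach_alpha(items):
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the summated score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)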
Split-half reliability
- Correlation between two halves of an instrument
- Typically not as useful as Cronbach’s α
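A sketch of one possible split (odd- versus even-numbered items). The final line applies the Spearman-Brown step-up to estimate full-length reliability; the slides do not mention this correction, but it is routinely reported with split-half estimates.

    import numpy as np

    def split_half(items):
        items = np.asarray(items, dtype=float)
        odd = items[:, 0::2].sum(axis=1)   # half-score from odd-numbered items
        even = items[:, 1::2].sum(axis=1)  # half-score from even-numbered items
        r = np.corrcoef(odd, even)[0, 1]   # correlation between the two halves
        return 2 * r / (1 + r)             # Spearman-Brown full-length estimate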
Validity
Whether an instrument measures what it sets out to measure.
Construct validity
Degree of correspondence between a construct and its operational definition (measure or manipulation)
Forms of validity evidence
- Content-based (content validity)
- Convergent and discriminant validity
- Criterion-related evidence (concurrent and predictive)
Content validity
Evidence that the content of a test corresponds to the content of the construct it was designed to measure
* Usually relies on opinions of subject matter experts
Face validity
Does the scale appear to measure the construct?
Convergent validity
– Identify another scale that measures the same construct as the one being validated
– Obtain scores on both scales and compute the correlation between them (should be high)
Discriminant validity
– Identify a scale that measures a different construct
– Specify how the two scales are expected to differ
– Obtain scores on both scales and compute the correlation between them (should be low)
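Both checks reduce to correlations between composite scores. A sketch with invented scores for six respondents:

    import numpy as np

    new_scale       = np.array([10, 14, 9, 16, 12, 15])   # scale being validated
    same_construct  = np.array([11, 15, 10, 17, 11, 16])  # established measure of the same construct
    other_construct = np.array([7, 3, 9, 4, 8, 5])        # measure of a different construct

    r_convergent = np.corrcoef(new_scale, same_construct)[0, 1]    # should be high
    r_discriminant = np.corrcoef(new_scale, other_construct)[0, 1] # should be low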
Convergent and discriminant validity
Together provide evidence of construct validity: scores should correlate highly with measures of the same construct (convergent) and weakly with measures of different constructs (discriminant).
Measurement
The assignment of numbers or other symbols to characteristics (or attributes) of objects according to a prespecified set of rules.
Operationalizing
Reduction of abstract concepts to render them measurable in a tangible way.
Multi-Item (Summated) Scales
Consist of a number of closely related items (questions or statements) whose responses are combined into a composite score to measure a construct
– Also known as: scale / index / summated rating scale / multi-item scale
Scale
A tool or mechanism by which individuals, events, or objects are distinguished on the variables of interest in some meaningful way.
Likert scale
An interval scale that specifically uses the five anchors of Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, and Strongly Agree.
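Scoring a Likert-type item usually means mapping the verbal anchors onto the integers 1-5; reverse-keyed items flip the mapping. A minimal sketch (reverse-keying is standard practice rather than something these chapters spell out):

    ANCHORS = {
        "Strongly Disagree": 1,
        "Disagree": 2,
        "Neither Agree nor Disagree": 3,
        "Agree": 4,
        "Strongly Agree": 5,
    }

    def score(response, reverse=False):
        value = ANCHORS[response]
        return 6 - value if reverse else value  # 6 - x flips 1<->5 and 2<->4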
Nominal scale
A scale that categorizes individuals or objects into mutually exclusive and collectively exhaustive groups, and offers basic, categorical information on the variable of interest.
Ordinal scale
A scale that not only categorizes the qualitative differences in the variable of interest, but also allows for the rank‐ordering of these categories in a meaningful way.
Interval Scale
A multipoint scale that taps the differences, the order, and the equality of the magnitude of the differences in the responses.
Ratio scale
A scale that has an absolute zero origin, and hence indicates not only the magnitude, but also the proportion, of the differences.
Rating scale
A scale with several response categories used to elicit evaluative responses about an object.
Ranking Scale
A scale used to tap preferences between two objects, or among more than two objects or items.
Dichotomous scale
A scale used to elicit a Yes/No response, or an answer to two different aspects of a concept.
Category scale
A scale that uses multiple items to seek a single response.
Semantic differential scale
Usually a seven‐point scale with bipolar attributes indicated at its extremes.
Numerical scale
A scale with bipolar attributes and five or seven points indicated on the scale.
Unbalanced rating scale
An even‐numbered scale that has no neutral point.
Faces scale
A particular representation of the graphic scale, depicting faces with expressions that range from smiling to sad.
Consensus scale
A scale developed through consensus or the unanimous agreement of a panel of judges as to the items that measure a concept.
Constant sum rating scale
A scale where the respondents distribute a fixed number of points across several items.
Paired comparisons
Respondents choose between two objects at a time, with the process repeated with a small number of objects.
Forced choice
Elicits the ranking of objects relative to one another.
Comparative scale
A scale that provides a benchmark or point of reference to assess attitudes, opinions, and the like.
Reliability
Attests to the consistency and stability of the measuring instrument.
Validity
Evidence that the instrument, technique, or process used to measure a concept does indeed measure the intended concept.
Goodness of measures
Attests to the reliability and validity of measures.
Content validity
Establishes the representative sampling of a whole set of items that measures a concept, and reflects how well the dimensions and elements thereof are delineated.
Face validity
An aspect of validity examining whether the item on the scale, on the face of it, reads as if it indeed measures what it is supposed to measure.
Criterion-related validity
That which is established when the measure differentiates individuals on a criterion that it is expected to predict.
Concurrent validity
Relates to criterion‐related validity, which is established at the same time the test is administered.
Predictive validity
The ability of the measure to differentiate among individuals as to a criterion predicted for the future.
Construct validity
Testifies to how well the results obtained from the use of the measure fit the theories around which the test was designed.
Convergent validity
That which is established when the scores obtained by two different instruments measuring the same concept, or by measuring the concept by two different methods, are highly correlated.
Discriminant validity
That which is established when two variables are theorized to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
Test–retest reliability
A way of establishing the stability of the measuring instrument by correlating the scores obtained through its administration to the same set of respondents at two different points in time.
Parallel‐form reliability
That form of reliability which is established when responses to two comparable sets of measures tapping the same construct are highly correlated.
Internal consistency
Homogeneity of the items in the measure that tap a construct.
Inter-item consistency reliability
A test of the consistency of responses to all the items in a measure to establish that they hang together as a set.
Split‐half reliability
The correlation coefficient between one half of the items measuring a concept and the other half.
Reflective scale
Each item in a reflective scale is assumed to share a common basis (the underlying construct of interest).
Formative scale
Used when a construct is viewed as an explanatory combination of its indicators.