Week 4 - PSYCHOLOGICAL MEASUREMENT Flashcards

1
Q

Describe - The Rosenberg Self-Esteem Scale

A

The Rosenberg Self-Esteem Scale (Rosenberg, 1965)[2] is one of the most common measures of self-esteem.

Participants respond to each of the 10 items that follow with a rating on a 4-point scale: Strongly Agree, Agree, Disagree, Strongly Disagree

2
Q

Describe ‘measurement’ in psychometrics and give an example…

A

Measurement is the assignment of scores to individuals so that the scores represent some characteristic of the individuals.

PSYCHOMETRICS EXAMPLE

Imagine a cognitive psychologist who wants to MEASURE a person’s working memory capacity: their ability to hold in mind and think about several pieces of information all at the same time.

To do this, she might use a backward digit span task, in which she reads a list of two digits to the person and asks them to repeat them in reverse order. She then repeats this several times, increasing the length of the list by one digit each time, until the person makes an error.

The length of the longest list for which the person responds correctly is the score and
REPRESENTS their working memory capacity.
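
The scoring rule just described (the longest list repeated correctly in reverse before the first error) can be sketched in a few lines of Python; the function name and trial data below are hypothetical illustrations, not part of any standard task implementation:

```python
# Sketch of backward digit span scoring: the score is the length of the
# longest list the person repeats correctly in reverse order, stopping
# at the first error. Trial data are made up for illustration.
def digit_span_score(trials):
    """trials: list of (digits, response) pairs, in increasing list length."""
    score = 0
    for digits, response in trials:
        if response == list(reversed(digits)):
            score = len(digits)   # correct: record this list length
        else:
            break                 # first error ends the task
    return score

trials = [([4, 7], [7, 4]),              # correct
          ([9, 2, 5], [5, 2, 9]),        # correct
          ([3, 8, 1, 6], [6, 1, 3, 8])]  # error on the four-digit list
print(digit_span_score(trials))  # 3
```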

(EXAMPLE: Beck Depression Inventory, which is a 21-item self-report questionnaire in which the person rates the extent to which they have felt sad, lost energy, and experienced other symptoms of depression over the past 2 weeks.
SUM = represents the person’s current level of depression.)

Measurement REQUIRES a SYSTEMATIC procedure for assigning scores to individuals or objects SO THAT those scores represent the **characteristic of interest**.

3
Q

Describe psychological constructs and give examples

A

Psychological Constructs

Variables that aren’t easy to quantify

These kinds of variables are called constructs (pronounced CON-structs)

EXAMPLE
  • Personality traits (e.g., extraversion)
  • Emotional states (e.g., fear)
  • Attitudes (e.g., toward taxes)
  • Abilities (e.g., athleticism)

4
Q

Describe and give example - Psychological Constructs

A

Psychological constructs :

  • cannot be observed directly
  • often represent tendencies to think, feel, or act in certain ways
  • often involve internal processes

EXAMPLE

FEAR - activates central and peripheral nervous system structures, AND certain kinds of thoughts, feelings, and behaviors…
NOT OBVIOUS TO AN OBSERVER

IMPORTANT NOTE -

Neither extraversion nor fear “reduces to” any particular thought, feeling, act, or physiological structure or process.

INSTEAD each is a kind of summary of a
COMPLEX SET of behaviors and internal processes.

5
Q

Describe and give example - Conceptual definition

A

The conceptual definition of a psychological construct describes…

the behaviors and internal processes that MAKE UP that construct, along with HOW IT RELATES to other variables.

EXAMPLE:
A conceptual definition of NEUROTICISM (another one of the Big Five) = people’s tendency to experience negative emotions such as anxiety, anger, and sadness across a variety of situations.

This definition might ALSO INCLUDE that it has a strong genetic component, remains fairly stable over time, and is positively correlated with the tendency to experience pain and other physical symptoms.

(EG. The Big Five is a set of five broad dimensions that capture much of the variation in human personality. Each of the Big Five can even be defined in terms of six more specific constructs called “facets” (Costa & McCrae, 1992))

6
Q

Why use a conceptual definition instead of using the dictionary?

A

Many scientific constructs do not have counterparts in everyday language (e.g., working memory capacity).

Researchers are in the business of developing definitions that are:

—more detailed and precise
—more accurate descriptions of the way the world is
than the informal definitions in the dictionary.

As we will see, they do this by
1. PROPOSING conceptual definitions
2. Testing them empirically
3. Revising them as necessary

Sometimes they throw them out altogether.

This is why the RESEARCH LITERATURE often includes different conceptual definitions of the same construct.

In some cases, an older conceptual definition has been replaced by a newer one that fits and works better.

In others, researchers are still in the process of deciding which of various conceptual definitions is the best.

7
Q

Describe operational definition (and 3 measure categories)

A

An operational definition is a
definition of a variable in terms of precisely how it is to be measured.

These measures generally fall into one of three broad categories.

Self-report measures are those in which PARTICIPANTS REPORT on their own thoughts, feelings, and actions, as with the Rosenberg Self-Esteem Scale (Rosenberg, 1965)[2].

Behavioral measures are those in which some OTHER aspect of participants’ behavior is
OBSERVED & RECORDED.

EXAMPLE
Lab - measuring working memory capacity using the backward digit span task.

Natural Setting - Physical aggression, as measured by researcher Albert Bandura and his colleagues (Bandura, Ross, & Ross, 1961)[3].

They let each of several children play for 20 minutes in a room that contained a clown-shaped punching bag called a Bobo doll. They filmed each child and counted the number of acts of physical aggression the child committed. These included hitting the doll with a mallet, punching it, and kicking it. Their operational definition, then, was the number of these specifically defined acts that the child committed during the 20-minute period.

Physiological measures are those that involve recording any of a wide variety of physiological processes, EG.
  • Heart rate
  • Blood pressure
  • Electrical activity

8
Q

For ANY VARIABLE OR CONSTRUCT, there will be multiple operational definitions - Give an example

A

For ANY VARIABLE OR CONSTRUCT, there will be multiple operational definitions.

EXAMPLE - STRESS
conceptual definition = stress is an adaptive response to a **perceived danger or threat** that involves physiological, cognitive, affective, and behavioral components.

Operational Definitions

**The Social Readjustment Rating Scale** (Holmes & Rahe, 1967)[4] is a self-report questionnaire on which people identify stressful events that they have experienced in the past year; it assigns points for each one depending on its severity.

For example, a man who has been divorced (73 points), changed jobs (36 points), and had a change in sleeping habits (16 points) in the past year would have a total score of 125.
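
The point total in this example is a simple sum; a minimal sketch in Python (the event labels and the three-entry point table are paraphrased from the example above, not the full Holmes & Rahe instrument):

```python
# Sketch of Social Readjustment Rating Scale scoring: the stress score is
# the sum of point values for the life events a person reports. Only the
# three events from the example above are included here.
SRRS_POINTS = {"divorce": 73, "changed jobs": 36, "change in sleeping habits": 16}

def srrs_score(events):
    """Sum the point values of the reported life events."""
    return sum(SRRS_POINTS[event] for event in events)

print(srrs_score(["divorce", "changed jobs", "change in sleeping habits"]))  # 125
```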

The Hassles and Uplifts Scale (Delongis, Coyne, Dakof, Folkman & Lazarus, 1982) [5] is similar but focuses on everyday stressors like misplacing things and being concerned about one’s weight.

The Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983) [6] is another self-report measure that focuses on people’s feelings of stress (e.g., “How often have you felt nervous and stressed?”).

Researchers have also operationally defined stress in terms of several physiological variables including blood pressure and levels of the stress hormone cortisol.

9
Q

Describe converging operations

A

When psychologists use multiple operational definitions of the same construct—either within a study or across studies—they are using converging operations.

The idea is that the VARIOUS operational definitions are “converging” or coming together on the same construct.

  • When scores based on several different operational definitions are closely related to each other
  • and produce similar patterns of results,
  • this constitutes good evidence that the construct is being measured effectively
  • and that it is useful.

EXAMPLE - the various measures of stress are all correlated with each other and have all been shown to be correlated with other variables such as **immune system functioning** (also measured in a variety of ways) (Segerstrom & Miller, 2004)[7].

10
Q

Name the four levels of measurement and describe their genesis

A

Levels of Measurement
The psychologist S. S. Stevens suggested that scores can be assigned to individuals in a way that communicates more or less quantitative information about the variable of interest (Stevens, 1946)[8]. (He proposed four levels.)

The nominal level
The ordinal level
The interval level
The ratio level

11
Q

Describe and give examples - The nominal level

A

The nominal level of measurement is used for categorical variables and involves
ASSIGNING SCORES that are **category labels**.

CATEGORY LABELS communicate whether any two individuals are the same or different in terms of the variable being measured.

EXAMPLE

Asking about **marital status or ethnicity**
NO implied order (one category is not higher than the other)

Responses are merely categorized.
Nominal scales thus embody the LOWEST level of measurement

12
Q

Describe and give examples - The ordinal level

A

ORDINAL = ORDER (Rank order)

The ordinal level of measurement involves ASSIGNING SCORES so that they represent the rank order of the individuals.

Rank order communicates whether one individual is higher or lower on the variable than another.

EXAMPLE -
APP REVIEW Questions - “very dissatisfied,” “somewhat dissatisfied,” “somewhat satisfied,” or “very satisfied.” The items in this scale are ordered, ranging from least to most satisfied.

13
Q

Describe ordinal level limitations

A

EXAMPLE

The DIFFERENCE between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels

In our satisfaction scale…

The difference between the responses “very dissatisfied” and “somewhat dissatisfied” is probably not equivalent to the difference between “somewhat dissatisfied” and “somewhat satisfied.”

Nothing in our measurement procedure allows us to determine whether the two differences reflect the same difference in psychological satisfaction.

Statisticians express this point by saying that the differences between adjacent scale values do not necessarily represent equal intervals on the underlying scale giving rise to the measurements.

(In our case, the underlying scale is the true feeling of satisfaction, which we are trying to measure.)

14
Q

Describe interval level of measurement

A

The interval level of measurement involves assigning scores using **numerical scales** in which intervals have the same interpretation throughout.

EXAMPLE
Fahrenheit or Celsius temperature scales.
The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees.

This is because each 10-degree interval has the same physical meaning (in terms of the kinetic energy of molecules).

15
Q

Describe the limitations of interval scales

A

Interval scales are not perfect, however.

In particular, they do not have a true zero point even if one of the scaled values happens to carry the name “zero.”

EXAMPLE

Measuring IQ - someone may get a ‘0’ score, but it doesn’t indicate the complete absence of intellect.

The Fahrenheit scale illustrates the issue. Zero degrees Fahrenheit does not represent the complete absence of temperature (the absence of any molecular kinetic energy).

In reality, the label “zero” is applied to its temperature for quite accidental reasons connected to the history of temperature measurement.

Since an interval scale has no true zero point, it does not make sense to compute ratios of temperatures.

16
Q

Describe the Ratio Scale

A

The ratio level of measurement involves assigning scores in such a way that there is a true zero point that represents the COMPLETE ABSENCE of the quantity.

EXAMPLE - Height measured in meters and weight measured in kilograms are good examples.

RATIO SCALE

  • provides a name or category for each object
  • objects are ordered
  • the same difference at two places on the scale has the same meaning.
  • the same ratio at two places on the scale also carries the same meaning (see Table 4.1).

EXAMPLE

Amount of money you have in your pocket right now (25 cents, 50 cents, etc.).

Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true zero point: if you have zero money, this actually implies the absence of money.

17
Q

Give 2 reasons why Stevens’s levels of measurement are important

A

Stevens’s levels of measurement are important for at least two reasons.

  1. they **emphasize the generality of the concept of measurement**. Although people do not normally think of categorizing or ranking individuals as measurement, these count as measurement as long as the scores represent some characteristic of the individuals.
  2. the levels of measurement can serve as a rough guide to the statistical procedures that can be used with the data and the conclusions that can be drawn from them.

Nominal-level measurement - can use mode.

Ordinal-level measurement - median or mode

Interval and ratio-level measurement are typically considered the most desirable because they permit for any indicators of central tendency to be computed (i.e., mean, median, or mode).

Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores.

Once again, one cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, but one can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level.
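
The link between level of measurement and permissible statistics can be illustrated with Python’s statistics module; the data values below are made up for illustration:

```python
from statistics import mean, median, mode

marital_status = ["single", "married", "married", "divorced"]  # nominal
satisfaction = [1, 2, 2, 3, 4]  # ordinal: 1 = very dissatisfied ... 4 = very satisfied
heights_m = [1.60, 1.75, 1.82]  # ratio: true zero point

print(mode(marital_status))       # nominal permits only the mode: "married"
print(median(satisfaction))       # ordinal permits the median (or mode): 2
print(round(mean(heights_m), 2))  # interval/ratio also permit the mean: 1.72
```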

18
Q

Identify the two dimensions of evaluating measurement method

A

Psychologists DO NOT simply assume that their measures work.

Instead, they **collect data** to demonstrate that they work. If their research does not demonstrate that a measure works, they stop using it.

Evaluating a measurement method, psychologists consider two general dimensions:

Reliability and Validity

19
Q

Define reliability

A

Reliability refers to the consistency of a measure.

Psychologists consider three types of consistency:
- Over time (test-retest reliability),
- Across items (internal consistency)
- Across different researchers (inter-rater reliability).

20
Q

Describe Test-Retest Reliability

A

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.

Test-retest reliability is the extent to which this is actually the case.

EXAMPLE
IQ - intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week.

Assessing test-retest reliability requires using the measure on a group of people at ONE TIME,

Using it again on the same group of people at a LATER TIME, and then looking at the test-retest correlation between the two sets of scores.

This is typically done by graphing the data in a SCATTERPLOT and computing the **correlation coefficient**. Figure 4.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart.

The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions.

But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.
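
A test-retest correlation is just Pearson’s r between the two administrations; here is a minimal sketch with made-up scores (not the Figure 4.2 data):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

# Hypothetical self-esteem scores for five students, tested a week apart.
time1 = [22, 25, 18, 30, 27]
time2 = [21, 26, 19, 29, 28]
print(round(pearson_r(time1, time2), 2))  # 0.97, above the +.80 rule of thumb
```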

21
Q

DESCRIBE Internal Consistency

A

Internal Consistency = the consistency of people’s responses across the items on a multiple-item measure

In general, all the items on such measures are supposed to REFLECT the same underlying construct, so people’s scores on those items SHOULD BE CORRELATED with each other.

EXAMPLE
On the Rosenberg Self-Esteem Scale

People who AGREE that they are a person of worth should also tend to AGREE that they have a number of good qualities.

No correlation = Not the same underlying construct

EXAMPLE - people might make a series of bets in a simulated game of roulette as a measure of their level of RISK SEEKING.

This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.

22
Q

DESCRIBE a way to APPROACH Internal Consistency

A

Internal consistency can only be assessed by collecting and analyzing data.

One approach is to look at a split-half correlation.

This involves splitting the items into TWO sets,

Then a score is computed for each set of items, and the RELATIONSHIP between the two sets of scores is EXAMINED.

A split-half correlation of +.80 or greater indicates GOOD internal consistency.

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called Cronbach’s α (the Greek letter alpha).

Conceptually, α is the mean of all possible split-half correlations for a set of items.

For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
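
As a numerical sketch, Cronbach’s α can be computed with the standard variance formula (not by averaging all split-half correlations, which is only a conceptual interpretation); the item scores below are made up for illustration:

```python
from math import comb

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-person item-score lists, using
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(item_scores[0])                           # number of items
    totals = [sum(person) for person in item_scores]  # each person's summed score

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([person[i] for person in item_scores]) for i in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

# Hypothetical scores of four people on a three-item measure.
data = [[3, 3, 4], [2, 2, 2], [4, 3, 4], [1, 2, 1]]
print(round(cronbach_alpha(data), 2))  # 0.92, above the +.80 rule of thumb

# The "252 ways to split a set of 10 items into two sets of five" mentioned above:
print(comb(10, 5))  # 252
```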

23
Q

Describe Interrater Reliability

A

Interrater Reliability

Many behavioral measures involve significant judgment on the part of an OBSERVER OR RATER.

Inter-rater reliability is the extent to which DIFFERENT observers are consistent in their judgments.

EXAMPLE
- Measuring university students’ social skills
- You could make video recordings of them as they interacted with another student whom they are meeting for the first time.
- Then you could have two or more observers watch the videos and rate each student’s level of social skills.

  • To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other.

Inter-rater reliability would also have been measured in Bandura’s Bobo doll study.

In this case, the observers’ ratings of HOW MANY ACTS OF AGGRESSION a particular child committed while playing with the Bobo doll should have been highly positively correlated.

Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative

OR

an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
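
Cohen’s κ corrects raw percent agreement for the agreement expected by chance; here is a minimal sketch with hypothetical codings (not Bandura’s actual data):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical judgments:
    kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_chance = sum((c1[cat] / n) * (c2[cat] / n) for cat in c1.keys() | c2.keys())
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of ten acts as aggressive ("A") or not ("N").
rater1 = ["A", "A", "N", "A", "N", "A", "A", "N", "N", "A"]
rater2 = ["A", "A", "N", "A", "N", "A", "N", "N", "N", "A"]
print(round(cohens_kappa(rater1, rater2), 2))  # 0.8
```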

24
Q

Describe Validity

A

Validity is the EXTENT to which the scores from a measure represent the variable they are intended to.

Good RELIABILITY = good test-retest reliability (a HIGH positive correlation) and good internal consistency (a split-half correlation of +.80 or greater)

HOWEVER a measure can be extremely reliable but have no validity whatsoever.

EXAMPLE
ABSURD EXAMPLE: imagine someone who believes that people’s index finger length REFLECTS their self-esteem. Finger length would have STRONG test-retest reliability BUT NO VALIDITY as a measure of self-esteem.

THREE TYPES OF VALIDITY:
Face validity
Content validity
Criterion validity.

25
Q

Describe Face validity

A

Face validity is the extent to which a measurement method appears “on its face” to measure the construct of interest.

Face validity CAN be assessed quantitatively - EXAMPLE - by having a LARGE SAMPLE of people rate a measure in terms of whether it appears to measure what it is intended to - but it is usually assessed informally.

Face validity is at best a very WEAK kind of evidence that a measurement method is measuring what it is supposed to.

One reason is that it is** based on people’s intuitions** about human behavior, which are frequently wrong.

It is also the case that many established measures in psychology work quite well despite lacking face validity.

EXAMPLE
The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of its **567 different statements** applies to them

—where MANY of the statements do not have any obvious relationship to the construct that they measure.

EXAMPLE - the items “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both measure the suppression of aggression.

In this case, it is not the participants’ literal answers to these questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who tend to suppress their aggression.

26
Q

Describe Content Validity

A

Content validity is the extent to which a measure “covers” the construct of interest.

EXAMPLE

if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings)

and

negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts.

Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

27
Q

Describe Criterion Validity AND the different types of criterion validity

A

Criterion validity is the extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.

EXAMPLE

People’s SCORES on a measure of test anxiety should be NEGATIVELY correlated with their performance on an important school exam.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured.

There will usually be MANY of them.

Concurrent validity
When the criterion is MEASURED at the same time as the construct

Predictive validity
When the criterion is measured at some point in the FUTURE (after the construct has been measured), because scores on the measure have “predicted” a future outcome.

Convergent validity

NEW measures positively correlated with EXISTING established measures of the same constructs.

Assessing convergent validity requires collecting data using the measure.

EXAMPLE

Researchers John Cacioppo and Richard Petty did this when they created their self-report Need for Cognition Scale to measure how much people value and engage in thinking (Cacioppo & Petty, 1982)[1].

In a series of studies, they showed that people’s scores were POSITIVELY correlated with their scores on a standardized academic achievement test, and that their scores were NEGATIVELY correlated with their scores on a measure of dogmatism (which represents a tendency toward obedience).

In the years since it was created, the Need for Cognition Scale has been used in literally hundreds of studies and has been shown to be correlated with a wide variety of other variables, including the effectiveness of an advertisement, interest in politics, and juror decisions (Petty, Briñol, Loersch, & McCaslin, 2009)[2].

28
Q

Describe Discriminant Validity

A

Discriminant validity, on the other hand, is the extent to which scores on a measure are NOT correlated with measures of variables that are conceptually distinct.

EXAMPLE

Self-esteem is a general attitude toward the self that is fairly stable over time.

NOT THE SAME as mood.

EXAMPLE

Need for Cognition Scale
(Cacioppo and Petty)
WEAK correlations between people’s need for cognition and:
- COGNITIVE STYLE (the extent to which they tend to think analytically by breaking ideas into smaller parts or holistically in terms of “the big picture”)
- TEST ANXIETY
- Tendency to respond in socially desirable ways

29
Q

How do you measure a psychological construct for a research project?

A

Broadly speaking, there are four steps in the measurement process:

(a) conceptually defining the construct
(b) operationally defining the construct
(c) implementing the measure
(d) evaluating the measure. In this section, we will look at each of these steps in turn.

30
Q

Describe Conceptually Defining the Construct

A

CLEAR & COMPLETE conceptual definition of a construct

Allows you to make SOUND decisions about EXACTLY how to measure the construct.

EXAMPLE

Memory - conceptualized as a set of semi-independent systems

PRECISION required - Long term memory, working memory, short term memory require DIFFERENT conceptual definitions
DIFFERENT forms of measurement

31
Q

Describe Operationally Defining the Construct

A

Operational definition is a definition of the variable in terms of precisely how it is to be measured.

Constructs are ABSTRACT concepts, but observation is at the heart of the scientific method.

Conceptual definitions MUST BE TRANSFORMED into something that can be directly observed and measured.

EXAMPLE

Stress can be operationally defined as people’s scores on the Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983)[1], cortisol concentrations in their saliva, or the number of stressful life events they have recently experienced.

32
Q

Describe - Using an Existing Measure

A

It is usually a good idea to use an existing measure that has been used successfully in previous research.

(a) Save time and trouble
(b) already evidence of validity
(c) your results can more easily be compared with and combined with previous results.

EXAMPLE

The Ten-Item Personality Inventory (TIPI) is a self-report questionnaire that measures all the Big Five personality dimensions with just 10 items (Gosling, Rentfrow, & Swann, 2003)[2].

It is not as reliable or valid as longer and more comprehensive measures, BUT a researcher might choose to use it when testing time is severely limited

(EXTRA INFO - JUST IN CASE - EXTRA READING)

When an existing measure was created primarily for use in scientific research, it is usually described in detail in a published research article and is free to use in your own research—with a proper citation. You might find that later researchers who use the same measure describe it only briefly but provide a reference to the original article, in which case you would have to get the details from the original article.

The American Psychological Association also publishes the Directory of Unpublished Experimental Measures and PsycTESTS, which are extensive catalogs/collections of measures that have been used in previous research.

Many existing measures—especially those that have applications in clinical psychology—are proprietary. This means that a publisher owns the rights to them and that you would have to purchase them. These include many standard intelligence tests, the Beck Depression Inventory, and the Minnesota Multiphasic Personality Inventory (MMPI). Details about many of these measures and how to obtain them can be found in other reference books, including Tests in Print and the Mental Measurements Yearbook. There is a good chance you can find these reference books in your university library.

33
Q

Describe - Creating Your Own Measure

A

Creating Your Own Measure
- NO existing measure
- Evaluate CONVERGENT validity

ISSUES in creating new measures that apply equally to self-report, behavioral, and physiological measures.

  1. most new measures in psychology are really variations of existing measures, so you should still look to the research literature for ideas.

EG. the famous Stroop task (Stroop, 1935)[3]—in which people quickly name the colors that various color words are printed in—has been ADAPTED for the study of social anxiety. People high in social anxiety are slower at color naming when the words have negative social connotations such as “stupid” (Amir, Freshman, & Foa, 2002)[4].

  2. Strive for SIMPLICITY. Create a set of CLEAR instructions using SIMPLE LANGUAGE that you can present in writing or read aloud (or both). It is also a good idea to include one or more practice items so that participants can become familiar with the task.
  3. BREVITY - however, needs to be weighed against the fact that it is nearly always better for a measure to include MULTIPLE items rather than a single item.

There are two reasons for this.
One is a matter of content validity. MULTIPLE items are often required to cover a construct adequately.

The other is a matter of reliability.
People’s responses to single items can be influenced by all sorts of irrelevant factors

Remember, however, that multiple items must be structured in a way that allows them to be combined into a single overall score by summing or averaging.

34
Q

Describe - Implementing the Measure

A

You will want to implement any measure in a way that MAXIMIZES its reliability and validity.

In most cases, it is best to TEST everyone under similar conditions

Be aware also that people can react in a variety of ways to being measured that REDUCE the reliability and validity of the scores.

Disagreeable participants - may intentionally DISRUPT the measurement
Agreeable participants - may respond in socially desirable ways or according to perceived EXPECTATIONS

EXAMPLE

IN BUILT DEMAND CHARACTERISTICS
A participant whose attitude toward exercise is measured immediately after she is asked to read a passage about the dangers of heart disease might reasonably conclude that the passage was meant to improve her attitude.
May respond favourably
Own expectations cause BIAS

Precautions - minimize reactivity

Procedures clear and brief as possible
Guarantee anonymity

Group tests - seated FAR AWAY from each other
Give same pens and paper
Blind test - minimize bias

STANDARDIZE ALL INTERACTIONS

35
Q

Describe - Evaluating the Measure

A

Once you have used your measure on a sample of people and have a set of scores, you are in a position to evaluate it more thoroughly in terms of reliability and validity.

Even if the measure has been used extensively by other researchers and has already shown evidence of reliability and validity, you should not assume that it worked as expected for your particular sample and under your particular testing conditions.

Regardless, you now have additional evidence bearing on the reliability and validity of the measure, and it would make sense to add that evidence to the research literature.

In most research designs, it is not possible to assess test-retest reliability because participants are tested at only one time.

For a new measure, you might design a study specifically to assess its test-retest reliability by testing the same set of participants at two separate times.

In other cases, a study designed to answer a different question still allows for the assessment of test-retest reliability.

For example, a psychology instructor might measure his students’ attitude toward critical thinking using the same measure at the beginning and end of the semester to see if there is any change.

Even if there is no change, he could still look at the correlation between students’ scores at the two times to assess the measure’s test-retest reliability.

It is also customary to assess internal consistency for any multiple-item measure—usually by looking at a split-half correlation or Cronbach’s α.

Criterion validity can be assessed in various ways. For example, if your study included more than one measure of the same construct or measures of conceptually distinct constructs, then you should look at the correlations among these measures to be sure that they fit your expectations. Note also that a successful experimental manipulation also provides evidence of criterion validity.

Recall that MacDonald and Martineau manipulated participants’ moods by having them think either positive or negative thoughts; after the manipulation, their mood measure showed a distinct difference between the two groups. This simultaneously provided evidence that their mood manipulation worked and that their mood measure was valid.

But what if your newly collected data cast doubt on the reliability or validity of your measure? The short answer is that you have to ask WHY.

It could be that there is something wrong with your measure or how you administered it. It could be that there is something wrong with your conceptual definition.

It could be that your experimental manipulation failed. EXAMPLE - if a mood measure showed no difference between people whom you instructed to think positive versus negative thoughts, maybe it is because the participants did not actually think the thoughts they were supposed to or that the thoughts did not actually affect their moods. In short, it is “back to the drawing board” to revise the measure, revise the conceptual definition, or try a new manipulation.

36
Q

What are demand characteristics?

A

CUES that might indicate the aim of a study to participants.

These cues can lead to participants CHANGING THEIR BEHAVIOUR OR RESPONSES based on what they think the research is about.