Ch. 4 Flashcards

1
Q

Measurement

A

Is the assignment of scores to individuals so that the scores represent some characteristic of the individuals.

2
Q

Psychometrics

A

A subfield of psychology concerned with the theories and techniques of psychological measurement.

The important point here is that measurement does not require any particular instruments or procedures. What it does require is some systematic procedure for assigning scores to individuals or objects so that those scores represent the characteristic of interest.

3
Q

Constructs

A

Psychological variables that represent an individual’s mental state or experience, often not directly observable, such as personality traits, emotional states, attitudes, and abilities.

Psychological constructs cannot be observed directly.

One reason is that they often represent tendencies to think, feel, or act in certain ways.

Another reason psychological constructs cannot be observed directly is that they often involve internal processes.

4
Q

conceptual definition

A

Describes the behaviors and internal processes that make up a psychological construct, along with how it relates to other variables.

5
Q

operational definition

A

A definition of the variable in terms of precisely how it is to be measured.

For any given variable or construct, there will be multiple operational definitions.

These measures generally fall into one of three broad categories:
- Self-report measures
- Behavioral measures
- Physiological measures

6
Q

Self-report measures

A

Measures in which participants report on their own thoughts, feelings, and actions.

7
Q

Behavioral measures

A

Measures in which some other aspect of participants’ behavior is observed and recorded

This is an extremely broad category that includes the observation of people’s behavior both in highly structured laboratory tasks and in more natural settings.

8
Q

physiological measures

A

Measures that involve recording any of a wide variety of physiological processes, including heart rate and blood pressure, galvanic skin response, hormone levels, and electrical activity and blood flow in the brain.

9
Q

converging operations

A

When psychologists use multiple operational definitions of the same construct—either within a study or across studies.

10
Q

Levels of Measurement

A

Four categories, or scales, of measurement (i.e., nominal, ordinal, interval, and ratio) that specify the types of information that a set of scores can have, and the types of statistical procedures that can be used with the scores.

Important for at least two reasons.

First, they emphasize the generality of the concept of measurement.
- Although people do not normally think of categorizing or ranking individuals as measurement, both count as measurement as long as the scores are assigned so that they represent some characteristic of the individuals.

Second, the levels of measurement can serve as a rough guide to the statistical procedures that can be used with the data and the conclusions that can be drawn from them.

Interval- and ratio-level measurement are typically considered the most desirable because they permit all three indicators of central tendency (mean, median, and mode) to be computed.

Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores.

11
Q

nominal level

A

A measurement used for categorical variables and involves assigning scores that are category labels.

Category labels communicate whether any two individuals are the same or different in terms of the variable being measured.

The essential point about nominal scales is that they do not imply any ordering among the responses.

Nominal scales thus embody the lowest level of measurement.

12
Q

ordinal level

A

A measurement that involves assigning scores so that they represent the rank order of the individuals.

Ranks communicate not only whether any two individuals are the same or different in terms of the variable being measured but also whether one individual is higher or lower on that variable.

Ordinal scales thus allow comparisons of the degree to which two individuals possess the variable, but only in terms of rank order.

However, ordinal scales fail to capture important information that is present in the other levels of measurement we examine.

In particular, the difference between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels

(just like you cannot assume that the gap between the runners in first and second place is equal to the gap between the runners in second and third place).

In our satisfaction scale, for example, the difference between the responses “very dissatisfied” and “somewhat dissatisfied” is probably not equivalent to the difference between “somewhat dissatisfied” and “somewhat satisfied.” Nothing in our measurement procedure allows us to determine whether the two differences reflect the same difference in psychological satisfaction.

Statisticians express this point by saying that the differences between adjacent scale values do not necessarily represent equal intervals on the underlying scale giving rise to the measurements.

13
Q

interval level

A

A measurement that involves assigning scores using numerical scales in which intervals have the same interpretation throughout.

Interval scales do not have a true zero point, even if one of the scaled values happens to carry the name "zero."

14
Q

ratio level

A

A measurement that involves assigning scores in such a way that there is a true zero point that represents the complete absence of the quantity.

You can think of a ratio scale as the three earlier scales rolled into one.

Like a nominal scale, it provides a name or category for each object (the numbers serve as labels).

Like an ordinal scale, the objects are ordered (in terms of the ordering of the numbers).

Like an interval scale, the same difference at two places on the scale has the same meaning.

However, in addition, the same ratio at two places on the scale also carries the same meaning.

15
Q

Reliability

A

Refers to the consistency of a measure

16
Q

Test-Retest Reliability

A

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores.

This is typically done by graphing the data in a scatterplot and computing the correlation coefficient.

17
Q

Internal Consistency

A

The consistency of people’s responses across the items on a multiple-item measure.

Internal consistency can only be assessed by collecting and analyzing data.

One approach is to look at a split-half correlation.

18
Q

Internal Consistency

split-half correlation

A

A score that is derived by splitting the items into two sets and examining the relationship between the two sets of scores in order to assess the internal consistency of a measure.

This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items.

Then a score is computed for each set of items, and the relationship between the two sets of scores is examined.

19
Q

Internal Consistency

Cronbach’s α

A

A statistic that measures internal consistency among items in a measure.

Conceptually, α is the mean of all possible split-half correlations for a set of items.

For example, there are 252 ways to split a set of 10 items into two sets of five.

Cronbach’s α would be the mean of the 252 split-half correlations.

Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic.

Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
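Although α is not computed as a literal mean of split-half correlations, the standard computational formula is straightforward: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal Python sketch with made-up data:

```python
# Cronbach's alpha via the standard computational formula (not the
# "mean of all split-half correlations" interpretation above).
# The responses below are hypothetical, for illustration only.
from statistics import pvariance

responses = [  # rows = participants, columns = items (made-up 4-item scale)
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

k = len(responses[0])                  # number of items
items = list(zip(*responses))          # transpose: one tuple of scores per item
item_vars = [pvariance(col) for col in items]
total_var = pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```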

20
Q

Inter-rater Reliability

A

The extent to which different observers are consistent in their judgments.

Many behavioral measures involve significant judgment on the part of an observer or a rater.

Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ when they are categorical.

Inter-rater reliability would also have been measured in Bandura’s Bobo doll study.

In this case, the observers’ ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated.
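For categorical judgments, Cohen's κ compares the observed agreement between two raters to the agreement expected by chance. A minimal sketch with hypothetical codes:

```python
# Cohen's kappa for two raters making categorical judgments.
# Hypothetical codes: "A" = aggressive act, "N" = non-aggressive act.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)

rater1 = ["A", "A", "N", "A", "N", "N", "A", "N", "A", "A"]
rater2 = ["A", "A", "N", "N", "N", "N", "A", "N", "A", "A"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: for each category, the product of the two raters'
# marginal proportions, summed over categories.
categories = set(rater1) | set(rater2)
chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)

kappa = (observed - chance) / (1 - chance)
print(round(kappa, 2))
```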

21
Q

Validity

A

The extent to which the scores from a measure represent the variable they are intended to.

Note that a measure can be extremely reliable but have no validity whatsoever.

22
Q

Face Validity

A

The extent to which a measurement method appears, on superficial examination, to measure the construct of interest.

Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities.

Although face validity can be assessed quantitatively (for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to), it is usually assessed informally.

Face validity is, at best, a very weak kind of evidence that a measurement method measures what it is supposed to.

One reason is that it is based on people’s intuitions about human behavior, which are frequently wrong.

It is also the case that many established measures in psychology work quite well despite lacking face validity.

23
Q

Content Validity

A

The extent to which a measure reflects all aspects of the construct of interest.

Content validity is not usually assessed quantitatively.

Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

24
Q

Criterion Validity

A

The extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them.

25
Q

Criterion Validity

concurrent validity

A

A form of criterion validity, where the criterion is measured at the same time (concurrently) as the construct.

26
Q

Criterion Validity

predictive validity

A

A form of validity whereby the criterion is measured at some point in the future (after the construct has been measured), to determine that the construct “predicts” the criterion.

27
Q

Criterion Validity

convergent validity

A

A form of criterion validity whereby new measures are correlated with existing established measures of the same construct.

Assessing convergent validity requires collecting data using the measure.

28
Q

Discriminant Validity

A

The extent to which scores on a measure of a construct are not correlated with measures of other, conceptually distinct, constructs and thus discriminate between them.

29
Q

Conceptually Defining the Construct

A

Having a clear and complete conceptual definition of a construct is a prerequisite for good measurement.

It allows you to make sound decisions about exactly how to measure the construct.

30
Q

Operationally Defining the Construct

A

Once you have a conceptual definition of the construct you are interested in studying, it is time to operationally define it.

Recall that an operational definition is a definition of the variable in terms of precisely how it is to be measured.

Since most variables are relatively abstract concepts that cannot be directly observed (e.g., stress), and observation is at the heart of the scientific method, conceptual definitions must be transformed into something that can be directly observed and measured.

31
Q

Using an Existing Measure

A

It is usually a good idea to use an existing measure that has been used successfully in previous research.

Among the advantages are that
(a) you save the time and trouble of creating your own,
(b) there is already some evidence that the measure is valid (if it has been used successfully), and
(c) your results can more easily be compared with and combined with previous results.

In fact, if there already exists a reliable and valid measure of a construct, other researchers might expect you to use it unless you have a good and clearly stated reason for not doing so.

If you choose to use an existing measure, you may still have to choose among several alternatives.
- The most common one.
- The one with the best evidence of reliability and validity.
- The one that best measures the particular aspect of the construct you are interested in.
- The one that would be easiest to use.

32
Q

Creating Your Own Measure

A

First, be aware that most new measures in psychology are really variations of existing measures, so you should still look to the research literature for ideas.

Perhaps you can modify an existing questionnaire, create a paper-and-pencil version of a measure that is normally computerized (or vice versa), or adapt a measure that has traditionally been used for another purpose.

When you create a new measure, you should strive for simplicity.

Create a set of clear instructions using simple language that you can present in writing or read aloud (or both).

It is also a good idea to include one or more practice items so that participants can become familiar with the task, and to build in an opportunity for them to ask questions before continuing.

It is also best to keep the measure brief to avoid boring or frustrating your participants to the point that their responses start to become less reliable and valid.

The need for brevity must be weighed against the fact that it is nearly always better for a measure to include multiple items rather than a single item.

There are two reasons for this.

One is a matter of content validity. Multiple items are often required to cover a construct adequately.

The other is a matter of reliability. People’s responses to single items can be influenced by all sorts of irrelevant factors—misunderstanding the particular item, a momentary distraction, or a simple error such as checking the wrong response option. But when several responses are summed or averaged, the effects of these irrelevant factors tend to cancel each other out to produce more reliable scores.

Finally, the very best way to assure yourself that your measure has clear instructions, includes sufficient practice, and is an appropriate length is to test several people.

Observe them as they complete the task, time them, and ask them afterward to comment on how easy or difficult it was, whether the instructions were clear, and anything else you might be wondering about.

33
Q

Implementing the Measure

A

You will want to implement any measure in a way that maximizes its reliability and validity.

In most cases, it is best to test everyone under similar conditions that, ideally, are quiet and free of distractions.

Participants are often tested in groups because it is efficient, but be aware that it can create distractions that reduce the reliability and validity of the measure.

Be aware also that people can react in a variety of ways to being measured that reduce the reliability and validity of the scores.

Although some disagreeable participants might intentionally respond in ways meant to disrupt a study, participant reactivity is more likely to take the opposite form.

Agreeable participants might respond in ways they believe they are expected to.

Research studies can also have built-in demand characteristics, and your own expectations can bias participants' behaviors in unintended ways.

34
Q

Implementing the Measure

socially desirable responding

A

When participants respond in ways that they think are socially acceptable.

35
Q

Implementing the Measure

demand characteristics

A

Subtle cues that reveal to participants how the researcher expects them to respond in the experiment.

36
Q

Implementing the Measure

Precautions you can take to minimize these kinds of reactivity

A

One is to make the procedure as clear and brief as possible so that participants are not tempted to vent their frustrations on your results.

Another is to guarantee participants’ anonymity and make clear to them that you are doing so.

Although informed consent requires telling participants what they will be doing, it does not require revealing your hypothesis or other information that might suggest to participants how you expect them to respond.

Finally, the effects of your expectations can be minimized by arranging to have the measure administered by a helper who is “blind” or unaware of its intent or of any hypothesis being tested.

you should standardize all interactions between researchers and participants—for example, by always reading the same set of instructions word for word.

37
Q

Evaluating the Measure

A

Even if the measure has been used extensively by other researchers and has already shown evidence of reliability and validity, you should not assume that it worked as expected for your particular sample and under your particular testing conditions.

In most research designs, it is not possible to assess test-retest reliability because participants are tested at only one time.

It is customary to assess internal consistency for any multiple-item measure, usually by looking at a split-half correlation or Cronbach's α.

Criterion validity can be assessed in various ways.

For example, if your study included more than one measure of the same construct or measures of conceptually distinct constructs, then you should look at the correlations among these measures to be sure that they fit your expectations.

Note also that a successful experimental manipulation also provides evidence of criterion validity.

If your newly collected data cast doubt on the reliability or validity of your measure, the short answer is that you have to ask why.
It could be that there is something wrong with your measure or how you administered it. It could be that there is something wrong with your conceptual definition. It could be that your experimental manipulation failed.