Chapter 5: Identifying Good Measurement Flashcards

1
Q

Conceptual definition

A

The researcher’s definition of the variable in question at a theoretical level.

2
Q

Operational definition

A

A researcher’s decision about how to measure or manipulate the conceptual variable

3
Q

How are conceptual variables operationalized?

A

Researchers start by stating a definition of their construct (the conceptual variable) and then create an operational definition. For example, gratitude toward a partner could be measured by asking people how often they thank their partner for something the partner did. Even a simple variable like gender needs to be operationalized.

4
Q

3 common types of measures

A
  1. Self-report measures
  2. Observational measures
  3. Physiological measures
5
Q

Self-report measures

A

Operationalize a variable by recording people's answers to questions about themselves in a questionnaire or interview. Diener's five-item scale is an example, as is asking someone to report their gender identity. For children, self-reports can be replaced with parent or teacher reports.

6
Q

Observational measures

A

Operationalize a variable by recording observable behaviors or physical traces of behaviors. For example, happiness could be operationalized by observing how many times a person smiles. Intelligence tests are also observational measures, since they record an individual's intelligent behaviors.

7
Q

Physiological measures

A

Operationalize a variable by recording biological data, such as brain activity, hormone levels, or heart rate. These measures usually require equipment to amplify, record, or analyze the data. One way to operationalize stress is to measure the amount of cortisol (a stress hormone) in saliva.

8
Q

Which operationalization is best?

A

One construct can be operationalized many different ways. Physiological measures are not necessarily the most accurate, and they must be validated against other measures. For example, fMRI research has suggested that more intelligent people's brains work more efficiently; however, in that research, participants' intelligence was determined before the scans using an IQ test, an observational measure.

9
Q

How many levels must each variable have?

A

All variables must have at least 2 levels, but the levels of operational variables can be coded using different scales of measurement.

10
Q

Categorical/nominal variables

A

Variables whose levels are categories, such as sex or species. The researcher might assign numbers to each category, but the numbers have no numerical meaning and do not quantify the difference between categories.

11
Q

Quantitative/continuous variables

A

Variables that are coded with meaningful numbers, like height, weight, level of brain activity, or scales that produce quantitative scores (such as Diener's well-being scale).

12
Q

3 types of quantitative variables

A
  1. Ordinal scale
  2. Interval scale
  3. Ratio scale
13
Q

Ordinal scale

A

Applies when the numbers of a quantitative variable represent a ranked order; the intervals between ranks may be unequal. For example, a bookstore might rank its top 10 best-selling books, but we don't know how many more copies of book 1 were sold than book 2.

14
Q

Interval scale

A

Applies when the numerals of a quantitative variable represent equal intervals (distances) between levels and there is no "true zero." For example, the distance between each degree on the Celsius scale is equal, but there is no true zero: 0 degrees Celsius (the freezing point of water) does not mean that something has "no temperature."

15
Q

Ratio scale

A

Applies when the numerals of a quantitative variable have equal intervals and a value of zero truly means "none." For example, when measuring how many questions people get right on a knowledge test, a score of zero really does mean zero: the individual got 0 questions correct.

16
Q

Reliability

A

Refers to how consistent the results of a measure are

17
Q

Validity

A

Refers to whether the operationalization is measuring what it’s supposed to measure

18
Q

3 types of reliability

A
  1. Test-retest reliability
  2. Interrater reliability
  3. Internal reliability
19
Q

Test-retest reliability

A

A study participant gets essentially the same score each time they are measured with the same instrument. This applies whether the operationalization is self-report, observational, or physiological, but it is most relevant when researchers are measuring constructs that are expected to stay stable over time. If participants take an IQ test one day and then take it again a month later, the pattern of scores should be consistent.

20
Q

Interrater reliability

A

Consistent scores are obtained no matter who measures the variable.

21
Q

Internal reliability

A

A study participant gives a consistent pattern of answers, no matter how the researchers phrase the question. People who agree with the first question on a well-being scale should also agree with the next few questions.

22
Q

Statistical devices that researchers can use for data analysis (2)

A

Scatterplots and the correlation coefficient

23
Q

How can scatterplots indicate interrater reliability?

A

Scatterplots can show interrater agreement or disagreement. Example: two observers rate how happy children seem while playing. With high interrater reliability, the observers give each child a similar rating, and the data points cluster close to the line of best fit on the scatterplot. With low interrater reliability, the ratings differ more, and the points are not clustered close to the line.
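
A minimal sketch of what such a plot might look like, using made-up ratings from two hypothetical observers (the data and variable names are illustrative, not from the chapter):

```python
# Sketch: plot two observers' happiness ratings for the same children.
# The ratings below are invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

observer_a = np.array([2, 4, 5, 7, 6, 3, 8, 9, 5, 7])   # rating of each child, 1-10
observer_b = np.array([3, 4, 6, 7, 5, 3, 8, 8, 6, 7])   # same children, second observer

r = np.corrcoef(observer_a, observer_b)[0, 1]            # Pearson correlation between raters

plt.scatter(observer_a, observer_b)
plt.xlabel("Observer A's rating")
plt.ylabel("Observer B's rating")
plt.title(f"Interrater agreement (r = {r:.2f})")
plt.show()
```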

24
Q

What could cause low interrater reliability?

A

This could be due to the observers not having a clear enough operational definition of happiness. Also, the coders might not have been trained well enough yet.

25
Q

Correlation coefficient (r)

A

A single number that indicates how close the dots on a scatterplot are to the line of best fit. r can be positive or negative, which indicates the slope direction, and it is always between -1 and 1. A strong relationship means r is close to -1 or 1; if there is no relationship, r will be .00 or close to it.
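
For concreteness, here is a small sketch (not from the chapter) of how r could be computed on made-up data; scipy.stats.pearsonr is one standard way to get it:

```python
# Sketch: computing r for invented data with positive, negative, and zero relationships.
import numpy as np
from scipy.stats import pearsonr

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

r_pos, _ = pearsonr(x, x * 2 + np.array([0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1]))
r_neg, _ = pearsonr(x, -x + np.array([0.2, -0.1, 0.0, 0.3, -0.2, 0.1, 0.0, -0.3]))
r_none, _ = pearsonr(x, np.array([3, 5, 2, 4, 3, 5, 2, 4]))   # no systematic relationship

print(f"strong positive: r = {r_pos:.2f}")   # close to +1
print(f"strong negative: r = {r_neg:.2f}")   # close to -1
print(f"no relationship: r = {r_none:.2f}")  # near .00
```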

26
Q

Slope direction

A

The direction of the slope of the line of best fit. It can be positive, negative, or zero

27
Q

Strength of the relationship

A

The relationship between variables is considered to be strong when the dots in a scatterplot are close to the line

28
Q

How is test-retest reliability assessed using r?

A

To assess this, we measure the same set of participants on the measure at least twice, recording each person's score at time 1 and at time 2 (for example, about two months apart), and then calculate r. If r is positive and strong (.5 or higher), test-retest reliability is good. If r is positive but weak, we know that participants' scores have changed from time 1 to time 2.
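
A rough sketch of that check, using invented IQ scores for the same participants measured twice (the data and the .5 cutoff follow the card above):

```python
# Sketch: test-retest reliability from two measurement occasions, about two months apart.
import numpy as np

time1 = np.array([95, 110, 102, 128, 87, 115, 99, 121])   # each participant's score at time 1
time2 = np.array([97, 108, 105, 125, 90, 118, 96, 119])   # same participants at time 2

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")
print("good test-retest reliability" if r >= 0.5 else "scores changed between measurements")
```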

29
Q

When would a low r value indicate poor test-retest reliability?

A

A low r is a sign of poor reliability if we are measuring something that should stay the same over time; IQ, for example, should stay about the same over a span of two months. If we are measuring something like seasonal stress, r will be low simply because the construct itself changes over time.

30
Q

How is interrater reliability assessed using r?

A

To test this, we ask two observers to rate the same participants at the same time and then compute r. If r is positive and strong (.70 or higher), interrater reliability is good. If it is positive but weak, reliability is low. A negative correlation is rare but would indicate a serious problem with the observers.

31
Q

R is best used to evaluate interrater reliability when observers are rating which type of variable?

A

r is used to evaluate interrater reliability when the observers are rating a quantitative variable. A statistic called Cohen's kappa is more appropriate when observers are rating a categorical variable; a kappa close to 1 means that the raters agree.
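
A brief sketch of the categorical case, assuming two hypothetical observers assign each child's play to one of three made-up categories; scikit-learn's cohen_kappa_score does the computation:

```python
# Sketch: interrater agreement on a categorical variable using Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

observer_a = ["group", "solitary", "parallel", "group", "solitary", "parallel", "group", "solitary"]
observer_b = ["group", "solitary", "parallel", "group", "parallel", "parallel", "group", "solitary"]

kappa = cohen_kappa_score(observer_a, observer_b)
print(f"kappa = {kappa:.2f}")   # values close to 1 indicate that the raters agree
```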

32
Q

When is internal reliability relevant?

A

Internal reliability is relevant for measures that use multiple items or observations to get at the same construct. If a scale has five items that say roughly the same thing in different words, a participant should answer all of the items consistently.

33
Q

How are responses quantified to determine internal reliability?

A

Researchers ask the participants to answer all of the items. Then they compute the correlation between every item and every other item and average those correlations to get the average inter-item correlation (AIC); an AIC from about .15 to .50 means the items go well together. Finally, they compute Cronbach's alpha, which mathematically combines the AIC and the number of items in the scale; the closer alpha is to 1, the better the scale's internal reliability.
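
A minimal sketch with made-up responses to a hypothetical five-item scale; it computes every inter-item correlation, averages them (AIC), and combines the AIC with the number of items using the standardized form of Cronbach's alpha (one common way to combine them):

```python
# Sketch: average inter-item correlation (AIC) and standardized Cronbach's alpha
# for invented responses (rows = participants, columns = the 5 scale items).
import numpy as np

responses = np.array([
    [5, 4, 5, 4, 5],
    [2, 2, 3, 2, 2],
    [4, 5, 4, 4, 4],
    [1, 2, 1, 2, 1],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
])

k = responses.shape[1]                       # number of items
corr = np.corrcoef(responses, rowvar=False)  # k x k inter-item correlation matrix
upper = corr[np.triu_indices(k, k=1)]        # each item paired with every other item, once
aic = upper.mean()                           # average inter-item correlation

alpha = (k * aic) / (1 + (k - 1) * aic)      # standardized Cronbach's alpha

print(f"AIC = {aic:.2f}")        # roughly .15-.50 suggests the items go well together
print(f"alpha = {alpha:.2f}")    # closer to 1 means better internal reliability
```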

34
Q

Construct validity

A

How well a measure assesses the conceptual variable it was intended to measure.

35
Q

How are validity and reliability different?

A

Validity and reliability are separate concepts. For example, a bathroom scale might say an adult weighs 50 pounds every time they step on it; the scale is reliable (consistent) but not valid (the measurement isn't accurate). Reliability is necessary for validity: a measure can be less valid than it is reliable, but it can't be more valid than it is reliable. If a measure doesn't even correlate with itself, how could it be more strongly associated with some other variable?

36
Q

2 subjective ways to assess validity

A

Face and content validity

37
Q

3 empirical ways to assess validity

A

Criterion, convergent, and discriminant validity

38
Q

How do we measure validity of abstract concepts?

A

Abstract concepts would include happiness, intelligence, stress, and self-esteem. There is no way of directly measuring how happy someone is, although we can estimate it in multiple ways. We can know if operationalizations are measuring our construct by collecting a variety of data and evaluating it in light of our theory about the construct

39
Q

Face validity

A

A measure has face validity if it is subjectively considered a plausible operationalization of the conceptual variable in question. Measures with face validity align well with the conceptual definition.
For example, head circumference would have high face validity as a measure of hat size but low face validity as a measure of intelligence.

40
Q

Content validity

A

A measure must capture all parts of a defined construct. For example, a conceptual definition of intelligence could be the ability to "reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience." To have adequate content validity, an operationalization of intelligence should include questions or items assessing each of these seven components.

41
Q

Criterion validity

A

Evaluates whether the measure under consideration is associated with a concrete behavioral outcome that it should be associated with, according to the conceptual definition. Criterion validity is especially important for self-report measures, because the correlation indicates how well people's self-reports predict their actual behavior.

42
Q

Types of evidence for criterion validity (2)

A
  1. Correlational evidence for criterion validity
  2. Known-groups evidence for criterion validity

43
Q

Correlational evidence for criterion validity

A

For example, a sales company is choosing between aptitude test A and aptitude test B. Both have face and content validity, but do they correlate with the key behavior, success at sales? The company can collect data to find out how well each aptitude test correlates with sales. Both tests are given to all current sales representatives, and each representative's number of sales is recorded; two scatterplots are then made, one relating test A scores to sales and one relating test B scores to sales. If aptitude test A has the stronger correlation, we can conclude that test A has better criterion validity as a measure of selling ability.
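
Sketched below with invented scores: give both hypothetical tests to the same sales representatives, correlate each with number of sales, and compare the two r values.

```python
# Sketch: criterion validity evidence for two hypothetical aptitude tests,
# using made-up test scores and sales figures for the same sales representatives.
import numpy as np

test_a = np.array([55, 62, 70, 48, 66, 73, 58, 80])    # aptitude test A scores
test_b = np.array([60, 50, 72, 65, 55, 68, 62, 58])    # aptitude test B scores
sales  = np.array([12, 15, 20, 9, 17, 22, 13, 25])     # each rep's number of sales

r_a = np.corrcoef(test_a, sales)[0, 1]
r_b = np.corrcoef(test_b, sales)[0, 1]

print(f"test A vs. sales: r = {r_a:.2f}")
print(f"test B vs. sales: r = {r_b:.2f}")
# The test with the stronger correlation has better criterion validity
# as a measure of selling ability.
```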

44
Q

Known groups paradigm

A

Another way to gather evidence for criterion validity, in which researchers see whether scores on the measure can discriminate among two or more groups whose behavior is already confirmed. For example, to validate salivary cortisol as a measure of stress, a researcher could compare the salivary cortisol levels in two groups of people: those who are about to give a speech in front of a classroom and those who are in the audience. If salivary cortisol is a valid measure of stress, people in the stress group (public speaking) should have higher cortisol levels than those in the audience.
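
A rough sketch of that comparison under invented cortisol numbers; comparing the group means with an independent-samples t-test is my choice of analysis here, not something the card specifies:

```python
# Sketch: known-groups evidence for salivary cortisol as a measure of stress.
# Hypothetical cortisol values for people about to give a speech vs. audience members.
import numpy as np
from scipy.stats import ttest_ind

speakers = np.array([14.2, 16.8, 15.1, 17.3, 13.9, 16.0])
audience = np.array([9.8, 11.2, 10.5, 9.1, 12.0, 10.7])

t, p = ttest_ind(speakers, audience)
print(f"speaker mean = {speakers.mean():.1f}, audience mean = {audience.mean():.1f}")
print(f"t = {t:.2f}, p = {p:.3f}")
# If cortisol is a valid measure of stress, the speech group should score
# reliably higher than the audience group.
```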

45
Q

How is the known groups method used to validate self report measures?

A

An example is the Beck Depression Inventory (BDI), a 21-item self-report scale on which participants circle one of 4 choices for each item; the item scores are added to get a total from 0 to 63. Participants completed the inventory, and then psychiatrists conducted clinical interviews to determine whether each person was depressed and, if so, how severely. The average BDI score of the known group of depressed people was higher than the average score of the known group of people who were not depressed, and BDI scores also correlated with the level of depression diagnosed in the interviews.

46
Q

How was convergent validity determined for the BDI?

A

If the BDI really quantifies depression, it should be correlated with other self-report measures of depression. A strong positive correlation between BDI scores and scores on another depression measure provides evidence for the convergent validity of the BDI. Convergent validity evidence also comes from similar constructs, not just the same one: BDI scores were also strongly negatively correlated with a score quantifying psychological well-being. That strong negative correlation makes sense because people who are depressed are also expected to have lower levels of well-being.

47
Q

How was discriminant validity determined for the BDI?

A

The BDI should not correlate strongly with measures of constructs that are very different from depression; it should show discriminant validity with them. For example, we would not expect the BDI to be strongly correlated with a measure of perceived physical health problems. We would expect the BDI to be much more strongly correlated with similar constructs than with dissimilar ones. For example, many developmental disorders have similar symptoms, and we wouldn't want a screening instrument to diagnose a child with autism when the child actually has a speech delay. It's not necessary to establish discriminant validity with every other variable; we focus on variables that are "near neighbors" of the one being evaluated.

48
Q

Convergent and discriminant validity

A

Convergent validity and discriminant validity- the patterns of correlations with measures of theoretically similar and dissimilar constructs. Convergent and discriminant validity are usually evaluated together, as a pattern of correlations among self report measures. A measurement should have higher correlations (higher r values) with similar traits (convergent validity) than it does with dissimilar traits (discriminant validity).