Midterm 2 - Chpt. 5 Flashcards

1
Q

Operational definition (in measurement):

A
  • A concrete way to measure an abstract concept
2
Q

Quality of operational definitions evaluated by (2):

A
  1. Reliability
    – Is your measure consistent?
  2. Construct Validity
    – Are you measuring what you hope you’re measuring?
    – Accuracy
3
Q

How are concepts helpful?

A

They give us ways to evaluate operational definitions

Especially measurement instruments (e.g., scales, surveys, coding schemes)

Check that reliability & validity have been demonstrated

4
Q

Which kinds of designs are typically most concerned with demonstrating reliability & validity?

A

A) Correlational/survey/quasi-experimental designs
(Experimental designs, option B, are typically more concerned with internal validity)

5
Q

4 components of evaluating operational definitions:

A
  • Reliability
  • Construct Validity
  • Internal Validity
  • External Validity
6
Q

Reliability:

A
  • Does it measure the construct with little error?
  • Is it a stable & consistent measure?
7
Q

Construct Validity

A

Are we measuring what we think we’re measuring?

8
Q

Internal Validity

A

Can we infer causality?

9
Q

External Validity

A

Can we generalize our findings beyond this group/setting?

10
Q

Reliability - True Scores

A

Each participant has a true score
- That’s the target, but we can’t observe it

Must rely on measurement, which has “measurement error” (deviation from the target)

A measure is considered reliable if it has relatively little measurement error

11
Q

What’s the first concern with any measure?

A

Reliability is your first concern with any measure

If it isn’t measuring the construct consistently, then validity (accuracy) is not even an issue

12
Q

Types of Reliability:

A
  1. Test-retest reliability
  2. Internal consistency reliability
  3. Inter-rater reliability
13
Q

Test-Retest Reliability

A

Is a participant’s score consistent across time?
- EX: an extrovert at time 1 should still score as an extrovert at time 2
Positive linear relationship/correlation expected
- Rule of thumb: minimum r = +.80
*For relatively stable constructs

14
Q

Internal Consistency Reliability

A

Is a participant’s score on this construct similar across items that aim to measure related aspects of the construct?
- Items = questions (“is talkative”, “is full of energy”, “is rarely shy”)
From text:
- Split-half reliability
- Cronbach’s alpha
- Item-total correlations

15
Q

Inter-rater Reliability:

A

How similar are a participant’s scores when measured by different raters?

Relevant when behaviour is observed or texts are coded by multiple “raters”

16
Q

Validity

A
  • Are you measuring what you hope you’re measuring?
  • Accuracy
  • Is it measuring what it’s supposed to measure?
17
Q

Components of Construct Validity

A
  • Face Validity
  • Content Validity
18
Q

Face Validity

A

Look at each item.
Does it look like it’s assessing loneliness?
- If yes, then high face validity
Usually happens, but not a requirement of measures.
Alternative to FV:
– Give a whole bunch of items to a large group and see what predicts loneliness (don’t care why)

19
Q

Content Validity

A

Look at the whole measure.
Is it capturing all the important parts of what it means to be lonely, and nothing more?
Theoretical question
Can be debated!

20
Q

Predictive Validity

A

Predicts future, conceptually related behaviours

  • Q: Do people with high scores on your measure at T1 go on to do relevant behaviours at T2?
21
Q

Concurrent Validity

A

Able to distinguish between theoretically relevant behaviours

  • Q: Do people with high scores on your measure behave in ways you’d expect them to behave if they were high on this construct?
  • Constructs that are supposed to be related, ARE related
22
Q

Types of Construct Validity - Behaviours

A
  • Predictive
  • Concurrent
23
Q

Types of Construct Validity - Other Constructs

A
  • Convergent
  • Discriminant
24
Q

Convergent Validity

A

Related to scores on measures of similar constructs

  • Q: Do people with high scores on your measure have high (or low) scores on measures of related constructs (i.e., high correlation)?
  • Do your measurements of happiness line up with other established measures of happiness?
25
Q

Discriminant Validity

A

Not related (i.e., low or zero correlation) to what it shouldn’t relate to

  • Q: Do people with high scores on your measure randomly vary in how much they show constructs that could be alternative explanations of what your scale is measuring?
26
Q

SUMMARY - Evaluating a measure

A

Reliability:
- Test-retest
- Inter-rater
- Internal consistency

Construct Validity:
- Face
- Content
- Predictive
- Concurrent
- Convergent
- Discriminant

External Validity:
- Generalizability

27
Q

Self-report measures:

A

Used to study personality traits, clinical counselling, clinical diagnoses

Better to use existing measures/scales than your own
- Will have reliability and validity data to help decide which measure to use
- Also able to compare findings with prior research using the same measure
- Can find using the Mental Measurements Yearbook

28
Q

READINGS

A
29
Q

Reliability - Any measurement involves two components:

A
  • True Score
  • Measurement Error
30
Q

True Score

A

A person’s actual level of the variable of interest

31
Q

Measurement Error

A

Any contributor to a measure’s score that is not based on the actual level of the variable of interest
- e.g., measuring response time with button pressing (after a beep)

Other factors: someone reacts quickly but presses the wrong button, then has to react again to press the right button (this contributes to our measure of reaction time, but is unrelated to what we’re truly interested in)

Consider how much measurement error exists in a measure

32
Q

How do measurement errors affect reliability?

A

Excess measurement error makes it hard to detect any true relationship between variables, because the amount of true variance being captured is small

33
Q

Are there measures that DON’T have measurement error?

A

ALL measures contain some amount of measurement error

Key is to minimize this measurement error and maximize the amount of true score being captured by any particular measure
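The true-score model above can be sketched in code (a hypothetical simulation, not from the readings): each observed score is the unobservable true score plus random error, and as the error grows, observed scores track true scores less well.

```python
# Hypothetical simulation of the true-score model:
# observed score = true score + measurement error.
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

random.seed(1)
true_scores = [random.gauss(50, 10) for _ in range(500)]

rs = {}
for error_sd in (1, 10, 30):  # little, moderate, excessive error
    observed = [t + random.gauss(0, error_sd) for t in true_scores]
    # How well do observed scores capture the (unobservable) true scores?
    rs[error_sd] = pearson_r(true_scores, observed)
```

With little error, observed scores are almost pure true score; with excessive error, most of the observed variance is noise, which is why true relationships become hard to detect.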

34
Q

In many areas, reliability can be increased by…

A

Making multiple observations of the same variable

A method commonly used in assessing personality traits and cognitive abilities

Reliability increases as the number of items increases

e.g., a scale with ten or more questions designed to assess the same trait
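The "more items, more reliability" point can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result the cards don't name; a minimal sketch:

```python
# Spearman-Brown prophecy formula (standard psychometrics, not from the
# cards): predicted reliability when a test is lengthened by factor k.
def spearman_brown(reliability, k):
    return k * reliability / (1 + (k - 1) * reliability)

# A 5-item scale with reliability .60, doubled to 10 items:
print(round(spearman_brown(0.60, 2), 3))  # → 0.75
```

Doubling the items lifts reliability from .60 to .75, illustrating why multi-item scales are preferred over single questions.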

35
Q

How can we know how reliable a measure is?

A

Can assess the stability of measures using a correlation coefficient: a number that tells us how strongly two variables are related to each other

36
Q

Most common method of calculating correlation coefficients:

A

The most common is the Pearson correlation coefficient
Ranges from −1.00 to +1.00
Symbolized by a lowercase r

0 = no relation
Closer to +1 or −1 = stronger correlation

Relationships can be positive linear or negative linear

37
Q

Test-retest reliability:

A

Test-retest reliability is assessed by giving many people the same measure twice
- EX: reliability can be assessed by giving it to a group of people on one day and then again a week later
- With two scores for each person, you would calculate a correlation coefficient to determine the relationship between the first/second scores

No agreed-upon cut-off for determining when a correlation is high enough to be acceptable, but some suggest a test-retest correlation of at least .80

38
Q

4 issues with Test-retest reliability:

A

1. Practice effects - correlations between two scores can become inflated if people are likely to remember how they responded the first time
- SOLUTION - alternate forms reliability: two different forms of the same test are administered on two separate occasions

2. Some constructs are relatively stable across time - like personality traits

3. Some aren’t, and are expected to change - like mood

4. Obtaining two measures from the same people at different points can be difficult

39
Q

Internal consistency reliability:

A

Examines how successful the different items in a scale are at measuring the same construct or variable

EX: think of each item as a different attempt to measure the same construct

  • When people respond similarly across these different attempts, it suggests that the measure is reliable
40
Q

Most common indicator of internal consistency

A

Cronbach’s Alpha
Researcher calculates how well each item correlates with every other item, which produces a large number of inter-item correlations

Item-total correlations give info about each individual item and its relation to the total score

Items that don’t correlate with the others can be eliminated to increase measure’s reliability

While reliability can increase with longer measures, a shorter version can be more convenient to administer and also have accepted reliability
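A hedged sketch of how alpha can be computed, using its standard formula (alpha = k/(k−1) × (1 − sum of item variances / variance of total scores)); the item responses below are invented:

```python
# Cronbach's alpha from its standard formula; toy 1-5 Likert data.
from statistics import pvariance

def cronbach_alpha(items):
    """`items` is a list of columns: one list of scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per person
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three extraversion items answered by five people (invented data):
talkative  = [4, 5, 2, 3, 4]
energetic  = [4, 4, 1, 3, 5]
rarely_shy = [5, 4, 2, 2, 4]
print(round(cronbach_alpha([talkative, energetic, rarely_shy]), 2))  # → 0.91
```

The three items move together across people, so alpha is high; an item that didn't correlate with the rest would pull alpha down and could be eliminated.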

41
Q

Other forms of internal consistency

A

A recent, arguably superior alternative: coefficient omega

Another form: split-half reliability, which attempts to determine the degree to which all the items in a scale are related to one another
- Split the items in a scale into two parts based on some random process, then administer both halves to a group of people
- After scoring each half, calculate a correlation to see how well performance on one half is related to performance on the other half

42
Q

Inter-rater reliability:

A

In some research, raters observe behaviours and make ratings or judgements
- i.e. rating the amount of perceived emotion in someone

To make these ratings, raters follow a strict set of guidelines in order to make these judgements as systematic as possible
- To improve, have more than one person as a rater

Reliability of these ratings can be determined by calculating inter-rater reliability: the extent to which raters agree in their observations, so if one rater gives a target a high score, the other raters also rate that behaviour as high

43
Q

A measure can be highly reliable, but that doesn’t mean..

A

it’s measuring what it’s intended to measure

44
Q

Internal Validity

A

Degree to which an experiment is well-designed and can support a causal claim

45
Q

When it comes to operationalization, what’s most relevant is…

A

Construct validity: whether a variable’s operationalization is accurate in capturing the intended phenomenon

Degree to which the operationalization of a variable reflects the true theoretical meaning of the variable

Variables can be measured and manipulated in a variety of different ways, and there is never a perfect operationalization of a variable
- Thus, different indicators of construct validity are used to build an argument that a construct has been accurately operationalized and is properly measured by a particular scale

46
Q

Indicators of construct validity:

A

• How do we know that a measure is a valid indicator of a particular construct?
• We gather construct validity information by examining many different forms of validity
- Helps us build an overall case for the broader category

47
Q

Face Validity:

A

The measure appears, “on the face of it”, to measure what it’s supposed to measure - whether it appears to assess the intended variable

  • Involves only a judgement of whether the content of the measure appears to measure this variable; a subjective process
  • e.g., pointless Buzzfeed quizzes
  • Not sufficient by itself to conclude that a measure has construct validity
48
Q

Content validity:

A

Evaluated by comparing the content of the measure with the theoretical definition of the construct, ensuring that the measure captures all aspects of the construct and nothing extraneous to the construct
- e.g., if a construct has 3 different aspects, your scale should try to measure all 3 of these aspects

  • Focus on assessing whether the content of a measure reflects the meaning of the construct being measured (like face validity)
49
Q

Predictive validity:

A

seeing if the measure can usefully predict some future behaviour that is theoretically related

  • EX: academic motivation at the beginning of term predicting final grades at the end of term
  • Grades are the standard or criterion by which we are judging the validity of our measure
  • Important when studying measures designed to improve our ability to make predictions about different behaviours
50
Q

Concurrent validity:

A

similar to predictive validity in that it examines the prediction of a criterion, but instead of a future behaviour it examines a criterion measured at the same time as the measure is administered

  • One common method is to study whether two or more groups of people differ on the measure in expected ways
  • e.g., psychopaths versus the general public

Another is to study how people who score either low or high on the measure behave in different situations

51
Q

Convergent validity:

A

extent to which scores on the target measure are related to scores on the other measures of the same construct or similar constructs

Different measures of similar constructs should “converge”, or be related to one another

EX: one measure of psychopathy should correlate highly with another psychopathy measure or measures of similar constructs

52
Q

Discriminant validity:

A

sort of the opposite of convergent validity, in that it’s a demonstration that the measure is not related to variables that are conceptually unrelated to the construct of interest

The measure should discriminate between the construct being measured and other, unrelated constructs

Scores on the measure should diverge rather than converge with these measurements of unrelated constructs

53
Q

Reactivity:

A

A potential issue with measuring behaviours is that people can behave differently when they know they’re being observed

People reacting to the act of measurement and changing their behaviour
- If this occurs, then we’re no longer learning about how someone would behave in the real world, only how they would behave when they know they’re being observed

Measures of behaviour vary in terms of their potential reactivity

54
Q

How to minimize reactivity

A

Ways to minimize:
- allowing time for people to become used to the presence of the observer or the recording equipment
- Measure something without that person noticing or knowing (AKA non-reactive or nonobtrusive operationalizations)
- Clever ways of measuring

55
Q

A variable’s levels can be conceptualized in terms of 4 different kinds of measurement scales:

A
  • Nominal
  • Ordinal
  • Interval
  • Ratio
56
Q

How can the different measurement scales for variables affect research (pos/neg)?

A
  • the conclusions drawn from the research
  • options available for establishing construct validity
  • the kinds of statistical analyses that are possible and appropriate to use when analyzing your data
57
Q

Nominal

A

no numerical or quantitative properties

Instead, categories or groups simply differ from one another
EX: country of birth - people are born in a certain country, and we can classify people based on what country they were born in
- The levels don’t have numerical properties; a person can’t be “more” or “less” when it comes to country of birth: the levels are merely different

EX: attractive vs not attractive
- Doesn’t tell you anything about how attractive they find the person, but tells you whether or not they do find them attractive (like a yes or no)

58
Q

Ordinal

A

allows us to order the levels of the variable in terms of rank

Instead of having categories that are different, the categories can be ordered from first to last
- EX: Olympic medals, birth order

However, we don’t know anything about the distance or difference between each element

No particular value is attached to the intervals between numbers

59
Q

Interval

A

Differences between the numbers on the scale are equal in size
- EX: the difference between 1 and 2 is the same size as the difference between 2 and 3

  • An increase of two (e.g., 15 to 17) represents the same difference anywhere on the scale
  • Zero doesn’t indicate a complete absence of quantity; it’s only an arbitrary reference point
  • Cannot form ratios based on these numbers (temperature example)
60
Q

Ratio

A

like an interval scale, except it does have a meaningful absolute zero point that indicates total absence of the variable being measured

  • Enables statements such as “a person who weighs 100 kilograms weighs twice as much as a person who weighs 50 kilograms”
    (Not possible with other scales)
  • Often used with variables involving physical measures
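The interval-vs-ratio distinction can be made concrete with temperature (a sketch; the Celsius/Kelvin pairing is my assumption about the "temperature example" the interval card alludes to):

```python
# Why ratios need a true zero: 0 °C is an arbitrary reference point
# (interval scale), while 0 K is a true absence of thermal energy
# (ratio scale).
c1, c2 = 10.0, 20.0                 # degrees Celsius
k1, k2 = c1 + 273.15, c2 + 273.15   # the same temperatures in kelvins

celsius_ratio = c2 / c1  # 2.0, but "twice as hot" is meaningless here
kelvin_ratio = k2 / k1   # ~1.04, the physically meaningful ratio
print(round(celsius_ratio, 2), round(kelvin_ratio, 2))
```

The same two temperatures yield a ratio of 2.0 on the interval scale but only about 1.04 on the ratio scale, which is why ratio statements are valid only when zero means total absence of the variable.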