Midterm 2 - Chpt. 5 Flashcards

1
Q

Operational Definitions (in Measurement):

A
  • A concrete way to measure an abstract concept
2
Q

Quality of operational definitions evaluated by (2):

A
  1. Reliability
    – Is your measure consistent?
  2. Construct Validity
    – Are you measuring what you hope you’re measuring?
    – Accuracy
3
Q

How are these concepts (reliability & construct validity) helpful?

A

Ways to evaluate operational definitions
- Especially measurement instruments (e.g., scales, surveys, coding schemes)
- Check that reliability & validity have been demonstrated

4
Q

Which kinds of designs are typically most concerned with demonstrating reliability & validity?

A

A) Correlational/survey/quasi-experimental designs
- These rely on measured (not manipulated) variables, so demonstrating the reliability & validity of measures is the main concern
- (B, experimental designs, are more concerned with internal validity)

5
Q

4 components of evaluating operational definitions:

A
  • Reliability
  • Construct Validity
  • Internal Validity
  • External Validity
6
Q

Reliability:

A
  • Does it measure the construct with little error?
  • Is it a stable & consistent measure?
7
Q

Construct Validity

A

Are we measuring what we think we’re measuring?

8
Q

Internal Validity

A

Can we infer causality?

9
Q

External Validity

A

Can we generalize our findings beyond this group/setting?

10
Q

Reliability - True Scores

A

Each participant has a true score
- That's the target, but we can't observe it

Must rely on measurement, which has "measurement error" (deviation from the target)

A measure is considered reliable if it has relatively little measurement error
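
In classical test theory notation (a standard formalization consistent with this card, though not stated on it):

```latex
% Each observed score X is the unobservable true score T plus measurement error E
X = T + E
% Reliability is the proportion of observed-score variance that is true-score variance
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```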

11
Q

What’s the first concern with any measure?

A

Reliability is your first concern with any measure

If it isn't measuring the construct consistently, then validity (accuracy) is not even an issue

12
Q

Types of Reliability:

A
  1. Test-retest reliability
  2. Internal consistency reliability
  3. Inter-rater reliability
13
Q

Test-Retest Reliability

A

Is a participant's score consistent across time?
- EX: an extrovert at Time 1 should still look like an extrovert at Time 2 (socializing, not staying in)
Assessed with a positive linear relationship/correlation (see the sketch below)
- Rule of thumb: minimum r = +.80
*For relatively stable constructs
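
A minimal sketch of this check (Python; the Time 1/Time 2 scores below are invented, and scipy is assumed to be available):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical extroversion scores for 6 participants, measured twice
time1 = np.array([4.0, 2.5, 3.8, 1.9, 4.5, 3.1])
time2 = np.array([4.2, 2.7, 3.5, 2.0, 4.4, 3.3])

r, _ = pearsonr(time1, time2)  # positive linear correlation across time
print(round(r, 2), r >= 0.80)  # rule of thumb from the card: r of at least .80
```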

14
Q

Internal Consistency Reliability

A

Is a participant's score on this construct similar across items that aim to measure related aspects of the construct?
- Items = questions ("is talkative", "is full of energy", "is rarely shy")
From text:
- Split-half
- Cronbach's alpha
- Item-total

15
Q

Inter-rater Reliability:

A

How similar are a participant's scores when measured by different raters?

Relevant when behaviour is observed or texts are coded by multiple "raters"

16
Q

Validity

A
  • Are you measuring what you hope you're measuring?
  • Accuracy
  • Is it measuring what it is supposed to?
17
Q

Components of Construct Validity

A
  • Face Validity
  • Content Validity
18
Q

Face Validity

A

Look at each item.
Does it look like it's assessing loneliness?
- If yes, then high face validity
Usually present, but not a requirement of measures.
Alternative to FV:
- Give a whole bunch of items to a large group and see what predicts loneliness (don't care why)

19
Q

Content Validity

A

Look at the whole measure.
Is it capturing all the important parts of what it means to be lonely, and nothing more?
A theoretical question
Can be debated!

20
Q

Predictive Validity

A

Predicts future, conceptually related behaviours

  • Q: Do people with high scores on your measure at T1 go on to do relevant behaviours at T2?
21
Q

Concurrent Validity

A

Able to distinguish between theoretically relevant behaviours

  • Q: Do people with high scores on your measure behave in ways you’d expect them to behave if they were high on this construct?
  • Constructs that are supposed to be related ARE related
22
Q

Types of Construct Validity - Behaviours

A
  • Predictive
  • Concurrent
23
Q

Types of Construct Validity - Other Constructs

A
  • Convergent
  • Discriminant
24
Q

Convergent Validity

A

Related to scores on measures of similar constructs

  • Q: Do people with high scores on your measure have high (or low) scores on measures of related constructs (i.e. high correlation)?
  • EX: Do your measurements of happiness align with other established measures of happiness?
25
Q

Discriminant Validity

A

Not related (i.e. low or zero correlation) to what it shouldn't relate to
- Q: Do people with high scores on your measure randomly vary in how much they show constructs that could be alternative explanations of what your scale is measuring?
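
A sketch of checking convergent and discriminant validity together via correlations (Python with numpy; all scores are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
target = rng.normal(size=n)                       # scores on your measure
related = target + rng.normal(scale=0.5, size=n)  # similar construct: should correlate
unrelated = rng.normal(size=n)                    # unrelated construct: should not

# Convergent validity: expect a high correlation with the related construct
print(round(np.corrcoef(target, related)[0, 1], 2))
# Discriminant validity: expect a near-zero correlation with the unrelated construct
print(round(np.corrcoef(target, unrelated)[0, 1], 2))
```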
26
Q

SUMMARY - Evaluating a measure

A

Reliability:
- Test-retest
- Inter-rater
- Internal consistency
Construct Validity:
- Face
- Content
- Predictive
- Concurrent
- Convergent
- Discriminant
External Validity:
- Generalizability
27
Q

Self-report measures:

A

Used to study personality traits, clinical counselling, clinical diagnoses
Better to use existing measures/scales than your own
- Will have reliability and validity data to help decide which measure to use
- Also able to compare findings with prior research using the same measure
- Can find them using the Mental Measurements Yearbook
28
READINGS
29
Q

Reliability - Any measurement involves two components:

A

- True Score
- Measurement Error
30
Q

True Score

A

A person's actual level of the variable of interest
31
Q

Measurement Error

A

Any contributor to a measure's score that is not based on the actual level of the variable of interest
- EX: measuring response time with button pressing (after a beep). Someone reacts quickly but presses the wrong button, then has to react again to press the right one; this contributes to our measure of reaction time but is unrelated to what we're truly interested in
- Consider how much measurement error exists in a measure
32
Q

How do measurement errors affect reliability?

A

Excess measurement error makes it hard to detect any true relationship between variables, because the amount of true variance being captured is small
33
Q

Are there measures that DON'T have measurement error?

A

ALL measures contain some amount of measurement error
The key is to minimize this measurement error and maximize the amount of true score being captured by any particular measure
34
Q

In many areas, reliability can be increased by...

A

Making multiple observations of the same variable
- A method commonly used in assessing personality traits and cognitive abilities
- Reliability increases as the number of items increases (see the formula below)
- EX: a scale with ten or more questions designed to assess the same trait
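
One standard way to quantify this item-count effect is the Spearman-Brown prophecy formula (not named in the card, but the usual formalization):

```latex
% Predicted reliability r_k when a test is lengthened by a factor k,
% given the current reliability r:
r_k = \frac{k\,r}{1 + (k - 1)\,r}
% EX: doubling (k = 2) a test with r = .60 gives
% r_k = 1.20 / 1.60 = .75
```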
35
Q

How can we know how reliable a measure is?

A

Can assess the stability of measures using a correlation coefficient: a number that tells us how strongly two variables are related to each other
36
Q

Most common method of calculating correlation coefficients:

A

The Pearson correlation coefficient (written as a lowercase r)
- Ranges from 0.00 to +1.00 when positive, and 0.00 to -1.00 when negative
- 0 = no relation; closer to +1 or -1 = stronger correlation
- Relationships can be positively or negatively linear
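
For reference, the textbook definition of Pearson's r (a standard formula, not quoted from these cards):

```latex
% Pearson's r: covariance of x and y scaled by both standard deviations
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
        {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```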
37
Q

Test-retest reliability:

A

Assessed by giving many people the same measure twice
- EX: give the measure to a group of people on one day and then again a week later
- With two scores for each person, calculate a correlation coefficient to determine the relationship between the first and second scores
No agreed-upon cut-off for determining when a correlation is high enough to be acceptable, but some suggest a test-retest correlation of at least .80
38
Q

4 issues with Test-retest reliability:

A

1. Practice effects: correlations between the two scores can become inflated if people are likely to remember how they responded the first time
   - SOLUTION - alternate forms reliability: two different forms of the same test are administered on two separate occasions
2. Some constructs are relatively stable across time (like personality traits)
3. Some aren't, and are expected to change (like mood)
4. Obtaining two measures from the same people at different points can be difficult
39
Q

Internal consistency reliability:

A

Examines how successful the different items in a scale are at measuring the same construct or variable
- EX: think of each item as a different attempt to measure the same construct
- When people respond similarly across these different attempts, it suggests that the measure is reliable
40
Q

Most common indicator of internal consistency

A

Cronbach's alpha
- The researcher calculates how well each item correlates with every other item, which produces a large number of inter-item correlations
- Gives info about each individual item and its relation to the total score
- Items that don't correlate with the others can be eliminated to increase the measure's reliability
- While reliability can increase with longer measures, a shorter version can be more convenient to administer and still have acceptable reliability
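
A minimal sketch of computing Cronbach's alpha (Python with numpy; the item-score matrix is invented for illustration):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 3 Likert items on the same trait
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 2))  # higher (e.g., > .70) = more internally consistent
```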
41
Q

Other forms of internal consistency

A

A recently recommended superior alternative: coefficient omega
Another form - split-half reliability: attempts to determine the degree to which all the items in a scale are related to one another
- Split the items in a scale into two parts based on some random process, then administer both halves to a group of people
- After scoring each half, calculate a correlation to see how well performance on one half is related to performance on the second half
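
A sketch of split-half reliability under the same assumptions (invented data; the last step applies the Spearman-Brown correction shown earlier to estimate full-length reliability):

```python
import numpy as np

rng = np.random.default_rng(0)
# 30 respondents x 10 Likert items (random data, so expect r near 0;
# real items measuring one construct should correlate highly)
scores = rng.integers(1, 6, size=(30, 10))

# Split the 10 items into two halves based on a random process
order = rng.permutation(10)
half1 = scores[:, order[:5]].sum(axis=1)  # total score on one half
half2 = scores[:, order[5:]].sum(axis=1)  # total score on the other half

r_half = np.corrcoef(half1, half2)[0, 1]  # how well one half tracks the other
r_full = (2 * r_half) / (1 + r_half)      # Spearman-Brown correction to full length
print(round(r_half, 2), round(r_full, 2))
```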
42
Q

Inter-rater reliability:

A

In some research, raters observe behaviours and make ratings or judgements
- i.e. rating the amount of perceived emotion in someone
To make these judgements as systematic as possible, raters follow a strict set of guidelines
- To improve reliability, have more than one person as a rater
Reliability of these ratings can be determined by calculating inter-rater reliability: the extent to which raters agree in their observations; if one rater gives a target a high score, the other raters should also rate this behaviour as high
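
One simple index of inter-rater agreement is the correlation between two raters' scores; a sketch with invented ratings (other indices, such as Cohen's kappa for categorical codes, exist but aren't covered in these cards):

```python
import numpy as np

# Hypothetical ratings of perceived emotion (1-7) for 8 targets by two raters
rater_a = np.array([6, 2, 5, 7, 3, 4, 6, 1])
rater_b = np.array([5, 3, 5, 7, 2, 4, 7, 1])

# High correlation = the raters order the targets similarly (good inter-rater reliability)
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(round(r, 2))
```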
43
Q

A measure can be highly reliable, but that doesn't mean...

A

It's measuring what it's intending to measure
44
Q

Internal Validity

A

The degree to which an experiment is well-designed and can support a causal claim
45
Q

When it comes to operationalization, what's most relevant is...

A

Construct validity: whether a variable's operationalization is accurate in capturing the intended phenomenon
- The degree to which the operationalization of a variable reflects the true theoretical meaning of the variable
Variables can be measured and manipulated in a variety of different ways, and there is never a perfect operationalization of a variable
- Thus, different indicators of construct validity are used to build an argument that a construct has been accurately operationalized and is properly measured by a particular scale
46
Q

Indicators of construct validity:

A

How do we know that a measure is a valid indicator of a particular construct?
- We gather construct validity information by examining many different forms of validity
- This helps us build an overall case for the broader category
47
Q

Face Validity:

A

The measure appears, "on the face of it", to measure what it's supposed to measure
- Whether it appears to assess the intended variable
- Involves only a judgement of whether the content of the measure appears to measure this variable; a subjective process
- i.e. pointless Buzzfeed quizzes
- Not sufficient by itself to conclude that a measure has construct validity
48
Q

Content validity:

A

Evaluated by comparing the content of the measure with the theoretical definition of the construct, ensuring that the measure captures all aspects of the construct and nothing extraneous to it
- i.e. if a construct has 3 different aspects, your scale should try to measure all 3 of these aspects
- Focuses on assessing whether the content of a measure reflects the meaning of the construct being measured (like face validity)
49
Q

Predictive validity:

A

Seeing if the measure can usefully predict some future behaviour that is theoretically related
- EX: academic motivation at the beginning of term predicting final grades at the end of term
- Grades are the standard or criterion by which we are judging the validity of our measure
- Important when studying measures designed to improve our ability to make predictions about different behaviours
50
Q

Concurrent validity:

A

Similar to predictive validity in that it examines the prediction of a criterion, but instead of a future behaviour it examines a criterion measured at the same time as the measure is administered
- One common method is to study whether two or more groups of people differ on the measure in expected ways (i.e. psychopaths versus the general public)
- Another is studying how people who score either low or high on the measure behave in different situations
51
Q

Convergent validity:

A

The extent to which scores on the target measure are related to scores on other measures of the same construct or similar constructs
- Different measures of similar constructs should "converge", or be related to one another
- EX: one measure of psychopathy should correlate highly with another psychopathy measure or with measures of similar constructs
52
Q

Discriminant validity:

A

Roughly the opposite of convergent validity: a demonstration that the measure is not related to variables that are conceptually unrelated to the construct of interest
- The measure should discriminate between the construct being measured and other, unrelated constructs
- Scores on the measure should diverge rather than converge with measurements of unrelated constructs
53
Q

Reactivity:

A

A potential issue with measuring behaviours: people can behave differently when they know they're being observed
- People react to the act of measurement and change their behaviour
- If this occurs, then we're no longer learning about how someone would behave in the real world, only how they would behave when they know they're being observed
- Measures of behaviour vary in terms of their potential reactivity
54
Q

How to minimize reactivity

A

Ways to minimize:
- Allow time for people to become used to the presence of the observer or the recording equipment
- Measure something without the person noticing or knowing (AKA non-reactive or unobtrusive operationalizations)
- Clever ways of measuring
55
Q

A variable's levels can be conceptualized in terms of 4 different kinds of measurement scales:

A

- Nominal
- Ordinal
- Interval
- Ratio
56
Q

How can the different measurement scales for variables affect research (pos/neg)?

A

They affect:
- the research itself, including conclusions drawn
- the options available for establishing construct validity
- the kinds of statistical analyses that are possible and appropriate to use when analyzing your data
57
Q

Nominal

A

No numerical or quantitative properties; instead, categories or groups simply differ from one another
- EX: country of birth. People are born in a certain country, and we can classify people based on what country they were born in. The levels don't have numerical properties and can't be "more" or "less" than one another: they are merely different
- EX: attractive vs. not attractive. Doesn't tell you anything about how attractive they find the person, but tells you whether or not they do find them attractive (like a yes or no)
58
Q

Ordinal

A

Allows us to order the levels of the variable in terms of rank; instead of having categories that merely differ, the categories can be ordered from first to last
- EX: Olympic medals, birth order
- However, we don't know anything about the distance or difference between each element; no particular value is attached to the intervals between numbers
59
Q

Interval

A

The differences between the numbers on the scale are equal in size
- EX: the difference between 1 and 2 is the same size as the difference between 2 and 3, and an increase of two (e.g., 15 to 17) means the same amount anywhere on the scale
- Zero doesn't indicate a complete absence of quantity; it's only an arbitrary reference point
- Cannot form ratios based on these numbers (e.g., temperature in degrees Celsius: 20° is not "twice as hot" as 10°)
60
Q

Ratio

A

Like an interval scale, except it does have a meaningful absolute zero point that indicates a total absence of the variable being measured
- Enables statements such as "a person who weighs 100 kilograms weighs twice as much as a person who weighs 50 kilograms" (not possible with other scales)
- Often used with variables involving physical measures
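
A small illustrative sketch of which comparisons each of the four scales supports (all variable names and values invented):

```python
# Hypothetical examples of the four measurement scales and the
# comparisons each one meaningfully supports.

birth_country = ["Canada", "Brazil", "Japan"]  # nominal: categories only
print(birth_country[0] != birth_country[1])    # can only check same/different

medal_rank = [1, 2, 3]                         # ordinal: order, but unequal gaps
print(medal_rank[0] < medal_rank[1])           # ranking is meaningful

temp_celsius = [10.0, 20.0]                    # interval: equal gaps, arbitrary zero
print(temp_celsius[1] - temp_celsius[0])       # differences are meaningful...
# ...but 20 C is NOT "twice as hot" as 10 C (no true zero), so no ratios

weight_kg = [50.0, 100.0]                      # ratio: true zero exists
print(weight_kg[1] / weight_kg[0])             # ratios ARE meaningful: twice as heavy
```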