lecture 17 Flashcards
What do measures attempt to quantify? Example?
Measures attempt to quantify the “true value” of a latent (or hidden) psychological construct
“How extroverted are you?”
the way the construct would be presented to us if we assessed the construct perfectly. Ex: the measurement would show exactly what it is.
Are humans stochastic?
yes
what does stochastic mean?
functionally stochastic (unpredictable in ways we don’t understand)
Will measures ever be perfectly reliable?
We humans are not perfectly reliable
Thus, measures will never be perfectly reliable
what is measurement error?
measurement error is the difference between what we see and the true value
What is the best we can do?
Best we can do is estimate psychological constructs:
combining a guess about the true value and measurement error. We will never know what mixture this is, i.e., how much is true value and how much is measurement error.
what is the goal around measurement error?
minimize measurement error on average and hopefully maximize true value on average. This limits what we can say about the individual
What example was provided for measurement error?
assume that a person knows for certain their true value is 7.5 out of ten, but the scale you provided them doesn't allow half numbers; already the measurement instrument is forcing the participant into measurement error. AKA we are already getting half a unit of measurement error.
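The half-unit example can be sketched numerically. This is a hypothetical illustration, not from the lecture; the scale and values are made up:

```python
# Hypothetical sketch: an integer-only 1-10 scale forces a respondent whose
# true value is 7.5 into at least half a unit of measurement error.
true_value = 7.5
response = round(true_value)   # the scale forces a whole number (here: 8)
error = response - true_value  # observed value = true value + error
print(error)                   # half a unit of measurement error
```

Whether the respondent picks 7 or 8, the observed score is off by 0.5 before any other source of error enters.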
give examples of sources of measurement error.
“Should I select 7 or 8?”
Response: 7
“Oops, flipped the scale options”
Response: 2 (but meant 8)
many people also flip the scale (thinking small numbers are better)
“Recently hanging out with gregarious friends; I’m comparatively less extroverted,”
Response: 6
“Does extroverted mean extraordinary?”
Response: 3
“I’m bored, choose middle”
Response: 5
What's the intelligence test example of measurement error?
Intelligence test (IQ)
True value: 100
Sources of measurement error:
Unusually stressful day (score will probably be lower than true value)
Oops, had too much coffee (score will probably be lower than true value)
Oops, had too little coffee (score will probably be lower than true value)
Had to guess on 4 items (any good IQ test requires that you guess on some questions, otherwise we would have a ceiling effect)
Guessed all 4 correct!! Score = 115
Guessed all 4 wrong!!! Score = 85
Guessed 2 correct, 2 wrong. Score = 100
Why is estimation fundamentally limited, and what can we do?
We cannot remove all measurement error
Thus, we never obtain a person’s true value
Best we can do is estimate a person’s true value
While minimizing measurement error
While trying to measure only the intended psychological construct and nothing else
(our intelligence test shouldn't correlate with cultural background, first language, gender, etc. If it correlates with things we don't think it should, this reflects measurement error.)
are measurement instruments created equal?
Measurement instruments are not created equal! We revise our instruments over time
Do you need reliability before you get validity?
yes
What is reliability?
What is Reliability: How consistent is a measurement tool?
Is my score similar each time?
Try the color test: www.colorquiz.com
If we can assume that the true value isn’t changing what should happen?
if we can assume that the true value isn't changing, your score should be similar over time.
What is validity?
Validity: Does the tool measure the psychological construct it claims to?
Is my score representative of something meaningful?
Color test: Does my color score reflect my personality?
What are the 2 measurement goals?
- be consistent (reliability)
- hit the target (validity)
What is the visual analogy of the true value
visual metaphor of a dart board; a lightning bolt is a measurement occasion.
tests can be reliable but still not hit the right spot.
if our darts are hitting all over, we know we're not at the true value most of the time. We need consistency; it needs to land in the same place multiple times.
What questions do we need to ask ourselves about measurement error?
this is derived from the theories we have about the characteristics and the true value. After we have an idea of what the true value should be, we need to make sure that the scale allows people to assess that true value.
What is the nature of the true value?
Does my measurement instrument allow people to express their true value?
If not, estimates will appear to fluctuate even though true value is stable
How does the Myers-Briggs Type Indicator (16 personality types test) encourage measurement error?
this introduces error by pushing people in the middle to either of 2 sides like introverted or extraverted.
Is the big five personality encouraging measurement error?
not so much. The Big Five derives theoretically from the idea that introversion and extraversion form a continuum, identified by a general trend. It predicts that most people are in between the two.
on the dart board analogy how would we know we are being reliable and valid?
Reliable: Hitting the same place
Valid: Hitting the target
on the dart board analogy how would we know we are being reliable and not valid?
Reliable: Hitting the same place
Invalid: Off-target
Estimates are biased
on the dart board analogy how would we know we are being not reliable and not valid?
Unreliable: Not hitting the same place
Low validity: No bias, but rarely on target
What is internal consistency?
- Internal consistency: Do the items in the measure correlate with each other?
Example: IQ test
if you fail the easier question you should fail the harder question
Question 3: What is the square root of pi?
this is probably measuring crystallized knowledge not intelligence. It probably won’t correlate with performance on pattern matching
Question 4: what is your favourite colour?
if this doesn't correlate with the pattern shown by the other measures, we would say it is measuring something different because it isn't in the same cluster.
what is Test-retest reliability
How consistent is the measure over time?
example: Raven's matrices
we would expect there to have been changes in the true value if this goes down
What is Interrater reliability
Do observers agree on ratings?
observers are like items on a scale. We want them to all agree on what they see.
the manual will bias the results: it increases the consistency of scores, so the process of choosing what the observers pay attention to is the validity process.
the measurement process often works backwards; we start out by assuming that children are attached to their caregivers in different ways
How is reliability often expressed?
Reliability is often expressed in terms of correlation: the Pearson r correlation coefficient (or reliability coefficient)
What are the ways that you can measure internal consistency?
Correlate estimates between items
Method 1: Split-half procedure
split the measurement instrument into 2 equal parts. You get the score from one half and compare it to the score on the other. Ex: take all even items, then calculate the score if you only did the odd items; the even and odd scores should be similar if they are measuring the same things.
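The odd/even version of the split-half procedure can be sketched as below. The 8-item scale and all responses are hypothetical:

```python
# Split-half reliability sketch: score the odd items and the even items
# separately for each participant, then correlate the two half-scores.
# All data below are hypothetical.

def pearson_r(x, y):
    """Pearson r between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# rows = participants, columns = answers to an 8-item scale (1-5 ratings)
responses = [
    [5, 4, 5, 5, 4, 5, 4, 5],
    [2, 3, 2, 2, 3, 2, 3, 2],
    [4, 4, 3, 4, 4, 3, 4, 4],
    [1, 2, 1, 1, 2, 1, 2, 1],
]

odd_scores  = [sum(p[0::2]) for p in responses]  # items 1, 3, 5, 7
even_scores = [sum(p[1::2]) for p in responses]  # items 2, 4, 6, 8
print(pearson_r(odd_scores, even_scores))        # high r -> halves agree
```

If the two halves measure the same thing, r should be close to 1; the made-up data here are deliberately consistent, so the correlation comes out well above .9.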
Method 2: Cronbach’s alpha procedure
𝛂 or 𝛚
get each of the pairs (all different combinations) of items on the measurement instrument and correlate performance on each of these pairs, then take the average of all of those correlations. The nice thing about this procedure is that each item is important in establishing the reliability of the scale. It also tells you which items are most problematic, because a problematic item will have low correlations with all of the other items.
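The pair-and-average idea can be sketched as below. The data are hypothetical, and the last step uses the standardized-alpha formula (combining the average inter-item correlation with the number of items k), which goes slightly beyond the averaging described above:

```python
# Sketch of (standardized) Cronbach's alpha: correlate every pair of items,
# average those correlations, then combine with the item count.
# Hypothetical data: rows = participants, columns = 8 items.
from itertools import combinations

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

responses = [
    [5, 4, 5, 5, 4, 5, 4, 5],
    [2, 3, 2, 2, 3, 2, 3, 2],
    [4, 4, 3, 4, 4, 3, 4, 4],
    [1, 2, 1, 1, 2, 1, 2, 1],
]

items = list(zip(*responses))         # one tuple of responses per item
pair_rs = [pearson_r(items[i], items[j])
           for i, j in combinations(range(len(items)), 2)]
mean_r = sum(pair_rs) / len(pair_rs)  # average inter-item correlation
k = len(items)
alpha = k * mean_r / (1 + (k - 1) * mean_r)  # standardized alpha
print(round(alpha, 2))

# An item that correlates poorly with all the others drags mean_r (and
# alpha) down -- which is how problematic items are spotted.
```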
What is the benchmark for internal consistency using Cronbach's alpha?
Benchmark internal consistency:
r = .80 is a good starting point
when it comes to internal consistency we want to see a positive r of at least .80.
How do you measure test-retest reliability?
Test-retest reliability
Correlate estimates between measurement occasions
Participants complete same measure at Time 1 and Time 2
(Repeated-measures design)
repeated measures design where time is the independent variable.
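The same correlation logic applies across measurement occasions. A minimal sketch with hypothetical scores:

```python
# Test-retest reliability sketch: the same (hypothetical) participants
# complete the same measure at Time 1 and Time 2; correlate the scores.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [100, 85, 115, 95, 105]  # e.g., IQ-style scores at Time 1
time2 = [98, 88, 112, 97, 104]   # same people, measured again later

r = pearson_r(time1, time2)
print(r > 0.80)  # meets the r = .80 benchmark for a stable construct
```

Small score shifts between occasions still yield a high r, as long as each person keeps roughly the same rank.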
What are moderators of test-retest reliability?
Moderators of test-retest reliability:
How much time between time 1 and time 2?
How stable is the construct?
IQ vs. attitudes
we think IQ is stable but attitudes towards things are more changeable.
What is the benchmark for test-retest reliability?
Benchmark test-retest reliability: r = .80 is a good starting point
Assuming little time between measurements
Assuming construct should be relatively stable
how do you measure interrater reliability?
Interrater reliability
Correlate estimates between observers
Observers rate same behavior
Kappa = 𝛋
kappa is the correlation-like coefficient used for interrater reliability.
Often lower than is desirable!
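Cohen's kappa corrects raw observer agreement for the agreement expected by chance. A sketch with hypothetical attachment-style ratings from two observers:

```python
# Cohen's kappa sketch: agreement between two observers, corrected for
# the agreement they would reach by chance. Ratings are hypothetical.
from collections import Counter

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    # proportion of cases where the two observers agree
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # chance agreement, from each observer's category base rates
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[cat] / n) * (c2[cat] / n)
                   for cat in set(rater1) | set(rater2))
    return (observed - expected) / (1 - expected)

r1 = ["secure", "secure", "avoidant", "anxious", "secure", "avoidant"]
r2 = ["secure", "secure", "avoidant", "secure", "secure", "anxious"]
kappa = cohens_kappa(r1, r2)
print(round(kappa, 2))  # ~0.43: lower than we'd like, as the notes warn
```

The observers here agree on 4 of 6 cases, yet kappa lands well below .80 once chance agreement is subtracted, illustrating why interrater reliability is often disappointing.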
Do we have benchmarks for interrater reliability?
no
How do you increase interrater reliability?
Increasing interrater reliability:
Generate concrete, easily observable guidelines
Initiate dialogue between observers
Practice with feedback
compare your ratings with the expected ratings on sample material.
how would you look at reliability with the Raven's matrices example?
Internal consistency
Split half reliability – odd & even items correlated, r = .93-.96
Cronbach’s Alpha – average item correlations, r = .88-.90
Test-retest reliability
2 days, r = .97
4 weeks, r = .87
79 years, r = .54
this is not concerning because we expect the correlation to weaken over time
Interrater reliability Not applicable
What is construct validity?
An evaluation of whether the measurement instrument quantifies the psychological construct
What are the 7 types of validity?
Face validity
Content validity
Predictive validity
Concurrent validity
Convergent validity
Discriminant validity
Reactivity
What is face validity?
Face validity: Does the measure appear to assess the psychological construct?
Do Raven’s matrices seem like intelligence?
when I look at the measurement, does it look like it is capturing the construct? everyone can have their own judgement
this is a good place to start but it is possible that the person is wrong about the validity
researchers may intentionally obscure the face validity to prevent people from knowing what is being measured.
What is content validity?
Content validity: Does the measure appear to assess ‘the whole construct and nothing but the construct’?
Are Raven’s comprehensive?
is it contaminated by other variables like language background?
what is Predictive validity?
Predictive validity: Does the measure correlate with future behaviors relevant to the construct?
- If ‘no,’ why measure the construct?
- Raven’s correlates with job performance, r = .3-.5
if you have someone take the Raven's test and ask their boss to rate their job performance, you will see a correlation. The correlation gets higher for jobs that require more intellect. This shows predictive validity.
What is concurrent validity?
Concurrent validity: Does the measure correlate with current behavior?
in the same test session or on the same day.
What is convergent validity?
Convergent validity: Does the measure correlate with other defensible operationalizations of the construct?
Raven's matrices correlate with Wechsler's test, r = .85
if you score highly on one you should score highly on the other if they are both measuring the same construct.
what is Discriminant validity?
Discriminant validity: Does the measure not correlate with unrelated constructs?
IQ scores do not correlate with favorite colors, r = .05
it shouldn’t correlate with unrelated things.
What is reactivity?
Reactivity: Does awareness of the construct change the psychological construct that is measured?
If so, the measure cannot be valid