Week 6: Reliability & Validity of Measurement Flashcards

1
Q

Distinguish between state characteristics and trait characteristics.

A

Some phenomena don't change, or do so only gradually, e.g. height, gender (trait characteristics).

Some phenomena change within an individual over a relatively short time span (state characteristics).

Why is this important? Because the measurement tools for state and trait characteristics differ. E.g. with depression: if you want to know how someone is feeling at that moment, use a state measure; if you are more interested in the person's stable level over time, use a trait measure.

2
Q

What is the measurement problem?

A

The fact that we can't directly measure some of the constructs that affect us.

• We can measure height, temperature and other physical phenomena directly.
– These are directly observable variables.
• We might like to measure happiness, sadness, depression, …
– but these are not directly observable (LATENT variables).

3
Q

What is a latent variable?

A

A construct we can't measure directly, e.g. happiness.

4
Q

What are the five steps involved in SCALE DEVELOPMENT if you're measuring a latent variable?

A

Technically, at the beginning you're stuck with an intuitive concept; then:

1) Have the construct DEFINED and agreed upon before we try to measure it! Vague, intuitive definitions are a recipe for trouble.
2) OPERATIONAL definition.
3) Measurement scales.
4) Validity of the measure.
5) Reliability of the measure.

5
Q

Does having reliability mean having validity?

A

NO, they're not linked: a measure can be perfectly reliable (consistent) without being valid (actually measuring the intended construct).

6
Q

On a target/dart throwing board:

Darts are all in the centre, all together on target.

Comment on reliability and validity.

A

Good validity

Good reliability

7
Q

On a target/dart throwing board:

Darts are all on the outer ring of board, bunched together closely.

Comment on reliability and validity.

A
Good reliability (consistency),
poor validity (they are consistent, but not on target).
8
Q

On a target/dart throwing board:

Darts are spread out evenly around the outermost ring.

Comment on reliability and validity.

A
Poor reliability,
good validity (on average, the throws centre on the target).

Tricky one; if you don't understand it, just memorise it.

9
Q

What is face validity?

A

Does the measure (instrument) appear to measure what it claims to measure?

• What is it?
– Does the instrument appear to measure the desired construct?
• How is it assessed?
– Often assessed qualitatively.
– In some cases an expert panel and/or patient/client input may be sought.
• Things to consider:
– Has little scientific rigour.
– Might be thought of as a necessary but not sufficient condition.

e.g. if testing anxiety levels, you don't want to be asking about people's favourite sex positions.

10
Q

What does initial development of measures involve?

A

Initial development of measures may involve expert opinion and/or individuals from the relevant population.

11
Q

What is concurrent validity?

A

• What is it?
– The new measure should correlate (imperfectly) with an established measure of the same or a related construct.
– The construct underlying the established measure should be theoretically related to the construct being measured.
• How is it assessed?
– By correlating the new and established measures.

NOTE: if you decide to make a new depression scale, you would hope for a positive correlation between yours and the established one. So you want a positive correlation, but not something like .97: if the two measures are that highly correlated, there is no point in yours, as it offers nothing new.
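
A minimal sketch of this check in Python (the scores and variable names here are invented purely for illustration; pearsonr is from SciPy):

  # Hypothetical illustration: correlate a new measure with an established one.
  import numpy as np
  from scipy.stats import pearsonr

  rng = np.random.default_rng(0)
  established = rng.normal(50, 10, size=20)              # placeholder scores, 20 people
  new_measure = established + rng.normal(0, 5, size=20)  # related, but not identical

  r, p = pearsonr(new_measure, established)
  print(f"r = {r:.2f}, p = {p:.3f}")  # want a clear positive r, but not ~.97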

12
Q

What is convergent validity?

A

• What is it?
– The new measure should correlate (imperfectly) with another measure of the same construct.
– In convergent validity we use a second 'new' measure, as opposed to the established measure used in concurrent validity.
• How is it assessed?
– Correlation between the measure being validated and another measure of the same construct.

13
Q

Is there possibly some confusion here between convergent and concurrent validity?
YES, a lot of confusion. But what is our distinction?

A

WE make a distinction:
Concurrent: there is an existing, accepted scale and you correlate yours with that.
Convergent: there is no accepted scale, so you simply correlate all of the new attempts with each other.

14
Q

What is construct validity?

A

• What is it?
– Demonstrate that the measure being validated behaves as the construct ought to behave under varying conditions.
• How is it assessed?
– Through a triangulation of correlations, such as the excerpt from the D-HS article illustrates.
– Confirmatory factor analysis (not covered here).

15
Q

What was the methodological flaw by the authors in the screenshot example?

GO BACK TO THIS?

A

A methodological flaw here is that the authors developed the measure and validated it in the same set of data. This can lead to a self-fulfilling finding of validity.

16
Q

Discriminant validity?

A

• What is it?
– Groups that ought to differ with respect to the construct are found to do so. Related to predictive validity.

• How is it assessed?
– Describe differences in mean scores by group, and use an unpaired t-test to assess the statistical significance of the difference (a minimal sketch follows below).
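
A minimal sketch of that assessment in Python, using entirely made-up group scores and SciPy's unpaired t-test:

  # Hypothetical illustration: do groups that ought to differ actually differ?
  import numpy as np
  from scipy.stats import ttest_ind

  rng = np.random.default_rng(1)
  clinical = rng.normal(30, 5, size=25)  # placeholder scores for a clinical group
  control = rng.normal(22, 5, size=25)   # placeholder scores for a control group

  print("mean difference:", clinical.mean() - control.mean())
  t, p = ttest_ind(clinical, control)    # unpaired (independent-samples) t-test
  print(f"t = {t:.2f}, p = {p:.3f}")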

17
Q

Remind me of the reliability equation again?

A

Measured score = True score + Error

• The concept of reliability inherently assumes the underlying construct is stable.
• A reliable measure will yield the same score across measurement occasions and observers/raters.
• Reliability is only interesting if the measure has been shown to be valid.
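
A minimal simulation of this equation with made-up numbers (in classical test theory, reliability is the share of observed-score variance that is true-score variance):

  # Hypothetical illustration of: measured score = true score + error.
  import numpy as np

  rng = np.random.default_rng(2)
  true_score = rng.normal(100, 15, size=1000)  # stable underlying construct
  error = rng.normal(0, 5, size=1000)          # random measurement error
  measured = true_score + error

  # Less error pushes this ratio towards 1 (a perfectly reliable measure).
  print(f"reliability = {true_score.var() / measured.var():.2f}")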
18
Q

Sources of unreliability:

-Observer error

A

Observer error
– Meaning: technical errors or misjudgements by the individual rating or scoring participants (i.e. a mistake occurred in scoring or entering data).
– Assessment: difficult. Inter-rater reliability covers part of this problem.
19
Q

Sources of unreliability:

-Environmental changes

A

Environmental changes
– Meaning: changes in the environment may influence performance by the participant.
– Assessment: ideally, test under consistent circumstances.

e.g. if you're measuring someone's happiness, you should do so at the same point in their circadian rhythm, i.e. the same time of day.

20
Q

Sources of unreliability:

-participant changes

A

Participant changes
– Meaning: changes within the participant that alter the score.
– Assessment: as per environmental changes.

21
Q

Sources of unreliability:

-Change in the construct

A

• Change in the construct
– Meaning: the construct being measured changes between testing occasions (e.g. pain, joyfulness).
– Assessment: measurements need to be made near enough in time to minimise the risk of change in the underlying construct, e.g. someone who is no longer depressed by the second occasion.

22
Q

What is test-retest reliability?

How is it assessed?

A

• What is it?
– Do we obtain the same scores from individuals across 2 points in time?

• How is it assessed?
– Correlation between scores on the same measure within a sample of individuals at 2 points in time.
– Assess the magnitude and significance of the within-person change across the 2 points in time (a minimal sketch follows below).
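
A minimal sketch of both assessments in Python, assuming hypothetical time-1 and time-2 scores from the same people:

  # Hypothetical illustration: test-retest reliability across 2 occasions.
  import numpy as np
  from scipy.stats import pearsonr, ttest_rel

  rng = np.random.default_rng(3)
  time1 = rng.normal(50, 10, size=30)        # placeholder scores at time 1
  time2 = time1 + rng.normal(0, 4, size=30)  # same people, plus some error

  r, _ = pearsonr(time1, time2)              # consistency across occasions
  t, p = ttest_rel(time1, time2)             # paired test of within-person change
  print(f"test-retest r = {r:.2f}; change: t = {t:.2f}, p = {p:.3f}")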

23
Q

parallel forms of reliability?

A

Similar to test-retest, but alternate versions of the instrument are used instead of the same version at two time points.

e.g. in parallel forms: you have test A and test B, which can be administered at the same time to a group.

vs

e.g. in test-retest: you have only test A, which you have to administer once in the morning and then again later in the day (different times).

24
Q

Internal consistency?

A

• What is it?
– Assesses the extent to which the items that make up a measure are all measuring a consistent construct.

• How is it assessed?
– Split-half reliability: correlate scores calculated by randomly splitting the items into 2 halves (e.g. quasi-randomly assign 50% of the items to each half).
– Cronbach's α. NOTE: Cronbach's α tends to go up (or stay flat for a bit, then go up) as the number of items increases.

α is the average of all possible split-half correlations. SPSS computes this for us. The rule of thumb is about .8 for a measure to be considered reliable. But you should look at the number of items as well as the value: an α of .8 from a scale of, say, 10 items is far more impressive than the same α from a 150-item scale.

25
Q

What is cronbach’s alpha?

A
• Split-half methods will yield different correlations depending on how the items are split.
• α can be thought of as the average of all possible split-half correlations.
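
A minimal computation sketch with placeholder data, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score):

  # Hypothetical illustration: Cronbach's alpha for a 10-item scale.
  import numpy as np

  def cronbach_alpha(items):
      # items: 2-D array, rows = participants, columns = items
      items = np.asarray(items)
      k = items.shape[1]
      item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
      total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
      return (k / (k - 1)) * (1 - item_vars / total_var)

  rng = np.random.default_rng(4)
  shared = rng.normal(0, 1, size=(100, 1))             # common underlying construct
  items = shared + rng.normal(0, 0.8, size=(100, 10))  # 10 noisy items, 100 people
  print(f"alpha = {cronbach_alpha(items):.2f}")        # rule of thumb: about .8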
26
Q

If my scale's Cronbach's α is .85+, is that acceptable or not?

A

Yes, acceptable (it clears the ~.8 rule of thumb).

27
Q

Conclusion?

A

• Validity = true measure of the theoretical construct.
• Reliability = consistent measure of the theoretical construct.
• Both validity and reliability are multidimensional, and it is not always possible to assess all aspects in a single study.
• We cannot use a measurement instrument with confidence until its validity and reliability have been established.