Chapter 5: Identifying Good Measurement (Flashcards)
What are three ways psychologists measure variables?
- self-report
- observational
- physiological
What is operationalization?
- the process of turning a concept of interest into a measured or manipulated variable
Two ways a variable can be expressed?
- conceptual: the definition of a variable at an abstract level
- operational: represents a researcher's specific decision about how to measure or manipulate the conceptual variable
Operationalization of a conceptual variable? Steps?
- researchers start by developing careful definitions of their constructs (conceptual variables) and then create operational definitions.
It is important to remember that any conceptual variable can be ____ in a wide variety of ways. This is where ____ comes into the research process.
- operationalized
- creativity
What is a self-report measure?
- operationalizes a variable by recording people's answers to verbal questions about themselves in a questionnaire or interview.
What is an observational measure?
- operationalizes a variable by recording observable behaviours or physical traces of behaviour
What is a physiological measure?
- operationalizes a variable by recording biological data such as brain activity, hormone levels, or heart rate. Usually this requires equipment to amplify, record, and analyze biological activity.
ex: measuring moment-to-moment happiness via facial EMG.
Many people erroneously believe which of the three measures is the most accurate?
- physiological measures
L> no matter the measure, it must HAVE good construct validity
All variables must have at least two levels, but the levels of operational variables may be what?
- coded using different scales of measurement
What do we first classify operational variables as? (2)
- categorical variables
- quantitative variables
What is a categorical variable?
- levels are categories (also called nominal variables)
- any numbers assigned to the levels do not have numerical meaning!!
ex: sex (levels are male and female)
What is a quantitative variable?
- coded with meaningful numbers. Height and weight are examples
What are the three kinds of quantitative variables?
- Ordinal Scale
- Interval Scale
- Ratio Scale
What is an ordinal scale?
- applies when the numerals of a quantitative variable represent rank order.
L> we know they are different but not HOW different they are.
What is an interval scale?
- applies to the numerals of a quantitative variable that meet two conditions: first, the numerals represent equal intervals between levels; second, there is no true zero!
- we cannot say something is twice as hot as something else, since there is no true zero!
What is a ratio scale?
- applies when the numerals of a quantitative variable have equal intervals and when the value of zero truly means "none" of the variable being measured (a true zero).
ex: weight or income!
Construct Validity?
- whether the operationalized variable is measuring what it is supposed to measure
Reliability?
- how consistent is the measure?
The construct validity of a measure has what two aspects?
- reliability
- validity
What are the three types of reliability?
- Test-retest
- Interrater
- Internal
Test-retest reliability?
- most relevant to what measures?
- the researcher gets consistent results every time the measure is used
- can be relevant for all three types of measurement
- mostly relevant, though, when measuring constructs we suspect should be stable over time (not fluctuating states like momentary subjective well-being)
Interrater reliability?
- most relevant to what measures?
- two or more independent observers come up with the same (or similar) findings
- most relevant for observational measures
Internal reliability?
- often researchers test this how?
- interpretation?
- what kind of claim is this?
- a study participant gives a consistent pattern of answers no matter how the researcher phrases the question
- researchers collect data from samples and evaluate the results
- they use statistical devices such as scatterplots and the correlation coefficient r
- reliability evidence is a version of an association claim*
Scatterplot significance?
- you can see whether two ratings agree (points near the line of best fit) or disagree (points scattered away from the line of best fit)
Correlation coefficient r?
- indicates how close the dots on a scatterplot are to the line of best fit
- range = -1 to +1
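To make r concrete, here is a minimal Python sketch (the ratings are invented purely for illustration):

```python
import numpy as np

# Hypothetical scores for the same six participants on two measures
rating_a = np.array([3.0, 5.0, 2.0, 4.0, 4.5, 1.0])
rating_b = np.array([2.5, 4.5, 2.0, 4.0, 5.0, 1.5])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is Pearson's r, which always falls between -1 and +1
r = np.corrcoef(rating_a, rating_b)[0, 1]
print(f"r = {r:.2f}")  # near +1 here: the dots hug the line of best fit
```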
Describe the relationships seen on a scatterplot.
- strong = points close to the line
- weak = points spread out
- the plot shows both the direction and the strength of the relationship
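A quick matplotlib sketch of what strong vs. weak patterns look like (random, purely illustrative data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=50)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, noise, label in [(axes[0], 0.2, "strong"), (axes[1], 1.5, "weak")]:
    y = x + rng.normal(0, noise, size=50)  # more noise = weaker relationship
    ax.scatter(x, y)
    ax.set_title(f"{label}: r = {np.corrcoef(x, y)[0, 1]:.2f}")
plt.show()
```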
Relationship between scatterplot and r?
- when the plot's slope is positive, r is positive
- when the slope is negative, r is negative
Within r's range, what counts as a strong relationship? A weak one?
- strong = close to either +1 or -1
- weak = close to zero
- if there is no relationship, r will be zero or very close to it
Test-Retest Reliability and r?
- r is + and strong
- r is + and weak?
- measure the same participants at least twice.
- r is + and strong (0.5 or above) = good test-retest reliability
- r is + but weak = participants' scores have changed between sessions: poor measurement reliability
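A hedged sketch of the test-retest check described above, using the 0.5 rule of thumb from this card (the scores are made up):

```python
from scipy.stats import pearsonr

# Hypothetical IQ-style scores for 8 participants, tested a month apart
time1 = [98, 110, 105, 120, 93, 101, 115, 108]
time2 = [100, 108, 107, 118, 95, 99, 117, 105]

r, _ = pearsonr(time1, time2)
# Positive and >= 0.5 suggests good test-retest reliability;
# positive but weak suggests participants' scores have shifted
print(f"test-retest r = {r:.2f}:", "good" if r >= 0.5 else "poor")
```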
Interrater Reliability and r?
- r is + and strong?
- r is + and weak
- neg r?
- two observers rate the same participants at the same time.
- r is + and strong (0.7 or higher) = good interrater reliability
- if r is + but weak, we cannot trust the observers' ratings; retrain the coders or refine the operational definition
- a negative r would indicate a serious problem
L> when assessing reliability, a negative r is unwanted and rare
Interrater reliability and kappa?
- kappa?
- kappa (κ) measures the extent to which two raters place participants in the same categories, correcting their agreement for chance; used when the ratings are categorical
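A minimal sketch of kappa using scikit-learn's cohen_kappa_score (the category labels and ratings are invented):

```python
from sklearn.metrics import cohen_kappa_score

# Two observers independently categorize the same 8 play sessions
rater1 = ["solitary", "parallel", "group", "group",
          "solitary", "parallel", "group", "solitary"]
rater2 = ["solitary", "parallel", "group", "parallel",
          "solitary", "parallel", "group", "solitary"]

# Kappa is agreement corrected for chance:
# 1.0 = perfect agreement, 0.0 = no better than chance
kappa = cohen_kappa_score(rater1, rater2)
print(f"kappa = {kappa:.2f}")
```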
Internal reliability?
- mainly for?
- good reliability = ? (r)
L> if it is good we can?
- mainly for self-report measures that contain more than one question (item) to measure the same construct
L> are responses consistent even when the questions are phrased differently?
- good reliability = the items correlate strongly with one another; if so, we can take the average of all items and create a single score for each person.
Internal Reliability:
- Cronbach’s alpha?
L> what happens before doing this?
L> what does CA tell us?
- first collect data, then compute all possible correlations among the items.
- Cronbach's alpha gives us one value based on the average of the inter-item correlations and the number of items in the scale; the closer to 1, the better the scale's reliability (0.7 or higher for self-report questionnaires).
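A sketch of Cronbach's alpha computed via the standard variance-based formula (equivalent in spirit to the inter-item-correlation description above; the item scores are invented):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a participants x items matrix of scores."""
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of participants' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# 5 participants answering 4 items meant to tap the same construct
scores = np.array([[4, 5, 4, 5],
                   [2, 2, 3, 2],
                   [3, 3, 3, 4],
                   [5, 4, 5, 5],
                   [1, 2, 1, 2]])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # 0.7+ is the usual bar for self-report
```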
Internal Reliability
- Cronbach’s alpha
L> bad reliability?
L> good reliability?
- bad reliability: do not combine all items into one scale; revise the items or average only the items that correlate strongly together.
- good reliability: average all items together.
For internal reliability why do we average items?
- averaging cancels out the random errors in the individual items (see the simulation below)
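A tiny simulation of that idea, assuming each item score is the participant's true score plus its own independent random error (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
true_score, n_items = 5.0, 10

# 10,000 simulated participants; each item = true score + random error
items = true_score + rng.normal(0, 1.0, size=(10_000, n_items))

print("spread of one item:", round(items[:, 0].std(), 2))           # ~1.0
print("spread of the average:", round(items.mean(axis=1).std(), 2))  # ~0.32: error shrinks
```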
Something can be reliable and not ___.
- valid
* a measure cannot be valid and not reliable, though
Construct validity is easier/harder to establish for measures of abstract constructs.
- harder than it is for concrete constructs
Which kinds of measurement validity are examined first? (2)
- Face validity
- Content validity
* both depend on experts' judgments
Face validity?
- the extent to which a measure looks like a plausible measure of the variable in question; if it looks as if it should be a good measure, it has face validity.
- checked by consulting experts
Content validity?
- a measure must capture all parts of a defined construct
* experts are consulted
After face validity and content validity are examined, what validities are tested next? (2)
- Predictive
- Concurrent
- both evaluate whether the measure under consideration is related to a concrete outcome that, according to the theory being tested, it should be related to.
Predictive validity?
- testing whether the measure correlates with the outcome in the future.
Concurrent validity?
- testing whether the measure correlates with an outcome measured at the same time.
We use what two things to assess the validity of the measurement in question?
scatterplots and r
What two types of validity provide evidence for criterion validity?
- predictive and concurrent
No matter the operationalization, if it is a good measure of its construct it should ___ with a behaviour or outcome that is related to the construct.
correlate
Instead of going by correlation coefficients, what else can we use as evidence for predictive and concurrent validity?
- known-groups paradigm
L> researchers see whether scores on the measure can discriminate among a set of groups whose behaviour is already well understood.
ex: testing cortisol levels as a measure of stress in a group about to give a public speech and in the audience. We know public speaking causes stress, but what about being in an audience?
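A hedged sketch of how that known-groups comparison might be analyzed, using an independent-samples t-test (the cortisol readings and group sizes are invented):

```python
from scipy.stats import ttest_ind

# Hypothetical cortisol readings (made-up units) for the two known groups
speakers = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]
audience = [10.1, 11.0, 9.8, 10.5, 11.2, 10.3]

# If cortisol is a valid stress measure, the group known to be stressed
# (the speakers) should score reliably higher than the audience
t, p = ttest_ind(speakers, audience)
print(f"t = {t:.2f}, p = {p:.4f}")
```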
Known-groups paradigm can be used for what types of measurements?
- self-report and physiological
ex: self-report
L> Beck tested the BDI (Beck Depression Inventory) on people who are depressed and people who are not
Besides face, content, predictive, and concurrent validity, what other two criteria for validity are there?
- convergent and discriminant.
Convergent Validity?
the measure should correlate more strongly with other measures of the same construct
Discriminant validity? (also called divergent validity)
the measure should correlate less strongly with measures of other, distinct constructs.
When do researchers worry about discriminant validity?
- when they are worried their measure might accidentally be capturing a similar but distinct construct.
A measure may be less valid than it is reliable, but it cannot be what?
- cannot be more valid than it is reliable
L> reliability has to do with how well a measure correlates with itself
L> validity has to do with how well a measure is associated with some other similar but not identical measure.
Reliability is ____ (but not sufficient) for validity.
necessary