3: Foundations of Quantitative Measurement Flashcards
In the social sciences, quantitative and qualitative approaches are typically associated with what?
Deep philosophical differences in epistemology.
What is the classical test theory measurement model? What concepts does it underpin?
Observed score = true score + error.
Underpins concepts of reliability and validity.
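A minimal sketch of the model in symbols, assuming (as classical test theory does) that errors are random and uncorrelated with true scores, so reliability is the share of observed-score variance due to true scores:

```latex
X = T + E, \qquad \text{reliability} = \rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```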
Levels of specificity in variables are predicated on constructs, operational definitions, and measures. Define each.
Construct: a psychological concept that is not directly observable. Has no tangible existence outside a person’s mind.
Operational definition: a clear, measurable definition of a construct based on theory. May capture only a portion of the entire construct.
Measures: a clearly defined set of procedures for obtaining scores on the construct of interest. Must be clear and precise enough to be replicated by other scientists.
What is construct validity?
How well you have translated the construct into a functioning, operating reality, i.e., whether you are measuring what you wish to measure.
In what way is it useful to think of construct validity?
As an umbrella term that encompasses all other forms of validity.
What is content validity? Provide an example. What can improve it?
Does the measure cover all aspects of the underlying construct?
E.g., a depression measure missing some of the DSM-5 diagnostic components may lack content validity.
Multiple measurements of the construct can improve content validity.
What is face validity? Provide an example.
The extent to which a measure ‘appears’ to measure the underlying construct.
E.g., the item “nervous” has face validity for measuring anxiety; the item “jealous” does not.
When studying older adults, some items for personality disorders may have poor _____ due to developmental changes.
Face validity.
What is criterion validity?
How well measure correlates with established “gold standard” measures of the same construct.
List the two subtypes of criterion validity. Provide examples.
Concurrent validity: “at the same time” (e.g., correlate a questionnaire with a clinical interview).
Predictive validity: “predicting the future” (e.g., does your anorexia measure predict future weight loss?).
Two concepts are specific to criterion validity for clinical psychology and medical diagnosis. What are they?
Sensitivity: how well it picks out people with the disorder (i.e., few false negatives).
Specificity: how well it avoids diagnosing healthy people with a disorder (i.e., few false positives).
How does one calculate sensitivity and specificity using signal detection theory?
Sensitivity = hits / (hits + misses)
Specificity = correct rejections / (correct rejections + false alarms)
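A minimal sketch in Python; the four counts are made-up numbers for illustration:

```python
# Sensitivity and specificity from signal-detection counts (illustrative data).
hits = 40                 # disorder present, test positive
misses = 10               # disorder present, test negative (false negatives)
false_alarms = 5          # disorder absent, test positive (false positives)
correct_rejections = 45   # disorder absent, test negative

sensitivity = hits / (hits + misses)                                     # 0.80
specificity = correct_rejections / (correct_rejections + false_alarms)  # 0.90
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```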
When using an ROC curve to find the cut-off score that best balances sensitivity and specificity, what indicates a good cut-off?
Larger values on the y-axis indicate better sensitivity (% of hits). The x-axis plots the false alarm rate (1 - specificity), so smaller values on the x-axis indicate better specificity (% of correct rejections).
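One common rule for choosing the cut-off (not named above, so treat it as an assumption) is Youden’s J, which picks the threshold maximizing sensitivity + specificity - 1. A sketch assuming scikit-learn, with invented scores and diagnoses:

```python
# Pick an ROC cut-off via Youden's J (tpr - fpr); the data are invented.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])   # 1 = disorder present
scores = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 10])  # questionnaire scores

fpr, tpr, thresholds = roc_curve(y_true, scores)  # fpr = 1 - specificity
best = thresholds[np.argmax(tpr - fpr)]           # threshold maximizing Youden's J
print(f"Cut-off with best sensitivity/specificity balance: {best}")
```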
As sensitivity increases, what happens to specificity?
Decreases.
Define convergent and discriminant validity.
Convergent validity: the measure correlates with other measures it should be related to.
Discriminant validity: the measure does not correlate with measures it should be unrelated to.
Unreliability is the amount of _____ in the measurement.
Error.
What is test-retest reliability? When is it generally more useful?
Is the measure consistent over time? Do scores stay more or less the same when measured repeatedly?
Useful for constructs that are theorized to be stable (e.g., personality traits) rather than transient states (e.g., fear).
Test-retest reliability is often assessed with what? What do higher values indicate?
Correlation coefficient. Higher values indicate higher test-retest reliability.
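A minimal sketch assuming SciPy; the two score vectors are invented:

```python
# Test-retest reliability as a Pearson correlation across two occasions.
from scipy.stats import pearsonr

time1 = [12, 15, 9, 20, 14, 18, 11, 16]   # scores at first administration
time2 = [13, 14, 10, 19, 15, 17, 12, 18]  # same people, retested later

r, p = pearsonr(time1, time2)
print(f"Test-retest r = {r:.2f}")  # higher r = more stable scores over time
```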
Describe the concept of internal consistency.
Used to assess a questionnaire with multiple items. Do all the items in the questionnaire more or less measure the same thing?
The statistic most commonly used to measure internal consistency is Cronbach’s alpha (α). Conceptually, it’s calculated as a function of what two things? How can you increase it?
The number of items and the average inter-item correlation.
Increase the number of items, or remove items that are very weakly correlated with the other items.
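A sketch of the usual computation, alpha from item variances and total-score variance, using a made-up respondents-by-items matrix:

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / total variance).
import numpy as np

items = np.array([   # rows = respondents, columns = questionnaire items (invented)
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [5, 4, 5, 5],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```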
Describe the concept of inter-rater reliability. When is it used?
Two or more trained coders independently review the data and provide ratings. Ideally, ratings from all coders are similar.
Used when scores are derived from a trained coder looking at raw data.
What are the two statistics used to determine inter-rater reliability for nominal scales?
% agreement.
Cohen’s Kappa: more conservative, controls for agreement which occurs by chance alone.
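A sketch assuming scikit-learn for kappa; the two coders’ labels are invented:

```python
# Percent agreement and Cohen's kappa for two coders' nominal ratings.
from sklearn.metrics import cohen_kappa_score

coder1 = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
coder2 = ["yes", "no", "yes", "no", "no", "no", "yes", "yes"]

agreement = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
kappa = cohen_kappa_score(coder1, coder2)  # corrects for chance agreement
print(f"% agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```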
What are the two statistics used to determine inter-rater reliability for ordinal, interval, or ratio scales?
Pearson correlation.
Intraclass correlation: more conservative, more complex calculations for different situations.
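A minimal Pearson sketch with NumPy and made-up ratings from two hypothetical coders (an intraclass correlation would need a dedicated routine):

```python
# Inter-rater reliability for continuous ratings via Pearson r (invented data).
import numpy as np

rater_a = np.array([3.0, 4.5, 2.0, 5.0, 3.5, 4.0])
rater_b = np.array([3.5, 4.0, 2.5, 5.0, 3.0, 4.5])

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Pearson r between raters: {r:.2f}")
# Unlike Pearson r, absolute-agreement forms of the intraclass correlation also
# penalize systematic differences between the raters' means.
```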
What are the rules of thumb when evaluating reliability and validity stats that determine whether stats are:
- good
- acceptable
- marginal
- poor
Good: 0.80 (reliability); 0.50 (validity)
Acceptable: 0.70 (reliability); 0.30 (validity)
Marginal: 0.60 (reliability); 0.20 (validity)
Poor: 0.50 (reliability); 0.10 (validity)
What are the four types of basic hypotheses?
Descriptive: what is X like?
Descriptive/comparison: does group 1 differ from group 2?
Correlation: do X and Y covary?
Psychometric: is a measurement reliable and valid?
What are the three most common hypotheses in published psychology research?
Mediation: X leads to M (mediator), which in turn leads to Y.
Moderation: relationship between X and Y varies depending on the value of the moderator, M (see the sketch after this list).
Incremental validity: X1 predicts Y over and above another known predictor (X2).
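A sketch of the moderation test as an X-by-M interaction in OLS regression, assuming statsmodels and pandas, with simulated data (all variable names are hypothetical):

```python
# Moderation as an x-by-m interaction term in OLS (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
m = rng.normal(size=n)
y = 0.5 * x + 0.3 * m + 0.4 * x * m + rng.normal(size=n)  # interaction built in
df = pd.DataFrame({"x": x, "m": m, "y": y})

# "x * m" expands to x + m + x:m; a significant x:m coefficient = moderation.
fit = smf.ols("y ~ x * m", data=df).fit()
print(fit.summary())
# Incremental validity would instead compare "y ~ x2" against "y ~ x2 + x1".
```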