Exam 2 - Chapter 5 (Measurement Issues) Flashcards
True Score vs. Measurement Error
True Score: Someone’s real value on a given variable
* (e.g., true intelligence, true reaction time, true happiness, etc.)
* a true score cannot be directly measured (measurement error will always impact the score)
Measurement Error: When measuring a “true” score, other factors cause the observed score to deviate from the real value.
Reliability
Reliability = Consistency
Reliability assessment helps us figure out if our measure is consistent/stable
A reliable measurement has little measurement error, so observed scores stay as close to the true score as possible across measurements.
Validity
Validity = Accuracy
Validity helps us figure out if we’re truly studying what we intended to study.
Differentiating: Reliability & Validity
Reliability vs. Validity
A good study should be BOTH Reliable & Valid
Consistency and Accuracy are important and work together.
Types of Reliability
- Internal Consistency Reliability – Reliability of items across people
  - Item-Total Correlations
  - Split-Half Reliability
  - Cronbach’s Alpha
- Reliability across Time – Reliability of scales over time (or versions)
  - Test-Retest and Alternate Forms
- Reliability across People – Reliability of ratings across raters
  - Inter-Rater Agreement
Internal Consistency Reliability: Item Total Correlation
Item Total Correlation: How well a specific item tracks responses to the rest of the scale.
- Useful for creating & refining questionnaires
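A minimal sketch (not from the cards) of how a corrected item-total correlation can be computed, assuming a small made-up response matrix with rows as people and columns as items:

```python
import numpy as np

# Made-up responses: 6 people x 4 items on a 1-5 scale (hypothetical data)
responses = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
])

# Corrected item-total correlation: correlate each item with the sum of the
# OTHER items so the item does not inflate its own correlation
for i in range(responses.shape[1]):
    item = responses[:, i]
    rest = np.delete(responses, i, axis=1).sum(axis=1)
    r = np.corrcoef(item, rest)[0, 1]
    print(f"Item {i + 1}: corrected item-total r = {r:.2f}")
```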
Internal Consistency Reliability: Split-Half Correlation
Split-Half Correlation: Split the scale’s items into two halves, and find the correlation between scores on the two halves.
Issue: Room for malpractice because the “halves” can be defined differently.
- Someone could keep picking different “halves” until they get a correlation coefficient they like
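A sketch of one common split-half convention (odd items vs. even items), using the same made-up response matrix; the Spearman-Brown step is the standard correction for each half being only half as long as the full scale:

```python
import numpy as np

# Same hypothetical 6-person x 4-item response matrix as above
responses = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
])

# One common convention: sum the odd-numbered items and the even-numbered items
half_a = responses[:, 0::2].sum(axis=1)   # items 1 and 3
half_b = responses[:, 1::2].sum(axis=1)   # items 2 and 4
r_halves = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: estimates reliability of the full-length scale
# from the correlation between its two half-length scales
split_half_reliability = 2 * r_halves / (1 + r_halves)
print(f"half-correlation = {r_halves:.2f}, corrected = {split_half_reliability:.2f}")
```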
Internal Consistency Reliability
How much the individual items in a scale/survey relate to each other, i.e., how consistently they measure the same concept or trait.
The more they overlap/are related, the greater the Internal Consistency
Internal Consistency Reliability: Cronbach’s Alpha
Cronbach’s Alpha: A statistical solution that estimates reliability across every possible split (i.e., the average of every possible combination of “halves”)
- alpha > 0.8 is generally considered good
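A minimal sketch of the alpha computation, assuming a hypothetical people-by-items matrix; alpha = k/(k-1) × (1 − Σ item variances / variance of total scores):

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a people x items matrix (illustrative sketch)."""
    k = responses.shape[1]                          # number of items
    item_vars = responses.var(axis=0, ddof=1)       # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 6 people x 4 items
responses = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")   # > 0.8 is usually read as good
```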
Reliability Across Time: Test-Retest Reliability
Test-Retest: How well a person’s scores agree with themselves across multiple time points.
- Give the same test at two points in time.
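A sketch of a test-retest check with hypothetical scores for the same people at two time points; the reliability estimate is just their correlation:

```python
import numpy as np

# Hypothetical total scores for the same 6 people at two time points
time1 = np.array([18, 7, 13, 18, 6, 15])
time2 = np.array([17, 8, 12, 19, 7, 14])

# Test-retest reliability is simply the correlation between the two administrations
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_test_retest:.2f}")
```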
Reliability Across Time: Alternate Forms Reliability
Alternate Forms Reliability: Correlation between two different versions (forms) of the same test.
*Give two different forms of the same test at two points in time. (Used when the administrations are closer together, since people do not see the exact same items twice)
Reliability Across People: Inter-Rater Reliability
Inter-Rater Reliability: Degree of agreement or consistency between multiple people (raters) assessing or scoring the same thing.
- It shows how much raters produce similar results when evaluating the same subject.
- Cohen’s Kappa: Commonly used to calculate this consistency; it corrects for the agreement that would be expected by chance
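A sketch of Cohen’s kappa for two raters; the ratings are made up, and the formula compares observed agreement to the agreement expected by chance:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical judgments (illustrative sketch)."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(rater1, rater2)
    p_observed = np.mean(rater1 == rater2)   # raw proportion of agreement
    # Chance agreement: product of each rater's marginal proportions, per category
    p_chance = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical pass (1) / fail (0) judgments of 8 essays by two raters
rater_a = [1, 1, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 1, 1]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```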
Types of Validity
- Construct validity
What are we measuring?
  - Face validity
  - Content validity
  - Concurrent validity
How do our measures relate to other measures?
  - Convergent validity
  - Divergent (i.e., discriminant) validity
  - Predictive validity
Construct Validity
How well an operational definition of a variable accurately reflects the variable being measured or manipulated.
- Other types of Validity fall under Construct Validity
Face Validity
How well a measurement device appears to accurately measure a variable. (not that useful)
- Face Validity is not sufficient to conclude that a measure is valid.
Content Validity
How well a test or survey relates to/covers all parts of the concept it aims to measure.
(pretty useful)
Convergent Validity
Checks if a measure is strongly related to other measures of the same construct.
- Our measure should be related to other measures that assess a similar construct
- Asks: Does my new scale relate strongly to established scales that measure the same thing?
Concurrent Validity
Tells us if the measurement used successfully differentiates people who are theoretically supposed to be different
- Test this by giving the measure to a group that should theoretically differ from the target group and checking that its scores differ as expected (e.g., score much lower)
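A sketch of this kind of known-groups check, using hypothetical depression-scale scores for a clinical group and a control group (all numbers are made up):

```python
import numpy as np

# Hypothetical depression-scale scores for two groups that should differ
clinical_group = np.array([24, 30, 27, 22, 29, 26])
control_group = np.array([9, 12, 8, 14, 10, 11])

# The measure shows concurrent (known-groups) validity if the groups that
# are theoretically supposed to differ actually score differently
print(f"clinical mean = {clinical_group.mean():.1f}, control mean = {control_group.mean():.1f}")
```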
Divergent/Discriminant Validity
Tests whether a measure is not strongly related to different or unrelated constructs
- This confirms that the measure accurately distinguishes between distinct concepts.
- Our measure should not be related to other measures that assess different constructs.
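A sketch of what convergent and divergent validity checks can look like in practice, using simulated scores for a hypothetical new anxiety scale, an established anxiety scale, and an unrelated variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores for 50 people: a hypothetical new anxiety scale, an
# established anxiety scale, and an unrelated variable (all made-up numbers)
established_anxiety = rng.normal(50, 10, size=50)
new_anxiety = established_anxiety + rng.normal(0, 5, size=50)   # same construct + noise
shoe_size = rng.normal(9, 1.5, size=50)                         # unrelated construct

r_convergent = np.corrcoef(new_anxiety, established_anxiety)[0, 1]
r_divergent = np.corrcoef(new_anxiety, shoe_size)[0, 1]
print(f"convergent r = {r_convergent:.2f} (should be high)")
print(f"divergent r = {r_divergent:.2f} (should be near zero)")
```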
Predictive Validity
How well our measure predicts future scores on another test or outcome related to the concept it is assessing.
- e.g., a measure of depression should predict loneliness at a future time