Week 3 Flashcards
1
Q
Reliability
A
- Synonym for dependability and consistency
- Refers to consistency of measurement in psychometrics
- Not necessarily reflecting good or bad results, just consistent ones
- Test may be reliable in one context but not another
2
Q
Reliability Coefficient
A
- Quantifies reliability, ranging from 0 (not reliable) to 1 (perfectly reliable)
3
Q
Measurement Error
A
- In everyday language, "error" means some kind of preventable mistake
- In science the meaning is broader: it refers to measurement imperfection, which is inevitable
e.g. a reading of 25 could really be 24.9978 or 25.1232 - small fluctuations can be rounded off, but they are rarely trivial
- Noticeable differences are routinely observed - think of building a steel bridge in a hot climate
4
Q
True Scores
A
- Can never be observed directly
- Useful fiction that allows us to understand reliability
- At best, True Scores are estimated by averaging many measurements
5
Q
Repeated Measurements
A
- Repeated measurement can have problems
- The time between tests has an effect
- Some states are in constant flux, so they may average differently at different times
6
Q
Carryover Effects
A
- An effect of being tested in one condition on participants’ behavior in later conditions
- The practice effect: participants perform a task better in later conditions because they have had a chance to practise it
- The fatigue effect: repeated measurement causes results to diminish because participants tire
7
Q
Construct Score
A
- A theoretical variable we believe exists, such as depression, agreeableness or reading ability
- Testing for these is imperfect, so an observed score can never be a True Score
- Long-term averages can still come close to the True Score, flaws and all
8
Q
True Scores
A
- We can never observe True Scores directly
- The concept helps us understand reliability
- High reliability does not mean high validity
9
Q
Concept of Reliability
A
- True Score is the long-term average of many measurements
- Assumes no Carryover Effects
- T = True Score
- X = Observed Score (the measurement)
- E = Measurement Error
- If the Observed Score is mostly determined by Measurement Error, the test is unreliable
- Better if the Observed Score is mostly determined by the True Score
X = T + E
10
Q
Variance (σ²)
A
- Useful for describing test score variability
- The Standard Deviation squared
- Can be broken into components
- True Scores are stable and give consistency to tests
- Reliability refers to the proportion of total variance attributable to T
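The decomposition X = T + E can be sketched in plain Python with invented numbers: simulate True Scores and random Errors, then check that reliability is the share of observed-score variance coming from T (a study sketch, not part of the lecture material).

```python
import random
import statistics

random.seed(0)

# Hypothetical True Scores for 1,000 test-takers (mean 50, SD 10).
true_scores = [random.gauss(50, 10) for _ in range(1000)]

# Observed Score = True Score + Error (X = T + E), error SD 5.
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_t = statistics.pvariance(true_scores)
var_x = statistics.pvariance(observed)

# Reliability = proportion of observed variance due to True Scores.
reliability = var_t / var_x
print(round(reliability, 2))  # close to 100 / (100 + 25) = 0.80
```

With a smaller error SD the ratio moves toward 1 (perfectly reliable); with a larger one it moves toward 0.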
11
Q
Measurement Error
A
Chapter 5 - Page 297
12
Q
Measuring Psychological Constructs
A
- Constructs can’t be observed
- Can be inferred from what we observe
- Observe behaviour
- Observe responses to self report scales
13
Q
Characteristics of a Typical Scale/Sub-Scale
A
- Statements or questions designed to measure a construct or behaviour
- Fixed-choice responses, consistent across the scale
- Responses are correlated because they assess the same thing
- Responses are averaged to find an overall score
- Some items are reverse coded
- Strong psychometric properties - reliability, validity & factor structure
- Normative or standardised data collected from a wide range of people
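Reverse coding and averaging can be sketched in Python (hypothetical items and responses; the function name is illustrative only):

```python
# Hypothetical 5-item scale on a 1-5 Likert response format.
# Items at zero-based indices 1 and 3 are reverse coded
# (high agreement = low standing on the construct).
responses = [4, 2, 5, 1, 4]   # one respondent's raw answers
reverse_coded = {1, 3}        # indices of the reversed items

def score_scale(raw, reversed_idx, scale_max=5, scale_min=1):
    """Reverse-code the flagged items, then average to an overall score."""
    recoded = [
        (scale_max + scale_min - r) if i in reversed_idx else r
        for i, r in enumerate(raw)
    ]
    return sum(recoded) / len(recoded)

print(score_scale(responses, reverse_coded))  # (4 + 4 + 5 + 5 + 4) / 5 = 4.4
```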
14
Q
Commercial Scales
A
- Pay per use
- Published by commercial publishers
- Commonly used for clinical or applied purposes like recruitment or diagnosis
- Sometimes used in research
- Expensive
- Detailed normative data
- MMPI, NEO-PI, Beck Depression Inventory
15
Q
Non-Commercial Scales
A
- Free to use
- Published in books, journal articles or online
- Often used for research purposes
- Unlikely to be published with Normative Data
- MINI-IPIP, Person Environment Fit Scale
- Typically used by research students
16
Q
Uni Dimensional Scales
A
- Measures only one construct
- All items are intercorrelated
- All items averaged to derive overall score
e.g. Relationship Satisfaction Scale, Beck Depression Inventory
17
Q
Multi Dimensional Scales
A
- Measures multiple constructs
- Each construct has a sub-scale and is a separate variable
- Items within each sub-scale are intercorrelated
- Items on each sub-scale are averaged to derive that sub-scale's score
- Adding up scores across multiple constructs is meaningless
- Sub-scales do not necessarily correlate with each other
- Separate reliability and validity data calculated for each sub-scale.
e.g.: MINI-IPIP (Big 5); Person Environment Fit (three fit dimensions).
18
Q
Define Reliability
A
- How consistent are the tools we use to measure a construct
- Does it produce the same results over time
- Unreliable measures cannot be trusted
19
Q
Four types of Reliability
A
- Internal Consistency
- Test-Retest
- Alternate Forms
- Inter Rater
20
Q
Internal Consistency
A
- Consistency amongst items.
- Responses to all the items on a sub-scale should be similar/consistent
- If all items on the sub-scale measure the same thing, then people should respond similarly to them
21
Q
Test-Retest
A
- Consistency over time
- Scale scores at time 1 should be very similar to their scores at time 2.
22
Q
Alternate-Forms
A
- Consistency over equivalent versions
- Scale scores on version 1 should be similar to their scores on version 2.
24
Q
Inter-Rater
A
- Consistency over Observers/Raters
- Multiple observers/researchers should provide similar accounts of the same event or behaviour
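For categorical codings, inter-rater consistency is often quantified with Cohen's kappa, which corrects raw agreement for chance (kappa is not named in the cards above; this is a standard statistic, sketched with made-up ratings):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(rater1)
    # Raw proportion of trials on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical: two observers classify the same 10 behaviours.
r1 = ["on-task", "off-task", "on-task", "on-task", "off-task",
      "on-task", "on-task", "off-task", "on-task", "on-task"]
r2 = ["on-task", "off-task", "on-task", "off-task", "off-task",
      "on-task", "on-task", "off-task", "on-task", "on-task"]

print(round(cohen_kappa(r1, r2), 2))  # ≈ 0.78 (raw agreement 0.90, chance-corrected)
```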
25
Internal Consistency - Split-Half Method
* A measure is split in half, and averages for the first half are correlated with averages for the second half
* Often multiple ways to split a scale
* Each different split will have a different reliability
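One possible split (odd vs even items) can be sketched in plain Python with invented data. The final Spearman-Brown step, which projects the half-length correlation up to the full test length, is a standard addition not stated in the cards above:

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two sets of paired scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical responses: rows = respondents, columns = 6 items.
data = [
    [4, 3, 4, 5, 4, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 2, 1, 2],
]

# One possible split: odd-numbered items vs even-numbered items.
half1 = [sum(row[0::2]) / 3 for row in data]
half2 = [sum(row[1::2]) / 3 for row in data]

r_halves = pearson(half1, half2)
# Spearman-Brown correction: estimates full-length reliability
# from the correlation between the two half-length scores.
split_half = (2 * r_halves) / (1 + r_halves)
print(round(split_half, 2))
```

A different split (e.g. first half vs second half) would give a different coefficient, which is exactly the problem Cronbach's Alpha addresses.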
26
Internal Consistency - Cronbach's Alpha
* Solves the problem of multiple split-half results
* Reports the average of all possible split-half reliabilities
* Increasing the number of related items on a scale increases its internal consistency
* Only remove items if doing so substantially improves Cronbach's Alpha
* Estimates of 0.7 or over are generally considered good
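Alpha can be computed directly from the item variances and the variance of the total scores; a self-contained sketch with invented data (this is the standard alpha formula, not taken from the cards above):

```python
import statistics

def cronbach_alpha(data):
    """Cronbach's alpha; data: rows = respondents, columns = items."""
    k = len(data[0])
    # Variance of each item across respondents.
    item_vars = [statistics.pvariance(col) for col in zip(*data)]
    # Variance of respondents' total scores.
    totals = [sum(row) for row in data]
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses to a 4-item sub-scale (1-5 Likert).
data = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(data), 2))  # ≈ 0.97
```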
27
Acceptable Internal Consistency
* Estimates of 0.7 or over are good
* Higher reliability estimates needed for diagnostic tools
28
Effect of Unreliable Measures - Attenuation
* If a measure is unreliable, its correlations with other variables are attenuated.
* That means the observed correlations are weakened
* e.g. even if the true correlation between two variables is 0.6, unreliable measures will yield a weaker observed correlation
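The attenuation relationship is commonly written r_observed = r_true * sqrt(rel_x * rel_y); a sketch with invented numbers (the formula is standard psychometrics, not stated in the cards above):

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Observed correlation after attenuation by unreliable measures
    (the classical correction-for-attenuation formula, rearranged)."""
    return r_true * math.sqrt(rel_x * rel_y)

# Hypothetical: true correlation 0.6, both measures have reliability 0.7.
print(round(attenuated_r(0.6, 0.7, 0.7), 2))  # 0.6 * 0.7 = 0.42
```

With perfectly reliable measures (reliability 1.0) the observed correlation equals the true one; as reliability drops, the observed correlation shrinks toward zero.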
29
Why is reliability important?
* Unreliable measures cannot be trusted.
* Unreliable measures make relationships between constructs difficult to detect.
* The larger the error, the less reliable the measure
* Observed Score = True Score + Error
30
Test-Retest Reliability
* If a construct is stable, we should arrive at the same result over multiple tests
* Assumes life circumstances haven't changed dramatically between tests
* Both sets of data need to come from the same respondents
* As time between tests increases the reliability decreases
* The test-retest coefficient is not interpretable without knowing the interval between tests
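The test-retest coefficient is simply the correlation between the two testing occasions; a minimal sketch in Python with made-up scores:

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two sets of paired scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scale scores from the same six respondents, tested twice.
time1 = [3.2, 4.1, 2.5, 4.8, 3.6, 2.9]
time2 = [3.4, 4.0, 2.7, 4.6, 3.3, 3.1]

r_retest = pearson(time1, time2)
print(round(r_retest, 2))
```

Note that both lists must come from the same respondents in the same order, matching the requirement above.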
31
Acceptable Test-Retest Reliability
Test-Retest coefficients of 0.7 or over are generally considered fine.
Need to be interpreted with consideration to:
* The length of the test-retest interval.
* The stability of the characteristic being measured (e.g., a trait vs a state).
* The internal consistency of the measure.
* The impact of practice effects (use alternate forms?)
32
Test-Retest Practice Effects
* Ability tests are troublesome because people know the answers after the first test
* Alternate forms of the test are a good idea but difficult to create
33
Validity
* Does a test measure what it is intended to measure?
* Measure can be reliable but not valid
* Not all validity evidence involves statistics
* Measures are valid for specific purposes
* Validity is not an inherent property of a measure
* Evidence for validity builds up over time
* Different authors discuss validity in different but overlapping ways
* This can be confusing
34
Four Types of Validity
1. Face
2. Content
3. Criterion
4. Construct
35
Face Validity
* How much does the measure look like it measures what it says it is measuring?
e.g. appear to measure extraversion/sociability
36
Content Validity
* Samples the full range/breadth of a factor
* Is it covering everything we think it should be covering
* Items cover all aspects of extraversion
e.g., talkative AND adventurous AND active etc
37
Criterion Validity
* Related to physiological or behavioural manifestations of the factor
* Measured in the present and in the future
* Concurrent and Predictive measures
e.g. Measure predicts current base rate cortical arousal levels; as well as future sociable behaviour at parties, work, school etc
38
Construct Validity
* Related to other convergent measures of the same factor
* Not related to discriminant measures
* Does it behave consistently with theoretical predictions
e.g. Sales people have higher scores on this measure than accountants.
39
Face Validity - more
* How much a test looks like it is measuring what it says it is measuring.
* Crude, but can influence motivation to take the test seriously etc.
40
Content Validity - More
* How much a measuring instrument covers a sample of the behaviours to be measured
* e.g. Extraversion Captures:
Active = No
Assertive = No
Energetic = No
Outgoing = yes
Talkative = yes
Gesturally expressive = No
Gregarious= yes
41
Criterion Validity - more
* How much scores on a measure predict a behavioural or physiological criterion
* Is it related to things we expect it to be related to?
42
Two types of Criterion Validity
1. Concurrent
2. Predictive
43
Concurrent Validity
* A type of Criterion Validity
* Correlation between scores of Extraversion measure and base rate physiological arousal
* Both measures taken at the same time.
* Introverts have higher baseline arousal than extraverts
44
Predictive Validity
* A type of Criterion Validity
* Good predictive validity if it can predict your preferences
e.g. Extraversion measure can predict if you wish to study alone or with others
45
Construct Validity
* How much a measure actually measures the construct it claims to measure
* Conceptualised from a theoretical perspective
e.g. How well does IQ Test measure Intelligence?
How well does choice between toys gun/doll reflect aggression?
* The more abstract the construct, the harder it is to establish construct validity
46
Two Main Types of Construct Validity
1. Convergent
2. Discriminant
47
Convergent Validity
* A type of Construct Validity
* Strong relationship between the test and another similar test
* e.g. Allen Extraversion Questionnaire & Extraversion subscale from the NEO-PI
48
Discriminant Validity
* A type of Construct Validity
* No or weak relationship between the current test and another measure of something different
e.g. Allen Extraversion Questionnaire & Beck Depression Inventory
* Can also be examined using factor analysis and in testing theoretical predictions