Week 3 Flashcards

1
Q

Reliability

A
  • Synonym for dependability and consistency
  • Refers to consistency of measurement in psychometrics
  • Not necessarily reflecting good or bad results, just consistent ones
  • Test may be reliable in one context but not another
2
Q

Reliability Coefficient

A
  • Quantifies reliability, ranging from 0 (not reliable) to 1 (perfectly reliable)
3
Q

Measurement Error

A
  • In everyday language, "error" means some kind of preventable mistake
  • In science the meaning is broader: it refers to measurement imperfection, which is inevitable
    e.g. a reading of 25 could really be 24.9978 or 25.1232
  • Small fluctuations can be rounded away, but they are almost never trivial
  • Noticeable differences are routinely observed (think of building a steel bridge in a hot climate)
4
Q

True Scores

A
  • Can never be observed directly
  • Useful fiction that allows us to understand reliability
  • At best, true scores are estimated by averaging many measurements
5
Q

Repeated Measurements

A
  • Repeated measurement can have problems
  • Time between testing has an effect
  • Some states are in constant flux, so their averages may differ at different times
6
Q

Carryover Effects

A
  • An effect of being tested in one condition on participants’ behavior in later conditions
  • The practice effect: participants perform a task better in later conditions because they have had a chance to practice it
  • The fatigue effect: repeated measurement causes performance to diminish because participants become tired
7
Q

Construct Score

A
  • A theoretical variable we believe exists, such as depression, agreeableness or reading ability
  • Testing for these is flawed, so a test score can never be the True Score
  • Long-term averages can still come close to the True Score, flaws and all
8
Q

True Scores

A
  • We can never observe True Scores directly
  • The concept helps us understand reliability
  • High reliability does not mean high validity
9
Q

Concept of Reliability

A
  • True Score is the long-term average of many measurements
  • Assumes no carryover effects
  • T = True Score
  • X = Observed Score (the measurement itself)
  • E = Measurement Error
  • If the observed score is mostly determined by measurement error, the test is unreliable
  • Better if the observed score is mostly determined by the True Score

X = T + E
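
A minimal sketch of X = T + E, assuming normally distributed error and no carryover effects. TRUE_SCORE, ERROR_SD and observe() are hypothetical names chosen for illustration; in practice T is never known.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 25.0  # T: assumed here, never observable in practice
ERROR_SD = 2.0     # assumed spread of the measurement error E


def observe(true_score, error_sd):
    """Return one observed score X = T + E, with E drawn from a normal distribution."""
    return true_score + random.gauss(0.0, error_sd)


# A handful of measurements can land well away from T ...
few = [observe(TRUE_SCORE, ERROR_SD) for _ in range(5)]
# ... while the average of many measurements settles close to T.
many = [observe(TRUE_SCORE, ERROR_SD) for _ in range(10_000)]

print("mean of 5 measurements:     ", round(mean(few), 3))
print("mean of 10,000 measurements:", round(mean(many), 3))
```

The long-run average approaching TRUE_SCORE is what the card means by treating the True Score as the long-term average of many measurements.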

10
Q

Variance (σ²)

A
  • Useful to describe test score variability
  • Standard Deviation Squared
  • Can be broken into components
  • True Scores are stable and give consistency to tests

Reliability refers to the proportion of total variance attributed to True Scores (T)
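
A rough simulation of that proportion, assuming true scores and errors are independent and normally distributed. The means and standard deviations (100, 15 and 5) are made-up values, not figures from the course.

```python
import random
from statistics import pvariance

random.seed(1)  # fixed seed so the sketch is repeatable

n_people = 5000
true_scores = [random.gauss(100, 15) for _ in range(n_people)]  # T, variance near 225
errors = [random.gauss(0, 5) for _ in range(n_people)]          # E, variance near 25
observed = [t + e for t, e in zip(true_scores, errors)]         # X = T + E

var_t = pvariance(true_scores)
var_e = pvariance(errors)
var_x = pvariance(observed)

# Total variance breaks into a true-score component and an error component
# (approximately, because T and E are only uncorrelated on average).
print("var(T) + var(E):", round(var_t + var_e, 1))
print("var(X):         ", round(var_x, 1))

# Reliability = share of total variance that comes from true scores.
reliability = var_t / var_x
print("reliability:    ", round(reliability, 2))
```

With these assumed values the proportion should sit near 225 / 250 = 0.9; a larger error spread would pull it towards 0.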

11
Q

Measurement Error

A

Chapter 5 - Page 297

12
Q

Measuring Psychological Constructs

A
  • Constructs can’t be observed
  • Can be inferred from what we observe
  • Observe behaviour
  • Observe responses to self report scales
13
Q

Characteristics of a Typical Scale/Sub-Scale

A
  • Statements or questions designed to measure a construct behaviour
  • Fixed choice responses - consistent across scale
  • Responses are correlated because they assess the same thing
  • Responses averaged to find an overall score
  • Some items are reverse coded
  • Strong psychometric properties - reliability, validity & factor structure
  • Normative or Standardised data collected from a wide range of people
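
A small sketch of how reverse coding and averaging might be done, assuming a 1 to 5 fixed-choice response format. score_scale() and the example answers are hypothetical, not taken from any published scale.

```python
def score_scale(responses, reverse_items, low=1, high=5):
    """Average one respondent's answers after flipping the reverse-coded items.

    responses     -- list of answers, one per item, each between low and high
    reverse_items -- set of item positions (0-based) that are reverse coded
    """
    scored = []
    for position, answer in enumerate(responses):
        if not low <= answer <= high:
            raise ValueError(f"answer {answer} is outside {low}-{high}")
        if position in reverse_items:
            # On a 1-5 scale: 1 becomes 5, 2 becomes 4, 3 stays 3, and so on.
            answer = low + high - answer
        scored.append(answer)
    return sum(scored) / len(scored)


# Four items; the third one (position 2) is worded in the opposite direction.
answers = [4, 5, 2, 4]
print(score_scale(answers, reverse_items={2}))  # 4.25
```

Flipping the reverse-coded item first keeps all items pointing the same way, so the average is meaningful.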
14
Q

Commercial Scales

A
  • Pay per use
  • Published by commercial publishers
  • Commonly used for clinical or applied purposes like recruitment or diagnosis
  • Sometimes used in research
  • Expensive
  • Detailed normative data
  • e.g. MMPI, NEO-PI, Beck Depression Inventory
15
Q

Non-Commercial Scales

A
  • Free to use
  • Published in books, journal articles or online
  • Often used for research purposes
  • Unlikely to be published with Normative Data
  • MINI-IPIP, Person Environment Fit Scale
  • Typically used by research students
16
Q

Unidimensional Scales

A
  • Measures only one construct
  • All items are intercorrelated
  • All items averaged to derive overall score
    e.g. Relationship Satisfaction Scale, Beck Depression Inventory
17
Q

Multidimensional Scales

A
  • Measures multiple constructs
  • Each construct has a sub-scale and is a variable
  • Items within each sub-scale are intercorrelated
  • Items within each sub-scale are averaged to derive that sub-scale's score
  • Adding up scores for multiple constructs is meaningless
  • Different sub-scales are not expected to correlate with each other
  • Separate reliability and validity data calculated for each sub-scale.
    e.g.: MINI-IPIP (Big 5); Person Environment Fit (three fit dimensions).
18
Q

Define Reliability

A
  • How consistent are the tools we use to measure a construct
  • Does it produce the same results over time
  • Unreliable measures cannot be trusted
19
Q

Four types of Reliability

A
  1. Internal Consistency
  2. Test-Retest
  3. Alternate-Forms
  4. Inter-Rater
20
Q

Internal Consistency

A
  • Consistency amongst items.
  • Responses to all the items on a sub-scale should be similar/consistent
  • If all items on the sub-scale measure the same thing, people should respond similarly to them
21
Q

Test-Retest

A
  • Consistency over time
  • Scale scores at time 1 should be very similar to their scores at time 2.
22
Q

Alternate-Forms

A
  • Consistency over equivalent versions
  • Scale scores on version 1 should be similar to their scores on version 2.
24
Q

Inter-Rater

A
  • Consistency over Observers/Raters
  • Multiple observers/researchers should provide similar accounts of the same event or behaviour
25
Q

Internal Consistency - Split Half Method

A
  • A measure is split in half and scores on the first half are correlated with scores on the second half
  • Often multiple ways to split a scale
  • Each different split will have a different reliability
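
A sketch of the split-half idea on simulated data, assuming six items that each reflect one underlying trait plus random noise. The two splits shown (first half vs second half, odd vs even items) are just two of the possible splits; pearson() and the other names are hypothetical helpers.

```python
import random
from math import sqrt
from statistics import mean

random.seed(2)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_respondent(n_items):
    """Each item response = the person's trait level + item-specific noise."""
    trait = random.gauss(0, 1)
    return [trait + random.gauss(0, 1) for _ in range(n_items)]


def split_half(data, half_a, half_b):
    """Correlate the average of the items in half_a with the average in half_b."""
    a = [mean([row[i] for i in half_a]) for row in data]
    b = [mean([row[i] for i in half_b]) for row in data]
    return pearson(a, b)


data = [simulate_respondent(6) for _ in range(500)]

first_second = split_half(data, [0, 1, 2], [3, 4, 5])
odd_even = split_half(data, [0, 2, 4], [1, 3, 5])

print("first half vs second half:", round(first_second, 3))
print("odd items vs even items:  ", round(odd_even, 3))
```

The two estimates come out similar but not identical, which is the problem with relying on any single split.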
26
Q

Internal Consistency - Cronbach’s Alpha

A
  • Solves the problem of multiple split-half results
  • Reports the average of all possible split-half reliabilities
  • Increasing the number of related items on a scale increases its internal consistency
  • Only remove items if doing so substantially improves Cronbach’s Alpha
  • 0.7 or above is generally considered good
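
A sketch of Cronbach's alpha using the usual computational formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), rather than literally averaging every split. The simulated data (items = trait + noise) and the helper names are assumptions for illustration.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item responses per respondent, all the same length."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def simulate(n_people, n_items, rng):
    """Each response = the person's trait level + item-specific noise."""
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        rows.append([trait + rng.gauss(0, 1) for _ in range(n_items)])
    return rows


rng = random.Random(3)  # fixed seed so the sketch is repeatable

alpha_short = cronbach_alpha(simulate(1000, 4, rng))   # 4 related items
alpha_long = cronbach_alpha(simulate(1000, 12, rng))   # 12 related items

print("alpha with 4 items: ", round(alpha_short, 2))
print("alpha with 12 items:", round(alpha_long, 2))
```

The longer scale should show the higher alpha, matching the point that adding related items increases internal consistency.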
27
Q

Acceptable Internal Consistency

A
  • Estimates of 0.7 or over are good
  • Higher reliability estimates needed for diagnostic tools
28
Q

Effect of Unreliable Measures - Attenuation

A
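  • If a measure is unreliable, its correlations with other variables are attenuated.
  • That is, the observed correlations are weaker than the true correlations
  • e.g. if the true correlation between two variables is 0.6, the observed correlation will be lower when the measures are unreliable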
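
A sketch of attenuation using the standard classical test theory formula, observed r = true r × sqrt(reliability of X × reliability of Y). The formula is not stated on the card, and the reliabilities of 0.7 and 0.8 are made-up values; only the true correlation of 0.6 comes from the card.

```python
from math import sqrt


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the true correlation and both reliabilities."""
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must be above 0 and at most 1")
    return true_r * sqrt(reliability_x * reliability_y)


true_r = 0.6

# Perfectly reliable measures: nothing is lost.
print(attenuated_correlation(true_r, 1.0, 1.0))

# Less reliable measures: the observed correlation shrinks.
print(round(attenuated_correlation(true_r, 0.7, 0.8), 3))
```

The lower the reliabilities, the further the observed correlation falls below the true value, which is why unreliable measures make real relationships harder to detect.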
29
Q

Why is reliability important?

A
  • Unreliable measures cannot be trusted.
  • Unreliable measures make relationships between constructs difficult to detect.
  • The larger the error, the less reliable the measure
  • Observed Score = True Score + Error
30
Q

Test-Retest Reliability

A
  • If the construct is stable, we should arrive at the same result over multiple tests
  • Assumes life circumstances haven’t changed dramatically between tests
  • Both sets of data need to come from the same respondents
  • As time between tests increases, the reliability tends to decrease
  • A test-retest coefficient is not interpretable without knowing the interval between tests
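
A sketch of computing a test-retest coefficient, assuming it is taken as the Pearson correlation between time 1 and time 2 scores. The respondent IDs and scores are invented; only respondents present at both times are used.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical scale scores keyed by respondent ID.
time1 = {"p1": 10, "p2": 14, "p3": 18, "p4": 22, "p5": 25}
time2 = {"p1": 11, "p2": 13, "p3": 19, "p4": 21, "p5": 26, "p6": 15}

# Keep only people who completed both sessions ("p6" is dropped).
shared = sorted(set(time1) & set(time2))
scores_t1 = [time1[pid] for pid in shared]
scores_t2 = [time2[pid] for pid in shared]

retest_r = pearson(scores_t1, scores_t2)
interval_weeks = 4  # assumed; the coefficient should be reported with its interval

print(f"test-retest r = {retest_r:.2f} over {interval_weeks} weeks (n = {len(shared)})")
```

Reporting the interval alongside the coefficient reflects the point that the number means little without it.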
31
Q

Acceptable Test-Retest Reliability

A

  • Test-Retest coefficients of 0.7 or over are generally considered fine.
  • Need to be interpreted with consideration to:
    - The length of the test-retest interval.
    - The stability of the characteristic being measured (e.g., a trait vs a state).
    - The internal consistency of the measure.
    - The impact of practice effects (use alternate forms?)

32
Q

Test-Retest Practice Effects

A
  • Ability tests are troublesome because people may remember the answers from the first test
  • Alternate forms of the test are a good idea but difficult to create
33
Q

Validity

A
  • Does a test measure what it is intended to measure?
  • A measure can be reliable but not valid
  • Not all types of validity involve statistics
  • Measures are valid for specific purposes
  • Validity is not an inherent characteristic of a measure
  • Evidence for validity builds up over time
  • Different authors discuss validity in different but overlapping ways
  • This can be confusing
34
Q

Four Types of Validity

A
  1. Face
  2. Content
  3. Criterion
  4. Construct
35
Q

Face Validity

A
  • How much does the measure look like it really measures what it says it’s measuring?
    e.g. appear to measure extraversion/sociability
36
Q

Content Validity

A
  • Samples the full range/breadth of a factor
  • Is it covering everything we think it should be covering
  • Items cover all aspects of extraversion
    e.g., talkative AND adventurous AND active etc
37
Q

Criterion Validity

A
  • Related to physiological or behavioural manifestations of the factor
  • Measured in the present (concurrent) and in the future (predictive)
  • Covers both Concurrent and Predictive validity
    e.g. Measure predicts current base rate cortical arousal levels; as well as future sociable behaviour at parties, work, school etc
38
Q

Construct Validity

A
  • Related to other convergent measures of the same factor
  • Not related to discriminant measures
  • Does it behave consistently with theoretical predictions
    e.g. Sales people have higher scores on this measure than accountants.
39
Q

Face Validity - more

A
  • How much a test looks like it is measuring what it says it is measuring.
  • Crude, but can influence motivation to take the test seriously etc.
40
Q

Content Validity - More

A
  • How much a measuring instrument covers a sample of the behaviours to be measured
  • e.g. Extraversion captures:
    Active = No
    Assertive = No
    Energetic = No
    Outgoing = Yes
    Talkative = Yes
    Gesturally expressive = No
    Gregarious = Yes
41
Q

Criterion Validity - more

A
  • How much scores on a measure predict a behavioural or physiological criterion
  • Is it related to things we expect it to be related to?
42
Q

Two types of Criterion Validity


A
  1. Concurrent
  2. Predictive
43
Q

Concurrent Validity

A
  • A type of Criterion Validity
  • Correlation between scores of Extraversion measure and base rate physiological arousal
  • Both measures taken at the same time.
  • Introverts have higher baseline arousal than extraverts
44
Q

Predictive Validity

A
  • A type of Criterion Validity
  • Good predictive validity if it can predict your preferences
    e.g. Extraversion measure can predict if you wish to study alone or with others
45
Q

Construct Validity

A
  • How much a measure actually measures the construct it claims to measure
  • Judged from a theoretical perspective
    e.g. How well does an IQ test measure intelligence?
    How well does a choice between a toy gun and a doll reflect aggression?
  • The more abstract the construct, the harder it is to establish construct validity
46
Q

Two Main Types of Construct Validity

A
  1. Convergent
  2. Discriminant
47
Q

Convergent Validity

A
  • A type of Construct Validity
  • Strong relationship between the test and another similar test
  • e.g. Allen Extraversion Questionnaire & Extraversion subscale from NEO-PI
48
Q

Discriminant Validity

A
  • A type of Construct Validity
  • No or weak relationship between the current test and another measure of something different
    e.g. Allen Extraversion Questionnaire & Beck Depression Inventory
  • Can also be examined using factor analysis and in testing theoretical predictions