Week 3 Flashcards

1
Q

Reliability

A
  • Synonym for dependability and consistency
  • Refers to consistency of measurement in psychometrics
  • Not necessarily reflecting good or bad results, just consistent ones
  • Test may be reliable in one context but not another
2
Q

Reliability Coefficient

A
  • Quantifies reliability, ranging from 0 (not reliable) to 1 (perfectly reliable)
3
Q

Measurement Error

A
  • In everyday language, error means some kind of preventable mistake
  • In science the meaning is broader, referring to measurement imperfection, which is inevitable
    e.g. 25 could be 24.9978 or 25.1232
  • Small fluctuations can be rounded, but they are almost never trivial
  • Noticeable differences are routinely observed, e.g. think of building a steel bridge in a hot climate, where the steel's length changes with temperature
4
Q

True Scores

A
  • Can never be observed directly
  • Useful fiction that allows us to understand reliability
  • At best true scores are guessed by averaging many measurements
5
Q

Repeated Measurements

A
  • Repeated measurement can have problems
  • Time between testing has an effect
  • Some states are in constant flux, so they might average differently at different times
6
Q

Carryover Effects

A
  • An effect of being tested in one condition on participants’ behavior in later conditions
  • The practice effect: participants perform a task better in later conditions because they have had a chance to practise it
  • The fatigue effect: repeated measurement causes performance to decline because participants become tired
7
Q

Construct Score

A
  • A theoretical variable we believe exists, such as depression, agreeableness or reading ability
  • Tests of these constructs are flawed, so a test's True Score can never be exactly the construct score
  • Long-term averages can still get close to the True Score, flaws and all
8
Q

True Scores

A
  • We can never observe True Scores directly
  • The concept helps us understand reliability
  • High reliability does not mean high validity
9
Q

Concept of Reliability

A
  • The True Score is the long-term average of many measurements
  • Assumes no carryover effects between measurements
  • T = True Score
  • X = the measurement, called the Observed Score
  • E = Measurement Error
  • If the observed score is mostly determined by measurement error, the test is unreliable
  • It is better if the observed score is mostly determined by the True Score

X = T + E
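A minimal Python sketch of X = T + E, assuming errors are random, normally distributed with a mean of zero, and independent across measurements (i.e. no carryover effects). The function names `observe` and `estimate_true_score` and all numbers are hypothetical, chosen only for illustration. It shows why the True Score is described as a long-term average: as more observed scores are averaged, the errors tend to cancel out and the average moves closer to T.

```python
import random

def observe(true_score: float, error_sd: float, rng: random.Random) -> float:
    """One observed score X = T + E, with E assumed normal with mean 0."""
    error = rng.gauss(0.0, error_sd)
    return true_score + error

def estimate_true_score(true_score: float, error_sd: float, n: int, seed: int = 1) -> float:
    """Average n independent observations (assumes no carryover effects)."""
    rng = random.Random(seed)
    observations = [observe(true_score, error_sd, rng) for _ in range(n)]
    return sum(observations) / n

# Hypothetical True Score of 25 measured with error SD of 2:
# the average of more measurements tends to land closer to 25.
T = 25.0
for n in (1, 10, 1000):
    print(n, round(estimate_true_score(T, error_sd=2.0, n=n), 3))
```

In real testing T is unknown; the simulation only works because we chose T ourselves.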

10
Q

Variance (σ²)

A
  • Useful to describe test score variability
  • Standard Deviation Squared
  • Can be broken into components
  • True Scores are stable and give consistency to tests

Reliability refers to the proportion of total variance attributed to true-score (T) variance
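In symbols, if true scores and errors are uncorrelated, total variance splits as σ²(X) = σ²(T) + σ²(E), and reliability = σ²(T) / σ²(X), which is why the reliability coefficient runs from 0 to 1. The sketch below is a hypothetical simulation (the population mean of 50, the SDs, the sample size and the function names are all assumptions for illustration) that checks this ratio against the formula.

```python
import random
import statistics

def theoretical_reliability(true_sd: float, error_sd: float) -> float:
    """Reliability = true-score variance / (true-score variance + error variance)."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)

def simulated_reliability(true_sd: float, error_sd: float,
                          n_people: int = 20_000, seed: int = 3) -> float:
    """Simulate X = T + E for many people and return var(T) / var(X)."""
    rng = random.Random(seed)
    # Hypothetical population: true scores centred on 50.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Each observed score adds independent, mean-zero error.
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

# Small error relative to true differences gives reliability near 1;
# error as large as the true differences gives reliability near 0.5.
for error_sd in (1.0, 3.0, 9.0):
    print(error_sd,
          round(theoretical_reliability(3.0, error_sd), 3),
          round(simulated_reliability(3.0, error_sd), 3))
```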

11
Q

Measurement Error

A

Chapter 5 - Page 297

12
Q

Measuring Psychological Constructs

A
  • Constructs can’t be observed
  • Can be inferred from what we observe
  • Observe behaviour
  • Observe responses to self report scales
13
Q

Characteristics of a Typical Scale/Sub-Scale

A
  • Statements or questions designed to measure a construct or behaviour
  • Fixed-choice responses, consistent across the scale
  • Responses are correlated because they assess the same thing
  • Responses are averaged to find an overall score
  • Some items are reverse coded
  • Strong psychometric properties: reliability, validity & factor structure
  • Normative or standardised data collected from a wide range of people
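A minimal sketch of how such a scale might be scored, assuming a 5-point fixed-choice format (1 to 5) and hypothetical item names and function names. Reverse-coded items are flipped so that a high number always means more of the construct, then all items are averaged into one score.

```python
from statistics import mean

# Assumed 5-point fixed-choice format: 1 = strongly disagree ... 5 = strongly agree.
SCALE_MIN, SCALE_MAX = 1, 5

def reverse_code(response: int) -> int:
    """Flip a response on the scale: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - response

def scale_score(responses: dict, reversed_items: set) -> float:
    """Average every item after reverse-coding the items listed in reversed_items."""
    scored = [reverse_code(value) if item in reversed_items else value
              for item, value in responses.items()]
    return mean(scored)

# Hypothetical respondent: q2 is worded in the opposite direction, so its 2 becomes a 4.
print(scale_score({"q1": 4, "q2": 2, "q3": 5}, reversed_items={"q2"}))
```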
14
Q

Commercial Scales

A
  • Pay per use
  • Published by commercial publishers
  • Commonly used for clinical or applied purposes like recruitment or diagnosis
  • Sometimes used in research
  • Expensive
  • Detailed normative data
  • MMPI, NEO-PI, Beck Depression Inventory
15
Q

Non-Commercial Scales

A
  • Free to use
  • Published in books, journal articles or online
  • Often used for research purposes
  • Unlikely to be published with Normative Data
  • MINI-IPIP, Person Environment Fit Scale
  • Typically used by research students
16
Q

Unidimensional Scales

A
  • Measures only one construct
  • All items are intercorrelated
  • All items averaged to derive overall score
    e.g. Relationship Satisfaction Scale, Beck Depression Inventory
17
Q

Multidimensional Scales

A
  • Measures multiple constructs
  • Each construct has its own sub-scale and is a separate variable
  • Items within each sub-scale are intercorrelated
  • Items within each sub-scale are averaged to derive that sub-scale's score
  • Adding up scores for multiple constructs is meaningless
  • Sub-scales are not expected to correlate strongly with each other
  • Separate reliability and validity data are calculated for each sub-scale
    e.g.: MINI-IPIP (Big 5); Person Environment Fit (three fit dimensions).
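A sketch of scoring a multidimensional scale, assuming a hypothetical scoring key that maps item names to sub-scales (real scales publish their own keys; the names here are illustrative only). Each sub-scale is averaged on its own and kept as a separate variable; nothing is summed across sub-scales.

```python
from statistics import mean

# Hypothetical scoring key: sub-scale name -> items belonging to it.
SCORING_KEY = {
    "extraversion": ["e1", "e2", "e3"],
    "agreeableness": ["a1", "a2", "a3"],
}

def subscale_scores(responses: dict, key: dict = SCORING_KEY) -> dict:
    """Return one averaged score per sub-scale; items are assumed already reverse-coded."""
    return {name: mean([responses[item] for item in items])
            for name, items in key.items()}

person = {"e1": 5, "e2": 4, "e3": 3, "a1": 2, "a2": 2, "a3": 5}
print(subscale_scores(person))
```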
18
Q

Define Reliability

A
  • How consistent are the tools we use to measure a construct
  • Does it produce the same results over time
  • Unreliable measures cannot be trusted
19
Q

Four types of Reliability

A
  1. Internal Consistency
  2. Test-Retest
  3. Alternate Forms
  4. Inter Rater
20
Q

Internal Consistency

A
  • Consistency amongst items.
  • Responses to all the items on a sub-scale should be similar/consistent
  • If all items on the sub-scale measure the same thing, people should respond similarly to them
21
Q

Test-Retest

A
  • Consistency over time
  • People's scale scores at time 1 should be very similar to their scores at time 2
22
Q

Alternate-Forms

A
  • Consistency over equivalent versions
  • People's scale scores on version 1 should be similar to their scores on version 2
24
Q

Inter-Rater

A
  • Consistency over Observers/Raters
  • Multiple observers/researchers should provide similar accounts of the same event or behaviour
25
Internal Consistency - Split-Half Method
  • A measure is split in half, and people's averages on the first half are correlated with their averages on the second half
  • There are often multiple ways to split a scale
  • Each different split can give a different reliability estimate
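A minimal split-half sketch, assuming a small hypothetical data set where each row holds one person's item responses; the function names are illustrative. It compares an odd/even split with a first-half/second-half split to show that different splits of the same scale can give different estimates. It reports only the raw correlation between the two halves.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_r(data, half_a):
    """Correlate each person's average on the items in half_a with their average on the rest."""
    a_scores, b_scores = [], []
    for row in data:
        a_scores.append(mean([v for i, v in enumerate(row) if i in half_a]))
        b_scores.append(mean([v for i, v in enumerate(row) if i not in half_a]))
    return pearson_r(a_scores, b_scores)

# Hypothetical responses: 5 people x 6 items on a 1-5 scale.
data = [
    [4, 5, 4, 4, 5, 3],
    [2, 1, 2, 3, 2, 2],
    [3, 3, 4, 2, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 3],
]
print("odd/even:", round(split_half_r(data, {0, 2, 4}), 3))
print("first/second:", round(split_half_r(data, {0, 1, 2}), 3))
```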
26
Internal Consistency - Cronbach's Alpha
  • Solves the problem of multiple split-half results
  • Reports the average of all possible split-half reliabilities
  • Increasing the number of related items on a scale increases its internal consistency
  • Only remove an item if doing so substantially improves Cronbach's alpha
  • Values of 0.7 or above are generally considered good
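For reference, the standard formula is α = k / (k − 1) × (1 − (sum of item variances) / (variance of total scores)), where k is the number of items. A minimal Python sketch, assuming rows are people, columns are items, and any reverse-coded items have already been flipped; the function name and data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(data):
    """Cronbach's alpha for rows = people and columns = items."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across people.
    item_variances = [pvariance([row[j] for row in data]) for j in range(k)]
    # Variance of each person's total score across people.
    total_variance = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 people x 4 items on a 1-5 scale.
data = [
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 4, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(data), 3))
```

Population variances are used throughout; sample variances would give the same alpha, because the n / (n − 1) factor cancels in the ratio.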
27
Acceptable Internal Consistency
  • Estimates of 0.7 or over are good
  • Higher reliability estimates are needed for diagnostic tools
28
Effect of Unreliable Measures - Attenuation
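  • If a measure is unreliable, its correlations with other variables are attenuated
  • That is, the observed correlations are weaker than the true correlations between the constructs
  • e.g. if the true correlation between two variables is 0.6, the correlation observed with unreliable measures of them will be lower than 0.6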
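The classical way to express attenuation (Spearman's formula) is r_observed = r_true × √(r_xx × r_yy), where r_xx and r_yy are the reliabilities of the two measures. A minimal sketch, assuming a true correlation of 0.6 and hypothetical reliabilities of 0.7 and 0.8; the function names are illustrative.

```python
import math

def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation given the true correlation and each measure's reliability."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Correction for attenuation: estimate the true correlation from the observed one."""
    return observed_r / math.sqrt(rel_x * rel_y)

observed = attenuated_r(0.6, rel_x=0.7, rel_y=0.8)
print(round(observed, 3))  # 0.449
print(round(disattenuated_r(observed, 0.7, 0.8), 3))  # 0.6
```

Only when both measures are perfectly reliable (1.0) does the observed correlation equal the true one.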
29
Why is reliability important?
  • Unreliable measures cannot be trusted
  • Unreliable measures make relationships between constructs difficult to detect
  • The larger the error, the less reliable the measure
  • Observed Score = True Score + Error
30
Test-Retest Reliability
  • If the construct is stable, we should arrive at the same result over multiple tests
  • Assumes life circumstances haven't changed dramatically between tests
  • Both sets of data need to come from the same respondents
  • As the time between tests increases, reliability decreases
  • A test-retest coefficient is not interpretable without knowing the interval between tests
31
Acceptable Test-Retest Reliability
Test-retest coefficients of 0.7 or over are generally considered fine. They need to be interpreted with consideration of:
  • The length of the test-retest interval
  • The stability of the characteristic being measured (e.g., a trait vs a state)
  • The internal consistency of the measure
  • The impact of practice effects (use alternate forms?)
32
Test-Retest Practice Effects
  • Ability tests are troublesome because people may remember the answers from the first test
  • Alternate forms of the test are a good idea but difficult to create
33
Validity
  • Does a test measure what it is intended to measure?
  • A measure can be reliable but not valid
  • Not all validity evidence involves statistics
  • Measures are valid for specific purposes
  • Validity is not an inherent characteristic of a measure
  • Evidence for validity builds up over time
  • Different authors discuss validity in different but overlapping ways, which can be confusing
34
Four Types of Validity
  1. Face
  2. Content
  3. Criterion
  4. Construct
35
Face Validity
  • How much does the measure look like it measures what it says it is measuring?
    e.g. does it appear to measure extraversion/sociability?
36
Content Validity
  • Samples the full range/breadth of a factor
  • Is it covering everything we think it should be covering?
  • e.g. items cover all aspects of extraversion: talkative AND adventurous AND active etc.
37
Criterion Validity
  • Related to physiological or behavioural manifestations of the factor
  • Measured in the present and in the future
  • Concurrent and predictive measures
    e.g. the measure predicts current base-rate cortical arousal levels, as well as future sociable behaviour at parties, work, school etc.
38
Construct Validity
  • Related to other convergent measures of the same factor
  • Not related to discriminant measures
  • Does it behave consistently with theoretical predictions?
    e.g. salespeople have higher scores on this measure than accountants
39
Face Validity - more
  • How much a test looks like it is measuring what it says it is measuring
  • Crude, but can influence motivation to take the test seriously etc.
40
Content Validity - More
  • How much a measuring instrument covers a sample of the behaviours to be measured
  • e.g. aspects of Extraversion captured by a measure:
      Active = No
      Assertive = No
      Energetic = No
      Outgoing = Yes
      Talkative = Yes
      Gesturally expressive = No
      Gregarious = Yes
41
Criterion Validity - more
  • How much scores on a measure predict a behavioural or physiological criterion
  • Is it related to things we expect it to be related to?
42
Two types of Criterion Validity
  1. Concurrent
  2. Predictive
43
Concurrent Validity
  • A type of Criterion Validity
  • e.g. correlation between scores on an Extraversion measure and base-rate physiological arousal
  • Both measures are taken at the same time
  • Introverts have higher baseline arousal than extraverts
44
Predictive Validity
  • A type of Criterion Validity
  • A measure has good predictive validity if it can predict your future preferences or behaviour
    e.g. an Extraversion measure can predict whether you will prefer to study alone or with others
45
Construct Validity
  • How much a measure actually measures the construct it claims to measure
  • Depends on how the construct is conceptualised from a theoretical perspective
    e.g. How well does an IQ test measure intelligence? How well does a choice between a toy gun and a doll reflect aggression?
  • The more abstract the construct, the harder it is to establish construct validity
46
Two Main Types of Construct Validity
  1. Convergent
  2. Discriminant
47
Convergent Validity
  • A type of Construct Validity
  • Strong relationship between the test and another similar test
    e.g. Allen Extraversion Questionnaire & the Extraversion subscale from the NEO-PI
48
Discriminant Validity
  • A type of Construct Validity
  • No or weak relationship between the current test and a measure of something different
    e.g. Allen Extraversion Questionnaire & Beck Depression Inventory
  • Can also be examined using factor analysis and by testing theoretical predictions