Exam 1 Flashcards
Meaning of psychometrics
Psycho: breath, spirit, soul (Greek root)
Metric: measure, size, distance (Greek root)
Importance of studying measurement (5 reasons)
Minimize subjectivity of judgment
Make more precise statements
Quantify your observations
We can never be sure that a measurement is perfect
Assess the degree of error (the measurement itself can cause error, and participants/researchers can introduce bias)
Our first test
Done at the hospital when we are born
Apgar test
5 categories, each scored from 0 to 2 (appearance, pulse, grimace, activity, respiration)
A total of 7-10 is a normal score
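The Apgar arithmetic is simple: five ratings of 0-2 are summed into a total between 0 and 10. A minimal sketch, assuming ratings arrive as a dictionary; the function names apgar_total and is_normal are hypothetical, and the 7-10 cutoff is the one given in these notes.

```python
# Hypothetical helpers: sum the five Apgar ratings (each 0-2) and flag
# whether the total falls in the 7-10 "normal" range from the notes.
APGAR_CATEGORIES = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_total(ratings: dict[str, int]) -> int:
    """Return the Apgar total after checking that every category is rated 0-2."""
    total = 0
    for category in APGAR_CATEGORIES:
        score = ratings[category]
        if score not in (0, 1, 2):
            raise ValueError(f"{category} must be 0, 1, or 2, got {score}")
        total += score
    return total


def is_normal(total: int) -> bool:
    # The notes treat a total of 7-10 as normal.
    return 7 <= total <= 10


newborn = {"appearance": 1, "pulse": 2, "grimace": 2, "activity": 2, "respiration": 2}
total = apgar_total(newborn)
print(total, is_normal(total))  # 9 True
```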
Empirical thinking
Knowledge based on observation and experience rather than on the Bible
Believed to lead to truth
Francis Galton
Founder of psychometrics
Obsessed with observations and measurements
Studied the degree of association between 2 variables (correlation, later formalized as Pearson's r)
Recognition of individual differences: understanding the ways in which people differ, how to quantify those differences, and what causes them
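The degree of association between 2 variables is usually summarized with Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch in plain Python; the function name pearson_r and the parent/child heights are made up for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r: sum of cross-products of deviations over the root of the sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Made-up example data: parent height vs. child height (cm)
parents = [160, 165, 170, 175, 180]
children = [162, 166, 169, 176, 179]
print(round(pearson_r(parents, children), 3))  # 0.991
```

r ranges from -1 (perfect negative association) through 0 (none) to +1 (perfect positive association).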
2 types of individual differences
Trait differences
State differences
Trait differences
Resistant to change over time
Refers to behavior in general
Often easier to measure with questionnaires
Ex: extraversion, IQ, depression, anxiety
State differences
Subject to change over a short period of time
Refers to behavior at the moment
Easy to measure with tasks and questionnaires
Ex: sleepiness, hunger, depression, anxiety
Why we study psychometrics
Ensure reliable and valid measures
Application
Questionnaires are good
Ensure reliable and valid measures
Essential to sound science
How else can we identify individual or cultural differences?
How else can we assess traits?
Application
Good judgments require good measures
Questionnaires are good
They make good dependent variables
Help reduce error when used as covariates, controls, or to define experimental groups
Methods of measurement
Stimulus-centered scaling: psychophysics, i.e., the relation of physical, directly measurable stimuli to perception (ex: sound perception)
Subject-centered scaling: estimating the subjective presence, absence, or degree of a construct
Levels of measurement
Nominal
Ordinal
Interval
Ratio
Nominal
Numbers are assigned as labels only; their values don't mean anything
Numbers could easily be words
Ex: coding sex with numbers
Ordinal
Numbers are not only labels; they also rank individuals
Use numbers in a meaningful way
The distance between consecutive numbers isn't fixed
Ex: ranking of height between individuals
Interval
Numbers are labels, reflect ranks, and tell us exactly how much more of something we have now
No true zero
Ex: temperature in Celsius or Fahrenheit (0° does not mean "no temperature")
Ratio
Numbers are labels, reflect ranks, tell us exactly how much more of something we have now, and we have a true zero
Numbers can't be negative, and ratios are meaningful (ex: height, reaction time)
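The interval/ratio difference can be checked with temperature: Celsius is an interval scale with an arbitrary zero, while Kelvin has a true zero (absolute zero). A minimal sketch with made-up temperatures; the helper name celsius_to_kelvin is hypothetical.

```python
def celsius_to_kelvin(c: float) -> float:
    # Kelvin shifts Celsius so that 0 K is absolute zero (a true zero).
    return c + 273.15


cold, warm = 10.0, 20.0
celsius_ratio = warm / cold  # 2.0, but 20 °C is NOT "twice as hot" as 10 °C
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cold)  # the meaningful ratio
difference_c = warm - cold  # differences ARE meaningful on an interval scale
difference_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cold)  # same 10-degree gap
print(celsius_ratio, round(kelvin_ratio, 3), difference_c, round(difference_k, 6))
# 2.0 1.035 10.0 10.0
```

Differences survive the change of zero point, but ratios only make sense on the scale with a true zero.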
Distribution of data
Normal distribution
Standard scores serve 2 purposes
Common units
Common understanding
Common units
Compare numbers measured in different units; scaling methods standardize them to a common unit (z-scores)
Common understanding
To compare results across participants, we norm-reference them
"Good" is relative; a raw score on a measure rarely has meaning on its own
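Standardizing to a common (z) unit means subtracting the mean and dividing by the standard deviation, so each score becomes "how many SDs above or below the mean". A minimal sketch using Python's statistics module; the function name z_scores and the height/weight data are made up, and it uses the population SD (statistics.stdev would give the sample estimate instead).

```python
import statistics


def z_scores(raw: list[float]) -> list[float]:
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    mean = statistics.mean(raw)
    sd = statistics.pstdev(raw)  # population SD of the given scores
    return [(x - mean) / sd for x in raw]


# Made-up scores on two measures that use different units
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 55, 60, 65, 70]
print([round(z, 2) for z in z_scores(height_cm)])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
print([round(z, 2) for z in z_scores(weight_kg)])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Centimetres and kilograms cannot be compared directly, but once both are in z units, the same relative position in each distribution gets the same number.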
4 steps to develop standard scores
Administer the new measure to a very large sample
Verify that the data represents the full range of scores
Determine the distribution of the scores
Break down scores into psychologically meaningful groups
Scales
Different measurement system
Survey or questionnaire
Imply a unitary construct
Scale development
Construct identification: deciding what characteristics are needed, what exactly you are interested in, and how to measure it completely
Make items: review the literature, see what scales already exist, look for potential methods and items, write some questions
Pick a response format: which format is most effective, and should all questions use only one format?
Pilot test: find which items work best, reduce items and repeat, verify the utility of items/methods
Making good items
We cannot be certain that we make good items
Weight the constructs and make the appropriate number of questions for each
Look for consequences of the questions and for sources of error
Reverse-worded questions?
Redundant questions?
4-8 questions for a simple concept, about 30 for a more complex one; 12 is a good number; don't go over 100
Reverse wording
Items that reflect the opposite of our chosen construct
Reverse-scored to break habitual response patterns
Can be problematic because the opposite can be hard to define
Can introduce biases in answers
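Reverse-scoring flips a response so that a high number always means more of the construct. On a scale from min to max, the reversed value is (max + min - response). A minimal sketch; the function name reverse_score and its default 1-5 range are assumptions.

```python
def reverse_score(response: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Reverse-score a response: on a 1-5 scale, 1 <-> 5, 2 <-> 4, 3 stays 3."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} outside {scale_min}-{scale_max}")
    return scale_max + scale_min - response


print([reverse_score(r) for r in range(1, 6)])  # [5, 4, 3, 2, 1]
print(reverse_score(2, scale_min=1, scale_max=7))  # 6
```

Reverse-worded items are rescored this way before they are combined with the other items.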
Redundant
Used to strengthen a scale
Rephrasing items brings out their common cause (the construct) and reduces idiosyncrasies in how a particular wording reflects the construct
Avoid bad redundancy: rephrasings that are completely different, or that differ only in grammatical structure
Pick a response format
Determines how participants give you data and what kind of data it will be
Dichotomous: 2 parts, 2 options (true or false)
Semantic differential: opposite adjectives at the two ends of the scale, 5 discrete options, no in-between values
Thurstone scale (1928)
Construct: Parental aspirations for children’s success
Law of comparative judgment: tries to produce proper interval data
Options are carefully calibrated so that the difference between adjacent options reflects an equal amount of change; participants are supposed to agree with only one statement, and if they agree with more than one, the data can't be used because the score is supposed to be a single value
Guttman scale (1944)
Construct: Parental aspirations for children’s success
Tried to make it a little easier to construct questions
Questions can be uninterpretable for participants
Participants can agree with more than one statement; the response to the first item influences the answers to the other items
Change in responses: look for the point where responses change, following the pattern to figure out where the change is
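Finding "where the change is" can be sketched as locating the first item a participant does not agree with, then checking that no later item is agreed with. A minimal sketch, assuming the items are ordered from the weakest to the strongest statement of the construct; the function name guttman_score is hypothetical, and full Guttman scaling also evaluates how well a whole sample fits the pattern, which is not shown here.

```python
from typing import Optional


def guttman_score(responses: list[bool]) -> Optional[int]:
    """Return the number of endorsed items if the pattern is cumulative, else None.

    Assumes responses[i] is True when the participant agreed with item i and that
    items are ordered from the weakest to the strongest statement of the construct.
    """
    # The change point is the first item the participant did not agree with.
    change_point = responses.index(False) if False in responses else len(responses)
    # In a perfect cumulative pattern, no agreement appears after the change point.
    if any(responses[change_point:]):
        return None
    return change_point


print(guttman_score([True, True, True, False, False]))  # 3
print(guttman_score([True, False, True, False, False]))  # None
```

A None result marks a response pattern that does not follow the expected cumulative order.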
Analog scale
Precursor to digital scale
High level of detail
Easier to use on a computer; responses are harder to measure when marked directly on paper
Similar to semantic differential
Likert scale (1932)
Statements that participants agree or disagree with
5 options
Each level is labeled with a description so participants know what it means
Likert-type scale
More than 5 options
Not limited to agree/disagree anchors
Ex: frequency scale
Writing good items
Never use double negatives: they can be confusing (bad for construct validity), and the double negative is sometimes hidden in the wording
Double-barreled questions
Asking 2 questions at the same time is bad because you can't know which question the participant is answering
Can be split into 2 items
Leading questions
Introduce bias in the answers (reduce variability and increase systematic error)
Confusion and bias are bad for validity
Participant’s biases
Yea-saying: agreeing with everything
Social desirability: answering in a way that makes you look like a better person
Malingering: answering in a way that makes you look like a worse person (ex: exaggerating symptoms)
Balancing the scales
When we use Likert and Likert-type scales, we assume (incorrectly) that the intervals are equal
The assumption is at least approximately true
The difference between levels is not always the same, perceptions vary across individuals, and how we label a scale can introduce bias
Useful caveats
Best to put low-value responses (disagree) on the left even if there is a left-side response bias
Overall evaluation questions often do not apply in psychology and are not something you should consider a critical part of a good questionnaire
Costs of poor measurements
Using a poor measurement can be worse than having no measurement at all
Validity: will be poor
Correlations: hard to find the expected degree of association between 2 things, leading to potential mistakes
Boredom proneness scale
A 28-item questionnaire measuring negative experiences of boredom, which arises when a situation seems to lack meaning, interest, or challenge; boredom is treated as a self-regulatory problem
Originally believed to be a unitary construct measured with true/false items
Recently believed to be 2 sub-constructs (internal and external) and measured with a 7-point Likert-type scale
What do we do with questionnaires?
We rarely measure constructs with a single item
Combine each item to get a single, more comprehensive, raw score
Total: sum of the individual's item responses
Mean: average of the individual's item responses
Raw scores have little interpretative value
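Combining items into a raw score is a sum or an average. A minimal sketch with made-up responses to five 1-5 Likert items; the function names total_score and mean_score are hypothetical, and any reverse-worded items are assumed to have been reverse-scored already.

```python
from statistics import mean


def total_score(responses: list[int]) -> int:
    """Total: sum of the individual's item responses."""
    return sum(responses)


def mean_score(responses: list[int]) -> float:
    """Mean: average of the individual's item responses (stays on the 1-5 item scale)."""
    return mean(responses)


participant = [4, 4, 5, 5, 3]  # made-up responses, already reverse-scored where needed
print(total_score(participant), mean_score(participant))  # 21 4.2
```

Both carry the same information (mean = total / number of items); the mean has the convenience of staying on the response scale. Either way, a 21 or a 4.2 on its own says little until it is compared with other people's scores.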
Norm-referencing
Way to improve the interpretation of raw scores and give better feedback on performance
Good performance is a relative term (relative to people completing the same measure)
Examine the distribution
Frequency distributions
How many people got that score
Group the scores (ex: into ranges of equal width)
Can become a histogram
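A frequency distribution counts how many people got each score; grouping scores into equal-width ranges gives the bars of a histogram. A minimal sketch using collections.Counter; the function names, the bin width of 5, and the scores are made up, and it assumes non-negative integer scores.

```python
from collections import Counter


def frequency_table(scores: list[int]) -> dict[int, int]:
    """How many people got each score, listed from lowest to highest score."""
    return dict(sorted(Counter(scores).items()))


def grouped_frequencies(scores: list[int], bin_width: int) -> dict[tuple[int, int], int]:
    """Group scores into equal-width bins; bins with no scores are omitted."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return {(start, start + bin_width - 1): n for start, n in sorted(counts.items())}


def text_histogram(grouped: dict[tuple[int, int], int]) -> str:
    # One row per bin: the score range followed by one '#' per person.
    return "\n".join(f"{lo:>3}-{hi:<3} {'#' * n}" for (lo, hi), n in grouped.items())


scores = [12, 15, 15, 18, 21, 21, 21, 24, 27, 30]
print(frequency_table(scores))
print(text_histogram(grouped_frequencies(scores, 5)))
```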
Percentile ranks
Percentile: normative way of describing performance on a test
Percentage of people who would score lower than you on that measure
Not equal intervals (the distance between percentile ranks is not constant in raw-score units)
Calculating percentiles
Within a sample
Across a sample
5th percentile: 5% are below you
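Following the definition above (the percentage of people who score lower than you), a percentile rank can be computed against a norm sample by counting the scores strictly below a given score. A minimal sketch using bisect; the function name percentile_rank and the norm sample are made up, and some textbooks use a variant that also counts half of the tied scores, which would give different values here.

```python
from bisect import bisect_left


def percentile_rank(score: float, norm_sample: list[float]) -> float:
    """Percentage of the norm sample that scored strictly lower than `score`."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)  # number of values < score
    return 100 * below / len(ordered)


norm_sample = [10, 12, 15, 15, 18, 20, 22, 25, 28, 30]  # made-up norm group
print(percentile_rank(26, norm_sample))  # 80.0
print(percentile_rank(15, norm_sample))  # 20.0
```

A raw score of 26 would therefore be described as the 80th percentile: 80% of the norm group scored lower.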
How to talk about percentiles
Important to describe accurately
60th percentile: 60% of people would have obtained a lower score than you
90th percentile: score was in the top 10%, well above average