Quantitative Flashcards
Goal of research
design studies carefully to make alternative interpretations implausible
Methods are about designing a study so that, if a particular finding is obtained, we can reach a clear conclusion
Illusory Correlation
cognitive bias that occurs when we focus on two events that stand out and occur together
How do we know things
feelings, intuition, authority (experts), reasoning (logic): the assumptions have to be true for the conclusion to hold
How do we know things part II
Empiricism- idea that knowledge is based on observations
SCIENCE- empiricism and reasoning
Process of science
Hypothesis → new hypothesis → theory building → body of knowledge
Goodstein's Evolved Theory of Science
1) Data play a central role
2) Scientists are not alone: observations are reported to other scientists and the public
3) Science is adversarial: hypotheses can be falsified or supported
4) Work is peer reviewed
Tenets of Science
Empiricism
Replicability
Falsifiability- claims must be testable (capable of being shown false)
Parsimony- prefer the simplest adequate account
Hypothesis gains support
Hypothesis cannot be proved
Extend literature
Take the idea further
Remove confounds and improve generalizability
behavioral science goals
describe behavior, predict behavior, explain behavior, determine the causes of behavior
Causation
Temporal precedence, covariation, and elimination of alternative explanations
Efficacy vs effectiveness
Efficacy: does the intervention produce the expected result under ideal circumstances? Effectiveness: the degree of benefit in real-world clinical settings
Construct Validity
Adequacy of the operational definition of a variable
Internal Validity
Ability to draw conclusions about causal relationships
Integrity of the experiment
Ability to draw a causal link between the IV and DV
Mediating Variables:
Psychological processes that mediate the effects of the situational variable on a particular response
Construct vs. Variable
The construct is the abstract idea; the variable is what is used to test/measure it
Operational Definitions
Set of defined and outlined procedures used to measure and manipulate variables
A variable must have operational definition to be studied empirically
Allows others to replicate!
Construct validity
Adequacy of the operational definition of variables
Does the operational definition reflect the true theoretical meaning of the variable?
Nonexperimental method
Variables are observed as they occur naturally
If they vary together, there is a relationship (correlation)
Reduction of internal validity
Experimental Control
Extraneous variables are kept constant
Every feature of the environment is held constant except the manipulated variable
Strong internal validity requires:
Temporal precedence
Covariation between the two variables
Elimination of plausible alternative explanations
Issues When Choosing A Method
Often the higher the internal validity, the lower the external validity (generalization)
Harder to generalize when strict experimental environment
Reliability and validity of measurement
Not to be confused with internal or external validity of a study
However, reliability and validity of measurement affects internal validity of a study
Measured score = “true” score (real score on the variable) + measurement error
Reliability of Measures
Consistency or stability of a measure of behavior
We expect measures to give us the same result each time
Should not fluctuate much
Test-retest reliability- same individuals at two points in time
Practice effects – participants are literally more practiced the 2nd time
Maturation – simply that subjects change because time has passed
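A minimal Python sketch of test-retest reliability as the correlation between two administrations (toy scores, made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Same six individuals measured at two points in time (toy data)
time1 = np.array([10, 14, 8, 20, 15, 12])
time2 = np.array([11, 13, 9, 19, 16, 12])

r, _ = pearsonr(time1, time2)  # a stable measure yields a high correlation
print(f"test-retest r = {r:.2f}")
```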
Alternate forms reliability
Individual takes 2 different forms of the same test
Also at 2 different times
Internal consistency reliability
Generally measures whether several items that purport to measure the same general construct produce similar scores
Assessed using responses at only one point in time
In general, the more items, the higher the reliability
3 common measures of internal consistency:
Item total
Split-half reliability
Cronbach’s alpha (α)
Item-total
Correlation between an individual item and the total score without that item
For example, if a test had 20 items, there would be 20 item-total correlations. For item 1, it would be the correlation between item 1 and the sum of the other 19 items
Helpful in identifying items to remove
Or in creating a short form
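A minimal sketch of corrected item-total correlations, assuming a small toy ratings matrix (rows = respondents, columns = items):

```python
import numpy as np

def corrected_item_total(items):
    """Correlation of each item with the total score *without* that item."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

# Toy 1-5 ratings; item 4 is written to "misbehave"
scores = np.array([[4, 5, 4, 2],
                   [2, 2, 3, 5],
                   [3, 3, 3, 1],
                   [5, 4, 5, 3],
                   [1, 2, 1, 4]])
print(corrected_item_total(scores))  # low/negative values flag items to remove
```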
Split-half reliability
Correlation of the total score on one half of the test, with the total score on the other half
Randomly divided items
Spearman-Brown split-half reliability coefficient
We want > .80 for adequate reliability
However, for exploratory research, a cutoff as low as .60 is not uncommon
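A minimal sketch of a random split-half with the Spearman-Brown correction, which estimates full-length reliability from the half-test correlation (toy data):

```python
import numpy as np

def split_half(items, rng):
    """Random split-half correlation, corrected with Spearman-Brown."""
    k = items.shape[1]
    order = rng.permutation(k)                   # randomly divide the items
    half1 = items[:, order[:k // 2]].sum(axis=1)
    half2 = items[:, order[k // 2:]].sum(axis=1)
    r = np.corrcoef(half1, half2)[0, 1]          # correlate the two half-totals
    return (2 * r) / (1 + r)                     # Spearman-Brown correction

rng = np.random.default_rng(0)
scores = np.array([[4, 5, 4, 5, 4, 5],
                   [2, 2, 3, 2, 2, 3],
                   [3, 3, 3, 4, 3, 3],
                   [5, 4, 5, 5, 4, 5],
                   [1, 2, 1, 2, 2, 1]])
print(f"split-half reliability = {split_half(scores, rng):.2f}")  # want > .80
```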
Cronbach’s alpha
How closely related a set of items are as a group
How well all the items “hold together”
Simply put:
Average of all possible split-half reliability coefficients
Expressed as a number between 0 and 1
Generally want > .80, (in practice > .70 considered ok)
By far the most common measure you will see reported
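A minimal sketch of Cronbach's alpha from its definition, alpha = k/(k−1) × (1 − sum of item variances / total-score variance), on toy data:

```python
import numpy as np

def cronbach_alpha(items):
    """Alpha for an items matrix (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([[4, 5, 4, 5],
                   [2, 2, 3, 2],
                   [3, 3, 3, 4],
                   [5, 4, 5, 5],
                   [1, 2, 1, 2]])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # want > .80 (> .70 often ok)
```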
Interrater reliability
Correlation between the observations of 2 different raters on a measure
Measured by: Cohen’s Kappa
By convention > .70 is considered acceptable
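A quick sketch with scikit-learn's cohen_kappa_score (toy ratings from two hypothetical raters):

```python
from sklearn.metrics import cohen_kappa_score

# Two raters coding the same six observations (toy data)
rater1 = ["yes", "no", "yes", "yes", "no", "yes"]
rater2 = ["yes", "no", "no",  "yes", "no", "yes"]

kappa = cohen_kappa_score(rater1, rater2)  # agreement corrected for chance
print(f"kappa = {kappa:.2f}")              # > .70 conventionally acceptable
```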
Construct Validity
To what extent does the operational definition of a variable actually reflect the true theoretical meaning of the variable?
Does the measure reflect the operational definition?
Ex. Depression
DSM-5 criteria
BDI (Beck Depression Inventory) symptoms
Face validity
Content of the measure appears to reflect the construct being measured
Very subjective
Easy for a participant to “fake”
Content Validity
Extent to which a measure represents all facets of a given construct
Subject matter experts may be part of the process
Criterion Validity
Measures how well scores on one measure predict an outcome on another measure (the criterion)
Criterion- Predictive Validity
Scores on the measure predict behavior on a criterion measured at a future time
Ex: GRE -> grad school success
Criterion- Concurrent Validity
Relationship between measure and a criterion behavior at the same time
Criterion- Convergent Validity
Scores on the measure are related to other measures of the same construct
i.e., if we test two measures that are supposed to be measuring the same construct and show that they are related
The scores should “converge”
Ex: BDI & CES-D
Criterion- Discriminant Validity
Scores on the measure are not related to other measures that are theoretically different
i.e., if we test two measures that are not supposed to be related, and show that in fact, they are unrelated
The scores should “discriminate” between constructs
Ex: Narcissism and Self-esteem
Reactivity of Measures
Measurement is reactive if awareness of being measured changes an individual’s behavior
e.g., self-monitoring, wearing a fitbit
Hawthorne effect / observer effect
Named for studies of productivity and working conditions at the Hawthorne plant
Sensitivity
Ability of a test to correctly identify those with the condition (true positive rate)
Proportion of people with the condition who will have a positive result
Specificity
Ability of the test to correctly identify those without the condition (true negative rate)
Proportion of people without the condition who will have a negative result
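A minimal sketch computing both rates from a toy set of true conditions vs. test results:

```python
import numpy as np

# 1 = has the condition, 0 = does not (toy data)
truth = np.array([1, 1, 1, 0, 0, 0, 1, 0])
test  = np.array([1, 1, 0, 0, 0, 1, 1, 0])

tp = np.sum((truth == 1) & (test == 1))  # true positives
fn = np.sum((truth == 1) & (test == 0))  # false negatives
tn = np.sum((truth == 0) & (test == 0))  # true negatives
fp = np.sum((truth == 0) & (test == 1))  # false positives

sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```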
Nominal
Categories with no numeric scales
Ordinal
Rank ordering
Numeric values are limited (intervals between ranks are not necessarily equal)
Interval
Numeric properties are literal
Assume equal interval between values
Ratio
A true zero indicates the absence of the variable measured
Types of Variables: Discrete/categorical
Consist of indivisible categories
i.e., don’t do math on them, only frequency or percent
Answers “how many” questions
Usually naturally occurring groups or categories (not always)
Continuous/dimensional
Infinitely divisible into whatever units. Ex: time or weight
Scores that provide information about the magnitude of differences between participants in terms of the amount of some characteristic
Converting variables
A continuous variable can be converted into a categorical one
Ex: level of anxiety symptoms (0-100)
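A minimal pandas sketch of converting a continuous 0-100 symptom score into categories (the cutoffs here are made up for illustration):

```python
import pandas as pd

anxiety = pd.Series([5, 22, 47, 63, 88, 95])  # continuous 0-100 scores
levels = pd.cut(anxiety,
                bins=[0, 33, 66, 100],        # illustrative cutoffs
                labels=["low", "moderate", "high"],
                include_lowest=True)
print(levels.tolist())  # ['low', 'low', 'moderate', 'moderate', 'high', 'high']
```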
Controlling extraneous/confounding variables
Key to experimental method
Want least ambiguous interpretations of results
Manipulate IV and hold all other variables constant, either by experimental control or randomization
Confounding variable
Varies along with the independent variable
Cannot determine which variable is responsible for the effect
Ex: exercise vs. video’s effect on mood
Windowed vs. windowless room
Internal validity
Reminder: Internal validity is ability to draw conclusions about causal relationships from the data
Results can be attributed to the effect of the independent variable (IV) and not confounding variables
More we control confounding variables, the more we strengthen internal validity
Posttest-only design
Obtain two equivalent groups of participants
Introduce the independent variable
Measure the dependent variable
Any group difference is attributed to the effect of the IV on the DV
Selection Bias
When people selected to conditions differ in an important way
This is why we prefer to recruit all our participants first, and then randomize them!
Basically anything other than randomization may lead to bias in some way
Even when we do randomize, we usually want to make sure the groups are equivalent on important variables…
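A minimal sketch of the recruit-first, randomize-second approach (hypothetical participant IDs):

```python
import numpy as np

rng = np.random.default_rng(42)
participants = [f"P{i:02d}" for i in range(1, 21)]  # recruit everyone first

shuffled = rng.permutation(participants)            # then randomize all at once
treatment, control = shuffled[:10], shuffled[10:]
print("treatment:", list(treatment))
print("control:  ", list(control))
```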
Pretest-posttest design
Pretest is given to each group
Help assure that groups are equivalent at the beginning of the experiment
Pros to adding a pre-test
Can look at change in DV from pre to post
Can assure the randomization worked
i.e., we started with “equivalent groups”
Compare the pre measures between the groups
Especially helpful if have a small sample size
Sometimes used to select participants for the experiment
Like screening for people who score above a cutoff on depressive symptoms
Then they get randomized to groups
Pre test and attrition
May have started with equivalent groups, but then people start dropping out….
Drop out may be random…
but what if it is due to something non-random, i.e. systematic?
Like patients with “worse” symptoms dropped out
Pretest allows us to see if those who dropped out were different in some way from those who remained in the study
Cons to adding a pre-test
Time consuming
Might sensitize participants to what is being studied
Affects the way participants react to manipulation/intervention
Practice effects
When the same pretest is used as the posttest (to measure the DV)
Disguise pretest if possible
Different form
Embed in other questions
Solomon 4-group design
A design that tests for effects of a pretest
Half receive posttest only
Other half receive pretest and posttest
Between-subjects design/Independent groups design:
≥ 2 groups/conditions
Each participant participates in only one group
Comparisons are made between different groups/conditions of participants
Within-subjects design /Repeated measures design:
Participants experience all groups/conditions
Comparisons are made within the same group of participants
Within types
Time
Pre-post
i.e. how did the group of people do from pre to post?
T1 T2 T3 T4
Pre, post, 6-month follow-up, 1-year follow-up
Condition
All participants go through all conditions
Ex. Taste test challenge
Condition: 1) Pepsi, 2) Coke, 3) RC Cola
Participant tastes all three.
Repeated Measures Design: Pros
Fewer participants
- Each participant serves as their own “control”
- Extremely sensitive to statistical differences (between-group tends to have more random error)
- When you use the same participant, you automatically are controlling for a large amount of potential confounding factors: such as any demographic/historical differences
Repeated Measures Design: Cons
Order effects:
Order of presenting the conditions/treatments affects the dependent variable
Practice (learning) effects:
Performance improves because of the practice gained from previous tasks, i.e., repeated measurement of the DV
Counterbalancing
All possible orders of intervention/condition are included
Can help with order and practice effects (depending on measurement)
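A minimal sketch enumerating all orders for the taste-test example above (itertools gives every permutation of the conditions):

```python
from itertools import permutations

conditions = ["Pepsi", "Coke", "RC Cola"]
orders = list(permutations(conditions))  # 3! = 6 possible orders

# Assign participants across the six order-groups to balance order effects
for i, order in enumerate(orders, start=1):
    print(f"group {i}: {' -> '.join(order)}")
```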
Repeated Measures Design: Cons (cont)
Fatigue effect:
Performance deteriorates because the participant becomes tired, bored, or distracted from previous tasks
Carryover effect:
Effect of the previous treatment carry over to influence the response of the next treatment
Time Between Repeated Measures
Longer time interval between measures
Sometimes called “washout”
Helps with fatigue and potential carryover effects
However longer intervals can lead to more attrition….
Matched Pair Design
Method of assigning participants to conditions in an experiment based on a participant characteristic
Goal is to achieve the same equivalency of groups
Participants are grouped together into pairs based on some variable they “match” on e.g., age, gender, SES
Then, within each pair, participants are randomly assigned to different treatment groups
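A minimal sketch, assuming age is the matching variable and hypothetical participant IDs: sort on the variable, pair neighbors, then randomize within each pair:

```python
import numpy as np

rng = np.random.default_rng(1)
ages = {"P1": 21, "P2": 34, "P3": 22, "P4": 35, "P5": 48, "P6": 50}

ranked = sorted(ages, key=ages.get)          # order participants by age
for a, b in zip(ranked[::2], ranked[1::2]):  # adjacent pairs "match" on age
    first, second = rng.permutation([a, b])  # random assignment within the pair
    print(f"pair ({a}, {b}): treatment = {first}, control = {second}")
```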
Population
All possible individuals of interest
Sample
The group of people you actually study that are drawn from the population
Reasoning for sampling
How you select your sample affects external validity
In general, the larger the sample, the closer it is to estimating the population
Sample size matters
Must consider the cost/benefit of increasing sample size
Inferential statistics
inferences and predictions about a population based on a sample of data taken from the population in question
Because a sample is typically only a part of the whole population, sample data provide only limited information about the population.
This is why sampling is so important!
Probability sampling
utilizes some form of random selection
assures that each member of your population has an equal probability of being chosen
Nonprobability sampling
Just about all other methods
Also known as a “convenience sample”
A sample that is not drawn randomly
Usually consists of participants who are readily available to the researcher
Stratified Sampling
Used to ensure that the proportional representation of groups in the sample is the same as in the population.
Example:
Population of psychology grad students may be 88% female, 10% male, 2% non-binary. How would you sample?
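A minimal sketch of proportional stratified sampling for that example (hypothetical stratum sizes matching the 88/10/2 split):

```python
population = {"female": 880, "male": 100, "non-binary": 20}  # 88% / 10% / 2%
n_sample = 50
total = sum(population.values())

# Sample each stratum in proportion to its share of the population
for group, size in population.items():
    n_group = round(n_sample * size / total)
    print(f"{group}: draw {n_group} at random from {size}")
# female: 44, male: 5, non-binary: 1
```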
Sampling bias
Systematic differences between the characteristics of a sample and a population
Leads to underrepresentation of many types of people
Limits the potential generalizability of results
Sampling Error
The discrepancy between a sample statistic and its population parameter
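A minimal simulation: draw samples of increasing size from a known population and watch the sampling error (sample mean minus population mean) tend to shrink:

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(loc=100, scale=15, size=100_000)  # known population
mu = population.mean()

for n in (10, 100, 1000):
    sample = rng.choice(population, size=n, replace=False)
    print(f"n={n:5d}  sample mean={sample.mean():7.2f}  "
          f"sampling error={sample.mean() - mu:+.2f}")
```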
Straightforward manipulations
Ex:
Assignment to groups/conditions
Treatment vs. control: therapy or pill
2 different signs in the bathroom
Different instructions to participants
Staged manipulations
Might be when trying to create some psychological state (like anger, anxiety)
Frequently use a confederate or accomplice
Frequently uses deception
Ex:
Milgram obedience study
Sneezing confederate
Strong IV manipulations
Maximizes the difference between groups and increases the chances that the IV will have an effect that is statistically significant
Early stages of research
Want to show that a relationship exists
External validity?
Might create situations which never naturally occur
Measuring the Dependent Variable- Self-report measures:
Simply ask them!
Used to measure aspects of human thought and behavior
Attitudes, judgements, emotional states, intended behaviors, etc.
Ex: Do you exercise? How often? For how long?
Measuring the Dependent Variable Behavioral measures:
Direct observations of behavior
Ex: Observe how much exercise one does
FitBit – accelerometer/pedometer data
The Fitbit observes for you!
Measuring the Dependent Variable
Physiological measures:
Recordings of responses of the body
Ex: FitBit – Heart rate monitor data
blood lactate concentration
Measuring the Dependent Variable
Ceiling effect:
Maximum level is quickly reached
Task too easy?
Scale of 1-5 -> most responses are 4 or 5
Maybe would have seen a better spread if scale was 1-10
Floor effect:
Minimum level is quickly reached
Task too hard?
Both effects reduce variability in the data and make it difficult to detect differences: real variation just is not captured by the DV measure
Floor and ceiling
This also happens when there isn’t much room for improvement in your sample
Ex:
Low depression scores to start, nowhere to go
A1C
Stress management
1550 SAT score
Measuring the DV: Likert scale
Most people use “Likert-like” or “Likert-type” scales
“True” Likert scale
contains several items
response levels are arranged horizontally and anchored with consecutive integers and verbal labels
labels connote more-or-less evenly-spaced gradations, bivalent and symmetrical about a neutral middle
Participant expectations-Demand characteristics
Participants respond/behave how they think is expected
Especially problematic if they know the hypothesis
Control by
deception, filler items, asking about perception of purpose of research, blinding!
Placebo effect
Used to control placebo effect
i.e., just taking a pill makes a difference
Compare placebo pill with active pill effects: the active effect needs to be above and beyond the placebo response
Balanced-placebo design
When specifically looking for effect of expectations
Nocebo
A nocebo response occurs when a participant's symptoms are worsened by the administration of an inert, sham, or dummy (simulator) treatment due to negative expectations of treatment or prognosis
Experimenter expectancy (bias)
Subtle biases of how experimenter interprets and records behaviors, or how they interact with participants
Controlling the expectancy problem
Train experimenters
Run all participants at the same time so the experimenter's behavior is the same
Take the human out of the equation - Automate procedures
Printed or video directions
Controlling demand characteristics and expectancy problem
Single-blind and double-blind designs