Test 2 Modules Flashcards
Repeated Measures ANOVA
Comparing more than two groups when the
IV is manipulated within subjects.
F Ratio: MSbetween/MSwithin
If MSbetween is sufficiently larger than MSwithin, the F ratio
should be big enough to reject the null
hypothesis.
If the sphericity test is violated (significant), then we…
use a Greenhouse-Geisser correction.
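As a rough sketch of what the correction does (the epsilon value here is hypothetical, not from any output), both df terms are multiplied by the Greenhouse-Geisser epsilon before the p-value is computed:

```python
# Greenhouse-Geisser correction: multiply both df terms by epsilon
# (0 < epsilon <= 1). epsilon = 1 means sphericity holds; smaller
# values mean a worse violation and a more conservative test.
epsilon = 0.85          # hypothetical epsilon from a sphericity test

df_between = 2          # k - 1 conditions
df_residual = 22        # residual df from the example output

df_between_gg = epsilon * df_between    # ~1.7
df_residual_gg = epsilon * df_residual  # ~18.7
```

The F ratio itself is unchanged; the smaller (non-integer) df make the p-value larger, penalising the sphericity violation.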
If the standard error bars overlap substantially in an estimated marginal means plot, then this indicates that…
…the two group means are unlikely to be significantly different.
Mean Square and Sums of Squares from the output
Mean Square Within = sums of squares within/df within = 32.28/22 = 1.47
Mean Square Between = sums of squares between/df between = 22.39/2 = 11.19
F Ratio = MS Between/MS Within = 11.19/1.47 = 7.63
The residual is the MSwithin: the
sums of squares for the extraneous variability.
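The arithmetic above can be checked directly; the sums of squares and df are the ones from the example output:

```python
# Repeated measures ANOVA F ratio from the sums of squares in the output.
ss_between = 22.39   # SS for the condition (IV) effect
df_between = 2       # k - 1 = 3 - 1
ss_residual = 32.28  # SS within / residual
df_residual = 22     # left over after IV and subject df are removed

ms_between = ss_between / df_between      # 11.195
ms_residual = ss_residual / df_residual   # ~1.47
f_ratio = ms_between / ms_residual        # ~7.63
```

Note that the output value of 7.63 comes from the unrounded mean squares; dividing the rounded values (11.19/1.47) gives a slightly different number.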
11 df because N = 12; N – 1 = 11 (one group of participants in a within-subjects design)
Despite the variability amongst participants, people who tended to do well in one condition tended to do well in the others (there is still an effect of condition despite some people having better memory than others).
Because we are doing a repeated measures ANOVA and not a one-way ANOVA, I can partition out the variability in the data due to subject differences (rather than sampling etc.) and discard it, because we know where it comes from. This means the residual will be smaller, so we are more likely to get a bigger F and detect smaller differences between conditions!
SSwithin/residual = SStotal – SSbetween – SSbetween-subjects
This means that the MSresidual is SSwithin/df = small number (i.e., 1.47)
A more powerful design that controls for extraneous variables: by discarding the overall between-subject variability we do not care about, it allows us to focus on the variability between conditions explained by our IV.
Degrees of Freedom:
o 3 conditions (k)
o 12 participants (n)
o 36 observations (k × n = 3 × 12 = 36)
o 35 degrees of freedom total (observations – 1)
o 2 for my IV (# of groups – 1 = 3 – 1 = 2)
o 11 for my between-subjects effect (n – 1 = 12 – 1 = 11)
o 22 left over for my residual (total df – IV df – BS df = 35 – 2 – 11 = 22)
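The df bookkeeping above can be written out as a quick sanity check:

```python
# Degrees of freedom partition for a within-subjects (repeated measures) design.
k = 3                      # conditions
n = 12                     # participants
observations = k * n       # 36 total scores

df_total = observations - 1                    # 35
df_iv = k - 1                                  # 2, for the condition effect
df_subjects = n - 1                            # 11, between-subjects effect
df_residual = df_total - df_iv - df_subjects   # 22, left over for the residual
```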
This ONLY tells me that there is a significant difference between our conditions; it doesn't tell me which groups differ (which has a higher/lower mean than the others).
o Solution:
o Run a post hoc test, i.e., Tukey (no real hypothesis about the direction of the difference, and we need to correct for the multiple comparisons we are making; 3 groups).
o We are looking at the Tukey-corrected p-value!
o We can see that only the auditory and visual conditions differ from the combined condition.
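A quick illustration of why the correction matters (the Tukey procedure itself is more involved; this just shows the familywise error rate that uncorrected tests would carry with 3 pairwise comparisons):

```python
# With 3 groups there are 3 pairwise comparisons. Running each at
# alpha = .05 without correction inflates the chance of at least one
# false positive across the family of tests.
alpha = 0.05
n_comparisons = 3  # 3 groups -> 3 pairwise tests

familywise_error = 1 - (1 - alpha) ** n_comparisons
print(round(familywise_error, 3))  # 0.143, well above the nominal 0.05
```

Corrections such as Tukey's HSD keep this familywise rate near .05 instead of letting it climb with the number of comparisons.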
the reason we do post hocs (tick all that apply)
> because we do not have a specific
hypothesis on how the means differ
to determine which means are
significantly different without inflating
false-positive rates
determine which means are significantly
different if the F is significant
determine which means are significantly
different if the F is non-significant
> because we do not have a specific
hypothesis on how the means differ
to determine which means are
significantly different without inflating
false-positive rates
determine which means are significantly
different if the F is significant
Statistically, a within-subjects design is more powerful (gives a bigger F) than a between-subjects design because…
MS residual is bigger
MS residual is smaller
MS between is bigger
MS between is smaller
MS residual is smaller
Three effects/patterns being tested in factorial ANOVAs
Main effect
Main effect
Interaction
*each with its own F ratio, p-value, and effect size, but the same df (in a 2×2, each effect has 1 numerator df)
*all independent of one another (i.e., any combination is possible)
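With hypothetical cell means for a 2 × 2 design (these numbers are made up for illustration, not from any output), the three effects can be read off like this:

```python
# Hypothetical 2x2 cell means for a (Drug x Therapy) design.
means = {
    ("Prozac", "CBT"): 14.0, ("Prozac", "Waitlist"): 6.0,
    ("Placebo", "CBT"): 13.0, ("Placebo", "Waitlist"): 1.0,
}

# Main effect of drug: difference between the marginal (row) means.
prozac = (means[("Prozac", "CBT")] + means[("Prozac", "Waitlist")]) / 2    # 10.0
placebo = (means[("Placebo", "CBT")] + means[("Placebo", "Waitlist")]) / 2  # 7.0
main_drug = prozac - placebo  # 3.0

# Main effect of therapy: difference between the marginal (column) means.
cbt = (means[("Prozac", "CBT")] + means[("Placebo", "CBT")]) / 2            # 13.5
waitlist = (means[("Prozac", "Waitlist")] + means[("Placebo", "Waitlist")]) / 2  # 3.5
main_therapy = cbt - waitlist  # 10.0

# Interaction: is the drug effect different at each level of therapy?
drug_effect_cbt = means[("Prozac", "CBT")] - means[("Placebo", "CBT")]                 # 1.0
drug_effect_waitlist = means[("Prozac", "Waitlist")] - means[("Placebo", "Waitlist")]  # 5.0
interaction = drug_effect_cbt - drug_effect_waitlist  # -4.0, nonzero -> interaction
```

The ANOVA tests whether each of these three quantities differs reliably from zero, each with its own F ratio.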
ANOVA output with p-values tells us if there is a statistically significant difference but not between which groups. We need…
Do we look at all of them?
Post hocs or contrasts.
No. We look at our hypothesis and our graph to see which groups to compare.
post hocs help us describe…
interactions
Do we report main effects that are qualified by significant interactions?
No. It would be misleading to report it as significant when we know the effect is only true some of the time (i.e., its effect depends on the level of the second IV).
We still report it in our write up BUT we say main effect of drug was qualified by a significant interaction between drug and therapy…
To interpret the interaction we split one IV into its 2 levels, and compare the effect of the other variable at each level. If I choose to split therapy, I would examine the effect of drug in the CBT conditions and in the waitlist conditions. If I did this, which two rows of the post hoc table would I want to look at? (select 2)
> Waitlist Placebo vs Waitlist Prozac
CBT Placebo vs CBT Prozac
CBT Prozac vs Waitlist Prozac
CBT Placebo vs Waitlist Placebo
> Waitlist Placebo vs Waitlist Prozac
CBT Placebo vs CBT Prozac
hint if I split “therapy” into CBT and Waitlist
then the post hocs would be (CBT: both
drugs; Waitlist: both drugs).
If I split “drug” into placebo and Prozac then we would look at (Placebo: both therapies; Prozac: both therapies)
Problems with group designs which make small n designs better:
• So far we have looked at comparing group
means.
• Looking at this sample data we may conclude that our intervention was successful because we’ve seen a reduction in scores over time.
• However, what tends to happen in group
designs is that the effects of an individual
are hidden within group means.
• Whilst most participants seem to reduce in aggression over time, there are a couple who either show no change or increase their aggression over time.
• At a group level this is not an issue (some
with a strong effect, weak effect or no
effect) but we may ask ourselves what can
we do to make this intervention more
effective for those people?
• We could look more closely at those it
worked very well for; is this an individual
effect, third variable or effectiveness of the
IV?
When do we use small-N designs?
§ To establish a causal effect of IV on DV
within a small number of participants
§ For example:
• Research question concerns a very small
sample (special needs children, clinical
populations, prison)
• Situations where we cannot recruit a
sufficiently-powered sample (hard to find
populations or special populations; clinical,
prison, children)
• When we expect substantial variability in
individual responses (if you find two distinct
subgroups within a single sample where
inferential statistics would average the
means and rid the effect but the effect in
itself is worth studying).
Establishing causality
(in a group study)
(in a small n design)
Establishing causality (in a group study)
• If we see a change in means (two groups) with small variability, and we are confident that this change is caused by our IV, then we begin to infer a causal relationship.
Establishing causality (in a small-N design)
§ A small-N design establishes causal
relationships through replicating the effect
of IV on the DV
§ Need to find evidence of three things:
• Consistent (systematic) change in DV as IV
is manipulated, with little variability
• Direct replication of the IV’s effect within
the participant (not another subject factor
etc.; direct replication or systematic
replication can achieve this; same person
same context or different person same
context to test for consistency of the IV-DV
effect within the context)
• Systematic replication of the IV’s effect
across participants or contexts
*If we can replicate the IV-DV effect multiple times, either directly or systematically, then we can infer causation
Basic components of a small-N design
(A) Data Collection:
RQ: Does my intervention result in fewer
aggressive responses to the caretaker?
• Participants: Single participant exhibiting
aggressive behaviour
• The goal is to test an intervention to see if we can decrease the child’s aggressive behaviour
• DV: Counting the instances of verbal
aggression towards the caretaker during
playtime
• IV: My intervention is to verbally praise the
child when they are engaging in non-
aggressive interactions during playtime.
• Keeping track of child’s progress with a
graph (responses on y-axis, trials on x-axis)
o Begin with baseline condition
(behaviour without intervention; multiple
trials to have ”normal” behaviour to
compare the intervention to) (A)
o Intervention phase (several
observations/trial) which we compared to
the baseline to test whether their target
behaviour increases, decreases or stays
the same.
o Series of trials/observations which happen
under the same conditions are called a
“phase” (i.e., baseline and intervention
phase). (B)
*Goal is to establish whether the intervention affects behaviour relative to baseline
How do we decide how many observations we need?
Enough to find a consistent pattern or
stability in their responding. How can we
check this?
(A) Has their behaviour reached stability in
its level?
• Magnitude or height of the data (average
the observations within a phase and see
where it intersects with y-axis; does it
cluster differently in the baseline relative to
the intervention phase?)
• On average are the baseline points higher
than the average intervention points?
• We build a band around the mean score for the observations. As in a group-means analysis, not all observations will fall within the band; they are variable, and the amount of variability can differ.
(B) Has their behaviour reached stability in
its trend?
• Do the data points follow a similar trend (the general direction of the behaviour over time across trials: increasing, decreasing, flat, or inconsistent)?
• Draw a straight line through the data points
so ½ are above and ½ are below the trend
line.
• Is the trend different in the baseline and
intervention phase? Direction and slope.
*we conduct observations until we can identify a stable trend or level within the data. Stable behaviour has low variability and means that when we change phases we can be confident that changes in the participant’s behaviour are due to the IV and not other variables.
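A minimal sketch of checking level and trend for a phase (the observation counts are made up; the least-squares slope stands in for the hand-drawn trend line described above):

```python
# Summarise a phase of observations by its level (mean) and trend (slope).
def level_and_trend(obs):
    n = len(obs)
    mean = sum(obs) / n                  # level: average height of the data
    xs = range(n)                        # trial numbers 0..n-1
    x_mean = sum(xs) / n
    # Least-squares slope: general direction of behaviour across trials.
    num = sum((x - x_mean) * (y - mean) for x, y in zip(xs, obs))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    return mean, slope

# Hypothetical baseline counts of aggressive responses per session.
baseline = [8, 9, 7, 8, 9, 8]
mean, slope = level_and_trend(baseline)
# A near-zero slope with low spread around the mean suggests a stable
# phase, so switching to the intervention phase would be defensible.
```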
What if behaviour does not reach stability?
If we switch phases before it has reached
stability then it makes it very hard to
identify if the intervention had an effect
because it too is highly variable.
We cannot be confident the behaviour
change is due to IV.
How can we address unstable data (trend
or level)?
• The first solution is to keep taking
observations at baseline till a clear trend or
level emerges (over days or weeks). The
number of baselines needed to reach
stability will vary across participants!
How do we decide when to change phases?
- Internal Validity (stability)
• Do I have enough observations to be
confident that their behaviour is stable and I
can identify the effects of IV on behaviour.
• For example, we would not want to stop at trial three because the behaviour switched from a downwards trend to flat, then increased and decreased (more variable than originally assumed)
- Ethical Concerns
• How did they behave at baseline and what
is the nature of their behaviour?
• What trend does the child’s behaviour have? If it is going in a downwards trend we would need fewer observations; with too many we would not have confidence that the IV had an effect rather than the behaviour following a natural downwards trend.
• Concerns internal validity and Ethical
concerns.
• If the behaviour is already improving on its own, is it ethical to intervene?
• If the behaviour is highly threatening or harmful (self-harm or compulsive eating of non-consumable food), it may be unethical to have a lengthy baseline period, depending on the type of behaviour. The intervention needs to be started as soon as possible, in some cases after a single baseline point.
• If the intervention worsens a harmful
behaviour it would be unethical to continue
taking observations and would need to stop
the intervention immediately.
• The second option is to look for trends
across multiple observations within a phase
and see if they correspond with any
extraneous variables which could be
removed or minimised (person present or
context etc.)
§ Establishing causality
• Control over other variables is achieved by
• Establishing a baseline for the behaviour
without intervention (provides a control
group to compare our intervention to; can
help address some threats to internal
validity that if present should appear in
BOTH conditions; maturation, observation
effects)
• Collecting multiple observations until we
see consistency in behaviour (internal
validity; keep track of extraneous variables
and compare them to any behaviour
fluctuation changes to ensure behaviour
change is due to IV)
• Replicating the change in DV with the
introduction of the intervention (minimising
effect is due to confounds)
o Is one AB relationship enough to
demonstrate this?
o Nope, ABA or ABAB is a better design!
o AB designs cannot control for history
effects.
o It doesn’t establish consistency of the
effect of this change within the participant.
Gorilla study:
• If people are busy focusing on the players in white, they often do not notice the gorilla walking through the scene.
• Variables: BMI and number of steps (a data
set for men and another for women =
legend) continuous
• Conditions: 3x hypothesis or hypothesis
free
• Q: what would you conclude from this data
set?
- Did people visualize the data? If graphed, the data form a picture of a gorilla. People in the hypothesis condition were less likely to find the gorilla.
- This is to say that deductive (day) and inductive (night; exploration) methods are both important. Data exploration can inform future deductive studies.
Figures:
- Translate numerical data points into a visual graph, which allows us to find patterns in our data and understand it better.
- Visualizing data helps us see what patterns we expect to find; our inferential statistics then support our intuitions.
One-Way ANOVA
(4) Figures
- Estimated marginal means in ANOVA
– shows mean and standard error. Can add individual data points. Useful for quickly seeing the pattern in the data, spread, and weird data points (outliers)
- Histogram
– shows the frequency of data points in each “bin” of a measured variable.
– Density plot: a smooth line that estimates the shape of the histogram.
– Useful for seeing the shape of the distribution (skew, kurtosis, bimodal, normal, etc.)
- Box plots
– lots of info in one plot
– Median (bar)
– Interquartile range (IQR: 25th to 75th
percentile – middle 50% of the data)
– Range (whiskers are 1.5*IQR)
– Skew (if the whiskers are unequal
lengths)
– Outliers (outside the whiskers – check
these for errors)
– Can ask for this as a violin plot: an overlay of the shape of the distribution over the box plot. How much overlap of individual data is there? Even if the IV is significant there will still be a lot of individual overlap.
– 25% of scores sit above and 25% below the blue box. If the whiskers are not equal in length, this indicates that the data are skewed (more data either above or below the IQR).
- Bar Graph
- Bar plot – shows means and SEs in a simple, clean format
- SE error bars
- I lose information in this format (outliers, shape of the distribution), but bar plots are a very clean and effective way to communicate mean group differences.
*different figure types have different roles in helping us visualize our data
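The box plot components listed above can be computed by hand; quartile conventions vary across software, and this sketch (with made-up data) uses NumPy's default linear interpolation:

```python
import numpy as np

# Compute the pieces of a box plot for a small hypothetical sample.
data = np.array([3, 5, 6, 7, 8, 9, 10, 11, 12, 30])

median = np.median(data)                 # the bar in the middle of the box
q1, q3 = np.percentile(data, [25, 75])   # edges of the box
iqr = q3 - q1                            # middle 50% of the data

# Whiskers extend to 1.5 * IQR beyond the box; anything outside is
# flagged as a potential outlier (check these for data-entry errors).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
```

Here the extreme value 30 falls outside the upper fence, so a box plot would draw it as an individual outlier point rather than extending the whisker to it.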
Factorial ANOVA
2 (Drug: Prozac, Placebo) x 2 (Therapy: CBT, Waitlist) Factorial ANOVA
(4) Figures
- Estimated Marginal Means
- Estimated marginal means for the
interaction term – captures means in all
conditions, easy to see pattern of main
effects and interactions, SE bars help you
estimate significant differences
- Main effect of therapy, no main effect of drug, and an interaction. If a main effect of drug were present, it would be qualified by the interaction!
- Histogram
- Histogram and density plots – see the
distributions in each condition.
- Plots for each group presented in a 2 x 2 table
- Outlier, normal, and skewed data in these
- Box Plot
- Boxplots – easily see pattern of pairwise
comparisons, overlap of conditions, and
outliers.
- Outlier in CBT-Prozac group (need to
check data to see if valid data point) and
has the highest gain
- Better picture of the group distribution
- The CBT Placebo whisker is only on one
side indicating it’s a skewed group
- Waitlist Placebo is skewed
- Can add the violin (distribution) overlay on top
- Bar Plot
- 2 x 2 bar plot – clean presentation of
means and SEs. Identify main effects and
interactions when clusters or colours of
bars show different patterns
Using Figures and Data Visualisation to Communicate your Research
*Titles, abstracts, and figures are what scientists read first in the scientific literature. They convey lots of information in a compact form.
Three rules when designing figures (alongside APA)
1. Use the right type of figure for your data
(association/causal claim,
continuous/categorical)
2. Make the findings easy to see (best visual
representation to highlight patterns in data)
3. Don’t mislead the reader (intentionally or
unintentionally)
Not all figures have to be APA style!
- presentations may need simpler figures than
manuscripts
- in professional reports, format for maximum
clarity for your readers
- there are many instances where people use graphs; the type of graph and level of detail required depends on your audience. Design figures to be as clear as possible for the reader.
Figures for:
- Association Relationships
- Clustered Bar Graph
- Line Graph
- Association Relationships
- Scatterplot – plots the association between two continuous variables. Each data point is an observation or person.
- X-axis – predictor
- Y-axis – outcome
- Regression line – line of best fit that explains the relationship between the variables.
- Rarely used in experimental research (which usually makes causal claims)
- Each point is an observation; this shows a negative (-) correlation between sleep and grumpiness
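A sketch with made-up sleep/grumpiness numbers, fitting the regression line with NumPy; the negative slope is the negative correlation described above:

```python
import numpy as np

# Hypothetical data: hours of sleep (predictor, x-axis) and
# grumpiness score (outcome, y-axis) for ten people.
sleep = np.array([4, 5, 5, 6, 6, 7, 7, 8, 9, 9])
grumpiness = np.array([90, 85, 80, 70, 72, 60, 55, 48, 40, 35])

# Least-squares regression line: grumpiness = slope * sleep + intercept.
slope, intercept = np.polyfit(sleep, grumpiness, 1)

r = np.corrcoef(sleep, grumpiness)[0, 1]  # correlation coefficient
# slope and r are both negative: more sleep, less grumpiness.
```

On a scatterplot this line would be drawn through the cloud of points, with each person's (sleep, grumpiness) pair as one dot.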
- Clustered Bar Graph
- comparing group means; nominal IV and
DV is continuous
- Nominal (control or experimental condition)
- Clustered Bar Graph – use for factorial
designs with nominal IVs and continuous
DV. One variable on x-axis, the other within
the clusters (legend)
- Include some measure of variability (e.g.,
SD, SE).
- Identify IVs and DVs in the figure caption.
- Label axes, include legend.
- Use colour, but make sure it works in
greyscale (e.g., darker/lighter shades or
textures)
- Organise to make key comparisons easy
for your reader.
- Put the most important IV on the x-axis and the other IV in the legend.
- Line Graph:
- use for factorial designs if one of your
variables represents an underlying
continuous dimension (dose, age, word
frequency, etc). Put continuous variable on
the x-axis, use separate lines for the other
dimension.
- Use the range that clearly shows your
effects. Significant differences should look
big, nonsignificant differences should look
small.
- Caffeine dosage is continuous because its levels are meaningfully linked (low–medium–high; there are points in between them, in a meaningful order), in contrast to therapy, which consists of discrete categories.
- Include 0 on the y-axis if scores run 0–100% correct, so the scale makes sense, but when using reaction time it is not appropriate.
- Using 0-… to capture the full range of scores would deemphasise a significant difference, so we would not include it, but we would if we had no significant differences (main effects/interactions). Design the graph to make differences clear. If there is no difference, do not design it to make it seem like there is an effect (= misleading). We can be flexible with the scale as long as we are responsible with it.
Identifying misleading data visualization
Manipulation of data presentation to mislead the reader:
- Popular media do not make data visualizations as sophisticated as those in scientific papers.
- A single variable plotted against time is used frequently in popular media.
- Very, very few used multiple-variable graphs (other than time courses): less than 5.6% across many popular media sources in the 1900s.
- Data visualizations are becoming very popular in popular media and are more sophisticated now.
- They tell deep stories in a condensed and digestible way. Dynamic and memorable. The downside is that this is new to people, and it is easy to be misled by graphs that are just colorful bullshit.