Test 2 Flashcards
Why do we use statistics?
We use statistics as checks on our own biases and to help us better answer our research question (RQ); to understand the shape of the data and to validate our intuitions about patterns within it.
Do music lessons make kids smarter? Causal claim (Mozart Effect)
Schellenberg (2004)
Method:
o Over 36 weeks, four groups of 6-year-olds had lessons added to their coursework. Children were taught keyboard, voice, or drama by qualified instructors, or received no lessons.
Why use three different treatment groups?
o No lessons is a control group, but having three other experimental conditions helps us gain a better understanding of what it is about music lessons that causes this effect on IQ.
o Comparing drama to music lessons helps us understand whether it's something specific about music or just being creative.
o Comparing the two music groups tells us whether it's music in general or a specific type of music lesson.
Matching:
o He matched the four groups on extraneous variables: age, family income (SES), and IQ before lessons. This is essentially a pre-test/post-test design, which allows us to compare IQ before and after the experimental manipulation (lessons).
Results:
o IQ gains were greater for music lessons (keyboard and voice) relative to the control and drama lessons. This illustrates that it is something about music that increases IQ, not creative classes in general.
o How do we know if these effects are meaningful? We need to use statistics to determine whether these between-group differences are statistically significant and not due to sampling error.
What do we need random assignment for?
> to create equivalent groups
> to meet the assumptions of t-tests and ANOVA
> to rule out confounds (a condition of causation)
If we are comparing three group means, why not run t-tests?
We would need multiple tests, and the more tests we run, the more we inflate our false-positive rate.
Analysis of Variance (ANOVA)
F statistic: between-groups variance (how groups differ from each other) / within-groups variance (how people differ from others in their group).
Comparing the effect due to the IV to the variance which naturally occurs in your population.
If the null hypothesis is not true, the sampling distributions for the groups should not overlap much; the means should be quite different, giving us a big F statistic.
We want between-groups variance to be high and within-groups differences (noise) to be small.
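As a quick illustration of this ratio, here is a minimal Python sketch using SciPy's one-way ANOVA on made-up IQ-gain scores for three hypothetical groups (the numbers are invented, not Schellenberg's data):

```python
# Minimal sketch of the F ratio, assuming made-up IQ-gain data
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
keyboard = rng.normal(7, 3, 12)  # hypothetical IQ gains, 12 kids per group
voice = rng.normal(6, 3, 12)
control = rng.normal(3, 3, 12)

# f_oneway computes F = between-groups variance / within-groups variance
f_stat, p_value = stats.f_oneway(keyboard, voice, control)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

A big F means the group means differ by more than the within-group noise would predict.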
How do we calculate variance (s²)?
Null hypothesis: all kids are drawn from the same population (i.e., all 36 participants come from the same population and there is no effect of the IV on the groups).
Variance: calculate each participant's distance from the mean. Since some distances will be + and some −, they cannot simply be added together because they would sum to zero. Instead, we square each distance, add them all together, and divide by n − 1; squaring removes the − signs. A bigger number = more spread from the mean, and a small variance indicates that scores fall close to the mean.
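A minimal sketch of this variance calculation in Python, assuming a handful of made-up scores:

```python
# Minimal sketch of the variance formula described above:
# square each deviation from the mean, sum them, divide by n - 1.
import numpy as np

scores = np.array([4, 7, 6, 9, 5])  # made-up scores for illustration
mean = scores.mean()
s_squared = ((scores - mean) ** 2).sum() / (len(scores) - 1)
print(s_squared)               # 3.7
print(np.var(scores, ddof=1))  # same result via NumPy
```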
Total Sums of Squares (SStotal)
SSbetween: sums of squares between groups (between-groups variance)
SSwithin: sums of squares within groups (within-groups variance)
SSbetween: Sums of Squares Between Groups
Take the mean for each group and compare it to the overall grand mean.
Square each group mean's deviation from the grand mean, weight it by the group size, and add these together to see how much the groups differ from the overall mean.
If the group means are all different from one another, SSbetween will be large. If they're very similar, it will be small and no effect of the IV is present.
SSwithin: Sums of Squares Within Groups
We want within-group variance to be small (noise).
How does each participant differ from their group mean?
Sums of Squares and Mean Squares
§ In ANOVA we calculate variance using a technique called sums of squares (SS).
o SStotal = how much each participant varies from the overall mean (squared)
o SSbetween = how much each group varies from the overall mean (squared)
o SSwithin = how much each person varies from their own group mean (squared); the spread of data within one group
o SStotal = SSb + SSw
§ Mean Squares (MS) are adjusted for n by dividing SS by df
o dfbetween = #Groups − 1
o dfwithin = #Participants − #Groups
§ MSbetween = SSb/dfb (Mean Square Between groups = sums of squares between divided by degrees of freedom between)
§ MSwithin = SSw/dfw (Mean Square Within groups = sums of squares within divided by degrees of freedom within)
§ F = MSb/MSw (F statistic = Mean Square Between divided by Mean Square Within)
§ WARNING: MSw is also called MSresidual or MSerror
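To make the SS → MS → F pipeline concrete, here is a hedged Python sketch on three small made-up groups (n = 4 each, not the 3 × 12 class example):

```python
# Sketch of the SS -> MS -> F pipeline described above, with made-up data
import numpy as np

groups = [np.array([3, 4, 5, 4]),
          np.array([6, 7, 6, 7]),
          np.array([9, 8, 10, 9])]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
assert np.isclose(ss_total, ss_between + ss_within)  # SStotal = SSb + SSw

df_between = len(groups) - 1               # #Groups - 1
df_within = len(all_scores) - len(groups)  # #Participants - #Groups
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within            # F = MSb / MSw
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```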
Mean Squares
The more people we have in each group, the bigger the sums of squares will be. What we want to know is, on average, how far people are from the mean (the mean square).
Mean Squares (MS) are adjusted for n by dividing SS by df
o dfbetween = #Groups − 1
o dfwithin = #Participants − #Groups
MSbetween = SSb/dfb
MSwithin = SSw/dfw
*There are two degrees of freedom in ANOVA (between: top/numerator; within: bottom/denominator).
ANOVA F Statistic Calculation:
F Distribution
§ F = MSbetween/MSwithin
§ We compare the F statistic to the F sampling distribution, which tells us how often the null hypothesis can produce an F that big.
§ All F values are positive; the distribution starts at 0 and is one-tailed, peaking just under 1.
§ The bigger the F statistic, the less likely it is to have been produced by the null hypothesis.
§ We want the between-groups variance to be bigger than the within-groups variance for the F statistic to be big.
§ An F of 1 tells us that the between-groups variance is not much bigger than the within-groups variance (no effect of the IV; the null is true).
§ Critical region: if the null hypothesis is true, it will produce an F statistic of 3.5 or bigger less than 5% of the time.
F Distribution
F is a sampling distribution of possible F values if the null hypothesis is true.
The exact size/shape of the F distribution depends on the degrees of freedom.
If the groups differ from each other a lot compared to how much people (or animals) differ from others in their condition, you get a large F.
Reject the null if p < .05.
F is always positive.
There is no difference between a one-tailed and a two-tailed F.
Different df give different F distributions.
§ df(x, y)
o x = number of groups − 1
o y = total N − number of groups
§ In our example, we have 3 groups of 12 = 36 participants, so the df would be (2, 33).
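A small sketch of querying the F sampling distribution with SciPy for df = (2, 33), the values from the example above (the observed F below is hypothetical):

```python
# Looking up the F distribution for df = (2, 33) with SciPy
from scipy import stats

df_between, df_within = 2, 33
f_crit = stats.f.ppf(0.95, df_between, df_within)  # 5% critical value
print(f"Reject the null if F > {f_crit:.2f}")

observed_f = 5.0                                   # hypothetical F value
p = stats.f.sf(observed_f, df_between, df_within)  # upper-tail p-value
print(f"p = {p:.3f}")
```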
How does Jamovi treat subject variables in a quasi-experimental factorial ANOVA?
o A one-way ANOVA compares three group means. Jamovi doesn't care whether the IV is manipulated within or between subjects, and it can use subject variables; statistically, all that matters is that you have three groups. In experimental studies, however, we need the IV to be manipulated (not a subject variable).
o ANOVA is not useful JUST for experimentalists; it's used any time you want to compare three group means. Experimentalists and non-experimentalists both use it, and both use correlation or regression as well.
Grand mean
The grand mean is useful because it corresponds with the null hypothesis: if the null is true, the 3+ groups are sampled from the same population, and the group means will all be close to the grand mean.
Recap about F Ratios
o We calculate the F statistic, a ratio: between-groups variance / within-groups variance (how participants vary around their group mean). We want the between-groups variance to be bigger than the within-groups variance to get a big F statistic, making us more likely to reject the null hypothesis.
o We calculate F (= mean squares between / mean squares within) by first calculating the mean squares: Mean Square Between = SSbetween/dfbetween and Mean Square Within = SSwithin/dfwithin. Jamovi does this all for you. We then use the F statistic to identify its corresponding p-value by comparing it to the F sampling distribution (which depends on the df, i.e., sample size and number of groups). Under the null hypothesis, the groups come from the same population and differences are due to sampling error, not the IV. The 5% rejection region is where there is less than a 5% chance of making a false positive, so a result there likely reflects an IV effect.
o Unlike t-tests, where there is one df (N − #groups), ANOVAs have two (one for the numerator and one for the denominator): df = (x, y), where x = number of groups − 1 and y = total N − number of groups (e.g., (2, 33) for 3 groups of 12 = 36).
A significant F tells me that my groups differ, but not how they differ.
All the F tells me is that there is a group mean difference that is statistically significant. It doesn't tell me which groups are different or in what direction. We need to look at our descriptive statistics and each group's mean to find this out!
Remember:
I could do t-tests to compare each group to each other group.
o 3 t-tests
o Probability of a false positive for each one = .05
o Probability of a false positive in at least one of them = ~.15
o i.e., roughly .05 × the number of tests you run
Therefore, we use post-hoc tests, which adjust for the number of tests you run (see the sketch below).
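A quick check of that inflation, using the exact familywise formula rather than the rough multiply-by-k shortcut:

```python
# Familywise false-positive risk for 3 uncorrected tests at alpha = .05:
# exactly 1 - (1 - alpha)^k, close to the alpha * k approximation
alpha, k = 0.05, 3
print(1 - (1 - alpha) ** k)  # ~0.143, i.e. roughly .05 x 3 = .15
```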
Post-hoc tests
Comparisons of pairs of means after finding a significant F.
Used when I have no hypothesis about how the means might differ from each other (two-tailed; no hypothesis about the direction of the mean group difference[s]).
o Post-hocs are like t-tests comparing 2 means, but they have been adjusted to correct for the increased chance of Type 1 error.
o They penalise you for running multiple tests by being stricter on the significance value, so that after all of them are done the collective false-positive risk adds up to .05. Generally, this means .05 divided by the number of tests you run.
Note: we can do contrasts instead of post-hoc tests if we have a prediction about the direction of the mean group difference (one-tailed).
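The notes use Jamovi, but as an illustration, here is a hedged Python sketch of the same idea using statsmodels' Tukey HSD on made-up scores (the group labels are invented):

```python
# Tukey post-hoc comparisons of all pairs of group means, with
# correction built in; data and labels are made up for illustration
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([3, 4, 5, 4, 6, 7, 6, 7, 9, 8, 10, 9])
classes = np.array(["keyboard"] * 4 + ["voice"] * 4 + ["drama"] * 4)

# Compares every pair of group means, adjusting for the number of tests
print(pairwise_tukeyhsd(scores, classes, alpha=0.05))
```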
Post-Hoc on IV (class):
o No correction = t-test comparisons without any adjustment
o Tukey (a general-purpose choice that fits most situations; not too strict or too loose)
o Tick effect size (how big is the mean group difference? This cannot be answered with the p-value! Since each post-hoc is now a two-group mean difference test, the effect size we use is Cohen's d.)
The df does not change (unlike a t-test, where the df would be 22 = 12 × 2 − 2; in the ANOVA it stays 33 = 12 per group × 3 − 3). ptukey tells us the significance of a group difference, and Cohen's d tells us how big that difference is. These Cohen's d values are big effect sizes. We identify the direction of group differences by looking at the data (which group mean is higher?), not the p-value or the +/− sign of Cohen's d! A sketch of the Cohen's d calculation follows below.
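A minimal sketch of Cohen's d for one pair of groups, assuming made-up scores; the pooled-SD formula here is the standard one, though software may use slight variants:

```python
# Cohen's d: mean difference divided by the pooled standard deviation
import numpy as np

def cohens_d(a, b):
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * np.var(a, ddof=1) +
                  (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

music = np.array([7, 9, 8, 10])  # made-up IQ gains
drama = np.array([4, 5, 3, 4])
print(cohens_d(music, drama))    # sign depends on the order of the groups
```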
• Only do post-hocs if the interaction is significant.
• Check the specific means you want to compare (use the graph to help you decide).
• No corrections are necessary as long as your comparisons are relevant to your hypothesis.
Planned Comparisons/Contrasts (instead of post hocs!)
• Sometimes, a certain comparison is
critical to testing my hypothesis.
• Just do it! (just don’t do too many of
them, and make sure they are justified by
the hypothesis).
Factorial Designs
2 or more independent variables
• Each variable can be manipulated within- or between-subjects.
• Each variable can have 2 or more levels (that's what makes them categorical!).
• Some variables can be subject variables (a quasi-experiment; e.g., male or female jurors; a good way to test the generalisability of the findings to other groups).
Why add variables?
i.e., factorial designs
1. It is efficient: assess 2 or more causes at once.
2. To refine a theory (because it depends… e.g., the Stroop effect).
3. To isolate a particular process of interest.
4. To assess change over time (e.g., in a pre-test/post-test design; mindfulness vs. Pilates).
5. To increase external validity (extend to other populations, stimuli, situations… subject variables in the negotiation phase).
Note: Interaction and moderation are the same thing. Moderation looks at continuous predictors, but experimental ANOVA uses categorical predictors (levels of the IV).
Hypothesis in Factorials
can be:
> interaction only (i.e., no main effect, but an effect dependent on the level of the other IV)
> main effects & interaction
*We always want to know about an interaction.
Variables plotted on graph
The DV always goes on the y-axis; either IV can go on the x-axis, with the other in the key.
We decide which IV goes on the x-axis by referring back to our research question to see what makes the most sense.
Main effects are ___s and interactions are ___s of _____s
• Main effects are the averages.
• To look at the main effect of drug, we average both therapy groups within placebo, average both therapy groups within Prozac, and compare. Is one average higher than the other? Yes, Prozac appears to work better than placebo.
• To look at the main effect of therapy, we average both CBT groups and compare that to the average improvement in both waitlist groups. Is one average higher than the other? Yes, the CBT groups improved more than the waitlist control groups.
• Interactions are differences between differences.
• We calculate the difference in means between the two groups at each level of one IV (these differences will differ across levels). Then we calculate the difference between those differences (i.e., the difference of differences between Prozac and placebo, and the difference of differences between CBT and waitlist, should equal the same value either way you do it; this tells us the magnitude of the interaction). A sketch of this arithmetic follows below.
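A tiny sketch of this difference-of-differences arithmetic, assuming hypothetical cell means for the 2 × 2 drug-by-therapy example:

```python
# Interaction as a "difference of differences", with made-up cell means
cell_means = {("placebo", "waitlist"): 2.0,
              ("placebo", "cbt"): 3.0,
              ("prozac", "waitlist"): 2.5,
              ("prozac", "cbt"): 6.0}

# Effect of drug at each level of therapy
drug_effect_waitlist = (cell_means[("prozac", "waitlist")]
                        - cell_means[("placebo", "waitlist")])
drug_effect_cbt = (cell_means[("prozac", "cbt")]
                   - cell_means[("placebo", "cbt")])

# The interaction is the difference between those differences;
# computing it the other way round gives the same value
interaction = drug_effect_cbt - drug_effect_waitlist
print(interaction)  # 2.5
```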
Difference between Oneway ANOVA and ANOVA?
• One-way ANOVA: 1 IV with three or more levels.
• Factorial ANOVA: 2 or more IVs, each with 2 or more levels.
A factorial ANOVA (two IVs) has…
two dfs for each F statistic, and three F statistics (one for each main effect and one for the interaction).
If a significant main effect has more than 2 levels, you need to do a…
post-hoc test to determine where the differences are (just like in a one-way ANOVA).
What are the advantages of combining two IVs in the same study?
• Using the same data, running an ANOVA with 2 IVs rather than 1 gives different p-values and F ratios!
• "Drug" has a larger effect size and lower p-value when Therapy is included in the model!
• Why?
o Look at the residuals.
o The residuals (the denominator; unexplained variance) are larger when only one IV is included, which reduces the size of the F statistic, makes the p-value larger, and makes partial eta squared smaller.
o Some of that variability can be explained by the second IV if it is included in the model (moved from the denominator to the numerator).
o This only works if the added IV is related to the DV (it increases power); if not, it will only make things worse.
- Adding (useful) factors decreases the residuals, increasing F and decreasing p. A sketch of this comparison follows below.
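The notes describe Jamovi output, but here is a hedged Python sketch of the same comparison using statsmodels (the improvement scores are invented): fitting the model with one IV and then with both shows the residual term shrinking.

```python
# One-IV vs. two-IV ANOVA on the same made-up data: adding a useful
# factor moves variance out of the residuals, raising F for "drug"
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "improvement": [2, 3, 2, 4, 3, 5, 2, 3, 5, 7, 6, 8],
    "drug": ["placebo"] * 6 + ["prozac"] * 6,
    "therapy": (["waitlist"] * 3 + ["cbt"] * 3) * 2,
})

one_iv = ols("improvement ~ drug", data=data).fit()
two_iv = ols("improvement ~ drug * therapy", data=data).fit()

# Compare the residual sums of squares and the F/p for "drug"
print(sm.stats.anova_lm(one_iv, typ=2))
print(sm.stats.anova_lm(two_iv, typ=2))
```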
Interaction Write Up:
& Main Effects
Describing a significant Interaction:
1. Split one of the IVs into levels
2. Compare the effect of the other IV at each
level
3. Explain how the differences are different
• For participants in CBT, the drug produced
improvement over placebo. However, in
those on the waitlist, the drug was no more
effective than placebo.
• For participants taking the drug, CBT was more effective than staying on the waitlist. For participants taking the placebo, CBT was no more effective than the waitlist.
• The effectiveness of the drug depended on therapy.
• The effectiveness of therapy depended on the drug.
What about the main effects?
• Main effect of drug isn’t meaningful.
• Main effect of therapy isn’t meaningful.
• These main effects are qualified by the
interaction.
ANOVA with within-subject variables
When both variables are manipulated within-subjects, the only difference is the statistics used: a repeated-measures ANOVA (we do not care about overall differences between people, just differences within individuals between conditions).
For a 2 × 2 within-subjects factorial, use a repeated-measures ANOVA (we use it any time there is a within-subjects variable; the only difference is whether you call it a mixed or a within-subjects factorial).
Jamovi doesn't have the theoretical variables in it; it only has the operationalized levels. We need to tell Jamovi what we are measuring.
Assumption of sphericity only matters …
Homogeneity of variance …
The assumption of sphericity only matters in within-subjects designs with three or more levels (with two levels and two means you have only one pair of variances, so you cannot violate sphericity; it applies only when comparing 3+ pairs of means).
Homogeneity of variance (whether the variance, i.e., the SDs, within groups differs) can possibly be violated even in a within-subjects design with two levels! A sketch of one common check follows below.
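As an illustration of checking homogeneity of variance outside Jamovi, here is a minimal sketch using SciPy's Levene test on two made-up sets of scores:

```python
# Levene's test for equality of variances; data are made up
from scipy import stats

g1 = [3, 4, 5, 4, 3]
g2 = [2, 8, 1, 9, 5]  # visibly more spread out

stat, p = stats.levene(g1, g2)
print(f"Levene W = {stat:.2f}, p = {p:.3f}")  # small p suggests unequal variances
```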
Small-N designs
Establishing causality when we do not have group means to compare
When do we use small-N designs?
• To establish a causal effect of IV on DV
within a small number of participants
- Research question concerns a very small
sample (not enough people to sample from)
- Situations where we cannot recruit a
sufficiently-powered sample (too small for
inferential statistics)
- When we expect substantial variability in individual responses (group means aren't useful for highly variable responses; the mean doesn't end up describing most participants!)
• A small-N design establishes causal
relationships through replicating the effect
of IV on the DV (to prove consistency)
- Consistent change in the DV as the IV is manipulated, with little variability (in level or trend)
- Direct replication of the IV’s effect within
the participant (exact same participants,
conditions and context)
- Systematic replication of the IV’s effect
across participants or contexts (different
participants, context or conditions)
• Control over other variables is achieved by
- Establishing a baseline for the behaviour
without intervention (acts as a control
group)
- Collecting multiple observations until we
see consistency in behaviour (more
confident in the IV effect on DV; helps
establish that the DV is under the control of
the IV and supports claims of causality!)
- Replicating the change in DV with the
introduction of the intervention (comparing
change in DV from baseline-intervention
change)
Is one AB relationship enough to demonstrate this? What is the problem with inferring causality from a single phase change?
An AB design does not rule out history effects: extraneous variables that may have caused the change in behaviour and so provide an alternative explanation for the results. A history effect is something that happens at the same time as the IV and offers an alternative explanation.
Solution = reversal design.