Effect Size Flashcards
why is effect size useful?
experiments intend to find the effect of your IV on a DV
what does a p value tell us?
measures the probability of obtaining the observed results, assuming that the null hypothesis is true
- tells us whether an effect exists
what does the p value not tell us?
the size of the effect
what does effect size measure?
indicates the proportion of the variance explained
what effect size is used for correlation and regressions?
r and R²
what effect size is used for T-tests?
Cohen's d
what effect size is used for one way ANOVAs?
Eta Squared
what effect size is used for factorial anovas (where the effect is made up of different variables)?
Partial Eta Squared
how do you calculate Cohen's d?
the difference between the means divided by the standard deviation
what is Cohen's d?
standardised score representing the difference between the group means
d = (M1 - M2) / σ
d is the difference in means -> scaled by the standard deviation
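The calculation above can be sketched in Python (a minimal illustration, assuming two independent groups and using the pooled standard deviation as σ):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: difference in means scaled by the pooled standard deviation.

    Illustrative sketch; group1/group2 are lists of scores from two
    independent groups with roughly equal variances.
    """
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    # pooled SD: group variances weighted by their degrees of freedom
    sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / sd
```

Because the mean difference is divided by the SD, the result is unit-free, which is what makes d comparable across studies and sample sizes.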
what is Cohen's d an example of?
a standardised score (conducted with means rather than individual scores)
what are the benefits of scaling by the standard deviation in Cohen's d?
- d does not depend on sample size (you can compare Cohen's d for a small pilot study and the full experiment because it's standardised, divided by the standard deviation)
- allows you to compare your study's effect with other studies in the literature
what is Cohen's d for ANOVA (where there are multiple groups)?
difference between the largest and smallest group means, scaled by the standard deviation
Cohen's d for ANOVAs: calculation
d = (M_max - M_min) / σ
what are the assumptions of the Cohen's d for ANOVAs calculation?
- the standard deviation is assumed to be constant across groups
- only true for samples that meet the ANOVA assumptions (homogeneity of variance assumption -> assessed using Levene's test)
There are various conventions to calculate the overall standard deviation for an ANOVA. What are these?
- averaging the group SDs
- taking the smaller SD (more conservative)
- pooling the variance (combining the group variances, weighted by their degrees of freedom)
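The three conventions for the overall SD can be sketched like this (illustrative Python; group sizes and scores are made up):

```python
import math

def group_sd(g):
    """Sample standard deviation of one group."""
    m = sum(g) / len(g)
    return math.sqrt(sum((x - m) ** 2 for x in g) / (len(g) - 1))

def overall_sigma(groups, convention="pooled"):
    """One sigma for the whole ANOVA, under one of the three conventions."""
    sds = [group_sd(g) for g in groups]
    if convention == "average":
        return sum(sds) / len(sds)          # average the group SDs
    if convention == "smallest":
        return min(sds)                     # take the smaller SD
    # pooled: group variances weighted by their degrees of freedom
    num = sum((len(g) - 1) * group_sd(g) ** 2 for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)

def anova_d(groups, convention="pooled"):
    """d for ANOVA: (largest group mean - smallest group mean) / sigma."""
    means = [sum(g) / len(g) for g in groups]
    return (max(means) - min(means)) / overall_sigma(groups, convention)
```

When the groups genuinely have equal variances (the homogeneity assumption), all three conventions give the same answer; they only diverge when that assumption is shaky.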
what is eta squared?
proportion of variance explained by your experiment
(variance that's explained by your effect / total variance of the model)
- used for one-way ANOVAs
why is eta squared only used for one way ANOVAs?
because there is only one variable, and therefore only one effect size
what is the equation for eta squared?
η² = SS_effect / SS_total
how to calculate eta squared?
divide the sum of squares for the effect by the total sum of squares
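As a sketch, η² can be computed from raw scores for a one-way design (illustrative Python; SS_effect here is the between-groups sum of squares):

```python
def eta_squared(groups):
    """η² for a one-way ANOVA from raw group scores.

    SS_total is the variability of every score around the grand mean;
    SS_effect is the part of it attributable to group membership.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_effect = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                    for g in groups)
    return ss_effect / ss_total
</```

If the group means are identical, SS_effect is 0 and η² is 0; if all variability is between groups, η² approaches 1.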
what is partial eta squared?
proportion of variance that is uniquely explained by each variable -> tells you how strong/big the effect of each of your variables is
- used for factorial ANOVAs (reported in the 'estimates of effect size' ANOVA table)
why is partial eta squared used for factorial ANOVAs?
there is more than one variable, and therefore more than one effect size
what does the p value measure here?
whether the percentage of variance explained is statistically significant (not how large it is)
what is the equation for partial eta squared?
partial η² = SS_effect / (SS_effect + SS_error(/residual))
what is partial eta squared, expressed as a ratio?
the ratio of the variance associated with an effect to the sum of that effect's variance and its associated error variance
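From the formula, a minimal Python sketch (SS values here are hypothetical):

```python
def partial_eta_squared(ss_effect, ss_error):
    """partial η² = SS_effect / (SS_effect + SS_error).

    Unlike η², the denominator excludes variance explained by the
    *other* effects in a factorial design, so each effect is judged
    against only its own error variance.
    """
    return ss_effect / (ss_effect + ss_error)
```

e.g. with SS_effect = 13.5 and SS_error = 4.0 (made-up numbers), partial η² ≈ 0.77.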
what are (partial) eta squared scaled between?
0 (none of the variance) and 1 (100% of the variance)
how do you report effect size?
alongside other statistical information:
F(df) = …, p = …, η² = …
can you have a significant effect, yet a really small eta squared?
yes -> in this case, think about how meaningful the effect is and what else could be having an effect on your dependent variable (if there is only a small significant effect)
what is power analysis?
used to determine whether your design is appropriate and is likely to find the effect you're looking for
- how powerful your design is determines how confident you can be about the result
when is a power analysis run?
usually before you start collecting data
what does the 'power' of a statistical test describe?
- its ability to detect an effect when it is actually there (like a powerful lens that lets you see the effect)
- its ability to correctly reject the null hypothesis (you can confidently say that, even zooming in, there is nothing there)
what does power depend upon / what is it influenced by?
sample size, criteria for significance and effect size
(power depending on) sample size
more participants = increased power and a greater chance of finding a significant effect (if there is one)
* useful because it tells us how many subjects we need to detect a given effect size at a given power
what is usually considered a good level of power?
around 0.8
- can help you work out that you may need, say, at least 17 subjects to achieve a significant effect at a power level of 0.8
[the minimum number of participants to test in order to find an effect]
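The sample size / power relationship can be illustrated with a rough normal-approximation sketch for a two-group comparison (an assumption for illustration; real tools such as G*Power use exact noncentral distributions, so numbers will differ slightly):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(p):
    """Inverse of phi, found by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed two-sample test."""
    z_crit = z_quantile(1 - alpha / 2)
    return phi(abs(d) * math.sqrt(n_per_group / 2) - z_crit)

def n_for_power(d, target_power=0.8, alpha=0.05):
    """Smallest n per group whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target_power:
        n += 1
    return n
```

Under this approximation, `n_for_power(0.8)` suggests roughly 25 participants per group for d = 0.8 at power 0.8; power rises steadily as n grows, which is the point of the card above.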
(power depending on) effect size
the ability to find an effect we know is there in reality
* smaller effect sizes need more participants to achieve a given power -> interaction between number of participants, power and effect size [if we know at least two of these, we can estimate the remaining one]
(power depending on) criterion for significance [alpha level, α]
increasing/decreasing the alpha level influences how powerful your design is
what do we want to do about power?
make sure we have enough power to detect an effect
(be confident in that result, or be confident that there isn't a result -> that a failure to reject the null hypothesis indeed reflects a null result)
what are the dangers of an underpowered study?
- underpowered (too few participants)
- lack of power to detect an effect -> type 2 error (false negative)
- increased chance of a type 1 error (false positive) -> can't be confident in the result when the sample size is small
- If we estimate power across many published studies it is often worryingly low
- e.g. Button et al (2013) estimated the median power of studies is around 0.08-0.31
- Low power explains failure to replicate
- current replication crisis in Psychology (and other disciplines) -> and therefore we don't know who's right
power also carries ethical implications. why? research can be:
- Expensive
- Inconvenient
- Boring
- Uncomfortable
- Painful
- Dangerous
- Even Fatal (animal research)
what is a requirement for ethical approval now?
demonstrating your study has sufficient power
* there is no justification for running a study if it doesn't stand a reasonable chance of being informative, especially if it is painful, dangerous etc.
- wasting people's time is unethical too, as is causing someone discomfort if you cannot be confident in your result
what should you look into in order to be relatively confident about your findings?
pilot studies
what are four ways in which we can estimate effect size?
- Guess
- Pilot Study
- Find Previous Research
- Find or Conduct a Meta-analysis
- Guess
- good if there's not much literature on your phenomenon
- you can guess/estimate the effect size of your experiment by using Cohen's heuristics (small d = 0.2, medium d = 0.5, large d = 0.8)
- BUT it's not terribly satisfying nor super informative :/ [garbage in, garbage out -> based on nothing] -> but it can be used to say that even if the effect is tiny, we had enough participants to find it
- Pilot Study
- Fewer participants to estimate the effect size
- it does not matter if the pilot does not come out significant -> you can still use it to get an estimate of the effect size
- you can use this estimate to project how many participants you should test
- BUT estimates of effect sizes are largely unreliable with such small samples :/
- Find previous research
- literature search -> studies investigating similar phenomena
- use their results to work out an expected effect size for your experiment
- BUT these studies will not use the exact same design :/
- BUT a priori power estimates are not exact - you can't predict the future!
- Better than guessing!
- Find or conduct a meta-analysis
- literature search
- if there are many previous studies you can calculate an average effect size across all of their results
- common in drug trials
- often a meta-analysis will already have been published that you can take effect sizes from, or you can work them out yourself from the studies it covers
what is a problem with power analysis?
GIGO: Garbage In -> Garbage Out
* if you make up the numbers you enter -> what you get out is not meaningful
* estimates are not exact because you can't predict the future
* for really complex designs (e.g. factorial), it is unlikely you will have sufficiently precise estimates of the effect sizes
- main effects & interactions
- the power analysis may not be meaningful -> precision gets lower and lower as designs get more complicated -> should be interpreted with caution
what are the three types of power analysis?
- A priori
- Sensitivity
- Post Hoc
A priori
Calculate how many participants are required for a study (-> given effect size, power, alpha)
Sensitivity
Calculate minimum required effect size detectable (-> given power, sample size and alpha)
Post-Hoc
Calculate observed power (-> given effect size, alpha and sample size)
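The three types can be illustrated as solving the same relationship for a different unknown. This sketch uses a normal approximation for a two-group design (an assumption for illustration; G*Power's exact noncentral computations will give slightly different numbers):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(p):
    """Inverse of phi via bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def a_priori_n(d, power=0.8, alpha=0.05):
    """A priori: participants per group, given effect size, power and alpha."""
    z = z_quantile(1 - alpha / 2) + z_quantile(power)
    return math.ceil(2 * (z / d) ** 2)

def sensitivity_d(n_per_group, power=0.8, alpha=0.05):
    """Sensitivity: minimum detectable effect size, given power, n and alpha."""
    z = z_quantile(1 - alpha / 2) + z_quantile(power)
    return z / math.sqrt(n_per_group / 2)

def post_hoc_power(d, n_per_group, alpha=0.05):
    """Post hoc: observed power, given effect size, alpha and n."""
    return phi(abs(d) * math.sqrt(n_per_group / 2) - z_quantile(1 - alpha / 2))
```

Each function takes the two knowns for its type of analysis and returns the third quantity, mirroring the "know two, estimate the third" idea from the cards above.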
How to conduct a power analysis using G*Power
- Test Family: F tests (everything ANOVA, giving you an F ratio) [obviously t tests instead for t-tests]
- Statistical Test -> different options for the ones we are looking at -> we will typically look at ANOVA (select the one which is most appropriate)
- select the type of power analysis (Post Hoc, A Priori or Sensitivity)
- enter the study design in the input parameters section
- read the output -> e.g. if power is only 0.3, the design is underpowered
what will the critical F value tell us?
the minimum F ratio required to reach significance
Power analysis is useful in which situations?
- estimating how many participants you should recruit
- determining the power of a study design
- working out the minimum effect size a study could detect reliably
* there may be a cheat sheet online which is helpful