Effect Sizes Flashcards

1
Q

Why do we need to report effect sizes?

A

An effect size is an objective and standardised measure of the magnitude of an effect. Reporting effect sizes is crucial because they offer a clearer picture of the practical significance of results beyond the binary distinction of statistical significance (e.g., p-values). Here are the key reasons:

  1. Understanding Practical Importance
    Effect sizes quantify the strength or size of an effect, making it easier to assess whether the observed effect is meaningful or trivial in a real-world context.
    For instance, while a study might find a statistically significant difference, the actual effect might be too small to matter in practice.
  2. Comparability Across Studies
    By standardizing outcomes, effect sizes allow comparisons across different studies, even when they use different scales or metrics.
    This is critical in meta-analyses, which aggregate results from multiple studies to draw broader conclusions.
  3. Complementing Statistical Significance
    P-values only indicate whether an effect exists (or is unlikely to have occurred by chance) but do not reflect its size or relevance.
    Effect sizes provide additional information to interpret the importance of findings, especially when large sample sizes can make small, negligible effects statistically significant.
  4. Transparency and Reproducibility
    Including effect sizes promotes transparency by providing a fuller understanding of the data and its implications.
    It facilitates reproducibility, as future researchers can replicate studies and understand the expected magnitude of effects.
  5. Informing Policy and Practice
    For evidence-based decision-making, policymakers, educators, clinicians, and other practitioners need to understand how much of a difference an intervention or variable makes.
    Effect sizes guide these stakeholders in evaluating the impact and feasibility of applying research findings.
  6. Guiding Future Research
    Effect sizes help identify promising areas for future exploration by highlighting where meaningful impacts have been observed.
    Researchers can use effect sizes to perform power analyses, determining the sample sizes needed to detect effects in subsequent studies.
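
To make point 6 concrete, here is a minimal power-analysis sketch in Python, assuming the statsmodels library is available; the effect size d = 0.4 is a hypothetical value standing in for an estimate taken from earlier studies.

    # Sketch: planning a study from a previously reported effect size.
    # Assumes statsmodels is installed; d = 0.4 is a hypothetical estimate.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(
        effect_size=0.4,          # hypothetical Cohen's d from prior work
        alpha=0.05,               # conventional significance level
        power=0.80,               # desired chance of detecting the effect
        alternative="two-sided",
    )
    print(f"Required sample size per group: about {n_per_group:.0f}")
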
2
Q

Benefits of reporting effect sizes

A
  • Generally resistant to sample-size influence.
  • Encourages interpreting effects on a continuum.
  • Can be used to quantitatively compare the results of studies completed in different settings.

Pooling Effect Sizes (Meta-analysis):
  • Estimating the size of an effect in the population by pooling effect sizes from different studies that test the same hypotheses.
  • Can help resolve inconsistencies in findings.
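
As an illustration of pooling, below is a minimal fixed-effect (inverse-variance) meta-analysis sketch in Python; the effect sizes and standard errors are invented numbers for demonstration, not results from real studies.

    # Sketch: fixed-effect (inverse-variance) pooling of effect sizes.
    # The d values and standard errors are hypothetical illustrations.
    import math

    studies = [  # (Cohen's d, standard error) from three hypothetical studies
        (0.45, 0.12),
        (0.30, 0.20),
        (0.55, 0.15),
    ]

    weights = [1 / se**2 for _, se in studies]  # more precise studies weigh more
    pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))     # standard error of the pooled estimate

    print(f"Pooled d = {pooled_d:.2f} (SE = {pooled_se:.2f})")
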
3
Q

Problem of effect sizes: 'labels'

A

These disadvantages are highlighted in Baguley (2009). Below is an explanation in the context of Baguley's arguments:

  1. Are the Labels Misleading?
    Problem: Labels like “small,” “medium,” and “large” (e.g., Cohen’s benchmarks) may oversimplify or mislead interpretation. An effect size categorized as “small” might still have substantial practical significance in certain contexts (e.g., medical interventions where small effects can save lives). Conversely, a “large” effect may be irrelevant in some scenarios.
    Baguley’s Argument: Pre-defined benchmarks (“canned labels”) should be avoided. Effect sizes should be interpreted relative to the research context, not arbitrary thresholds.
4
Q

Problem of effect sizes: one size fits all

A
  1. One Size Fits All
    Problem: Standardized effect sizes (like Cohen's d or Pearson's r) are often treated as universally applicable, yet they may not suit all research goals or comparisons. For instance, they can obscure practical or theoretical importance because they standardize against sample variability rather than absolute values.
    Baguley’s Argument: There is no single metric that meets all needs, such as gauging effect importance, enabling comparisons, or aiding secondary analyses. Simple or unstandardized effect sizes (e.g., mean differences) are often more transparent and interpretable for practical purposes.
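
To illustrate Baguley's contrast, here is a small sketch in Python using invented scores on a hypothetical 0-100 test: the raw mean difference stays in the original units, while Cohen's d rescales that same difference by the pooled standard deviation of this particular sample.

    # Sketch: simple (unstandardized) vs standardized effect size.
    # The scores are invented values on a hypothetical 0-100 test.
    import statistics as st

    treatment = [72, 75, 78, 80, 74, 77]
    control = [68, 70, 73, 71, 69, 72]

    # Simple effect size: a mean difference expressed in test points.
    mean_diff = st.mean(treatment) - st.mean(control)

    # Standardized effect size: the same difference divided by the pooled SD,
    # so its value depends on how variable this particular sample happens to be.
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * st.variance(treatment) +
                  (n2 - 1) * st.variance(control)) / (n1 + n2 - 2)
    cohens_d = mean_diff / pooled_var ** 0.5

    print(f"Mean difference = {mean_diff:.1f} points; Cohen's d = {cohens_d:.2f}")
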
5
Q

Problem of effect sizes: considering questionable research practices

A
  1. Doesn’t Consider Potentially Questionable Research Practices
    Problem: Standardized effect sizes can be inflated by questionable practices like p-hacking, selective reporting, or range restriction. These practices distort reliability and comparability, making standardized metrics less trustworthy.
    Baguley’s Argument: Correcting standardized effect sizes for issues like measurement error or restricted range is theoretically possible but practically challenging and error-prone. Researchers should provide adequate descriptive statistics to allow for independent recalculations and evaluations.
6
Q

What are Baguley's suggestions?

A

Baguley’s Recommendations:

Prefer Simple Over Standardized Effect Sizes:
Simple metrics (e.g., mean differences) directly reflect observed data and avoid reliance on variability measures that may vary by sample.

Use Confidence Intervals (CIs):
Confidence intervals convey uncertainty, making results more transparent and robust than point estimates (see the sketch after this list).

Avoid Misleading Benchmarks:
Interpret effect sizes in the context of the study rather than relying on generic labels or thresholds.

Provide Descriptive Statistics:
Adequate descriptive statistics allow readers to compute alternative effect size metrics that suit their needs.

Favor Corrected Estimates:
When using standardized effect sizes, favor corrections for reliability or study design artifacts when feasible, but weigh the effort and added value.
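
Following the first two recommendations, here is a minimal sketch in Python of a simple effect size (a mean difference) reported with a 95% confidence interval; it assumes scipy is available and reuses invented two-group data.

    # Sketch: mean difference with a 95% confidence interval, using the
    # equal-variance t-test formula. Assumes scipy; the data are invented.
    import statistics as st
    from scipy import stats

    treatment = [72, 75, 78, 80, 74, 77]
    control = [68, 70, 73, 71, 69, 72]

    n1, n2 = len(treatment), len(control)
    diff = st.mean(treatment) - st.mean(control)

    # Pooled-variance standard error of the difference between means.
    pooled_var = ((n1 - 1) * st.variance(treatment) +
                  (n2 - 1) * st.variance(control)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)  # two-sided 95% critical value

    print(f"Mean difference = {diff:.1f}, "
          f"95% CI [{diff - t_crit * se:.1f}, {diff + t_crit * se:.1f}]")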

Challenges with Standardized Effect Sizes:

Obscuring Practical/Theoretical Importance:
Standardization may focus attention on statistical metrics rather than the real-world implications of findings.

Comparability Issues:
Assumptions underlying standardization often fail due to differences in scales, designs, or populations.

Complexity of Adjustments:
Efforts to adjust for factors like measurement error or study design differences are often impractical, potentially leading to inconsistencies.

7
Q

Changing significance values

A

Proposal to change the significance threshold (Benjamin et al., 2017)
* Historical Context:
  * Ronald Fisher acknowledged the arbitrariness of the 0.05 threshold for statistical significance.
  * Current research involves more scientists asking more questions, often with lower prior odds of success, necessitating stricter thresholds (see the sketch after this list).
* Proposal:
  * Lower the p-value threshold for claims of new discoveries to 0.005 for improved reproducibility.
  * This change addresses standards of evidence, not policies for publication or action.
  * Results with 0.005 < p < 0.05 should still be reported as suggestive evidence and merit publication if rigorously conducted.
* Implementation Considerations:
  * Emphasize quality and transparency in research as stricter thresholds are adopted.
  * Monitor researchers' behavior to ensure higher standards do not compromise transparency or quality.
  * Journals play a critical role in facilitating the transition to new thresholds.
* Interpretation and Communication:
  * Researchers and readers should adapt their interpretation of findings to the new significance threshold.
  * The proposed change aims to improve how evidence is understood and communicated, fostering greater accuracy.
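
One way to make the lower-prior-odds point concrete is the standard false-positive-risk calculation sketched below; this is a simplification for intuition, not Benjamin et al.'s exact Bayes-factor analysis, and the 10% prior is an assumed value.

    # Sketch: share of "significant" findings that are false positives, given
    # a prior probability that a tested hypothesis is true. A simplified
    # calculation, not Benjamin et al.'s Bayes-factor analysis; prior = 0.10
    # is an assumed value.
    def false_positive_risk(alpha, power=0.80, prior=0.10):
        true_pos = power * prior          # true effects correctly detected
        false_pos = alpha * (1 - prior)   # null effects wrongly flagged
        return false_pos / (false_pos + true_pos)

    for alpha in (0.05, 0.005):
        print(f"alpha = {alpha}: {false_positive_risk(alpha):.0%} "
              f"of significant results would be false positives")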

8
Q

Advantages and disadvantages of changing the p-value threshold

A

Strengths of Changing to p < 0.005:

Reduction in False Positives:
Lowering the threshold to p < 0.005 reduces the likelihood of Type I errors (false positives), which is critical in fields like medicine and clinical trials where incorrect conclusions could have significant consequences.

Improved Reproducibility:
A stricter threshold increases the robustness of findings and may improve reproducibility across studies. This is especially important given the replication crisis in fields like psychology and biomedical sciences.

Encourages Larger Sample Sizes:
To achieve p < 0.005, studies may require larger sample sizes, which can improve statistical power and the reliability of results (see the sketch after this list).

Distinguishing Stronger Evidence:
p < 0.005 signals stronger evidence against the null hypothesis, helping to differentiate findings with higher confidence.
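
To quantify the sample-size point flagged above, here is a sketch using the common normal-approximation formula for a two-group comparison, n per group ≈ 2 * (z_(1-alpha/2) + z_(1-beta))^2 / d^2; the effect size d = 0.4 is hypothetical and scipy is assumed to be installed.

    # Sketch: required sample size per group at alpha = 0.05 vs 0.005,
    # via the normal-approximation formula for a two-sample comparison.
    # d = 0.4 is a hypothetical effect size; assumes scipy is installed.
    from scipy.stats import norm

    def n_per_group(alpha, power=0.80, d=0.4):
        z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value
        z_beta = norm.ppf(power)           # quantile for the desired power
        return 2 * (z_alpha + z_beta) ** 2 / d**2

    for alpha in (0.05, 0.005):
        print(f"alpha = {alpha}: about {n_per_group(alpha):.0f} per group")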

Criticisms of Changing to p < 0.005:

Increased Risk of False Negatives:
Lowering the threshold increases the likelihood of Type II errors (false negatives), potentially dismissing findings that may have practical or scientific importance.

Burden on Researchers:
Requiring stricter p-values would necessitate larger sample sizes, which may be impractical for studies with limited resources, particularly in small-scale or exploratory research.

Misuse of p-Values:
Critics argue that focusing on p-values alone perpetuates null hypothesis significance testing (NHST) as the sole criterion for scientific validity, ignoring other critical factors like effect size, confidence intervals, and study design.

Lack of Universality:
Not all research fields require such stringent thresholds. In exploratory or early-stage research, p < 0.05 may suffice to generate hypotheses for further testing.

Overemphasis on Significance:
Stricter p-values could exacerbate the problem of over-reliance on p-values as the sole determinant of research validity, diverting attention from other important metrics like replicability, practical significance, and study quality.

9
Q

Article showing the p-value is not enough

A

Sullivan & Feinn (2012), "Using Effect Size, or Why the P Value Is Not Enough"

10
Q

Issue with lowering the p threshold (reference)

A

Leung (2023):
Higher Costs and Resource Demands: Achieving the necessary sample sizes to meet the stricter threshold can be financially and logistically challenging, potentially reducing the efficiency of conducting clinical trials.

11
Q

Benjamin et al. (2017), part 2

A

In the 2017 paper “Redefine Statistical Significance,” Daniel J. Benjamin and colleagues propose lowering the conventional p-value threshold for statistical significance from 0.05 to 0.005. This recommendation aims to enhance the reproducibility of scientific findings by reducing the likelihood of false positives. The authors argue that the traditional 0.05 threshold allows for a relatively high false positive rate, which can contribute to the replication crisis in scientific research. By adopting a more stringent p-value threshold of 0.005, the standard for claiming new discoveries would be more rigorous, thereby improving the reliability of published results.

To implement this change, Benjamin et al. suggest that research communities should collectively adopt the new threshold, particularly in fields where false positives are prevalent and the costs of such errors are high. They acknowledge that this shift may require larger sample sizes to achieve the necessary statistical power under the stricter threshold. However, they argue that the benefits of increased reproducibility and credibility in scientific findings outweigh these challenges. The authors also emphasize the importance of complementing this change with other methodological improvements, such as pre-registration of studies and sharing of data and code, to further bolster the robustness of scientific research.
