Research methods Flashcards

1
Q

The misuse of NHST (Null Hypothesis Significance Testing)

A
  • The American Statistical Association (2016) outlined principles on the misuse of p-values in significance testing:
    1. P-values do not measure the probability that the results arose by chance, or that a specific hypothesis is true
    2. Statistical significance is not the same as practical importance
    3. The p-value alone is not a good measure of evidence regarding a model or hypothesis
2
Q

Type 1 and Type 2 errors

A
  1. Type 1 = rejecting the null hypothesis when it is actually true (a false positive)
  2. Type 2 = failing to reject the null hypothesis when it is actually false (a false negative)
3
Q

Power

A
  • The probability of finding an effect assuming one exists in the population
  • Calculated as 1 − β
  • β is the probability of not finding the effect (conventionally at most 0.2, as suggested by Cohen)
4
Q

What affects power? 3 factors

A
  1. Effect size: an objective and standardised measure of the magnitude of an effect (larger value = bigger effect size)
    Depends on the test conducted – Cohen’s d, Pearson’s r, partial eta squared (ANOVA)
  2. Number of participants: more participants = more ‘signal’, less ‘noise’. You should choose sample size depending on the expected effect size (larger effect size = fewer pp’s, smaller effect size = more pp’s)
  3. Alpha level: the probability of obtaining a Type 1 error. We compare our p-value to this criterion when testing significance
    - Other factors: variability, design, test choice
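Cohen’s d, listed above, can be computed directly from two sets of scores using the pooled standard deviation. A minimal sketch (the example scores below are illustrative, not from the source):

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: standardised mean difference using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    # pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# illustrative scores: a two-unit mean difference with SD of 1 gives d = 2
print(cohens_d([5, 6, 7], [3, 4, 5]))
```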
5
Q

Problems with alpha testing

A
  • If we run multiple tests, this will increase the rate at which we might get a type 1 error (family wise experimental error rate)
  • We can account for this by limiting the number of tests or by using corrections such as the Bonferroni correction (but this reduces statistical power)
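The Bonferroni correction simply divides α by the number of tests, which is why it costs power. A minimal sketch:

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: each p is compared against alpha / m,
    where m is the number of tests in the family."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]

# with three tests the criterion drops from .05 to roughly .0167,
# so p = .04 is no longer significant after correction
print(bonferroni([0.01, 0.04, 0.20]))
```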
6
Q

What is the difference between one and two-tailed tests

A
  • One-tailed – we hypothesise there will be a difference in scores, and we’re specific about which score will be higher (α = .05 at one end)
  • Two-tailed – we hypothesise there will be a difference in scores, but this could be in either direction (α = .025 at both ends)
  • For a one-tailed test, our p-value is half of the two-tailed p-value
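The halving relationship can be checked numerically. A sketch using a standard normal test statistic (the value of z here is hypothetical):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 1.8                         # hypothetical statistic, in the predicted direction
p_one = 1 - phi(z)              # one-tailed: all of alpha sits in one tail
p_two = 2 * (1 - phi(abs(z)))   # two-tailed: alpha split across both tails
# for an effect in the hypothesised direction, p_one is exactly half of p_two
print(p_one, p_two)
```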
7
Q

Which type of test do I run?

A
  • One-tailed tests are more powerful, as all of α is placed in one tail
  • However, there are several caveats and considerations, so in most cases it is recommended that you run a two-tailed test
8
Q

Power and study design:

A
  • Within-subjects studies are more powerful than between-subjects studies
  • Example power analysis inputs for a t-test: two-tailed design, medium effect size, α level of 0.05, power level of 0.8
  • 1) Calculate the power we have obtained in a study post hoc
  • 2) Calculate how many participants we need to collect for a study a priori (this can be done using statistical programs such as G*Power)
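An a priori sample-size calculation with those inputs can be approximated using the normal distribution; exact t-based tools such as G*Power give a slightly larger answer (64 per group for d = 0.5):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed independent-samples t-test,
    using the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed criterion
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# medium effect (d = 0.5) needs far more participants than a large one (d = 0.8)
print(n_per_group(0.5), n_per_group(0.8))
```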
9
Q

What is analysis of variance?

A
  • Analysis of variance (ANOVA) is an extension of the t-test
  • it allows us to test whether 3 or more population means are the same, without reducing power
10
Q

Assumptions of ANOVA

A
  • the scores were sampled randomly and are independent
  • roughly normal distribution
  • roughly equal number of participants in the groups
  • roughly equal variance for each condition
11
Q

The basis of the ANOVA test

A
  • analysis of variance is a way to compare multiple conditions in a single, powerful test
  • It was invented by Fisher (so its test statistic is F)
  • It compares the amount of variance explained by our experiment with the variance that is unexplained
12
Q

Between-groups ANOVA

A
  • The aim of ANOVA is to compare the ‘amount of variance explained by our experiment with the variance that is unexplained’
  • For between-group designs:
  • A) the explained variance is the variance between group
  • B) the unexplained is the variance within a group
  • Each variance estimate is a mean square (MS): a sum of squares divided by its degrees of freedom
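The between-groups F-ratio can be computed by hand from these two variance sources. A minimal sketch (the example scores are illustrative):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = MS_between / MS_within for a between-groups design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # explained: variance of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # unexplained: variance of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)        # df between = k - 1
    ms_within = ss_within / (n_total - k)    # residual df = N - k
    return ms_between / ms_within

print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))
```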
13
Q

Degrees of freedom

A
  • There are degrees of freedom associated with both variance values:
  • A) degrees of freedom between conditions
  • B) residual degrees of freedom
  • ANOVA critical values require 2 d.f. values, one for each aspect of the variance
  • We must report both
14
Q

Pair-wise comparisons

A
  • ANOVA tells us whether groups differ or not
  • How do we know which particular conditions differ?
  • Run the multiple comparisons (those we were trying to avoid)
  • Some of these are ‘planned comparisons’, some are ‘post-hoc’
  • Correct for multiple comparisons
15
Q

Versions of ANOVA

A
  1. Analysis of variance (ANOVA) – one factor ANOVA and multifactor ANOVA
  2. Multivariate analysis of variance (MANOVA) – extension of ANOVA for multiple dependent variables
  3. Analysis of covariance (ANCOVA) – extension of ANOVA to control for continuous variables (covariates)
16
Q

What is ANOVA based on? (for between-groups)

A
  • ANOVA compares two sources of variance:
    A) the variance explained by the experiment (the effect)
    B) the residual (remaining) variance that cannot be explained (noise)
  • For a between-group design, the variance comes from only two sources:
    A) variance between groups (explained)
    B) variance within groups (unexplained)

17
Q

Between group vs repeated measures for ANOVA

A
  • For repeated-measure design, there are three possible sources of variance:
    A) variance between conditions
    B) variance between subjects (individual differences)
    C) residual (unexplained) variance
  • In the between-group study, the variance between subjects fell under the category ‘unexplained’
18
Q

What is the F-ratio and MS unexplained formulas?

A
  • F = MS explained / MS unexplained
  • MS explained is the variance between conditions
  • MS unexplained is the remaining variance after accounting for individual differences
  • MS unexplained = MS total – MS explained – MS ind diffs
  • MS ind diffs is the variance between subjects within a condition
  • MS total is the variance of all subjects in all conditions
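The decomposition above can be sketched directly: subtract the condition and subject (individual differences) sums of squares from the total, and what remains is the unexplained (error) term. A minimal sketch with illustrative scores:

```python
from statistics import mean

def rm_anova_f(scores):
    """Repeated-measures F-ratio; scores[s][c] = subject s in condition c.
    F = MS explained / MS unexplained, where the unexplained SS is what
    remains after removing the condition and subject effects."""
    n_subj, n_cond = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[c] for row in scores) for c in range(n_cond)]
    subj_means = [mean(row) for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n_subj * sum((m - grand) ** 2 for m in cond_means)   # explained
    ss_subj = n_cond * sum((m - grand) ** 2 for m in subj_means)   # ind diffs
    ss_error = ss_total - ss_cond - ss_subj                        # unexplained
    ms_cond = ss_cond / (n_cond - 1)
    ms_error = ss_error / ((n_cond - 1) * (n_subj - 1))
    return ms_cond / ms_error

print(rm_anova_f([[1, 2, 4], [2, 3, 4], [3, 5, 6]]))
```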
19
Q

What is multi-factorial ANOVA?

A
  • Like repeated-measures ANOVA
  • Factors can all be within-subject, all between-group or a ‘mixed’ design
  • We can have ‘main’ effects or a variety of ‘interactions’
  • Main effect = one of the factors (IVs) consistently affects the DV in the same way
  • Interaction = the effect of one factor depends on the presence of another
20
Q

What is 2x2 ANOVA?

A
  • The multifactorial ANOVA is a single test
  • It returns multiple F values (one for each main effect to be checked and one for the interaction)
  • With only two levels per factor, there is no need for post-hoc tests
  • So 2x2 is just a single test (no family-wise error)
21
Q

What are contingency tables?

A
  • A table of frequencies for how often an observation occurs in a category
  • Categories must be mutually exclusive and exhaustive
22
Q

What is the Chi-Square test?

A
  • Devised by Karl Pearson in 1900, also known as Pearson’s chi-square
  • Calculates how often a particular observation falls into a category based on how many were expected by chance
  • Null hypothesis = the frequencies observed were expected by chance
  • Alternative hypothesis = the frequencies observed reflect real differences in categories
  • Assumptions:
    1. Independence (each person can only contribute to one cell of a contingency table)
    2. Expected frequencies (all expected counts should be greater than 1, and no more than 20% of expected counts should be less than 5)
23
Q

Violating the expected frequencies assumption: how to address it

A
  • Results in a loss of power
  • How to address this:
  • A) use an ‘Exact’ test instead
  • B) remove data across one variable
  • C) collapse levels of one variable
  • D) collect more data
  • E) accept the loss of power
24
Q

Chi-square by hand: one IV

A
  1. Calculate expected frequencies
  2. Calculate Chi-Square value based on observed and expected frequencies
  3. Compare Chi-Square value against a critical values table
    - To interpret a table, we need to know our degrees of freedom, and our desired alpha value (degrees of freedom = number of categories – 1)
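The first two steps above can be sketched directly (the observed counts are illustrative); the resulting value and degrees of freedom would then be checked against a critical values table:

```python
def chi_square(observed, expected=None):
    """Pearson's chi-square: sum of (O - E)^2 / E, df = categories - 1."""
    if expected is None:
        # default expectation: frequencies split equally across categories
        expected = [sum(observed) / len(observed)] * len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

# 60 observations across 3 categories, so 20 expected per category by chance
print(chi_square([10, 20, 30]))
```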
25
Q

Chi-square by hand: two IVs

A
  • With two IVs, the difference will be in calculating the expected values in each case
  • To calculate expected frequencies for two IVs, we need to calculate expected frequencies of specific cells
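The expected frequency of each cell comes from the row and column totals. A minimal sketch (the example table is illustrative):

```python
def expected_frequencies(table):
    """Expected cell count = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# a 2x2 contingency table of observed counts
print(expected_frequencies([[10, 20], [30, 40]]))
```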
26
Q

What is the binomial test?

A
  • Compares observed and expected frequencies for variables with only two levels
  • E.g. are there more pp’s in our sample from the USA than we would expect by chance?
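A one-tailed version of this test can be computed straight from the binomial distribution. A sketch (statistical software typically reports an exact two-tailed p-value instead):

```python
from math import comb

def binom_p_at_least(k, n, p=0.5):
    """One-tailed binomial test: P(X >= k) under the chance probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 8 of 10 participants in one category when 5 are expected by chance
print(binom_p_at_least(8, 10))
```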
27
Q

Inferring from samples

A
  • Statistical power: probability of seeing a true positive
  • Alpha (a): the highest acceptable risk of a false positive (typically 5%)
28
Q

Publication bias & the file drawer problem

A
  • Researchers biased towards results which support their theories
  • Significant results are more likely to be published
  • Many journals value novelty and surprising results
  • Non-significant results are often not published
  • Non-significant replications are hard to publish
  • Researchers are under pressure to find significant results
29
Q

Researcher degrees of freedom

A
  • There are many valid ways to analyse a given dataset:
  • A) different statistical tests
  • B) different variables
  • C) different rules for excluding outliers
30
Q

P-hacking & HARKING

A
  • P-hacking is a way to cheat/lie with statistics
  • For any test, we accept a 5% probability of a false positive
  • P-hacking: performing the analysis in different ways to get p<.05 and only reporting the significant results
  • This results in false positive: we cannot trust the results
  • HARKING: hypothesising after the results are known
31
Q

Multiverse analysis

A
  • Run many possible analyses
  • See how many get a significant result
  • Munoz & Young (2018) analysed the data with N = 1152 regressions
  • Less than 5% had a significant effect
32
Q

Two problems: True and False positives

A
  • Significant results are easier to publish, including false positives
  • Many papers are underpowered, so true positives are missed
  • Leads to many false positives in the literature
33
Q

How to solve the reproducibility crisis? 3 ways

A
  1. Open materials – share the exact materials (instructions, program, stimuli), makes it easier for others to replicate
  2. Open data – share the raw data so other researchers can perform the analysis and see how other variables/ analyses affect the results
  3. Preregistration – plan the study in advance, including materials, planned analysis (e.g. open science framework), prevents p-hacking and HARKING. Researchers can compare your pre-registration to the final study
34
Q

Why do we need to visualise data?

A
  • Makes it easier to understand datasets
  • E.g. Florence Nightingale used data visualisation to highlight British soldiers’ living conditions in the Crimean War (1858)
35
Q

What is the purpose of data visualisation?

A
  • Data analysis process: check that assumptions have been met and understand the relationships between variables before inferential analysis
  • Report writing and publication process: show clear relationships between variables and help the reader interpret the data in the way you want them to
36
Q

Types of graphs: 3 types

A
  1. Checking data assumptions – graphs can be useful for checking data assumptions before running statistical tests (e.g. histograms, boxplots)
  2. Summarising descriptives – graphs can also help to summarise lots of descriptive statistics (e.g. bar charts, clustered bar charts)
  3. Graphing relationships – we can use scatterplots to graph relationships between variables (and sometimes check assumptions)
37
Q

What makes a good graph? 7 things

A
  • Tufte (2001) and the American Psychological Association (2021) suggest that:
    A) images are clear
    B) units of measurement are provided
    C) axes are clearly labelled
    D) elements in the figure are clearly labelled or explained
    E) avoid distorting the data
    F) induce the reader to think about the underlying messages of the figure
    G) avoid using chartjunk (the use of unnecessary or misleading elements in the design of a graph)
38
Q

Bad graphs in public health data

A
  • no x and y axis labels
  • dates on the x axis are in a random order
  • colours of bars are not consistent across clusters of bars
39
Q

What is ‘parametric’?

A
  • They are based on some commonly used parameters (e.g. standard deviation), which assume a normal distribution
  • If the data is not normally distributed, then those parameters might not be meaningful
  • Most data on a ratio or interval scale are normally distributed
  • When data are heavily skewed, the step size between points on the scale is probably not constant
  • When data are ordinal you may not have a normal distribution and parameters like SD, SEM, variance may no longer capture the data
40
Q

Non-parametric test: Mann-Whitney U

A
  • If we arrange all the data into an ascending sequence of scores:
    a) When the null hypothesis is true, we would expect the group labels then to be randomly distributed
    b) When the null hypothesis is false, we would expect the scores of the two groups to be clustered at either end of the sequence
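The U statistic captures this clustering intuition: it counts how often a score from one group beats a score from the other, so very low or very high U means the groups sit at opposite ends of the sequence. A minimal sketch (ties counted as half):

```python
def mann_whitney_u(group_a, group_b):
    """U for group A: number of (a, b) pairs where a exceeds b,
    with tied pairs counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

# complete separation: every A score beats every B score (U = 9 of 9 pairs)
print(mann_whitney_u([5, 6, 7], [1, 2, 3]))
```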
41
Q

Non-parametric test: Wilcoxon T

A
  • Test is based on the difference between the 2 scores for each subject
  • It uses the direction (which score is greater) and magnitude of the differences (how much greater)
  • If H0 is true, then the differences in one direction will be as large as the differences in the other
  • Whenever we have tied ranks, we take the average of the range of ranks that the ties cover and allocate this to each tied value
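The signed-rank T described above, including the averaging of tied ranks, can be sketched as follows (the example scores are illustrative):

```python
def wilcoxon_t(before, after):
    """Wilcoxon signed-rank T: rank the absolute differences (averaging
    tied ranks), then T is the smaller of the positive- and negative-rank
    sums. Zero differences are dropped."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(v):
        # average rank across all positions sharing this absolute value
        positions = [i + 1 for i, x in enumerate(abs_sorted) if x == v]
        return sum(positions) / len(positions)

    pos = sum(rank_of(abs(d)) for d in diffs if d > 0)
    neg = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return min(pos, neg)

# differences are +2, +3, -1, +5: the lone negative rank gives T = 1
print(wilcoxon_t([10, 12, 14, 11], [12, 15, 13, 16]))
```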
42
Q

Non-parametric test: Kruskal-Wallis H

A
  • When the null hypothesis is true, we expect a random distribution of ranks across groups
  • When the null hypothesis is false, we expect a systematic distribution of ranks across groups
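The H statistic formalises this: rank all scores together, then compare each group's mean rank against the overall mean rank. A minimal sketch assuming no tied scores:

```python
def kruskal_h(groups):
    """Kruskal-Wallis H: H = 12 / (N(N+1)) * sum of n_i * (mean rank of
    group i - overall mean rank)^2, with overall mean rank (N+1)/2."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # assumes no tied scores
    n = len(pooled)
    weighted = sum(
        len(g) * (sum(rank[x] for x in g) / len(g) - (n + 1) / 2) ** 2
        for g in groups
    )
    return 12 / (n * (n + 1)) * weighted

# ranks fall systematically: group 1 lowest, group 3 highest
print(kruskal_h([[1, 2], [3, 4], [5, 6]]))
```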
43
Q

Non-parametric test: Friedman test

A
  • Rank within each subject
  • Does not tell us which conditions differ
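Ranking within each subject and comparing the condition rank totals gives the Friedman chi-square statistic. A minimal sketch assuming no ties within any subject:

```python
def friedman_stat(scores):
    """Friedman chi-square; scores[s][c] = subject s in condition c.
    Rank the conditions within each subject, then apply
    12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1) to the rank totals R_j."""
    n, k = len(scores), len(scores[0])
    totals = [0] * k
    for row in scores:
        # order condition indices by score; position in order = within-subject rank
        order = sorted(range(k), key=lambda c: row[c])
        for r, c in enumerate(order, start=1):
            totals[c] += r
    return 12 / (n * k * (k + 1)) * sum(t * t for t in totals) - 3 * n * (k + 1)

# every subject ranks the conditions identically: maximal statistic
print(friedman_stat([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```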