Revision Flashcards
How to report Pearson correlation
r (230) = 0.16, p = 0.013
OR r = 0.16, p = 0.013, n = 231
How to report Spearman correlation
rs (230) = 0.01, p = .889
OR
rs = 0.01, p = .889, n = 231
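As a rough illustration of how the two coefficients relate, the sketch below computes Pearson's r from raw scores and Spearman's rs as Pearson's r on ranked scores. The function names (`pearson_r`, `average_ranks`, `spearman_rs`) and the example data are hypothetical and not taken from jamovi; in practice the software does this for you.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up example data: a perfectly monotonic but non-linear relationship.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(round(pearson_r(x, y), 2), round(spearman_rs(x, y), 2))
```

Because Spearman works on ranks, a single extreme score has much less influence on rs than on r, which is why it is the non-parametric choice.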
How to report a t test with Cohen’s d effect size
t (207) = 3.61, p < .001, d = 0.48
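For reference, one common way to obtain the d value in a report like this is to divide the difference in group means by the pooled standard deviation. The sketch below assumes a between-groups design; `cohens_d` is a hypothetical helper name and the data are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each group's sample variance weighted by its df.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Made-up scores: means differ by 1, pooled SD is 2, so d = 0.5.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```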
How to report a Wilcoxon test
W = 5815, p = .098, n = 231
How to report ANOVA (Fisher’s)
F (2, 228) = 16.2, p < .001.
Skew (symmetry of distribution)
Positive skew = tail extends to the right
Negative skew = tail extends to the left
Kurtosis (tail ends of distribution)
Negative kurtosis - Platykurtic
Normal distribution - Mesokurtic
Positive kurtosis - Leptokurtic
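To make the signs concrete, here is a minimal sketch that computes skewness and excess kurtosis from the simple moment formulas. This is an assumption about the formula: statistics packages often apply a small-sample correction, so their values may differ slightly. The function names are hypothetical.

```python
from statistics import mean

def skewness(values):
    """Moment-based skewness: positive = right tail, negative = left tail."""
    m, n = mean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    m, n = mean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]           # symmetric and flat
right_tailed = [1, 1, 1, 2, 10]  # one large value pulls the tail right
print(skewness(flat), excess_kurtosis(flat))
print(skewness(right_tailed))
```

A negative excess kurtosis corresponds to platykurtic, a value near zero to mesokurtic, and a positive value to leptokurtic.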
What is a statistical model?
- A statistical model uses maths to summarise a dataset relative to
multiple variables.
- A simple description of relationships in the dataset.
- Where descriptive statistics describe the data, inferential statistics
use statistical models. These models enable you to make inferences
about the data, e.g. you can decide whether two variables are
associated or whether one group is bigger than the other.
What is parametric data?
- Parametric data = normal data.
- Non-parametric data = not normal or non-normal.
- So, what’s normal?
- Bell curve.
- Not too skewed (sway to left or right).
- Not too kurtotic (flat or peaky).
- No outliers (extreme values).
- Why do we care?
- Normality is an assumption of some statistical models, mathematically.
- If we violate normality and use a parametric test, we may not be able to trust the
model estimates.
When to use parametric tests on continuous data
- Data has no outliers, or the outliers can be removed.
- Data is not too skewed or kurtotic.
- Use non-parametric tests if the data has outliers that cannot be removed, or is too skewed or kurtotic.
Testing for outliers
- Box plot, very easy in jamovi.
- The thick line in the middle of the box = median.
- The box itself spans from the 25th percentile to the
75th percentile (the interquartile range).
- Whiskers indicate acceptable values (not outliers).
- Any observation whose value falls outside this
acceptable range is plotted as a dot and is not
covered by the whiskers = outlier.
- Common alternative: 3 standard deviations
(SD) from the mean (+/-). (can use z scores for this)
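Both checks above can be sketched in a few lines. The box plot version assumes the widely used convention that the acceptable range runs 1.5 interquartile ranges beyond the box (the multiplier is an assumption here, not something stated on the card), and quartile values can differ slightly between software packages. Function names are hypothetical.

```python
from statistics import mean, stdev, quantiles

def boxplot_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed convention."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` SDs from the mean (in either direction)."""
    m, sd = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / sd) > threshold]

# Made-up scores with one extreme value.
scores = [50] * 10 + [52] * 10 + [100]
print(boxplot_outliers(scores))
print(zscore_outliers(scores))
```

Note that with very small samples a z score can never get far above 3, so the SD rule may miss outliers that the box plot flags.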
What to do if there is an outlier
- Run a non-parametric test.
- Commonly done if it’s a “true” value. E.g. testing went well, the participant
understood task instructions, but scored very low; this performance represents that
participant’s ability.
- Remove the value and leave as missing.
- Commonly done when working with big data sets, where you’re not going to check
participant records and have plenty of statistical power.
- Remove the value and replace with nearest acceptable value.
- Commonly done in psychological studies.
- Remove value and replace with mean.
- Historical, not commonly done these days.
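The three "remove the value" options above might look like this in code. It is only a sketch: the acceptable range (`low`, `high`) is assumed to have been worked out already, "nearest acceptable value" is interpreted as the closest observed value inside that range, and the replacement mean is taken over the remaining values. All names are hypothetical.

```python
from statistics import mean

def leave_missing(values, low, high):
    """Replace outliers with None, i.e. leave them as missing."""
    return [v if low <= v <= high else None for v in values]

def nearest_acceptable(values, low, high):
    """Replace outliers with the closest observed value inside the range."""
    kept = [v for v in values if low <= v <= high]
    smallest, largest = min(kept), max(kept)
    return [min(max(v, smallest), largest) for v in values]

def replace_with_mean(values, low, high):
    """Replace outliers with the mean of the remaining values."""
    kept = [v for v in values if low <= v <= high]
    m = mean(kept)
    return [v if low <= v <= high else m for v in values]

scores = [10, 11, 12, 13, 40]   # made-up data; 40 is the outlier
print(leave_missing(scores, 8, 16))       # [10, 11, 12, 13, None]
print(nearest_acceptable(scores, 8, 16))  # [10, 11, 12, 13, 13]
print(replace_with_mean(scores, 8, 16))   # [10, 11, 12, 13, 11.5]
```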
Testing skew and kurtosis
- Shapiro-Wilk test. Very easy in jamovi.
- Takes into account both skew and kurtosis.
- W statistic.
- Maximum value of 1 = data looks “perfectly normal”.
- The smaller the value of W the less normal the data are.
- p-value (of W statistic).
- Typically, <.05 = non-normal data.
- Therefore, ≥.05 = normal data.
Parametric tests
Pearson correlation
t-test (between groups or within groups)
ANOVA (in IRM, we’ll work with between groups)
Non-parametric tests
Spearman correlation
Wilcoxon test (2 groups/conditions)
Parametric v non-parametric tests
- There are generally non-parametric versions of all parametric tests.
- We do non-parametric tests when our data is not normally
distributed.
- We do parametric tests when our data is normally distributed.
- Parametric tests have more statistical power, so they are preferred
and are generally the default set of tests.
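Putting the earlier cards together, the choice of test can be written as a simple decision rule. This is a sketch of the rule of thumb only, assuming the Shapiro-Wilk p-value and the outlier check have already been done; `choose_test` and the design labels are hypothetical names, and only the tests listed on these cards are included.

```python
PARAMETRIC = {
    "correlation": "Pearson correlation",
    "two_groups": "t-test",
    "three_plus_groups": "ANOVA",
}
NON_PARAMETRIC = {
    "correlation": "Spearman correlation",
    "two_groups": "Wilcoxon test",
}

def choose_test(design, shapiro_p, unremovable_outliers, alpha=0.05):
    """Pick a test: parametric only if data look normal and outliers are handled."""
    looks_normal = shapiro_p >= alpha  # p >= .05 is treated as normal data
    if looks_normal and not unremovable_outliers:
        return PARAMETRIC[design]
    # No non-parametric alternative to ANOVA is listed on these cards.
    return NON_PARAMETRIC.get(design, "non-parametric alternative (not listed here)")

print(choose_test("correlation", shapiro_p=0.20, unremovable_outliers=False))
print(choose_test("two_groups", shapiro_p=0.01, unremovable_outliers=False))
```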
Degrees of freedom
- Important to the mathematical
calculations of parametric and non-parametric tests.
- Based on the quantities of data in your model, e.g. participants or factors. In the
models we will use in IRM, degrees of freedom (df) will mostly be the number
of participants - 1.
- For the most part, a higher df = more statistical power.
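A few standard df formulas, sketched as hypothetical helper functions. The ANOVA example assumes 231 participants split across 3 groups, which would give the pair of values seen in a report such as F (2, 228); that group layout is an assumption, not something stated on the cards.

```python
def t_test_df_within(n_participants):
    """Within-groups (paired) t-test: number of participants - 1."""
    return n_participants - 1

def t_test_df_between(n_group_a, n_group_b):
    """Between-groups t-test (equal variances assumed): total participants - 2."""
    return n_group_a + n_group_b - 2

def anova_df(n_participants, n_groups):
    """One-way between-groups ANOVA: (groups - 1, participants - groups)."""
    return n_groups - 1, n_participants - n_groups

print(t_test_df_within(208))  # 207
print(anova_df(231, 3))       # (2, 228)
```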
Sampling
Population-based sample
Representative of the population.
E.g. random sample of Medicare numbers.
Convenience samples
Not representative of the population.
E.g. clinic-based or through social media advertisement.
Psychology has a WEIRD sampling problem
- The vast majority of published psychological research
is on western, educated, industrialised, rich, and
democratic (WEIRD) samples.
- Generalisation is limited when using these samples.
They are not the norm.
- WEIRD populations represent as much as 80 percent
of study participants, but only 12 percent of the world’s
population.
Cross-sectional versus longitudinal
- Cross-sectional designs capture data at one point in time.
- Longitudinal designs capture data at more than one point
in time.
Experimental versus observational
- Observational designs do not manipulate any variables.
- Experimental designs manipulate a variable (termed
condition); participants are assigned to one condition at
random.
- Quasi-experimental designs do not manipulate any
variables; participants are assigned to a condition based on non-random criteria.
Within-subject versus between-subject
- Between-subject designs collect data from participants
relative to one condition.
- Within-subject designs collect data from participants
relative to more than one condition (usually all
conditions). This design is also called repeated
measures.
- A design can be mixed, with both between- and
within-subject assessments.