Statistical models 2 Flashcards
What is parametric data?
- Parametric data = normally distributed ("normal") data.
- Non-parametric data = non-normally distributed data.
- Follows a bell curve.
- Not too skewed (leaning to the left or right).
- Not too kurtotic (too flat or too peaky).
- No outliers (extreme values).
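The cards do these checks in jamovi; purely as an illustration, here is a minimal Python sketch of checking skew and kurtosis numerically. scipy.stats.skew and scipy.stats.kurtosis are real scipy functions; the data are invented, and eyeballing the values like this is an informal check, not a formal test.

```python
import numpy as np
from scipy import stats

# Illustrative data: 200 values drawn from a normal distribution.
rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=200)

# Skew: ~0 for symmetric data; negative = tail to the left, positive = tail to the right.
print("skew:", stats.skew(data))

# Excess kurtosis: ~0 for normal data; negative = flat, positive = peaky.
print("kurtosis:", stats.kurtosis(data))
```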
Why is it important to identify parametric data?
- Normality is a mathematical assumption of some statistical models.
- If we violate normality and use a parametric test, we may not be able to trust the
model estimates.
Steps to identify whether you need a parametric test
- Does the data have an outlier?
- Can the outlier be removed?
- Is the data skewed or kurtotic?
- Can the data be transformed?
Testing for outliers
- Box plot; very easy in jamovi.
- The thick line in the middle of the box = median.
- The box itself spans from the 25th percentile to the 75th percentile (the interquartile range).
- Whiskers indicate acceptable values (not outliers).
- Any observation whose value falls outside this acceptable range is plotted as a dot and is not covered by the whiskers = outlier.
- Common alternative: more than 3 standard deviations (SD) from the mean (+/-); see the sketch below.
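A sketch of both rules just described, in Python (numpy assumed; the data are invented). The whisker rule here is the conventional 1.5 × IQR fence used by standard box plots.

```python
import numpy as np

# Illustrative data: 29 typical scores plus one extreme value (55).
rng = np.random.default_rng(1)
data = np.append(rng.normal(loc=15, scale=2, size=29), 55)

# Boxplot rule: outliers fall more than 1.5 * IQR beyond the 25th/75th percentiles.
q25, q75 = np.percentile(data, [25, 75])
iqr = q75 - q25
low, high = q25 - 1.5 * iqr, q75 + 1.5 * iqr
print("boxplot-rule outliers:", data[(data < low) | (data > high)])

# Common alternative rule: values more than 3 SD from the mean (+/-).
z = (data - data.mean()) / data.std(ddof=1)
print("3-SD-rule outliers:", data[np.abs(z) > 3])
```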
What to do when you have an outlier
- Run a non-parametric test.
  - Commonly done if it's a "true" value, e.g. testing went well and the participant understood the task instructions but scored very low; this performance represents that participant's ability.
- Remove the value and leave it as missing.
  - Commonly done when working with big data sets, where you're not going to check participant records and have plenty of statistical power.
- Remove the value and replace it with the nearest acceptable value.
  - Commonly done in psychological studies.
- Remove the value and replace it with the mean.
  - Historical; not commonly done these days.
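A hedged sketch of the "replace with the nearest acceptable value" option (sometimes called winsorizing). The mean ± 3 SD fence is an assumption carried over from the card above, not the only possible choice.

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.append(rng.normal(loc=15, scale=2, size=29), 55)  # 55 is the outlier

# Acceptable range: mean +/- 3 SD (one common convention; IQR fences also work).
m, sd = data.mean(), data.std(ddof=1)
low, high = m - 3 * sd, m + 3 * sd

# np.clip pulls any value outside [low, high] back to the nearest boundary.
cleaned = np.clip(data, low, high)
print("max before:", data.max(), "-> max after:", cleaned.max())
```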
Shapiro-Wilk test: Testing skew and kurtosis
- Shapiro-Wilk test. Very easy in jamovi.
- Takes into account both skew and kurtosis.
- W statistic.
- Maximum value of 1 = data looks “perfectly normal”.
- The smaller the value of W the less normal the data are.
- p value (of the W statistic).
- Typically, p < .05 = non-normal data.
- Therefore, p ≥ .05 = data treated as normal.
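The cards run this in jamovi; for illustration, the same check in Python via scipy.stats.shapiro (a real scipy function; the data are invented).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=50, scale=10, size=100)

# Shapiro-Wilk returns W (closer to 1 = more normal-looking) and its p value.
w, p = stats.shapiro(sample)
print(f"W = {w:.3f}, p = {p:.3f}")

# Typical decision rule from the card: p < .05 -> treat the data as non-normal.
print("non-normal" if p < .05 else "treat as normal")
```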
Parametric tests
- Pearson correlation
- t-test: 2 groups (between groups or within groups)
- ANOVA: >2 groups (in IRM, we'll work with between groups)
Non-parametric tests
- Spearman correlation
- Wilcoxon test (2 groups/conditions)
- Kruskal-Wallis test (3 or more groups)
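To make the pairing concrete, a hypothetical scipy sketch of each parametric test next to its non-parametric counterpart (all data invented). Note scipy exposes the two-independent-groups Wilcoxon rank-sum test as mannwhitneyu (an equivalent test); the paired version is stats.wilcoxon.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 30)            # illustrative variable 1
y = x + rng.normal(0, 1, 30)        # illustrative variable 2, related to x
g1, g2, g3 = (rng.normal(m, 1, 30) for m in (0, 0.5, 1))  # three groups

# Correlation: Pearson (parametric) vs Spearman (non-parametric).
print(stats.pearsonr(x, y), stats.spearmanr(x, y))

# 2 groups: independent t-test vs Mann-Whitney U (= Wilcoxon rank-sum).
print(stats.ttest_ind(g1, g2), stats.mannwhitneyu(g1, g2))

# >2 groups: one-way ANOVA vs Kruskal-Wallis.
print(stats.f_oneway(g1, g2, g3), stats.kruskal(g1, g2, g3))
```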
Parametric v non-parametric tests
- There are generally non-parametric versions of all parametric tests.
- We do non-parametric tests when our data are not normally distributed.
- We do parametric tests when our data are normally distributed.
- Parametric tests have more statistical power, so they are preferred and are generally the default set of tests.
Degrees of freedom
- Important to the mathematical calculations of parametric and non-parametric tests.
- Based on the quantities of data in your model, e.g. participants or factors. In the models we will use in IRM, degrees of freedom (df) will mostly be the number of participants - 1.
- For the most part, a higher df = more statistical power.
The independent samples t-test
In psychology, this tends to correspond to two different groups of participants, where each group corresponds to a different condition in your study. For each person in the study you measure some outcome variable of interest, and the research question that you’re asking is whether or not the two groups have the same population mean.
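A minimal worked sketch of that question in Python, with invented scores for two groups; scipy.stats.ttest_ind is the standard independent samples t-test.

```python
from scipy import stats

# Invented outcome scores for two separate groups of participants.
group_a = [78, 85, 69, 91, 82, 74, 88, 80]
group_b = [71, 66, 73, 60, 75, 68, 70, 64]

# Student's independent samples t-test (assumes equal variances by default).
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")  # df here = n1 + n2 - 2 = 14
```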
Homoscedastic
Even spread of variance across the correlation/regression line.
Homogeneous or equal variance.
- Refers to homogeneity of variance.
- A common assumption for parametric statistical models.
- Most commonly tested using Levene's test (F):
  - p < .05 (less than): homogeneity of variance is violated.
  - p ≥ .05 (greater than or equal): all is fine; you have not violated homogeneity of variance.
- When you violate the assumption of homogeneity of variance, use Welch's t-test (see the sketch below).
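A sketch of that decision rule with scipy (invented groups): run Levene's test, and if it is significant, pass equal_var=False to ttest_ind, which gives Welch's t-test.

```python
from scipy import stats

group_a = [78, 85, 69, 91, 82, 74, 88, 80]
group_b = [71, 96, 43, 60, 95, 48, 90, 44]  # much more spread out

# Levene's test for homogeneity of variance.
f, p = stats.levene(group_a, group_b)
print(f"Levene F = {f:.2f}, p = {p:.4f}")

if p < .05:
    # Variances differ: use Welch's t-test (equal_var=False).
    result = stats.ttest_ind(group_a, group_b, equal_var=False)
else:
    # Homogeneity holds: the standard Student t-test is fine.
    result = stats.ttest_ind(group_a, group_b)
print(result)
```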
Heteroscedastic
Uneven spread of variance across the correlation/regression line.
Heterogeneous or unequal variance.
Independence (in context of parametric tests)
- Observations are independent, i.e. no two observations in a dataset are related to each other or affect each other in any way.
- A common assumption for parametric statistical models.
- You need to check this using logic. There's no test for it.
Pearson correlation
- The Pearson correlation coefficient is referred to as r.
- Assesses linear (straight-line) relationships between two variables.
- r = 0: no relationship.
- r = 1: perfect positive relationship.
- r = -1: perfect negative relationship.
- Correlations are also measures of effect size!
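A small sketch computing r on invented paired data, assuming scipy; the closing comment reflects the card's point that r doubles as an effect size.

```python
from scipy import stats

# Invented paired measurements, e.g. study hours and exam scores.
hours = [2, 4, 5, 7, 8, 10, 11, 13]
scores = [52, 55, 60, 64, 66, 71, 70, 78]

r, p = stats.pearsonr(hours, scores)
print(f"r = {r:.2f}, p = {p:.4f}")
# r itself is the effect size: closer to +/-1 = stronger linear relationship.
```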