Statistical models 2 Flashcards

1
Q

What is parametric data?

A
  • Parametric data = normally distributed data.
  • Non-parametric data = non-normally distributed data.
  • Normal data follow a bell curve:
  • Not too skewed (leaning to the left or right).
  • Not too kurtotic (too flat or too peaky).
  • No outliers (extreme values).
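
The deck uses jamovi, but as a minimal sketch of what the skew and kurtosis checks look like in code (Python with scipy, purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=100, scale=15, size=200)  # simulated, roughly normal scores

# Skew is ~0 for symmetric data; excess kurtosis is ~0 for a normal bell curve.
print("skew:", stats.skew(scores))
print("excess kurtosis:", stats.kurtosis(scores))
```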
2
Q

Why is it important to identify parametric data?

A
  • Normality is a mathematical assumption of some statistical models.
  • If we violate normality but run a parametric test anyway, we may not be able
    to trust the model estimates.
3
Q

Steps to identify whether you need to use a parametric test on your data

A

1. Does the data have an outlier?
2. If so, can the outlier be removed?
3. Is the data skewed or kurtotic?
4. If so, can the data be transformed?

4
Q

Testing for outliers

A
  • Box plot; very easy in jamovi.
  • The thick line in the middle of the box = median.
  • The box itself spans from the 25th percentile to the 75th percentile
    (the interquartile range, IQR).
  • Whiskers indicate the range of acceptable values (not outliers).
  • Any observation whose value falls outside this acceptable range is plotted
    as a dot beyond the whiskers = outlier.
  • Common alternative: treat any value more than 3 standard deviations (SD)
    above or below the mean (±3 SD) as an outlier.
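
A minimal sketch of both rules in Python (illustrative; 1.5 × IQR is the usual box-plot whisker default, and the data are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
scores = np.append(rng.normal(loc=100, scale=15, size=99), 250.0)  # one extreme value

# Box-plot rule: whiskers typically extend 1.5 * IQR beyond the quartiles.
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("IQR outliers:", scores[(scores < low) | (scores > high)])

# Alternative rule: values more than 3 SD from the mean.
z = (scores - scores.mean()) / scores.std(ddof=1)
print("±3 SD outliers:", scores[np.abs(z) > 3])
```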
5
Q

What to do when you have an outlier

A
  • Run a non-parametric test.
    Commonly done if it’s a “true” value, e.g. testing went well and the participant
    understood the task instructions but scored very low; the performance represents
    that participant’s ability.
  • Remove the value and leave it as missing.
    Commonly done when working with big data sets, where you’re not going to check
    participant records and have plenty of statistical power.
  • Remove the value and replace it with the nearest acceptable value.
    Commonly done in psychological studies.
  • Remove the value and replace it with the mean.
    Historical; not commonly done these days.
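
A minimal sketch of the “replace with the nearest acceptable value” option, assuming the box-plot (1.5 × IQR) rule defines the acceptable range (the data are made up):

```python
import numpy as np

scores = np.array([12.0, 14, 15, 15, 16, 17, 18, 19, 20, 45])

# Define the acceptable range via the box-plot rule (quartiles ± 1.5 * IQR).
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Clip any value outside the range to the nearest acceptable boundary.
print(np.clip(scores, low, high))  # the 45 is pulled down to the upper boundary
```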
6
Q

Shapiro-Wilk test: Testing skew and kurtosis

A
  • Shapiro-Wilk test; very easy in jamovi.
  • Takes into account both skew and kurtosis.
  • Produces a W statistic:
  • Maximum value of 1 = data look “perfectly normal”.
  • The smaller the value of W, the less normal the data are.
  • The W statistic comes with a p value:
  • Typically, p < .05 = non-normal data.
  • Therefore, p ≥ .05 = normal data.
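
A minimal sketch of the same test in Python with scipy (illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_scores = rng.normal(size=100)
skewed_scores = rng.exponential(size=100)

# W near 1 and p >= .05 -> no evidence of non-normality.
print(stats.shapiro(normal_scores))
# W further from 1 and p < .05 -> the data look non-normal.
print(stats.shapiro(skewed_scores))
```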
7
Q

Parametric tests

A

  • Pearson correlation.
  • T-test: 2 groups (between groups or within groups).
  • ANOVA: more than 2 groups (in IRM, we’ll work with between groups).
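
A minimal sketch of these three parametric tests in Python with scipy (data made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Pearson correlation between two continuous variables.
x = rng.uniform(0, 10, size=30)
y = 2 * x + rng.normal(size=30)
print(stats.pearsonr(x, y))

# T-test: 2 groups (between groups here).
g1, g2, g3 = (rng.normal(loc=m, scale=2, size=30) for m in (10, 11, 13))
print(stats.ttest_ind(g1, g2))

# One-way ANOVA: more than 2 groups (between groups).
print(stats.f_oneway(g1, g2, g3))
```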

8
Q

Non-parametric tests

A

  • Spearman correlation.
  • Wilcoxon test (2 groups/conditions).
  • Kruskal-Wallis test (3 or more groups).
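
A minimal sketch of the non-parametric counterparts in Python with scipy (illustrative data; note scipy splits the “Wilcoxon test” into an independent-groups version, mannwhitneyu, and a paired version, wilcoxon):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1, g2, g3 = (rng.exponential(size=30) for _ in range(3))

# Spearman correlation (non-parametric counterpart of Pearson).
print(stats.spearmanr(g1, g2))
# Two independent groups: Mann-Whitney U / Wilcoxon rank-sum test.
print(stats.mannwhitneyu(g1, g2))
# Two paired conditions: Wilcoxon signed-rank test.
print(stats.wilcoxon(g1, g2))
# Three or more groups: Kruskal-Wallis test.
print(stats.kruskal(g1, g2, g3))
```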

9
Q

Parametric v non-parametric tests

A
  • There are generally non-parametric versions of all parametric tests.
  • We use non-parametric tests when our data are not normally distributed.
  • We use parametric tests when our data are normally distributed.
  • Parametric tests have more statistical power, so they are preferred and
    are generally the default set of tests.
10
Q

Degrees of freedom

A
  • Important to the mathematical calculations of parametric and
    non-parametric tests.
  • Based on the quantities of data in your model, e.g. participants or factors.
    In the models we will use in IRM, degrees of freedom (df) will mostly be
    the number of participants − 1.
  • For the most part, a higher df = more statistical power.
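
For reference, the standard df formulas for the tests in this deck (textbook facts, not from the card itself):

```latex
% Standard degrees-of-freedom formulas
\begin{align*}
\text{one-sample / paired } t\text{-test:} \quad & df = n - 1 \\
\text{independent-samples } t\text{-test:} \quad & df = n_1 + n_2 - 2 \\
\text{Pearson correlation:} \quad & df = n - 2
\end{align*}
```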
11
Q

The independent samples t-test

A

In psychology, this tends to correspond to two different groups of participants, where each group corresponds to a different condition in your study. For each person in the study you measure some outcome variable of interest, and the research question that you’re asking is whether or not the two groups have the same population mean.

12
Q

Homoscedastic

A

Even distribution/variance across the correlation line.
Homogeneous or equal variance.

  • Refers to homogeneity of variance.
  • A common assumption for parametric statistical models.
  • Most commonly tested using Levene’s test (F):
  • p < .05 (less than) = you have violated homogeneity of variance.
  • p ≥ .05 (greater than or equal to) = all is fine; you have not violated
    homogeneity of variance.
  • When you violate the assumption of homogeneity of variance, use
    Welch’s t-test.
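
A minimal sketch of Levene’s test and the Welch fallback in Python with scipy (illustrative data; the second group has a deliberately larger variance):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(loc=10, scale=1, size=40)
group2 = rng.normal(loc=11, scale=4, size=40)  # much larger variance

# Levene's test: p < .05 suggests homogeneity of variance is violated.
print(stats.levene(group1, group2))

# If violated, use Welch's t-test (equal_var=False) instead of Student's.
print(stats.ttest_ind(group1, group2, equal_var=False))
```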
13
Q

Heteroscedastic

A

Uneven distribution/variance across the correlation line.
Heterogeneous or unequal variance.

14
Q

Independence (in context of parametric tests)

A
  • Observations are independent, i.e. no two observations in a dataset
    are related to each other or affect each other in any way.
  • A common assumption for parametric statistical models.
  • You need to check this using logic. There’s no test for it.
15
Q

Pearson correlation

A
  • The Pearson correlation coefficient is referred to as r.
  • Assesses linear (straight-line) relationships between two variables.

r = 0: no relationship
r = 1: perfect positive relationship
r = -1: perfect negative relationship

Correlations are also measures of effect size!
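
A minimal sketch of the three reference values in Python with scipy (made-up data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 20, size=50)

r_pos, _ = stats.pearsonr(x, 2 * x + rng.normal(scale=2, size=50))   # near +1
r_neg, _ = stats.pearsonr(x, -2 * x + rng.normal(scale=2, size=50))  # near -1
r_none, _ = stats.pearsonr(x, rng.normal(size=50))                   # near 0
print(r_pos, r_neg, r_none)
```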

16
Q

Pearson correlation Assumptions

A
  • Linear relationship (straight line).
  • Parametric data/normality.
  • Homogeneity of variance (homoscedasticity).
  • Independence.
  • At least one variable needs to be continuous.
  • The other variable can be continuous or dichotomous.
17
Q

T-tests

A
  • Used when you want to compare two means.
  • If “group 1” is larger than “group 2”, the t statistic will be positive; if
    “group 2” is larger, the t statistic will be negative.
  • T-tests are not measures of effect size.
  • We traditionally use Cohen’s d to measure effect size for t-tests.

A p value less than .05 is statistically significant.

T value:
This is the result of the t-test formula, which compares the means of two groups
(or a sample mean to a population mean) relative to the variability in the data
(standard error). A higher absolute t value indicates a larger difference between
the means relative to the variance, suggesting a stronger deviation from the null
hypothesis.
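
A minimal sketch of the sign behaviour in Python with scipy (illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group1 = rng.normal(loc=12, scale=2, size=30)
group2 = rng.normal(loc=10, scale=2, size=30)

# t = (mean1 - mean2) / standard error, so the group order sets the sign:
print(stats.ttest_ind(group1, group2))  # group1 mean larger -> positive t
print(stats.ttest_ind(group2, group1))  # same test, sign flipped
```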

18
Q

Assumptions of t-test

A
  • Parametric data/normality.
  • Independence.
  • Homogeneity of variance (homoscedasticity).
  • DV needs to be continuous.
  • IV needs to be dichotomous (groups or time points).
19
Q

Between group (independent) t-tests

A

If the two means are from different people.

20
Q

Within group (dependent or paired) t-tests

A

If the two means are from the same people at different times.

21
Q

Spearman’s rank order correlation

A

Treat the data as an ordinal scale and rank each variable in order. For example, student 1 did the least work out of anyone (2 hours), so they get the lowest rank (rank = 1). Student 4 was the next laziest, putting in only 6 hours of work over the whole semester, so they get the next lowest rank (rank = 2). Spearman’s correlation is then just the Pearson correlation computed on these ranks.
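
A minimal sketch in Python with scipy, using hypothetical numbers chosen to match the card (student 1 = 2 hours, student 4 = 6 hours):

```python
from scipy import stats

hours = [2, 40, 30, 6, 16]    # hypothetical hours of work per student
grade = [13, 90, 80, 20, 40]  # hypothetical grades

# The ordinal ranks Spearman works with: 2 hours -> rank 1, 6 hours -> rank 2, ...
print(stats.rankdata(hours))  # [1. 5. 4. 2. 3.]

# Here the two rank orders match perfectly, so rho = 1.
print(stats.spearmanr(hours, grade))
```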

22
Q

Cohen’s d

A

An effect size that tells us how large the difference between groups of data is, i.e. the strength of the effect.

A d of 0.5 indicates that the two group means differ by 0.5 standard deviations.
A d of 1 indicates that the group means differ by 1 standard deviation.
A d of 2 indicates that the group means differ by 2 standard deviations.

A value of 0.2 represents a small effect size.
A value of 0.5 represents a medium effect size.
A value of 0.8 represents a large effect size.
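
A minimal sketch of Cohen’s d for two independent groups, using the pooled standard deviation (standard formula; the data are made up):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: difference in means expressed in pooled-SD units."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

group1 = [15, 17, 18, 20, 22]
group2 = [12, 14, 15, 16, 18]
print(cohens_d(group1, group2))
```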

23
Q

Error in statistical models…

A

Is important to measure, report and interpret

24
Q

Data = x + y, where x and y are…

A

Model and error

25
Q

Statistical significance is not…

A

A measure of effect size

26
Q

What three factors affect statistical power?

A

Sample size, effect size, and the Type I error rate (alpha)