8. Comparing means: assumptions and transformations Flashcards

1
Q

What are the assumptions of statistical inference based on a normal distribution?

A

Data are sampled at random
■ for response variables conditioned on explanatory variables

Samples are independent.

The differences between observations and predictions (the residuals) are normally distributed.

The mean and variance of errors are independent of the explanatory variable(s).

One source of unmeasured random variance.

Variance is equal among groups
■ if it is not, use an adjustment

2
Q

What methods can you use when your response variable does not have a normal distribution?

A

First check how much it deviates from normality.
- use a normal quantile plot
- and a Shapiro-Wilk test

Then you can either:

  • ignore the violations of the assumptions
  • transform the data
  • use a non-parametric method
  • use a permutation test
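
A minimal sketch of the first check, assuming only the Python standard library: it computes the point pairs that a normal quantile plot would display. The function name `normal_quantile_pairs`, the plotting position (i - 0.5) / n, and the sample values are illustrative assumptions, not part of the original card. If the points fall close to a straight line, the data are roughly normal. (A Shapiro-Wilk test itself is available in SciPy as `scipy.stats.shapiro`.)

```python
from statistics import NormalDist


def normal_quantile_pairs(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, s.d. 1
    pairs = []
    for i, x in enumerate(ordered, start=1):
        # Plotting position (i - 0.5) / n keeps p strictly inside (0, 1).
        p = (i - 0.5) / n
        pairs.append((std_normal.inv_cdf(p), x))
    return pairs


# Hypothetical right-skewed measurements, for illustration only.
sample = [1.2, 1.5, 1.7, 2.0, 2.4, 3.1, 4.8, 9.6]
for z, x in normal_quantile_pairs(sample):
    print(f"{z:6.2f}  {x:5.1f}")
```

In a skewed sample like this one, the largest observed values rise much faster than the theoretical quantiles, so the points curve away from a straight line.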
3
Q

What does a Shapiro-Wilk test do?

A

Evaluates the goodness of fit of a normal distribution to the data

Can quantify deviation from normality
■ It cannot confirm that the data are normally distributed; a non-significant result only means that no deviation was detected

4
Q

When is it sensible to ignore the violated assumptions of a normal distribution?

A
  • when using robust statistics (central limit theorem)
  • if the distributions of the groups have a similar shape
  • when using an accepted adjustment for a difference in variance/s.d. (Welch's t-test)
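
As a sketch of the adjustment named in the last bullet, the test statistic and approximate degrees of freedom for Welch's t-test can be computed as below. The function name `welch_t` and the example data are hypothetical; the degrees of freedom use the Welch-Satterthwaite approximation, and the p-value lookup is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's t-test on two independent samples."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical groups with clearly unequal spread.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)
```

When the two groups have equal variances and equal sizes, the degrees of freedom reduce to the usual n1 + n2 - 2 of the ordinary two-sample t-test.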
5
Q

What is the central limit theorem?

A

The sum or mean of a large random sample from a non-normal population is approximately normally distributed
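
A small simulation can illustrate this. The sketch below assumes an exponential population (strongly right-skewed), a sample size of 50, and 5000 repeats; all of these are arbitrary choices for illustration. It compares the skewness of the raw values with the skewness of the sample means.

```python
import random
from statistics import mean


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Raw draws from a non-normal (exponential) population.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 5000 independent samples, each of size 50.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(50)]) for _ in range(5000)
]

print("skewness of raw values:  ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The raw values stay strongly skewed, while the distribution of sample means is much closer to symmetric, which is what the theorem predicts.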

6
Q

How can we transform data and when do we do each?

A

Log
- ratios and products of variables

Natural log
- ratios and products of variables
- right-skewed frequency distribution
- group with the larger mean also has the larger s.d.
- data span several orders of magnitude

Arcsine
- proportions

Square root
- counts and right-skewed data

Reciprocal
- right-skewed data

Exponential
- left-skewed data

Square
- left-skewed data
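
Three of these transformations can be sketched as follows. The function names and the example groups are hypothetical. The arcsine transformation is taken here as arcsin of the square root of the proportion, which is its usual form; that detail is an assumption, since the card only names the transformation.

```python
import math
from statistics import stdev


def log_transform(values):
    """Natural log; every value must be greater than zero."""
    return [math.log(v) for v in values]


def arcsine_transform(proportions):
    """arcsin(sqrt(p)) for proportions between 0 and 1."""
    return [math.asin(math.sqrt(p)) for p in proportions]


def sqrt_transform(counts):
    """Square root for non-negative counts."""
    return [math.sqrt(c) for c in counts]


# Hypothetical groups: the group with the larger mean has the larger s.d.
small = [1, 2, 4]
large = [10, 20, 40]
print(stdev(small), stdev(large))                                # unequal
print(stdev(log_transform(small)), stdev(log_transform(large)))  # equal
```

Because the second group is the first multiplied by 10, taking logs turns the multiplication into a constant shift, so the two groups end up with the same standard deviation.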

7
Q

What does a non-parametric method do?

A

Calculates probabilities in a way that does not depend on the normality of the response variable

Less powerful than the equivalent parametric test, though

8
Q

What are two non-parametric methods?

A

Sign test

Mann-Whitney U-test

9
Q

How do you do a sign test?

A

Turn the differences between data points into a binomial data set

  • calculate each difference
  • assign '+' or '-' depending on whether it is > 0 or < 0
  • count the number of '+' and '-'
  • H0 expects #'+' == #'-'
  • use the binomial distribution to calculate the p-value for the test
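
The steps above can be sketched as below for paired data. The function name `sign_test` and the example values are hypothetical. Two choices here are assumptions not stated on the card: differences of exactly zero are dropped, and the p-value is two-sided.

```python
from math import comb


def sign_test(before, after):
    """Return (n_plus, n_minus, two-sided p-value) for paired data."""
    diffs = [a - b for b, a in zip(before, after)]
    n_plus = sum(d > 0 for d in diffs)
    n_minus = sum(d < 0 for d in diffs)
    n = n_plus + n_minus          # zero differences are dropped (assumption)
    k = min(n_plus, n_minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), the distribution expected under H0.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)  # double the tail for a two-sided test
    return n_plus, n_minus, p_value


# Hypothetical paired measurements: 7 increases and 1 decrease.
before = [5, 6, 7, 5, 6, 8, 7, 6]
after = [6, 8, 9, 6, 7, 9, 8, 5]
print(sign_test(before, after))
```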
10
Q

How do you do a Mann-Whitney U-test?

A
  • order all the data from smallest to largest
  • give each value a rank, starting with 1
    (tied values share the average rank, e.g. ranks 3 and 4 both become 3.5)
  • calculate the rank sum of each group
  • calculate the U statistic for each group
  • the larger U statistic is used as the test statistic
  • compare it to the critical value from a table
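
A sketch of the ranking and U calculation, using only the standard library. The function names and data are hypothetical. The formula U1 = R1 - n1(n1 + 1)/2, with U2 = n1 n2 - U1, is one common convention and is an assumption here; other conventions swap which group's value is called U1 but give the same larger U. The lookup of the critical value is left out.

```python
def average_ranks(values):
    """Rank from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2, larger U) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])            # rank sum of group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2, max(u1, u2)


# Hypothetical groups with no overlap: the larger U equals n1 * n2.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```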
11
Q

What are the assumptions of the Mann-Whitney U-test?

A
  • data are randomly sampled
  • tests whether the groups have different distributions
    (not a robust test of equal central tendency)
  • the distributions have the same shape
  • low power because it uses ranks rather than the measured values
    (greater Type II error rate)
12
Q

What is a permutation test (and what else is it known as)?

A

Use of a computer to repeatedly and randomly reshuffle your sample
to produce a null distribution from a large number of permuted data sets

aka randomization test (related to bootstrapping, which instead resamples with replacement)

13
Q

What are the steps of a permutation test?

A

Create a permuted data set in which the values of the response variable are randomly re-ordered.

Calculate the measure of association for the permuted sample
● (e.g. the difference in means, medians, etc.)

Repeat the permutation process many times
● at least 1000 times, to create a null distribution

Compare the null distribution to the observed value of the test statistic calculated from the original data set
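
These steps can be sketched for two groups as below. The function name `permutation_test`, the fixed seed, the choice of the difference in means as the measure of association, and the two-sided comparison are all illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(group1, group2, n_permutations=1000, seed=0):
    """Return (observed difference in means, two-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    null_distribution = []
    for _ in range(n_permutations):
        # Randomly re-order the response values across the two groups.
        rng.shuffle(pooled)
        null_distribution.append(mean(pooled[:n1]) - mean(pooled[n1:]))
    # Fraction of permuted differences at least as extreme as the observed one.
    extreme = sum(abs(d) >= abs(observed) for d in null_distribution)
    return observed, extreme / n_permutations


# Hypothetical, clearly separated groups.
print(permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```

The p-value is the fraction of the null distribution that is at least as extreme as the observed value, so more permutations give a finer-grained estimate.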
