Ch. 13 Flashcards

1
Q

What to do when the assumptions are not true?

A
  1. Ignore the violations
  2. Transform the data
  3. Use a nonparametric method
  4. Use a permutation test
2
Q

When is data not likely to be normal, graphically?

A

(pg. 371)
When distribution is strongly skewed or strongly bimodal, or has outliers

3
Q

Normal Quantile Plot

  • What is it?
A

Compares each observation in the sample w/ its quantile expected from the standard normal distribution

If it is normally distributed, points should roughly be in a straight line.
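As a sketch of how these expected quantiles can be computed (assuming SciPy is available; the data below are simulated, not from the book):

```python
# Normal quantile (Q-Q) plot values: each ordered observation is paired
# with the quantile expected under the standard normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=30)  # simulated normal data

# probplot returns (theoretical quantiles, ordered data) plus the
# slope/intercept/correlation of the least-squares line through them.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(sample, dist="norm")
# For normally distributed data the points fall roughly on a straight
# line, so the correlation r is close to 1.
```

Plotting `theoretical_q` against `ordered_data` (e.g. with matplotlib) gives the familiar quantile plot.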

4
Q

Shapiro-Wilk test

What is it?

A

Shapiro-Wilk test evaluates goodness of fit of a normal distribution to a set of data randomly sampled from population (null hypothesis is that it is normal)
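A minimal sketch with `scipy.stats.shapiro` (simulated data; assumes SciPy is installed):

```python
# Shapiro-Wilk goodness-of-fit test for normality.
# Null hypothesis: the sample was drawn from a normal distribution,
# so a SMALL p-value is evidence against normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=40)  # simulated normal data

stat, p = stats.shapiro(sample)
# A large p here means we fail to reject normality; it does NOT prove
# the data are normal (hence the "false sense of security" caveat).
```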

5
Q

Recommended methods of evaluating assumption of normality?

A
  • Can do a statistical test (ex. Shapiro-Wilk test), but it can give a false sense of security
  • IDEALLY should use graphical methods & common sense to evaluate (frequency distribution histograms, normal quantile plots)
6
Q

Robust

Def’n?

A

A statistical procedure is robust if the answer it gives is not sensitive to violations of the assumptions of the method.

7
Q

Normal Approximation - Ignoring violations

  • For what types of tests?
  • Threshold of “ignorance”?
A
  • For tests that use the mean (robustness due to the central limit theorem)
  • Ignorance threshold rule of thumb is n > 50(ish)
    • Also need to consider the shape of the distributions; skews need to be similar, and there should be no outliers. See pg. 376
8
Q

When can the assumption of equal standard deviations be ignored?

A
  • If n > 30 for each group, and n is similar for both groups (approximately), then unequal SDs can be ignored even w/ a greater-than-3x difference
  • Cannot be ignored when:
    • n is not approximately equal between the groups
9
Q

Data transformation def’n

A

Data transformation changes each measurement by the same mathematical formula

10
Q

Purpose of a transform?

A
  • To attempt to make SD more similar and to improve the fit of the normal distribution to the data.

NOTE: This transform will affect all the data AND the hypotheses equally; i.e. everything gets transformed the same way. Also, can’t just do ln[s] for example; summary statistics have to be re-calculated starting from the transformed data (e.g. the mean of all the logs).

11
Q

Examples of possible transformations?

(indicate top 3)

A
  • Log transform
  • Arcsine transform
  • Square-root transform
  • Square transform
  • Antilog transform
  • Reciprocal transform
12
Q

Log transform - what does it do?

A

Converts each data point to its logarithm

Ex. Y’ = ln[Y]

13
Q

What to do if you want to try log transform but data has zero?

A

Try Y’ = ln[Y + 1]
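Sketch with NumPy (the data below are made up; note the zero):

```python
import numpy as np

# Hypothetical data containing a zero, where ln[Y] would be undefined
y = np.array([0.0, 1.0, 3.0, 10.0, 100.0])
y_prime = np.log(y + 1)  # Y' = ln[Y + 1] maps the zero to 0 instead of -inf
```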

14
Q

Log transform - When is it useful?

A
  • When measurements are ratios or products of variables
  • When frequency distribution is right skewed
  • When group that has the larger mean (when comparing 2 groups) also has the larger SD
  • When data spans several orders of magnitude

See pg. 378 for details

15
Q

Arcsine transformation

  • What does it look like?
  • What is it needed for?
  • How does it fix things?
A
  • p’ = arcsin[sqrt(p)]
  • Used for proportions
  • Makes it closer to a normal distribution and also makes SD’s similar

Note: convert percentages into decimal proportions first, before applying the transform
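A sketch with NumPy (hypothetical percentages), including the percentage-to-proportion conversion:

```python
import numpy as np

percentages = np.array([5.0, 25.0, 50.0, 90.0])  # hypothetical percentages
p = percentages / 100.0            # convert to decimal proportions first
p_prime = np.arcsin(np.sqrt(p))    # p' = arcsin[sqrt(p)]
# The transform stretches proportions near 0 and 1 relative to the middle,
# which pulls binomial-like data toward normality.
```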

16
Q

Square-root transformation

  • What does it look like?
  • What is it needed for?
  • How does it fix things?
A
  • Y’ = sqrt(Y + 1/2)
    • sometimes sqrt(Y) or sqrt(Y + 1)
  • Used for count data
  • Effect similar to log; makes SDs similar for comparisons where the group with the larger mean also has the higher SD
    • If the effect is the same as the log transform, use either one
17
Q

Square transformation

  • Transform?
  • When to use?
A
  • Y’ = Y^2
  • When frequency distribution is skewed left
    • Only usable if all Y have same sign
    • If all negative, try multiplying all by -1 first
18
Q

Antilog transform

  • Transform?
  • When to use?
A
  • Y’ = e^Y
  • Use when square transform doesn’t work on left-skewed data
19
Q

Reciprocal transform

  • Transform?
  • When to use?
A
  • Y’ = 1/Y
  • When data is skewed right
    • Only usable if all Y have same sign
    • If all negative, try multiplying all by -1 first
20
Q

Calculating CI with transform?

A

Use the transformed values; then, once the range is calculated, it is best to convert BACK into the original scale by back-transforming (i.e. inverting the transformation).

See pg. 382
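A sketch of the whole workflow for a log transform (hypothetical right-skewed data; assumes SciPy). The interval is computed on the log scale and only its endpoints are back-transformed:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed measurements
data = np.array([3.1, 7.4, 15.8, 42.0, 96.5, 210.0])
log_data = np.log(data)

# 95% t-based confidence interval for the mean of the LOG-scale values
mean = log_data.mean()
sem = stats.sem(log_data)
lo, hi = stats.t.interval(0.95, df=len(log_data) - 1, loc=mean, scale=sem)

# Back-transform the endpoints to the original scale (invert ln with exp)
ci_original = (np.exp(lo), np.exp(hi))
```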

21
Q

Valid transforms . . .

A
  • Require same transform applied to each individual
  • Have 1-to-1 correspondence to OG values
  • Have monotonic relationship w/ OG values (large values stay larger)
22
Q

Nonparametric methods - def’n

A

Nonparametric methods make fewer assumptions than standard parametric methods do about the distribution of variables

  • Achieves this by ranking the data
  • AKA “distribution-free” methods
23
Q

Ranking Data points

A
  • Rank from smallest to largest.
  • Ties are resolved by averaging the ranks the tied values would get if they were sequential (the “midrank”); the next-largest value then gets the next rank after the highest rank used by the ties
    • ex. 5 (Rank: 1), 6, 6, 8
    • the 6’s would be ranks 2 and 3, so average them: (2 + 3)/2 = 2.5 (“midrank”)
    • so each 6 is ranked 2.5, and 8 gets the next rank after the highest one used (3), i.e. 8 is ranked 4

See pg. 391
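The example above can be reproduced with `scipy.stats.rankdata`, whose default method assigns midranks to ties:

```python
from scipy.stats import rankdata

values = [5, 6, 6, 8]
ranks = rankdata(values)  # ties receive the average of the ranks they span
# 5 -> 1.0; the two 6's -> (2 + 3)/2 = 2.5 each; 8 -> 4.0
```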

24
Q

Sign-test

  • What is it?
  • What is its parametric equivalent(s)?
A
  • Compares the median of a sample to a constant specified in the null hypothesis. Makes NO assumptions about distribution of measurement in population
  • Equivalents are one-sample or paired t-tests
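Since the sign test is just a binomial test on the number of observations above vs. below the null median, it can be sketched with `scipy.stats.binomtest` (the data and null median here are hypothetical):

```python
from scipy.stats import binomtest

data = [12.1, 9.8, 11.4, 13.0, 10.6, 12.7, 11.9, 12.3]  # hypothetical sample
null_median = 10.0

above = sum(x > null_median for x in data)   # values above the null median
below = sum(x < null_median for x in data)   # values below
# Observations exactly equal to the null median would be omitted entirely.

# Under H0 each observation falls above the median with probability 0.5
result = binomtest(above, n=above + below, p=0.5)
p_value = result.pvalue  # two-sided by default
```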
25
Q

Problems with sign-test?

A
  • Low power compared to the t-test
    • often NOT able to reject a false null
    • Impossible to reject the null if n is less than/equal to 5
  • Requires large sample sizes
  • Requires omission of data points that exactly equal the hypothesized null median
26
Q

Mann-Whitney U-test

  • What does it do?
  • Parametric equivalent?
A
  • Compares the distributions of 2 groups. Does not require as many assumptions as the 2-sample t-test
    • if the distributions have the same shape, it tests central tendencies (medians, means) using ranks
  • Replaces the two-sample t-test
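A sketch with `scipy.stats.mannwhitneyu` (the two samples are made up):

```python
from scipy.stats import mannwhitneyu

group1 = [1.1, 2.3, 1.8, 2.9, 3.4]  # hypothetical samples
group2 = [3.0, 4.2, 3.8, 5.1, 4.7]

# Two-sided test; for small samples without ties SciPy uses an exact p-value
stat, p = mannwhitneyu(group1, group2, alternative="two-sided")
# A small p suggests the two distributions (and, if they share a shape,
# their central tendencies) differ.
```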
27
Q

Assumptions of the Mann-Whitney U-test?

A
  • Both samples are random samples
  • Both populations have the same shape of distribution
28
Q

Effect of assumptions on Type 1 Error rates?

A
  • If the assumptions of a given test (parametric or nonparametric) are met, then Prob[Type 1 Error] = α
    • when not met, the Type 1 error rate becomes larger than α
    • this is why parametric tests are not used when their assumptions are violated
29
Q

Effect of assumptions on Type 2 Error rates?

A
  • Nonparametric tests use ranks, thus use less info
    • less info means less power, and less power means a lower probability of rejecting false null hypotheses (i.e. increased Type 2 error)
    • Reduced power of nonparametric tests is irrelevant when parametric test assumptions are violated
30
Q

Relative powers of Mann-Whitney U-test and sign test

A

At best (i.e. with large samples),

  • Mann-Whitney is about 95% as powerful as two-sample t-test
  • sign test has only about 64% of the power of the t-test (much lower power)
    • thus it is a last-resort method

Power decreases with smaller sample sizes.

31
Q

Permutation test - def’n

A

Generates a null distribution for the association between two variables (“measures of association”) by repeatedly and randomly rearranging the values of one of the two variables in the data.

*Randomization is without replacement

Sometimes called “randomization test”
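A minimal hand-rolled permutation test for a difference in means (the data are hypothetical; shuffling reassigns the group labels without replacement, as the definition requires):

```python
import numpy as np

rng = np.random.default_rng(2)
group1 = np.array([14.0, 15.2, 13.8, 16.1, 15.5])  # hypothetical data
group2 = np.array([12.9, 13.4, 12.1, 13.9, 12.6])
observed = group1.mean() - group2.mean()

pooled = np.concatenate([group1, group2])
n1 = len(group1)

# Build the null distribution by shuffling the pooled values WITHOUT
# replacement, i.e. randomly reassigning group labels each iteration.
null_diffs = np.empty(5000)
for i in range(5000):
    shuffled = rng.permutation(pooled)
    null_diffs[i] = shuffled[:n1].mean() - shuffled[n1:].mean()

# Two-sided p-value: fraction of permuted differences at least as
# extreme as the observed one.
p_value = np.mean(np.abs(null_diffs) >= abs(observed))
```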

32
Q

When can permutation tests be used?

A

Can be done for ANY test of association between two variables

(categorical and numerical, numerical x2, categorical x2)

33
Q

Assumptions of the Permutation Test?

Power of the Permutation Test?

A
  • Must be a random sample
  • For permutation tests comparing means and medians, the distribution must have the same shape in every population
    • Robust to violations of this when sample sizes are large (more so than the Mann-Whitney U-test)
  • With small sample sizes, has less power than parametric tests (but still more power than the Mann-Whitney U-test)
  • Similar power to parametric tests when n is large