Ch. 13 Flashcards
What to do when the assumptions are not true?
- Ignore the violations
- Transform the data
- Use nonparametric method
- Use permutation test
When is data not likely to be normal, graphically?
(pg. 371)
When distribution is strongly skewed or strongly bimodal, or has outliers
Normal Quantile Plot
- What is it?
Compares each observation in the sample w/ its quantile expected from the standard normal distribution
If the data are normally distributed, the points should fall roughly along a straight line.
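A minimal sketch of a normal quantile plot using scipy's probplot (the sample values here are hypothetical; substitute real measurements):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# hypothetical sample; replace with real data
y = np.array([3.2, 4.1, 4.8, 5.0, 5.3, 5.9, 6.4, 7.1, 8.0, 12.5])

# plots each observation against its expected standard-normal quantile
stats.probplot(y, dist="norm", plot=plt)
plt.title("Normal quantile plot")
plt.show()
```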
Shapiro-Wilk test
What is it?
Shapiro-Wilk test evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population (null hypothesis is that it is normal)
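A one-line sketch with scipy (sample data hypothetical); since the null hypothesis is normality, a small P-value is evidence AGAINST a normal distribution:

```python
from scipy import stats

y = [3.2, 4.1, 4.8, 5.0, 5.3, 5.9, 6.4, 7.1]  # hypothetical sample

stat, p = stats.shapiro(y)  # H0: data come from a normal distribution
print(f"W = {stat:.3f}, P = {p:.3f}")  # small P => reject normality
```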
Recommended methods of evaluating assumption of normality?
- Can do a statistical test (ex. Shapiro-Wilk test), but this can give a false sense of security
- IDEALLY should use graphical methods & common sense to evaluate (frequency distribution histograms, normal quantile plots)
Robust
Def’n?
A statistical procedure is robust if the answer it gives is not sensitive to violations of the assumptions of the method.
Normal Approximation - Ignoring violations
- For what types of tests?
- Threshold of “ignorance”?
- For tests that use the mean (robustness due to the central limit theorem)
- Ignorance threshold rule of thumb is n > 50(ish)
- Also need to consider the shape of the distributions; skew needs to be similar, and no outliers. See pg. 376
When can the assumption of equal standard deviations be ignored?
- If n > 30 for each group and the sample sizes are approximately equal, the assumption can be ignored even w/ a greater-than-3x difference in SD
- Cannot be ignored when n is not approximately equal between groups
Data transformation def’n
Data transformation changes each measurement by the same mathematical formula
Purpose of a transform?
- To attempt to make SD more similar and to improve the fit of the normal distribution to the data.
NOTE: This transform will affect all the data AND the hypotheses equally; i.e. everything gets transformed the same way. Also, you can’t just transform a summary statistic (e.g. ln[s] for the SD); statistics have to be re-calculated from the transformed data, starting with the mean of all the logs.
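A quick numpy demonstration of that note (values hypothetical): the mean of the logs is not the log of the mean, and likewise for the SD.

```python
import numpy as np

y = np.array([1.0, 10.0, 100.0])  # hypothetical measurements

# Correct: transform every data point, THEN compute the statistic
print(np.log(y).mean())        # mean of the logs ~ 2.303
print(np.log(y).std(ddof=1))   # SD of the logs ~ 2.303

# Wrong: transforming the raw statistics gives different numbers
print(np.log(y.mean()))        # ln(37.0) ~ 3.611, not 2.303
print(np.log(y.std(ddof=1)))   # ln[s] ~ 4.003, not the SD of the logs
```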
Examples of possible transformations?
(indicate top 3)
- Log transform
- Arcsine transform
- Square-root transform
- Square transform
- Antilog transform
- Reciprocal transform
Log transform - what does it do?
Converts each data point to its logarithm
Ex. Y’ = ln[Y]
What to do if you want to try log transform but data has zero?
Try Y’ = ln[Y + 1]
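A small numpy sketch (counts hypothetical) showing why the +1 variant is needed when the data contain a zero:

```python
import numpy as np

y = np.array([0, 2, 5, 17, 240])  # hypothetical right-skewed data with a zero

# np.log(y) would give -inf for the zero (with a warning)
log_y = np.log(y + 1)  # Y' = ln[Y + 1] handles the zero
print(log_y)
```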
Log transform - When is it useful?
- When measurements are ratios or products of variables
- When frequency distribution is right skewed
- When group that has the larger mean (when comparing 2 groups) also has the larger SD
- When data spans several orders of magnitude
See pg. 378 for details
Arcsine transformation
- What does it look like?
- What is it needed for?
- How does it fix things?
- p’ = arcsin[sqrt(p)]
- Used for proportions
- Makes it closer to a normal distribution and also makes SD’s similar
Note: convert percentages into decimal proportions before applying the transform
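A minimal numpy sketch (percentages hypothetical), including the percent-to-proportion conversion:

```python
import numpy as np

percents = np.array([12.0, 45.0, 88.0])  # hypothetical percentages
p = percents / 100                       # convert to decimal proportions first
p_prime = np.arcsin(np.sqrt(p))          # p' = arcsin[sqrt(p)]
print(p_prime)
```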
Square-root transformation
- What does it look like?
- What is it needed for?
- How does it fix things?
- Y’ = sqrt(Y + 1/2)
- (sometimes sqrt(Y) or sqrt(Y + 1) is used instead)
- Used for count data
- Effect similar to log; makes SDs similar for comparisons where the group with the larger mean also has the larger SD
- If the effect is the same as the log transform, either can be used
Square transformation
- Transform?
- When to use?
- Y’ = Y^2
- When frequency distribution is skewed left
- Only usable if all Y have same sign
- If all negative, try multiplying all by -1 first
Antilog transform
- Transform?
- When to use?
- Y’ = e^Y
- Use when square transform doesn’t work on left-skewed data
Reciprocal transform
- Transform?
- When to use?
- Y’ = 1/Y
- When data is skewed right
- Only usable if all Y have same sign
- If all negative, try multiplying all by -1 first
Calculating CI with transform?
Compute using the transformed values; once the interval is calculated, it is best to convert the limits BACK into the original scale by back-transform (i.e. invert the transformation).
See pg. 382
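A sketch of the whole workflow with numpy/scipy (data hypothetical), assuming a log transform was used: compute the 95% CI on the log scale, then exponentiate the limits to return to the original scale.

```python
import numpy as np
from scipy import stats

y = np.array([1.2, 3.4, 5.1, 9.8, 22.0, 41.5])  # hypothetical right-skewed data

log_y = np.log(y)
m, se = log_y.mean(), stats.sem(log_y)

# 95% CI on the transformed (log) scale
lo, hi = stats.t.interval(0.95, df=len(y) - 1, loc=m, scale=se)

# back-transform: invert the log with exp to get limits on the original scale
print(np.exp(lo), np.exp(hi))
```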
Valid transforms . . .
- Require same transform applied to each individual
- Have 1-to-1 correspondence to OG values
- Have monotonic relationship w/ OG values (larger values stay larger)
Nonparametric methods - def’n
Nonparametric methods make fewer assumptions than standard parametric methods do about the distribution of variables
- Achieves this by ranking the data
- AKA “distribution-free” methods
Ranking Data points
- Rank from smallest to largest.
- Ties each receive the average of the ranks they would have gotten if ranked sequentially (the “midrank”); the next-largest value then gets the next rank after the highest rank the tied values would have used
- ex. 5 (Rank: 1), 6, 6, 8
- the 6’s would occupy ranks 2 and 3, so each gets the average of 2 and 3, = 2.5 (“midrank”)
- So both 6’s are ranked 2.5, and 8 gets the next rank after the highest one the 6’s would have used (3), i.e. 8 is ranked 4
See pg. 391
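scipy's rankdata uses exactly this midrank convention, so it can double as a check on hand-ranking:

```python
from scipy.stats import rankdata

# ties receive the average ("midrank") of the ranks they would occupy
print(rankdata([5, 6, 6, 8]))  # -> [1.  2.5  2.5  4. ]
```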
Sign-test
- What is it?
- What is its parametric equivalent(s)?
- Compares the median of a sample to a constant specified in the null hypothesis. Makes NO assumptions about the distribution of the measurement in the population
- Equivalents are one-sample or paired t-tests
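scipy has no dedicated sign-test function, but the test reduces to a binomial test on the counts above and below the null median; a sketch under that reduction (data and null median hypothetical; binomtest needs scipy >= 1.7):

```python
import numpy as np
from scipy.stats import binomtest

y = np.array([12, 15, 9, 21, 14, 18, 30, 11])  # hypothetical sample
median_0 = 10                                  # median under H0

above = int(np.sum(y > median_0))
below = int(np.sum(y < median_0))  # values exactly equal to median_0 are omitted

# under H0, each observation falls above the median with probability 0.5
result = binomtest(above, n=above + below, p=0.5, alternative="two-sided")
print(result.pvalue)
```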
Problems with sign-test?
- Low power compared to t-test
- likely to NOT be able to reject a null
- Impossible to reject the null if n ≤ 5 (with n = 5, the smallest possible two-sided P-value is 2 × 0.5^5 = 0.0625 > 0.05)
- Requires large sample sizes
- Requires omission of data that exactly equal the hypothesized null median
Mann-Whitney U-test
- What does it do?
- Parametric equivalent?
- Compares the distributions of 2 groups. Does not require as many assumptions as the 2-sample t-test
- if the distributions have the same shape, it tests central tendencies (medians, means) using ranks
- Replaces two-sample t-test
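A minimal scipy sketch (group data hypothetical):

```python
from scipy.stats import mannwhitneyu

group1 = [6.2, 7.1, 7.8, 8.0, 9.4]  # hypothetical samples
group2 = [4.1, 5.0, 5.5, 6.0, 6.8]

u, p = mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {u}, P = {p:.4f}")
```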
Assumptions of the Mann-Whitney U-test?
- Both samples are random samples
- Both populations have the same shape of distribution
Effect of assumptions on Type 1 Error rates?
- If the assumptions of a given test (parametric or nonparametric) are met, then Prob[Type 1 error] = α
- when not met, the Type 1 error rate becomes larger than α
- this is why parametric tests are not used when assumptions are violated
Effect of assumptions on Type 2 Error rates?
- Nonparametric tests use ranks, and thus use less info
- less info means less power, and less power means a lower probability of rejecting a false null hypothesis (i.e. increased Type 2 error)
- Reduced power of nonparametric tests is irrelevant when parametric test assumptions are violated
Relative powers of Mann-Whitney U-test and sign test
At best (i.e. with large samples),
- Mann-Whitney is about 95% as powerful as two-sample t-test
- sign test has about 64% of the power of the t-test (much lower power)
- thus it is a last-resort method
Power decreases with smaller sample sizes.
Permutation test - def’n
Generates a null distribution for a measure of association between two variables by repeatedly and randomly rearranging the values of one of the two variables in the data.
*Randomization is without replacement
Sometimes called “randomization test”
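A hand-rolled numpy sketch for a difference in means between two hypothetical groups; note the group labels are shuffled without replacement, as the definition requires:

```python
import numpy as np

rng = np.random.default_rng(1)

group1 = np.array([6.2, 7.1, 7.8, 8.0, 9.4])  # hypothetical samples
group2 = np.array([4.1, 5.0, 5.5, 6.0, 6.8])

observed = group1.mean() - group2.mean()
pooled = np.concatenate([group1, group2])
n1 = len(group1)

null_diffs = np.empty(9999)
for i in range(9999):
    shuffled = rng.permutation(pooled)  # reassign labels w/o replacement
    null_diffs[i] = shuffled[:n1].mean() - shuffled[n1:].mean()

# two-sided P: fraction of permutations at least as extreme as observed
print(np.mean(np.abs(null_diffs) >= abs(observed)))
```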
When can permutation tests be used?
Can be done for ANY test of association between two variables
(categorical and numerical, numerical x2, categorical x2)
Assumptions of the Permutation Test?
Power of the Permutation Test?
- Must be a random sample
- For permutation tests comparing means or medians, the distribution must have the same shape in every population
- Robust to violations of this when sample sizes are large; more robust than the Mann-Whitney U-test
- with small sample sizes, it has less power than parametric tests (but is still more powerful than Mann-Whitney)
- Similar power to parametric tests when n is large