
L6 to L10 - Data Analysis Flashcards by Zhen Heng Lim

The general procedure for hypothesis testing

1. Define the problem
   - Number of groups, independence, type of data, number of tails
2. State the hypotheses
3. Compute the test statistic
4. Find the p-value
5. Compare the p-value with α
6. State the conclusion

State the assumptions of Parametric Tests

1. Samples are drawn from NORMALLY distributed ppn
2. All samples have equal variances

A crossover study compared the LDL cholesterol levels after a 2-week diet with oat bran cereal and corn flakes respectively, in 14 individuals with hypercholesterolemia. At a significance level of 0.05, is there a difference in LDL levels between the 2 diets?

Assuming data is normal, state the stats test to be carried out, the hypotheses and the assumptions

Test: Paired-samples t-test
Hypothesis:
- H0: µd = 0, MEAN difference btwn LDL after oat and corn = 0
- H1: µd ≠ 0
Assumptions
- Samples are drawn from NORMALLY distributed ppn
- All samples have equal variances
- *The paired differences (d) are normally distributed

For any tests involving all continuous data, what must be done before selecting the appropriate test?

Normality Test

Define the problem and state the test(s) to carry out, assuming data is continuous

Two different medications, tablets A and B were administered to two groups of patients to compare their ability to increase INR after one day. At a significance level of 0.05, is there a difference in the increase of INR in one day between the two tabs?

Define the problem:

How many and which samples being compared: Tablets A and B
INDEPENDENT samples
Outcome of interest: Increase in INR
Type of data to be analysed: Continuous data
Tail: Two-tailed

Tests:

Independent Samples t-test
Normality test
Variance test

Advantage of Levene’s test over F-test for equality of variance

F-test: Requires that the ppn from which samples are obtained are NORMAL
F-test: Compares variances of only 2 groups

Levene’s test: Can be used regardless (more than two groups; ppn need not be normal)

Hypotheses for F-test of equality of variance

H0: Variances are EQUAL
H1: Variances are UNEQUAL

A clinical trial was conducted in 2 groups of study participants to compare tx A and B onset to relieving headache. Assume that data normally distributed. At α = 0.05, p = 0.02.

Tx A: Mean onset 34 min, SD 3.75, sample size 15
Tx B: Mean 25.8, SD 7.43, Sample 10

Variance test: p < 0.01

State the likely stats test used in this study, and formulate a conclusion for the study

Test: Independent-samples t-test with UNEQUAL VARIANCE

Conclusion: At a significance level of 0.05, there is a statsig difference in the MEAN onset to relieving headache between tx A (34.0 +/- 3.75 mins) and tx B (25.8 +/- 7.43 mins) (p = 0.02).

What are the other tests to be carried out for an independent-samples t-test?

1. Equality of variance test
2. Normality test

One way ANOVA (OWA) is for what kind of comparison? State its Hypotheses

Comparing MEANS between >2 independent groups by analysing variance

H0: Ppn means corresponding to random samples are equal
H1: Not equal (at least one is not equal)

For >2 independent groups, why should multiple independent t-test not be carried out?

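Multiplicative rule of (1 − α): with k tests each at level α, the probability of making no type I error is (1 − α)^k. This increases the overall probability of type I error and decreases the probability of making the correct conclusion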
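To make the multiplicative rule concrete, here is a minimal sketch of the overall type I error rate for several tests. The function name `familywise_error` is made up for illustration, and the calculation assumes the tests are independent, which is only an approximation for pairwise t-tests that share groups.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests tests.

    Assumes the tests are independent (an approximation here).
    """
    return 1 - (1 - alpha) ** n_tests


# 3 groups need 3 pairwise t-tests; 4 groups need 6.
for n_tests in (1, 3, 6):
    print(n_tests, round(familywise_error(0.05, n_tests), 4))
```

With three pairwise tests at α = 0.05, the overall type I error is already about 0.14 instead of 0.05.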

Assumptions of OWA

Samples are random samples of ppn
Ppns are independent
Underlying ppn is normally distributed
Ppn has equal variances*

If non-equal variance, use Welch-Anova

For F test of ANOVA, what is the equation?

F = Sb/Sw

Also, Mean squares (variance) = Sum of squares/df

Sb = BETWEEN GROUP variance
Sw = WITHIN GROUP variance
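A small sketch of how the F ratio could be computed by hand from the sums of squares. The function name `one_way_f` and the toy data are made up for illustration. The df used are k − 1 for between groups and N − k for within groups.

```python
from statistics import mean


def one_way_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Sum of squares: group means vs overall mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares: individual values vs their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # Sb
    ms_within = ss_within / (n - k)     # Sw
    return ms_between / ms_within


toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
print(one_way_f(toy_groups))
```

A large F means the between-group variance is large relative to the within-group variance.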

Briefly describe the principles of one-way anova

Analyses within group variation (individual values vs their group means) and btwn group variation (group means vs overall mean)

If btwn group variation is large relative to within group variation, it can be implied that underlying ppn means are different

A study compared the pulmonary function (forced expiratory volume in one second, FEV1) for patients with coronary artery disease from 3 different hospitals. At a significance level of 0.05, is there a difference in FEV1 among these patients?

Define the Problem. State all statistical tests you will carry out to arrive at the final stats test to use to analyse the data

How many samples: 3
Outcome of Interest: FEV1
Data to be analysed: Continuous data

Other tests:

Normality Test
Equality of variance using LEVENE’S test (F test not for >2 samples)

If Data is normal, and variance is equal, use OWA
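The selection logic for independent groups with continuous data, as used across these cards, can be sketched as a small lookup. `choose_test` is a hypothetical helper, not part of any library. The two flags stand for the outcomes of the normality test and the equality of variance test.

```python
def choose_test(n_groups: int, normal: bool, equal_variance: bool) -> str:
    """Pick a test for INDEPENDENT groups with continuous data."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    if not normal:
        # Non-parametric alternatives
        return "Wilcoxon rank sum test" if n_groups == 2 else "Kruskal-Wallis test"
    if n_groups == 2:
        if equal_variance:
            return "Independent-samples t-test"
        return "Independent-samples t-test with unequal variance"
    return "One-way ANOVA" if equal_variance else "Welch-ANOVA"


print(choose_test(3, normal=True, equal_variance=True))
print(choose_test(3, normal=True, equal_variance=False))
```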

A study compared the increase in breathing rate of three groups of 30 participants in three different types of exercises (Running, weightlifting, Skipping).

Normality test shows that p > 0.05 for all three samples.

Equality of variance via levene’s test shows that p < 0.05 for running and skipping, but p > 0.05 for weightlifting

State:

The normality test that is used
Statistical test to be used to analyse the data

Normality test: Shapiro-Wilk for all three groups (n < 50)
Stats test to be used:
– Breathing rate: Continuous Data
– All data are normally distributed
– Not all variances are equal
Hence use WELCH-ANOVA

A study compared the increase in breathing rate of three groups of 30 participants in three different types of exercises (Running, weightlifting, Skipping). At a significance level of 0.05, the p-value was found to be 0.0422.

Assuming OWA was used to analyse data, state the conclusion of this study

At a significance level of 0.05, the MEAN increases in breathing rate for the three groups of participants are not all the same (p = 0.0422).

Purpose(s) of Post-hoc tests in OWA

Identify the groups with differences, while keeping the overall probability of making type I error at the predetermined α

Main difference between post-hoc tests

How conservative they are

More conservative = reduce stats power (greater chance for type II error) but controls type I error better

List and rank the post-hoc tests in terms of conservativeness

1. Bonferroni adjustment
2. Least sig. difference (LSD) test
3. Tukey’s test
4. Scheffe’s procedure
5. Dunnett’s test

Conservativeness
4 > 1 > 3 > 2
5: Special case

Adjustment of significance level for Bonferroni Adjustment. What is its advantage and disadvantage?

α/m, where m = no. of pairwise comparisons to be carried out

Good: Widely applicable to any stats test
Bad: Very conservative, thus stats power much reduced
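A minimal sketch of the adjustment, assuming all pairwise comparisons between k groups are carried out, so m = k(k − 1)/2. The helper names and the unadjusted p-values are made up for illustration. Some software reports p-values that are already multiplied by m instead, and those are compared against the original α.

```python
def n_pairwise(n_groups: int) -> int:
    """Number of pairwise comparisons between n_groups groups."""
    return n_groups * (n_groups - 1) // 2


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Significance level to use for EACH pairwise comparison."""
    return alpha / n_pairwise(n_groups)


alpha_each = bonferroni_alpha(0.05, 3)      # 0.05 / 3
raw_p_values = [0.010, 0.030, 0.200]        # hypothetical unadjusted p-values
significant = [p < alpha_each for p in raw_p_values]
print(round(alpha_each, 4), significant)
```

Note how the second comparison (p = 0.030) would be significant at 0.05 but not after adjustment, which is the loss of power described above.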

State the purpose of Dunnett’s test and its advantage

Used when comparing one group with each of the other groups, but not comparing the other groups with one another (i.e. control vs others only)

Most powerful if doing so

How does LSD adjust the significance level? State the advantages and disadvantages of LSD

LSD: Does not control overall significance level (not conservative)

Good: Greater stats power (less chance to miss real diff)
Bad: Greater chance for false positive (type I error)

Advantage of Scheffe’s procedure

Very flexible: can be applied to more complicated comparisons and to a large number of groups

A study compared the pulmonary function (forced expiratory volume in one second, FEV1, units L) for patients with coronary artery disease from 3 different hospitals. At a significance level of 0.05, there is a difference between groups. After post-hoc tests using Bonferroni's adjustment, it was found that:

- 1 VS 2: sig. = 0.042
- 2 VS 3: sig. = 0.0823
- 1 VS 3: sig. = 0.123

- Hospital 1: Mean 2.6262, SD 0.49617
- Hospital 2: Mean 3.0325, SD 0.52324
- Hospital 3: Mean 2.8227, SD 0.43626

Formulate a conclusion from the information given

After post-hoc test: There is a statsig difference between the MEAN FEV1 for 1-2, but no statsig difference btwn the MEAN FEV1 for 1-3 and 2-3.

The MEAN FEV1 for hospital 1 (2.63 +/- 0.50 L) is statistically significantly lower than that of hospital 2 (3.03 +/- 0.52 L) (p = 0.042)

Purpose of repeated measures ANOVA (RMA)

For studies that analyse changes in a measure on the SAME GROUP of subjects over DIFFERENT CONDITIONS (e.g. over time, over distance, over age, etc.). Same hypotheses and test stats as OWA (it's like an extension of the paired t-test)

When should Non-parametric tests be used?

When assumptions of parametric tests are not met, i.e.
1. Samples drawn from non-normally distributed ppn (underlying distributions of samples are NOT NORMAL)
2. Variances between samples are NOT the same

Advantages of NPT

1. Not as restrictive as PT
2. Use ranks instead of actual values, hence:
- Less sensitive to measurement error and outlying values
- Suitable for ordinal data
- Quick to perform

Disadvantages of NPT

1. If PT assumptions are met, NPT is less powerful
E.g. if PT needs a sample size of 19 for a particular power, NPT requires 20 (sample size for PT = 0.95 × sample size for NPT)

A study compared the age of 10 and 13 individuals attending Chinese calligraphy class with those attending Taiji (or Tai Chi) class. None of the individuals attended both classes. At a significance level of 0.05, is there a difference in the age between the 2 classes?

Given the following normality test results:

1. Calligraphy
- KST: p = 0.056
- SWT: p = 0.009
2. Taiji
- KST: p = 0.200
- SWT: p = 0.129

State the test to be used by defining the problem. Also, state the hypotheses

Define the problem:
1. 2 independent samples
2. Outcome: Age
- Normality test using SWT: Calligraphy p < 0.05, hence non-normal distribution
3. Two-tailed

Stats test: Wilcoxon rank sum test (non-normal, two independent groups)

Hypotheses:
H0: No difference in MEDIAN age
H1: Difference in MEDIAN age

For WRST, what is the minimum sample size? If sample size is below the minimum, what to do?

- Min. sample size: 10
- Too small: Use distribution tables for small samples OR use exact significance from SPSS

General assumptions of NPTs

- Samples are random samples of ppn
- Underlying ppn independent

A preclinical study compared the rate of sedation among rats, of which each rat was administered a known sedative (control), or a high dose or low dose of an experimental compound. If a rat did not fall asleep within 10 min of the drug injection, the time to sleep was arbitrarily assigned a value of 15 min. At a significance level of 0.05, is there a difference in the rate of sedation among the 3 groups? Given that not all time to sleep are normally distributed, define the problem and state the stats tests you will use to analyse the problem. State the hypotheses too

Define the problem:
1. Three independent samples
2. Outcome: Sleep time
- Continuous, non-normal
3. Tail: Two-tailed

Test to use: Kruskal-Wallis test (KWT) (>2 independent groups, data non-normal)

Hypotheses:
H0: Median times to sleep are all the same
H1: Not all median times are the same OR median times to sleep for at least two of the three ppn are different

Post-hoc tests apply to which NPT?

KWT
- Use Bonferroni adjustment and perform WRST
- Everything else is similar to OWA

30 participants were asked to rate the sweetness of two syrups (A & B) using a 5-point Likert scale (1 = Not sweet at all, 5 = extremely sweet). At a significance level of 0.05, is there a difference in the rating of the sweetness between the two syrups? Define the problem and suggest a stats test to analyse the data. State the hypotheses as well

Define the problem:
1. Two paired samples (individuals tasted both)
2. Outcome: Sweetness, ordinal data (Likert scale)
3. Tails: Two-tailed

Stats test: Wilcoxon signed-rank test (WSRT) (for paired samples when PT cannot be used)

Hypotheses:
H0: MEDIAN difference in rating of sweetness = 0
H1: MEDIAN difference in rating of sweetness ≠ 0

Data presentation for NPT

Median (IQR) E.g. 4.00 (2.25 - 6.00) (note: IQR is a range, and not like sd where you +/-)
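A short sketch of how the median (IQR) string could be produced with Python's standard library. The function name `format_median_iqr` and the data are made up for illustration. Quartile conventions differ between packages, so the `method="inclusive"` choice is an assumption and SPSS may give slightly different quartiles for the same data.

```python
from statistics import median, quantiles


def format_median_iqr(values) -> str:
    """Format data as 'median (Q1 - Q3)' to two decimal places."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.2f} ({q1:.2f} - {q3:.2f})"


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(format_median_iqr(scores))
```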

Assumptions for Chisq tests and Fisher's Exact test (FET). When to use FET?

1. Samples are random samples of their ppn
2. All observations are independent (i.e. one cell per subject)
3. For 2x2 table: Expected count must be ≥5
4. For large contingency table:
- Expected count ≥1
- No more than 20% of cells have expected count <5

Use FET if 3 & 4 are not met

Given the frequency of nominal data for two independent groups showing OBSERVED count (in a 2x2 table):

```
a = 30, b = 98
c = 40, d = 172
```

State the stats test to be used to analyse this data. Explain your answer

Chisq test, since:
- Nominal data
- For 2x2 table, EXPECTED count for each cell must be ≥5, which is met here (smallest expected count = 128 × 70 / 340 ≈ 26.4)
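The expected count check can be sketched as below, using expected count = row total × column total / grand total. The function names are made up for illustration, and the 2x2 table is assumed to be laid out row-wise as (a, b) over (c, d).

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chisq_assumptions_met(table) -> bool:
    """Apply the expected-count rules for 2x2 and larger tables."""
    flat = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        return min(flat) >= 5
    share_below_5 = sum(e < 5 for e in flat) / len(flat)
    return min(flat) >= 1 and share_below_5 <= 0.20


observed = [[30, 98], [40, 172]]
print(expected_counts(observed))
print("Chisq" if chisq_assumptions_met(observed) else "FET")
```

If the check fails, the cards call for FET instead of Chisq.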

A study investigated whether the amount of monosodium glutamate (MSG) in a meal was associated with the occurrence of headache among 240 healthy volunteers. The results are as follows: a = 52, b = 28, c = 40, d = 40, e = 26, f = 54 State the stats test to be used for this study, and state the Hypotheses to be tested

Stats test: Chisq

Reason: For large contingency table,
- Expected count ≥1
- No more than 20% of cells have expected count <5

Hypotheses:
- H0: No assoc btwn amount of MSG in a meal and occurrence of headache
- H1: There is an assoc

OR

H0: The PROPORTIONS of individuals having headache among those who took meals with high, med and low amounts of MSG respectively are all the same.

Given the following contingency table for nominal data for OBSERVED count: a = 11, b = 1, c = 10, d = 4

State the stats test to be used and explain your choice

FET
- Nominal data
- Assumption NOT MET: expected count ≥5 for 2x2 table (e.g. expected count for b = 12 × 5 / 26 ≈ 2.3)

Unit of analysis for McNemar's test

MATCHED PAIRS

Distinguish between concordant and discordant pairs in McNemar's test

- Concordant pairs: Matched pairs with SAME outcome for each intervention (e.g. +ve, +ve). Give NO information about differences and are excluded from analysis
- Discordant pairs: Matched pairs with DIFFERENT outcomes; used in analysis

Distinguish between the contingency table for MNT and FET/Chisq

Table headers for MNT:
- The intervention, along with both possible outcomes
- This ensures that pairing is taken into account

The three ways to formulate hypotheses in MNT

1. Association
2. Proportions
3. Number of pairs (unique to MNT)

Unique assumption of MNT

Each observation in first sample has CORRESPONDING OBSERVATION in second sample

Why is chisq not suitable for paired samples?

Assumption violated: Observations are NO LONGER INDEPENDENT since each subject exposed to both interventions

A study was conducted to compare two versions of an allergy test, Test A and Test B, applied to 100 persons. Each person was subject to both versions of the allergy test. The test reagents were applied at the same time at different sites for each person, and either a positive or negative reaction was observed and recorded for each test. 64 persons had positive reaction to Test A, while 58 persons had positive reaction to Test B. 24 persons had negative reaction to both tests. At a significance level of 0.05, is there a difference between the proportion of positive reactions for Test A and that for Test B? State the likely statistics test used. Given p = 0.362, formulate a conclusion for this test

Test: MNT (2 paired samples, nominal data)

Conclusion: There is no association between the test used and the reaction observed (p = 0.362)

OR

There is no sig diff between the PROPORTION of persons with positive reaction to A (64%) and that for B (58%) (p = 0.362)
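The discordant pairs for this example can be worked out from the totals given, and only those pairs enter the test. The sketch below uses the exact (binomial) form of McNemar's test, which is an assumption since the card does not say which form was used. The function and variable names are made up for illustration.

```python
from math import comb


def mcnemar_exact_p(n_discordant_1: int, n_discordant_2: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under H0 each discordant pair falls either way with probability 0.5.
    """
    n = n_discordant_1 + n_discordant_2
    k = min(n_discordant_1, n_discordant_2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


n_total, pos_a, pos_b, neg_both = 100, 64, 58, 24
pos_both = pos_a + pos_b + neg_both - n_total   # concordant (+ve, +ve)
a_only = pos_a - pos_both                        # discordant: A +ve, B -ve
b_only = pos_b - pos_both                        # discordant: A -ve, B +ve
p_value = mcnemar_exact_p(a_only, b_only)
print(pos_both, a_only, b_only, round(p_value, 3))
```

The 46 + 24 = 70 concordant pairs carry no information about the difference, so the test rests on the 18 and 12 discordant pairs.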

What type of tests are Chisq, FET and MNT?

They are NPTs as well, but for nominal data

Data presentation of Chisq/FET/MNT

n(%) i.e. frequency (proportion)

To analyse nominal data for more than 2 independent samples, what stats test can be used?

Chisq OR Fisher-Freeman-Halton test

A preclinical study was conducted to investigate the effect of a 4-week treatment with Compound X on the survival of 20 rats bearing chemically-induced tumours. Among the 12 rats that were treated with Compound X, 11 survived while 1 died. Among the 8 rats that were not treated with Compound X, 5 survived while 3 died. At a significance level of 0.05, is there an association between treatment with Compound X and survival? - State the stats test to be used

FET
1. Data type: Nominal
2. Two independent samples
3. Contingency table
- Construct contingency table
- Find expected counts for each cell
- Assumption for Chisq violated (some cells have expected count <5)
- Hence use FET
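For this table, FET can be sketched directly from the hypergeometric distribution. The function name is made up for illustration. The two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention and is an assumption, since other software may define the two-sided p-value differently.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Treated: 11 survived, 1 died; untreated: 5 survived, 3 died
p_value = fisher_exact_two_sided(11, 1, 5, 3)
print(round(p_value, 3))
```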

(51 cards)