L6 to L10 - Data Analysis Flashcards

1
Q

The general procedure for hypothesis testing

A
  1. Define Problem
    - Number of groups, independence, type of data, tail test
  2. State hypotheses
  3. Compute test statistic
  4. Find p-value
  5. Compare p-value with α
  6. State Conclusion
2
Q

State the assumptions of Parametric Tests

A
  1. Samples are drawn from NORMALLY distributed ppn
  2. All samples have equal variances

3
Q

A crossover study compared the LDL cholesterol levels after a 2-week diet with oat bran cereal and corn flakes respectively, in 14 individuals with hypercholesterolemia. At a significance level of 0.05, is there a difference in LDL levels between the 2 diets?

Assuming data is normal, state the type of study to be carried out, the Hypothesis and the assumptions

A
  1. Study: Paired-samples t-test
  2. Hypothesis:
    - H0: µd = 0, MEAN difference btwn LDL after oat and corn = 0
    - H1: µd ≠ 0
  3. Assumptions
    - Samples are drawn from NORMALLY distributed ppn
    - All samples have equal variances
    - *The paired differences (d) are normally distributed
4
Q

For any tests involving all continuous data, what must be done before selecting the appropriate test?

A

Normality Test

5
Q

Define the problem and state the test(s) to carry out, assuming data is continuous

Two different medications, tablets A and B were administered to two groups of patients to compare their ability to increase INR after one day. At a significance level of 0.05, is there a difference in the increase of INR in one day between the two tabs?

A

Define the problem:

  1. How many and which samples being compared: Tablets A and B
  2. INDEPENDENT samples
  3. Outcome of interest: Increase in INR
  4. Type of data to be analysed: Continuous data
  5. Tail: Two-tailed

Tests:

  • Independent Samples t-test
  • Normality test
  • Variance test
6
Q

Advantage of Levene’s test over F-test for equality of variance

A
  1. F-test: requires that the ppn from which the samples are obtained are NORMAL
  2. F-test: can compare the variances of only 2 groups

Levene’s test: has neither restriction (works for more than two groups, and ppn need not be normal)

7
Q

Hypotheses for F-test of equality of variance

A

H0: Variances are EQUAL
H1: Variances are UNEQUAL

8
Q

A clinical trial was conducted in 2 groups of study participants to compare tx A and B onset to relieving headache. Assume that data normally distributed. At α = 0.05, p = 0.02.

Tx A: Mean onset 34 min, SD 3.75, sample size 15
Tx B: Mean 25.8, SD 7.43, Sample 10

Variance test: p < 0.01

State the likely stats test used in this study, and formulate a conclusion for the study

A

Test: Independent-samples t-test with UNEQUAL VARIANCE

Conclusion: At a significance level of 0.05, there is a statistically significant difference in the MEAN onset to relieving headache between tx A (34.0 ± 3.75 min) and tx B (25.8 ± 7.43 min) (p = 0.02).

9
Q

What are the other tests to be carried out for an independent-samples t-test?

A
  1. Equality of variance test
  2. Normality test

10
Q

One way ANOVA (OWA) is for what kind of comparison? State its Hypotheses

A

Compares the means of >2 independent groups by analysing variance

H0: Ppn means corresponding to random samples are equal
H1: Not equal (at least one is not equal)

11
Q

For >2 independent groups, why should multiple independent t-test not be carried out?

A

Multiplicative rule of (1 − α): with m tests, the probability of making no type I error falls to (1 − α)^m, so the probability of at least one type I error increases and the probability of making a correct conclusion decreases
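
The multiplicative rule can be made concrete with a short sketch. It assumes the m tests are independent and all use the same α; the function name is illustrative only and does not come from the course material.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one type I error across m independent tests."""
    # (1 - alpha) is the chance one test avoids a type I error;
    # multiplying it m times gives the chance that all m tests avoid one.
    return 1 - (1 - alpha) ** m


# Three groups need 3 pairwise t-tests (1-2, 1-3, 2-3).
print(round(familywise_error_rate(0.05, 3), 4))  # 0.1426
```

With three groups the overall type I error rate is already about 14% instead of the intended 5%.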

12
Q

Assumptions of OWA

A
  1. Samples are random samples of ppn
  2. Ppns are independent
  3. Underlying ppn is normally distributed
  4. Ppn has equal variances*
  • If non-equal variance, use Welch-Anova
13
Q

For F test of ANOVA, what is the equation?

A

F = Sb/Sw

Also, Mean squares (variance) = Sum of squares/df

Sb = BETWEEN GROUP variance
Sw = WITHIN GROUP variance
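
A minimal sketch of the F ratio, using the mean squares (sum of squares / df) for the between-group and within-group variances. The data are hypothetical and the function name is an assumption made for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: group means vs overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: individual values vs their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between  # Sb
    ms_within = ss_within / df_within     # Sw
    return ms_between / ms_within, df_between, df_within


# Hypothetical data for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(round(f_stat, 2), df_b, df_w)  # 13.0 2 6
```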
14
Q

Briefly describe the principles of one-way anova

A

Analyses within-group variation (individual values vs their group mean) and between-group variation (group means vs overall mean)

  • If between-group variation is large relative to within-group variation, it can be implied that the underlying population means are different
15
Q

A study compared the pulmonary function (forced expiratory volume in one second, FEV1) for patients with coronary artery disease from 3 different hospitals. At a significance level of 0.05, is there a difference in FEV1 among these patients?

Define the Problem. State all statistical tests you will carry out to arrive at the final stats test to use to analyse the data

A
  1. How many samples: 3
  2. Outcome of Interest: FEV1
  3. Data to be analysed: Continuous data

Other tests:

  • Normality Test
  • Equality of variance using LEVENE’S test (F test not for >2 samples)

If Data is normal, and variance is equal, use OWA

16
Q

A study compared the increase in breathing rate of three groups of 30 participants in three different types of exercises (Running, weightlifting, Skipping).

Normality test shows that p > 0.05 for all three samples.

Equality of variance via Levene’s test shows that p < 0.05 for running and skipping, but p > 0.05 for weightlifting

State:

  • The normality test that is used
  • Statistical test to be used to analyse the data
A
  • Normality test: Shapiro-Wilk for all three groups (n < 50)
  • Stats test to be used:
    – Breathing rate: Continuous Data
    – All data are normally distributed
    – Not all variances are equal
    Hence use WELCH-ANOVA
17
Q

A study compared the increase in breathing rate of three groups of 30 participants in three different types of exercises (Running, weightlifting, Skipping). At a significance level of 0.05, the p-value was found to be 0.0422.

Assuming OWA was used to analyse data, state the conclusion of this study

A

At a significance level of 0.05, the MEAN increases in breathing rate for the three groups of participants are not all the same (p = 0.0422).

18
Q

Purpose(s) of Post-hoc tests in OWA

A

Identify which groups differ, while keeping the overall probability of making a type I error at the predetermined α

19
Q

Main difference between post-hoc tests

A

How conservative they are

  • More conservative = reduce stats power (greater chance for type II error) but controls type I error better
20
Q

List and rank the post-hoc tests in terms of conservativeness

A
  1. Bonferroni Adjustment
  2. Least sig. difference (LSD) test
  3. Tukey’s test
  4. Scheffe’s procedure
  5. Dunnett’s test

Conservativeness
4 > 1 > 3 > 2
5: Special case

21
Q

Adjustment of significance level for Bonferroni Adjustment. What is its advantage and disadvantage?

A

α/m, where m = no. of pairwise comparisons to be carried out

  • Good: Widely applicable to any stats test
  • Bad: Very conservative, thus stats power much reduced
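
A small sketch of the adjustment, assuming every pair of groups is compared so that m = k(k − 1)/2 for k groups. The function name is illustrative only.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_groups: int):
    """Return (adjusted alpha, number of pairwise comparisons m)."""
    m = comb(n_groups, 2)  # all possible pairs of groups
    return alpha / m, m


adjusted_alpha, m = bonferroni_alpha(0.05, 3)
print(m, round(adjusted_alpha, 4))  # 3 0.0167
```

An equivalent approach multiplies each raw p-value by m (capped at 1) and compares the result with the original α.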
22
Q

State the purpose of Dunnett’s test and its advantage

A

Used when comparing one group against each of the others, without comparing the others with one another (i.e. control vs others only)

  • Most powerful if doing so
23
Q

How does LSD adjust the significance level? State the advantages and disadvantages of LSD

A

LSD: Does not control overall significance level (not conservative)

  • Good: Greater stats power (less chance to miss real diff)
  • Bad: Greater chance for false positive (type I error)
24
Q

Advantage of Scheffe’s procedure

A

Very flexible: can be applied to more complicated comparisons across a large number of groups

25
Q

A study compared the pulmonary function (forced expiratory volume in one second, FEV1, units L) for patients with coronary artery disease from 3 different hospitals. At a significance level of 0.05, there is a difference between groups.

After post-hoc tests using Bonferroni adjustment, it was found that

  • 1 VS 2: sig. = 0.042
  • 2 VS 3: sig. = 0.0823
  • 1 VS 3: sig. = 0.123
  • Hospital 1: Mean 2.6262, SD 0.49617
  • Hospital 2: Mean 3.0325, SD 0.52324
  • Hospital 3: Mean 2.8227, SD 0.43626

Formulate a conclusion from the information given

A

After post-hoc test:

There is a statistically significant difference between the MEAN FEV1 for hospitals 1 and 2, but no statistically significant difference between the MEAN FEV1 for hospitals 1 and 3 or for hospitals 2 and 3

The MEAN FEV1 for hospital 1 (2.63 ± 0.50 L) is statistically significantly lower than that of hospital 2 (3.03 ± 0.52 L) (p = 0.042)

26
Q

Purpose of repeated measures ANOVA (RMA)

A

For studies that analyse changes in a measure on the SAME GROUP of subjects over DIFFERENT CONDITIONS.

Same hypothesis and test stats as OWA

(e.g. over time, over distance, over age, etc.)

(it’s like extension of paired t-test)

27
Q

When should Non-parametric tests be used?

A

When Assumptions of parametric tests are not met i.e.

  1. Samples drawn from non-normal distributed populations, underlying distributions of samples are NOT NORMAL
  2. Variances between samples are NOT the same
28
Q

Advantages of NPT

A
  1. Not as restrictive as PT
  2. Use ranks instead of actual values hence:
    - Less sensitive to measurement error and outlying values
    - Suitable for ordinal data
    - Perform quickly
29
Q

Disadvantages of NPT

A
  1. If PT assumptions met, NPT is less powerful

E.g. if PT needs n = 19 for a particular power, NPT requires n = 20 (relative efficiency 19/20 = 0.95)

30
Q

A study compared the age of 10 and 13 individuals attending Chinese calligraphy class with those attending Taiji (or Tai Chi) class. None of the individuals attended both classes. At a significance level of 0.05, is there a difference in the age between the 2 classes?

Given the following normality tests results:

  1. Calligraphy
    - KST: p = 0.056
    - SWT: p = 0.009
  2. Taiji
    - KST: p = 0.200
    - SWT: p = 0.129

State the test to be used by defining the problem. Also, state the hypotheses

A

Define Problem:

  1. 2 Independent samples
  2. Outcome: Age
    - Normality test using SWT: Calligraphy p < 0.05, hence non-normal distribution
  3. Two-tail

Stats test: Wilcoxon rank sum test (non normal, two independent groups)

Hypotheses:
H0: No difference in MEDIAN age
H1: Difference in MEDIAN age

31
Q

For WRST, what is the minimum sample size? If sample size is below the minimum, what to do?

A
  • Min. sample size: 10
  • Too small: Use distribution tables for small samples OR use exact significance from SPSS

32
Q

General assumptions of NPTs

A
  • Samples are random samples of ppn
  • Underlying ppn independent

33
Q

A preclinical study compared the rate of sedation among rats, of which each rat was administered a known sedative (control), or a high dose or low dose of an experimental compound. If a rat did not fall asleep within 10 min of the drug injection, the time to sleep was arbitrarily assigned a value of 15 min. At a significance level of 0.05, is there a difference in the rate of sedation among the 3 groups?

Given that not all time to sleep are normally distributed, define the problem and state the stats tests you will use to analyse the problem. State the hypotheses too

A

Define the problem:

  1. Three independent samples
  2. Outcome: Sleep time
    - Continuous, non-normal
  3. Tail: two tail

Test to use: Kruskal-Wallis test (KWT) ( > 2 independent group, data non-normal)

Hypotheses:
H0: Median time to sleep all same
H1: Not all Median same OR Median times to sleep for at least two of three ppn are different

34
Q

Post-hoc tests apply to which NPT?

A

KWT

  • Use Bonferroni adjustment and perform WRST
  • Everything else is similar to OWA
35
Q

30 participants were asked to rate the sweetness of two syrups (A & B) using a 5-point Likert scale (1 = Not sweet at all, 5 = extremely sweet). At a significance level of 0.05, is there a difference in the rating of the sweetness between the two syrups?

Define the problem and suggest a stats test to analyse the data. State the Hypotheses as well

A

Define problem

  1. Two paired samples (individuals tasted both)
  2. Outcome: Sweetness, ordinal data (Likert scale)
  3. Tails: 2-tailed

Stats test: Wilcoxon signed-rank test (WSRT) (for paired samples when PT cannot use)

Hypotheses:
H0: Difference in MEDIAN rating of sweetness = 0
H1: MEDIAN difference in sweetness ≠ 0

36
Q

Data presentation for NPT

A

Median (IQR)

E.g. 4.00 (2.25 - 6.00)

(note: IQR is a range, and not like sd where you +/-)

37
Q

Assumptions for Chisq tests and Fisher’s Exact test (FET). When to use FET?

A
  1. Samples are random samples of their ppn
  2. All observations are independent (i.e. one cell per subject)
  3. For 2x2 table: Expected count must be ≥5
  4. For large contingency table:
    - Expected count ≥1
    - No more than 20% of cells with expected count <5

Use FET if 3 & 4 are not met

38
Q

Given the frequency of nominal data for two independent groups showing OBSERVED count (in a 2x2 table):

a = 30
b = 98
c = 40
d = 172

State the stats test to be used to analyse this data. Explain your answer

A

Chisq test, since:

  • Nominal Data
  • For 2x2 table, EXPECTED count for each cell must be ≥5, and this is met here
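
The expected-count check can be sketched as follows. The layout is an assumption: a and b are taken as the first row, c and d as the second. Each expected count is (row total × column total) / grand total.

```python
def expected_counts(table):
    """Expected cell counts under no association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: [[a, b], [c, d]]
observed = [[30, 98], [40, 172]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [26.35, 101.65]
# [43.65, 168.35]
```

Every expected count is at least 5, so the chi-square assumption for a 2x2 table holds.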
39
Q

A study investigated whether the amount of monosodium glutamate (MSG) in a meal was associated with the occurrence of headache among 240 healthy volunteers. The results are as follows:

a = 52, b = 28, c = 40, d = 40, e = 26, f = 54

State the stats test to be used for this study, and state the Hypotheses to be tested

A

Stats Test: Chisq
Reason: For large contingency table,
- Expected count ≥1
- No more than 20% of cells be <5

Hypotheses:

  • H0: No assoc btwn amount of MSG in a meal and occurrence of headache
  • H1: There is an assoc

OR

H0: All the PROPORTIONS of individuals having headache among those who took a meal with high, med and low amounts of MSG respectively are the same.
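
A rough sketch of the chi-square calculation for this card. The table layout is an assumption, not stated on the card: (a, b), (c, d) and (e, f) are treated as three MSG-level rows of 80 volunteers each, with the two columns being headache / no headache. For df = 2 only, the chi-square p-value has the closed form exp(−χ²/2), which is used here to avoid external libraries.

```python
from math import exp

# Assumed layout: rows = MSG level, columns = headache / no headache.
observed = [[52, 28], [40, 40], [26, 54]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts and the chi-square statistic: sum of (O - E)^2 / E.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2
p_value = exp(-chi2 / 2)  # valid only because df == 2

print(min(min(row) for row in expected) >= 5)  # expected-count check
print(round(chi2, 2), df, p_value < 0.05)
```

Under that assumed layout all expected counts are well above 5, which is consistent with choosing the chi-square test.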

40
Q

Given the following contingency table for nominal data for OBSERVED count:

a = 11, b = 1, c = 10, d = 4

State the stats test to be used and explain your choice

A

FET

  • Nominal data
  • Assumption NOT MET: expected count ≥5 for 2x2 table
41
Q

Unit of analysis for McNemar’s test

A

MATCHED PAIRS

42
Q

Distinguish between concordant and discordant pairs in McNemar’s test

A
  • Concordant pairs: Matched pairs with SAME outcome for each intervention (E.g. +ve, +ve). Gives NO information about differences and is excluded in analysis
  • Discordant pairs: not concordant, and used in analysis
43
Q

Distinguish between the contingency table for MNT and FET/Chisq

A

Table headers for MNT:

  • The intervention, along with both possible outcomes
  • This ensures that pairing is taken into account
44
Q

The three ways to formulate hypotheses in MNT

A
  1. Association
  2. Proportions
  3. Number of pairs (unique to MNT)
45
Q

Unique assumption of MNT

A

Each observation in first sample has CORRESPONDING OBSERVATION in second sample

46
Q

Why is chisq not suitable for paired samples?

A

Assumption violated: Observations are NO LONGER INDEPENDENT since each subject exposed to both interventions

47
Q

A study was conducted to compare two versions of an allergy test, Test A and Test B, applied to 100 persons. Each person was subject to both versions of the allergy test. The test reagents were applied at the same time at different sites for each person, and either a positive or negative reaction was observed and recorded for each test. 64 persons had positive reaction to Test A, while 58 persons had positive reaction to Test B. 24 persons had negative reaction to both tests. At a significance level of 0.05, is there a difference between the proportion of positive reactions for Test A and that for Test B?

State the likely statistics test used. Given p = 0.362, formulate a conclusion for this test

A

Test: MNT (2 paired samples, nominal data)

Conclusion: There is no association between the test used and the reaction observed (p = 0.362)
OR
There is no sig diff between the PROPORTION of persons with positive reaction to A (64%) and that for B (58%) (p = 0.362)
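
The discordant pairs can be recovered from the totals given in the question, and an exact (binomial) McNemar p-value computed from them. This is a sketch that assumes the exact two-sided version of the test; the card does not say which version was used, and the variable names are illustrative.

```python
from math import comb

n_total, pos_a, pos_b, neg_both = 100, 64, 58, 24

# pos_a + pos_b + neg_both counts the "both positive" pairs twice.
pos_both = pos_a + pos_b + neg_both - n_total
a_only = pos_a - pos_both  # discordant: positive on A, negative on B
b_only = pos_b - pos_both  # discordant: negative on A, positive on B

# Under H0, each discordant pair is equally likely to fall either way,
# so the smaller count follows Binomial(n_disc, 0.5).
n_disc = a_only + b_only
k = min(a_only, b_only)
p_value = min(1.0, 2 * sum(comb(n_disc, i) for i in range(k + 1)) / 2 ** n_disc)

print(pos_both, a_only, b_only, round(p_value, 3))  # 46 18 12 0.362
```

Only the 30 discordant pairs enter the calculation; the 70 concordant pairs do not affect the p-value.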

48
Q

What type of tests are Chisq, FET and MNT?

A

They are NPTs as well, but for nominal data

49
Q

Data presentation of Chisq/FET/MNT

A

n(%)

i.e. frequency (proportion)

50
Q

To analyse nominal data for more than 2 independent samples, what stats test can be used?

A

Chisq
OR
Fisher-Freeman-Halton test

51
Q

A preclinical study was conducted to investigate the effect of a 4-week treatment with Compound X on the survival of 20 rats bearing chemically-induced tumours. Among the 12 rats that were treated with Compound X, 11 survived while 1 died. Among the 8 rats that were not treated with Compound X, 5 survived while 3 died. At a significance level of 0.05, is there an association between treatment with Compound X and survival?

  • State the stats test to be used
A

FET

  1. Data type: nominal
  2. Two independent samples
  3. Contingency table
    - Construct contingency table
    - Find expected counts for each cell
    - Assumption for Chisq violated (some cells <5)
    - Hence use FET
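
The steps above can be sketched in code: build the table, check the expected counts, then compute a two-sided Fisher's exact p-value from the hypergeometric distribution. The two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is an assumption about the convention; the card does not report a p-value.

```python
from math import comb

# Rows: treated / untreated; columns: survived / died.
a, b, c, d = 11, 1, 5, 3
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d
n = row1 + row2

expected = [[row1 * col1 / n, row1 * col2 / n],
            [row2 * col1 / n, row2 * col2 / n]]
print(expected)  # some cells fall below 5, so chi-square is not appropriate


def weight(x: int) -> int:
    """Unnormalised hypergeometric weight for x survivors in the treated row."""
    return comb(row1, x) * comb(row2, col1 - x)


# All tables compatible with the fixed row and column totals.
lo, hi = max(0, col1 - row2), min(row1, col1)
weights = {x: weight(x) for x in range(lo, hi + 1)}
total = sum(weights.values())  # equals comb(n, col1)

# Two-sided p: tables whose probability does not exceed the observed one.
p_value = sum(w for w in weights.values() if w <= weights[a]) / total
print(round(p_value, 3))  # 0.255
```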