L05 Descriptive & Inferential Statistics Flashcards

1
Q

Differentiate the two branches of statistics.

A

1) Descriptive Statistics:
- Methods for organising and summarising a set of data that help to describe the attributes of a group or population

2) Inferential Statistics:
- Statistical methods used to draw conclusions from a sample & make inferences about the entire population

2
Q

Differentiate the three types of statistical variables.

A

1) Continuous / Interval Variable:
- With real values that reflect order & relative magnitude
- e.g. age, weight, height

2) Ordinal Variable:
- With categories that are ordered / hierarchical
- e.g. cancer stages, pain rating, Likert scale data

3) Nominal / Categorical Variable:
- With categories that are not ordered
- e.g. gender, race, smoking status, blood groups

3
Q

How are nominal/categorical variables presented in a study?

A

Numerically, summarised as frequency (n) AND proportion (%) i.e. n (%)

Graphically, can be presented as pie chart, bar chart
- e.g. stacked bar chart, clustered bar chart, segmented bar chart
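A minimal Python sketch of the n (%) summary, assuming the raw values are held in a plain list. The helper name `summarise_categorical` and the smoking-status data are hypothetical, made up for illustration:

```python
from collections import Counter

def summarise_categorical(values):
    """Return {category: "n (%)"} for a nominal (or ordinal) variable."""
    counts = Counter(values)          # frequency (n) of each category
    total = len(values)
    # Proportion (%) = n / total * 100, shown to one decimal place
    return {cat: f"{n} ({100 * n / total:.1f}%)" for cat, n in counts.items()}

# Hypothetical example: smoking status of 8 participants
smoking = ["never", "current", "never", "former", "never", "current", "never", "former"]
print(summarise_categorical(smoking))
# {'never': '4 (50.0%)', 'current': '2 (25.0%)', 'former': '2 (25.0%)'}
```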

4
Q

How are ordinal variables presented in a study?

A

Numerically, summarised as frequency (n) AND proportion (%) i.e. n (%)
Graphically, can be presented as pie chart, bar chart
- e.g. stacked bar chart, clustered bar chart, segmented bar chart

OR

Numerically, summarised as median AND interquartile range i.e. median (IQR)
Graphically, can be presented as a box-and-whiskers plot

5
Q

How are continuous variables presented in a study?

A

Numerically, summarised as a measure of central tendency (mean or median) AND a measure of variability (standard deviation (SD) or IQR)

  • Normal distribution = mean (SD)
  • Non-normal distribution = median (IQR)

Graphically, can be presented as a histogram or a box-and-whiskers plot
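A minimal Python sketch of the two numerical summaries. The helper name `summarise_continuous` and the data are hypothetical, and the `normal` flag is assumed to have been decided beforehand (e.g. from a histogram or a normality test). Quartiles come from `statistics.quantiles`, whose default 'exclusive' method may give slightly different Q1/Q3 values from other statistical software:

```python
import statistics

def summarise_continuous(values, normal):
    """Mean (SD) if normally distributed, otherwise median (IQR shown as Q1-Q3)."""
    if normal:
        return f"{statistics.mean(values):.1f} ({statistics.stdev(values):.1f})"
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    median = statistics.median(values)
    return f"{median:.1f} ({q1:.1f}-{q3:.1f})"

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
print(summarise_continuous(data, normal=True))    # 5.0 (2.1)
print(summarise_continuous(data, normal=False))   # 4.5 (4.0-6.5)
```

Some reports give the IQR as the single value Q3 - Q1 (here 2.5) rather than the Q1-Q3 range.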

6
Q

Differentiate between ‘outliers’, ‘mild outliers’ & ‘extreme outliers’.

A

Outliers: Values more than 1.5 x IQR below Q1 or above Q3
Mild outliers: Values between 1.5 and 3 x IQR below Q1 or above Q3
Extreme outliers: Values more than 3 x IQR below Q1 or above Q3
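A minimal Python sketch of the 1.5 x IQR and 3 x IQR fences. The helper name `classify_outliers` and the data are hypothetical; quartiles are taken from `statistics.quantiles` (default 'exclusive' method), so the fences may differ slightly from those drawn by other software:

```python
import statistics

def classify_outliers(values):
    """Return (value, 'mild' | 'extreme') for each value beyond the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    flagged = []
    for x in values:
        # How far the value lies below Q1 or above Q3 (0 if inside the box)
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            flagged.append((x, "extreme"))
        elif distance > 1.5 * iqr:
            flagged.append((x, "mild"))
    return flagged

# Hypothetical data: Q1 = 12, Q3 = 16, IQR = 4
data = [10, 11, 12, 12, 13, 13, 14, 15, 16, 25, 40]
print(classify_outliers(data))   # [(25, 'mild'), (40, 'extreme')]
```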

7
Q

List the possible types of distributions observed in histograms.

A

Normal
Positively skewed (i.e. tail skewed to right)
Negatively skewed (i.e. tail skewed to left)
Bimodal
Several peaks

8
Q

When a box plot is presented in a vertical direction, what should you do to interpret its type of distribution along a horizontal plane?

A

Rotate the box plot 90 degrees clockwise, so that higher values lie to the right (as on a histogram's x-axis), then determine the type of distribution.

9
Q

To ensure that a sample will lead to reliable and valid inferences, all statistical methods are built on the assumption that the individuals included in a sample represent a _____ sample from the underlying population.

A

random

10
Q

What are the two approaches an investigator can adopt for statistical inference? Briefly explain each approach.

A

1) Parameter estimation
- Seeks an approximate calculation of a population parameter
- e.g. By how much does this new drug reduce BP?
- Described by point estimate and interval estimate

2) Hypothesis testing
- Seeks to validate a supposition based on limited evidence, inferred using a sample from the population
- e.g. Does this new drug reduce blood pressure?
- Described by null hypothesis (H0) & alternative hypothesis (H1)

11
Q

Define ‘standard error of the mean’ (SEM).

Explain the significance of SEM in inferential statistics.

A

SEM:
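The standard deviation of the sampling distribution of sample means; equal to the population standard deviation divided by the square root of the sample size (i.e. sigma/root(n)), and in practice estimated using the sample SD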

Significance:

  • Estimates the precision of the sample mean as an estimate of the mean of the population from which the sample was drawn
  • Used in the calculation of confidence intervals, which give a range of plausible values for the true mean of the population from which the sample was drawn
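A minimal Python sketch of the SEM formula; the helper names `sem` and `sem_from_sigma` and the numbers are hypothetical. The first estimates SEM from a sample's own SD, the second applies sigma/root(n) when the population SD is assumed known:

```python
import math
import statistics

def sem(sample):
    """Estimated SEM: sample SD divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def sem_from_sigma(sigma, n):
    """SEM when the population SD (sigma) is known: sigma / root(n)."""
    return sigma / math.sqrt(n)

print(round(sem([2, 4, 4, 4, 5, 5, 7, 9]), 3))          # 0.756
print(sem_from_sigma(10, 25), sem_from_sigma(10, 100))  # 2.0 1.0
```

Quadrupling the sample size (25 to 100) halves the SEM, which is why larger samples give more precise estimates of the population mean.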
12
Q

Which theorem states that the sampling distribution of the mean is approximately normally distributed, for a sufficiently large sample size, even if the underlying distribution of individual observations in the population is not normal?

A

Central Limit Theorem
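The theorem can be illustrated with a small simulation, sketched below under stated assumptions: the population is taken to be an exponential distribution (mean 1, SD 1, strongly right-skewed), and the sample size, number of repetitions and seed are arbitrary choices:

```python
import math
import random
import statistics

random.seed(42)   # fixed seed so the run is reproducible

n = 50        # size of each sample
reps = 5000   # number of repeated samples
# Each entry is the mean of one sample drawn from a skewed (exponential) population
means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

se = 1 / math.sqrt(n)   # theoretical SEM = sigma / root(n), with sigma = 1
# For a normal distribution, about 95% of values lie within 1.96 SD of the mean
within = sum(abs(m - 1) < 1.96 * se for m in means) / reps

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1)")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (sigma/root(n) = {se:.3f})")
print(f"share within 1.96 SE: {within:.3f} (about 0.95 if approximately normal)")
```

Even though individual observations are far from normal, the sample means centre on the population mean, spread by roughly sigma/root(n), and fall within 1.96 SE about 95% of the time, as a normal distribution would.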

13
Q

What is the significance of the interval estimate in the parameter estimation approach of inferential statistics?

A

Also known as the confidence interval (CI).
Provides a range of reasonable values that are intended to contain the parameter of interest with a certain degree of confidence.
- e.g. 95% CI: If data collection and analysis were replicated many times, about 95% of the CIs constructed this way would include the true value of the measure.
- Provides information on the precision of the point estimate i.e. the narrower the 95% CI, the more precise the point estimate

14
Q

List the three factors influencing the width of CI.

A

1) Confidence level (e.g. 90%, 95%, 99%)
- The higher the confidence level, the wider the CI, the less precise the point estimate.

2) Sample size (n)
- The larger the sample size, the smaller the SEM value, the narrower the CI, the more precise the point estimate.

3) Standard deviation (sigma)
- The larger the SD, the wider the CI, the less precise the point estimate.

90% CI: sample mean - 1.645 [SD/root(n)] <= pop. mean <= sample mean + 1.645 [SD/root(n)]
95% CI: sample mean - 1.960 [SD/root(n)] <= pop. mean <= sample mean + 1.960 [SD/root(n)]
99% CI: sample mean - 2.576 [SD/root(n)] <= pop. mean <= sample mean + 2.576 [SD/root(n)]
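The three formulas above can be sketched in Python as one function; `z_confidence_interval` and the study numbers are hypothetical. It uses the normal (z) critical value as in the formulas; when the SD is estimated from a small sample, a t critical value is usually used instead:

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sd, n, level=0.95):
    """Normal-based CI for a population mean: sample mean +/- z * SD / root(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.645, 1.960, 2.576 for 90%, 95%, 99%
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical study: mean BP reduction 10 mmHg, SD 15 mmHg, n = 100
for level in (0.90, 0.95, 0.99):
    lo, hi = z_confidence_interval(10, 15, 100, level)
    print(f"{level:.0%} CI: {lo:.2f} to {hi:.2f}")
# 90% CI: 7.53 to 12.47
# 95% CI: 7.06 to 12.94
# 99% CI: 6.14 to 13.86
```

The output shows the first factor directly: raising the confidence level widens the CI. Increasing n or decreasing the SD in the call narrows it.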

15
Q

Explain what p-value means.

A

Probability of obtaining the observed result, or a more extreme one, by chance alone, assuming H0 is true.
- The smaller the p-value, the stronger the evidence against H0.
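A minimal Python sketch, assuming the test statistic is a z-score (standard normal under H0); `two_sided_p_value` is a hypothetical helper name:

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): chance of a result at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 3.0):
    print(z, round(two_sided_p_value(z), 4))
# 1.0 0.3173
# 1.96 0.05
# 3.0 0.0027
```

The further the observed statistic lies from what H0 predicts (z = 0), the smaller the p-value and the stronger the evidence against H0.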

16
Q

Define the null and alternative hypotheses, in general.

A

Null hypothesis, H0:
There is NO difference or NO relationship or NO effect.

Alternative hypothesis, H1:
There is a difference or relationship or effect.

17
Q

What is the significance of the level of significance (alpha) in hypothesis testing?

A

Alpha is the pre-set threshold for statistical significance, usually 0.05.

When p < alpha, the results show a statistically significant difference at that level (e.g. alpha = 0.05), and H0 is rejected.

When p >= alpha, the results are NOT statistically significant at that level, and H0 is not rejected.

18
Q

Differentiate between Type I error, Type II error and statistical power.

A

1) Type I error = false positive
- Rejecting H0 when in truth no difference/effect exists
- Probability of Type I error = alpha = usually 0.05

2) Type II error = false negative
- Failing to reject H0 when in truth a difference/effect exists
- Probability of Type II error = beta = usually 0.2

3) Statistical power
- Probability of correctly rejecting a false H0 (i.e. when a difference/effect truly exists) = 1 - beta = usually 0.8
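A minimal Python sketch of how power (1 - beta) depends on sample size, assuming a two-sided two-sample z-test with equal group sizes and a common, known SD; `approx_power` and the 5 mmHg / 10 mmHg figures are hypothetical:

```python
import math
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (equal n, common known SD)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n_per_group)           # SE of the difference in means
    # Ignores the negligible chance of rejecting H0 in the wrong direction
    return NormalDist().cdf(abs(diff) / se - z_crit)

# Hypothetical: true difference 5 mmHg, SD 10 mmHg
for n in (20, 64, 100):
    power = approx_power(5, 10, n)
    print(f"n = {n}/group: power = {power:.2f}, beta = {1 - power:.2f}")
```

With these assumptions, about 64 participants per group give the conventional power of roughly 0.8 (beta of roughly 0.2); smaller groups raise the risk of a Type II error.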

19
Q

Between CI and p-value, which is more informative in hypothesis testing? Explain why.

A

CI provides more information than p-value, as CI provides information on BOTH:

  • Precision of point estimate (e.g. mean difference, odds ratio etc.)
  • Statistical significance

whereas p-value only provides statistical significance.

20
Q

How does one interpret whether the results of a study are statistically significant or not, using the 95% CI?

A

If 95% CI of mean difference does NOT include the value of 0:

  • p-value < 0.05 = Statistically significant
  • Otherwise, it is not.

If 95% CI of odds/risk ratios does NOT include the value of 1:

  • p-value < 0.05 = Statistically significant
  • Otherwise, it is not.

21
Q

A statistically significant result obtained from a clinical study means it has clinical significance as well. True or false? Justify why.

A

False! Statistical significance and clinical significance are NOT synonymous!

Statistical significance depends heavily on the sample size of the study.

  • With a large sample size, even small treatment effects that are clinically inconsequential (e.g. 2 mmHg decrease in BP) can appear statistically significant.
  • With a small sample size, even large treatment effects that are clinically consequential (e.g. 20 mmHg decrease in BP) can appear NOT statistically significant.
  • Hence, do NOT look at whether the results are statistically significant in isolation! Look at the point estimate & 95% CI to judge whether the result is clinically significant or NOT.

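The two scenarios can be sketched in Python under stated assumptions: a two-sample z-test (normal approximation) with a common SD of 25 mmHg and hypothetical group sizes. `compare_means` is a hypothetical helper; with only 4 per group a t-test would normally be used, so the numbers are illustrative only:

```python
import math
from statistics import NormalDist

def compare_means(diff, sd, n_per_group, level=0.95):
    """Two-sided p-value and CI for a difference in means (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p, (diff - crit * se, diff + crit * se)

# Hypothetical BP reductions (mmHg) with very different sample sizes
for diff, n in [(2, 5000), (20, 4)]:
    p, (lo, hi) = compare_means(diff, 25, n)
    print(f"diff = {diff} mmHg, n = {n}/group: p = {p:.4f}, 95% CI {lo:.1f} to {hi:.1f}")
```

The 2 mmHg effect is statistically significant with a narrow CI (about 1.0 to 3.0) that excludes any clinically important benefit, whereas the 20 mmHg effect is not statistically significant, yet its wide CI (about -14.6 to 54.6) is compatible with large benefits.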
22
Q

What are some considerations before deciding on an appropriate statistical test for hypothesis testing, when comparing between/among groups?

A

1) Number of groups being compared

2) Whether groups are independent or paired/related
- Independent: One Tx per person via parallel study
- Paired: e.g. two-Tx, two-period, two-sequence crossover study, administering same drugs simultaneously via different ROA, matched pairs based on relevant characteristics

3) Whether variables are continuous, ordinal or nominal
- For continuous data, whether they are normally distributed or not

4) Assumptions underlying a specific statistical test

23
Q

How does one assess the normality of continuous data using statistical software?

A

For n >= 50: Kolmogorov-Smirnov test of normality
For n < 50: Shapiro-Wilk test of normality

Apply hypothesis testing where
H0: the data are normally distributed
H1: the data are NOT normally distributed (reject H0 when p < 0.05)

24
Q

For normally-distributed continuous data, which type of statistical test is used for comparison? Name the statistical test used in the respective settings.

A

ALL are parametric tests:
2 independent groups: Independent samples t-test
2 paired groups: Paired samples t-test
> 2 independent groups: One-way ANOVA

25
Q

For non-normally-distributed continuous data, which type of statistical test is used for comparison? Name the statistical test used in the respective settings.

A

ALL are NON-parametric tests:
2 independent groups: Wilcoxon rank-sum test (Mann-Whitney U test)
2 paired groups: Wilcoxon signed-rank test
> 2 independent groups: Kruskal-Wallis test

26
Q

For ordinal data, which type of statistical test is used for comparison? Name the statistical test used in the respective settings.

A

ALL are NON-parametric tests:
2 independent groups: Wilcoxon rank-sum test (Mann-Whitney U test)
2 paired groups: Wilcoxon signed-rank test
> 2 independent groups: Kruskal-Wallis test

27
Q

For nominal data, which type of statistical test is used for comparison? Name the statistical test used in the respective settings.

A

ALL are NON-parametric tests:
2 independent groups: Chi-square test or Fisher’s exact test
2 paired groups: McNemar’s test
> 2 independent groups: Chi-square test or Fisher-Freeman-Halton test
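The choices in the last four cards can be collected into a lookup table, sketched below in Python. The dictionary keys and the helper `choose_test` are hypothetical; the table deliberately covers only the settings these cards list (e.g. not more than two paired groups), and the test's own assumptions still need checking:

```python
# Ranked (non-parametric) tests shared by non-normal continuous and ordinal data
RANK_TESTS = {
    "two_independent": "Wilcoxon rank-sum test (Mann-Whitney U test)",
    "two_paired": "Wilcoxon signed-rank test",
    "more_than_two_independent": "Kruskal-Wallis test",
}

TESTS = {
    "continuous_normal": {   # parametric tests
        "two_independent": "Independent samples t-test",
        "two_paired": "Paired samples t-test",
        "more_than_two_independent": "One-way ANOVA",
    },
    "continuous_non_normal": RANK_TESTS,
    "ordinal": RANK_TESTS,
    "nominal": {
        "two_independent": "Chi-square test or Fisher's exact test",
        "two_paired": "McNemar's test",
        "more_than_two_independent": "Chi-square test or Fisher-Freeman-Halton test",
    },
}

def choose_test(data_type, design):
    """Return the comparison test named in the cards for a data type and design."""
    return TESTS[data_type][design]

print(choose_test("ordinal", "two_paired"))   # Wilcoxon signed-rank test
```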

28
Q

What are some considerations before deciding on an appropriate statistical test for hypothesis testing, when examining & quantifying the degree of a linear relationship (i.e. correlation) between two numerical variables?

A

1) Whether variables are continuous, ordinal or nominal
- For continuous data, whether they are normally distributed or not

2) Assumptions underlying a specific statistical test

29
Q

Which type of statistical test is used to examine the correlation between two numerical continuous variables, both of which are normally distributed?

A

Parametric test: Pearson product-moment correlation

- Used when BOTH variables are continuous and normally distributed

30
Q

Which type of statistical test is used to examine the correlation between two numerical continuous variables, where only one of them is normally distributed?

A

Non-parametric test: Spearman rank correlation

- Used when one or both variables are continuous but NON-normally distributed, or ordinal

31
Q

Which type of statistical test is used to examine the correlation between two numerical variables, where one of them is an ordinal variable?

A

Non-parametric test: Spearman rank correlation

- Used when one or both variables are continuous but NON-normally distributed, or ordinal

32
Q

What are some considerations before deciding on an appropriate statistical test for hypothesis testing, when estimating the effect of independent variable X (i.e. predictor variable) on dependent variable Y (i.e. outcome variable)?

A

1) Whether the dependent variable Y (i.e. outcome variable) is continuous, ordinal or nominal
- For continuous data, whether they are normally distributed or not

2) Assumptions underlying a specific statistical test

33
Q

When estimating the effect of an independent continuous variable X (i.e. predictor variable) on a dependent ordinal variable Y (i.e. outcome variable), it is appropriate to use a simple ordinal regression analysis. True or false?

A

True.
The choice of regression analysis depends on the type of data of the dependent variable Y, and NOT on the type of data of the independent variable X!

34
Q

Which type of statistical test is used to estimate the effect of an independent continuous variable X (i.e. predictor variable) on a dependent continuous variable Y (i.e. outcome variable)?

A

Simple linear regression analysis

35
Q

Which type of statistical test is used to estimate the effect of an independent nominal variable X (i.e. predictor variable) on a dependent ordinal variable Y (i.e. outcome variable)?

A

Simple ordinal regression analysis

36
Q

Which type of statistical test is used to estimate the effect of an independent continuous variable X (i.e. predictor variable) on a dependent nominal variable Y (i.e. outcome variable)?

A

Simple nominal regression analysis (i.e. multinomial logistic regression)

37
Q

When will simple or multiple (multivariable) linear / ordinal / nominal regression analyses be used?

A

Simple: When there is ONLY one independent variable X

Multiple / Multivariable: When there is more than one independent variable X