LEC 4 Statistics Flashcards

1
Q

2 branches of statistics

A
  1. Descriptive statistics
    - describe the attributes of a group or population
  2. Inferential statistics
    - draw conclusions from a sample and make inferences about the entire population
2
Q

3 types of variables

A
  1. Nominal
    - categories that are not ordered
    - categorical
    eg subject count
  2. Ordinal
    - categories that are ordered
    - no fixed interval
    eg Likert scale & cancer stages
  3. Continuous
    - with real values that reflect order and relative magnitude
    - fixed interval
    - normal or non-normal distribution
    eg age
3
Q

Describing Nominal data & graphical presentation

A
n(%)
n = frequency
% = proportion
Graphical presentation
- pie chart
- bar chart (normal, clustered, stacked & segmented)
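A minimal Python sketch of the n(%) summary for a nominal variable. The blood-group data and the helper name `describe_nominal` are hypothetical, used only for illustration.

```python
from collections import Counter

def describe_nominal(values):
    """Return {category: (n, %)} for a list of nominal observations."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, round(100 * n / total, 1)) for category, n in counts.items()}

# Hypothetical data: blood group of 8 subjects
blood_groups = ["A", "O", "B", "O", "AB", "O", "A", "B"]
summary = describe_nominal(blood_groups)
for category, (n, pct) in summary.items():
    print(f"{category}: {n} ({pct}%)")
```

Each line prints as n(%), eg "O: 3 (37.5%)".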
4
Q

Describing Ordinal data & graphical presentation

eg Likert scale

A
n(%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)

OR

median (IQR)
Graphical presentation :
- box plot (box-and-whiskers plot)

5
Q

Box plot whiskers

A

Whiskers extend to the most extreme data points within 1.5x IQR below Q1 and above Q3

Mild outlier : 1.5-3x IQR below Q1 or above Q3
Extreme outlier : >3x IQR below Q1 or above Q3
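A sketch of the whisker and outlier rules in Python. Assumptions: quartiles come from `statistics.quantiles(..., method="inclusive")`, and statistics packages compute quartiles in slightly different ways, so Q1/Q3 may not match a given software exactly. The length-of-stay data and the helper `box_plot_summary` are hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, whisker ends and mild/extreme outliers using 1.5x and 3x IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr      # extreme-outlier limits
    mild, extreme = [], []
    for x in sorted(data):
        if x < outer_low or x > outer_high:
            extreme.append(x)
        elif x < inner_low or x > inner_high:
            mild.append(x)
    # Whiskers stop at the most extreme observations still inside the 1.5x IQR fences
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "whiskers": (min(inside), max(inside)),
            "mild": mild, "extreme": extreme}

# Hypothetical length-of-stay data (days)
stay = [2, 3, 4, 5, 6, 7, 8, 15, 30]
result = box_plot_summary(stay)
print(result)
```

Here Q1 = 4 and Q3 = 8 (IQR = 4), so 15 lies between 1.5x and 3x IQR above Q3 (mild) and 30 lies beyond 3x IQR (extreme).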

6
Q

Describing Continuous data & graphical presentation

  • normal
  • non-normal
A

Graphical presentation :

  • histogram
  • box plots

Normal distribution - mean +/- SD
Non-Normal distribution - median (IQR)
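A minimal sketch of choosing the summary by distribution. It assumes the normal/non-normal decision has already been made (eg by the normality assessment) and is passed in as a flag; the data and the helper `summarise_continuous` are hypothetical, and IQR is reported as (Q1, Q3).

```python
import statistics

def summarise_continuous(data, normal):
    """mean ± SD if the data are normally distributed, else median (Q1, Q3)."""
    if normal:
        return f"{statistics.mean(data):.1f} ± {statistics.stdev(data):.1f}"
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return f"{median:.1f} ({q1:.1f}, {q3:.1f})"

# Hypothetical data; "normal" would come from the normality assessment
ages = [30, 32, 35, 38, 40]
stay = [1, 2, 2, 3, 3, 4, 5, 9, 21]
print(summarise_continuous(ages, normal=True))
print(summarise_continuous(stay, normal=False))
```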

7
Q

Types of distribution of Continuous data (5)

A
  1. Normal distribution
  2. Positively skewed (right)
  3. Negatively skewed (left)
  4. Bimodal distribution (U shape)
  5. Several peaks
8
Q

Measure of central tendency

A
  • mean
  • median

9
Q

Measure of variability

A
  • SD
  • IQR

10
Q

How to ensure sample will lead to reliable and valid inferences

A

Random sampling

11
Q

Types of inferential statistics (2)

A
  1. Parameter Estimation
  2. Hypothesis Testing

12
Q

Population mean

A

Equals the mean of all possible sample means (the mean of the sampling distribution of the mean)

13
Q

Central Limit Theorem

A

For sufficiently large sample sizes, the sampling distribution of the mean is approximately normally distributed
even if the underlying distribution of individual observations in the population is not normally distributed
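A small simulation sketch of the two cards above (population mean = mean of sample means; CLT). Assumptions: a hypothetical right-skewed exponential population with mean 10 and SD 10, samples of n = 50, and 2000 repeated samples. The mean of the sample means should land near 10, their SD near 10/√50 ≈ 1.41, and about 95% of sample means should fall within 1.96 of those SDs, as expected for an approximately normal distribution.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

POP_MEAN = 10.0   # exponential population: mean 10, SD 10, strongly right-skewed
N = 50            # size of each sample
REPS = 2000       # number of samples drawn

sample_means = [
    statistics.fmean(random.expovariate(1 / POP_MEAN) for _ in range(N))
    for _ in range(REPS)
]

sem_theory = POP_MEAN / math.sqrt(N)  # population SD / sqrt(n)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# For a normal distribution about 95% of values fall within 1.96 SD of the mean
share_within = sum(abs(m - POP_MEAN) <= 1.96 * sem_theory for m in sample_means) / REPS

print(f"mean of sample means:  {mean_of_means:.2f}")
print(f"SD of sample means:    {sd_of_means:.2f} (theory {sem_theory:.2f})")
print(f"share within 1.96 SEM: {share_within:.3f}")
```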

14
Q

Factors affecting the width of CI (3)

A
  1. Confidence level (higher level, wider CI)
  2. Sample size (larger sample size, narrower CI)
  3. Standard deviation (larger SD, wider CI)
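A sketch showing the three factors at work. It assumes a z-based CI for a mean (width = 2 × z × SD / √n), which is a normal approximation; a t-based interval would be slightly wider for small samples. The SD and n values are hypothetical.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Width of a z-based CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return 2 * z * sd / math.sqrt(n)

base = ci_width(sd=10, n=100)
print(f"baseline (95%, SD 10, n 100): {base:.2f}")
print(f"99% level: {ci_width(10, 100, level=0.99):.2f}")  # higher level, wider
print(f"n = 400:   {ci_width(10, 400):.2f}")              # larger n, narrower
print(f"SD = 20:   {ci_width(20, 100):.2f}")              # larger SD, wider
```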
15
Q

p-value

A
  • Probability that the observed result or a more extreme result would occur by chance alone, assuming Ho is true
  • The smaller the p-value, the stronger the evidence against Ho
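A minimal sketch of the definition, assuming the test statistic is a z statistic that follows a standard normal distribution when Ho is true (eg a large-sample z-test). The z values are hypothetical; a larger |z| (a more extreme result) gives a smaller two-sided p-value.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for standard normal Z, i.e. assuming Ho is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p_value(z):.4f}")
```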
16
Q

Type I error

A

= alpha

  • false positive
  • reject Ho when Ho is true (there is truly no difference/no effect)
17
Q

Type II error

A

= beta

  • false negative
  • failure to reject Ho when Ho is false (there is a true difference/effect)
18
Q

Statistical power

A

1-beta

Probability of correctly rejecting Ho when Ho is false (there is a true difference/effect)
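A sketch of power = 1 - beta for one specific, assumed design: a two-sided one-sample z-test with known SD. The effect size, SD and n are hypothetical, and `z_test_power` is an illustrative helper, not a standard library function.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test for a true mean difference `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    shift = effect * math.sqrt(n) / sd           # centre of the test statistic under H1
    # Probability the statistic falls in either rejection region when H1 is true
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = z_test_power(effect=5, sd=10, n=32)
print(f"power = {power:.2f}, beta = {1 - power:.2f}")
```

With effect = 0 (Ho actually true) the function returns alpha, the Type I error rate; increasing n raises power.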

19
Q

Why is a CI more informative than a p-value? (2)

A
  1. Precision of the point estimate
    - width of CI
  2. Statistical significance
    - whether the CI includes the null value (0 for a difference, 1 for a ratio)
20
Q

95% CI of difference

A
  • If the CI does not include 0, the difference is statistically significant
  • corresponds to p<0.05

eg mean diff = 0.45, 95% CI (0.3,0.6)

21
Q

95% CI of ratio

A
  • If the CI does not include 1, the difference is statistically significant
  • corresponds to p<0.05

eg odds ratio = 2.51, 95% CI (1.04,3.28)
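A tiny sketch of the "does the CI include the null value?" check for both cards: null value 0 for a difference, 1 for a ratio. The first two intervals are the card examples; the third is hypothetical. This assumes the CI and the p-value come from the same method; otherwise the two can occasionally disagree near the boundary.

```python
def excludes_null(lower, upper, null_value):
    """True if the CI (lower, upper) does not contain the null value."""
    return not (lower <= null_value <= upper)

# Examples from the cards above
print(excludes_null(0.3, 0.6, null_value=0))    # mean difference: excludes 0, p < 0.05
print(excludes_null(1.04, 3.28, null_value=1))  # odds ratio: excludes 1, p < 0.05
# Hypothetical ratio CI that crosses 1: not statistically significant
print(excludes_null(0.85, 2.10, null_value=1))
```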

22
Q

Assessing normality of continuous data (3) & hypothesis

A

Visual inspection of histogram or box plot

Shapiro-Wilk test (n<50)
Kolmogorov-Smirnov test (n>=50)

Ho = normal distribution
H1 = not normal distribution
23
Q

(( comparing data ))

Test for :

  • continuous data (normal)
  • 2 groups
  • independent
A

Independent t-test

24
Q

(( comparing data ))

Test for :

  • continuous data (normal)
  • 2 groups
  • paired
A

Paired samples t-test

25
Q

(( comparing data ))

Test for :

  • continuous data (normal)
  • > 2 groups
  • independent
A

One-way ANOVA

26
Q

(( comparing data ))

Test for

  • continuous data (non-normal) or ordinal data
  • 2 groups
  • independent
A

Wilcoxon rank sum test (Mann-Whitney U test)

27
Q

(( comparing data ))

Test for :

  • continuous data (non-normal) or ordinal data
  • 2 groups
  • paired
A

Wilcoxon signed-rank test

28
Q

(( comparing data ))

Test for :

  • continuous data (non-normal) or ordinal data
  • > 2 groups
  • independent
A

Kruskal-Wallis test

29
Q

(( comparing data ))

Test for :

  • nominal data
  • 2 groups
  • independent
A

Chi-square test or Fisher’s exact test
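A sketch of how the choice between chi-square and an exact test is often made from expected counts (expected = row total × column total / grand total). The cut-offs used here (any expected count < 1, or more than 20% of cells with expected count < 5) are a common rule of thumb and an assumption, not taken from the lecture. The tables and helper names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def suggest_nominal_test(table):
    """Assumed rule of thumb: chi-square unless any expected count < 1
    or more than 20% of expected counts are < 5."""
    expected = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < 5 for e in expected) / len(expected)
    if min(expected) < 1 or share_small > 0.2:
        is_2x2 = len(table) == 2 and len(table[0]) == 2
        return "Fisher's exact test" if is_2x2 else "Fisher-Freeman-Halton test"
    return "Chi-square test"

# Hypothetical 2x2 tables (rows = groups, columns = outcome yes/no)
print(suggest_nominal_test([[20, 30], [25, 25]]))
print(suggest_nominal_test([[2, 8], [6, 4]]))
```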

30
Q

(( comparing data ))

Test for :

  • nominal data
  • 2 groups
  • paired
A

McNemar’s test

31
Q

(( comparing data ))

Test for :

  • nominal data
  • > 2 groups
  • independent
A

Chi-square test or Fisher-Freeman-Halton test

32
Q

Describing Ordinal data

eg Cancer stages

A
n(%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)
33
Q

Ordinal data & graphical presentation

Are cancer stages described the same way as a Likert scale?

A

No.
Cancer stages can only be described by
  • frequency (n) and proportion (%)

Likert scale can be described by both
  • frequency (n) and proportion (%)
  • median & IQR
34
Q

Skewing of graph

A

Named after the side the tail is on: tail to the right = positively skewed, tail to the left = negatively skewed
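A sketch that puts a number on skew direction using the moment coefficient of skewness (g1), which is one of several skewness measures and is chosen here as an assumption. The data are hypothetical; the mirrored list has its tail on the other side, so its skewness flips sign. In the right-tailed example the mean is also pulled above the median, which is typical but not guaranteed for skewed data.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness (g1): positive = tail to the right."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical, long tail to the right
left_tailed = [-x for x in right_tailed]    # mirror image, tail to the left

print(f"{skewness(right_tailed):.2f}")  # positive: positively skewed
print(f"{skewness(left_tailed):.2f}")   # negative: negatively skewed
print(statistics.mean(right_tailed), statistics.median(right_tailed))
```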

35
Q

Determine types of distribution from box plot

A
  • rotate box plot 90 degrees clockwise
  • check which whisker is longer: right = positively skewed, left = negatively skewed

36
Q

If data given in median (Q1, Q3), how to differentiate between continuous data & ordinal data?

A
  • check the type of data collected
    eg length of hospital stay (continuous) vs Likert scale (ordinal)

37
Q

Parameter Estimation

A
  • seeks an approximate calculation of a population parameter
    eg “by how much does __ reduce blood pressure?”
    (a) Point estimate
    (b) Interval estimate (confidence interval)
38
Q

Hypothesis Testing

A
  • seeks to test a supposition (hypothesis) about the population based on limited sample evidence
    eg “does __ reduce blood pressure?”
    (a) Null hypothesis
    (b) Alternative hypothesis
39
Q

SD of the sample means

A

SD of population divided by square root of sample size

40
Q

SD of sample means vs SD of sample scores

A

SD of sample means = SEM

SD of sample scores = sample SD

41
Q

SEM

A

Standard Error of the Mean

- quantifies how much sample means vary from one sample to another
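A minimal sketch contrasting the sample SD (spread of individual scores) with the SEM (spread of sample means). Because the population SD is usually unknown, the SEM is estimated here as sample SD / √n. The blood pressure readings and the helper `standard_error_of_mean` are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """SEM estimated as sample SD / sqrt(n), since the population SD is usually unknown."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical systolic blood pressure readings (mmHg)
sbp = [120, 125, 130, 135, 140]
print(f"sample SD = {statistics.stdev(sbp):.2f}")        # spread of individual scores
print(f"SEM       = {standard_error_of_mean(sbp):.2f}")  # spread of sample means
```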

42
Q

Point estimate

A

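A single value calculated from the sample to estimate a population parameter
eg sample mean as an estimate of the population mean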

43
Q

Interval estimate (2)

A
  • provides a range of plausible values that is intended to contain the population parameter
  • with a certain degree of confidence (usually 95%)

eg confidence interval

44
Q

Confidence level affecting CI

A
  • larger confidence level, wider CI
  • smaller confidence level, narrower CI

45
Q

(( correlation ))

Test for :
- continuous normally distributed data

A

Pearson Product-Moment Correlation (r)

46
Q

(( correlation ))

Test for :

  • continuous non-normally distributed data
  • ordinal data
A

Spearman Rank Correlation (rs)
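A sketch of both coefficients from first principles. Spearman's rs is computed as Pearson's r applied to the ranks (tied values get the average of their ranks). The data and helper names are hypothetical; the relationship is monotonic but curved, so rs = 1 while r is slightly below 1.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical monotonic but non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"r  = {pearson_r(x, y):.3f}")
print(f"rs = {spearman_rs(x, y):.3f}")
```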

47
Q

(( regression ))

Dependent variable is :
- continuous (normally or non-normally distributed)

A

Linear regression

  • simple
  • multiple / multivariable
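A sketch of simple linear regression (one predictor) fitted by ordinary least squares. Multiple regression with several predictors is not shown. The dose and blood pressure values and the helper `simple_linear_regression` are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical: drug dose (mg) vs systolic blood pressure (mmHg)
dose = [0, 1, 2, 3, 4]
sbp = [150, 147, 143, 141, 137]
b0, b1 = simple_linear_regression(dose, sbp)
print(f"SBP = {b0:.1f} + ({b1:.1f}) x dose")
```

The slope estimates the change in the continuous dependent variable per one-unit increase in the predictor (here about 3.2 mmHg lower SBP per mg in the made-up data).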
48
Q

(( regression ))

Dependent variable is :
- ordinal

A

Ordinal regression

  • simple
  • multiple / multivariable
49
Q

(( regression ))
Dependent variable is :
- nominal (dichotomous/binary) variable

A

Logistic regression

  • simple
  • multiple / multivariable
50
Q

When comparing data, consider (4)

A
  1. No. of groups
  2. Independent or paired/related groups
  3. Type of data
    - nominal
    - ordinal
    - continuous (normal or non-normal)
  4. Assumptions
    - eg nominal data: check chi-square assumptions to choose between chi-square and Fisher’s exact test
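The comparison cards above can be sketched as a lookup keyed on the factors in this card: type of data, number of groups, and independent vs paired. This is a study aid, not a complete decision tool. It only encodes combinations listed in these flashcards (paired designs with more than 2 groups are not covered, so they raise an error), and it does not check assumptions. The function name and argument values are hypothetical.

```python
def choose_comparison_test(data_type, groups, paired):
    """data_type: 'nominal', 'ordinal', 'normal' or 'non-normal' (continuous)."""
    if data_type == "ordinal":
        data_type = "non-normal"  # ordinal data use the same rank-based tests
    table = {
        ("normal", 2, False): "Independent t-test",
        ("normal", 2, True): "Paired samples t-test",
        ("normal", 3, False): "One-way ANOVA",
        ("non-normal", 2, False): "Wilcoxon rank sum test (Mann-Whitney U test)",
        ("non-normal", 2, True): "Wilcoxon signed-rank test",
        ("non-normal", 3, False): "Kruskal-Wallis test",
        ("nominal", 2, False): "Chi-square test or Fisher's exact test",
        ("nominal", 2, True): "McNemar's test",
        ("nominal", 3, False): "Chi-square test or Fisher-Freeman-Halton test",
    }
    key = (data_type, min(groups, 3), paired)  # 3 stands for "more than 2 groups"
    if key not in table:
        raise ValueError("combination not covered by these flashcards")
    return table[key]

print(choose_comparison_test("normal", 2, paired=False))
print(choose_comparison_test("ordinal", 4, paired=False))
print(choose_comparison_test("nominal", 2, paired=True))
```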