LEC 4 Statistics Flashcards

1
Q

2 branches of statistics

A
  1. Descriptive statistics
    - describe the attributes of a group or population
  2. Inferential statistics
    - draw conclusions from a sample and make inferences about the entire population
2
Q

3 types of variables

A
  1. Nominal
    - categories that are not ordered
    - categorical
    eg subject count
  2. Ordinal
    - categories that are ordered
    - no fixed interval
    eg Likert scale & cancer stages
  3. Continuous
    - with real values that reflect order and relative magnitude
    - fixed interval
    - normal or non-normal distribution
    eg age
3
Q

Describing Nominal data & graphical presentation

A
n(%)
n = frequency
% = proportion
Graphical presentation
- pie chart
- bar chart (normal, clustered, stacked & segmented)
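A minimal sketch of the n(%) summary, assuming the data arrive as a plain list of category labels. The blood-group values and the helper name `describe_nominal` are made up for illustration.

```python
from collections import Counter

def describe_nominal(values):
    """Return {category: (n, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

# Hypothetical blood-group data, for illustration only.
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "O"]
summary = describe_nominal(blood_groups)

# Print each category in the n (%) style used on the card.
for cat, (n, pct) in sorted(summary.items()):
    print(f"{cat}: {n} ({pct}%)")
```

The same counts are what a pie chart or bar chart would display.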
4
Q

Describing Ordinal data & graphical presentation

eg Likert scale

A
n(%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)

OR

median (IQR)
Graphical presentation :
- box plot (box-and-whiskers plot)

5
Q

Box plot whiskers

A

Whiskers extend up to 1.5 x IQR below Q1 and above Q3

Mild outlier : 1.5-3 x IQR below Q1 or above Q3
Extreme outlier : >3 x IQR below Q1 or above Q3
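The fences above can be sketched in a few lines. This assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; other software may compute Q1 and Q3 slightly differently. The length-of-stay data and the helper name `classify_outliers` are hypothetical.

```python
import statistics

def classify_outliers(data):
    """Split values into mild and extreme outliers using 1.5 x and 3 x IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker limits
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr       # extreme-outlier limits
    mild = [x for x in data
            if outer_low <= x < inner_low or inner_high < x <= outer_high]
    extreme = [x for x in data if x < outer_low or x > outer_high]
    return {"q1": q1, "q3": q3, "iqr": iqr, "mild": mild, "extreme": extreme}

# Hypothetical length of hospital stay (days), for illustration only.
stay_days = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 12, 20]
result = classify_outliers(stay_days)
print(result)
```

Here Q1 = 3.75 and Q3 = 6.25, so the IQR is 2.5; 12 falls between the 1.5 x and 3 x fences (mild) and 20 lies beyond the 3 x fence (extreme).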

6
Q

Describing Continuous data & graphical presentation

  • normal
  • non-normal
A

Graphical presentation :

  • histogram
  • box plots

Normal distribution - mean +/- SD
Non-Normal distribution - median (IQR)
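A small sketch of the two reporting styles, assuming the analyst has already decided whether the data look normal. The helper name `summarise_continuous` and both data sets are hypothetical; the median is reported with (Q1, Q3) using the "inclusive" quartile method, which is one of several conventions.

```python
import statistics

def summarise_continuous(data, normal):
    """Return 'mean +/- SD' for normal data, 'median (Q1, Q3)' otherwise."""
    if normal:
        return f"{statistics.mean(data):.1f} +/- {statistics.stdev(data):.1f}"
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return f"{median:.1f} ({q1:.1f}, {q3:.1f})"

# Hypothetical data, for illustration only.
ages = [30, 32, 35, 36, 38, 40, 41]   # roughly symmetric
stay_days = [2, 3, 3, 4, 5, 7, 21]    # right-skewed by one long stay

print(summarise_continuous(ages, normal=True))
print(summarise_continuous(stay_days, normal=False))
```

The single 21-day stay pulls the mean of `stay_days` well above its median, which is why the median (IQR) is preferred for skewed data.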

7
Q

Types of distribution of Continuous data (5)

A
  1. Normal distribution
  2. Positively skewed (right)
  3. Negatively skewed (left)
  4. Bimodal distribution (two peaks, eg U shape)
  5. Multimodal distribution (several peaks)
8
Q

Measure of central tendency

A
  • mean
  • median

9
Q

Measure of variability

A
  • SD
  • IQR

10
Q

How to ensure sample will lead to reliable and valid inferences

A

Random sampling

11
Q

Types of inferential statistics (2)

A
  1. Parameter Estimation
  2. Hypothesis Testing

12
Q

Population mean

A

Equal to the mean of all possible sample means

13
Q

Central Limit Theorem

A

For sufficiently large sample sizes, the sampling distribution of the mean is approximately normal,
even if the individual observations in the population are not normally distributed
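A quick simulation can make this concrete. The sketch below assumes an exponential population (strongly right-skewed, population mean 1) and compares the skewness of sample means for n = 2 and n = 50; the function names, seed and repeat count are arbitrary choices for illustration.

```python
import random
import statistics

def draw_exponential(rng):
    """One observation from a right-skewed population (exponential, mean 1)."""
    return rng.expovariate(1.0)

def sample_means(draw, n, repeats, seed=0):
    """Means of `repeats` independent samples, each of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]

def skewness(xs):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

means_n2 = sample_means(draw_exponential, n=2, repeats=5000)
means_n50 = sample_means(draw_exponential, n=50, repeats=5000)

print("skewness of sample means, n=2 :", round(skewness(means_n2), 2))
print("skewness of sample means, n=50:", round(skewness(means_n50), 2))
print("mean of sample means, n=50    :", round(statistics.fmean(means_n50), 2))
```

With larger n the skewness of the sample means moves towards 0 (more symmetric, closer to normal), while their mean stays close to the population mean.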

14
Q

Factors affecting the width of CI (3)

A
  1. Confidence level
  2. Sample size
  3. Standard deviation
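How the three factors move the width can be sketched with a normal-approximation interval for a mean (estimate +/- z x SD / sqrt(n)). This assumes a z-based interval with a known SD; the helper name `ci_width` and the numbers are illustrative only.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # eg about 1.96 for 95%
    return 2 * z * sd / math.sqrt(n)

baseline = ci_width(sd=10, n=100)                   # 95% CI, n = 100, SD = 10
print("baseline            :", round(baseline, 2))
print("99% confidence level:", round(ci_width(10, 100, confidence=0.99), 2))
print("larger sample (400) :", round(ci_width(10, 400), 2))
print("larger SD (20)      :", round(ci_width(20, 100), 2))
```

A higher confidence level or a larger SD widens the interval; a larger sample size narrows it.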
15
Q

p-value

A
  • Probability that the observed result or a more extreme result would occur by chance alone, assuming Ho is true
  • The smaller the p-value, the stronger the evidence against Ho
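As an illustration, a two-sided p-value can be computed from a z statistic. This sketch assumes the test statistic follows a standard normal distribution when Ho is true; the helper name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a result at least as extreme as z (in either direction) under Ho."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p(z):.4f}")
```

The further the statistic lies from 0, the smaller the p-value and the stronger the evidence against Ho.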
16
Q

Type I error

A

Probability of a Type I error = alpha (significance level)

  • false positive
  • reject Ho when Ho is true (no significant difference/no effect)
17
Q

Type II error

A

Probability of a Type II error = beta

  • false negative
  • failure to reject Ho when Ho is not true (there is significant difference/effect)
18
Q

Statistical power

A

1-beta

Probability of correctly rejecting Ho when Ho is not true (there is significant difference/effect)
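A rough sketch of how power relates to alpha, effect size and sample size, assuming a two-sided one-sample z-test with a known SD. The helper name `power_z_test` and the numbers are illustrative; real power calculations depend on the test actually used.

```python
import math
from statistics import NormalDist

def power_z_test(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test for a true mean shift `effect`."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)      # critical value for the chosen alpha
    shift = effect * math.sqrt(n) / sd      # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

power = power_z_test(effect=5, sd=10, n=32)
print("power:", round(power, 2), "beta:", round(1 - power, 2))
```

With no true effect (`effect=0`) the function returns alpha, the Type I error rate; power rises as the sample size or the true effect grows.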

19
Q

Why is a CI more informative than a p-value? (2)

A
  1. Precision of the point estimate
    - width of CI
  2. Statistical significance
20
Q

95% CI of difference

A
  • If the CI does not include 0, the difference is statistically significant
  • p<0.05

eg mean diff = 0.45, 95% CI (0.3,0.6)
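The link between the CI and the p-value can be checked numerically. The sketch below assumes the interval is symmetric and normal-based (estimate +/- z x SE), recovers the SE from the CI limits, and converts it to a two-sided p-value; the helper name `p_from_ci` is hypothetical.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, confidence=0.95):
    """Approximate two-sided p-value for Ho: difference = 0, from a symmetric CI."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - (1 - confidence) / 2)
    se = (upper - lower) / (2 * z_crit)   # CI half-width divided by z gives the SE
    z = estimate / se
    return 2 * (1 - nd.cdf(abs(z)))

# The example from the card: mean diff = 0.45, 95% CI (0.3, 0.6).
p = p_from_ci(0.45, 0.3, 0.6)
print("CI excludes 0:", not (0.3 <= 0 <= 0.6), "| p < 0.05:", p < 0.05)
```

An interval that straddles 0, such as a made-up (-0.2, 0.4) around 0.1, gives p > 0.05 under the same assumptions.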

21
Q

95% CI of ratio

A
  • If the CI does not include 1, the difference is statistically significant
  • p<0.05

eg odds ratio = 2.51, 95% CI (1.04,3.28)
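For a ratio the interval is usually built on the log scale, which is why the null value is 1 rather than 0. A minimal sketch, assuming a 2x2 table and a Wald-type interval on the log odds ratio (often called Woolf's method); the counts and the helper name `odds_ratio_ci` are hypothetical and are not the data behind the example above.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and CI from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(odds ratio)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
print("CI excludes 1:", not (lower <= 1 <= upper))
```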

22
Q

Assessing normality of continuous data (3)

& hypothesis

A

Visual inspection of histogram or box plot

Shapiro-Wilk test (n<50)
Kolmogorov-Smirnov test (n>=50)

Ho = normal distribution
H1 = not normal distribution
23
Q

(( comparing data ))

Test for :

  • continuous data (normal)
  • 2 groups
  • independent
A

Independent t-test

24
Q

(( comparing data ))

Test for :

  • continuous data (normal)
  • 2 groups
  • paired
A

Paired samples t-test

25
(( comparing data )) Test for : - continuous data (normal) - >2 groups - independent
One-way ANOVA
26
(( comparing data )) Test for : - continuous data (non-normal) or ordinal data - 2 groups - independent
Wilcoxon rank sum test (Mann-Whitney U test)
27
(( comparing data )) Test for : - continuous data (non-normal) or ordinal data - 2 groups - paired
Wilcoxon signed-rank test
28
(( comparing data )) Test for : - continuous data (non-normal) or ordinal data - >2 groups - independent
Kruskal-Wallis test
29
(( comparing data )) Test for : - nominal data - 2 groups - independent
Chi-square test or Fisher's exact test
30
(( comparing data )) Test for : - nominal data - 2 groups - paired
McNemar's test
31
(( comparing data )) Test for : - nominal data - >2 groups - independent
Chi-square test or Fisher-Freeman-Halton test
32
Describing Ordinal data, eg Cancer stages
n(%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)
33
Ordinal data & graphical presentation: Cancer stages = Likert scale?
No.
Cancer stages can only be described in
- frequency (n) and proportion (%)
Likert scale can be described in both
- frequency (n) and proportion (%)
- median & IQR
34
Skewing of graph
Named after the side the tail is on (tail on right = positively skewed, tail on left = negatively skewed)
35
Determine types of distribution from box plot
- rotate box plot 90 degrees clockwise
- check which whisker is longer (right or left)
36
If data given in median (Q1, Q3), how to differentiate between continuous data & ordinal data?
- check the type of data collected, eg length of hospital stay (continuous) vs Likert scale (ordinal)
37
Parameter Estimation
- seeks an approximate calculation of a population parameter
eg "by how much does __ reduce blood pressure?"
(a) Point estimate
(b) Interval estimate (confidence interval)
38
Hypothesis Testing
- seeks to validate a supposition (hypothesis) based on limited evidence
eg "does __ reduce blood pressure?"
(a) Null hypothesis
(b) Alternative hypothesis
39
SD of the sample means
SD of population divided by square root of sample size
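As a worked example of this formula, the sketch below estimates the SD of the sample means from a single sample. It assumes the sample SD is used in place of the (usually unknown) population SD; the helper name `sem` and the ages are hypothetical.

```python
import math
import statistics

def sem(sample):
    """Sample SD divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical ages, for illustration only.
ages = [30, 32, 35, 36, 38, 40, 41]
print("sample SD:", round(statistics.stdev(ages), 2))
print("SD of sample means (estimated):", round(sem(ages), 2))
```

Because of the division by the square root of n, quadrupling the sample size halves this value when the SD stays the same.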
40
SD of sample means vs SD of sample scores
SD of sample means = SEM
SD of sample scores = sample SD
41
SEM
Standard Error of the Mean
- quantifies the variability of the sample mean values
42
Point estimate
A single value that estimates a population parameter, eg sample mean as an estimate of the population mean
43
Interval estimate (2)
- provides a range of plausible values intended to contain the population parameter
- with a certain degree of confidence (usually 95%)
eg confidence interval
44
Confidence level affecting CI
- larger confidence level, wider CI
- smaller confidence level, narrower CI
45
(( correlation )) Test for : - continuous normally distributed data
Pearson Product-Moment Correlation (r)
46
(( correlation )) Test for : - continuous non-normally distributed data - ordinal data
Spearman Rank Correlation (rs)
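The difference between the two coefficients shows up on data that are monotonic but not linear. A self-contained sketch, assuming Spearman's rs is computed as Pearson's r on average ranks (ties share the mean rank); the function names and data are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print("Pearson r  :", round(pearson_r(x, y), 2))
print("Spearman rs:", round(spearman_rs(x, y), 2))
```

Spearman's rs only uses the ordering, so it reaches 1 for any strictly increasing relationship, while Pearson's r is pulled below 1 by the non-linear jump.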
47
(( regression )) Dependent variable is : - continuous (normally or non-normally distributed)
Linear regression
- simple
- multiple / multivariable
48
(( regression )) Dependent variable is : - ordinal
Ordinal regression
- simple
- multiple / multivariable
49
(( regression )) Dependent variable is : - nominal (dichotomous/binary) variable
Logistic regression
- simple
- multiple / multivariable
50
When comparing data, consider (4)
  1. No. of groups
  2. Independent or paired/related groups
  3. Type of data
    - nominal
    - ordinal
    - continuous (normal or non-normal)
  4. Assumptions
    - esp nominal data (chi-square or Fisher's exact test)
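The comparison cards in this deck can be folded into one lookup. This is only a study aid that mirrors the cards: the function name `choose_test` and its argument labels are made up, and it returns None for combinations the deck does not cover (eg paired data with more than 2 groups). Checking assumptions, such as expected cell counts for chi-square, is still left to the analyst.

```python
def choose_test(data_type, groups, paired):
    """Look up the test named on the cards.

    data_type: 'normal', 'non-normal', 'ordinal' or 'nominal'
    groups:    number of groups being compared (2 or more)
    paired:    True for paired/related groups, False for independent groups
    """
    if groups < 2:
        raise ValueError("need at least 2 groups to compare")
    # Non-normal continuous data and ordinal data share the rank-based tests.
    kind = "rank" if data_type in ("non-normal", "ordinal") else data_type
    size = "2" if groups == 2 else ">2"
    table = {
        ("normal", "2", False): "Independent t-test",
        ("normal", "2", True): "Paired samples t-test",
        ("normal", ">2", False): "One-way ANOVA",
        ("rank", "2", False): "Wilcoxon rank sum test (Mann-Whitney U test)",
        ("rank", "2", True): "Wilcoxon signed-rank test",
        ("rank", ">2", False): "Kruskal-Wallis test",
        ("nominal", "2", False): "Chi-square test or Fisher's exact test",
        ("nominal", "2", True): "McNemar's test",
        ("nominal", ">2", False): "Chi-square test or Fisher-Freeman-Halton test",
    }
    return table.get((kind, size, paired))  # None if not covered by the cards

print(choose_test("normal", 2, paired=False))
print(choose_test("ordinal", 3, paired=False))
print(choose_test("nominal", 2, paired=True))
```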