Statistics Flashcards

1
Q

Name two types of research

A
  1. Quantitative
  2. Qualitative
2
Q

Define qualitative research

A

Qualitative involves meaning, opinion, attitudes and beliefs, seeking deep information, answering complex questions, social understanding

3
Q

Define quantitative research

A

Quantitative involves numbers, proportions, statistics, testing hypotheses, looking at cause and effect

4
Q

Name 4 types of data

A
  1. Categorical (discrete)
  2. Numerical (continuous)
  3. Calculated
  4. Censored
5
Q

What type of data have you got?

Smokers / Non-smokers

A

Categorical

Two possible categories

Binary

6
Q

What type of data have you got?

Married / Single / Divorced / Widowed

A

Categorical

More than two categories

No particular order

Nominal

7
Q

What type of data have you got?

Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

OR

BPE index (0, 1, 2, 3, 4)

A

Categorical

More than two categories

Order is important, but there is no fixed numerical relationship between the categories

Ordinal

8
Q

What type of data have you got?

Number of sick days taken

OR

Number of fillings

A

Numerical

On a scale

Integers only along the scale; numbers are related

Count

9
Q

What type of data have you got?

Temperature in degrees Celsius

A

Numerical

No real zero

Can have data at any point along the scale

Interval – 20.4 degrees Celsius is not twice as hot as 10.2 degrees Celsius

10
Q

What type of data have you got?

Height in cm

A

Numerical

You can have zero cm

Can have data at any point along the scale

Ratio – 20.4 cm is twice as high as 10.2 cm

11
Q

What type of data have you got?

BMI (Body Mass Index)

OR

HAD score (Hospital Anxiety and Depression score)

A

Calculated

The data have been derived from a calculation based on other measurements

12
Q

Summarize the types of data (4 types)

A

Categorical

  1. Binary
  2. Nominal
  3. Ordinal

Numerical

  1. Count
  2. Interval
  3. Ratio

Calculated - e.g. BMI

Censored - e.g. loss to follow-up

13
Q

Why aren’t the mean, mode and median the same?

5, 0, 2, 5, 24, 0, 7, 1, 15, 0, 16, 2, 4, 6, 2, 3, 5, 10, 3, 5

Mean = 5.75
Median = 4.5
Mode = 5

A

The data may be “skewed”
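
The three summary statistics quoted on this card can be checked with a minimal Python sketch using only the standard-library `statistics` module. The variable names are illustrative and not part of the original card.

```python
from statistics import mean, median, mode

# Data from the card above
values = [5, 0, 2, 5, 24, 0, 7, 1, 15, 0, 16, 2, 4, 6, 2, 3, 5, 10, 3, 5]

data_mean = mean(values)      # 115 / 20 = 5.75
data_median = median(values)  # average of the 10th and 11th sorted values (4 and 5) = 4.5
data_mode = mode(values)      # 5 occurs four times, more often than any other value

print(data_mean, data_median, data_mode)
```

A few large values (15, 16, 24) pull the mean above the median, which is the usual sign of a positive (right) skew.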

14
Q

Describe normal distribution

A

Mean = median = mode

15
Q

Describe skewed distribution

A

Mean ≠ median ≠ mode

16
Q

Describe parametric vs non-parametric statistics

A

Parametric Statistics

  • Normal distribution
  • Based on mean and standard deviation

Non-parametric Statistics

  • Skewed distribution or where you can’t prove it’s a normal distribution (small sample)
  • Based on median and interquartile range
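
As a rough illustration of the two kinds of summary, the sketch below computes both for a small right-skewed sample. The data and variable names are assumptions chosen for illustration. Quartile conventions differ between packages; this uses the default method of Python's `statistics.quantiles`.

```python
from statistics import mean, stdev, median, quantiles

# Illustrative right-skewed sample (an assumption for this sketch)
values = [5, 0, 2, 5, 24, 0, 7, 1, 15, 0, 16, 2, 4, 6, 2, 3, 5, 10, 3, 5]

# Parametric summary: mean and standard deviation
sample_mean = mean(values)
sample_sd = stdev(values)

# Non-parametric summary: median and interquartile range
q1, q2, q3 = quantiles(values, n=4)  # the three quartile cut points
iqr = q3 - q1

print(f"mean = {sample_mean:.2f}, SD = {sample_sd:.2f}")
print(f"median = {q2}, IQR = {iqr} (Q1 = {q1}, Q3 = {q3})")
```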
17
Q

How can I tell if my data forms a normal distribution?

A

Mean = Median = Mode

Plot the data and judge by eye

Do a test for normality e.g. Shapiro-Wilk (small sample) or Kolmogorov-Smirnov (large sample)

As a rule of thumb, your sample should have more than 30 values to use parametric statistics

18
Q

When do you use median?

A

Median is used when the data are skewed or there are not enough data to tell if there is skew (typically fewer than 30 values in the sample)

19
Q

When is hypothesis testing used?

A

Used in quantitative research to clarify what it is we are testing statistically

20
Q

Define null hypothesis

A

Null hypothesis (H0) assumes that there is no difference between the groups being tested

E.g. if we want to know if the IQ measurements of boys and girls are different at age 11, then we can say H0 = the IQ values of boys and girls at age 11 are not different

21
Q

Define alternative hypothesis

A

Alternative hypothesis (H1) states that there is a difference between the groups being tested - we accept it if the null hypothesis can be rejected

E.g. H1 = the IQ values of boys and girls at age 11 are different

22
Q

When would you use one-tailed or two-tailed tests?

A

Two-tailed tests – if the alternative hypothesis doesn’t have a direction
E.g. H0 = the number of fillings in men and women is the same; H1 = the number of fillings in men and women is different (could be less or more)

One-tailed tests – if the alternative hypothesis does have a direction
E.g. H0 = the number of fillings in men and women is the same; H1 = men have more fillings than women

23
Q

Give one example test for each type of data

Categorical

Numerical, parametric

Numerical, non-parametric

Correlation

Dependence of variables

A

Categorical – Chi squared test

Numerical, parametric – Student’s t test

Numerical, non-parametric – Mann Whitney U test

Correlation – Pearson correlation coefficient

Dependence of variables – Simple linear regression

24
Q

Describe the importance of p-values and the significance level (α) with respect to the null hypothesis

A

The significance level, α, is the maximum probability of making a Type I error that we are prepared to accept – we usually set it at 5%

This means that, if the null hypothesis is true, there is a 5% chance that we will reject it and therefore get a false positive result

The p value is the probability of obtaining results at least as extreme as those observed if the null hypothesis is true

We can reject the null hypothesis if the p value that we calculate is less than α (5%)

At the 5% level of significance (95% confidence), we say that we can reject the null hypothesis if p < 5%, i.e. p < 0.05

If we always follow this rule, then no more than 5% of the time will we incorrectly reject a true null hypothesis and infer that there is an effect in whatever we have tested when there is not

25
Q

Describe power in statistics

A

We use power calculations to decide whether our research sample is big enough to give us meaningful data

The power is the probability of correctly rejecting the null hypothesis when it is false i.e. we detect as statistically significant a real effect

Power = 1 – β (1 minus the probability of making a Type II error, i.e. of reporting a false negative)

By convention, power should be at least 80%
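
To make the link between sample size and power concrete, here is a minimal sketch under stated assumptions: a two-sided z test comparing two group means with a known, common standard deviation. The function name `two_sample_power` and all the numbers are hypothetical, and real power calculations depend on the test actually being planned.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z test for a difference in means.

    delta: true difference between group means (assumed)
    sd: common standard deviation in each group (assumed known)
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    se = sd * sqrt(2 / n_per_group)       # standard error of the difference
    shift = delta / se
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical study: detect a 5-point difference when SD = 10
power_20 = two_sample_power(delta=5, sd=10, n_per_group=20)
power_64 = two_sample_power(delta=5, sd=10, n_per_group=64)

print(f"n = 20 per group: power = {power_20:.2f}")
print(f"n = 64 per group: power = {power_64:.2f}")
```

Under these assumptions the smaller study falls well short of 80% power, while the larger one just exceeds it.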

26
Q

What is the ‘Gold standard’ in quantitative research

A

Double-blind randomized controlled trial

27
Q

Describe crossover trial

A

Same methods as RCT

Group A regime

  • Treatment
  • “Washout”
  • Placebo

Group B regime

  • Placebo
  • “Washout”
  • Treatment
28
Q

Describe cohort study

A

Follow a sample of people over time

Longitudinal, usually prospective

Useful when it is not appropriate to compare treatments

29
Q

Describe case-control study

A

Compares people with and without disease

Tries to identify causes by looking back at past exposures

30
Q

Describe cross-sectional study

A

Surveys a sample of population at a point in time (snapshot)

31
Q

Describe case report

A

Look at rare conditions and/or novel treatments

32
Q

Recall the hierarchy of evidence (6 points)

A
33
Q

List the tests for categorical data

A

Categorical data

2 categories:

  • 1 group – Z test
  • 2 groups (paired) – McNemar
  • 2 groups (independent) – Chi squared; Fisher’s exact
  • > 2 groups – Chi squared

More than 2 categories:

  • Chi squared
34
Q

List the tests for numerical (parametric) data

A

Numerical - parametric

  • 1 group – 1 sample t test
  • 2 groups (paired) – paired t test
  • 2 groups (unpaired) – unpaired t test
  • > 2 groups – ANOVA
  • > 1 outcome (dependent) variable – MANOVA
35
Q

List the tests for numerical (non-parametric) data

A

Numerical – non-parametric

  • 1 group – Sign test
  • 2 groups (paired) – Wilcoxon signed rank test
  • 2 groups (unpaired) – Wilcoxon rank sum test (Mann Whitney U test)
  • > 2 groups – Kruskal-Wallis test
36
Q

Name a test of association - parametric data

A

Correlation – Pearson’s correlation

37
Q

Name a test of association - non-parametric data

A

Correlation - Spearman rank correlation

38
Q

Name 2 tests of prediction

A

Simple linear regression

Multiple linear regression

39
Q

Define standard deviation vs standard error

A

Standard Deviation = the variability of the values within a sample
Describes the data (quantifies the scatter)

Standard Error = the variability of sample means across repeated samples (standard deviation / √n)
Facilitates an estimate of the mean of a population based on a sample mean (quantifies how precisely we can know the population mean)
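
The relationship between the two can be sketched with the standard formula SE = SD / √n. The helper name `standard_error` and the numbers are illustrative assumptions.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / sqrt(n)

# The same scatter (SD = 10) gives a more precise estimate of the mean as n grows
se_small = standard_error(10, 25)    # 10 / 5  = 2.0
se_large = standard_error(10, 100)   # 10 / 10 = 1.0

print(se_small, se_large)
```

Quadrupling the sample size halves the standard error, while the standard deviation itself describes the data and does not shrink with n.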

40
Q

Describe confidence interval (CI)

A

We can be 95% confident that the population mean lies within 1.96 standard errors of the sample mean (sample mean ± 1.96 x standard error of the mean).

Wide confidence intervals are imprecise

Confidence intervals can be used to assess clinical importance of trial results

If the confidence interval for a difference crosses 0, then the result is not statistically significant (for a ratio such as RR or OR, the equivalent null value is 1)
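
A minimal sketch of the 95% confidence interval calculation, assuming a hypothetical sample of differences (e.g. change in a score after treatment). The names `ci95` and `includes_zero` are illustrative. For small samples a t multiplier would normally replace 1.96, which this sketch ignores.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(sample):
    """Return (lower, upper) for sample mean ± 1.96 standard errors."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    return m - 1.96 * se, m + 1.96 * se

def includes_zero(lower, upper):
    """True if the interval crosses 0, i.e. 'no difference' is still plausible."""
    return lower <= 0 <= upper

# Hypothetical paired differences (after minus before)
differences = [2.1, -0.4, 3.0, 1.2, 0.8, 2.5, -1.1, 1.9, 0.3, 2.2]

lower, upper = ci95(differences)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
print("Crosses 0:", includes_zero(lower, upper))
```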

41
Q

Recall an example comparing statistical significance and clinical significance

A

Example 1 – trial of a new toothpaste

A study shows that there is a statistically significant reduction in caries (p = 0.032)

The actual reduction, though statistically significant, is 1%.

Would you now recommend changing to that toothpaste?

42
Q

Describe correlation

A

Correlation measures association between continuous variables.

For example – mortality from lung cancer and cigarette smoking

CAUTION: correlation measures association, not causation. It measures the extent of the association – how great is the effect?

43
Q

Describe correlation coefficient (r)

A

The correlation coefficient, r, is a value between -1 and +1.

It measures the strength and direction of the association

Positive r = positive association (a high score in Biology correlates with a high score in Chemistry)

Negative r = negative association (a high score in Biology correlates with a low score in Chemistry)

44
Q

Describe the coefficient of determination

A

Example

r = 0.662 (positive correlation)
r² = 0.438 (coefficient of determination)

r² = percentage variability – e.g. how much variability in chemistry exam scores is explained by biology scores?

e.g. 0.438 = 43.8% (explained by biology scores). Therefore 56.2% of the variability is due to other factors.
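
The arithmetic in this example can be checked with a short sketch. The variable names are illustrative.

```python
r = 0.662                 # correlation coefficient from the example
r_squared = r ** 2        # coefficient of determination, about 0.438

explained = round(r_squared * 100, 1)     # percentage of variability explained
unexplained = round(100 - explained, 1)   # percentage left to other factors

print(f"r squared = {r_squared:.3f}")
print(f"explained = {explained}%, unexplained = {unexplained}%")
```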

45
Q

Recall which correlation coefficient to use for parametric and non-parametric data

A

Parametric data – use Pearson correlation coefficient

Non-parametric data – use Spearman rank correlation coefficient

46
Q

Describe purpose of linear regression

A

Purpose – to predict one variable from another

Follows on from correlation

Example – biology and chemistry scores; how well can we predict chemistry scores from the biology scores?

Involves plotting biology (independent variable) on the x axis against chemistry (dependent variable) on the y axis.

Based on y = mx + c, i.e. draws line of best fit.
Minimises the sum of the squared vertical distances between each point and the line of best fit (residuals).

y = Chemistry mark, x = Biology mark, m = gradient of line, c = intercept
NB. If data is not linear, do transformation first (log, exponential, square, square root – whatever works best)
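
As a sketch of how the line of best fit is obtained, the code below applies the ordinary least-squares formulas for m and c to made-up marks. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares gradient (m) and intercept (c) for y = mx + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c

# Hypothetical marks for five students
biology = [40, 50, 60, 70, 80]     # independent variable (x)
chemistry = [45, 50, 65, 70, 80]   # dependent variable (y)

m, c = fit_line(biology, chemistry)
predicted = m * 65 + c             # predicted Chemistry mark for a Biology mark of 65

# Residuals: vertical distances between each point and the fitted line
residuals = [y - (m * x + c) for x, y in zip(biology, chemistry)]

print(f"m = {m:.2f}, c = {c:.2f}, prediction at x = 65: {predicted:.1f}")
```

With these made-up marks the fitted line is y = 0.9x + 8, so a Biology mark of 65 predicts a Chemistry mark of 66.5.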

47
Q

When is multiple linear regression (MLR) used? Recall 2 examples

A

Used in public health and epidemiology

Example 1 – looking at scores for Biology and Chemistry in male and female students. What amount of variability between Biology and Chemistry scores is due to gender?

Example 2 – we know that height and weight are correlated. How does BP correlate with height? Do MLR on height, weight and gender against BP. This gives a p value for each parameter when adjusted for the other parameters (confounding variables), which will tell us if BP is related to height once weight and gender factors are taken out of the reckoning

48
Q

When is ANOVA (analysis of variance) used? Recall the anaesthetics example

A

Analysis of Variance (ANOVA) is used to compare the means of a continuous variable across more than two groups

Example - five Local Anaesthetics – looking at time to take effect (single factor)

H0: LA1 = LA2 = LA3 = LA4 = LA5

H1: at least one of the above LAs differs from the others

Test generates an F ratio and a p value

If the calculated F ratio exceeds the critical value for F, we can reject the null hypothesis – this is the case in our example, therefore at least one of the mean times to take effect is different from the other means
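
The F ratio is the between-group mean square divided by the within-group mean square. The sketch below computes it by hand for three hypothetical anaesthetics rather than the five in the example. The times are made up, and the critical value is the standard tabulated figure for these degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: between-group MS / within-group MS."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)                 # number of groups
    n_total = len(all_values)       # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical times to take effect (minutes) for three local anaesthetics
la_times = [
    [1, 2, 3],   # LA1
    [2, 3, 4],   # LA2
    [6, 7, 8],   # LA3
]

f_ratio = one_way_anova_f(la_times)
f_critical = 5.14   # tabulated F for (2, 6) degrees of freedom at the 5% level

print(f"F = {f_ratio:.1f}; reject H0: {f_ratio > f_critical}")
```

Here F = 21.0, which exceeds the critical value, so at least one mean differs from the others. Which one is a question for post-hoc analysis.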

49
Q

What is ANOVA post-hoc analysis?

A

Post-hoc analysis – how do we find out which mean is different from the others (or more than one mean that is different)?

Option 1

  • Carry out Tukey test (or similar) to see which group(s) is/are different from the others
  • Tukey statistic becomes the value to compare means e.g. if the difference between the means of LA1 and LA2 is greater than the Tukey statistic, then the difference between those means is significant, and vice versa

Option 2

  • Carry out two-sample, two-tailed t tests between pairs of means.
  • Then apply the Bonferroni correction. This adjusts the significance level from p < 0.05 as follows:
    • Divide 0.05 by the number of tests being carried out
    • E.g. if you do 5 tests, the significance level becomes 0.05/5 = 0.01 – raises the “burden of proof” for any one test carried out (p must now be < 0.01 for each t test)
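
The Bonferroni adjustment in Option 2 can be sketched as follows. The function name `bonferroni_alpha` and the p values are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Significance level to apply to each individual test."""
    return alpha / n_tests

adjusted = bonferroni_alpha(0.05, 5)   # 0.05 / 5 = 0.01

# Hypothetical p values from five pairwise t tests
p_values = [0.004, 0.03, 0.2, 0.009, 0.011]
significant = [p < adjusted for p in p_values]

print(adjusted)
print(significant)
```

Note that 0.03 and 0.011 would have passed at p < 0.05 but no longer count as significant after the correction.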
50
Q

Recall which ANOVA method to use when considering one or more factors

A

If one factor is being considered, use 1-way ANOVA
E.g. time taken for different LAs to take effect

If two factors are being considered, use 2-way ANOVA
E.g. time taken for different LAs to take effect in men and women

If more than two factors are being considered, use a multi-way (factorial) ANOVA; MANOVA applies when there is more than one outcome (dependent) variable
E.g. time taken for different LAs to take effect in men and women of different ages, weights, liver function etc

51
Q

What is survival analysis? Give 3 examples of different types

A

Measures the time taken to reach an event, e.g. death, a significant event such as MI, or failure of a resin-retained bridge

Examples:

  1. Kaplan-Meier survival analysis
  2. Log-rank test – a non-parametric test where the null hypothesis states that the pattern of survival for two groups is the same
  3. Cox proportional hazard model – a regression technique that allows modelling to be carried out in order to adjust for potentially confounding variables
52
Q

Recall survival analysis outputs

A

Survival analysis output:

  1. Median survival time – the time at which the probability of survival is 0.5
  2. Survival rate – the proportion of individuals surviving longer than time t
  3. Hazard function – risk of having an event at time t
  4. Survivor plot – survival rate plotted against time
53
Q

Describe relative risk (RR)

A

Shows the extent of difference between exposed and unexposed groups.

E.g. study results for different toothpastes and effect on gingivitis

  • Exposed (used Sparkledent) = cure 55%; disease 45%
  • Unexposed (established toothpaste) = cure 35%; disease 65%

In this case we can calculate RR for those who are cured and those who still have disease

RR cure = 55% (exposed and cured) / 35% (not exposed and cured) = 1.57

RR disease = 45% (exposed and diseased) / 65% (not exposed and diseased) = 0.69
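
The two relative risks above can be reproduced with a short sketch. The function name `relative_risk` is illustrative, and the risks are written as proportions.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk_exposed / risk_unexposed

rr_cure = relative_risk(0.55, 0.35)      # about 1.57
rr_disease = relative_risk(0.45, 0.65)   # about 0.69

print(round(rr_cure, 2), round(rr_disease, 2))
```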

54
Q

Describe risk difference

A

The difference in risk between the exposed and unexposed groups, calculated separately for cure and for disease

E.g. study results for different toothpastes and effect on gingivitis

  • Exposed (used Sparkledent) = cure 55%; disease 45%
  • Unexposed (established toothpaste) = cure 35%; disease 65%

Risk difference for cure = 55% - 35% = 20%

Risk difference for disease = 45% - 65% = -20%

From the risk difference we can calculate NNT = (1 / risk difference) x 100 = 100/20 = 5, with the risk difference expressed as a percentage
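
A minimal sketch of the same calculation with risks written as proportions rather than percentages, so the factor of 100 drops out. The function names are illustrative.

```python
def risk_difference(risk_exposed, risk_unexposed):
    """Absolute difference in risk between exposed and unexposed groups."""
    return risk_exposed - risk_unexposed

def number_needed_to_treat(rd):
    """NNT = 1 / |risk difference|, with the risk difference as a proportion."""
    return 1 / abs(rd)

rd_cure = risk_difference(0.55, 0.35)       # 0.20, i.e. 20%
rd_disease = risk_difference(0.45, 0.65)    # -0.20, i.e. -20%
nnt_cure = number_needed_to_treat(rd_cure)  # 5 (up to floating-point rounding)

print(round(rd_cure, 2), round(rd_disease, 2), round(nnt_cure, 2))
```

An NNT of 5 means that, on these figures, about five people need to use the new toothpaste for one additional cure.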

55
Q

Define number needed to treat (NNT)

A

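NNT = (1 / risk difference) x 100, where the risk difference is expressed as a percentage (equivalently, 1 / risk difference when it is expressed as a proportion)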

56
Q

Describe odds ratio (OR)

A

Requires the number of cases rather than percentages.

Calculates the odds of being cured against still having the disease.

E.g. study results for different toothpastes and effect on gingivitis

  • Exposed (used Sparkledent) = cure 55%; disease 45%
  • Unexposed (established toothpaste) = cure 35%; disease 65%

The disease odds ratio would tell us what is the chance of still having gingivitis depending on whether a person was exposed or not exposed to the new toothpaste.

Odds ratio summarises differences in numbers, whereas relative risk summarises differences in incidence rates
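
Since the odds ratio needs counts, the sketch below assumes 100 people in each group, so the percentages above become counts directly. That group size is an assumption for illustration only, as is the function name `odds_ratio`.

```python
def odds_ratio(exposed_event, exposed_no_event, unexposed_event, unexposed_no_event):
    """Odds of the event in the exposed group divided by odds in the unexposed group."""
    odds_exposed = exposed_event / exposed_no_event
    odds_unexposed = unexposed_event / unexposed_no_event
    return odds_exposed / odds_unexposed

# Assumed counts: 100 people per group (hypothetical)
or_cure = odds_ratio(55, 45, 35, 65)      # (55/45) / (35/65), about 2.27
or_disease = odds_ratio(45, 55, 65, 35)   # (45/55) / (65/35), about 0.44

print(round(or_cure, 2), round(or_disease, 2))
```

The two odds ratios are reciprocals of each other, and both lie further from 1 than the corresponding relative risks (1.57 and 0.69).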