Statistics Flashcards

1
Q

What is quantitative data?

A

Numerical data

  • Discrete (whole numbers only), e.g. number of children
  • Continuous (usually a measurement), e.g. height
2
Q

Give an example of nominal data?

A

Blood group, gender
Categories that have no logical order
A type of categorical data

3
Q

What are the types of qualitative data?

A

Categorical data

  • Nominal: categories have no logical order
  • Ordinal: categories have a natural order
4
Q

If data are negatively skewed (left skewed), what is the order of the mean, median and mode (from low to high, i.e. left to right)?

A

Mean, median, mode

The peak of the graph is further to the right, with a long tail to the left

5
Q

If data are positively skewed (right skewed), what is the order of the mean, median and mode (from left to right)?

A

Mode, median, mean (long tail to the right)

6
Q

What is the range?

A

Maximum - minimum.

Poor measure of spread, as it is affected by outliers and tends to increase with sample size

7
Q

What is the inter-quartile range?

A

Upper quartile - lower quartile
Better than the range, as it is not influenced by outliers
Based on 3 measures: lower quartile (25th centile), median (50th centile) and upper quartile (75th centile)

8
Q

What is variance?

A

Calculate the deviations: the difference between each observation and the mean of the data
Square these deviations so negatives become positive
Average the squared deviations by dividing by n-1 (one degree of freedom is lost because the mean, estimated from the same data, has already been used)
The square root of the variance = standard deviation

Influenced by outliers.
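The variance steps above can be sketched in Python using only the standard library. The data values are made up for illustration; `statistics.variance` also divides by n-1, so it is used as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Variance with the n - 1 divisor, following the steps on this card."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]    # step 2: square them
    return sum(squared) / (n - 1)             # step 3: divide by n - 1

data = [4, 8, 6, 5, 3, 7]  # illustrative values only
var = sample_variance(data)
sd = math.sqrt(var)        # standard deviation = square root of variance

# The standard library uses the same n - 1 definition
assert math.isclose(var, statistics.variance(data))
print(f"variance = {var:.2f}, SD = {sd:.2f}")
```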

9
Q

How do you calculate the standard deviation from the variance?

A

Square root of variance

10
Q

What would you use to summarise symmetrical data?

A

Mean

Standard deviation

11
Q

What would you use to summarise skewed data?

A

Median

Interquartile range

12
Q

How would you summarise categorical data?

A

Number (%) in each category

13
Q

What information does a box and whisker plot give you?

A

Median
Interquartile range
Range

14
Q

How can you summarise categorical data in a chart?

A

Pie chart

Bar chart

15
Q

What are the mean and SD of the standard normal distribution?

A
Mean = 0
SD= 1
16
Q

What is the reference range and when can it be used and what does it measure?

A

Used for NORMALLY DISTRIBUTED data
Mean +/- 1.96 SD (often rounded to +/- 2 SD) includes 95% of the data

Measure of spread of the data

17
Q

In normally distributed data, how much of the data is included in mean +/- 1 SD, +/- 2 SD and +/- 3 SD?

A

Mean +/- 1 SD = 68% of data included
Mean +/- 2 SD = 95% of data included = reference range
Mean +/- 3 SD = 99.7% of data included
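These percentages can be checked against the standard normal distribution with Python's `statistics.NormalDist` (standard library). This is a small sketch; the exact figures are approximately 68.3% (+/- 1 SD), 95.0% (+/- 1.96 SD), 95.4% (+/- 2 SD) and 99.7% (+/- 3 SD).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {proportion_within(k):.1%}")
```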

18
Q

What is the difference between the 95% reference range and 95% confidence interval?

A

95% reference range (or normal range)

  • Mean +/- 2SD
  • Measures SPREAD of data

95% confidence interval

  • mean +/- 2 standard errors
  • Measures the PRECISION of a sample estimate (a range of plausible values for the true population value; 95% of such intervals would contain it)
19
Q

How can you make positively skewed data more symmetric?

A

Transform the data, e.g.

  • Log (x)
  • 1/x
  • square root x

More difficult with negatively skewed data

20
Q

How can you check if a data set is normally distributed?

A
  • By eye: draw a histogram
  • Use a test for normality, e.g. Kolmogorov-Smirnov test or Shapiro-Wilk test
    If p < 0.05, conclude the data are not normal
    If p > 0.05, there is no evidence against normality
    But small samples have insufficient power to detect deviations from normality, and for large samples normality is usually less important
21
Q

What is bias and how can you avoid it?

A

Bias: when the sample is selected in such a way that even a very large sample would not give the true answer (systematic error)
Avoid it by using a random sample

22
Q

What is precision?

A

A sample estimate is precise if different samples of the same size, selected in the same way, would give answers that are close together

23
Q

What is a distribution defined by?

A
  • centre (mean)
  • Spread (SD)
  • Shape (e.g. normal)
24
Q

When will sample means be normally distributed?

A
  • the underlying data are normally distributed, or
  • the samples are large (in which case it does not matter whether the data are normal: central limit theorem)

25
Q

What is the standard error of the distribution of sample means?

A

SE of a distribution of sample means is a measure of the spread of those means.
It is the standard deviation of a sampling distribution
MEASURES PRECISION OF THE SAMPLE MEAN

26
Q

If the data have a narrow spread, will the standard error be small or big?

A

Small: all the sample means are close to the true mean, giving a precise estimate

27
Q

As sample size increases what happens to the standard error of means?

A

Gets smaller (SE is proportional to 1/√N)

28
Q

How do we calculate standard error of a distribution of sample means?

A
SE = σ/√N
σ = SD of the population observations
N = sample size

However, we don’t have data from the whole population, so we use the SD (s) of a single sample to estimate σ. As long as the sample is large, this should be a good estimate.

Estimated SE = s/√N
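A minimal Python sketch of the estimated SE, s/√N; the sample values are hypothetical. The second helper shows that, for a fixed SD, making the sample 4 times larger halves the SE.

```python
import math
import statistics

def estimated_se(sample):
    """Estimated standard error of the mean: s / sqrt(N), s = sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_from_sd(sd, n):
    """SE for a given SD and sample size."""
    return sd / math.sqrt(n)

sample = [4, 8, 6, 5, 3, 7]  # hypothetical measurements
print(round(estimated_se(sample), 3))
print(se_from_sd(10, 25), se_from_sd(10, 100))  # 4x the sample size halves the SE
```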

29
Q

Other than the sample mean, what other sample estimates have a standard error?

A
Sample proportion
Difference between 2 means
Difference between 2 proportions
Relative risk/odds ratio
Regression coefficients

They all have different standard error formulae.

30
Q

Calculate the standard error of a proportion (categorical data) if 20% of 100 people have asthma.

A

SE(p) = √(p x (1-p) / n)

SE(0.2) = √(0.2 x 0.8 / 100) = 0.04

68% CI for asthma prevalence: 0.2 +/- 0.04 (i.e. +/- 1 SE, 0.16 to 0.24)
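The same calculation in Python, extended to an approximate 95% CI (p +/- 1.96 SE). This assumes the normal approximation for a proportion, which is reasonable when n is large and p is not close to 0 or 1.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.2, 100  # 20% of 100 people have asthma
se = se_proportion(p, n)
low, high = p - 1.96 * se, p + 1.96 * se  # approximate 95% CI
print(f"SE = {se:.2f}; 95% CI = {low:.3f} to {high:.3f}")  # SE = 0.04; 95% CI = 0.122 to 0.278
```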

31
Q

What is a confidence interval and how do you calculate it?

A

An interval around a sample estimate giving a range of plausible values for the true population value: if the study were repeated many times, 95% of such intervals would contain the true value
95% CI = sample mean +/- 1.96 SE
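A sketch of the large-sample 95% CI for a mean in Python. It assumes the normal approximation (multiplier 1.96); for small samples a multiplier from the t distribution would normally be used instead. The data are illustrative.

```python
import math
import statistics

def ci95_mean(sample):
    """Approximate 95% CI for a mean: sample mean +/- 1.96 SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se

data = list(range(1, 101))  # illustrative data: the numbers 1 to 100
low, high = ci95_mean(data)
print(f"95% CI: {low:.2f} to {high:.2f}")
```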

32
Q

When would you check whether 0 lies within the CI, and when whether 1 lies within the CI?

A

For a difference between means or proportions: does 0 lie in the CI?
For a relative risk or odds ratio: does 1 lie in the CI?
If it does, the result is not statistically significant (at the 5% level, for a 95% CI)

33
Q

Name some types of intervention studies?

A

RCT
Non-randomised clinical intervention studies
Experimental lab studies

34
Q

Name some types of observational studies?

A
Cohort studies
Case-control studies
Cross-sectional study
Ecological study
Case study
35
Q

Describe a cohort study.

A

A (usually disease-free) cohort is followed over time and subsequent disease status is recorded.
Usually prospective
Accurate

36
Q

What are the advantages and disadvantages of a cohort study?

A

Accurate
Less prone to selection bias
BUT... long and expensive, subject to loss to follow-up, and inappropriate for rare diseases

37
Q

Describe a case control study.

A

Cases who already have the disease are compared with disease-free controls, looking back at past exposure to risk factors
Retrospective

38
Q

What are the advantages and disadvantages of a case control study?

A

Quick and cheap
Suitable for rare diseases

BUT…

  • subject to recall bias, selection bias and assessment bias
  • the timing of exposure relative to disease can be difficult to establish
  • not suitable for rare exposures
  • relative risks cannot be directly calculated
39
Q

How is the association between a risk factor and disease outcome commonly summarised?

A

Relative risk

Odds ratio

40
Q

If a relative risk is >1, what does that mean?

A

RR > 1 = increased risk

RR < 1 = decreased risk

RR = 1 = no difference in risk

41
Q

When can’t you use relative risk and what can you use instead?

A

In a case-control study: RR cannot be calculated, because the investigator chooses how many people with the disease (cases) are included, so the risk of disease in each group cannot be estimated.
Use odds ratio instead.

42
Q

How do you calculate relative risk?

A

                        Outcome
                        Present   Absent   Total
Risk factor present     a         b        a+b
Risk factor absent      c         d        c+d
Total                   a+c       b+d

RR = [a/(a+b)] / [c/(c+d)]
i.e. (number with risk factor and disease / total number with risk factor) divided by (number without risk factor but with disease / total number without risk factor)
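A small Python sketch of the RR formula, using the a, b, c, d cells of the table on this card. The counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table:
    a = risk factor present, outcome present; b = risk factor present, outcome absent
    c = risk factor absent, outcome present;  d = risk factor absent, outcome absent"""
    risk_with_factor = a / (a + b)
    risk_without_factor = c / (c + d)
    return risk_with_factor / risk_without_factor

# Hypothetical counts: 30/100 exposed and 10/100 unexposed develop the outcome
print(round(relative_risk(30, 70, 10, 90), 2))  # about 3.0
```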

43
Q

What is odds ratio and how do you calculate it?

A

                        Outcome
                        Present   Absent   Total
Risk factor present     a         b        a+b
Risk factor absent      c         d        c+d
Total                   a+c       b+d

Odds of having the risk factor among the cases (a/c) divided by the odds of having the risk factor among the controls (b/d)
Odds ratio = (a/c) / (b/d) = ad/bc
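The odds ratio in Python, using the same a, b, c, d layout. With the hypothetical counts below the OR is about 3.86, whereas the RR from the same table would be 3.0: the OR only approximates the RR when the outcome is rare.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table (rows = risk factor present/absent, columns = cases/controls)."""
    odds_cases = a / c      # odds of having the risk factor among cases
    odds_controls = b / d   # odds of having the risk factor among controls
    return odds_cases / odds_controls  # equivalent to (a * d) / (b * c)

print(round(odds_ratio(30, 70, 10, 90), 2))  # about 3.86
```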

44
Q

What is the null hypothesis?

A

Statement that there is no difference between groups in the population from which the sample has come.
ALWAYS about the population: it would not make sense to hypothesise about the sample, as we already know about it

45
Q

What is the p value?

A

The probability of obtaining sample data showing a difference as large as, or larger than, that observed, if there is really no difference in the population from which the samples came, i.e. if the null hypothesis is true

46
Q

What does a p value <0.05 mean?

A

It is unlikely (<5% chance) that sample data showing a difference this large would arise from a population where the null hypothesis is true, so there is evidence against the null hypothesis.

47
Q

What does a p value >0.05 mean?

A

It is possible that the sample could have come from a population where the null hypothesis is true, so there is insufficient evidence to reject the null hypothesis (NEVER say we accept the null hypothesis)

48
Q

Choosing the right statistical test.

Are you comparing means or percentages when looking at numerical, categorical and ordinal data?

A

Variable is numerical - you will be comparing means
Variable is categorical you will be comparing percentages
Variable is ordinal- you may use a specific test for ordinal data or you may treat the variable as categorical

49
Q

If you are comparing numerical data and want to compare paired groups then what is the right statistical test?

A

Paired t-test

  • paired differences are normally distributed, or the sample is large (>100 pairs)

Wilcoxon’s signed ranks test

  • does not need a normal distribution
  • NOT appropriate for ordinal data (it ranks the sizes of the paired differences, which are not meaningful for ordinal data)
50
Q

What is a paired group?

A

Two types:

  • when the same person provides 2 values (eg crossover trial)
  • when each person in one group has a matched control in another group (e.g. matched case-control studies)
51
Q

When choosing the right statistical significance test, what questions might you ask?

A
Are you comparing means or percentages?
How many groups are you comparing?
Are the groups paired or independent?
Are the test assumptions met?
- sample size
- distributions
- equal variances
52
Q

When can you use the independent samples t-test and what assumptions does it make?

A

Comparing means of 2 independent groups
Data normally distributed (or >50 in each group)
Equal variances in the 2 groups

53
Q

What can you look out for which might show data is skewed?

A

Skewed data often summarised using medians instead of means
If mean - 2 SD takes you below the minimum possible value (often zero), or mean + 2 SD takes you above the maximum possible value, then the data cannot be normally distributed.

54
Q

What does equal variances mean?

A

The groups have the same spread (variance) around their means.

Two groups can both be normally distributed but have different variances: one bell is flatter and wider, the other narrower, but both are still symmetrical.

55
Q

How can you test for equal variances?

A

Do a statistical test, e.g. Levene’s test

  • if p < 0.05, conclude the variances are not equal; if p > 0.05, there is no evidence against equal variances.
  • BUT if the sample size is small, the test is unlikely to have sufficient power, and if it is large, it is likely to pick up unimportant differences.

Alternatively, compare the standard deviations: if the larger SD is less than 1.5 times the smaller, this is usually OK

If the variances are not equal, some packages perform a separate-variances version of the t-test
Or try transforming the data (e.g. taking logs if positively skewed)

56
Q

When would you use the Mann-Whitney test?

A

If the assumptions for the independent samples t-test are not met
i.e. a non-parametric alternative
Can be used for numerical or ordinal data
Less powerful than the t-test

57
Q

What does the paired T test assume?

A

Paired differences are normally distributed (raw data can be skewed but the paired differences should be normally distributed)
With >100 pairs this assumption can be dropped.

58
Q

When would you use Wilcoxon’s signed ranks test?

A

Paired numerical data when the paired t-test assumptions are not met (non-parametric)
Generally less powerful than the paired t-test
NOT for ordinal data

59
Q

When would you use the ANOVA/analysis of variance test?

A

Data normally distributed with equal variances
Used to compare means of >2 groups
p > 0.05: no evidence of a real difference between any of the groups
p < 0.05: evidence of a real difference between some or all of the groups
Does NOT tell you which groups differ

Needs follow-up with a post hoc test, which tells you which groups differ:
- compares each pair of groups
- automatically makes an adjustment for multiple testing
Many tests available, including Scheffé and Bonferroni

60
Q

When would you use the Kruskal-Wallis test?

A

Non-parametric test (alternative to ANOVA)
For >2 groups
Less powerful than ANOVA
Can be used for ordinal data

61
Q

When can you use the chi squared test?

A

Comparing percentages (categorical data)
Between independent groups (2 or more)

Calculate the observed (O) and expected (E) frequencies for each cell
χ² = Σ (O - E)² / E, summed over all cells
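A sketch of the chi-squared statistic in Python, computing each expected count as row total x column total / grand total. The counts are hypothetical, and no continuity correction is applied (some packages apply one by default for 2x2 tables). The p-value helper only covers 1 degree of freedom, using the fact that a chi-squared variable with 1 df is the square of a standard normal variable.

```python
import math
from statistics import NormalDist

def chi_squared(observed):
    """Pearson chi-squared statistic and df for a table given as rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

def p_value_1df(stat):
    """P-value for a chi-squared statistic with 1 df (chi-squared(1) = Z squared)."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(stat)))

table = [[30, 70], [10, 90]]  # hypothetical: rows = groups, columns = outcome yes/no
stat, df = chi_squared(table)
print(f"chi-squared = {stat:.2f} on {df} df, p = {p_value_1df(stat):.4f}")
```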
62
Q

When can’t you use the chi-squared test? What would you use instead?

A
  • any cell has an expected frequency < 1, or
  • > 20% of cells have an expected frequency < 5

Then use Fisher’s exact test (no minimum sample size)

63
Q

When would you use McNemar’s test?

A

Paired groups, comparing percentages

Only valid if the number of discordant pairs is at least 10

64
Q

When would you use Chi-squared test for trend?

A

Ordinal variable- ordered groups
Large sample >30
Percentages increase/decrease linearly across groups.

65
Q

What is the difference in the null and alternative hypotheses for a one-sided and a two-sided test?

A

2 sided test - difference can be in either direction
Null hypothesis: no difference between groups
Alternative hypothesis: there is a difference between groups, could be in either direction

1 sided test
Null hypothesis- no difference between groups or a difference in 1 direction
Alternative hypothesis - difference in other direction.

A statistically significant result is more likely with a 1-sided test, because all 5% of the rejection region is in one tail (rather than 2.5% in each tail).

66
Q

When would you use a one sided test?

A

Non-inferiority trial

Should NOT be used just because a true difference in one direction is thought to be very unlikely

67
Q

What is a significance level?

A

α = significance level of test
Usually set at 0.05
A result is statistically significant if p < α (e.g. p < 0.05)

68
Q

What is a type 1 error?

A

Wrongly rejecting the null hypothesis when it is true.
So α (significance level of test) is the probability of making a type 1 error -usually 5%

The risk of type I errors increases with multiple testing

69
Q

What is a type 2 error?

A

Failing to reject the null hypothesis when it is in fact false (missing a real difference)
β = probability of making a type II error.

70
Q

What is power?

A

1 - β = power
Probability of avoiding a type II error, i.e. of correctly rejecting the null hypothesis when it is false.
1 - β is usually set at 0.8-0.9 (80-90%); for phase 3 trials it would be 0.9

71
Q

When do type II errors occur?

A

Typically when the sample size is too small (low power): a large difference may be observed, but the result is not statistically significant.

72
Q

When does multiple testing occur?

A

Subgroup analysis
Many outcomes or many predictors
Repeated measures data
Pairwise comparisons (>2 groups)
Repeated testing as more subjects recruited
Data-driven hypotheses
Trying different definitions of your variables until you find one that is significant

73
Q

Why is multiple testing a problem?

A

The probability of getting a non-significant result when the null hypothesis is true (i.e. getting it right) is usually 95% (1 - α)

If we do 2 independent tests, the probability of correctly getting 2 non-significant results is 0.95 x 0.95 = 0.9025 (about 0.90)
So the probability of getting at least one significant result incorrectly (making a type I error) is about 10%

If you perform 20 tests for which the null hypotheses are all true, you would expect to get 1 significant result by chance (20 x 0.05 = 1)
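The arithmetic above generalises to k independent tests: P(at least one type I error) = 1 - 0.95^k. A short Python sketch, assuming all null hypotheses are true and the tests are independent:

```python
def familywise_error(k, alpha=0.05):
    """P(at least one false-positive result) for k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 5, 10, 20):
    print(f"{k:>2} tests: P(at least 1 type I error) = {familywise_error(k):.2f}, "
          f"expected false positives = {k * 0.05:.1f}")
```

With 20 tests the chance of at least one spurious significant result is roughly 64%.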

74
Q

How do we correct for multiple testing?

A

Adjust for it: use an appropriate significance test, e.g. a single overall test such as repeated measures ANOVA, or post hoc tests which have a built-in adjustment
Simple manual Bonferroni correction
Report the number of tests you performed (be honest)

75
Q

What is a Bonferroni correction?

A

Used to try to adjust for multiple testing.
Multiplies the p value for each test by the number of tests performed (capping at 1). By increasing the p values, it makes it more difficult to find significant tests.
If the p value was 0.001 and you had done 10 tests, it would be corrected to 0.01.
Considered a rather severe (conservative) adjustment
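A minimal Python version of the Bonferroni adjustment (multiply by the number of tests, capping at 1). The 10 p values are made up.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

p_values = [0.001, 0.02, 0.04, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]  # 10 hypothetical tests
print(bonferroni(p_values))
```

Here only the first test stays below 0.05 after adjustment; 0.02 and 0.04 become 0.2 and 0.4.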

76
Q

What information do you need to decide on a sample size?

A
Significance level, α 
Power, 1 - β
Standard deviation of the data
Size of difference of clinical interest (minimum clinically important difference)
Expected response

Need to allow for compliance/loss to follow up
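As a worked example, one common normal-approximation formula for comparing 2 means with equal group sizes is n per group = 2 x (z_alpha + z_beta)² x σ² / δ², where z_alpha = z(1 - α/2), z_beta = z(power), σ is the SD and δ is the minimum clinically important difference. The sketch below assumes that formula and a continuous outcome, then inflates n for an assumed loss to follow-up; the SD and difference are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.9, loss_to_follow_up=0.0):
    """Sample size per group for comparing 2 means (two-sided test, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    n = math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2)
    # Inflate to allow for the expected proportion lost to follow-up
    return math.ceil(n / (1 - loss_to_follow_up))

print(n_per_group(sd=10, difference=5))                         # 90% power
print(n_per_group(sd=10, difference=5, power=0.8))              # 80% power
print(n_per_group(sd=10, difference=5, loss_to_follow_up=0.1))  # allow 10% loss
```

Halving the difference of interest roughly quadruples the required sample size, because δ appears squared in the denominator.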

77
Q

How do you calculate the sensitivity

A
                 True diagnosis
                 +ve      -ve      Total
Test +ve         a        b        a+b
Test -ve         c        d        c+d
Total            a+c      b+d

Sensitivity = a/(a+c)

True positives / all truly positive

78
Q

How do you calculate specificity?

A
                 True diagnosis
                 +ve      -ve      Total
Test +ve         a        b        a+b
Test -ve         c        d        c+d
Total            a+c      b+d

Specificity = d/(b+d)

True negatives / all truly negative

79
Q

How do you calculate PPV?

A
                 True diagnosis
                 +ve      -ve      Total
Test +ve         a        b        a+b
Test -ve         c        d        c+d
Total            a+c      b+d

PPV = a/(a+b)

True positives/ all that tested positive

80
Q

How do you calculate NPV?

A
                 True diagnosis
                 +ve      -ve      Total
Test +ve         a        b        a+b
Test -ve         c        d        c+d
Total            a+c      b+d

NPV = d/(c+d)

true negatives/ all that tested negative
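The four formulas from these cards can be collected into one Python sketch, using the a, b, c, d layout of the table on this card (rows = test result, columns = true diagnosis). The counts are hypothetical.

```python
def diagnostic_summary(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # true positives / all truly positive
        "specificity": d / (b + d),  # true negatives / all truly negative
        "PPV": a / (a + b),          # true positives / all that tested positive
        "NPV": d / (c + d),          # true negatives / all that tested negative
    }

for name, value in diagnostic_summary(a=90, b=40, c=10, d=860).items():
    print(f"{name}: {value:.3f}")
```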

81
Q

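When the prevalence of disease is low, how are sensitivity, specificity, PPV and NPV affected?

A

PPV is low: even a small false-positive rate produces many false positives relative to the small number of true positives

NPV is high; sensitivity and specificity are not directly affected by prevalence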
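To see why, PPV and NPV can be written in terms of sensitivity, specificity and prevalence (Bayes' theorem). This sketch assumes a hypothetical test with 90% sensitivity and 90% specificity.

```python
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

for prevalence in (0.5, 0.1, 0.01):
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(0.9, 0.9, prevalence):.3f}, "
          f"NPV = {npv(0.9, 0.9, prevalence):.3f}")
```

At 50% prevalence the PPV is 0.9, but at 1% it falls to about 0.08, while the NPV rises above 0.99.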

82
Q

How can sensitivity and specificity be shown graphically?

A

ROC curve (sensitivity plotted against 1 - specificity): look at the area under the curve; a bigger area = a better test

83
Q

What is a positive likelihood ratio?

A

Ratio of the chance of a positive result if the patient has the disease to the chance of a positive result if they do not have the disease.

Sensitivity / (1- specificity)

The higher the positive LR the better

84
Q

What is a negative likelihood ratio?

A

Ratio of the chance of a negative result if the patient has the disease to the chance of a negative result if they do not have the disease.

(1 - sensitivity) / specificity

Lower the negative LR the better
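Both likelihood ratios in Python; the sensitivity and specificity values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")  # LR+ = 4.50, LR- = 0.125
```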

85
Q

What is an SMR, and what do an SMR >100 and an SMR <100 indicate?

A

Standardised mortality ratio = observed deaths/expected deaths x 100

SMRs adjust for differences in the age distributions of the groups being compared
SMR <100 indicates a lower death rate than expected having adjusted for age
SMR >100 indicates a higher death rate than expected, having adjusted for age
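The expected deaths come from indirect standardisation: standard (e.g. national) age-specific death rates are applied to the study population's age structure. A sketch, with all numbers hypothetical:

```python
def smr(observed_deaths, population_by_age, standard_rates_by_age):
    """SMR = observed / expected x 100, with expected deaths from standard age-specific rates."""
    expected = sum(n * rate for n, rate in zip(population_by_age, standard_rates_by_age))
    return 100 * observed_deaths / expected

# Hypothetical study population in 3 age bands and standard death rates per person
population = [1000, 2000, 500]
standard_rates = [0.001, 0.005, 0.02]
print(round(smr(30, population, standard_rates)))  # 21 expected deaths, so SMR is about 143
```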

86
Q

Define incidence

A

Number of new cases over a given time period

87
Q

Define point prevalence

A

Number of existing cases at a certain point in time

88
Q

Define period prevalence

A

Existing + new cases which develop over a given time period.

89
Q

Describe the characteristics of a forest plot?

A

Often used for meta-analysis
Boxes = effect size for each study (larger study = bigger box)
Horizontal lines = 95% CI for each study
Diamond = pooled effect, e.g. relative risk
Width of diamond = 95% CI for pooled effect

A log scale is often used for relative risks, as they range from 0 to infinity

90
Q

What is a meta-analysis?

A

Combination of results of several different studies investigating the same effect.
Single overall pooled estimate is obtained- often relative risk or odds ratio
Increases the power

91
Q

How do you select studies for a meta-analysis?

A

Selected as part of a systematic review with pre-defined inclusion criteria. Assess the quality of each study, and report the review following guidelines such as PRISMA (reporting guidelines).

92
Q

What are the issues with meta-analysis?

A

Publication bias: small studies which do not show an effect are less likely to be published. A funnel plot can be used to detect this.

Statistical heterogeneity: we can test for heterogeneity in the treatment effects beyond that expected by chance. If statistically significant, it is unlikely that the studies all reflect a single underlying treatment effect

Clinical heterogeneity (a cause of statistical heterogeneity)
- when studies have important differences, e.g. population, context, eligibility, control, follow-up

93
Q

Define incidence rate

A

Number of new cases / total number of person-years at risk (e.g. cases per 1000 person-years)

94
Q

What is a correlation coefficient?

A

The correlation coefficient is a measure of the strength and the direction of the linear relationship between 2 numerical variables

Affected by outliers

95
Q

What values does Pearson’s correlation coefficient lie between, and what do they tell you?

A

R= -1 to +1

R +ve as x increases, y increases
R -ve as x increases, y decreases

R= 1 or -1 - perfect correlation, all points lie in a line (don’t confuse this with slope of the line- can have any slope)

|R| > 0.8 strong correlation
|R| < 0.2 weak correlation
R = 0 no (linear) correlation

96
Q

What is variance explained?

A

R squared x 100
Tells you how much the variation in one variable can be explained by the other.
Eg r = 0.94 … indicates a very strong positive correlation between a country’s average alcohol consumption and deaths rates from cirrhosis
0.94^2 x 100 = 88% so 88% of the variation in deaths from cirrhosis is accounted for by the variation in alcohol consumption
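A sketch of Pearson's r and the variance explained in Python (standard library only). The first line reproduces the 0.94 example; the small data set is illustrative and also shows that swapping x and y does not change r.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def variance_explained(r):
    """Percentage of variation in one variable explained by the other: r squared x 100."""
    return 100 * r ** 2

print(round(variance_explained(0.94), 1))  # 88.4 (rounded to 88% on the card)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3), round(pearson_r(y, x), 3))  # same value either way round
```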

97
Q

Does correlation imply any causation?

A

NO

Shows an association

98
Q

What significance tests can be used for correlation and what assumptions are required?

A

Pearson’s correlation coefficient
- at least one of the variables is normally distributed

Spearman’s rank correlation coefficient
- data at least ordinal

99
Q

What can we use to try to identify if a relationship is causal?

A
Bradford Hill's criteria 
• Strength of association
• The cause must precede the effect
• Dose-response relationship
• Biologically plausible
• Consistent results from several studies
• Removing the risk factor should reduce the risk of disease (reversibility)
100
Q

What is regression?

A

If 2 variables appear to be related then linear regression fits a straight line to the data. Can predict one variable from another.

101
Q

What is the equation for a straight line in linear regression?

A

y = a + b x

x = explanatory variable (also predictor; independent)
y = outcome variable (also dependent; response)
a = the intercept (value of y when x=0)
b = the slope ( increase in y when x increases by 1 unit)

YOU MUST NOT REVERSE x and y, as you would get a different line (with the correlation coefficient you can swap them and it does not matter)

102
Q

What is the method of least squares regression?

A

Finds the line which minimises the sum of the squares of the vertical deviations of the points (called residuals) from the line
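For simple linear regression, least squares has a closed-form solution: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄. A Python sketch with made-up data:

```python
def least_squares(x, y):
    """Fit y = a + b x by minimising the sum of squared vertical residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean x, mean y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals

x = [1, 2, 3, 4, 5]             # explanatory variable (hypothetical)
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # outcome variable (hypothetical)
a, b, residuals = least_squares(x, y)
print(f"y = {a:.2f} + {b:.2f}x")
```

Calling `least_squares(y, x)` fits a different line, which is why x and y must not be swapped.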

103
Q

What is the null hypothesis for the significance test for regression?

A

Slope = 0

No association in the population

104
Q

For a significance test of regression what assumptions do you make?

A

Residuals are normally distributed around the line
Residuals have constant variance around the line

If assumptions not met try a transformation eg log

105
Q

What is simple vs multiple linear regression?

A

Simple: one explanatory variable

Multiple regression: several explanatory variables (with 2 explanatory variables, the fitted line becomes a plane in 3 dimensions). Same assumptions apply

106
Q

What is the equation for a model with multiple explanatory variables?

A

y = a + b1x1 + b2x2 + … + bkxk

107
Q

What type of dependent variable is used in simple and multiple linear regression?

A

Numerical

108
Q

What are the dependent variables for logistic and cox regression?

A

Logistic: binary categorical, e.g. hypertension or not
Results expressed as odds ratios

Cox: time to event
Results expressed as hazard ratios

109
Q

What are life tables?

A

Summarise survival/mortality according to age.
Only use when interested in age rather than a disease

Based on current age specific death rates
Cross sectional

110
Q

In survival data.

What is qx and px?

A

qx: probability of dying between x & (x+1) years
px: probability of surviving from age x to age (x+1)

qx +px = 1

111
Q

In survival data what is nx, nx+1 and Px

A

nx: no of survivors at age x
nx+1 = nx * px
Px: cumulative survival probability

112
Q

What are follow-up survival studies?

A

Survival of a special group, e.g. patients with breast cancer
Measure survival from a particular stage (age per se is not important)
At analysis some have not experienced an outcome -> censored

113
Q

Why do you censor a patient?

A

Lost to follow up
Still alive at end of study
The data contributes for as long as they have been observed

Reduces the number at risk from that point on, but a censored patient is not counted as a death, so censoring does not by itself reduce the probability of survival or the cumulative survival

114
Q

How do you calculate the probability of death, probability of survival and cumulative survival?

A

Prob death = no of deaths/ no at risk

Prob survival = 1 - probability of death
Cumulative survival = previous cumulative survival x new probability of survival
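The steps above can be sketched as a Kaplan-Meier calculation in Python. Each patient has a follow-up time and a flag for death (True) or censoring (False); censored patients leave the risk set without counting as deaths, and anyone censored at a death time is treated as still at risk at that time (a common convention, assumed here). The data are hypothetical.

```python
def kaplan_meier(times, died):
    """Return (time, number at risk, deaths, cumulative survival) at each death time."""
    data = sorted(zip(times, died))
    at_risk = len(data)
    cumulative = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = sum(1 for time, _ in data if time == t)  # deaths + censored at t
        deaths = sum(1 for time, dead in data if time == t and dead)
        if deaths:
            prob_death = deaths / at_risk          # no of deaths / no at risk
            cumulative *= 1 - prob_death           # previous cumulative x new prob of survival
            table.append((t, at_risk, deaths, cumulative))
        at_risk -= leaving  # censored patients also reduce the number at risk
        i += leaving
    return table

times = [2, 3, 3, 5, 6, 8, 9, 12]                           # hypothetical follow-up (months)
died = [True, False, True, True, False, True, False, True]  # False = censored
for row in kaplan_meier(times, died):
    print(row)
```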

115
Q

What is the issue with censoring?

A

Assumes censoring is not self-selected (i.e. unrelated to the patient's prognosis)

If lots of people drop out, the results may not be reliable

116
Q

Can you compare groups using Kaplan-Meier survival at a fixed time point?

A

No: survival in 2 groups can't be compared validly using survival at a single fixed time point, because the curves may be closer together at some times and further apart at others.

117
Q

What can you use to compare survival curves?

A

Logrank test

  • non parametric test
  • uses all survival data
  • no assumptions about shape of survival curve
  • assumes lines don’t cross over
118
Q

How does the log rank test work?

A

Null hypothesis: survival is the same in the 2 groups

Calculate the expected numbers of deaths in each group & compare with the observed numbers.

Test this using a χ² statistic

119
Q

How do you calculate log rank test?

A

χ² = Σ (d_i - e_i)² / e_i, summed over the groups, where d_i = observed deaths and e_i = expected deaths in group i
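A sketch of this simplified logrank calculation for 2 groups in Python. At each death time, expected deaths in a group = total deaths x (number at risk in that group / total number at risk). Statistical packages usually use a variance-based version of the statistic, so their values may differ from this approximation; the data are hypothetical.

```python
def logrank_statistic(times, died, group):
    """Simplified logrank chi-squared: sum of (O - E)^2 / E over groups 0 and 1."""
    observed = [0, 0]
    expected = [0.0, 0.0]
    death_times = sorted({t for t, d in zip(times, died) if d})
    for t in death_times:
        # number at risk just before time t in each group
        at_risk = [sum(1 for tt, g in zip(times, group) if tt >= t and g == k) for k in (0, 1)]
        deaths = [sum(1 for tt, d, g in zip(times, died, group) if tt == t and d and g == k)
                  for k in (0, 1)]
        total_deaths = sum(deaths)
        total_at_risk = sum(at_risk)
        for k in (0, 1):
            observed[k] += deaths[k]
            expected[k] += total_deaths * at_risk[k] / total_at_risk
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected

times = [1, 2, 3, 4, 5, 6]
died = [True] * 6
group = [0, 0, 0, 1, 1, 1]  # hypothetical: group 0 dies earlier than group 1
stat, observed, expected = logrank_statistic(times, died, group)
print(f"chi-squared = {stat:.2f}, observed = {observed}, expected = {expected}")
```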

120
Q

How does cox regression model work?

A

Uses a mathematical function of time to model how the risk of death varies with time

The instantaneous risk of death at time t is known as the hazard, and as a function of time it is often denoted h(t), the hazard function

121
Q

In the Cox regression model, what does the slope (regression coefficient) equal?

A

The log of the hazard ratio

122
Q

What does a hazard ratio <1 tell us?

A

HR < 1: lower hazard in the group of interest (better, if the event is e.g. death or progression)
HR = 1: the chances are the same
HR = 2: 2 x higher hazard

HR = hazard (e.g. of progression) in one group / hazard in the other group.

123
Q

What does the cox regression model assume?

A

Hazards are proportional: the hazard ratio between the groups does not change over time

Survival curves do not cross over

124
Q

What are systematic errors?

A

Errors that are repeated in the same direction (bias), rather than arising by chance
In a trial, the only systematic difference between the groups should be the randomised treatment

125
Q

How can you minimise bias/systematic errors?

A
Efficient and appropriate trial design
Randomisation
Blinding of patients and doctors
Using an intention to treat population
Minimise treatment and protocol deviations
126
Q

What are random errors?

A

Caused by unknown, unpredictable changes.

Results from a sample are only estimates of the population value

127
Q

How do you measure random error?

A

confidence intervals and p values

Minimise by having a sufficient sample size

128
Q

What is the aim of a Phase 1 clinical trial and how is it conducted?

A

Aim: dose finding, i.e. the maximum tolerated dose (MTD)

Conduct: 3+3, rolling 6, continual reassessment method (CRM) (need some previous human data for this usually)

Endpoints: tolerability, PK, PD, bioavailability

129
Q

What is the aim of a phase II clinical trial and how is it conducted?

A

Aim
- determine if a drug has a therapeutic effect

Conduct

  • historically a single arm study of 20-80 pts
  • single stage design
  • two-stage, e.g. Simon or Gehan designs: allow the trial to be terminated at the end of the 1st stage if the drug is clearly inactive

Endpoints

  • tumour response- quick, pCR, ORR
  • PFS
  • biomarker
130
Q

Why not do single group studies for phase II trials?

A

Prone to selection bias
No real allowance for imprecision in the historical estimate of response
Modest treatment effects may be lost

131
Q

What are the aims of a phase 3 trial and what is the conduct and endpoints?

A

Aim
- to determine if new treatment is better than an existing treatment

Conduct
- unbiased, reliable, clinically useful, randomised comparison

Endpoints

  • DFS, PFS, OS
  • risk (adverse events) vs benefit profile
  • translational research- identify patients who have most/least to gain
132
Q

What types of trial design can you use in phase III trial?

A

Parallel groups: between-patient comparisons

Factorial: two or more comparisons in the same trial without necessarily increasing the size. Patients are randomised to combinations of interventions (e.g. A, B, both or neither), so some receive 2 interventions at the same time.

Cross over: within-patient comparison; each patient receives all treatments in turn (patients change to a different drug part-way through)

133
Q

What are adaptive trials?

A

Use accumulating data to decide how to modify aspects of the study without undermining the validity or integrity of the trial.

e.g. platform trials, umbrella protocols, basket trials

134
Q

What might you change in an adaptive trial?

A

Change dose of treatment
Change allocation ratio (control : research arm)
Early stopping for benefit/lack of benefit
Adding in new treatments, via randomisation or as additional cohorts

135
Q

Why randomise patients?

A

Reduce (selection) bias

Balance known and unknown confounders between groups

136
Q

What is simple randomisation?

A

Each patient's treatment is allocated at random; easy and quick

But… there can be an imbalance in the allocation (group sizes or prognostic factors) due to chance, particularly in small trials
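A minimal sketch of simple randomisation using Python's standard-library `random` module (the function name is hypothetical). Running it for a small trial shows how chance alone can leave the arms unequal.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def simple_randomisation(n_patients: int, arms: Sequence[str] = ("A", "B"),
                         seed: Optional[int] = None) -> list[str]:
    """Allocate each patient independently and with equal probability to an arm."""
    rng = random.Random(seed)  # seeded generator so the list can be reproduced
    return [rng.choice(arms) for _ in range(n_patients)]


allocation = simple_randomisation(20, seed=7)
print(allocation)
print(Counter(allocation))  # counts per arm are often unequal in small trials
```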

137
Q

What is randomisation with random permuted blocks?

A

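Patients are allocated in blocks (e.g. a block of 4 containing 2 A and 2 B in random order), which ensures each treatment occurs a given number of times in each series of patients, so group sizes stay balanced.
Using random (varying) block sizes helps avoid predictable allocation, but there can still be some imbalance in prognostic factors.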
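A sketch of permuted block randomisation with a fixed block size (a hypothetical helper; real systems often vary the block size to reduce predictability). Within every complete block each arm appears equally often.

```python
import random
from typing import Optional, Sequence


def permuted_block_randomisation(n_patients: int, block_size: int = 4,
                                 arms: Sequence[str] = ("A", "B"),
                                 seed: Optional[int] = None) -> list[str]:
    """Allocate patients in shuffled blocks, each containing every arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation: list[str] = []
    while len(allocation) < n_patients:
        block = list(arms) * (block_size // len(arms))  # e.g. A, B, A, B
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]  # the last block may be only partly used


print(permuted_block_randomisation(12, block_size=4, seed=1))
```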

138
Q

What is stratified randomisation?

A

Divide the patients into groups (strata) depending on important prognostic characteristics, then allocate equally within each stratum using either simple or (preferably) random permuted block randomisation.
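Stratified randomisation can be sketched by keeping a separate permuted block sequence for each stratum. The stratum labels and function name below are hypothetical; a real system would usually combine several factors (e.g. centre and stage) into each stratum.

```python
import random
from collections import defaultdict
from typing import Optional, Sequence


def stratified_block_randomisation(patient_strata: Sequence[str], block_size: int = 4,
                                   arms: Sequence[str] = ("A", "B"),
                                   seed: Optional[int] = None) -> list[str]:
    """Randomise patients (in order of entry) using a separate permuted-block list per stratum."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    remaining = defaultdict(list)  # stratum -> unused allocations from its current block
    allocation = []
    for stratum in patient_strata:
        if not remaining[stratum]:  # start a new block for this stratum
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            remaining[stratum] = block
        allocation.append(remaining[stratum].pop())
    return allocation


strata = ["high risk", "low risk", "low risk", "high risk", "low risk", "high risk"]
print(list(zip(strata, stratified_block_randomisation(strata, seed=5))))
```

Because each stratum has its own blocks, the arms stay balanced within every stratum, not just overall.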

139
Q

What is minimisation (in allocation of patients to trial group)?

A

Dynamic allocation method: each patient is allocated depending on the characteristics of patients who have already been allocated.
Might also incorporate a random element to avoid prediction of the next treatment (e.g. 80% chance of the allocation that reduces imbalance and 20% chance of the one that increases it)
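A simplified minimisation sketch. It assumes a marginal-totals score (for each arm, count the already-allocated patients sharing each of the new patient's factor levels, summed over factors) plus the 80%/20% random element described above; Pocock-Simon minimisation generalises this scoring. All names and factor labels are hypothetical.

```python
import random
from typing import Optional, Sequence

Patient = dict[str, str]  # factor name -> level, e.g. {"sex": "F", "stage": "late"}


def minimise(new_patient: Patient, allocated: list[tuple[Patient, str]],
             arms: Sequence[str] = ("A", "B"), p_preferred: float = 0.8,
             rng: Optional[random.Random] = None) -> str:
    """Choose an arm for new_patient given earlier (patient, arm) allocations."""
    rng = rng if rng is not None else random.Random()
    # Score each arm: number of previous patients in that arm sharing each
    # of the new patient's factor levels, summed over the balancing factors.
    scores = {arm: 0 for arm in arms}
    for patient, arm in allocated:
        for factor, level in new_patient.items():
            if patient[factor] == level:
                scores[arm] += 1
    lowest = min(scores.values())
    preferred = [arm for arm in arms if scores[arm] == lowest]
    if len(preferred) == len(arms):
        return rng.choice(list(arms))  # no imbalance to correct: pure randomisation
    if rng.random() < p_preferred:
        return rng.choice(preferred)  # usually pick an arm that reduces imbalance
    return rng.choice([arm for arm in arms if arm not in preferred])


rng = random.Random(11)
history: list[tuple[Patient, str]] = []
for patient in [{"sex": "F", "stage": "early"}, {"sex": "F", "stage": "late"},
                {"sex": "M", "stage": "early"}, {"sex": "F", "stage": "early"}]:
    history.append((patient, minimise(patient, history, rng=rng)))
print(history)
```

Note that no allocation list exists in advance: each decision needs the full history of earlier patients.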

140
Q

How is minimisation different to randomisation?

A

Allocation of new patients depends on the characteristics of those that went before.
Allocation lists cannot be drawn up in advance
Treatment allocation uses balancing factors, NOT stratification

141
Q

When would you use a placebo?

A

If standard therapy is no therapy
Helps double blinding
Ensures any benefit is due to the treatment itself, not just the fact that patients are being treated

142
Q

If you want to detect a small treatment effect, how does this affect the sample size?

A

Larger sample size

143
Q

If you reduce the significance level, how does this affect the sample size needed?

A

Larger sample size

e.g. reducing the significance level from 5% to 1% requires more patients
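Both answers follow from the standard normal-approximation formula for comparing two means with equal group sizes and a two-sided test: n per group = 2 × (z(1-α/2) + z(1-β))² × σ² / δ², where δ is the difference to detect and σ the standard deviation. The sketch assumes a continuous outcome; binary and time-to-event endpoints use different formulae. `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per arm to detect a difference in means `delta` (two-sided test, equal arms)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = z(power)  # 0.84 when power = 80%
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / delta**2)


print(n_per_group(delta=0.5, sd=1))              # 63
print(n_per_group(delta=0.25, sd=1))             # 252: smaller effect, larger n
print(n_per_group(delta=0.5, sd=1, alpha=0.01))  # 94: stricter significance, larger n
```

Halving the effect to detect quadruples the sample size, because δ enters the formula squared.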

144
Q

Why would you choose an intention-to-treat population?

A

Avoids bias: people who do not receive their allocated treatment are likely to be a selected subset; excluding them removes this type of person from one arm and undermines the balance created by randomisation.
More pragmatic: gives an idea of how the treatment performs in the real world

145
Q

What is a per protocol population and when might you use it?

A

Usually excludes patients who have any major protocol violations and analysis is by treatment actually received.
Often used for
- safety analyses
- non-inferiority trials because data from patients who did not receive the protocol treatment tends to bias results towards equivalence and could make a truly inferior treatment appear non-inferior

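BUT prone to bias, because the patients excluded for protocol violations may differ systematically between arms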

146
Q

What is the safety population?

A

Should be defined in advance, but there is no standard definition
Analysis by treatment received, but can include all patients who received some treatment (even if they were ineligible)
Sensitivity analyses conducted on the ITT population and on patients with complete follow-up

147
Q

How should sub-group analysis be performed?

A

Should be prespecified in the protocol to avoid data dredging.
If not pre-specified, then interpret with caution.

Results are hypothesis-generating only, not confirmatory

148
Q

What should a trial protocol include?

A

1) Background and rationale
2) Specific objectives and purpose
3) Description of trial design (randomised, placebo-controlled etc.)
4) Registration and randomisation methods
5) Trial endpoints
6) Inclusion and exclusion criteria
7) Description of trial treatment
- treatment schedule
- dose modification procedures
8) Methods of patient evaluation
- baseline and follow-up
9) Assessment of safety
- adverse event reporting
10) Required size of study
- rationale for statistical assumptions
11) Trial progress: ‘stopping rules’
12) Data handling & record keeping
13) Ethics considerations
14) Plans for statistical analysis
- interim analyses
- monitoring of quality of data
15) Administrative responsibilities
16) Finance and insurance
17) Publication policy

149
Q

What is the process of conducting a RCT?

A

Start-up phase:

  • Identify hypothesis
  • Design trial
  • Write protocol
  • Apply for funding
  • Identify sponsor
  • Ethics approval
  • CTA (clinical trial authorisation)
  • Centre approvals

Conduct trial:

  • Recruit patients
  • Manage data
  • Monitor patient safety
  • Adhere to GCP (Good Clinical Practice)

Analyse data:

  • Test hypothesis
  • Analyse safety and efficacy data, publish results

150
Q

Where are trials submitted for ethics approval before starting?

A

Research ethics committee (REC) for approval

IRAS (Integrated Research Application Service): used to apply for combined ethics and central R&D approval

151
Q

Where do you report unexpected or serious events that happen during a trial?

A

Research ethics committee

Also report changes to the protocol (amendments)

152
Q

Once a trial is open, how is it monitored?

A

Central statistical monitoring

  • monitor recruitment rates, compliance, adverse events
  • frequency depends on the trial, but should be done regularly

Interim analysis

  • frequency depends on the trial
  • to look for treatment differences that are convincing and important enough to warrant stopping the trial early or changing the design

153
Q

What is the trial management group?

A

A multidisciplinary committee responsible for overseeing scientific and operational aspects of the trial.
- includes the chief investigator (CI), co-investigators, key clinical and scientific collaborators, clinical trials unit representatives and patient representatives

154
Q

What are the roles of the trial management group?

A
  • input into trial protocol and case report forms
  • oversee ongoing conduct of trial
  • provide clinical or other expert guidance
  • develop strategies to optimise recruitment
  • promote and maintain profile of trial during its follow-up phase
  • actively contribute to interpretation and write-up of results

155
Q

What is the trial steering committee?

A

Provides expert independent oversight of the trial on behalf of sponsors and funders
Includes an independent chair and at least two further independent members with clinical or statistical expertise (one must be a statistician)

156
Q

What is the role of the trial steering committee?

A

• consider protocol amendments that will significantly alter trial design, conduct or analysis
• consider TMG strategies to improve trial conduct, e.g. recruitment
• consider recommendations of the IDMC
• consider decisions on future continuation (or otherwise) of trial
• oversee the timely reporting of trial results
• consider requests for analyses (from TMG and external groups) not identified in the protocol or SAP

157
Q

What is the independent data monitoring committee (IDMC)?

A

Small group, e.g. 2 clinicians and a statistician
Independent of trial organisers
Assesses pre-specified interim analyses of data (results confidential)
Looks at recruitment and completeness of data, side effects and interim results
Can recommend that a trial is stopped: gives the recommendation to the TSC, which makes the final decision

158
Q

What is the issue with interim analysis (in RCTs) and how could this be resolved?

A

The more times you calculate the p-value on accumulating data, the greater the chance of finding a significant result by chance alone (inflated type I error).
Several statistical stopping rules or guidelines have been developed for this multiple testing, e.g. Pocock, Haybittle-Peto, O’Brien-Fleming
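A small simulation makes the multiple-testing problem concrete. It assumes a simplified trial in which each patient contributes one standardised treatment difference drawn from N(0, 1), so there is truly no treatment effect, and the accumulating data are tested with a z-test at each look using the unadjusted 1.96 cut-off. The function name and parameters are hypothetical.

```python
import random
from math import sqrt


def false_positive_rate(n_looks: int, patients_per_look: int = 20, n_sims: int = 2000,
                        z_crit: float = 1.96, seed: int = 1) -> float:
    """Share of simulated null trials declared 'significant' at any unadjusted look."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            for _ in range(patients_per_look):
                total += rng.gauss(0.0, 1.0)  # no true treatment effect
                n += 1
            z = total / sqrt(n)  # z statistic for the mean difference so far
            if abs(z) > z_crit:  # trial stopped with a "significant" result
                positives += 1
                break
    return positives / n_sims


print(f"1 analysis:      {false_positive_rate(n_looks=1):.3f}")
print(f"5 interim looks: {false_positive_rate(n_looks=5):.3f}")
```

With a single analysis the false-positive rate is close to 5%; with five unadjusted looks it rises well above that (around 14% for five equally spaced looks). Rules such as Haybittle-Peto counter this by demanding a much more extreme result (e.g. p < 0.001) at interim looks.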

159
Q

What is external validity?

A

Refers to how well the outcome of the study can be generalised to the real world.

160
Q

What is internal validity?

A

Extent to which study establishes a trustworthy cause and effect. Depends largely on procedures of the study eg randomisation, blinding, protocol

161
Q

What is a cross-sectional study?

A

Examines the relationship between disease (or other health-related state) and other variables of interest as they exist in a defined population at a single point in time or over a short period of time (e.g. calendar year)
Main outcome obtained is prevalence

162
Q

What is an ecological study?

A

Conducted at population level.
Measures an outcome or risk factor in a population
Looks at groups, not individuals.

163
Q

What is the difference between a histogram and a bar chart?

A

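A histogram groups continuous numerical data into ranges (bins), with adjacent bars touching; a bar chart shows categorical data, with a separate bar for each category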

164
Q

What is the sponsor's role in a trial?

A

Overall responsibility for the conduct of the trial
Responsible for safety assessments
Must evaluate all SAEs and decide if they are SARs or SUSARs. Must report all SUSARs to MHRA

165
Q

What is the investigator's role in the trial?

A
• Must record all AEs during a study
– records can be inspected by the Sponsor
• Decide if an event is serious
• Decide if an event is a reaction
• Decide if a reaction is caused by the IMP
• Must immediately notify the Sponsor of SAEs/SARs (usually within 24 hrs)

166
Q

How do you calculate the expected frequency for a chi-squared test?

A

(Row total × column total) / grand total (total number in both groups)
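The rule above, applied to every cell, can be sketched as follows, together with the Pearson chi-squared statistic it feeds into. The function names are hypothetical and the 2×2 counts in the example are invented for illustration.

```python
def expected_frequencies(table: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(table: list[list[float]]) -> float:
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


# Invented 2x2 table: rows = treatment groups, columns = responded / did not respond.
observed = [[20, 30], [30, 20]]
print(expected_frequencies(observed))   # [[25.0, 25.0], [25.0, 25.0]]
print(chi_squared_statistic(observed))  # 4.0
```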

167
Q

When would you use the log-rank test vs Cox regression?

A

Log-rank test: time-to-event data, comparing groups defined by a single categorical predictor

Cox regression: time-to-event data with more than one variable and/or continuous predictors
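To show what the log-rank test does with a single categorical predictor, here is a from-scratch sketch for two groups: at each event time it compares the observed number of events in group 1 with the number expected if both groups shared the same hazard, and sums the corresponding variances. It is for illustration only (function name and data are hypothetical); Cox regression needs iterative model fitting and is normally done with a statistical package.

```python
from math import erfc, sqrt
from typing import Sequence


def log_rank_test(times: Sequence[float], events: Sequence[int],
                  groups: Sequence[int]) -> tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = event, 0 = censored;
    groups: 0 or 1. Returns (chi-squared statistic on 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)  # number at risk
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk in group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)  # events at time t
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected**2 / variance if variance > 0 else 0.0
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


# Hypothetical data: group 1 has its events much earlier than group 0.
chi2, p = log_rank_test([10, 11, 12, 13, 1, 2, 3, 4], [1] * 8, [0, 0, 0, 0, 1, 1, 1, 1])
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```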

168
Q

What is a sequential trial?

A

Trial where the sample size is not defined in advance
Data are evaluated as they are collected, and the trial stops once a predefined outcome is reached. Good when the time between treatment and outcome is short.

169
Q

What is cancer registration and who is it managed by?

A

The National Cancer Registration and Analysis Service (NCRAS), part of Public Health England (PHE), is the population-based cancer registry for England. It collects, quality assures and analyses data on all people living in England who are diagnosed with malignant and pre-malignant neoplasms, with national coverage since 1971. It produces the national cancer registration dataset for England. The primary role of NCRAS is to provide near real-time, cost-effective, comprehensive data collection and quality assurance over the entire cancer care pathway. To achieve this, it receives data from across the National Health Service (NHS).
The NHS Act 2006 protects PHE's right to collect cancer-related data.