1b Statistical Methods Flashcards

1
Q

What is sensitivity?

A

The probability that the test will be positive if the disease is present (true positives).

2
Q

What is specificity?

A

The probability that the test will be negative if the disease is absent (true negatives).

3
Q

How does disease prevalence impact sensitivity and specificity of a test?

A

Since sensitivity is conditional on the disease being present, and specificity on the disease being absent, in theory, they are unaffected by disease prevalence.

4
Q

What is a false negative rate?

A

The probability that the test will be negative when you are actually positive.

5
Q

What is a false positive rate?

A

The probability that the test will be positive when you are actually negative.

6
Q

How do you calculate sensitivity

A

Sensitivity = a/(a+c)

7
Q

How do you calculate specificity

A

Specificity = d/(b+d)

8
Q

How do you calculate the false positive rate

A

False Positive Rate = b/(b+d)

9
Q

How do you calculate false negative rate

A

False Negative Rate = c/(a+c)

10
Q

What is the sensitivity, specificity, false positive and false negative rate in the below example?

A sample of 410 people is taken to test if BNP can diagnose heart failure. All are tested for their BNP levels and then have an echo performed to assess if they actually do have heart failure (the gold standard test).

Number of participants = 410
Number of positive findings on BNP testing = 42
The number of positive findings on echo = 103
The number of false negatives when using BNP = 68

A

Place the data into a 2x2 table

Sensitivity = a/(a+c) = 35/103=0.340=34%

Specificity = d/(b+d) = 300/307=0.977=98%

False Positive Rate = b/(b+d)=7/307=0.02=2%

False Negative Rate = c/(a+c)=68/103=0.66=66%
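As an illustration only (not part of the original card), a minimal Python sketch of the calculation above, using the 2x2 cell counts from the BNP example:

# 2x2 table for the BNP example:
#                 Heart failure (echo +)   No heart failure (echo -)
# BNP positive            a = 35                    b = 7
# BNP negative            c = 68                    d = 300

a, b, c, d = 35, 7, 68, 300

sensitivity = a / (a + c)           # 35/103  = 0.34
specificity = d / (b + d)           # 300/307 = 0.98
false_positive_rate = b / (b + d)   # 7/307   = 0.02
false_negative_rate = c / (a + c)   # 68/103  = 0.66

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Specificity: {specificity:.0%}")
print(f"False positive rate: {false_positive_rate:.0%}")
print(f"False negative rate: {false_negative_rate:.0%}")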

11
Q

What is the positive predictive rate (aka the predictive value of a positive test)?

A

The probability of the patient having the disease, given a positive test result, i.e. how likely a positive result is to be true.

12
Q

What is the negative predictive rate (aka the predictive value of a negative test)?

A

The probability of not having the disease, given a negative test result, i.e. how likely a negative result is to be true.

13
Q

How do you calculate positive predictive value?

A

Positive predictive value=a/(a+b)

14
Q

How do you calculate negative predictive value?

A

Negative predictive value = d/(c+d)

15
Q

How does disease prevalence impact negative and positive predictive value?

A

If disease prevalence increases then the predictive value of a positive test would also increase, and the predictive value of a negative test will decrease.

16
Q

What are the positive predictive value and negative predictive values in the below example?

Results of exercise tolerance test in patients with suspected coronary artery disease:
Number of positive tests = 930
Number of negative tests = 535
Number found to truly have coronary artery disease = 1023
Number found to truly not have coronary artery disease = 442
Number of positive cases on ETT who had CAD = 815
Number of positive cases on ETT who did not have CAD = 115

A

Place the numbers into a 2x2 table

Positive predictive value=a/(a+b)=815/930 =0.88

Negative predictive value = d/(c+d)=327/535 = 0.61
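A minimal Python sketch of this calculation (an illustration, not part of the original card), using the exercise tolerance test (ETT) figures above:

# a = ETT positive & CAD, b = ETT positive & no CAD,
# c = ETT negative & CAD, d = ETT negative & no CAD
a, b = 815, 115
c, d = 1023 - 815, 442 - 115    # 208 and 327

ppv = a / (a + b)   # 815/930 = 0.88
npv = d / (c + d)   # 327/535 = 0.61

print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")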

17
Q

What is Bayes' Theorem and how does it apply to medical statistics?

A

Pre-test odds of disease * likelihood ratio = post-test odds of disease.

This is used when interpreting likelihood ratios

18
Q

What is the negative likelihood ratio (LR-)

A

The decreased chance of having the disease once you have tested negative.

The probability of a negative test result given the disease is present vs. the probability of a negative test result given the disease is absent.

19
Q

What is the positive likelihood ratio (LR+)

A

The increased chance of having the disease once you have tested positive.

The probability of a positive test result given the disease is present vs. the probability of a positive test result given the disease is absent.

20
Q

How do you calculate the positive likelihood ratio (LR+)?

A

LR+=Sensitivity/(1-Specificity)
i.e. true positive rate / false positive rate

21
Q

How do you calculate the negative likelihood ratio (LR-)?

A

LR- = (1 - Sensitivity)/Specificity
i.e. false negative rate / true negative rate
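A minimal Python sketch (an illustration, not part of the original card) of both likelihood-ratio formulas, using the 65%/89% exercise-test figures quoted later in this deck:

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1 - specificity)   # true positive rate / false positive rate
    lr_neg = (1 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.65, 0.89)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")   # LR+ = 5.9, LR- = 0.39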

22
Q

What is the difference between sensitivity, specificity, positive & negative likelihood ratios and positive & negative predictive values?

A

Sensitivity = The probability that the test will be positive if the disease is present (true positives).

Specificity = The probability that the test will be negative if the disease is absent (true negatives).

Positive likelihood ratio = The increased chance of having the disease once you have tested positive. This value is applicable to an individual patient.

Negative likelihood ratio = The decreased chance of having the disease once you have tested negative. This value is applicable to an individual patient.

Positive Predictive Rate = The probability of the patient having the disease, given a positive test result I.e How likely a positive result is true. This value is not applicable to individual patients and is dependent on prevalence.

Negative Predictive Rate = The probability of not having the disease, given a negative test result. I.e How likely a negative result is true. This value is not applicable to individual patients and is dependent on prevalence.

23
Q

What are the advantages of likelihood ratios?

A

Not affected by different populations or sample sizes.
Can be applied directly at the individual patient level to quantify the probability of disease.

24
Q

How do you interpret a positive likelihood ratio (LR+)?

A

A positive likelihood ratio of 6 means that the odds of the patient having the disease have increased approximately six-fold given the positive test result.

An LR of 10 = A significant increase in the probability of a disease
An LR of 5 = A moderate increase in the probability of a disease
An LR of 2 = A small increase in the probability of a disease
An LR of 1 = The test makes no difference

To translate this into an actual probability of disease use Bayes' Theorem. Bayes' theorem with likelihood ratios requires that the probability of disease is in the form of odds rather than a percentage.

Pre-test odds of disease * likelihood ratio = post-test odds of disease.

As well as calculating this by hand, you can also use Bayes' nomogram.

Using this we can see that someone who originally had a 40% chance of having coronary artery disease now has an 80% chance after the test. This is done by joining 40% on the first axis with 6 on the second axis and reading off the post-test probability of 80%.

25
Q

How do you interpret a negative likelihood ratio (LR-)?

A

The negative likelihood ratio (LR-) gives the change in the odds of having a diagnosis in patients with a negative test.

The change is in the form of a ratio, usually less than 1. For example, an LR- of 0.1 would indicate a 10-fold decrease in the odds of having a condition in a patient with a negative test result. An LR- of 0.05 would be a 20-fold decrease in the odds of the condition.

We can then translate this into an actual probability of disease using Bayes' Theorem. Bayes' theorem with likelihood ratios requires that the probability of disease is in the form of odds rather than a percentage.

Pre-test odds of disease * likelihood ratio = post-test odds of disease.

As well as calculating this by hand, you can also use Bayes' nomogram.

Using a negative likelihood ratio of 0.14, we can see that someone who originally had a 17% chance of disease now has a post-test probability of approximately 3%. This means that after a negative test the patient has a 3% chance of disease.

26
Q

What is the likelihood ratio in the below example and how do you interpret it?

On clinical assessment, a 50-year-old male has a 40% chance of having coronary artery disease and so is sent for an exercise test. This is found to be positive. It is known that a more than 1 mm depression on exercise stress testing has a sensitivity and specificity of 65% and 89% respectively for coronary artery disease when compared to the reference standard of angiography.

A

Positive likelihood ratio = 0.65/(1-0.89) = 5.9

The odds of this patient having the disease have increased by approximately six-fold given the positive test result.

To translate this into an absolute probability of disease one must use Bayes’ Theorem.

= Pre-test odds of disease * likelihood ratio = post-test odds of disease.

This requires the odds of disease, which we do not have, and so we can use Bayes' nomogram, which handles the conversion between percentages and odds while doing the calculation for us.

The initial clinical assessment found that the 50-year-old man had a 40% chance of having coronary artery disease, we join 40% on the first axis with 6 on the second axis and read off the post-test probability of 80%, i.e. the patient has an 80% chance of having coronary artery disease given the positive test result.
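For comparison, a minimal Python sketch (an illustration, not part of the original card) of the same calculation done numerically rather than with the nomogram:

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' theorem: pre-test odds x LR = post-test odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# 40% pre-test probability and an LR+ of 5.9, as in the example above
p = post_test_probability(0.40, 5.9)
print(f"Post-test probability: {p:.0%}")   # ~80%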

27
Q

What are the Methods for the Quantification of Uncertainty in an epidemiological study?

A

Standard error
Reference ranges
Confidence intervals

28
Q

What are the types of standard error?

A

Standard error of the mean
Standard error of a proportion or a percentage
Standard error of count data

29
Q

What is the standard error of the mean?

A

The standard error of the mean of one sample is an estimate of the standard deviation that would be obtained from the means of a large number of samples drawn from that population.

E.g. if you ran 100 studies on the same population, the SEM estimates the standard deviation of those 100 sample means.

30
Q

Which factors impact standard error of the mean?

A

Population base variation - The variation between samples depends partly on the amount of variation in the population from which they are drawn. For example, a series of samples of the body temperature of healthy people would show very little variation from one to another, but the variation between samples of the systolic blood pressure would be considerable

Sample size - the more members of a population that are included in a sample the more chance that sample will have of accurately representing the population, and thus if two or more samples are drawn from a population, the larger they are the more likely they are to resemble each other

31
Q

What is the Central Limit Theorem?

A

If we draw a series of samples and calculate the mean of the observations in each, this series of means generally conform to a Normal distribution, and they often do so even if the observations from which they were obtained do not.

32
Q

How do you calculate the standard error of the mean?

A

SEM=SD/√n

Standard error of the mean = standard deviation of the sample / square root of the sample size

33
Q

What is the standard error of the mean in the below example?

A general practitioner has been investigating whether the diastolic blood pressure of men aged 20-44 differs between printer workers and farm workers. For this purpose she has obtained a random sample of 72 printers and 48 farmers and calculated the mean and standard deviations, as shown.

A

To calculate the standard errors of the two mean blood pressures the standard deviation of each sample is divided by the square root of the number of observations in the sample.

SEM=SD/√n

Printers: SEM=4.5/√72=0.53 mmHg

Farmers: SEM=4.2/√48=0.61 mmHg
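A minimal Python sketch of the SEM calculation above (an illustration, not part of the original card):

import math

def sem(sd, n):
    """Standard error of the mean = SD / sqrt(n)."""
    return sd / math.sqrt(n)

print(f"Printers: {sem(4.5, 72):.2f} mmHg")   # ~0.53
print(f"Farmers:  {sem(4.2, 48):.2f} mmHg")   # ~0.61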

34
Q

How do you interpret the standard error of the mean?

A

Standard error tells you how much your sample statistic is likely to differ from the true population value, i.e. how precise your estimate is.

For example, imagine we have a sample of respondents and their income. We can compute the mean income of our sample, but we aren't sure how good this estimate is. So we compute the standard error of the mean, which roughly tells us how much our estimate varies around the true mean of the population. The lower the standard error, the more we can be sure that our estimate of mean income is close to the income for the entire population.

35
Q

What is the standard error of a proportion or a percentage?

A

Just as you can calculate a standard error associated with a mean, you can also calculate a standard error associated with a percentage or a proportion.

Here the size of the sample will affect the size of the standard error but the amount of variation is determined by the value of the percentage or proportion in the population itself, and so we do not need an estimate of the standard deviation.

36
Q

How do you calculate the standard error of a percentage or proportion?

A

SE% = √((p × q)/n)

p = one percentage
q = (100 - p) = the other percentage
n = number in the sample

Note that the above formula uses percentages. If you are given proportions, you can either convert these to percentages (multiply by 100) or use the same formula with proportions, where q = (1 - p).

37
Q

What is the standard error of the percentage in the below example?

A senior surgical registrar in a large hospital is investigating acute appendicitis in people aged 65 and over. As a preliminary study he examines the hospital case notes over the previous 10 years and finds that of 120 patients in this age group with a diagnosis confirmed at operation 73 (60.8%) were women and 47(39.2%) were men.

A

SE% = √((p × q)/n)

SE% = √((39.2 × 60.8)/120) = 4.5
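A minimal Python sketch of this calculation (an illustration, not part of the original card):

import math

def se_percentage(p, n):
    """Standard error of a percentage: sqrt(p * (100 - p) / n)."""
    return math.sqrt(p * (100 - p) / n)

print(f"SE% = {se_percentage(39.2, 120):.1f}")   # ~4.5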

38
Q

What is the Standard error of count data?

A

Standard error can also be calculated for count data, where you are given a number of events over a set period of time.

For example, the number of cardiac arrests in an A&E department every year, or the referral rate from primary care to a specialist service per 1,000 patients per year.

39
Q

How do you calculate the standard error of count data?

A

Standard Error of Count Data = √λ

Where λ = The count

For example, a GP in a busy practice sees 36 patients in a given day. The standard error is therefore √36 = 6.

40
Q

What is the standard error of count data in the below example?

A GP sees 36 patients in a given day.

A

Standard Error of Count Data = √λ = √36 = 6.

41
Q

What is Pooling standard errors of two groups and why might it be done?

A

One way of comparing two groups is to look at the difference (in means, proportions or counts) and construct a 95% confidence interval for the difference.

As part of this process, we are required to calculate a pooled standard error of the two groups. The formulae required are similar to those used normally to calculate the standard error of the mean/proportion/percentage/count, however, each calculation within the square root is done twice, once for each group, before the square root is applied. This can be seen by comparing the formulae below:

42
Q

How do you calculate a pooled standard error of the mean?

A

Pooled SE of the difference in means = √(SD1²/n1 + SD2²/n2)

i.e. the term inside the square root of the standard error of the mean is calculated once for each group, and the two terms are added before taking the square root.
43
Q

How do you calculate a pooled standard error of the proportion/percentage?

A

Pooled SE of the difference in percentages = √((p1 × q1)/n1 + (p2 × q2)/n2)

where p and q = (100 - p) are the percentages and n is the sample size in each group.
44
Q

How do you calculate a pooled standard error of the count?

A

Pooled SE of the difference in counts = √(λ1 + λ2)

where λ1 and λ2 are the two counts (as used in the teenage pregnancy example later in this deck).
45
Q

What are confidence intervals?

A

A range of values that’s likely to include a population value with a certain degree of confidence. It is often expressed as a % whereby a population mean lies between an upper and lower interval.

46
Q

How are confidence intervals formed?

A

Confidence intervals come from combining the concepts of standard deviation, the central limit theorem and the standard error of the mean.

95% of data following a normal distribution falls within 1.96 standard deviations of the mean.

If a series of samples are drawn and the mean of each calculated, these means will follow a normal distribution and thus 95% of the means would be expected to fall within the range of 1.96 standard errors above and 1.96 below the mean of these means.

We can use this to ascertain that if we have a mean from a sample, we can be 95% sure that the true population mean will fall within 1.96 standard errors either side of this sample mean. This range is called the 95% confidence interval.

Other commonly used limits are the 90% and 99% confidence interval, in which case the 1.96 may be replaced by 1.65 (for 90%) or 2.58 (for 99%).

47
Q

How do you calculate a 95% confidence interval for a mean?

A

95% CI = Estimate ± (1.96 x Standard error)

Estimate =Mean, Proportion, Percentage or Count
SE = The relevant standard error of the estimate used e.g. SE of mean or SE of a proportion.
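A minimal Python sketch of this formula (an illustration, not part of the original card; the mean of 88 mmHg is a hypothetical value, paired with the printers' SEM of 0.53 calculated earlier):

def confidence_interval(estimate, se, z=1.96):
    """Return (lower, upper) bounds; z = 1.65 for 90%, 1.96 for 95%, 2.58 for 99%."""
    return estimate - z * se, estimate + z * se

low, high = confidence_interval(88, 0.53)   # 88 mmHg is an assumed mean
print(f"95% CI: {low:.1f} to {high:.1f} mmHg")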

48
Q

How can you use confidence intervals to compare two pieces of data and see if there is a likely difference?

A

Two options:

1) Calculate the confidence intervals for both sets of data, if the intervals do not cross you can say with 95% certainty that there is a significant difference between the two sets.

2) Calculate a confidence interval for the difference between the two estimates. If these do not include the null value (likely 0), then you can be 95% certain that there is a difference between the two data sets.

49
Q

How do you calculate a confidence interval for the difference between two estimates and how is it interpreted?

A

95% CI for difference = (Estimate 1 - Estimate 2) +/- 1.96 (Pooled standard error).

Step 1) Calculate the pooled standard error.

Step 2) Calculate the confidence interval for the difference using the formula above.

Step 3) If the interval does not include the null value (usually 0), the difference is statistically significant at the 5% level.

50
Q

Calculate whether or not there is a significant difference in the findings below using the confidence interval for the difference method.

The prevalence of teenage pregnancies in a city was 49 per 1000 in 2005 and 25 per 1000 in 2015.

A

1) Calculate the pooled standard error
Pooled SE= √(λ1+λ2) =√(49+25)=8.6

2) Calculate the 95% confidence interval
95% CI = (λ1−λ2) +/- 1.96(SE) = (49 – 25) ± (1.96 x 8.6)
= (7.1, 40.9)

3) Interpret
As the null value (0 in this case) is not included in the confidence interval range, then we can say that there is a statistically significant difference between the two results.
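A minimal Python sketch of the calculation above (an illustration, not part of the original card):

import math

count_2005, count_2015 = 49, 25                   # teenage pregnancies per 1000
pooled_se = math.sqrt(count_2005 + count_2015)    # sqrt(49 + 25) = 8.6
difference = count_2005 - count_2015              # 24

lower = difference - 1.96 * pooled_se
upper = difference + 1.96 * pooled_se
print(f"Difference = {difference}, 95% CI = ({lower:.1f}, {upper:.1f})")
# The CI excludes 0, so the fall in prevalence is statistically significant.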

51
Q

Calculate whether or not there is a significant difference in the findings below by calculating two separate confidence intervals for the data.

The prevalence of teenage pregnancies in a city was 49 per 1000 in 2005 and 25 per 1000 in 2015.

A

1) Calculate the standard error of each count
SE of count = √λ (Where λ = The count)

SE #1 = √49 = 7
SE #2 = √25 = 5

2) Calculate the confidence intervals
95% CI = Estimate ± (1.96 x Standard error)

95% CI #1 = 49 ± (1.96 x 7) = (35.3, 62.7)
95% CI #2 = 25 ± (1.96 x 5) = (15.2, 34.8)

3) Interpret
As the two confidence intervals do not overlap at any point, we can say with 95% certainty that there is a statistically significant difference between the two data sets.

52
Q

What is the difference between a reference range and a confidence interval?

A

There is precisely the same relationship between a reference range and a confidence interval as between the standard deviation and the standard error. The reference range refers to individuals and the confidence intervals to estimates.

A confidence interval gives a range for which the true mean is likely to sit (with 95% confidence), and is used to estimate the true mean.

A reference range gives a range in which 95% of the values of a sample will likely lie and thus is used to distinguish when a result is abnormal. This is applied to an individual result in a sample to detect whether it is abnormal.

Note, neither reference ranges nor confidence intervals always have to be 95%. For example, the WHO reference range for birth weight is 80%.

In appropriate circumstances the interval may estimate the reference interval for a particular laboratory test which is then used for diagnostic purposes.

53
Q

What is the normal distribution?

A

Normal distribution describes continuous data which have a symmetric distribution, with a characteristic ‘bell’ shape.

Much healthcare data is approximately normally distributed. If a sample's histogram has an approximately Normal shape, the population from which it was drawn is presumed to have a Normal shape, exactly or as near as makes no practical difference.

The Normal distribution is completely described by two parameters μ and σ, where μ represents the population mean, or centre of the distribution, and σ the population standard deviation. It is symmetrically distributed around the mean.

Only in normally distributed data sets do 95% of the values lie within 1.96 standard deviations of the mean.

54
Q

What is the binomial distribution?

A

Data which can take only a binary (0 or 1) response, such as treatment failure or treatment success, follow the binomial distribution provided the underlying population response rate does not change.

It describes the probability of getting r events out of n trials.

55
Q

What is the Poisson distribution?

A

The Poisson distribution is used to describe discrete quantitative data such as counts in which the population size n is large, the probability of an individual event is small, but the expected number of events is moderate (five or more). Typical examples are the number of deaths in a town from a particular disease per day, or the number of admissions to a particular hospital.

Poisson distribution describes the distribution of binary data from an infinite sample. Thus it gives the probability of getting r events in a population.

56
Q

What is the t-distribution?

A

Student’s t-distribution is a continuous probability distribution with a similar shape to the Normal distribution but with wider tails.

t-distributions are used to describe samples which have been drawn from a population, and the exact shape of the distribution varies with the sample size. The smaller the sample size, the more spread out the tails, and the larger the sample size, the closer the t-distribution is to the Normal distribution.

Whilst in general the Normal distribution is used as an approximation when estimating means of samples from a Normally-distributed population, when the sample size is small (say n < 30) the t-distribution should be used in preference.

57
Q

What is the Chi-squared distribution?

A

The chi-squared distribution is a continuous probability distribution whose shape is defined by the number of degrees of freedom. It is a right-skewed distribution, but as the number of degrees of freedom increases it approximates the Normal distribution. The chi-squared distribution is important for its use in chi-squared tests.

These are often used to test deviations between observed and expected frequencies, or to determine the independence between categorical variables. When conducting a chi-squared test, the probability values derived from chi-squared distributions can be looked up in a statistical table.

58
Q

What is a null hypothesis?

A

The hypothesis that there is no difference between groups.

59
Q

What is a type 1 error?

A

When you reject the null hypothesis when it is in fact true. The level at which a result is declared significant is known as the type I error rate, often denoted by α.

60
Q

What is meant by rejecting the null hypothesis?

A

Rejecting the null hypothesis means that you have gathered suitable evidence to be confident that the null hypothesis is incorrect i.e. you have more than 95% confidence that there is a difference between the two groups.

Not rejecting the null hypothesis is not the same as proving the null hypothesis right, it just means you cannot reject it.

61
Q

How do you reduce the risk of a type 1 error?

A

Use a stricter threshold for significance, i.e. widen your confidence intervals to cover more standard errors, e.g. a 99% confidence interval instead of a 95% confidence interval.

62
Q

Make sure P-values are covered somewhere.

A
63
Q

What is a studies alternative hypothesis?

A

When performing a study, you develop a hypothesis. The alternative hypothesis is the hypothesis you accept if the null hypothesis is rejected, i.e. that there is a difference between the groups.

E.g. the null hypothesis may be that regular aspirin has no effect on the rate of heart attacks, whereas the alternative hypothesis would be that regular aspirin reduces the rate of heart attacks.

64
Q

What is a type 2 error?

A

A type 2 error is when you do not reject the null hypothesis when in fact there is a difference between the groups.

The type II error rate is often denoted as β

65
Q

What is study power?

A

The power of a study is defined as 1-β and is the probability of rejecting the null hypothesis when it is false.

66
Q

How do you reduce the risk of type 2 errors?

A

Increase the study size.

67
Q

How do you calculate the required sample size for a study?

A

Usually, the significance level is predefined (5% or 1%).

Select the power you want the study to have, usually 80% or 90% (i.e. type II error of 10-20%)

For continuous data, obtain the standard deviation of the outcome measure.
For binary data, obtain the incidence of the outcome in the control group (for a trial) or in the non-exposed group (for a case-control study or cohort study).

Choose an effect size. This is the size of the effect that would be ‘clinically’ meaningful.

For example, in a clinical trial, the sort of effect that would make it worthwhile changing treatments. In a cohort study, the size of risk that implies a public hazard.

Use sample size tables or a computer program to deduce the required sample size.

Often some negotiation is required to balance the power, effect size and an achievable sample size.

One should always adjust the required sample size upwards to allow for dropouts.
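As an illustration only (not part of the original card), a minimal Python sketch of a standard sample-size approximation for comparing two means; the formula n = 2(z_alpha + z_beta)^2 * SD^2 / effect^2 and the example numbers are assumptions, not taken from the text:

import math

def sample_size_two_means(sd, effect_size, z_alpha=1.96, z_beta=0.84):
    """Approximate n per group for comparing two means
    (alpha = 0.05 two-sided, 80% power by default)."""
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect_size ** 2
    return math.ceil(n)

# e.g. an outcome SD of 10 mmHg and a clinically meaningful difference of 5 mmHg
print(sample_size_two_means(sd=10, effect_size=5))   # ~63 per group, before allowing for dropouts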

68
Q

How can you limit multiple testing?

A
  1. Specify clearly in the protocol which are the primary outcomes (few in number) and which are the secondary outcomes.
  2. Specify at which time interim analyses are being carried out, and allow for multiple testing.
  3. Carefully review all published and unpublished studies before starting a trial.
69
Q

What is multiple testing?

A

Studies that involve the simultaneous testing of more than one hypothesis.

70
Q

When can multiple testing occur?

A
  1. When many outcomes are tested for significance
  2. In a trial where one outcome is tested a number of times during the follow up
  3. Where many similar studies are being carried out at the same time.
71
Q

What is the Bonferroni correction?

A

A technique that accounts for multiple testing (or when a study is looking to test multiple hypotheses at once).

It states that if one is testing n independent hypotheses, one should use a significance level of 0.05/n. Thus if there were two independent hypotheses a result would be declared significant only if P<0.025. Note that, since tests are rarely independent, this is a very conservative procedure – i.e. one that is unlikely to reject the null hypothesis.

It is usually used informally, as a rule of thumb, to help decide if something which appears unusual is in fact quite likely to have happened by chance.
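A minimal Python sketch of the correction (an illustration, not part of the original card; the p-values are hypothetical):

def bonferroni_threshold(alpha=0.05, n_tests=1):
    """Significance level required for each of n independent hypotheses."""
    return alpha / n_tests

p_values = [0.03, 0.20, 0.01]                            # hypothetical results of 3 tests
threshold = bonferroni_threshold(0.05, len(p_values))    # 0.05/3 = 0.0167
for p in p_values:
    print(p, "significant" if p < threshold else "not significant")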

72
Q

What is a population in statistics?

A

A population is a complete set or group of individuals, whether that group comprises a nation or a group of people with a common characteristic. It should be noted that it does not have to be people; it can also be creatures, things, cases etc.

73
Q

What are population parameters?

A

A number that describes something about an entire group or population (such as averages and standard deviations).

They are often denoted by Greek letters; the population mean is denoted by µ (mu) and the standard deviation by σ (lowercase sigma).

74
Q

What is a sample?

A

A smaller, more manageable version of a larger group. It is a subset containing the characteristics of a larger population.

75
Q

What is an unbiased measure in statistics?

A

One that, on average, equals the true population value, i.e. it has no systematic error.

76
Q

What is a precise measure in statistics?

A

One that, when repeated, returns similar results.

77
Q

What is the difference between a precise measure and an unbiased measure?

A

Unbiased refers to the absence of systematic error (the estimate is correct on average), whereas precise refers to the absence of random error (repeated estimates are similar). An estimate of a parameter taken from a random sample is unbiased; as the sample size increases, it becomes more precise.

78
Q

What are the measures of location (also called central tendency)?

A

Mean
Mode
Median

79
Q

What is the difference between mean, mode and median?

A

Mean = Arithmetic mean = Sum of all samples/Sample size

Mode = Most Common

Median = Middle Value = The [(n+1)/2] observation

80
Q

What is the median of the following number set?

1.2, 1.3, 1.4, 1.5, 2.1, 3.5

A

The median would be the average of the 3rd and the 4th observation in the ranking, namely the average of 1.4 and 1.5, which is 1.45kg.

81
Q

What are the advantages and disadvantages of using the mean?

A

Advantage:
Uses all the data values (aka efficient in a statistical sense).

Disadvantage:
Vulnerable to outliers

82
Q

What are the advantages and disadvantages of using the median?

A

Advantages:
Not as affected by outliers

Disadvantages:
Not statistically efficient, as it does not make use of all the individual data values.

83
Q

What is a bimodal distribution?

A

One with two peaks e.g. Population height (one for men and one for women)

84
Q

What are measures of dispersion of statistical data?

A

Measures of dispersion describe the spread of the data.

85
Q

What are the different measures of dispersion?

A

Range
Interquartile range
Standard deviation
Variance

86
Q

What is the range?

A

A measure of variability and data spread. The range is given as the smallest and largest observations. Note in statistics (unlike physics) a range is given by two numbers, not the difference between the smallest and largest.

87
Q

What are quartiles?

A

A measure of variability and data spread. The quartiles, namely the lower quartile, the median and the upper quartile divide the data into four equal parts

88
Q

What is the median, lower quartile, upper quartile and interquartile range for this data set?

1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18, 2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85.

A

The median is the average of the 9th and 10th observations (2.18+2.22)/2 = 2.20 kg.

The first half of the data has 9 observations so the first quartile is the 5th observation, namely 1.79kg.

The 3rd quartile would be the 5th observation in the upper half of the data, or the 14th observation, namely 2.40 kg. Hence the interquartile range is 1.79 to 2.40 kg.
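A minimal Python sketch of this calculation (an illustration, not part of the original card), using the split-halves method described above:

data = sorted([1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18,
               2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85])

def median(values):
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

n = len(data)
med = median(data)                     # (2.18 + 2.22) / 2 = 2.20
lower_q = median(data[: n // 2])       # 5th of the lower 9 values = 1.79
upper_q = median(data[(n + 1) // 2:])  # 5th of the upper 9 values = 2.40
print(f"Median {med:.2f} kg, IQR {lower_q:.2f} to {upper_q:.2f} kg")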

89
Q

What is the interquartile range?

A

A measure of variability and is given by the lower and upper quartiles e.g. 1.79 to 2.40 kg.

50% of observations lie within the interquartile range.

90
Q

What is standard deviation?

A

A measure of variability that expresses how much the members of a group differ from the mean value for the group.

91
Q

How do you calculate the standard deviation?

A

SD = √( ∑(xi - x̄)^2 / (n - 1) )

∑(xi - x̄)^2 = subtract the mean (x̄) from each individual observation (xi), square this difference, then sum over all observations.

n = total number of observations

92
Q

What is standard variance?

A

Variance is equal to the standard deviation squared (i.e. the value obtained in the standard deviation calculation before you take the square root).

See the formula for standard deviation in the previous card.

93
Q

What is the standard deviation of the data below?

1.2
1.3
1.4
1.5
2.1

A

Mean = 1.5

(xi -x) = -0.3
(xi -x) = -0.2
(xi -x) = -0.1
(xi -x) = 0.0
(xi -x) = 0.6

(xi -x)^2 = 0.09
(xi -x)^2 = 0.04
(xi -x)^2 = 0.01
(xi -x)^2 = 0.00
(xi -x)^2 = 0.36

Sum of (xi -x)^2 = 0.5

n = 5

Variance = 0.50/(5-1) = 0.125 kg²

Standard deviation = √(0.125) = 0.35 kg
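The same calculation as a minimal Python sketch (an illustration, not part of the original card):

import math

data = [1.2, 1.3, 1.4, 1.5, 2.1]
n = len(data)
mean = sum(data) / n                              # 1.5

sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 0.50
variance = sum_sq_dev / (n - 1)                   # 0.125
sd = math.sqrt(variance)                          # 0.35

print(f"Mean {mean}, variance {variance}, SD {sd:.2f}")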

94
Q

What is the reference interval?

A

The range within which 95% of observations are expected to lie, i.e. approximately two standard deviations either side of the mean.

95
Q

Which measure of dispersion is best for highly skewed data?

A

IQR or range should be used

Standard deviations should not be used for highly skewed data, such as counts or bounded data, since they do not give a meaningful measure of the variation.

96
Q

What are the different graphical methods you can use to display epidemiological data?

A

Continuous Data:
Dot plots
Histograms
Box-whisker plots
Scatter plots

Categorical Data:
Bar charts
Pie charts

97
Q

What are dot plots?

A

A dot plot or dot chart consists of data points plotted on a graph.

98
Q

What are the advantages and disadvantages of a dot plot?

A

Advantage
Retains the individual subject values
Clearly demonstrates differences between the groups
Outliers clearly detected

Disadvantages
Not usually practical with large numbers

99
Q

What is a histogram?

A

A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.

100
Q

What type of data can be displayed in a histogram?

A

Continuous

101
Q

What are Relative frequency histograms and what are they used for?

A

Histograms where the y-axis shows the proportion of the observations in each bin rather than an absolute number.

These allow comparison between histograms made up of different numbers of observations which may be useful when studies are compared.

102
Q

Why is the number of intervals (or bins) in a histogram important?

A

Too few intervals and much important information may be smoothed out
Too many intervals and the underlying shape will be obscured by a mass of confusing detail.

103
Q

What are the advantages and disadvantages of histograms?

A

Advantages:
Allows you to visualise the shape of the frequency distribution
Demonstrates central tendency

Disadvantages:
Exact values cannot be determined.

104
Q

What is a box and whisker plot?

A

A way of representing statistical data on a plot in which a rectangle is drawn to represent the second and third quartiles, usually with a vertical line inside to indicate the median value. The lower and upper quartiles are shown as horizontal lines on either side of the rectangle.

105
Q

What do the Whiskers on a box and whisker plot represent?

A

The ‘whiskers’ in the diagram indicate the minimum and maximum values of the variable under consideration.

106
Q

What does the central horizontal line within a box and whisker plot represent?

A

The median

107
Q

What do the horizontal lines forming the edge of the box in a box and whisker plot represent?

A

The upper and lower quartile.

108
Q

What data is available in a box and whisker plot?

A

Median
Upper and lower Quartile
The maximum value
The lowest value
Range
Interquartile Range

109
Q

How do you determine if data is skewed when reviewing a box and whisker plot?

A

Any skew in the data will be apparent, as determined from the location of the median in relation to the lower and upper quartiles.

When the median is closer to the bottom of the box and the whisker is shorter on the lower end of the box, the distribution is right-skewed (or “positively” skewed).

110
Q

What are the advantages of a box and whisker plot?

A

Good at comparing two or more groups.

111
Q

How can outliers be further identified on a box and whisker plot?

A

A variation of the box and whisker plot restricts the length of the whiskers to a maximum of 1.5 times the interquartile range.

That is, the whisker reaches the value that is the furthest from the centre while still being inside a distance of 1.5 times the interquartile range from the lower or upper quartile.

Data points that are outside this interval are represented as points on the graph and considered potential outliers.

For example in the elective delivery category in the example, the two outliers of 8 and 48 fall outside of this 1.5 IQR and so are simply highlighted.

112
Q

What are scatterplots?

A

A graph in which the values of two variables are plotted along two axes.

If one variable x could be the cause of another variable y, then it is conventional to plot the x variable on the horizontal axis and the y variable on the vertical axis.

113
Q

When are scatterplots used?

A

When comparing two continuous variables.

114
Q

What are the advantages and disadvantages of a scatterplot?

A

Advantages:
Can demonstrate associations between two variables
Retain the exact data values (including minimum and maximum values)
Make outliers apparent

Disadvantages:
It can be hard to visualise individual results where data sets are very large
Weak relationships may not be apparent.

115
Q

What is a bar chart?

A

A diagram in which the numerical values of variables are represented by the height or length of lines or rectangles of equal width.

In a bar chart, there is a gap between each bar. This is unlike a histogram, where there are no gaps between the bars, reflecting the continuous nature of the underlying variable.

116
Q

What type of data can be displayed on a bar chart?

A

Discrete/Categorical

117
Q

What is a pie chart?

A

A type of graph in which a circle is divided into sectors that each represent a proportion of the whole.

Conventionally, the categories in a pie chart are ordered clockwise from the largest slice to the smallest, starting at the 12 o’clock position.

The relative frequency (given as a percentage) should be given for each category

118
Q

What type of data can be displayed on a pie chart?

A

Discrete/Categorical

118
Q

What are the advantages and disadvantages of a pie chart?

A

In general, pie charts are best avoided.

Advantages:
Not really any

Disadvantages:
Not good with a large number of categories ( >5)
Proportions can be difficult to estimate visually

119
Q

How to choose the right statistical test?

A

Rule 1) If there is no hypothesis, there is no statistical test. For example, in a prevalence study there is no hypothesis to test, and the size of the study is determined by how accurately the investigator wants to determine the prevalence.

Rule 2) Analysis should reflect the design, and so a matched design should be followed by a matched analysis.

Rule 3) Results measured over time require special care. One of the most common mistakes in statistical analysis is to treat correlated variables as if they were independent. For example, suppose we were looking at the treatment of leg ulcers, in which some people had an ulcer on each leg. We might have 20 subjects with 30 ulcers, but the number of independent pieces of information is 20, because the state of the ulcers on each leg may be influenced by the overall health of the person, and an analysis that treated the ulcers as independent observations would be incorrect. For a correct analysis of mixed paired and unpaired data, consult a statistician.

120
Q

What statistical test should you choose for paired or matched data?

A
121
Q

What statistical test should you choose for independent observations?

A

a = If data are censored

b = The Kruskal-Wallis test is used for comparing ordinal or non-Normal variables for more than two groups, and is a generalisation of the Mann-Whitney U test

c = Analysis of variance is a general technique, and one version (one-way analysis of variance) is used to compare Normally distributed variables for more than two groups, and is the parametric equivalent of the Kruskal-Wallis test

d = If the outcome variable is the dependent variable, then provided the residuals (the differences between the observed values and the predicted responses from regression) are plausibly Normally distributed, then the distribution of the independent variable is not important.

e = There are a number of more advanced techniques, such as Poisson regression, for dealing with these situations. However, they require certain assumptions and it is often easier to either dichotomise the outcome variable or treat it as continuous.

122
Q

What are parametric and non-parametric tests?

A

Parametric and non-parametric tests are both ways of categorising different statistical tests.

Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. This is often the assumption that the population data are normally distributed.

Non-parametric tests are “distribution-free” and, as such, can be used for non-Normal variables.

123
Q

What tests are parametric?

A

Paired t-test
Unpaired t-test
Pearson's correlation
One-way analysis of variance

124
Q

What tests are non-parametric?

A

Wilcoxon rank sum test
Mann-Whitney U test
Spearman's correlation
Kruskal-Wallis test

125
Q

If non-parametric tests can be used on any data set, and parametric tests cannot, why do we not use non-parametric tests for everything?

A

It would seem prudent to use non-parametric tests in all cases, which would save one the bother of testing for Normality. Parametric tests are preferred, however, for the following reasons:

  1. We are rarely interested in a significance test alone; we would like to say something about the population from which the samples came, and this is best done with estimates of parameters and confidence intervals.
  2. It is difficult to do flexible modelling with non-parametric tests, for example allowing for confounding factors using multiple regression.
  3. Parametric tests usually have more statistical power than their non-parametric equivalents. In other words, one is more likely to detect significant differences when they truly exist.
126
Q

What is a correlation coefficient, which tests produce a correlation coefficient and how do you interpret it?

A

The two main statistical tests that produce a correlation coefficient are:
Pearson's correlation (when using the observed data)
Spearman's rank correlation (when using the ranks of the data)

A correlation coefficient is used to measure the strength of linear association between two continuous variables, i.e. the closeness with which points lie along the regression line. The correlation coefficient (r) lies between -1 and +1 (inclusive).

If r = 1 or -1, there is a perfect positive (1) or negative (-1) linear relationship. If r = 0, there is no linear relationship between the two variables.

127
Q

What is “r” and how is it interpreted?

A

The correlation coefficient

128
Q

How is the correlation coefficient interpreted?

A

0.8 ≤ |r| ≤ 1.0 => very strong relationship
0.6 ≤ |r| < 0.8 => strong relationship
0.4 ≤ |r| < 0.6 => moderate relationship
0.2 ≤ |r| < 0.4 => weak relationship
0.0 ≤ |r| < 0.2 => very weak relationship

Correlation only measures linear association. A U-shaped relationship may have a correlation of zero.

It is symmetric about the variables x and y - the correlation of (x and y) is the same as the correlation of (y and x).

A significant correlation between two variables does not necessarily mean they are causally related.

For large samples very weak relationships can be detected.
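As an illustration only (not part of the original card), a minimal Python sketch of Pearson's correlation coefficient; the age/blood-pressure data are hypothetical:

import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sx * sy)

age = [30, 40, 50, 60, 70]           # hypothetical data
sbp = [118, 125, 130, 138, 145]
print(f"r = {pearson_r(age, sbp):.2f}")   # close to +1: very strong positive relationship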

129
Q

What is simple linear regression and what is its output?

A

Simple linear regression is a statistical method used to describe the relationship between two variables where one variable (the dependent variable, denoted by y) is expected to change as the other one (independent, explanatory or predictor variable, denoted by x) changes.

This technique fits a straight line to data, where this so-called “regression line” has an equation of the form:

  y = a + bx
  a = constant (y intercept)
  b = gradient (regression coefficient)

The output is a regression coefficient, as well as a standard error and confidence interval. From this, one can test the statistical significance of b. In this case, the null hypothesis is that b = 0, i.e. that the variation in y is not predicted by x.

The regression coefficient b tells us that for every 1 unit change in x (explanatory variable) y (the response variable) changes by an average of b units.

Note that the constant value a gives the predicted value of y when x = 0.

130
Q

How does simple linear regression work?

A

The technique fits a straight line to data, where this so-called “regression line” has an equation of the form:

y = a + bx

a = constant (y intercept)
b = gradient (regression coefficient)

It does this by the method of least squares.
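A minimal Python sketch of a least-squares fit (an illustration, not part of the original card; the height/lung-capacity data are hypothetical):

def least_squares_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

height = [150, 160, 170, 180, 190]     # hypothetical explanatory variable (cm)
capacity = [2.8, 3.2, 3.6, 4.1, 4.4]   # hypothetical response variable (litres)
a, b = least_squares_fit(height, capacity)
print(f"y = {a:.2f} + {b:.3f}x")   # b = average change in y per 1 cm change in x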

131
Q

How is the regression coefficient (b) interpreted?

A

The relationship is assumed to be linear, which means that as x increases by a unit amount, y increases by a fixed amount, irrespective of the initial value of x.

The variability of the error is assumed not to vary with x (homoscedasticity).

Unlike correlation, the relationship is not symmetric, so one would get a different equation if one exchanged the dependent and independent variables, unless all the observations fell on the perfect straight line y = x.

The significance test for b yields the same P value as the significance test for the correlation coefficient r.

A statistically significant regression coefficient does not imply a causal relationship between y and x.

132
Q

What is multiple linear regression?

A

Multiple regression allows many explanatory variables to be assessed simultaneously, with one response variable. The main use of multiple regression is to adjust for confounding.

In multiple linear regression, the dependent variable y is assumed to be continuous, and the explanatory x variables may each be continuous or binary.

Multiple linear regression, with k explanatory variables, gives an equation of the form:

      y = a + b1x1 + b2x2 + … + bkxk

When x1 is a categorical variable, such as a treatment group, and x2 is a continuous variable, such as age (a potential confounder), this is known as an analysis of covariance.

133
Q

What is logistic regression?

A

Logistic regression is used when the outcome variable is binary, being either an event (e.g. death or cure) or no event (e.g. survival or not cured). The input variables can be either binary or continuous.

In the simplest case when there is one input variable which is binary, then it gives the same result as a chi-squared test.

The logistic regression equation is as follows:

logit(p)=a+b1X1+b2X2+…+bkXk

134
Q

How do you interpret the following results in relation to the role of sex?

Lavie et al. (BMJ, 2000) surveyed 2677 adults referred to a sleep clinic with suspected sleep apnoea. They developed an apnoea severity index and related this to the presence or absence of hypertension.

The results are given in Table 1.

A

The coefficient associated with the dummy variable Sex is 0.161, so the odds of having hypertension for a man are e^0.161 = 1.17 times those of a woman in this study.

On the odds ratio scale the 95% confidence interval is e^-0.061 to e^0.383 = 0.94 to 1.47. Note that this includes one (as we would expect, since the confidence interval for the regression coefficient includes zero) and so we cannot say that sex is a significant predictor of hypertension in this study.
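A minimal Python sketch of converting the coefficient and its confidence interval to the odds ratio scale (an illustration, not part of the original card):

import math

coef = 0.161                           # regression coefficient for Sex (male vs female)
ci_lower, ci_upper = -0.061, 0.383     # 95% CI for the coefficient (log-odds scale)

odds_ratio = math.exp(coef)                          # ~1.17
or_ci = (math.exp(ci_lower), math.exp(ci_upper))     # ~(0.94, 1.47)

print(f"OR = {odds_ratio:.2f}, 95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f}")
# The CI includes 1, so sex is not a significant predictor here.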

135
Q

Using the below results, how much more likely is a 40-year-old man to have hypertension than a 30-year-old woman?

A

Factors that are additive on the log scale are multiplicative on the odds scale. Thus a man who is ten years older than a woman is predicted to be 2.24×1.17=2.62 times more likely to have hypertension.

Thus the model assumes that age and sex act independently on hypertension, and so the risks multiply.

136
Q

What are life tables?

A

Life tables (or actuarial tables) show survival patterns for groups of individuals. Specifically, they give the probability that a person in a particular age group will die before reaching the next age group. The two types of life table are:

  1. Cohort life tables
    These show the probability of death at each age group in a described group of individuals that has been followed over time. Cohort life tables are frequently used for survival analyses.
  2. Period life tables
    These give the current probability of death in given population at different ages. Period life tables are often used in demography.
137
Q

How are life tables calculated?

A

To construct a cohort life table, in each time interval the following data are required: the number of people alive at the beginning of the time interval; the number of deaths occurring within the time interval; and the number of censored individuals (e.g. lost to follow-up).

If we assume that on average censoring occurs at the mid-point of the time interval, the number of people at risk in any given time interval is given by the number alive at the start minus half of the number of those censored.

The risk of death (and, hence, the risk of survival) can then be calculated from the number of deaths occurring during this interval.
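A minimal Python sketch of one life-table interval (an illustration, not part of the original card; the interval numbers are hypothetical):

def interval_survival(alive_at_start, deaths, censored):
    """Risk of death and survival in one life-table interval,
    assuming censoring happens on average at the interval mid-point."""
    at_risk = alive_at_start - censored / 2
    risk_of_death = deaths / at_risk
    return risk_of_death, 1 - risk_of_death

# hypothetical interval: 200 alive at the start, 12 deaths, 10 censored
risk, survival = interval_survival(200, 12, 10)
print(f"Risk of death {risk:.3f}, survival {survival:.3f}")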

138
Q

What is survival data analysis?

A

Survival analysis is a branch of statistics for analysing the expected duration of time until an event occurs, such as death.

Survival data are times from a particular point to either an event or a censoring point.

This event point does not have to be death, for example, it could also be discharge or disease recurrence.

The censoring point is often termed disease-free survival, where the event in question (such as death or discharge from the hospital) has not happened.

An example of survival data might be the time from diagnosis of a disease to death.

139
Q

What is a censored observation in relation to survival data?

A

Survival data are times from a particular point to either an event or a censoring point. The censoring point is often termed disease-free survival, where the event in question (such as death or discharge from the hospital) has not happened.

Censored observations are observations for this censoring point.

Censored observations occur in two main ways:
1. Before the study completes, a subject may withdraw, or be lost to follow-up.
2. On completion of the study, subjects who have not yet experienced an event.

An important assumption in survival analysis is that the censoring is uninformative. What this means is that the probability of being censored is unrelated to the probability of having an event. For example, if terminally ill people are transferred to a hospice, and where they are then lost to follow-up, this would be informative censoring.

140
Q

How is survival data presented/displayed?

A

The best way to display survival data is a Kaplan-Meier survival curve. This has the probability of survival on the vertical axis, and time on the horizontal axis.

141
Q

What is a Kaplan-Meier survival curve and what value does it produce?

A

The best way to display survival data is a Kaplan-Meier survival curve.

This has the probability of survival on the vertical axis and time on the horizontal axis.

Every time an event occurs, the survival curve is re-calculated by first dividing the number of events that have occurred by the number of people remaining at risk at the time.

This value is then used to calculate the probability of survival.
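A minimal Python sketch of the Kaplan-Meier recalculation described above (an illustration, not part of the original card; the follow-up data are hypothetical):

def kaplan_meier(times, events):
    """Kaplan-Meier estimate from follow-up times and event flags
    (1 = event, 0 = censored). Returns (time, survival probability) pairs."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = [(0, 1.0)]
    for i in order:
        if events[i]:                   # the curve is recalculated only at event times
            survival *= 1 - 1 / at_risk
            curve.append((times[i], survival))
        at_risk -= 1                    # the subject leaves the risk set
    return curve

times = [2, 3, 3, 5, 8, 10, 12]         # hypothetical follow-up (months)
events = [1, 1, 0, 1, 0, 1, 0]          # 0 marks a censored observation
for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: S(t) = {s:.2f}")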

142
Q

What is the difference between a Kaplan-Meier Survival curve and an actuarial survival curve?

A

An actuarial survival curve calculates survival at fixed points in time, such as annually, whereas a Kaplan-Meier survival curve recalculates the rate every time there is a new event (such as a death).

143
Q

What is a hazard ratio?

A

A measure of how often a particular event happens in one group compared to how often it happens in another group, over time.

Example:
A cohort of 100 men followed up until they are all dead. By the age of 70, 25 had died, and in the following five years, 10 more died.

The risk of dying between 70 and 75 is 10/100 or 0.1.

The hazard of dying is conditional on a man living until he is 70, and so is 10/75=0.13.

144
Q

What is the difference between a risk ratio and a hazard ratio.

A

A risk ratio compares the cumulative risk of the event (events divided by the number at the start) between two groups over the whole follow-up period. A hazard ratio compares the instantaneous event rates, conditional on having survived event-free up to that point, and so takes the timing of events into account. When the event is rare the two are numerically similar.

145
Q

What is a Mann-Whitney U test?

A

The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.

146
Q

What is a Modified Wilcoxon test and what result does it produce?

A

A function of the Wilcoxon rank sum statistic, proposed for testing the equality of the marginal distributions when sampling from a bivariate population. It is a nonparametric test and is appropriate to use when the data are right-skewed and censored (technically, the censoring must be non-informative).

It is a test of significance

147
Q

What is the log-rank test and what result does it produce?

A

The log-rank test is a hypothesis test used to compare the survival distributions of two samples. It is a nonparametric test and is appropriate to use when the data are right-skewed and censored (technically, the censoring must be non-informative).

It is a test of significance

148
Q

What is the difference between the log-rank test and the Modified Wilcoxon test

A

They differ in the weight they give to events occurring early (modified Wilcoxon) or late (log-rank) in the follow-up period.

149
Q

What is Cox regression?

A

The most commonly used model to analyse survival data is the Cox proportional hazards model. This models the log hazard ratio against a linear predictor of explanatory variables.

Similar to other multiple regression techniques, it allows for multiple exposure variables, allowing adjustment for confounding.

It is a semi-parametric model, which means that there is no requirement to parameterise the underlying survival distribution, but that the explanatory variables are included in a parametric model.

The assumption of proportional hazards means that, in the two group case, the hazard in one group remains proportional to the hazard in the other group over the follow-up time, or equivalently that the relative hazard remains constant.

See an example in the image below.

150
Q

What does the below Cox regression tell us about the relationship between mortality and slate dust exposure?

A

One can see that there is a 24% increased risk of death over the follow-up period in those exposed to slate dust.

Need to check this, don’t understand it.
The right half of the table shows the regression coefficients when smoking history is included in the analysis. It can be seen that the risk of slate dust is unaffected by smoking history.

151
Q

What are the assumptions of Cox regression?

A

The main assumption is that this risk is constant over the follow-up period (the proportional hazard assumption).

153
Q

What is Heterogeneity?

A

Heterogeneity relates to differences between studies: differences, for example, in study design, in the populations studied and in the interventions given. Heterogeneity must be considered before deriving conclusions based on systematic reviews, and its presence may impede the pooling of results from different studies.

There are three broad types of heterogeneity: clinical, methodological and statistical.

154
Q

What are the different types of heterogeneity?

A

Clinical heterogeneity:
Differences in the specific research question that was studied, such as differences in the eligible populations, in the interventions and controls, and in the outcome measures.

Methodological heterogeneity:
Describes a variability in study design and in the risk of bias. This can include differences in the interventions given, and in how the outcomes were defined and measured, as well as variations in the use of blinding and allocation concealment. Such methodological heterogeneity may result in different studies actually measuring slightly different things.

Statistical heterogeneity:
Refers to variability in the “true” intervention effects in different studies, and it arises as a consequence of clinical and/or methodological heterogeneity. It results in a variation in effect sizes that are larger than can be expected by chance.

155
Q

How is statistical heterogeneity measured?

A

There are two methods for assessing statistical heterogeneity:

Cochran’s Q statistic (a form of chi-squared test of the null hypothesis that the true effect in all included studies are the same)

I^2 test (which uses Cochran’s Q statistic to give a percentage score for heterogeneity, with higher percentages indicating greater heterogeneity).

156
Q

What is Cochran’s Q test and what does it produce?

A

This is calculated as the weighted sum of squared differences between the effects from individual studies and the pooled effects from all included studies.

The Q statistic has a chi-square distribution with (k-1) degrees of freedom, where k is the number of included studies.

The resulting Q statistic can be used to generate a p value for the null hypothesis of no heterogeneity.

157
Q

What is the I-Squared Test?

A

The I^2 statistic estimates the proportion of variation across included studies that is secondary to heterogeneity (rather than chance). It is calculated using the Q statistic, as follows:

I^2 (%) = ((Q − df)/Q) × 100

Where df is the degrees of freedom (the number of studies minus 1).
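A minimal Python sketch of Cochran's Q and I^2 (an illustration, not part of the original card; the study effect sizes and standard errors are hypothetical):

def heterogeneity(effects, standard_errors):
    """Cochran's Q (weighted sum of squared differences from the pooled
    effect, with inverse-variance weights) and the I-squared percentage."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared

# hypothetical log odds ratios and standard errors from four studies
q, i2 = heterogeneity([0.20, 0.35, 0.10, 0.50], [0.10, 0.15, 0.12, 0.20])
print(f"Q = {q:.2f}, I^2 = {i2:.0f}%")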

158
Q

When is Cochran’s Q less effective at detecting heterogeneity?

A

Cochran’s Q has a low power to detect heterogeneity when the number of studies is small (e.g. < 20), as is the case with most meta-analyses.

To compensate for this, a higher significance level may be used to determine statistical significance (e.g. p < 0.10).

159
Q

How do you interpret the I-squared value?

A

An I^2 of zero means that all the variability in effect sizes seen is due to sampling error and not heterogeneity.

An I2 value of above 30% may represent at least moderate heterogeneity, but this result needs to be interpreted in context of the actual clinical or methodological features that may have led to the heterogeneity.

160
Q

What is a funnel plot?

A

A funnel plot is a specific type of scatterplot, plotting the intervention effect sizes from different studies (on the x-axis) against some measure of the study size or precision (e.g. the inverse of standard error, on the y-axis). It is used to visualise the presence or absence of publication bias.

Figure 1. Hypothetical funnel plot showing the estimated effect size from studies with various sample sizes. The dashed lines (the funnel) indicate the region where 95% of studies would be expected to lie if there were no heterogeneity. If there were no publication bias, one would expect some smaller studies’ results to occupy the empty region bounded by the circle.

161
Q

How do you interpret a funnel plot?

A

Because the precision of the estimate of the effect size increases with the size of the study, the smaller studies will have more widely scattered effect sizes towards the bottom of the scatterplot, and this variability will reduce as the study sizes increase.

The premise is that publication bias will result in smaller studies with non-significant outcomes not being published. If publication bias is present it will result in an asymmetric appearance of the funnel plot, with a unilateral gap towards the bottom of the funnel where the results of the small, negative, unpublished studies should have been (Circled on Figure 1). Where publication bias has occurred, a subsequent meta-analysis will result in an overestimation of the true treatment effect.

162
Q

What is the Crude Death Rate (CDR)?

A

Crude death rate = Number of deaths in a population in a year (x1000)/ mid year population size

163
Q

What is Infant Mortality Rate (IMR)?

A

I.M.R = no of deaths in babies aged 0-1 in a year (x 1000)/no. of live births in same year

164
Q

What is Maternal Mortality Rate (MMR)?

A

M.M.R = no. of maternal deaths in a year (x 100,000)/no. of live births in year

165
Q

What is Crude Birth Rate (CBR)?

A

C.B.R = Births in year (x 1000)/Population at mid-year

166
Q

What is the General fertility rate?

A

GFR = births in a year (x 1000)/women aged 15-44 (or 15-49) at mid-year

167
Q

What is Age Specific Fertility Rate (A.S.F.R)

A

A.S.F.R = no. of births to women age (x) in a year/ no. of women age (x) at mid-year

168
Q

What is Total Fertility Rate (TFR)

A

TFR = (sum of the age-specific fertility rates x 5)/1000

169
Q

What is the child/woman ratio

A

CWR = children 0-4/Women 15-44 in population

170
Q

Add terms from the glossary once finalised

A

Needs to be done

https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/glossary