Tutorial 1 - Tools of the trade: understanding and interpreting the findings commonly reported in papers Flashcards
Why is it not practical to measure the whole population?
Measuring every individual is time-consuming and expensive, and often virtually impossible.
How do we get around not being able to measure the entire population?
We take samples of the population.
We then interpret the results and generalise them to draw conclusions about the whole population.
What does statistics help us to overcome?
Sources of variation
Can one sample be interpreted on its own?
No
We need to use statistical tools to generalise from the sample to the whole population.
Describe confidence intervals
A primary care trust (PCT) estimates its population's smoking prevalence as 28% from a sample, but there is some uncertainty around this estimate. We express this uncertainty using a 95% confidence interval (95% CI) around the estimate, e.g. 19% to 37%. This means that if we repeated the sampling 100 times and calculated a 95% CI from each sample, we would expect about 95 of the 100 intervals to contain the true prevalence of smoking in the PCT.
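A minimal Python sketch of the repeated-sampling interpretation, assuming a simple normal-approximation (Wald) interval and a hypothetical sample of 28 smokers out of 100 people (the card gives no sample size; n = 100 is an assumption). The hypothetical `coverage` helper repeats the sampling many times and counts how often the interval contains the true prevalence.

```python
import math
import random


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se


def coverage(true_p: float, n: int, repeats: int = 2000, seed: int = 1) -> float:
    """Repeat the sampling many times and return the fraction of the
    resulting 95% CIs that contain the true prevalence (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n people, each a smoker with probability true_p.
        smokers = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(smokers, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repeats


# Hypothetical sample: 28 smokers out of 100 people (n = 100 is assumed).
low, high = wald_ci(28, 100)
print(f"28% (95% CI {low:.1%} to {high:.1%})")  # prints: 28% (95% CI 19.2% to 36.8%)

# Fraction of repeated-sample intervals containing a true prevalence of 28%.
print(coverage(0.28, 100))
```

The coverage fraction comes out close to 0.95, which is exactly what "95% confidence" refers to. The Wald interval is only one of several ways to build a CI for a proportion and can be inaccurate for small samples or prevalences near 0% or 100%.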
How do we assess whether a difference is due to chance (sampling error) or to a real difference in prevalence?
This is done statistically by setting up a null hypothesis of no difference and looking for evidence against it: what is the probability of observing a difference at least as large as that between our two sample estimates (28% and 21%) if the two true underlying prevalences were the same? We then choose the appropriate statistical test (e.g. the chi-squared test to compare two proportions) to obtain this probability, which is the P value. The lower the P value, the stronger the evidence against the null hypothesis, i.e. the less plausible it is that our estimated difference is a chance finding. Suppose the P value was 0.014. By convention, if P < 0.05 (an arbitrary cut-off!) we reject the null hypothesis and conclude that smoking prevalence fell after the campaign. Such a result is called statistically significant.
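The definition of the P value can be illustrated by simulation: generate both samples many times under the null hypothesis (one shared prevalence) and count how often the difference is at least as large as the one observed. This is a sketch of a simulation approach, not the chi-squared test itself, and the group sizes (140 of 500 smokers before, 105 of 500 after) are hypothetical counts chosen to match 28% and 21%.

```python
import random


def simulated_p_value(x1: int, n1: int, x2: int, n2: int,
                      repeats: int = 2000, seed: int = 1) -> float:
    """Two-sided P value estimated by simulating both samples under the
    null hypothesis that they share one common prevalence (hypothetical helper)."""
    rng = random.Random(seed)
    pooled = (x1 + x2) / (n1 + n2)       # common prevalence assumed under H0
    observed = abs(x1 / n1 - x2 / n2)    # observed difference in prevalence
    extreme = 0
    for _ in range(repeats):
        s1 = sum(rng.random() < pooled for _ in range(n1))
        s2 = sum(rng.random() < pooled for _ in range(n2))
        # Small tolerance so floating-point rounding does not miss ties.
        if abs(s1 / n1 - s2 / n2) >= observed - 1e-12:
            extreme += 1
    return extreme / repeats


# Hypothetical samples: 140/500 smokers before the campaign, 105/500 after.
p = simulated_p_value(140, 500, 105, 500)
print(p)
```

With these made-up counts the simulated P value is roughly 0.01, well below the conventional 0.05 cut-off; different sample sizes would give a different P value for the same 28% vs 21% difference.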
Are statistically significant results more or less likely with small sample sizes than with large ones?
Less likely with small samples. The larger the sample size, the more information we have, so uncertainty is reduced and a real difference of a given size is more likely to be statistically significant.
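A short Python sketch showing the same 28% vs 21% difference tested with different (hypothetical) group sizes, assuming equal-sized groups and a normal-approximation test for two proportions (equivalent to a 2×2 chi-squared test without continuity correction):

```python
import math


def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided P value for a difference between two proportions, each
    estimated from n people (normal approximation, hypothetical helper)."""
    pooled = (p1 + p2) / 2                     # valid because both groups have size n
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))         # two-sided tail area of N(0, 1)


# Same observed difference, three hypothetical group sizes.
for n in (100, 500, 1000):
    print(n, round(two_proportion_p_value(0.28, 0.21, n), 4))
```

Only the sample size changes between the three lines, yet the P value drops from well above 0.05 to well below it.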
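What is the formula for the chi-squared test?
χ² = Σ (O − E)² / E, where the sum is taken over every cell of the table, O is the observed count and E is the expected count if the null hypothesis were true (for each cell, E = row total × column total / grand total).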
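The formula translates directly into code. This is a minimal sketch applying it to a hypothetical 2×2 table (smokers and non-smokers before and after the campaign; the counts are made up), with the P value for 1 degree of freedom obtained from the standard normal distribution, since a chi-squared variable with 1 degree of freedom is the square of a standard normal one.

```python
import math


def chi_squared(table: list[list[float]]) -> float:
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells,
    where E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat: float) -> float:
    """Upper-tail P value of a chi-squared statistic with 1 degree of
    freedom (the case for a 2x2 table)."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = before/after campaign, columns = smokers/non-smokers.
table = [[140, 360],
         [105, 395]]
stat = chi_squared(table)
print(round(stat, 2), round(p_value_1df(stat), 3))
```

Statistical packages often apply Yates' continuity correction to 2×2 tables by default, which gives a slightly smaller statistic and larger P value than this uncorrected version.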
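How do we calculate the odds ratio?
The odds of exposure in a group is the number of people who have been exposed divided by the number of people who have not been exposed. The odds ratio is the odds of exposure in the cases divided by the odds of exposure in the controls. With a = exposed cases, b = unexposed cases, c = exposed controls and d = unexposed controls, OR = (a/b) / (c/d) = ad/bc.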
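A Python sketch of the calculation from a 2×2 case-control table. The counts are hypothetical, and the 95% CI uses Woolf's log method, which is an addition here rather than something the card specifies.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio of exposure with a 95% CI on the log scale (Woolf's method).

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    All four counts must be non-zero for this simple version.
    """
    or_ = (a / b) / (c / d)                        # equivalently ad / bc
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - 1.96 * se_log)
    high = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, low, high


# Hypothetical counts: 15 of 100 cases and 40 of 200 controls were exposed.
or_, low, high = odds_ratio(15, 85, 40, 160)
print(f"OR = {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Here the odds ratio is below 1, but the interval includes 1, so these made-up data would not provide clear evidence of an association.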
Which type of study can be used to calculate an odds ratio?
The odds ratio is the measure used in case-control studies. The relative risk can be calculated from cohort studies, since the incidence of disease in the exposed and non-exposed is known. In case-control studies, however, subjects are selected on the basis of their disease status, not on the basis of exposure: a sample of subjects with a particular disease (the cases) and a sample of subjects without that disease (the controls). Therefore, it is not possible to calculate the incidence of disease in the exposed and non-exposed individuals. It is, however, possible to calculate the odds of exposure. The odds ratio (of exposure) is the ratio between two odds, e.g. the odds of exposure in the cases divided by the odds of exposure in the controls.
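A Python sketch of why case-control sampling allows an odds ratio but not an incidence. It starts from a hypothetical cohort (all counts are made up), then takes every case plus an arbitrary 10% of the non-cases as controls. The odds ratio of exposure is unchanged by that sampling fraction, while a naive "risk ratio" computed from the case-control sample is not.

```python
# Hypothetical cohort (all numbers assumed for illustration).
exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 20, 2000
exposed_noncases = exposed_total - exposed_cases        # 970
unexposed_noncases = unexposed_total - unexposed_cases  # 1980

# True relative risk, available because the whole cohort is observed.
true_rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Case-control sample: every case, plus 10% of the non-cases as controls
# (expected numbers of controls, so no randomness is involved).
fraction = 0.10
exposed_controls = exposed_noncases * fraction
unexposed_controls = unexposed_noncases * fraction

# Odds of exposure in cases divided by odds of exposure in controls.
or_case_control = (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)
# The same odds ratio computed using every non-case in the cohort.
or_cohort = (exposed_cases / unexposed_cases) / (exposed_noncases / unexposed_noncases)

# A naive "risk" from the case-control sample depends on the arbitrary fraction.
naive_rr = (exposed_cases / (exposed_cases + exposed_controls)) / (
    unexposed_cases / (unexposed_cases + unexposed_controls)
)

print(round(true_rr, 3), round(or_cohort, 3), round(or_case_control, 3), round(naive_rr, 3))
```

Changing `fraction` leaves `or_case_control` unchanged but moves `naive_rr` around, which is why case-control studies report the odds ratio.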
Explain how an odds ratio may be a good estimate of the relative risk
The odds ratio is the measure reported in case-control studies instead of the relative risk. It can be shown mathematically that, provided the disease is rare in the population, the odds ratio of exposure is a good estimate of the relative risk. An odds ratio of 1 tells us that exposure is no more likely in the cases than in the controls (which implies that exposure has no effect on case/control status). An odds ratio greater than 1 tells us that exposure is more likely in the case group (which implies that exposure might increase the risk of the disease). An odds ratio less than 1 tells us that exposure is less likely in the case group (which implies that exposure might have a protective effect).
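A small Python check of the rare-disease condition, using two hypothetical cohorts (made-up counts): when the disease is rare, the odds ratio is close to the relative risk, and when it is common, the odds ratio moves noticeably further from 1 than the relative risk.

```python
def rr_and_or(exp_cases: int, exp_total: int,
              unexp_cases: int, unexp_total: int) -> tuple[float, float]:
    """Relative risk and odds ratio from a cohort's 2x2 table (hypothetical helper)."""
    rr = (exp_cases / exp_total) / (unexp_cases / unexp_total)
    odds_exp = exp_cases / (exp_total - exp_cases)
    odds_unexp = unexp_cases / (unexp_total - unexp_cases)
    return rr, odds_exp / odds_unexp


# Hypothetical rare disease (3% vs 1% risk): OR is close to RR.
print(rr_and_or(30, 1000, 10, 1000))
# Hypothetical common disease (40% vs 20% risk): OR overstates RR.
print(rr_and_or(400, 1000, 200, 1000))
```

In the first case the relative risk is 3.0 and the odds ratio is about 3.06; in the second the relative risk is 2.0 but the odds ratio is about 2.67.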
How do we calculate relative risk?
The relative risk is used as a measure of association between an exposure and a disease. It is the ratio of the incidence rate in the exposed group to the incidence rate in the non-exposed group.
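A Python sketch of the calculation for a hypothetical cohort (made-up counts), using cumulative incidence (cases divided by people at risk) in each group. The 95% CI uses the standard log method, which goes beyond what the card states.

```python
import math


def relative_risk(a: int, n1: int, c: int, n0: int) -> tuple[float, float, float]:
    """Relative risk with a 95% CI on the log scale.

    a = cases among the n1 exposed people,
    c = cases among the n0 unexposed people.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    low = math.exp(math.log(rr) - 1.96 * se_log)
    high = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, low, high


# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the disease.
rr, low, high = relative_risk(30, 1000, 10, 1000)
print(f"RR = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because the whole interval lies above 1.0, this made-up exposure would be associated with an increased risk of disease.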
How do we interpret relative risk?
A value of 1.0 indicates that the incidence of disease in the exposed and the unexposed is identical, so the data show no association between the exposure and the disease. A value greater than 1.0 indicates a positive association, or an increased risk among those exposed to a factor. Similarly, a relative risk less than 1.0 means there is an inverse association, or a decreased risk among those exposed, i.e. the exposure is protective.
Describe attributable risk and attributable fraction
The attributable risk for lung cancer in smokers is the rate of lung cancer among smokers minus the rate of lung cancer among non-smokers (i.e. the risk difference). It indicates how many extra cases among smokers the exposure is responsible for, making the important assumption that the relation between the exposure and the disease is causal (i.e. not explained by other confounding factors). The attributable fraction expresses the attributable risk as a proportion of the rate in the exposed (attributable risk ÷ rate in smokers): the proportion of cases among smokers that can be attributed to smoking. The attributable risk and related measures are typically used to help guide policymakers in planning public health interventions.
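A minimal Python sketch of both measures. The lung cancer rates below are hypothetical values chosen for illustration, not figures from any study, and the calculation assumes the exposure-disease relation is causal.

```python
def attributable_measures(rate_exposed: float, rate_unexposed: float) -> tuple[float, float]:
    """Attributable risk (risk difference) and attributable fraction among
    the exposed, assuming a causal exposure-disease relation (hypothetical helper)."""
    ar = rate_exposed - rate_unexposed
    af = ar / rate_exposed          # equivalently (RR - 1) / RR
    return ar, af


# Hypothetical lung cancer rates per 100,000 person-years (assumed values).
ar, af = attributable_measures(140.0, 10.0)
print(f"AR = {ar:.0f} per 100,000 person-years; AF = {af:.0%}")
```

With these made-up rates, 130 extra cases per 100,000 person-years among smokers, about 93% of their cases, would be attributable to smoking.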
Describe the structure of a null hypothesis for a case-control study
That the odds of taking hormone replacement therapy (HRT) in women who had had a myocardial infarction (MI) are the same as the odds of taking HRT in women who had not had an MI, i.e. the odds ratio equals 1. This would mean that taking HRT does not affect a woman's chance of having an MI (at least in the age range studied).
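One way to test the null hypothesis OR = 1 is a z-test on the log odds ratio (a Wald test); the chi-squared test on the same 2×2 table is a common alternative. This Python sketch uses hypothetical HRT counts, not data from the study the card refers to.

```python
import math


def log_odds_ratio_test(a: int, b: int, c: int, d: int) -> float:
    """Two-sided P value for H0: odds ratio = 1, via a z-test on ln(OR).

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls (all non-zero).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided tail area of N(0, 1)


# Hypothetical counts: HRT use in 100 women with an MI and 200 without.
p = log_odds_ratio_test(15, 85, 40, 160)
print(round(p, 3))
```

A large P value like this one means the made-up data give no reason to reject the null hypothesis that the odds ratio equals 1.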