Tutorial 1 - Example 1 Flashcards
Q: A Primary Care Trust (PCT) wants to estimate the prevalence of smoking among their 100,000 residents. What does prevalence mean? How would they do this?
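A: Prevalence is the proportion of a population that has a given characteristic or condition (here, smoking) at a particular point in time. Asking all 100,000 residents is impractical, so the PCT would take a random sample of residents and use the proportion of smokers in the sample as its estimate.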
Q: A Primary Care Trust (PCT) wants to estimate the prevalence of smoking among their 100,000 residents. Suppose they surveyed a random sample of people – why take a random sample?
A: A simple random sample gives every resident the same chance of being selected, so it protects against selection bias and makes it likely that the sample is representative of the population as a whole.
Q: Suppose they asked 100 people if they smoked and found that 28 did. If they then asked another 100, would they also find that 28 of them smoked? Why might they not?
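A: Not necessarily. Each sample contains different people, so the percentage of smokers varies from sample to sample by chance (sampling variation). If they kept sampling sets of 100 people and plotted the percentage of smokers (prevalence of smoking) in each sample, we would expect to see an approximately normal distribution (see glossary), with the sample estimates centred on the true population percentage.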
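The repeated-sampling idea in this answer can be tried out with a short simulation. This is only a sketch: the true prevalence of 28% is an assumption made for illustration (in practice it is unknown), and the function name and the number of repeated samples are hypothetical.

```python
import random
import statistics

def simulate_sample_prevalences(true_prevalence, sample_size, n_samples, seed=0):
    """Return the observed smoking percentage in each of n_samples random samples."""
    rng = random.Random(seed)
    percentages = []
    for _ in range(n_samples):
        # Each person smokes with probability true_prevalence.
        smokers = sum(rng.random() < true_prevalence for _ in range(sample_size))
        percentages.append(100 * smokers / sample_size)
    return percentages

# Assumed true prevalence of 28%; 2000 repeated samples of 100 people each.
estimates = simulate_sample_prevalences(0.28, 100, 2000)
print("mean of sample estimates:", round(statistics.mean(estimates), 1))
print("spread (standard deviation):", round(statistics.stdev(estimates), 1))
print("smallest and largest estimate:", min(estimates), max(estimates))
```

The individual estimates differ from sample to sample, but they cluster around the assumed true value, which is the pattern the answer describes.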
Q: The PCT’s estimate of their population’s smoking prevalence is 28% from their sample, but there will be some uncertainty around this estimate. How is this expressed? What does this mean?
A: We express this uncertainty using a 95% confidence interval (95% CI) around the estimate, e.g. 19% to 37%. This means that if we repeated the sampling 100 times and calculated a 95% CI from each sample, we would expect about 95 of the 100 intervals to contain the true prevalence of smoking in the PCT.
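As a worked example, the interval quoted in this answer can be reproduced with the normal-approximation (Wald) formula, estimate ± 1.96 × standard error. The flashcard does not say which method was used, so treating it as a Wald interval is an assumption; the function name is hypothetical.

```python
import math

def proportion_ci_95(successes, n):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = 1.96 * standard_error
    return p - margin, p + margin

# 28 smokers out of 100 people surveyed, as in the flashcard.
low, high = proportion_ci_95(28, 100)
print(f"{100 * low:.0f}% to {100 * high:.0f}%")  # prints "19% to 37%"
```

A larger sample would shrink the standard error and therefore narrow the interval.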
Q: Suppose the PCT wanted to lower this prevalence; they could implement a smoking reduction campaign and then see if it worked by comparing their first estimated prevalence with an estimate after the campaign. They took two random samples, the first finding that 28% smoked as above, and the second finding that 21% smoked. Can we therefore say for certain that the campaign has worked and cut the prevalence by 28-21=7%? Why not?
A: No. The difference of 7% could simply be due to chance (sampling error), so we cannot decide whether there is a real difference in prevalence just by subtracting one estimate from the other.
Q: How would you determine whether there actually is a difference in prevalence?
A: This is done statistically by setting up a null hypothesis of no difference and looking for evidence against it: what is the probability of seeing sample prevalences at least as far apart as 28% and 21% if the two true underlying prevalences were the same?
We then choose the appropriate statistical test (e.g. chi-squared test to compare the two proportions) to get this probability, which is the P value.
The lower the P value, the less likely it is that a difference this large would arise by chance alone. Suppose the P value was 0.014.
By convention, if P<0.05 (and this is an arbitrary cut-off!) we reject the null hypothesis and conclude that the smoking prevalence fell after the campaign.
Such a result is called statistically significant.
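A minimal sketch of such a test, using only the Python standard library. It uses the two-proportion z-test with a pooled proportion, which gives the same P value as a chi-squared test on the 2x2 table without continuity correction. The sample sizes below are hypothetical, because the flashcard does not state them for this comparison, so the printed P values are illustrations only and are not meant to reproduce the 0.014 supposed above.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided P value for the null hypothesis that two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample sizes: the same 28% vs 21% with 100 or 500 people per sample.
p_small = two_proportion_p_value(28, 100, 21, 100)
p_large = two_proportion_p_value(140, 500, 105, 500)
print(f"100 per sample: P = {p_small:.3f}")
print(f"500 per sample: P = {p_large:.3f}")
```

The same 7% difference gives a much smaller P value with the larger samples, because larger samples leave less room for chance variation.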
Q: What is the attributable risk? Calculation? Represents?
A: It is a measure of exposure effect that indicates, on an absolute scale, how much greater the frequency of disease is in the exposed group than in the unexposed group, assuming the relationship between exposure and disease is causal (an important assumption).
It is calculated as the incidence in the exposed group minus the incidence in the unexposed group, i.e. it represents the risk attributable to the exposure of interest.
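The calculation can be sketched as follows. The counts are made up for illustration and do not come from the flashcards, and the function name is hypothetical. Incidence is taken here as cases divided by people at risk.

```python
def attributable_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed group minus incidence in the unexposed group."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed - incidence_unexposed

# Hypothetical counts: 30 cases per 1000 exposed, 10 cases per 1000 unexposed.
ar = attributable_risk(30, 1000, 10, 1000)
print(f"{1000 * ar:.0f} extra cases per 1000 exposed people")
```

If the relationship is causal, this difference is the part of the disease frequency in the exposed group that is due to the exposure.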
Q: Define and interpret a P value.
A: A p-value is the probability of obtaining the study result (relative risk, odds ratio etc.), or one more extreme, if the null hypothesis is true.
The smaller the p-value, the stronger the evidence against the null hypothesis and the less plausible it is that the result was just due to chance.
A p-value of <0.05 means that there is less than a 5% chance of obtaining a result at least as extreme as the study result if the null hypothesis is true, and so we would usually reject the null.
Such a result is commonly called “statistically significant”.
A p-value of >0.05 is usually seen as providing insufficient evidence against the null hypothesis, so we do not reject the null. This is not the same as showing that the null hypothesis is true.