Statistics Flashcards
Distinguish absolute and relative risk.
Absolute:
- Incidence
- Prevalence
- Odds
- Hazard rate
Relative:
- Risk ratio
- Hazard ratio
- Odds ratio
Define risk.
The likelihood of an event occurring (and, in HACCP, the consequence of that event occurring). It is the number of outcome events / number of all events.
What is relative risk?
Relative risk/risk ratio is the ratio of 2 risks.
RR = risk in group 1 / risk in group 2
How is relative risk calculated?
(A / (A+B)) / (C / (C+D))
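The letters are not defined on this card. A minimal sketch, assuming the usual 2x2 table layout (A = group 1 with the outcome, B = group 1 without, C = group 2 with the outcome, D = group 2 without). The function name and counts are illustrative only.

```python
def relative_risk(a, b, c, d):
    """Risk ratio from a 2x2 table (assumed layout).

    a: group 1 with the outcome,  b: group 1 without
    c: group 2 with the outcome,  d: group 2 without
    """
    risk_group1 = a / (a + b)
    risk_group2 = c / (c + d)
    return risk_group1 / risk_group2


# Made-up counts: 20 of 100 affected in group 1, 10 of 100 in group 2.
print(round(relative_risk(20, 80, 10, 90), 2))  # 2.0
```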
Why can numbers be unreliable?
- Small sample size
- Some not sampled at all
- How representative are the different samples?
- Were those that were assessed selected at random?
- Always think about the potential for bias and how that could be introduced
What does the 95% confidence interval mean?
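- A range within which we can be 95% confident that the true population value lies (for example, a 95% confidence interval of 12 to 29)
- For example, the prevalence of obesity lies between 3.6% and 8.7%
- As the number of samples increases, the values get closer to a normal approximation and the interval becomes narrower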
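One common way to get such an interval for a proportion is the normal approximation. The sketch below assumes that method (the card does not say how its intervals were calculated) and uses made-up counts. The function name `wald_ci_95` is hypothetical.

```python
import math


def wald_ci_95(cases, n):
    """Approximate 95% CI for a proportion (normal approximation)."""
    p = cases / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = 1.96 * standard_error  # 1.96 SEs cover about 95% of a normal curve
    return p - margin, p + margin


# Made-up data: the same 15% prevalence estimated from two sample sizes.
small = wald_ci_95(30, 200)
large = wald_ci_95(300, 2000)
print([round(x, 3) for x in small])  # roughly 0.10 to 0.20
print([round(x, 3) for x in large])  # narrower interval around 0.15
```

The larger sample gives the narrower interval, although the estimated prevalence is the same.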
If risk is the number of outcome events over the number of all events, what happens if we do not know the total number of events?
Sample random individuals without the disease, rather than counting the whole population. This is when we use odds.
What are odds?
The ratio of positive outcomes to negative outcomes.
= number of outcome events / number of non-outcome events
What is odds ratio?
The ratio of 2 odds is the odds ratio:
OR = odds in group 1 / odds in group 2
How is odds ratio calculated?
(A / B) / (C / D)
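A minimal sketch of the same formula, assuming the usual 2x2 table layout (A = group 1 with the outcome, B = group 1 without, C = group 2 with the outcome, D = group 2 without). The layout, function name and counts are assumptions for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table (assumed layout).

    a: group 1 with the outcome,  b: group 1 without
    c: group 2 with the outcome,  d: group 2 without
    """
    odds_group1 = a / b
    odds_group2 = c / d
    return odds_group1 / odds_group2


# Made-up counts: 20 with and 80 without in group 1, 10 with and 90 without in group 2.
print(round(odds_ratio(20, 80, 10, 90), 2))  # 2.25
```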
Define p value.
The probability that a difference at least as large as the one observed would occur by chance alone, if there were no true difference.
What is the significance of low and high p values?
- Low p-value: Unlikely that the difference is due to chance alone
- High p-value: The difference could easily be due to chance
- P-value for difference between breeds = 0.001 (Chi sq test)
- Convention: a 5% (p=0.05) cut-off value for p-values is used to signify “statistical significance” (below 0.05 = statistically significant [difference])
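As an illustration of where such a p-value can come from, here is a sketch of Pearson's chi-squared test for a 2x2 table, written out by hand with no continuity correction. The counts are made up and are not the data behind the breed example above. The function name is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for a 2x2 table.

    Assumed layout: rows are groups, columns are outcome present/absent.
    No continuity correction is applied.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts: 20 of 100 affected in one breed, 10 of 100 in another.
statistic, p_value = chi_squared_2x2(20, 80, 10, 90)
print(round(statistic, 2), round(p_value, 3))  # about 3.92 and 0.048
print("statistically significant" if p_value < 0.05 else "not significant")
```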
What does a RR less than 1 mean if statistically significant?
Risk of disease is reduced in that group.
What does a RR greater than 1 mean if statistically significant?
Risk of disease is increased in that group.
Under what circumstances is odds ratio a good estimate of relative risk?
The odds ratio is a good estimate of relative risk when the prevalence of disease is low, i.e. when the disease is rare.
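A quick numerical check, using made-up 2x2 tables in the assumed order (group 1 with outcome, group 1 without, group 2 with outcome, group 2 without). Both tables have a risk ratio of 2, but only the rare-disease table gives an odds ratio close to it.

```python
def risk_ratio(a, b, c, d):
    """Ratio of the risks in two groups."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Ratio of the odds in two groups."""
    return (a / b) / (c / d)


rare = (2, 998, 1, 999)        # 0.2% vs 0.1% affected
common = (400, 600, 200, 800)  # 40% vs 20% affected

for label, table in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*table), 2), round(odds_ratio(*table), 2))
# rare 2.0 2.0
# common 2.0 2.67
```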
Why use risk?
- More accurate reflection of population prevalence
- Easier to interpret
- Harder to calculate (need to be clear what the denominator is)
- Relative risk is the measure of association calculated from cohort studies
Why use odds?
- Cannot estimate prevalence of disease in the population
- Easier to calculate
- Odds ratio is the measure of association calculated from case-control studies
How can the way we present data affect how we interpret it?
- Always check axes on a graph and what the error bars actually are.
- This is why statistical analysis is used to interpret data.
- Is the right question being asked, for example, in the study above, is brain volume equivalent to intelligence?
What is statistics in veterinary/scientific literature?
Refers to the process of establishing the probability that samples come from the same population.
What can graphs be used to do?
- Identify the general shape of the data including the centre of its distribution (its location or average) and how variable it is (its spread)
- Identify unusual or outlying points
- Compare the shape of two independent datasets
- Identify relationships between two variables in a dataset
List different ways of displaying data.
Simple scatter/x-y plots
Grouped frequency table
Histogram
Dot display
Bar chart
Box and whisker plot
Describe the usefulness of median and range as measures.
The median and range do not use all the data, so they are less informative than measures that do.
They are useful as summary measures but make limited use of the data.
Describe the usefulness of the mean as a measure.
Unlike the median, the value of the mean is sensitive to outliers. A ‘truncated’ mean is sometimes used to avoid this, removing a predetermined number of the highest and lowest points before calculating.
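A small sketch of the effect of an outlier, using Python's standard `statistics` module and made-up data. The helper `truncated_mean` is a hypothetical name. It drops a fixed number of points from each end before averaging.

```python
import statistics


def truncated_mean(values, n_trim):
    """Mean after dropping the n_trim lowest and n_trim highest points."""
    ordered = sorted(values)
    kept = ordered[n_trim:len(ordered) - n_trim]
    return statistics.mean(kept)


weights = [2, 3, 3, 4, 4, 5, 50]  # made-up data with one outlier
print(statistics.mean(weights))    # pulled upwards by the outlier (about 10.1)
print(statistics.median(weights))  # 4
print(truncated_mean(weights, 1))  # 3.8
```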
What is standard deviation?
- The standard deviation is a measure of variation which gives an indication of the average spread of the data about the mean.
- It is (roughly) the average of the differences between all the points and the mean.
- The standard deviation also provides a useful tool for calculating probabilities from the data.
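More precisely, the standard deviation is the square root of the average squared difference from the mean. A sketch with made-up values, comparing a by-hand calculation with the standard library:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
mean = statistics.mean(values)

# Population SD written out: root of the average squared difference from the mean.
by_hand = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

print(mean)                       # 5
print(by_hand)                    # 2.0
print(statistics.pstdev(values))  # population SD, same value as by_hand
print(statistics.stdev(values))   # sample SD (divides by n - 1), slightly larger
```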
What is normal distribution?
The set of normal distributions is represented by a family of curves (distribution functions). For each curve, the y-value (frequency) is uniquely defined by just two parameters:
- The mean
- The standard deviation (or the variance which is simply the standard deviation squared)
Many biological characteristics conform closely enough to a normal curve for it to be used to model the data
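To illustrate that the mean and standard deviation are enough to fix the whole curve, the sketch below uses `statistics.NormalDist` from the Python standard library (3.8+). The trait and its parameters are hypothetical.

```python
from statistics import NormalDist

# Hypothetical trait with mean 50 and standard deviation 10.
trait = NormalDist(mu=50, sigma=10)

# Height of the curve at the mean.
print(round(trait.pdf(50), 4))

# Proportion of values within 1.96 standard deviations either side of the mean.
within = trait.cdf(50 + 1.96 * 10) - trait.cdf(50 - 1.96 * 10)
print(round(within, 3))  # 0.95
```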
What is data transformation?
- Sometimes continuous data are not normally distributed
- However, they can often be ‘transformed’ to approximate normality
- This then allows the standard, and very powerful, ‘parametric’ tools and tests that we will cover later to be used
- With biological data a log transform will often result in a normal distribution – particularly if the data are skewed.
- Almost any reversible transformation is acceptable to generate a normally distributed data set.
- Understanding that transformation isn’t fiddling the data is a really important statistical step
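A sketch of a log transform on made-up, right-skewed counts. It only shows the skew being removed (mean and median agree on the log scale); it does not test for normality.

```python
import math
import statistics

counts = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]  # made-up, right-skewed
logged = [math.log10(x) for x in counts]

# Raw scale: the mean is far above the median.
print(statistics.mean(counts), statistics.median(counts))

# Log scale: mean and median agree (both about 2).
print(statistics.mean(logged), statistics.median(logged))

# The transform is reversible: back-transforming the log-scale mean
# gives the geometric mean on the original scale (about 100).
geometric_mean = 10 ** statistics.mean(logged)
print(geometric_mean)
```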
How do we measure disease?
Count the number of deaths which occur due to that disease.
What is prevalence?
Prevalence = number with disease (dead) / total number. Typically expressed as a percentage.
What is positivity?
The prevalence of disease may reflect the underlying prevalence of infection in the flock but, as we know, not all infections are symptomatic. In that case we might wish to test all the animals and see what fraction are infected.
The proportion of tests which are positive is the positivity
Positivity = Tests positive / (Negative tests + Positive tests)
If we test all animals with a perfect test then positivity and prevalence of infection are the same.
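A sketch of the two formulas side by side, using a made-up flock. The function names and numbers are illustrative assumptions.

```python
def prevalence(n_with_disease, n_total):
    """Proportion of the population with disease (multiply by 100 for %)."""
    return n_with_disease / n_total


def positivity(n_positive, n_negative):
    """Proportion of tests that are positive."""
    return n_positive / (n_negative + n_positive)


# Made-up flock of 200: 10 died of the disease; all 200 were tested, 30 positive.
print(round(prevalence(10, 200) * 100, 1))  # 5.0 (% with disease)
print(round(positivity(30, 170) * 100, 1))  # 15.0 (% of tests positive)
```

In this example more animals are infected than show disease, which is why the two figures differ.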
What is incidence?
- Incidence measures disease occurrence over time
- The number of new cases of a disease or condition that occur within a stated period of time as a proportion of the individuals at risk.
- Often stated per 10,000 or 100,000 population for ease of comparison between populations.
- Incidence rate = number of new cases / total animal time at risk
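A sketch of the incidence rate formula, assuming each animal contributes the number of months it was under observation and at risk. The herd data and function name are made up for illustration.

```python
def incidence_rate(new_cases, animal_time_at_risk):
    """New cases per unit of animal time at risk."""
    return new_cases / animal_time_at_risk


# Made-up herd: months at risk contributed by each of 10 animals.
months_at_risk = [12, 12, 6, 12, 3, 12, 9, 12, 12, 10]
new_cases = 3

rate = incidence_rate(new_cases, sum(months_at_risk))
print(round(rate * 100, 1))  # 3.0 new cases per 100 animal-months at risk
```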
Distinguish the information that incidence gives with the information that prevalence gives.
Incidence:
- New cases
- Information about risk of contracting disease
Prevalence:
- Proportion affected at point in time
- Information on how widespread the disease is
- Dependent on duration of disease
- Good when stable conditions
- Not suitable with acute disease
What are denominator and numerator populations?
Denominator population: The total population under consideration or at risk
Numerator population: Number ‘of interest’ on top of a proportion calculation (e.g. proportion with disease = numerator (disease) / denominator (total population))
What is attributable risk?
Difference in disease incidence or risk in exposed compared to unexposed groups.
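A sketch of the risk-difference form of this definition, with made-up counts. The function and parameter names are hypothetical.

```python
def attributable_risk(exposed_cases, exposed_total,
                      unexposed_cases, unexposed_total):
    """Risk in the exposed group minus risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed - risk_unexposed


# Made-up counts: 30 of 100 exposed animals affected, 10 of 100 unexposed.
print(round(attributable_risk(30, 100, 10, 100), 2))  # 0.2
```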
Why do we do surveillance?
- Rapid detection of disease outbreaks
- Monitor importation of new/emerging diseases
- Identification of new/emerging diseases
- Changes in host range (for vector borne disease)
- Monitor changes over time (incidence/prevalence)
- Evaluation of disease control programmes
- Assessment of the health status of a defined population
- Define priorities for disease control and prevention
- Provision of information to plan research
- Food safety or zoonosis risk
- Confirmation of absence of a specific disease
What are the properties of an ideal surveillance system?
- Pre-agreed intervention threshold
- Agreed case definition
- Harmonised diagnostic procedures
- ‘Collectors’ of cases
- Population at risk
- Means to record cases
- Means to report data
- Central data collation
- Timely and competent data analysis
- Means to communicate findings in a timely fashion
- Pre-defined intervention if threshold exceeded
- Means to assess effect of intervention
Describe active surveillance.
- Purposeful collection of disease information
- May miss ‘new’ diseases
- Where deliberate effort is made to screen apparently healthy individuals, as well as testing ill individuals, to attempt to identify the true prevalence in a population.
Describe passive surveillance.
- Makes use of existing data collection and voluntary reporting.
- Often no defined population size
- Often no defined unit of measure (sample vs case)
- Liable to under-ascertainment because it misses subclinical cases
Why is passive surveillance liable to under-ascertainment?
- Not all animals that get infected fall ill
- Not all owners of ill animals get a vet to see them
- If a vet does see an ill animal, they may not take any samples, whether or not they decide to treat
- The owner might not be willing to pay for lab tests
- Samples may not yield a positive result, even if the animal is infected
- If a sample is diagnosed in a laboratory, it may not be reported
What are the advantages of passive surveillance?
- Cheap
- Can be responsive to new outbreaks
What are the disadvantages of passive surveillance?
- Biased and/or patchy (for example towards symptomatic or severe cases)
- Relies on individuals (for example, vets or farmers) to report
What are the advantages of active surveillance?
- Systematic detection of cases
- Comparable data over time or between areas
- Unbiased?
What are the disadvantages of active surveillance?
- Expensive
- Time-consuming
Describe the surveillance strategy of the notifiable disease, brucellosis.
- Post import checks including tests done post-calving of imported cattle
- Regular bulk milk testing of dairy herds
- Investigation of cattle abortions
- Annual check blood testing of eligible herds
- Breeding bull monitoring
Outline the process of outbreak investigation.
- Outbreak investigation team
- Investigate causal agents
- Design test
- Treat/manage cases
- Implement controls
- Implement surveillance
- Inform national / international agencies
- Communications with public
Why might we want to survey an area or population?
To evaluate a particular disease, for example its prevalence or the extent of spread of an outbreak.