Stats Flashcards
an interval that has a certain probability of including the true population value.
confidence interval
increasing the sample size ____ SEM
DECREASES SEM
SEM = SD / √n
For sample sizes > ~30: The 95% CI
= sample mean ± 2 (SD/√n)
= sample mean ± 2 (SEM)
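A minimal Python sketch of the SEM and the approximate 95% CI from the cards above. The function names and the weight data are hypothetical, chosen only for illustration. The "± 2 SEM" rule is the large-sample approximation from the card, not an exact t-based interval.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def approx_95_ci(values):
    """Approximate 95% CI as mean +/- 2 * SEM (large-sample rule of thumb)."""
    m = statistics.mean(values)
    half_width = 2 * sem(values)
    return m - half_width, m + half_width

# Hypothetical data: body weights (kg) of 36 animals.
weights = [20 + (i % 6) for i in range(36)]
low, high = approx_95_ci(weights)
print(f"mean = {statistics.mean(weights):.2f}, SEM = {sem(weights):.3f}")
print(f"approx. 95% CI = ({low:.2f}, {high:.2f})")

# Quadrupling the sample size (same spread) roughly halves the SEM.
print(f"SEM with n = 144: {sem(weights * 4):.3f}")
```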
Variance of a proportion
Variance of a proportion = P(1 − P)/n
(provided n large, say > 100)
Where P = proportion that have heartworm
n = sample size = 200
Standard error of a proportion
Std error (SE) of a proportion = √variance
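The two proportion cards can be sketched in Python as follows. The sample size n = 200 comes from the card. The proportion P = 0.25 is an assumed value for illustration only, since the card does not state it.

```python
import math

def proportion_variance(p, n):
    """Variance of a sample proportion: P(1 - P) / n (assumes n is large)."""
    return p * (1 - p) / n

def proportion_se(p, n):
    """Standard error of a proportion: square root of its variance."""
    return math.sqrt(proportion_variance(p, n))

n = 200   # sample size from the card
p = 0.25  # hypothetical: 50 of the 200 animals test positive

var = proportion_variance(p, n)
se = proportion_se(p, n)
print(f"variance = {var:.7f}, SE = {se:.4f}")
```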
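P value or Type I error (α)
The probability of having observed our data (or more extreme data) by chance when the null hypothesis is true
α is usually set at 0.05 (arbitrary)
A Type I error occurs when the null hypothesis is rejected although there is no REAL difference to be found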
Type II error (β)
Probability of accepting (failing to reject) the null hypothesis when in fact the null hypothesis is false
Type II error depends on the true difference btwn populations
Can only occur if there is no difference detected by the study!
Power of an experiment (1 − β)
Given that there is a difference of a nominated amount btwn the two populations, the power is the probability of rejecting the null hypothesis at the α level of significance
Want the power to be at least
0.8
INCIDENCE RATE
Measure of the average risk of becoming a case during a specified time period
= number of NEW cases during a specified time period (often a year) / average population at risk during that time period
At least two visits are necessary
ATTACK RATE
Incidence rate used in investigations of disease outbreaks
= number of NEW cases during a specified time period / initial population at risk
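A short sketch contrasting the two denominators. The herd numbers are hypothetical, and averaging the start and end populations is just one simple way to approximate the "average population at risk".

```python
def incidence_rate(new_cases, avg_population_at_risk):
    """New cases in the period / average population at risk in the period."""
    return new_cases / avg_population_at_risk

def attack_rate(new_cases, initial_population_at_risk):
    """New cases in the period / population at risk at the start."""
    return new_cases / initial_population_at_risk

# Hypothetical herd: 200 animals at risk at the start of the year,
# 180 still at risk at the end, 20 new cases during the year.
avg_at_risk = (200 + 180) / 2  # assumption: simple start/end average

ir = incidence_rate(20, avg_at_risk)
ar = attack_rate(20, 200)
print(f"incidence rate = {ir:.4f} per animal-year")
print(f"attack rate    = {ar:.4f}")
```

The two measures differ only in the denominator, so they diverge more as the population at risk changes during the period.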
PREVALENCE
Prevalence focuses on disease states
= proportion of a population that is AFFECTED (may have had the disease for years) by disease at a given time
Dimensionless
Affected by both duration of disease and incidence - direct interpretation often difficult
relationship exists btwn incidence rate (I), average duration of a disease (D) and prevalence (P)
P ∝ I × D
mortality is a specific form of
incidence rate
death rate
CASE FATALITY RATE
Numerator: deaths due to a given cause during a given time period
Denominator: total number of animals affected (ie cases)
CFR is measure of virulence or severity of a disease
Answers the question: “How many of those that get the disease will eventually die because of it during a given time period?”
PROPORTIONAL MORTALITY RATES
Often used when investigator has some mortality data but doesn’t know the population at risk (ie no denominator data is avail)
Numerator: number of deaths from a specified cause during a given time period
Denominator: all animals that died, regardless of cause
Answers the question: “Given that an animal has died, what is the probability that it died of a specific cause?”
Does not answer the question, “What is the risk of dying of a specific cause?”
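A small sketch showing how the case fatality rate and the proportional mortality rate share a numerator but use different denominators. All counts are hypothetical.

```python
def case_fatality_rate(deaths_from_disease, total_cases):
    """Deaths due to the disease / all animals affected by the disease."""
    return deaths_from_disease / total_cases

def proportional_mortality(deaths_from_cause, all_deaths):
    """Deaths from a specified cause / all deaths, regardless of cause."""
    return deaths_from_cause / all_deaths

# Hypothetical herd over one year:
cases = 40                # animals that got the disease
deaths_from_disease = 10  # of those, animals that died of it
all_deaths = 25           # deaths from any cause in the herd

cfr = case_fatality_rate(deaths_from_disease, cases)
pmr = proportional_mortality(deaths_from_disease, all_deaths)
print(f"CFR = {cfr:.2f}  (share of cases that died of the disease)")
print(f"PMR = {pmr:.2f}  (share of all deaths due to the disease)")
```

Neither value uses the population at risk, which is why the proportional mortality rate cannot be read as the risk of dying of that cause.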
RISK RATIO (RELATIVE RISK)
RR is usual way of comparing incidence rates
Formula: I1/I0
–I1 is incidence rate among “exposed group”
–I0 is incidence rate among “unexposed” group or reference group
Risk ratio > 1 positive association btwn “exposure” and disease
Risk ratio = 1 association btwn “exposure” and disease has not been demonstrated
Risk ratio < 1 negative association (protection) btwn the factor and disease
RATE DIFFERENCE (ATTRIBUTABLE RISK)
Formula: I1 – I0
Comparing incidence rates; measure of the absolute effect of exposure (1/time)
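The risk ratio and rate difference can be sketched from two incidence rates. The group sizes and case counts below are hypothetical.

```python
def risk_ratio(i1, i0):
    """Relative risk: incidence in the exposed / incidence in the unexposed."""
    return i1 / i0

def rate_difference(i1, i0):
    """Attributable risk: incidence in the exposed minus incidence in the unexposed."""
    return i1 - i0

# Hypothetical one-year follow-up:
i1 = 30 / 100  # 30 new cases among 100 exposed animals
i0 = 10 / 100  # 10 new cases among 100 unexposed animals

rr = risk_ratio(i1, i0)
rd = rate_difference(i1, i0)
print(f"RR = {rr:.2f}")           # relative effect of exposure
print(f"RD = {rd:.2f} per year")  # absolute effect of exposure
```

Here the RR is above 1, which the cards read as an association between exposure and disease, while the RD expresses the same contrast on an absolute scale.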
SENSITIVITY
the probability of a test correctly identifying those animals that are infected or have a specified condition
–Measure of the test’s accuracy w INFECTED animals
–Says nothing about the test’s accuracy w non-infected animals
–high Se = few false neg
–Poor Se = lots of false neg
SPECIFICITY
the probability of a test correctly identifying those animals that are not infected or which do not have the specified condition
–Measure of the test’s accuracy w non-infected animals
–Says nothing about the test’s accuracy w infected animals
–High Sp = few false positives
–low Sp = lots of false positives
If a disease has a v low prevalence, the Sp of the test can be estimated by
assuming that virtually all the reactions are false positives, so that the Sp is approximated by:
(c + d)/N
PPV
answers the question:
“What proportion of the test positive animals really have the condition?”
a/(a+b)
NPV
answers the question:
“What proportion of the test negative animals really do not have the condition?”
d/(c+d)
3 types of bias
Selection bias
Information bias
Confounding
selection bias
exists when the study group you have chosen is not representative of the target population
information bias
can occur when the information collected is wrong or poorly interpreted.
incorrectly recorded data, poorly calibrated scales, different environmental exposures of control and tx groups
collecting better quality information from ‘case’ group than ‘control’ group in case-control study
confounding bias
occurs when the association between a factor and the outcome of interest is distorted by the effect of an extraneous variable – a “mixing” of effects
P-value
P-value is the probability of having observed our data (or more extreme data) when the null hypothesis is true.
Power
The ‘power’ of an experiment is the probability that the null hypothesis will be rejected when there is a real difference of a given magnitude between treatments.
Generally, power of 80% is considered reasonable, i.e. we have an 80% chance of rejecting the null hypothesis if a difference exists.
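A rough sketch of how power depends on the difference, the variability and the sample size. It assumes a two-sided, two-sample z-test with equal group sizes and a known common SD, which is a simplification of most real designs. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference between group means
    sd: common standard deviation (assumed known)
    n_per_group: number of animals in each group
    """
    z = NormalDist()
    se_diff = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value at level alpha
    shift = abs(delta) / se_diff
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical trial: true difference of 5 units, SD of 10 units.
for n in (20, 40, 64):
    print(f"n per group = {n}: power = {approx_power(5, 10, n):.2f}")
```

With no true difference (delta = 0) the function returns alpha, which is the Type I error rate.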
Clinical Trial
Planned experiment conducted in the field designed to assess efficacy of a tx in animals/herds by comparing the outcome observed under the test treatment, with that observed in a comparable group of animals/herds receiving a control tx
Clinical Trial Requirements
Experiment is planned
Experiment involves the comparison of a test (new tx, new mgmt. procedure) and control (std. therapy, std. mgmt. procedure) tx
The study groups are comparable
Subjects are followed for a defined outcome
Cohort Study Features
Investigator identifies group of animals that have the hypothesised “cause” and that are free from the disease of interest
A group of control animals that do not have the hypothesised “cause” of disease also identified
The two groups of animals are followed over a period of time to determine how many animals in each group develop the disease of interest. The incidence rates of the disease are calculated and the risk ratio is determined.
Cohort Study Advantages and Disadvantages
+ Incidence rates can be directly calculated from the study
+ Study provides a complete description of the natural history of the disease. You will be following healthy animals and monitoring their progression into diseased states.
+ Allow the study of multiple potential effects of a given exposure.
+ Allow for good quality control.
- Large no. of animals required to study rare diseases
- In diseases w long induction or incubation period, there will be a long period of follow-up examinations
- They are relatively expensive
- Maintaining follow-up may be difficult
- Control of extraneous variables can be difficult
- Detailed study of the pathogenesis of the disease is rarely possible
Features of a Case Control Study
Investigator starts by identifying a group of animals that have or had the disease in question (the cases)
A similar group of animals (the controls) that do not have the disease are identified
The past history of each group is investigated to determine how many animals per group had the “exposure” of interest.
Unlike cohort studies, case-control studies are retrospective
The objective in selecting controls is to select animals that are representative of the source population from which the cases were derived.
Cannot calculate incidence rates, so cannot calculate risk ratios either
An estimate of the risk ratio can be made from a case control study by determining the odds ratio
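A minimal sketch of the odds ratio for a case control study. It assumes the table layout a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls. The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (ad / bc) from a case control 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)

# Hypothetical study: 100 cases and 100 controls.
or_estimate = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_estimate:.2f}")
```

The calculation uses only the exposure split within cases and within controls, so it does not need incidence rates.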
Case control study advantages and disadvantages
+ Well suited to study of rare diseases or diseases w long incubation/induction periods
+Relatively quick and easy to conduct
+Require comparatively few subjects and are generally inexpensive
+Existing records can be used. There is no risk to the subjects in the study
+Allows for the study of multiple potential causes of the disease.
-Relies on recall or records for information on past exposure.
-Validation of the information you are told is difficult or sometimes impossible
-Control of extraneous variables is difficult
-Selection of an appropriate control group can be hard
-Incidence rates cannot be determined
-Detailed study of the pathogenesis of the diseases is rarely possible
Cross Sectional Study Features
Investigator samples a population of animals (or farms)
Each animal is classified according to whether it has the disease at that time. Cross sectional studies are therefore measuring prevalence of a disease
Each animal is classified according to whether it has the exposure at that time
Analyses are carried out to identify relationships btwn the cause and the disease
–Quick, easy and “dirty” studies
–Because both cause and outcome are measured at the same time, it is impossible to tell w any certainty which came first
Precise Study
relatively free from random error
REPEATABLE
Valid Study
relatively free of systematic error and bias
if repeated over and over, the average result would be the “right” answer
How to improve precision
1) Increasing number of subjects in study
2) Increasing the study efficiency
Imprecise estimate favours null hypothesis
Precise estimate more likely to be statistically significant
bias
may be defined as any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of the parameter of interest.
selection bias
Exists when the study group you have chosen is not representative of the target population.
Can occur due to:
a) Poor choice of control group
b) Poor choice of “sampling frame” (sampling frame is the sub-population in the target population from which the study subjects are drawn)
c) Non-responders and persons “excluded” from studies can cause selection bias
information bias
Can occur when the information is wrong or poorly interpreted, leading to a misclassification of an exposure or disease.
Two broad categories of misclassification:
1. Differential misclassification: when an error in classification is more likely to occur in one group than the other
–E.g. recall bias
–Can bias the study in either direction
2. Non-differential misclassification: when an error in classification is equally likely to occur in either group
–Always biases a study towards the null hypothesis
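One numeric illustration (not a proof) of non-differential misclassification pulling an odds ratio towards 1. It assumes a case control table in which exposure is classified with the same sensitivity and specificity in cases and controls. All counts and the 0.8 / 0.9 error rates are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """ad / bc with a, c = exposed/unexposed cases; b, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, se, sp):
    """Expected observed (exposed, unexposed) counts under imperfect classification."""
    obs_exposed = exposed * se + unexposed * (1 - sp)
    obs_unexposed = exposed * (1 - se) + unexposed * sp
    return obs_exposed, obs_unexposed

# True exposure status (hypothetical): 60/40 in cases, 30/70 in controls.
true_or = odds_ratio(a=60, b=30, c=40, d=70)

# Same error rates applied to both groups, i.e. non-differential.
case_exp, case_unexp = misclassify(60, 40, se=0.8, sp=0.9)
ctrl_exp, ctrl_unexp = misclassify(30, 70, se=0.8, sp=0.9)
observed_or = odds_ratio(a=case_exp, b=ctrl_exp, c=case_unexp, d=ctrl_unexp)

print(f"true OR     = {true_or:.2f}")
print(f"observed OR = {observed_or:.2f}")
```

In this example the observed odds ratio sits between 1 and the true value, i.e. closer to the null.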
Confounding
Occurs when the association between a factor and the outcome of interest is distorted by an extraneous variable. To be confounding, the extraneous variable must have the following three characteristics:
1. Confounding variable must be a risk factor for the disease.
2. Confounding variable must be associated with the exposure in the population from which the cases derive
3. The confounding variable must not be an intermediate step in the causal pathway between exposure and disease.
Sensitivity =
a / (a + c)
Specificity =
d / (b + d)
Accuracy =
(a + d) / (a + b + c + d)
dx correctly / all animals
RR
[a / (a + b)] / [c / (c + d)]
OR
ad / bc
Prevalence =
(a + c) / (a + b + c + d)
total cases / total #