9. Statistics Flashcards
Continuous: can take any value in a given range, e.g. height or weight
Discrete: can take an integer value only, e.g. visual analogue scale
Ratio: a data series that has zero as its baseline, such as heart rate and temperature (kelvin)
Interval: a data series that has zero as a point on a larger scale, such as temperature (degrees Celsius)
Numerical data (obtained from measurements):
continuous
discrete
ratio
interval
Categorical data (grouped data):
nominal (unordered)
ordinal (ordered)
Nominal (unordered): data comes from mutually exclusive unranked groups, e.g. procedural outcome (success or failure) and gender (male or female)
Ordinal (ordered): data comes from ranked groups, e.g. categorical pain rating scale (mild, moderate and severe), ASA grade (class I, II, III, IV, V)
Probability
Probability is the chance of occurrence of an event. It has a value between 0 and 1.
Probability density curves are used to describe the distribution of data in a given population.
Types of distribution of data: normal (most important), binomial (value of 0 or 1) or Poisson.
Characteristics of a normal distribution:
also called the Gaussian distribution
describes the distribution of continuous variables
bell-shaped symmetrical curve
mean, median and mode are identical
tails do not touch the baseline
narrow and peaked if the variance (standard deviation) is low
wide and flat if the variance (standard deviation) is high.
Mean: the sum of all the values divided by the number of values.
Median: the point that has half the values above and half below.
Mode: the most frequently occurring value.
Standard deviation: indicates how much a set of values is spread around the average. It is a measure of dispersion.
Standard error of the mean: the standard deviation of the sample-mean estimate of a population mean. It gives an idea of how closely the estimated mean value (from the sample) is likely to represent the true mean value (from the general population).
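The definitions above can be tried on a small data set. The sketch below uses Python's standard `statistics` module; the sample values are hypothetical, and the standard error is computed with the usual formula SD/√n, which is not stated in the cards themselves.

```python
import math
import statistics

# Hypothetical sample (e.g. heart rates in beats/min); values are illustrative only.
sample = [62, 68, 70, 70, 72, 75, 80, 70, 66, 77]

mean = statistics.mean(sample)        # sum of values / number of values
median = statistics.median(sample)    # middle of the sorted values
mode = statistics.mode(sample)        # most frequently occurring value
sd = statistics.stdev(sample)         # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(sample))     # standard error of the mean, assuming SEM = SD / sqrt(n)

print(mean, median, mode, round(sd, 2), round(sem, 2))
```

Because the standard error divides by √n, it shrinks as the sample grows, while the standard deviation does not.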
It is worth remembering that, for a normal distribution:
± 1 standard deviation includes 68.3% of the data
± 2 standard deviations include 95.4% of the data
± 3 standard deviations include 99.7% of the data.
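These coverage figures can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage` is introduced here for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within ±k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"±{k} SD: {coverage(k):.1%}")
```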
Reliability
Reliability is the dependability of a test (consistency and reproducibility).
Precision
Precision is the extent to which random variability is absent from the
test.
Reliability of a test is dependent upon its precision.
Validity
Validity is the extent to which the test measures what it was designed
to measure.
It has two components: sensitivity and specificity.
Accuracy
Accuracy is the ability of the test to produce the true values of the
measurements.
In the formulae below, A = true positives, B = false positives, C = false negatives and D = true negatives.
Sensitivity: the ability of a test to correctly identify the individuals who have the condition = A/(A + C)
Specificity: the ability of a test to correctly identify the individuals who do not have the condition = D/(B + D)
False-positive rate: proportion of false positives in the non-diseased population = B/(B + D) or (1 – specificity)
False-negative rate: proportion of false negatives in the diseased population = C/(A + C) or (1 – sensitivity)
Positive predictive value: proportion of true positives among all positives = A/(A + B)
Negative predictive value: proportion of true negatives among all negatives = D/(C + D)
Accuracy: proportion of true results (true positive and true negative) among all results = (A + D)/(A + B + C + D)
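The formulae above can be collected into one small function. This is a sketch only: the mapping of A, B, C and D to true positives, false positives, false negatives and true negatives is inferred from the formulae, the function name is hypothetical, and the counts are invented for illustration.

```python
def diagnostic_metrics(a: int, b: int, c: int, d: int) -> dict:
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),   # equals 1 - specificity
        "false_negative_rate": c / (a + c),   # equals 1 - sensitivity
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
    }

# Hypothetical 2 x 2 table: 100 diseased and 200 non-diseased individuals.
metrics = diagnostic_metrics(a=90, b=30, c=10, d=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```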
Having lax criteria (three criteria – Point A) for diagnosis leads to:
more total positives and fewer total negatives
high true positive (high sensitivity)
high false positive (low specificity)
low false negative
low true negative
Having stringent criteria (four criteria – Point B) for diagnosis leads to:
fewer total positives and more total negatives
low true positive (low sensitivity)
low false positive (high specificity)
high false negative
high true negative.
Hence adding another criterion as a requirement for diagnosis will lead to a lower true-positive rate (lower sensitivity), but a lower false-positive rate as well (higher specificity).
Lax criteria are desirable in a screening test (high sensitivity), while stringent criteria are desirable in a confirmatory test (high specificity).
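The trade-off between lax and stringent criteria can be illustrated with a toy data set. In the sketch below, each number is the count of diagnostic criteria met by one patient; the data and the helper `sens_spec` are hypothetical.

```python
# Hypothetical data: number of diagnostic criteria met by each patient.
diseased = [5, 4, 4, 3, 3, 3, 2, 4]
non_diseased = [1, 2, 3, 0, 3, 2, 4, 1]

def sens_spec(min_criteria: int):
    """Return (sensitivity, specificity) when at least min_criteria are required for diagnosis."""
    true_positives = sum(score >= min_criteria for score in diseased)
    true_negatives = sum(score < min_criteria for score in non_diseased)
    return true_positives / len(diseased), true_negatives / len(non_diseased)

lax = sens_spec(3)     # three criteria required
strict = sens_spec(4)  # four criteria required

print("lax:   ", lax)
print("strict:", strict)
```

With these invented numbers, requiring the fourth criterion lowers sensitivity and raises specificity, as described above.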
Negatively skewed data
Most of the values are at the high end; the tail points in the negative direction (left)
The mode is changed least and the mean most
Mean < Median < Mode
Example: a very easy test will be high-scoring, so it will be negatively skewed
Positively skewed data
Most of the values are at the low end; the tail points in the positive direction (right)
The mode is changed least and the mean most
Mean > Median > Mode
Example: a very difficult test will be poor-scoring, so it will be positively skewed
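The ordering of mean, median and mode in skewed data can be seen with two invented sets of test scores (marks out of 10). This is an illustrative sketch; the scores are hypothetical.

```python
import statistics

# Hypothetical scores out of 10.
easy_test = [10, 10, 10, 10, 9, 9, 9, 8, 8, 7, 5, 2]   # most scores high, tail to the left
hard_test = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8]       # most scores low, tail to the right

def summary(scores):
    """Return (mean, median, mode) of a list of scores."""
    return statistics.mean(scores), statistics.median(scores), statistics.mode(scores)

easy_mean, easy_median, easy_mode = summary(easy_test)
hard_mean, hard_median, hard_mode = summary(hard_test)

print("easy test:", easy_mean, easy_median, easy_mode)
print("hard test:", hard_mean, hard_median, hard_mode)
```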
Comparison of case-control and cohort study
Case-control study
Example: study of a group of patients with chronic obstructive pulmonary disease (cases) and a group without it (controls) to identify risk factors (smoking)
Retrospective study
Outcome is identified first; prior exposure is then assessed
Inexpensive, easier, hospital-based
Needs small sample size
Allows estimation of odds ratio only
Used to study relatively rare conditions
Selection bias more likely
Cohort study
Example: follow-up of smokers (cohort) and non-smokers (another cohort) and the development of chronic obstructive pulmonary disease in each cohort
Prospective study
Outcome is measured after exposure
Expensive, harder, community-based
Needs large sample size
Allows determination of incidence and relative risk
Used to study common conditions
Selection bias less likely
Cross-sectional study (prevalence study): studies present cases, allowing estimation of prevalence and risk factors at the same time. It is easy and inexpensive.
Randomised clinical trial: a prospective study with randomised study groups. Randomisation reduces selection bias; the trial may also be blinded to reduce observer bias.
Relative risk is the ratio of the probability of the event (disease) occurring in the exposed group to that in a non-exposed group.
Attributable risk is the difference in rate of a condition between an exposed population and an unexposed population.
The odds ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.
Absolute risk reduction is the reduction in risk associated with treatment (or removal of risk factor) as compared with placebo.
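As a worked example of relative risk, attributable risk and odds ratio, the sketch below uses invented cohort counts (30 of 100 exposed and 10 of 100 unexposed people develop the disease). All variable names and numbers are hypothetical.

```python
# Hypothetical cohort counts.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

relative_risk = risk_exposed / risk_unexposed        # ratio of the two risks
attributable_risk = risk_exposed - risk_unexposed    # difference between the two risks

# Odds = events / non-events within each group.
odds_exposed = exposed_cases / (exposed_total - exposed_cases)
odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
odds_ratio = odds_exposed / odds_unexposed

print(f"relative risk     = {relative_risk:.2f}")
print(f"attributable risk = {attributable_risk:.2f}")
print(f"odds ratio        = {odds_ratio:.2f}")
```

In this example the odds ratio is larger than the relative risk; the two are close only when the event is rare in both groups.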
Number needed to treat (NNT) is the average number of patients who need to be treated to prevent one additional bad outcome.
It is the inverse of absolute risk reduction.
Example: number of children we need to vaccinate to prevent one case of disease.
The lower the number needed to treat, the more effective the intervention is.
Number needed to harm (NNH) is the number of patients that need to be exposed to a risk factor over a specific period to cause harm in one additional patient.
It is the inverse of the attributable risk.
Example: number of adults that need to be exposed to smoking to have one more case of lung cancer.
The lower the number needed to harm, the worse the risk factor.
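Both the number needed to treat and the number needed to harm are reciprocals of a risk difference, so one helper can sketch both. The event rates below are hypothetical, the function name is introduced for illustration, and rounding up to a whole patient is a common convention rather than something the cards state. `Fraction` is used so the arithmetic is exact.

```python
import math
from fractions import Fraction

def number_needed(risk_a: Fraction, risk_b: Fraction) -> int:
    """Reciprocal of the absolute difference between two risks, rounded up to a whole patient."""
    difference = abs(risk_a - risk_b)
    if difference == 0:
        raise ValueError("risks are equal, so the number needed is undefined")
    return math.ceil(1 / difference)

# NNT: hypothetical bad-outcome rates of 20% on placebo and 15% on treatment (ARR = 5%).
nnt = number_needed(Fraction(20, 100), Fraction(15, 100))

# NNH: hypothetical harm rates of 3% in the exposed and 1% in the unexposed (attributable risk = 2%).
nnh = number_needed(Fraction(3, 100), Fraction(1, 100))

print("NNT =", nnt)
print("NNH =", nnh)
```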
Null hypothesis
States that there is no difference between the two groups being studied.
It provides a starting point for the study.
If no difference is found, it is accepted and no statistically significant difference is noted.
However, if a difference is found, it is rejected and the result is ascribed statistical significance.
The level of statistical significance (α, usually 0.05) is the probability of a type I error that the study is prepared to accept. A result is called statistically significant when its P value is below this level.
Type I error
Cause: rejecting null hypothesis when it is true
Inference: finding a false difference between groups when none exists
Probability of type I error is α
Type II error
Cause: accepting null hypothesis when it is false
Inference: missing a true difference between groups
Probability of type II error is β
Power of a study is its ability to detect a difference between groups when one exists.
It may be described as the probability of not making a type II error (β).
Hence, power = (1 – β).
For a good study, power should be 0.8 or more.
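To show how power depends on sample size, the sketch below approximates the power of a two-sided, two-sample comparison of means using a normal (z-test) approximation. The choice of test, the function name `approx_power` and the example numbers are all assumptions made for illustration; the cards do not specify a method.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample z-test with equal group sizes.

    effect_size is the true difference in means divided by the common standard deviation.
    The small contribution from the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

# Hypothetical scenario: a medium effect (0.5 SD) with different group sizes.
for n in (20, 64, 100):
    print(f"n per group = {n}: power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, about 64 patients per group are needed to reach the conventional power of 0.8 for an effect of half a standard deviation.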