Lecture 1_190605 Flashcards
% Error
100%*(measured value – “true” value) / “true” value
Standard Deviation
= √((∑ (measurement – average)^2) / (N – 1))
*Standard deviation is a measure of precision, not accuracy!
** For normally distributed data, about 68% of the values in a sample population fall within one standard deviation of the mean and about 95% fall within two standard deviations.
Ratio = yes
Interval = yes
Ordinal = maybe….
Nominal = no
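A minimal Python sketch of the two formulas above (percent error and sample standard deviation). The function names and the readings are hypothetical, chosen only to illustrate accuracy versus precision.

```python
import math

def percent_error(measured, true_value):
    """Percent error relative to the accepted ("true") value."""
    return 100.0 * (measured - true_value) / true_value

def sample_sd(values):
    """Sample standard deviation with the N - 1 denominator."""
    n = len(values)
    average = sum(values) / n
    squared_deviations = sum((x - average) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical repeated measurements of a 10.0 g reference mass.
readings = [9.8, 10.1, 10.0, 10.3, 9.8]
average = sum(readings) / len(readings)

print(percent_error(average, 10.0))  # accuracy: how far the average is from the true value
print(sample_sd(readings))           # precision: how tightly the readings cluster
```

A small standard deviation with a large percent error would mean precise but inaccurate measurements.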
peta = P
10^15 (Quadrillion)
tera = T
10^12 (Trillion)
giga = G
10^9 (Billion)
mega = M
10^6 (Million)
kilo = k
10^3 (Thousand)
hecto = h
10^2 (Hundred)
deca = da
10^1 (Ten)
deci = d
10^-1 (Tenth)
centi = c
10^-2 (Hundredth)
milli = m
10^-3 (Thousandth)
micro = µ
10^-6 (Millionth)
nano = n
10^-9 (Billionth)
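The prefix cards above can be turned into a small conversion helper. This is only a sketch: `SI_EXPONENTS` and `convert` are illustrative names, and the empty string stands for the unprefixed base unit.

```python
# Power of ten for each prefix on the cards above ("tera" is the standard SI name for T).
SI_EXPONENTS = {
    "peta": 15, "tera": 12, "giga": 9, "mega": 6, "kilo": 3,
    "hecto": 2, "deca": 1, "": 0,
    "deci": -1, "centi": -2, "milli": -3, "micro": -6, "nano": -9,
}

def convert(value, from_prefix, to_prefix):
    """Convert a value between two prefixed forms of the same base unit."""
    shift = SI_EXPONENTS[from_prefix] - SI_EXPONENTS[to_prefix]
    return value * 10.0 ** shift

print(convert(5, "milli", "micro"))  # 5 mg expressed in micrograms
print(convert(250, "milli", ""))     # 250 mL expressed in litres
```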
Descriptive statistics
the methods to describe a data set (CASE REPORT)
Inferential statistics
used to draw conclusions about the data
Population
group from which data are to be collected
Sample
a subset of a population
Variable
a characteristic of the members of a population that can differ in quality or quantity from one member to another
Categorical variables
Variables with discrete or qualitative values (names or labels). Example: blue, Ridge-back, whole number
- Nominal – no intrinsic order (GofT Characters, shirt designs)
- Ordinal – have order (Tofu 1-5, no measurement)
- Dichotomous – only 2 values (gender)
Continuous variables
Variables that can be measured along a continuum. Example: 1, 2, and everything in between
- Interval – numeric and measured, but 0 is arbitrary (temperature, except kelvin)
- Ratio – like interval, but a value of 0 means none of the quantity is present (cannot have a negative value; most common type: height, age, kelvin)
Mean
Average
disadvantage = outliers
Ratio = yes
Interval = yes
Ordinal = maybe….
Nominal = no
Median
Middle value
{13,23,11,16,15,10,26} -> {10,11,13,15,16,23,26} = 15
{13,23,11,16,15,10,14,26} -> {10,11,13,14,15,16,23,26} = 14.5
advantage = outlier insensitive
Ratio = yes
Interval = yes
Ordinal = yes
Nominal = no
Mode
Most common value
{1,2,2,3,4,4,4,5,5,6} = 4
{4,2,4,3,2,2} = 2
Ratio = yes
Interval = yes
Ordinal = yes
Nominal = yes
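The median and mode examples on these cards can be checked with Python's standard `statistics` module. The `with_outlier` set is a hypothetical variation (26 replaced by 260) added to show why the mean is outlier sensitive and the median is not.

```python
import statistics

odd_set = [13, 23, 11, 16, 15, 10, 26]
even_set = [13, 23, 11, 16, 15, 10, 14, 26]

print(statistics.median(odd_set))   # 15
print(statistics.median(even_set))  # 14.5 (average of the two middle values)
print(statistics.mode([1, 2, 2, 3, 4, 4, 4, 5, 5, 6]))  # 4
print(statistics.mode([4, 2, 4, 3, 2, 2]))              # 2

# Hypothetical outlier: the largest value becomes 260 instead of 26.
with_outlier = [13, 23, 11, 16, 15, 10, 260]
mean_shift = statistics.mean(with_outlier) - statistics.mean(odd_set)
median_shift = statistics.median(with_outlier) - statistics.median(odd_set)
print(mean_shift, median_shift)  # the mean moves a lot, the median does not move
```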
Range
highest value – lowest value
disadvantage = outliers
Ratio = yes
Interval = yes
Ordinal = yes
Nominal = no
Interquartile range
75th percentile – 25th percentile
advantage = outlier insensitive
Ratio = yes
Interval = yes
Ordinal = yes
Nominal = no
Standard error of the Mean (SEM)
= standard deviation / √N
increased N = decreased SEM
increased SD = increased SEM
Ratio = yes
Interval = yes
Ordinal = maybe….
Nominal = no
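A short sketch of range, interquartile range, and SEM using the seven-value set from the Median card. One caveat: quartiles can be computed under several conventions, so the IQR from `statistics.quantiles` (default "exclusive" method) may differ slightly from a textbook or spreadsheet value.

```python
import math
import statistics

data = [13, 23, 11, 16, 15, 10, 26]

# Range: highest value minus lowest value.
value_range = max(data) - min(data)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Standard error of the mean: sample SD divided by the square root of N.
sem = statistics.stdev(data) / math.sqrt(len(data))

print(value_range, iqr, sem)
```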
Null hypothesis
no difference (mean for some variable is NOT statistically different between the groups)
Difference
The mean for some variable is “statistically” different between the groups
T-test
simplest test for difference between 2 groups
t = (Mean1 – Mean2) / √(SEM1^2 + SEM2^2)
increased SEMs = decreased t = less likely different
*increased N = decreased SEM = increased t
*increased SD = increased SEM = decreased t
increased Mean delta = increased t = more likely diff
The greater the magnitude of “t”, the more likely the groups are different
If |t| > 2.0 then P < 0.05 (~5%: a difference this large would arise by random chance alone less than about 5% of the time)
If |t| > 2.7 then P < 0.01 (~1%)
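The t formula above can be sketched in Python as follows. The two groups are hypothetical blood-pressure readings, and the function names are illustrative. The |t| > 2.0 cutoff is the card's rule of thumb; exact critical values depend on the degrees of freedom and are larger for very small samples.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def t_statistic(group1, group2):
    """t = (Mean1 - Mean2) / sqrt(SEM1^2 + SEM2^2), as on the card."""
    mean_delta = statistics.mean(group1) - statistics.mean(group2)
    return mean_delta / math.sqrt(sem(group1) ** 2 + sem(group2) ** 2)

# Hypothetical systolic blood pressures (mmHg) in two groups.
treated = [118, 122, 115, 120, 117, 121, 119, 116]
control = [126, 130, 124, 129, 127, 131, 125, 128]

t = t_statistic(treated, control)
print(t, abs(t) > 2.0)  # the sign of t only reflects which group is listed first
```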
Chance
caused by random variations in subjects & measurements – bigger sample size will reduce chance errors (i.e. increased N = decreased SEM = increased t = more likely to be true difference vs. chance error)
Bias
Bias is NOT caused by random variation or chance, but rather by systematic variation (a bigger sample size will NOT help with bias and statistical analysis often will not reveal bias)
1) Selection bias – biased sampling of population
2) Measurement bias – aka systematic bias – poor measurement technique
3) Analysis bias – using analysis that favors one conclusion over another
Confounding
similar to bias, but involves misinterpretation of accurately measured variables
* arises when an association is interpreted without adjusting for other factors that are known risk factors
POEM
Patient Oriented Evidence that Matters (SLOW STUDY)
*What patients really care about: mortality and morbidity. Everything else is DOE.
DOE
Disease Oriented Evidence (FASTER STUDY, CHEAPER)
- The stuff that patients don’t care about, but is related to disease.
- *The vast majority of articles in medical journals involve DOE. While suggestive, these studies can be misleading!
Research classification
[Study classification diagram: see the PDF of the lecture PowerPoint.]
Clinical Trial
Experimental study in which the exposure status (e.g. assigned to active drug versus placebo) is determined by the investigator
Randomized Controlled Trial
A special type of clinical trial in which assignment to an exposure is determined purely by chance
Cohort Study
Observational study in which subjects with an exposure of interest (e.g. hypertension) and subjects without the exposure are identified and then followed forward in time to determine outcomes (e.g. stroke). [longitudinal study.]
Case-Control Study
Observational study that first identifies a group of subjects with a certain disease and a control group without the disease, and then looks back in time (e.g. chart review) to find exposure to risk factors for the disease. *useful for rare diseases
Cross-Sectional Study
Observational study that is done to examine presence or absence of a disease or presence or absence of an exposure at a particular time. Since exposure and outcome are ascertained at the same time, it is often unclear if the exposure preceded the outcome.
*good for question asking, bad for answers
Case Report or Case Series
Descriptive study that reports on a single patient or a series of patients with a certain disease.
*generates a hypothesis but cannot test a hypothesis because it does NOT include an appropriate comparison group.
DESCRIPTIVE STUDY
Incidence
number of new events
Incidence Rate
incidence / sum of time individuals in the population were at risk for having the event (e.g. events/person-years)
Prevalence
number of persons in the population affected by a disease at a specific time
*prevalence rate = prevalence / the number of persons in the population at the time
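A minimal sketch of the incidence rate and prevalence rate definitions above. The function names and all counts are hypothetical.

```python
def incidence_rate(new_events, person_time_at_risk):
    """New events divided by the total time at risk (e.g. person-years)."""
    return new_events / person_time_at_risk

def prevalence_rate(existing_cases, population_size):
    """Persons affected at a specific time divided by the population at that time."""
    return existing_cases / population_size

# Hypothetical: 12 new strokes observed over 4000 person-years of follow-up.
print(incidence_rate(12, 4000))   # events per person-year

# Hypothetical: 50 people with the disease in a town of 1000 on survey day.
print(prevalence_rate(50, 1000))  # proportion of the population affected
```

Incidence counts new events over a period of follow-up, while prevalence is a snapshot at one point in time.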
Relative risk or Risk ratio (Cohort Studies)
the ratio of the incidence of disease in the exposed group divided by the corresponding incidence of disease in the unexposed group
YES NO
+ A B
- C D
RR = (A/(A+B))/(C/(C+D)) = X times greater risk
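A quick sketch of the RR formula, reading the table with rows as exposure (+ exposed, - unexposed) and columns as disease (YES, NO). That reading is an assumption that matches the formula; the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = (A/(A+B)) / (C/(C+D)) for a 2x2 table of exposure by disease."""
    risk_exposed = a / (a + b)      # incidence of disease among the exposed
    risk_unexposed = c / (c + d)    # incidence of disease among the unexposed
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
print(relative_risk(30, 70, 10, 90))  # about 3 times greater risk
```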
Odds ratio (Case-Control Studies)
the odds of exposure in the group with disease divided by the odds of exposure in the control group
YES NO
+ A B
- C D
OR = (A/C)/(B/D) = AD/BC = X times greater odds
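The OR formula can be sketched the same way, again assuming rows are exposure (+/-) and columns are disease status (YES = cases, NO = controls). The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/C) / (B/D) = AD/BC for a 2x2 table of exposure by disease."""
    odds_exposure_cases = a / c      # odds of exposure among those with disease
    odds_exposure_controls = b / d   # odds of exposure among controls
    return odds_exposure_cases / odds_exposure_controls

# Hypothetical case-control study: 30 exposed cases, 70 exposed controls,
# 10 unexposed cases, 90 unexposed controls.
print(odds_ratio(30, 70, 10, 90))  # equals (30 * 90) / (70 * 10)
```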
Attributable risk or Risk difference
a measure of absolute risk
attributable risk = difference between the incidence rates in the exposed and non-exposed groups.
Population Attributable Risk
= Attributable risk × proportion of exposed individuals in the population
Number needed to treat (NNT)
= number of patients who would need to be treated to prevent one adverse outcome
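A sketch of the three absolute-risk measures above. The NNT card gives only a definition, so the code assumes the standard formula NNT = 1 / absolute risk reduction, which is not stated on the card. All names and numbers are hypothetical.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Risk difference between exposed and non-exposed groups."""
    return incidence_exposed - incidence_unexposed

def population_attributable_risk(incidence_exposed, incidence_unexposed, proportion_exposed):
    """Attributable risk scaled by the proportion of the population that is exposed."""
    return attributable_risk(incidence_exposed, incidence_unexposed) * proportion_exposed

def number_needed_to_treat(risk_untreated, risk_treated):
    """Assumption: NNT = 1 / absolute risk reduction (usually rounded up when reported)."""
    return 1.0 / (risk_untreated - risk_treated)

print(attributable_risk(0.30, 0.10))                    # hypothetical incidences
print(population_attributable_risk(0.30, 0.10, 0.25))   # 25% of the population exposed
print(number_needed_to_treat(0.10, 0.06))               # hypothetical trial risks
```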
Sensitivity
is the ability of the test to identify correctly those who have the disease = A/(A+C)
YES NO
+ A B
- C D
Specificity
is the ability of the test to identify correctly those who do not have the disease = D/(B+D)
YES NO
+ A B
- C D
Positive predictive value
is the probability of disease in a patient with a positive test = A/(A+B)
YES NO
+ A B
- C D
Negative predictive value
is the probability that the patient does not have disease if the test result is negative = D/(C+D)
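The four test-performance formulas above share one 2x2 table (rows: test + / -, columns: disease YES / NO), so they can be sketched together. `diagnostic_metrics` is an illustrative name and the counts are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),   # diseased patients correctly identified
        "specificity": d / (b + d),   # non-diseased patients correctly identified
        "ppv": a / (a + b),           # probability of disease given a positive test
        "npv": d / (c + d),           # probability of no disease given a negative test
    }

# Hypothetical screening study of 1000 people, 100 of whom have the disease.
metrics = diagnostic_metrics(a=90, b=50, c=10, d=850)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Sensitivity and specificity are read down the columns; the predictive values are read across the rows, which is why they change with how common the disease is in the tested group.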
Confidence Intervals
a range of values within which there is a high probability (95% by convention) that the true population value can be found
increased N = decreased SEM = CI narrows!
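The card gives no formula, so the sketch below assumes the common normal approximation for a confidence interval around a mean: mean ± z × SD / √N, with z ≈ 1.96 for 95%. The function name and numbers are hypothetical. It shows the interval narrowing as N grows while the SD stays the same.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Assumption: normal-approximation CI, mean +/- z * (SD / sqrt(N))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Same hypothetical mean and SD, two different sample sizes.
print(mean_confidence_interval(120.0, 15.0, 25))
print(mean_confidence_interval(120.0, 15.0, 100))  # narrower interval
```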
Type I error (alpha)
the probability of incorrectly concluding there is a statistically significant difference in the population when none exists
*the number after "P <" in a reported P value. P < 0.05 means that, if there were truly no difference, a difference at least this large would occur by chance less than 5% of the time
Type II error (beta)
the probability of incorrectly concluding that there is no statistically significant difference in a population when one exists
Power
measure of the ability of a study to detect a true difference = 1 - type II error rate or 1 - beta
* the smaller the difference, the greater the number of observations needed
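A rough sketch of why smaller differences need more observations. It assumes a two-group comparison with equal group sizes and a common SD, and uses a normal approximation that ignores the far tail. None of this is on the card, and `approx_power` is an illustrative name.

```python
import math
from statistics import NormalDist

def approx_power(true_difference, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-group comparison of means."""
    standard_normal = NormalDist()
    se_difference = sd * math.sqrt(2 / n_per_group)        # SE of the difference in means
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)    # threshold set by alpha
    return standard_normal.cdf(abs(true_difference) / se_difference - z_critical)

# Hypothetical: SD of 1 unit, 64 subjects per group.
print(approx_power(0.5, 1.0, 64))    # moderate difference
print(approx_power(0.25, 1.0, 64))   # half the difference, much lower power
print(approx_power(0.25, 1.0, 256))  # more observations recover the power
```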
Survival Analysis
Kaplan-Meier analysis measures the ratio of surviving subjects (or those without an event) / total number of subjects at risk for the event.
*Every time a subject has an event, the ratio is recalculated. These ratios are then used to generate a curve to graphically depict the probability of survival.
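The recalculation described above can be sketched as a small product-limit estimator. The handling of censored subjects (those lost to follow-up before an event) is an assumption added here; the card does not mention censoring. The follow-up data are hypothetical.

```python
def kaplan_meier(observations):
    """observations: list of (time, had_event); had_event False means censored.

    Returns [(event_time, estimated survival probability), ...].
    """
    event_times = sorted({time for time, had_event in observations if had_event})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still being followed just before time t.
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, had_event in observations if time == t and had_event)
        # Ratio of survivors to subjects at risk, multiplied into the running estimate.
        survival *= (at_risk - events) / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up in months; False marks a subject censored at that time.
follow_up = [(1, True), (2, True), (3, False), (4, True), (5, False)]
for time, probability in kaplan_meier(follow_up):
    print(time, round(probability, 3))
```

Each event time contributes one step down in the plotted survival curve.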
LR+
= Sensitivity / (1 – Specificity)
LR-
= (1 – Sensitivity) / Specificity
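A minimal sketch of both likelihood ratios computed from sensitivity and specificity. The function name and the test characteristics are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a test's sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Hypothetical test with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(round(lr_pos, 3), round(lr_neg, 3))
```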