Week 1 Day 1 - Mathematics Flashcards
Descriptive statistics
Used to organize, summarize, and present the values
Draws NO conclusions
“The data is the data”
Inferential statistics
Used to draw conclusions about data
Categorical variable
variable with discrete or qualitative value
male/female
liking tofu 1-5 scale
shirt (4 types)
quarantine activity is qualitative, but is infinite, not discrete
Continuous variable
variable that can be measured along a continuum
age
temp
height
years as a nurse
nominal
categorical variable
no intrinsic order - shirt, quarantine activity
ordinal
categorical variables
have order - tofu (1,2,3,4,5)
dichotomous
categorical variable
only 2 values - m/f (order doesn’t matter)
interval
continuous variable
numeric value and is measured
e.g. age, temp, height, years as a nurse
ratio
continuous variable
like interval, but value of ‘0’ indicates there is nothing
e.g. age, height, years as a nurse
temp (in F or C) is not a ratio variable: 0 degrees does not mean "no temperature", so 70F is not twice as warm as 35F
mean
as it relates to variables
advantage: easy to calc
disadvantage: affected by outliers
ratio (height, age): yes
interval (temp): yes
ordinal (tofu): maybe, possible mathematically, but you shouldn’t
nominal (shirt): no
median
as it relates to variables
advantage: outlier insensitive
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): yes
nominal (shirt): no
mode
as it relates to variables
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): yes
nominal (shirt): yes
measures of central tendency
mean, median, mode
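A minimal sketch of the three measures using Python's standard `statistics` module. The ages and shirt types are made-up values chosen only to show the outlier effect and the nominal case; they are not from the course material.

```python
import statistics

# Hypothetical ages (years); the last value is a deliberate outlier.
ages = [22, 24, 24, 25, 27, 95]

mean_age = statistics.mean(ages)      # pulled upward by the outlier
median_age = statistics.median(ages)  # middle of the sorted values
mode_age = statistics.mode(ages)      # most frequent value

# Mode is the only one of the three that works for a nominal variable.
shirts = ["tee", "polo", "tee", "tank", "tee"]
mode_shirt = statistics.mode(shirts)

print(mean_age, median_age, mode_age, mode_shirt)
```

The mean lands well above every typical age in the list, while the median and mode stay with the bulk of the data.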
measures of variability/spread
describes the manner in which data are scattered around a specific value (such as the mean)
range, interquartile range, standard deviation, standard error of the mean, percentile
range
definition + as it relates to variables
highest value minus lowest value
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): yes
nominal (shirt): no
interquartile range
definition + as it relates to variables
refers to the upper and lower boundaries defining the middle 50 percent of observations
75th percentile - 25th percentile
a wider variant is also commonly used: 90th percentile - 10th percentile (middle 80 percent)
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): yes
nominal (shirt): no
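A short sketch of range and interquartile range, assuming hypothetical heights in cm. `statistics.quantiles` uses its default "exclusive" method here; other quartile conventions give slightly different cut points, so treat the exact numbers as convention-dependent.

```python
import statistics

# Hypothetical heights in cm (illustrative values only).
heights_cm = [150, 155, 160, 162, 165, 168, 170, 175, 180, 190]

# Range: highest value minus lowest value.
value_range = max(heights_cm) - min(heights_cm)

# Quartile cut points: 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(heights_cm, n=4)

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(value_range, q1, q3, iqr)
```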
standard deviation
definition + as it relates to variables
measure of variability
how much subjects differ from the average (mean)
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): maybe (we can, but we shouldn’t)
nominal (shirt): no
standard error of the mean
definition + as it relates to variables
how well the sample mean represents the population mean
error of the mean gets smaller as the sample gets bigger
describes the amount of variability in the measurement of the population mean from several different samples
ratio (age, height): yes
interval (temp): yes
ordinal (tofu): maybe (we can, but we shouldn’t)
nominal (shirt): no
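A sketch of standard deviation versus standard error of the mean, assuming the usual formula SEM = SD / sqrt(n) (the cards do not state it). The temperatures are made up; the larger sample is just the small one repeated, to show the SEM shrinking as n grows.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical temperatures (F).
small_sample = [68, 70, 72, 74]
large_sample = small_sample * 4  # same spread, four times as many observations

print(statistics.stdev(small_sample), sem(small_sample))
print(statistics.stdev(large_sample), sem(large_sample))
```

The SD stays in the same neighborhood because the spread of the data has not changed, while the SEM drops because the mean is estimated from more observations.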
inferential statistics
trying to reach conclusions that extend beyond the immediate data alone
Null hypothesis
There is no difference
T test
simplest test for difference between 2 groups
the greater the magnitude of “t”, the more likely the groups are different (statistically different)
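The card does not say which t test is meant. As an assumption, the sketch below computes Welch's two-sample t statistic (unequal variances) by hand: the difference in means divided by its standard error. The two groups are invented numbers, and `welch_t` is a hypothetical helper name.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic: difference in means over its standard error."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / standard_error

# Hypothetical measurements for two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_different = welch_t(group_a, group_b)  # groups far apart
t_same = welch_t(group_a, group_a)       # identical groups

print(t_different, t_same)
```

Turning t into a p-value also needs the degrees of freedom and the t distribution, which this sketch leaves out.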
Reasons research may not be valid
bias
chance
confounders
chance
caused by random variations in subjects and measurements
larger sample size will reduce chance errors
bias
systematic variation
larger sample size WILL NOT help
Types of bias
selection bias
measurement bias
analysis bias
selection bias
biased sampling of population
measurement bias
systematic bias: poor measurement technique
Spanish vs Portuguese men height
analysis bias
using analysis that favors one conclusion over another
“torture the data until you get the conclusion that you want”
confounding
similar to bias
misinterpretation of accurate variables
occurs when an investigator falsely concludes that a particular exposure is causally related to a disease without adjusting for other factors that are known risk factors for the disease and are associated with the exposure.
POEM
Patient Oriented Evidence that Matters
What patients really care about: mortality and morbidity
DOE
Disease oriented evidence
The stuff that patients don’t care about, but is related to disease
blood pressure, cholesterol, blood glucose
percentile
percentage of a distribution that is below a specific value
e.g. a child is in the 80th percentile for height if only 20% of children of the same age are taller than he is
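The height example can be sketched as a percentile rank, taking "below" as strictly below. `percentile_rank` is a hypothetical helper and the peer heights are invented.

```python
def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical heights (cm) of ten same-age peers; none ties with the child.
peer_heights = [100, 102, 104, 106, 108, 110, 112, 114, 116, 118]
child_height = 115

rank = percentile_rank(peer_heights, child_height)
percent_taller = 100 - rank  # valid here because there are no ties

print(rank, percent_taller)
```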
experimental study
researcher assigns exposure
can’t assign BAD exposures usually
randomized controlled trial
experimental study
assignment to exposure is determined purely by chance
(allocation is random)
usually double blind, has controls
randomizing helps (but does not guarantee) to get rid of confounding and bias
observational study
researcher did not assign exposure
cohort study
observational study
subjects with an exposure of interest (e.g. HTN) and subjects without the exposure are identified and then followed forward in time to determine outcomes (e.g. stroke)
exposure -> outcome
disadvantage: longitudinal study - takes a long time
e.g. Framingham
case-control study
observational study
first identifies a group of subjects with a certain disease and a control group without the disease, and then looks back in time to find exposure to risk factors for the disease
advantages: well suited for rare diseases, doesn’t take a long time
outcome -> exposure
disadvantage: much more likely to have biases because it’s hard to recruit a bunch of controls who are just like your cases, except they don’t have the disease
cross-sectional study
observational study
examines presence or absence of a disease or presence or absence of an exposure at a particular time.
disadvantage: Since exposure and outcome are ascertained at the same time, it is often unclear if the exposure preceded the outcome.
case report or case series
Descriptive study
reports on a single or a series of patients with a certain disease.
disadvantage: usually generates a hypothesis but cannot test a hypothesis because it does not include an appropriate comparison group.
measures of frequency of events
incidence
incidence rate
prevalence
incidence
number of NEW events that occur during a specified period of time in a population at risk for developing the events
new events per unit of time
incidence rate
incidence that reports the number of new events that occur over the sum of time individuals in the population were at risk for having the event (i.e. events/person-years).
new cases per year (or other time frame) per population
prevalence
number of persons in the population affected by a disease at a specific time/number of persons in the population at that time
cumulative incidence (when someone dies, they fall out of the prevalence pool)
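The three frequency measures differ only in what goes in the numerator and denominator. A minimal sketch with invented counts for a population of 1000 followed for a total of 4000 person-years; the function names are illustrative.

```python
def incidence(new_cases, population_at_risk):
    """New events during the period / people at risk at the start."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_years_at_risk):
    """New events / total time the population was at risk."""
    return new_cases / person_years_at_risk

def prevalence(existing_cases, population):
    """People affected at a point in time / population at that time."""
    return existing_cases / population

# Hypothetical counts.
print(incidence(20, 1000))       # proportion developing the event
print(incidence_rate(20, 4000))  # events per person-year
print(prevalence(150, 1000))     # proportion currently affected
```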
How close the average of measured values is to the true value
accuracy
how close measured values are to each other
precision
standard deviation is a measure of precision! not accuracy
%error
100% * (measured value - “true” value) / “true” value
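Accuracy, precision, and %error can be compared on two made-up sets of length measurements against an assumed "true" value of 2.54 cm. One set is tightly grouped but off target (precise, not accurate); the other is scattered but centered (accurate on average, not precise).

```python
import statistics

def percent_error(measured, true_value):
    """100% * (measured value - true value) / true value."""
    return 100 * (measured - true_value) / true_value

true_length_cm = 2.54  # assumed "true" value for illustration

tight_but_off = [2.70, 2.71, 2.69, 2.70]           # precise, not accurate
scattered_but_centered = [2.40, 2.68, 2.50, 2.58]  # accurate, not precise

for name, data in [("tight", tight_but_off), ("scattered", scattered_but_centered)]:
    average = statistics.mean(data)
    spread = statistics.stdev(data)  # SD reflects precision, not accuracy
    print(name, round(percent_error(average, true_length_cm), 2), round(spread, 3))
```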
population
group from which data is to be collected
sample
subset of a population
1 in -> ? cm
2.54 cm
peta (P)
10^15
1E+15
quadrillion
tera (T)
10^12
1E+12
trillion
giga (G)
10^9
1E+9
billion
mega (M)
10^6
1E+6
million
kilo (k)
10^3
1000
thousand
hecto (h)
10^2
100
hundred
deca (da)
10^1
10
ten
deci (d)
10^-1
0.1
tenth
centi (c)
10^-2
0.01
hundredth
milli (m)
10^-3
0.001
thousandth
micro (μ)
10^-6
1E-6
millionth
nano (n)
10^-9
1E-9
billionth
pico (p)
10^-12
1E-12
trillionth
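The prefix cards amount to a lookup table of powers of ten. A small sketch of converting between prefixes, plus the inch-to-cm card; `SI_EXPONENTS`, `convert`, and `inches_to_cm` are illustrative names, not from any library.

```python
# Power of ten for each SI prefix symbol ("" means the unprefixed base unit).
SI_EXPONENTS = {
    "P": 15, "T": 12, "G": 9, "M": 6, "k": 3, "h": 2, "da": 1, "": 0,
    "d": -1, "c": -2, "m": -3, "μ": -6, "n": -9, "p": -12,
}

def convert(value, from_prefix, to_prefix):
    """Re-express a value given in one prefixed unit in another prefixed unit."""
    return value * 10.0 ** (SI_EXPONENTS[from_prefix] - SI_EXPONENTS[to_prefix])

CM_PER_INCH = 2.54

def inches_to_cm(inches):
    return inches * CM_PER_INCH

print(convert(1, "k", ""))    # kilometers to meters
print(convert(2.54, "c", "m"))  # centimeters to millimeters
print(inches_to_cm(1))
```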
exact number
number NOT obtained using a measuring device
easily countable, absolutely no question of value
small number
can be reproducibly determined by counting
How can we improve accuracy?
making replicate measurements and taking the average
How can we improve precision?
careful lab technique and/or using instruments capable of yielding greater precision
measures of association
relative risk, odds ratio, absolute risk, attributable risk, population attributable risk, NNT (number needed to treat)
Relative risk
ratio of the incidence of disease in the exposed group divided by the corresponding incidence of disease in the unexposed group
used in cohort studies
RR -> across rows on chart
Odds ratio
odds of exposure in the group with disease divided by the odds of exposure in the control group
used in case control studies
OR - down columns on chart
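The "rows vs columns" mnemonic can be sketched with a 2x2 table, assuming rows are exposed / unexposed and columns are disease / no disease. The cell labels a, b, c, d and the counts are made up for illustration.

```python
def relative_risk(a, b, c, d):
    """Incidence in exposed (row 1) divided by incidence in unexposed (row 2).

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of exposure among diseased (column 1) over odds among controls (column 2)."""
    return (a / c) / (b / d)

# Hypothetical counts.
a, b, c, d = 30, 70, 10, 90

print(relative_risk(a, b, c, d))
print(odds_ratio(a, b, c, d))
```

The odds ratio simplifies to (a * d) / (b * c), which is why it can be computed in a case-control study even though incidence cannot.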
Number needed to treat (NNT)
number of patients who would need to be treated to prevent one adverse outcome
considers cost effectiveness
considers what is being cured
absolute risk
relative risk and odds ratio provide a measure of risk compared with a standard
However, a 40% increase in risk of heart disease because of a particular exposure does not provide insight into the likelihood that exposure in an individual patient will lead to heart disease.
attributable risk or risk difference
measure of absolute risk
difference between the incidence rates in the exposed and non exposed groups
population attributable risk
describes the excess rate of disease in the total study population of exposed and non exposed individuals that is attributable to the exposure
calculated by multiplying the attributable risk by the proportion of exposed individuals in the population
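A sketch of the absolute measures with invented incidences. The NNT formula used here, 1 / (absolute risk reduction), is the standard definition but is not spelled out on the card, so treat it as an added assumption; the function names are illustrative.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Risk difference between exposed and non-exposed groups."""
    return incidence_exposed - incidence_unexposed

def population_attributable_risk(incidence_exposed, incidence_unexposed, proportion_exposed):
    """Attributable risk times the proportion of the population that is exposed."""
    return attributable_risk(incidence_exposed, incidence_unexposed) * proportion_exposed

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """1 / absolute risk reduction (assumed standard formula)."""
    return 1 / (control_event_rate - treated_event_rate)

# Hypothetical values.
print(attributable_risk(0.30, 0.10))
print(population_attributable_risk(0.30, 0.10, 0.25))
print(number_needed_to_treat(0.10, 0.05))
```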
measures of diagnostic test accuracy
sensitivity
specificity
positive predictive value
negative predictive value
positive predictive value
probability of disease in a patient with a positive test
negative predictive value
probability that the patient does not have disease if he has a negative result
sensitivity
ability of the test to identify correctly those who have the disease
test with high sensitivity has few false negative results
sensitivity rules out, specificity rules in
specificity
ability of the test to identify correctly those who do not have the disease
high specificity has few false positive results
how specific is this test for this disease
sensitivity rules out, specificity rules in
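All four test-accuracy measures come from the same four counts: true positives, false positives, false negatives, and true negatives. A minimal sketch with made-up counts; `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four measures from 2x2 diagnostic test counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly identified
        "specificity": tn / (tn + fp),  # healthy patients correctly identified
        "ppv": tp / (tp + fp),          # P(disease | positive test)
        "npv": tn / (tn + fn),          # P(no disease | negative test)
    }

# Hypothetical counts: 100 with disease, 900 without.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)

for name, value in metrics.items():
    print(name, round(value, 3))
```

With these counts the test is both sensitive and specific, yet a third of positive results are false positives, because the disease is uncommon in this made-up population.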
Probability of incorrectly concluding there is a statistically significant difference in the population when none exists.
Type I error (alpha)
Probability of incorrectly concluding that there is no statistically significant difference in a population when one exists.
Type II error (beta)
Measure of the ability of a study to detect a true difference
Power
Confidence intervals
gives a range of values within which there is a high probability (95% by convention) that the true population value can be found
CI narrows as the # of observations increases or SD decreases
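A sketch of a 95% confidence interval for a mean, assuming a normal approximation (mean plus or minus about 1.96 standard errors). For small samples a t-based interval would be somewhat wider; the data and the `mean_ci95` helper are illustrative only.

```python
import math
import statistics

def mean_ci95(values):
    """Approximate 95% CI for the mean using the normal distribution."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical temperatures (F).
few = [68, 70, 72, 74, 71, 69, 73, 70]
many = few * 4  # same spread, more observations

low_few, high_few = mean_ci95(few)
low_many, high_many = mean_ci95(many)

print(high_few - low_few)    # wider interval
print(high_many - low_many)  # narrower interval with more observations
```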
Kaplan-Meier Analysis
Survival analysis
ratio of surviving subjects (those without an event)/total number of subjects at risk for the event
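The card gives only the per-step ratio. As an added assumption, the sketch below uses the usual product-limit form, where that ratio is computed at each event time and the ratios are multiplied together, with censored subjects leaving the at-risk pool without counting as events. The follow-up data and the `kaplan_meier` helper are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurs.

    times:  follow-up time for each subject
    events: True if the subject had the event, False if censored
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        event_count = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if event_count:
            # Ratio of surviving subjects to subjects at risk at this time.
            survival *= (at_risk - event_count) / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event flags.
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 2))
```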