Final Exam Flashcards
evidence based medicine
quality revolution in healthcare
Deming's philosophy
quality is about people, not products
Deming facts
- didn’t believe in quotas
- worked for the US Census Bureau and Western Electric
- improved manufacturing quality during wartime
kaizen
continuous improvement; requires teamwork, open communication, and problem solving
nelson data to wisdom continuum
organizing data so that it can provide new insights and information
history of probability
basically, people aren't good at understanding probability
uniform distribution
flat, block-shaped; each value is equally likely
probability distributions
lists all possible outcomes and shows how likely each one is
normal distributions
bell curve, rare events are the tails
exponential distributions
right-skewed; models waiting times between rare events
availability bias
judging how likely an event is by how easily we can recall similar events from our past
Monty Hall problem
odds of winning go from 1/3 to 2/3 when you switch
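The 1/3 vs 2/3 result is easy to check by simulation; this is an illustrative sketch using Python's random module (the function name and door setup are made up for the example):

```python
import random

def monty_hall(switch, trials=100_000):
    """Simulate the Monty Hall game and return the win rate."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first pick
        # host opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switch to the one remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))  # stays near 1/3
print(monty_hall(switch=True))   # stays near 2/3
```

Running both calls shows the switching strategy winning roughly twice as often.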
categorical measurements
put observations into named categories (HIV status, gender)
ordinal measurements
categories that can be put in rank order (cancer stage, smoking)
quantitative measurements
numerical values that can be put on a number line (age, weight, BMI)
observation
unit upon which a measurement is made (ie. a person/row)
variable
thing we measure (ie. ID or age/column)
value
realized measurement for a variable (ie. age=27/cell)
objectivity
not making data conform to a preconceived worldview
reliability
ability to collect the same values for variables repeatedly (how close the darts are to each other)
validity
how truthful the data is (darts hitting the bullseye)
internal validity
truth within a study
external validity
if results can apply beyond the study
incidence
new cases in a population over a defined period
prevalence
total number of cases at a given point in time
non-experimental vs experimental
experimental assigns subjects to groups according to explanatory variables
case-control
subjects with a certain disease are matched to a similar group without the disease
cohort
two groups (1 exposed and 1 non-exposed) are followed to compare rates of new cases
James Lind Scurvy trials
treatment for scurvy, 6 different treatment plans; an early controlled trial and forerunner of the RCT
RCT
take a group of individuals with the same condition and randomly assign them to interventions or a control
convenience sampling
worst kind of sampling, usually biased, sampling whoever is around
power of sampling
a relatively small random sample can produce accurate estimates for a very large population
frequency distributions
check distributions for outliers, errors, normal distribution, and if any can be combined
symmetry
balance in the pattern
modality
number of peaks
kurtosis
heaviness of the tails
departures
outliers, they skew data
positive skew
right tail is longer
negative skew
left tail is longer
mean
gravitational center
median
middle value
mode
value with the highest recurrence
range
spread of data (maximum-minimum)
frequency table
list all data values and frequency count
sample vs population mean
use the sample mean (x-bar) to estimate the population mean (μ)
quartiles
divides data into 4 equal groups
variance
how spread out data is around the mean
standard deviation
spread of data around the mean
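The center and spread measures above can all be computed with Python's standard statistics module; the data values here are made up for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

print(statistics.mean(data))      # gravitational center
print(statistics.median(data))    # middle value
print(statistics.mode(data))      # most frequent value
print(max(data) - min(data))      # range = maximum - minimum
print(statistics.variance(data))  # sample variance (divides by n-1)
print(statistics.stdev(data))     # sample standard deviation
```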
random variables
number that has different values depending on chance
population
set of all possible values for a random variable
event
outcome/set of outcomes
probabilities
proportion of times an event may occur in a population
discrete random variables
countable set of possible outcomes
continuous random variables
unbroken continuum of possible outcomes
probability mass function (pmf)
assigns probabilities to all possible outcomes for a discrete random variable
area under the curve
probability, adds up to 1
cumulative probability
probability of said value or less
probability density function (pdf)
assigns probability densities across the continuum of outcomes for a continuous random variable; probabilities come from areas under the curve
binomial random variable
discrete random variable with only 2 outcomes
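The pmf of a binomial random variable follows directly from the counting formula P(X = k) = C(n, k) p^k (1 - p)^(n - k); a sketch using math.comb, with hypothetical coin-flip numbers:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable: n trials, success prob p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# hypothetical example: probability of exactly 2 heads in 4 fair coin flips
print(binom_pmf(2, 4, 0.5))
# the pmf assigns probability to every possible outcome, and it all sums to 1
print(sum(binom_pmf(k, 4, 0.5) for k in range(5)))
```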
normal random variable
most common type of continuous random variable (ie. height, weight, systolic bp)
68-95-99.7 rule
68% of data fall within μ ± 1σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ
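These percentages can be recovered from the standard normal CDF; this sketch builds the CDF from math.erf rather than a stats library:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return (1 + erf(z / sqrt(2))) / 2

for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) for a normal random variable
    within = normal_cdf(k) - normal_cdf(-k)
    print(f"within {k} SD: {within:.4f}")
```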
SEM equation
SE(x̄) = s / √n
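A quick worked example of the SEM formula, with made-up sample values:

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical measurements

# SEM = sample standard deviation divided by the square root of n
sem = statistics.stdev(sample) / math.sqrt(len(sample))
print(sem)
```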
statistical inference
how you generalize from the particular to the general
central limit theorem
sampling distribution of x-bar tends toward normality as n increases
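The CLT can be illustrated by repeatedly sampling from a skewed population and looking at the sample means; the exponential population here is only for illustration:

```python
import random
import statistics

random.seed(1)
# a population that is far from normal: exponential, heavily right-skewed
population = [random.expovariate(1.0) for _ in range(100_000)]

# sampling distribution of x-bar for samples of size n = 50
means = [statistics.mean(random.sample(population, 50)) for _ in range(2000)]

# the means pile up symmetrically around the population mean (~1.0),
# with spread close to sigma / sqrt(n)
print(statistics.mean(means))
print(statistics.stdev(means))
```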
z-scores
number of standard deviations a value lies from the mean; used to find the p-value
null hypothesis
no difference/association
significance levels
the α cutoff compared to the p-value to decide whether you should reject Ho
one-sided vs two-sided
one-sided looks for values larger than the null, two-sided is for when you don’t know the direction of the alternative
point estimation
single best estimate of a parameter
confidence intervals
type of interval estimation
interval estimation
surrounds point estimate with margin of error
family of t-distributions
like a z-distribution but with fatter tails and more uncertainty; shape depends on df
df
degrees of freedom that allow tails to be skinnier or broader
relationship between df and distributions
df increases → t tails get skinnier → t becomes more like z
paired data
measurements that come in matched pairs (ie. before/after on the same subjects); compare within pairs
paired t-test
each point matches another in a different sample
conditions for inference (using a t-test)
simple random sample, valid information, normal population, large sample
normality condition for using a t-test
normality applies to the sampling distribution of the mean, not the population
single sample t-test
one group, comparisons are made to an external population
independent 2-sample t-test
two separate groups with no pairing, compare the separate groups
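An independent two-sample comparison can be sketched by computing Welch's unequal-variance t statistic by hand (in practice a library routine such as scipy.stats.ttest_ind would be used; the group values below are hypothetical):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2_a, se2_b = va / na, vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (na - 1) + se2_b**2 / (nb - 1))
    return t, df

group1 = [5.1, 4.9, 5.6, 5.2, 5.0]  # hypothetical data
group2 = [4.3, 4.7, 4.4, 4.6, 4.1]
t, df = welch_t(group1, group2)
print(round(t, 2), round(df, 1))
```

The t statistic is then compared against a t-distribution with the computed df.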
Levene’s test
tests whether the variances are equal (that's the null); an f-test used to decide whether to pool variances
ANOVA
analysis of variance; statistics used to compare 3+ means for a continuous outcome variable
family-wise error rate
probability of making at least one type 1 error across multiple comparisons
variability between groups (MSB)
mean square between, variability between groups of means around the grand mean
variability within groups (MSW)
mean square within, average amount of variation within groups
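MSB and MSW combine into the ANOVA F statistic, F = MSB / MSW; a sketch with three hypothetical groups:

```python
import statistics

def one_way_anova_F(*groups):
    """F = MSB / MSW for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.mean(x for g in groups for x in g)
    # between-group variability: group means around the grand mean
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    msb = ssb / (k - 1)
    # within-group variability: values around their own group mean
    ssw = 0.0
    for g in groups:
        m = statistics.mean(g)
        ssw += sum((x - m) ** 2 for x in g)
    msw = ssw / (n - k)
    return msb / msw

# hypothetical data: three treatment groups
print(one_way_anova_F([3, 4, 5], [6, 7, 8], [9, 10, 11]))
```

A large F means the groups' means spread out much more than the noise within groups.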
post hoc comparisons
done only after rejecting the null hypothesis; tells you which of the means differ
LSD vs. Bonferroni
bonferroni is more conservative
if the interval doesn’t include 0…
it is statistically significant
homoscedasticity
equal in variance
heteroscedasticity
unequal in variance
correlation
measures whether there is a significant linear association between two quantitative variables
r
correlation coefficient; strength of the linear relationship, between -1 and 1
r^2
coefficient of determination, variance in Y explained by X
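r and r² can be computed directly from their definitions; the x and y values below are hypothetical:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]  # hypothetical explanatory variable
y = [2, 4, 5, 4, 6]  # hypothetical response

r = pearson_r(x, y)
print(r)       # between -1 and 1
print(r ** 2)  # share of variance in y explained by x
```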
what affects correlation
confounding, outliers, non-linear relationships
residuals
distance from data point to the line
dummy variables
coding categorical explanatory variables as 0/1 indicators in a multiple regression; k categories need k-1 dummies
binary response variable
categorical variable with 2 responses
chi-square test
two categorical variables; compare expected to observed counts, and the larger the difference, the stronger the evidence of association
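The chi-square statistic sums (observed − expected)² / expected over every cell of the table; a sketch with a hypothetical 2x2 table:

```python
def chi_square(observed):
    """Chi-square statistic for a two-way table (list of rows of counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # expected count under the null of no association
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat

# hypothetical 2x2 table: exposure (rows) vs disease status (columns)
print(chi_square([[20, 30], [30, 20]]))
```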
Mantel Haenszel test
stratifies the data by a third categorical variable; needs a large sample size
assumptions for parametric tests
normally distributed data, large sample size, quantitative data
assumptions for non-parametric tests
doesn’t assume normality, observations are independent
advantages of non-parametric
use on non-normal data, small sample size, easier to apply
disadvantages of non-parametric
loss of info, harder to reject null, decreased statistical power
general rules for non-parametric
use parametric whenever possible
univariate vs multivariate
testing the relationship between two variables vs testing the relationship of multiple variables
outcome variable
the dependent variable in question (ie. event time in survival analysis)
survival analysis
time to some event (ie. death, infection, hospitalization); need to define the outcome variable
logistic regression
multiple regression with a binary outcome variable (ie. age vs diabetes diagnosis)
fitted model
actual model that contains outcome and explanatory
sample size rules (for general regression model)
n > 30, and roughly 1 explanatory variable per 30-50 observations
Cox regression
survival analysis and logistic regression together, binary response
consecutive sampling
used a lot in healthcare; enrolls every subject who meets the criteria as they present, so it's not purely random
simple random sampling
everyone has a known probability of being sampled, best kind of sampling
stratified random sampling
divides the population into strata and samples from each, so every group is represented
systematic sampling
samples every nth individual
cluster sampling
sampling of a natural grouping
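Simple random and systematic sampling can be sketched in a few lines; the population of 100 IDs is hypothetical:

```python
import random

population = list(range(1, 101))  # hypothetical sampling frame of 100 IDs
random.seed(0)

# simple random sampling: every individual has an equal, known chance
srs = random.sample(population, 10)

# systematic sampling: every nth individual after a random start
n = 10                            # sampling interval
start = random.randrange(n)
systematic = population[start::n]

print(sorted(srs))
print(systematic)
```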
n
sample size
x
variable
xi
value of individual i for variable x
μ
population mean
s
standard deviation
x-bar
sample mean
degrees of freedom calculation
Welch method or conservative method
conservative post hoc
makes it more difficult to detect statistical differences among the means
Cox
determine what variables are most associated with outcome and time, consider multiple variables
undercoverage bias
some groups are left out or underrepresented
volunteer bias
self-selected participants, tend to be atypical of the population
nonresponse bias
large percentage of individuals refuse to participate/cannot be contacted
where are proportions from?
categorical variable
chi-squares are non-parametric which means…
there is a decrease in statistical power; the test compares counts rather than means
Pearson’s correlation is equivalent to…
Spearman's rank correlation (Pearson's r computed on ranked data)
correlation doesn’t mean
causation
key features of RCT
- randomization
- control group for comparison
- blinding or masking
- ethics
what can multiple regression analysis help you do?
identify and adjust for confounding