Biostats. Flashcards
labels/names
nominal data
data with only 2 outcomes
ex: yes or no
dichotomous data
data that consists of names, labels or other nonnumerical data
categorical data
uses labels in an order
ex: poor, fair, excellent
ordinal data
data that can take any value
ex: numbers
continuous data
values that are equally spaced
ex: temperature in °C (age has a true zero, so it is ratio data)
interval data
values that have a true zero point
ex: blood alcohol level
ratio data
central tendency of continuous data is measured with….
mean, median, mode
variation/spread of the data
dispersion
with a normal [Gaussian] distribution, how do mean/median/mode relate?
mean = mode = median
If data has a right/positive skew [tail is to the right], what does this mean?
Mean > Median
If data has a left/negative skew [tail is to the left], what does this mean?
Mean < Median
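A minimal sketch of the skew cards above, using made-up numbers: one large value forms a right tail and pulls the mean above the median. The sample `right_skewed` is hypothetical and only illustrates the direction of the effect.

```python
import statistics

# Hypothetical right-skewed sample: one large value forms the right tail.
right_skewed = [1, 2, 2, 3, 3, 4, 20]

mean_value = statistics.mean(right_skewed)      # pulled toward the tail
median_value = statistics.median(right_skewed)  # middle value, resists the tail

print("mean:", mean_value, "median:", median_value)
print("mean > median:", mean_value > median_value)
```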
Measures of dispersion
range, variance, standard deviation
value below which a particular percent of scores or observations fall
percentiles
what does 95th percentile mean?
95% of values are below this number
data from the 25th to 75th percentiles
interquartile range
why is interquartile range used?
helps to ignore outliers
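To see why the interquartile range is less sensitive to outliers than the range, here is a small sketch with a hypothetical sample containing one extreme value. It assumes Python's `statistics.quantiles` with its default method; other quartile conventions give slightly different cut points.

```python
import statistics

# Hypothetical sample with one extreme value (100).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, q2, q3 = statistics.quantiles(data, n=4)

data_range = max(data) - min(data)   # stretched by the outlier
iqr = q3 - q1                        # spread of the middle 50% of the data

print("range:", data_range)
print("IQR:", iqr)
```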
calculates the average squared distance of the data points from the mean
variance
square root of variance
larger = more spread out
standard deviation
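A short sketch of variance and standard deviation on hypothetical values. It uses the population formulas (divide by n); the sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead.

```python
import statistics

# Hypothetical values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)
pop_variance = statistics.pvariance(values)  # average squared distance from the mean
pop_sd = statistics.pstdev(values)           # square root of the variance

print("mean:", mean_value)
print("variance:", pop_variance)
print("standard deviation:", pop_sd)
```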
for skewed distributions what are the best ways to evaluate the data?
Median, range, interquartile range
For normal distributions what are the best ways to evaluate the data?
mean and standard deviation
with a normal distribution, how much of the data should be within 1 SD of the mean?
Within 2 SD of the mean?
1: 68.3%
2: 95%
process of using data obtained from a sample to make estimates about the characteristics of a population
statistical inference
what is the basis of statistical inference?
random sampling
error that is due to chance and is not systematic
random error
with a large number of repeated samples, the distribution of sample means approaches a normal distribution
central limit theorem
standard deviation of a sampling distribution
standard error
what effect on standard error does a larger sample size have?
larger sample size = smaller standard error
95% of the sample means should be within how many units of standard error?
1.96
range that shows how closely the sample estimate reflects the actual population value - 95% of sample means fall within 1.96 SE units of the population mean, so 95% CI = sample mean ± 1.96 SE
confidence intervals
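A rough sketch of the standard error and 95% confidence interval cards, assuming the usual formula SE = SD / sqrt(n) and the 1.96 multiplier from the cards above. The helper `mean_ci_95` and the measurements are hypothetical; with samples this small a t-based multiplier would normally replace 1.96.

```python
import math
import statistics

def mean_ci_95(sample):
    """Return (mean, standard error, lower, upper) using the 1.96 multiplier."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, se, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical measurements, for illustration only.
small_sample = [118, 122, 125, 130, 135]
large_sample = small_sample * 4     # same values repeated: larger n

mean_s, se_s, low_s, high_s = mean_ci_95(small_sample)
mean_l, se_l, low_l, high_l = mean_ci_95(large_sample)

print(f"n={len(small_sample)}: SE={se_s:.2f}, 95% CI=({low_s:.1f}, {high_s:.1f})")
print(f"n={len(large_sample)}: SE={se_l:.2f}, 95% CI=({low_l:.1f}, {high_l:.1f})")
```

The larger sample gives a smaller standard error and therefore a narrower interval around the same mean.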
there is no difference in the outcome between variable groups
null hypothesis
there is a difference in the outcome between variable groups
alternative hypothesis
when the null hypothesis is true and you reject it
“you say there is a difference but there isn’t”
false positive
Type I Error
the probability of making a Type I error
typically 0.05
alpha error
failing to reject the null hypothesis when there is a difference between groups
false negative
Type II Error/Beta Error
probability of correctly rejecting the null
1 - beta
power
the greater the ability to NOT make a Type II error….
the larger the power
what increases the power of a study?
increased sample size
larger effect size
decreased variability in sample data
increased alpha
If p < alpha
reject the null hypothesis
there is statistical significance
If p > alpha
fail to reject the null
not statistically significant
compares the means of normally distributed continuous variables between two groups;
determines if the means of the two groups are significantly different
T-test
what is the non-parametric version of a T-test?
Mann-Whitney U Test
1 way analysis of variance
compares distribution of continuous variables among more than 2 independent groups
ANOVA
Problem with ANOVA?
can determine there is statistical difference among groups but cannot tell which group is different
Nonparametric version of ANOVA
Kruskal Wallis Test
compares ranks between groups rather than means
uses the H statistic
Kruskal Wallis Test
T-test performed on a repeated-measures two-group design
same thing measured on a patient at two different times - pre and post assessments
paired T-test
what data is used with paired T-test?
dependent
normally distributed
continuous variables
what statistical test do you use with dependent but ordinal data?
Wilcoxon Matched-Pairs Signed Ranks Test
scatterplots are an effective way to convey info for
2 continuous variables
this assesses linear relationships between 2 continuous variables
Pearson correlation coefficient
(+) Pearson correlation coefficient
positive linear relationships
as x increases y increases
in Pearson correlation coefficient, the closer r is to -1 or +1
the stronger the relationship
(-) Pearson correlation coefficient
negative linear relationship
as x increases y decreases
Pearson correlation coefficient = 0
no linear relationship
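The Pearson cards above can be illustrated with a small hand-rolled calculation. The function `pearson_r` and all data are hypothetical. The third example is deliberately U-shaped to show that r = 0 means no linear relationship, not necessarily no relationship at all.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises as x rises
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [4, 1, 0, 1, 4])       # U-shaped: no linear trend

print(r_positive, r_negative, r_none)
```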
compares association between rankings of 2 variables (non-parametric)
rho
spearman correlation coefficient
(observed value - expected)^2 / expected value
Chi-square analysis
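A minimal sketch of the chi-square formula on a hypothetical 2x2 table. It assumes the usual way of getting expected counts under the null hypothesis (row total × column total / grand total), which the card itself does not spell out.

```python
# Hypothetical observed counts: rows = two groups, columns = outcome yes / no.
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
expected = []
for i, row in enumerate(observed):
    expected_row = []
    for j, obs in enumerate(row):
        # Expected count if group and outcome were unrelated.
        exp = row_totals[i] * col_totals[j] / grand_total
        expected_row.append(exp)
        chi_square += (obs - exp) ** 2 / exp  # (observed - expected)^2 / expected
    expected.append(expected_row)

smallest_expected = min(min(row) for row in expected)

print("expected counts:", expected)
print("chi-square statistic:", chi_square)
print("smallest expected count:", smallest_expected)
```

The smallest expected count is printed because it is the quantity usually checked against 5 before trusting the chi-square result.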
if any expected cell count for the Chi-square test is < 5, what test should you then do?
Fisher's exact test
Used to check the difference between two or more percentages or proportions of categorical outcomes
chi-square test
relationship between two variables that is due to the presence of unmeasured variables
confounding
what ways can you account for confounding?
stratified analysis
multivariable analysis
measure of the relationship between two continuous variables - represented by scatterplots
correlation
formula that forms a line - used to quantify a change in y based on a change in x
linear regression
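A small sketch of simple linear regression using ordinary least squares, the usual fitting method (an assumption here, since the card does not name one). The data are hypothetical and lie exactly on a line so the result is easy to check by eye.

```python
# Hypothetical data lying exactly on the line y = 2x + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: how much y changes for a one-unit change in x.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
# Intercept: predicted y when x is 0.
intercept = mean_y - slope * mean_x

print("slope:", slope)
print("intercept:", intercept)
```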
3 multivariable methods used to adjust for confounding
multiple linear regression
logistic regression
proportional hazards modeling
This looks at relationship between multiple independent variables and a single dependent variable
Multiple linear regression
proportion of variance in the dependent variable that is predicted from the independent variable
R^2
The closer R^2 is to 1 ….
the better the model
like linear regression with multiple continuous and/or categorical independent variables, but for a dichotomous outcome
logistic regression
likelihood that an outcome will occur based on changes in the variables
odds ratio
in logistic regression you evaluate the beta coefficient AND ___ for each independent variable
odds ratio
this looks at the relationship between multiple variables and the TIME to an event
How long does it take to get a certain outcome?
Cox proportional hazards analysis
independent variables in Cox proportional hazards analysis can be either
continuous or categorical
In Cox you evaluate the beta coefficient AND ___ for each independent variable
hazard ratio
compares the probability of an event occurring over a given time
chance of an event occurring in the treatment arm/chance in the control arm
hazard ratio
variation that occurs due to chance with random sampling - affects the study and control groups equally
random error
error disproportionately affects one group
bias
bias that is introduced in the way in which participants are assigned to groups
selection bias
error that is due to differences in the way data is collected
measurement bias
participants do not accurately recall information
recall bias
Proper study design can control
confounding and limit bias
probability is the expression of ___
= # of times an event occurs/total # of opportunities for occurrence
risk
ratio of probability an event occurs vs probability that anything else occurs
odds
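Risk and odds are easy to mix up, so here is a tiny sketch with hypothetical counts: the same 20 events out of 100 opportunities give a risk of 0.20 but odds of 0.25.

```python
# Hypothetical counts, for illustration only.
events = 20
total = 100
non_events = total - events

risk = events / total        # events / all opportunities
odds = events / non_events   # events / non-events, same as risk / (1 - risk)

print("risk:", risk)
print("odds:", odds)
```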
multiplication law of probability
prob of A and B = (prob of A)*(prob of B) [when A and B are independent]
addition law of probability
prob of A or B = (prob of A) + (prob of B) - (prob of A and B)
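A quick numeric sketch of the two probability laws. The probabilities are hypothetical, and the multiplication step assumes the two events are independent.

```python
# Hypothetical probabilities for two independent events A and B.
p_a = 0.5
p_b = 0.2

# Multiplication law (valid in this simple form only for independent events).
p_a_and_b = p_a * p_b

# Addition law: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

print("P(A and B):", p_a_and_b)
print("P(A or B):", p_a_or_b)
```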
focuses on describing the distribution of health conditions
descriptive epidemiology
compares groups to test hypotheses regarding potential causes and contributing factors
analytic epidemiology
existing cases of a disease in a given population
= # with disease/total population
prevalence
number of people in a population who have a disease over a given time period
period prevalence
3 determinants of prevalence
incidence of disease
duration of disease
entry/exit of cases
in a stable population what is the formula for prevalence
incidence * duration
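A worked example of the stable-population formula, with hypothetical numbers. It assumes incidence is expressed per person-year and duration in years, so the units cancel to a proportion.

```python
# Hypothetical values for a stable population.
incidence_per_person_year = 0.01  # 10 new cases per 1000 people per year
average_duration_years = 5        # how long a case lasts on average

# Prevalence is approximately incidence multiplied by duration.
prevalence = incidence_per_person_year * average_duration_years

print("approximate prevalence:", prevalence)
```

Either a higher incidence or a longer duration raises prevalence, which matches the determinants listed above.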
number of new cases of disease in a given population during a specific time period
incidence
proportion of at risk people who get the disease
attack rate
proportion of people diagnosed with a given condition who die due to that condition
case fatality ratio
proportion of all deaths in a given time period that are due to a specific condition
proportionate mortality
# of new events that occur during a time period/average population at risk
rate
the number of deaths per 1000
mortality rate
(total # of deaths/mid interval population) * 1000
crude mortality rate
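The crude mortality rate formula as a one-line calculation. The death count and population are made up.

```python
# Hypothetical figures for one year.
total_deaths = 850
mid_interval_population = 100_000

# Deaths per 1000 people.
crude_mortality_rate = total_deaths / mid_interval_population * 1000

print("crude mortality rate per 1000:", crude_mortality_rate)
```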
types of observational epidemiological studies
cross sectional study
cohort study
case control study
types of experimental epidemiological studies
randomized controlled trial
describes prevalence of potential risk factors (exposures) and conditions (outcomes)
cross sectional study
strengths of cross sectional study
quick and not costly; useful for developing hypotheses
limits of cross sectional study
can’t determine causal relationships between variables
late-look bias (detected cases tend to be of longer duration)
group of people are followed over time to monitor the development of disease or health condition
can be prospective or retrospective
cohort study
strengths of cohort study
useful for rare exposure
can study multiple outcomes
can measure risk of outcome in group
limits of cohort study
time and cost requirements
not as good for rare outcomes
loss to follow-up
ratio of incidence of disease in exposed persons to the risk in unexposed
relative risk
relative risk = 1
no difference in groups
relative risk > 1
exposed group has greater risk
relative risk < 1
exposed group has less risk
measure of how much disease is actually attributable to the risk factor
attributable risk
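A sketch of relative risk and attributable risk from a hypothetical cohort. Attributable risk is computed here as the risk difference (risk in exposed minus risk in unexposed), the usual definition, although the card above does not give a formula.

```python
# Hypothetical cohort counts.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

relative_risk = risk_exposed / risk_unexposed      # > 1 means exposed group has greater risk
attributable_risk = risk_exposed - risk_unexposed  # extra risk linked to the exposure

print("relative risk:", round(relative_risk, 2))
print("attributable risk:", round(attributable_risk, 2))
```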
participants with a disease are compared to participants without a disease
case control study
strengths of case control study
useful for rare outcomes
able to study multiple exposures
less time and cost
limitations of case control study
not as good for rare exposures
potential for recall bias
challenges in selecting control
only ESTIMATE risk
estimates relative risk when disease is relatively uncommon
odds ratio
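To see why the odds ratio estimates relative risk when disease is uncommon, here is a sketch on a hypothetical 2x2 table with a rare outcome. The cross-product formula (a × d) / (b × c) is the standard one. The variable names are illustrative.

```python
# Hypothetical 2x2 table with a rare disease.
a = 3     # exposed, disease
b = 997   # exposed, no disease
c = 1     # unexposed, disease
d = 999   # unexposed, no disease

odds_ratio = (a * d) / (b * c)                # cross-product ratio
relative_risk = (a / (a + b)) / (c / (c + d))

print("odds ratio:", round(odds_ratio, 3))
print("relative risk:", round(relative_risk, 3))
```

With a common outcome the two values drift apart, so the approximation only holds when the disease is rare.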
additional amount of disease that is present in a population because of the presence of a risk factor
population attributable risk
randomization minimizes
confounding
blinding minimizes
selection and measurement bias
# of patients who need to receive treatment to prevent one event from occurring
number needed to treat (NNT)
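A minimal sketch of the NNT calculation, assuming the standard formula NNT = 1 / absolute risk reduction, which the card does not state. The trial counts are hypothetical. `Fraction` is used only to avoid floating-point rounding before rounding up.

```python
import math
from fractions import Fraction

# Hypothetical trial results.
control_events, control_total = 20, 100
treated_events, treated_total = 15, 100

control_risk = Fraction(control_events, control_total)
treated_risk = Fraction(treated_events, treated_total)

# Absolute risk reduction: how much the treatment lowers the event risk.
absolute_risk_reduction = control_risk - treated_risk

# NNT is conventionally rounded up to a whole patient.
nnt = math.ceil(1 / absolute_risk_reduction)

print("absolute risk reduction:", float(absolute_risk_reduction))
print("number needed to treat:", nnt)
```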