Biostatistics Flashcards
Lecture objectives
1) Review sampling, variables, and basic descriptive statistics (including measures of location and measures of spread)
2) Review the statistical distributions that are commonly used in biostatistics
3) Understand the application and usefulness of the t-test and the Chi-squared test
4) Understand the usefulness of the 2x2 contingency table and how to derive various epidemiological measures of association from it
Lecture objectives part 2
5) Understand the usefulness and application of the 2x2 contingency table as a means to derive sensitivity, specificity, and related measures
6) Review scatter plots and their relation to the correlation coefficient
7) Understand the basics of simple linear regression, multiple linear regression, logistic regression and meta-analysis
8) Understand how to read and interpret a multiple linear regression table
Who was the 1st African American professional to practice in Memphis, aka “hero of the yellow fever epidemic”?
Dr. R H Tate
What was used as a yellow fever hospital?
Peabody Hotel
Why do I need to know this stuff?
prevalent in research
What is population?
collection of persons or things to which we want to generalize a set of findings; largest collection of persons to which we have an interest at a particular time
What is sample?
part of population; smaller collection of persons or things from a population used to determine generalities about the population of persons or things
What is variable?
a characteristic that takes on different values in different persons, places, or things
What are variable descriptors?
numeric, categorical, dichotomous
What is a numeric variable?
a variable that has values that describe a measurable quantity as a number
What are the 2 categories of numeric variables?
discrete and continuous
What is discrete?
a numeric variable that can only take on certain values and is characterized by gaps or interruptions in the values that the variable can assume, usually integer numbers ex: patients seen in a day, number of medications
What is continuous?
a numeric variable that can technically be measured with unlimited precision and that is not characterized by gaps in values that the variable could assume, ex: IOP
What is a categorical variable?
a variable that is made up of groups of objects and that names distinct entities
What are two categories of categorical variables?
ordered and unordered
What is ordered? aka ordinal
a categorical variable with a value that can take on a logical order, sequence, or rank ex: exercise
What is unordered? aka nominal
a categorical variable with a value that is not able to be organized in a logical order, sequence or rank ex: iris color
What is dichotomous?
a variable that consists of only two categories ex: diabetic or not diabetic
What is independent variable?
the variable that is manipulated by the experimenter and that does not depend on any other variables aka predictor variable
What is dependent variable?
the variable that is not manipulated by the experimenter and that does depend on the other variable aka outcome variable
What are descriptive statistics/measures of location?
mean, median, mode
What are descriptive statistics/measures of spread?
range, variance, standard deviation
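A minimal sketch of the measures of location and spread listed above, using Python's standard `statistics` module. The IOP readings are hypothetical values invented for illustration; they are not from the lecture.

```python
import statistics

# Hypothetical IOP readings in mmHg (invented for illustration only)
iop = [12, 14, 15, 15, 16, 18, 22]

# Measures of location
mean_iop = statistics.mean(iop)
median_iop = statistics.median(iop)
mode_iop = statistics.mode(iop)

# Measures of spread
range_iop = max(iop) - min(iop)
variance_iop = statistics.variance(iop)  # sample variance, divides by n - 1
sd_iop = statistics.stdev(iop)           # square root of the sample variance

print("location:", mean_iop, median_iop, mode_iop)
print("spread:", range_iop, round(variance_iop, 2), round(sd_iop, 2))
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.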
What is the normal IOP and the mean IOP?
normal 10-21 and mean 15.5 mmHg
What is the standard deviation for IOP?
2.75 mmHg
What percent of the population falls within 1 SD?
68%
What percent of the population falls within 2 standard deviations of the mean?
95%
What percent of the population falls within 3 SDs of the mean?
99.7%
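The 1, 2, and 3 SD percentages can be checked against the IOP numbers on the cards above (mean 15.5 mmHg, SD 2.75 mmHg). This sketch assumes IOP follows an exactly normal distribution, which is only an approximation.

```python
from statistics import NormalDist

# Values from the cards above: mean IOP 15.5 mmHg, SD 2.75 mmHg.
# Assumption: IOP is modeled as exactly normal (an approximation).
iop_dist = NormalDist(mu=15.5, sigma=2.75)

intervals = {}
coverage = {}
for k in (1, 2, 3):
    lower = iop_dist.mean - k * iop_dist.stdev
    upper = iop_dist.mean + k * iop_dist.stdev
    intervals[k] = (lower, upper)
    # Proportion of the distribution lying between lower and upper
    coverage[k] = iop_dist.cdf(upper) - iop_dist.cdf(lower)
    print(f"within {k} SD: {lower:.2f} to {upper:.2f} mmHg, {coverage[k]:.1%}")
```

The 2 SD interval works out to 10 to 21 mmHg, which matches the "normal" IOP range on the earlier card.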
What are noteworthy distribution examples?
normal and t (there are many many distributions)
What is normal distribution?
symmetrical with a central peak “bell curve”; defined solely and completely by the mean and the standard deviation/variance
What is t distribution?
similar in appearance to normal distribution; utilizes degrees of freedom (distribution changes with number of degrees of freedom)
The smaller the degree of freedom…
the lower the peak and the heavier (higher) the tails
Where does a t distribution approach normal distribution?
approaches normal distribution with degrees of freedom greater than 30
What allows us to make inferences based on small sample sizes?
t distribution
What are “other” distributions?
chi-square, binomial, poisson
What is the P-value?
the probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true
If the p value is larger than the pre-determined criteria, then we…
do not have evidence to reject the null hypothesis (aka the data is consistent with the null hypothesis)
What is the p-value cutoff (significance level) usually set at?
0.05 aka 5% (roughly 2 SDs from the mean on a normal distribution)
A p-value is the probability of an observation…
arising by chance alone when the null hypothesis is true
What is the t-test used for?
to test whether two group means are different
If p value of trial is higher than chosen p value…
you cannot reject null hypothesis
If p value of trial is lower than chosen p value…
you can reject the null hypothesis
What p value is more conservative?
0.01
When is an independent t test used?
used when there are two experimental conditions w/ different participants assigned to each condition
What does an independent t test show?
establishes whether two means collected from independent samples differ significantly
What are other names for independent t test?
independent measures or independent samples t test
When is a dependent t test used?
used when there are two experimental conditions w/ same participants assigned to each condition
What does a dependent t test establish?
whether two means collected from the same sample differ significantly
What are other names for dependent t test?
matched pairs or paired samples t test
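A rough sketch of the two t statistics described above, written with the standard library only. The IOP values are hypothetical, the independent version assumes equal variances (pooled, Student's form), and the function names are illustrative. The resulting t and degrees of freedom would then be looked up in a t table to get the p value.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent samples (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2

def paired_t(first, second):
    """t statistic and degrees of freedom for paired measurements on the same participants."""
    diffs = [f - s for f, s in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical IOP (mmHg): different patients on each drug
old_drug = [18, 19, 20, 21, 22]
new_drug = [15, 16, 17, 18, 19]
t_ind, df_ind = independent_t(old_drug, new_drug)

# Hypothetical IOP (mmHg): the same patients before and after treatment
before = [20, 22, 19, 24, 21]
after = [17, 20, 18, 20, 19]
t_pair, df_pair = paired_t(before, after)

print(round(t_ind, 3), df_ind)
print(round(t_pair, 3), df_pair)
```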
What can be derived from 2x2 contingency tables?
cumulative incidence, relative risk, odds, odds ratio, chi-squared test for independence, attributable risk, population attributable risk
What is relative risk?
aka risk ratio RR, compares the risk of a health event (disease, injury, risk factor, or death) among one group with the risk among another group
What is odds ratio?
OR compares the odds of a health event (disease, injury, risk factor, or death) among one group with odds among another group
What general 2 things is a 2x2 contingency table comparing?
exposures and outcomes
What are the two most widely used measures of association in epidemiology?
relative risk and odds ratio
What measure of association does a cohort study use?
relative risk
What measure of association does a case-control study use?
odds ratio assuming incidence is not known
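T/F the odds ratio always underestimates the relative risk
false; the odds ratio exaggerates the RR (lies farther from 1 than the RR): it overestimates when RR is above 1 and underestimates when RR is below 1; the exaggeration is greatest when the outcome is common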
When may relative risk and odds ratio be close/similar?
when the outcome is rare
What is a chi-squared test for independence?
tests the association between categorical variables using chi-squared distribution
What is the cumulative incidence in the exposed?
a/(a+b)
What is the cumulative incidence in the unexposed?
c/(c+d)
What is the relative risk for the outcome?
(a/(a+b)) / (c/(c+d)) aka cumulative incidence in the exposed divided by cumulative incidence in the unexposed
What is the odds in the exposed?
a/b
What is the odds in the unexposed?
c/d
What is the odds ratio?
(a/b) / (c/d) = ad/bc aka cross-product ratio of the table
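The formulas above can be sketched as one small function. The cell layout is assumed to be a = exposed with outcome, b = exposed without outcome, c = unexposed with outcome, d = unexposed without outcome, which is what a/(a+b) and c/(c+d) imply. Both tables are hypothetical counts chosen so the relative risk is 2 in each.

```python
def two_by_two(a, b, c, d):
    """Measures of association from a 2x2 table.

    Assumed layout: a = exposed with outcome, b = exposed without outcome,
                    c = unexposed with outcome, d = unexposed without outcome.
    """
    ci_exposed = a / (a + b)
    ci_unexposed = c / (c + d)
    return {
        "ci_exposed": ci_exposed,
        "ci_unexposed": ci_unexposed,
        "relative_risk": ci_exposed / ci_unexposed,
        "odds_exposed": a / b,
        "odds_unexposed": c / d,
        "odds_ratio": (a * d) / (b * c),
    }

# Hypothetical counts: rare outcome (1-2%) versus common outcome (20-40%)
rare = two_by_two(2, 98, 1, 99)
common = two_by_two(40, 60, 20, 80)

print(round(rare["relative_risk"], 2), round(rare["odds_ratio"], 2))
print(round(common["relative_risk"], 2), round(common["odds_ratio"], 2))
```

With the rare outcome the odds ratio stays close to the relative risk; with the common outcome it drifts well away from it.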
What do you do with the chi-squared “statistic”?
identify P value from table
If P value is less than .05 what happens?
reject the null hypothesis
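A minimal sketch of the chi-squared test for independence on a 2x2 table, using hypothetical counts. It assumes no continuity (Yates) correction. Instead of a printed table, it uses the fact that with 1 degree of freedom the p value can be computed directly with `math.erfc`.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p value for a 2x2 table (1 degree of freedom, no correction)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if exposure and outcome were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 40/100 exposed and 20/100 unexposed develop the outcome
chi2, p_value = chi_squared_2x2(40, 60, 20, 80)
reject_null = p_value < 0.05
print(round(chi2, 3), round(p_value, 4), reject_null)
```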
Type I error
BAD, occurs when one rejects the Null hypothesis when the Null hypothesis is actually true aka rejection of a true null hypothesis
Optometry Type I error example
you conclude that a new glaucoma drug lowers IOP better than an old glaucoma drug, when in fact it does not
Type II error
occurs when one rejects the alternative hypothesis (fails to reject the null) when the alternative hypothesis is actually true aka not rejecting a false null hypothesis
Optometry Type II error example
you conclude that a new glaucoma drug does not lower IOP better than an old glaucoma drug when in fact it does
False positive VF
patient says it’s there but it’s not; field may look better than it actually is
False negative VF
patient says it’s not there but it is
Sensitivity
the proportion of subjects with the target condition who have a positive test result aka true positive/(true positive + false negative)
Specificity
the proportion of subjects without the target condition who have a negative result aka true negative/ (true negative + false positive)
Positive predictive value
the proportion of subjects who test positive who actually have the target condition aka true positive/ (true positive + false positive)
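The three definitions above, sketched with hypothetical screening counts (the numbers and the function name are invented for illustration):

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, and positive predictive value from 2x2 test counts."""
    sensitivity = tp / (tp + fn)   # positives among those with the condition
    specificity = tn / (tn + fp)   # negatives among those without the condition
    ppv = tp / (tp + fp)           # true cases among those who test positive
    return sensitivity, specificity, ppv

# Hypothetical screening results: 100 with the condition, 200 without
sensitivity, specificity, ppv = diagnostic_measures(tp=90, fn=10, fp=30, tn=170)
print(sensitivity, specificity, ppv)
```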
Correlation coefficient
a summary value used to assess the strength of the correlation between two continuous variables
What is the most commonly used correlation coefficient?
Pearson’s correlation coefficient “r”
What does a higher correlation mean?
two variables are changing together
Review scatter plots for various r values
r = 1.00 is a perfect straight line with a positive slope
What does a larger absolute value of r mean?
stronger correlation
T/F correlation = causation
false, correlation does not equal causation
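A short sketch of Pearson's r computed from its definition, on hypothetical data. The second call uses points that lie exactly on a rising straight line to show the r = 1.00 case.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_scattered = pearson_r(x, y)

# Points exactly on the line y = 2x + 1
r_perfect = pearson_r(x, [2 * xi + 1 for xi in x])

print(round(r_scattered, 3), round(r_perfect, 3))
```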
What is simple regression?
a linear model in which one outcome is predicted from a single predictor variable (an expansion of the correlation coefficient)
What is the equation of a line?
y = mx + b
In the equation of a line, what is y?
dependent variable
In the equation of a line, what is x?
independent variable
What is multiple regression?
a linear model in which one outcome is predicted from two or more predictor variables (expansion of simple regression)
Constant
the value of the dependent variable in a regression equation when its associated independent variables equal zero aka baseline level
What is the constant graphically?
the y-intercept, the point at which the regression line crosses the y-axis
Beta-coefficient
the degree of change in the dependent variable for every 1-unit of change in a particular independent variable
Example of beta-coefficient b1=0.2001
this means a one unit increase in x is associated with a 0.2001 unit increase in y
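A minimal least-squares sketch tying the cards above together: the slope m is the beta-coefficient and the intercept b is the constant in y = mx + b. The data are hypothetical and `fit_line` is an illustrative name, not a library function.

```python
def fit_line(x, y):
    """Least-squares slope (beta-coefficient) and intercept (constant) for y = mx + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)

# Predicted y when x = 0 is the constant (y-intercept)
predicted_at_zero = intercept + slope * 0
# A one-unit increase in x changes the prediction by exactly the slope
change_per_unit = (intercept + slope * 3) - (intercept + slope * 2)

print(round(slope, 3), round(intercept, 3))
```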
Coefficient P value
tells us whether or not an independent variable is statistically significant
R^2 coefficient of determination
a way to measure how well the linear regression line fits the data; the proportion of the variance in the dependent variable that can be explained by the independent variables
What does a coefficient of determination range between?
0 to 1; 0 indicates the response variable cannot be explained by the predictor variable at all, and 1 indicates it can be explained perfectly
Standard error
measures how well the linear regression line fits the data, the average distance that the observed values fall from the regression line
What does a smaller standard error mean?
the model fits the data better
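A sketch of R^2 and the standard error of the regression for a simple least-squares line, on hypothetical data. The standard error here is taken as the square root of the residual sum of squares divided by n - 2, which is one common definition for a single-predictor model.

```python
import math

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line y = intercept + slope * x
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Residual and total sums of squares
predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - ss_res / ss_tot               # proportion of variance explained
standard_error = math.sqrt(ss_res / (n - 2))  # typical distance of points from the line

print(round(r_squared, 3), round(standard_error, 3))
```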
What is useful for calculating the p-value and the confidence interval for its corresponding coefficient?
standard error
Logistic regression
used when the outcome is categorical (1 or 0); there is no linear relationship between x and y (or x and probability)
What does a logistic regression model?
the log(odds) of the outcome; on the log(odds) scale the relationship with x is linear
What does the formula for logistic regression do?
the formula predicts the log odds of the dependent variable taking on a value of 1; the log odds are converted to the probability that a given observation takes on a value of 1; a predetermined probability threshold is then used to classify the observation as either 1 or 0
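The steps above can be sketched as follows. The intercept and slope are hypothetical coefficients chosen for illustration, not fitted values, and the 0.5 threshold is an assumed default.

```python
import math

def predict_probability(x, intercept, slope):
    """Convert the linear log(odds) into a probability with the logistic function."""
    log_odds = intercept + slope * x
    return 1 / (1 + math.exp(-log_odds))

def classify(probability, threshold=0.5):
    """Label the observation 1 if its probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients (not from a real model)
intercept, slope = -4.0, 0.2

p_high = predict_probability(30, intercept, slope)  # log odds = 2
p_low = predict_probability(10, intercept, slope)   # log odds = -2
label_high = classify(p_high)
label_low = classify(p_low)

# exp(slope) is the odds ratio for a one-unit increase in x
odds_ratio_per_unit = math.exp(slope)

print(round(p_high, 3), label_high, round(p_low, 3), label_low)
```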
Continuous output use
linear regression
When is logistic regression popular?
in epidemiology because odds ratio is the natural parameter estimated in a case control study
Categorical output use
logistic regression
Meta-analysis when and why
used to combine results from different studies to see if overall effect is significant, makes the equivalent of one large study, often used when there are multiple studies with conflicting results
Meta-analysis how
decide which studies to include and exclude using objective criteria, find all the studies on the subject, extract the required info, do the meta-analysis statistic, interpret the results
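The cards do not say which meta-analysis statistic is used. As one common choice, this sketch pools studies with fixed-effect inverse-variance weighting, which is an assumption here and not necessarily the lecture's method. The study effects (log odds ratios) and standard errors are hypothetical.

```python
import math

# Hypothetical studies: (log odds ratio, standard error). Invented numbers.
studies = [(0.4, 0.2), (0.1, 0.1), (0.3, 0.2)]

# Each study is weighted by the inverse of its variance, so precise studies count more
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# Two-sided p value and 95% confidence interval from the normal approximation
z = pooled / pooled_se
p_value = math.erfc(abs(z) / math.sqrt(2))
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

print(round(pooled, 4), round(pooled_se, 4), round(p_value, 4))
print(round(math.exp(pooled), 3))  # pooled odds ratio
```

In this made-up example only the most precise study would be significant on its own, yet the pooled estimate is, which illustrates the "equivalent of one large study" idea.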