General Biostats Flashcards
Sensitivity is best described as:
True positive rate: the proportion of people with disease who test positive, TP/(TP+FN)
False negative probability can be calculated from sensitivity by:
1-sensitivity
Specificity is best described as:
True negative rate: the proportion of people without disease who test negative, TN/(TN+FP)
The false positive probability can be calculated by:
1-specificity
Define Recall Bias
Cases are more likely to recall exposures than controls.
Relative risk
RR = experimental event rate / control event rate
RR = [A/(A+B)] / [C/(C+D)]
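As a quick check of the formula above, here is a minimal sketch. It assumes the usual 2x2 layout, which the card does not spell out: A and B are the experimental (exposed) subjects with and without the event, C and D are the control (unexposed) subjects with and without the event. The function name and the counts are made up for illustration.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    Assumed layout (not stated on the card):
    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event.
    """
    risk_exposed = a / (a + b)      # experimental event rate
    risk_unexposed = c / (c + d)    # control event rate
    return risk_exposed / risk_unexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the event.
rr = relative_risk(20, 80, 10, 90)
print(f"RR = {rr:.2f}")
```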
Regression to the mean
Tendency for extreme values of a variable to fall closer to the group mean when retested
Hawthorne Effect
The tendency of subjects to act differently because they are being studied.
Why use person years?
Accounts for variable follow up periods and number under observation.
___ percent of the population falls within 1 standard deviation of the mean on a normal curve.
68%
2 standard deviations contain
95% of the population
3 standard deviations contain
99.7% of the population
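The share of a normal distribution lying within k standard deviations of the mean can be checked with the error function. This is a small sketch using only the Python standard library; the helper name is made up for illustration.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7% for k = 1, 2, 3.
    print(f"within {k} SD: {coverage_within(k) * 100:.1f}%")
```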
Chi-square test
- analyzes categorical data
- no cell has an expected count less than 1
- no more than 20% of cells have an expected count less than five
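The cell-count conditions above are usually applied to expected counts rather than observed counts; the sketch below makes that assumption. It computes expected counts and the chi-square statistic for a contingency table in plain Python. The function names and the example table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_stat(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def assumptions_ok(table):
    """Check the two rules of thumb, applied to expected counts (an assumption)."""
    flat = [e for row in expected_counts(table) for e in row]
    share_below_five = sum(1 for e in flat if e < 5) / len(flat)
    return min(flat) >= 1 and share_below_five <= 0.20


table = [[10, 20], [30, 40]]  # hypothetical observed counts
print(chi_square_stat(table), assumptions_ok(table))
```

When `assumptions_ok` returns False, that is the situation in which an exact test is typically preferred.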
Fisher's exact test
Same use as chi-square, but for SMALL sample sizes
McNemar's test
- paired dichotomous data without fractions
Student's t test
- compares the means of a continuous variable between two independent groups
Differential misclassification is___
Not random
Differential misclassification causes an ____ of an association.
over- or underestimate (the bias can go in either direction)
Non-differential misclassification is ____
random
Non-differential misclassification causes an ___ of an association.
underestimate (bias toward the null)
Selection bias
systematic errors in the way subjects are included in a study.
Odds ratio is good for _____ conditions.
Rare
With rare conditions, the OR approaches__
RR
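The rare-condition claim can be illustrated numerically. This sketch assumes the usual 2x2 layout (a, b = exposed with and without disease; c, d = unexposed with and without disease) and uses made-up counts; it is an illustration, not a proof.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d), written as the cross-product ad / bc."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical rare condition: risks of 0.2% and 0.1%.
rare = (2, 998, 1, 999)
# Hypothetical common condition: risks of 40% and 20%.
common = (40, 60, 20, 80)

for label, cells in (("rare", rare), ("common", common)):
    print(label, "OR =", round(odds_ratio(*cells), 3),
          "RR =", round(relative_risk(*cells), 3))
```

Both tables have the same RR, but only in the rare case does the OR stay close to it.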
The lower the confidence level, the ___ the confidence interval.
narrower
Increasing the sample size ____ the confidence interval
Narrows
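Both effects on interval width can be seen in a small sketch. It assumes a normal-based interval for a mean with a known standard deviation, which is a simplification the cards do not specify; the function name and numbers are illustrative.

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-based confidence interval for a mean.

    Assumes a known standard deviation `sd` (a simplification).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


# Lower confidence level gives a narrower interval at the same n.
print(ci_half_width(sd=10, n=100, confidence=0.90))
print(ci_half_width(sd=10, n=100, confidence=0.95))
# Larger sample gives a narrower interval at the same confidence level.
print(ci_half_width(sd=10, n=400, confidence=0.95))
```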
The probability that two independent events will both occur is:
p(A) x p(B). (The probability that at least one of them occurs is p(A) + p(B) - p(A and B).)
Case control studies are best suited for ___ conditions.
Rare
Length bias
Rapidly progressing cancers are less likely to be detected while asymptomatic, so screening preferentially finds slower-growing cases.
Lead time bias
Screening gives the appearance that a person diagnosed earlier survived longer, even when death occurs at the same age.
Information bias
systematic difference in the way data is collected
Surveillance bias
Overdetection of disease because one population is monitored more closely
Predictive value
Measures how often the test is right when positive or negative
Positive predictive value
TP/(TP+FP)
Negative predictive value
TN/(TN+FN)
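The predictive value formulas above, together with sensitivity and specificity, can all be read off the same four counts. A minimal sketch with hypothetical counts and a made-up function name:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positive among the diseased
        "specificity": tn / (tn + fp),  # negative among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among test positives
        "npv": tn / (tn + fn),          # non-diseased among test negatives
    }


# Hypothetical study: 100 diseased, 900 non-diseased.
metrics = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```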
You cannot calculate predictive value of a test without__
prevalence
Sensitivity and specificity ___ depend on prevalence
do not
Decreasing prevalence causes _____ in positive predictive value
decrease
Decreasing prevalence causes ____in negative predictive value
increase
Correlation studies
compare disease frequencies among entire populations
Attributable risk
Ie - Iue (incidence in the exposed minus incidence in the unexposed)
Attributable risk percentage
(Ie-Iue)/Ie
Population attributable risk
Attributable risk x prevalence of exposure
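The three attributable-risk measures above can be sketched together. The code assumes Ie and Iue are incidences in the exposed and unexposed, and that "prevalence" means prevalence of the exposure in the population; function names and numbers are illustrative only.

```python
def attributable_risk(ie, iue):
    """Ie - Iue: excess incidence among the exposed."""
    return ie - iue


def attributable_risk_percent(ie, iue):
    """(Ie - Iue) / Ie: share of incidence in the exposed due to the exposure."""
    return (ie - iue) / ie


def population_attributable_risk(ie, iue, exposure_prevalence):
    """Attributable risk x prevalence, taking prevalence to be exposure
    prevalence (an assumption about what the card means)."""
    return attributable_risk(ie, iue) * exposure_prevalence


# Hypothetical incidences: 30% in exposed, 10% in unexposed, 25% of people exposed.
print(attributable_risk(0.30, 0.10))
print(attributable_risk_percent(0.30, 0.10))
print(population_attributable_risk(0.30, 0.10, 0.25))
```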
Intent to treat analysis
All subjects are analyzed in the groups to which they were randomized (drug or placebo), regardless of whether they completed treatment.
Cross-sectional study
study of a population at one point in time
Prevalence
# of cases / population at risk
categorized data
e.g., ill vs. not ill
Power ___ not important with significant results.
is
secondary attack rate
# of new cases among those exposed to index case / # of all those exposed to index case
vaccine efficacy
(Iuv - Iv)/Iuv
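The secondary attack rate and vaccine efficacy formulas above are both simple ratios. The sketch below assumes Iuv and Iv are the incidences in unvaccinated and vaccinated people; the function names and numbers are hypothetical.

```python
def secondary_attack_rate(new_cases_among_exposed, total_exposed):
    """New cases among contacts of the index case / all contacts exposed."""
    return new_cases_among_exposed / total_exposed


def vaccine_efficacy(incidence_unvaccinated, incidence_vaccinated):
    """(Iuv - Iv) / Iuv: proportional reduction in incidence among the vaccinated."""
    return (incidence_unvaccinated - incidence_vaccinated) / incidence_unvaccinated


# Hypothetical household outbreak: 6 of 24 contacts become ill.
print(secondary_attack_rate(6, 24))
# Hypothetical incidences: 20% unvaccinated, 5% vaccinated.
print(vaccine_efficacy(0.20, 0.05))
```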
In decision analysis, utilities refers to
the relative values placed on outcomes by patients
A method of controlling for confounding after data collection is
stratification and multivariate analysis
A bar chart is useful for
depicting the frequency of nominal or ordinal data
A pie chart is useful for
depicting frequency of categorical data
A frequency polygon is useful for
illustrating frequency distributions for discrete or continuous data. Several polygons can be overlaid for comparison.
Line graphs are useful for
presenting continuous data over time.
A curve skewed to the left has a mean ___ than the median
Less than the median: the mean is pulled toward the tail, which is on the left.
A curve skewed to the right has a mean ____ than the median
Greater than the median: again, the mean is pulled toward the tail, which is on the right.
Curves are skewed to the
tail
Is there a test 100% sensitive or specific?
No.
The x axis of a receiver operating characteristic (ROC) curve is
1-specificity.
If alpha is set to .10, then the confidence interval is
90%
Linear regression examines
the association between two continuous variables.
Multiple regression is used to
examine the relationship between multiple independent variables and one dependent variable
Logistic regression is used
when the dependent variable is dichotomous.
Correlation analysis is used
to determine a linear relationship between continuous variables in correlation studies of populations.
Which gives a better picture of an intervention’s impact: ARR or RRR? Why?
ARR. RRR ignores the baseline event rate in the control group. Low mortality in the control group means little absolute improvement with the intervention, even if the RRR is large.
Number needed to treat =
1/ARR
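To see how ARR, RRR, and number needed to treat relate, here is a small sketch with two hypothetical trials that share the same RRR but have very different baseline risks. The function name and event rates are invented for illustration.

```python
def risk_reduction_summary(control_rate, treatment_rate):
    """Return (ARR, RRR, NNT) for a control and a treatment event rate."""
    arr = control_rate - treatment_rate   # absolute risk reduction
    rrr = arr / control_rate              # relative risk reduction
    nnt = 1 / arr                         # number needed to treat
    return arr, rrr, nnt


# Low baseline risk: 2% vs 1% mortality.
print(risk_reduction_summary(0.02, 0.01))
# High baseline risk: 40% vs 20% mortality.
print(risk_reduction_summary(0.40, 0.20))
```

In both trials the RRR is one half, but the number needed to treat differs twentyfold. In practice NNT is usually reported rounded up to a whole number.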
If the crude and adjusted rates are the same, the stratification variable is
not a confounder.
The Kappa statistic is used
to assess interrater/intrarater reliability.
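One common form of the statistic is Cohen's kappa for two raters, which compares observed agreement with the agreement expected by chance. The card does not say which variant is meant, so treat the choice of Cohen's kappa, the function name, and the example table as assumptions.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] = number of subjects placed in category i by rater 1
    and category j by rater 2.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_props = [sum(row) / n for row in table]
    col_props = [sum(col) / n for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_props, col_props))  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 50 subjects by two raters (positive / negative).
print(cohens_kappa([[20, 5], [10, 15]]))
```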
Wilcoxon rank sum test, Wilcoxon signed rank test, and sign test are used to
analyze data that are not normally distributed (nonparametric tests)
Kaplan-Meier Survival Curves
Estimates the probability of survival each time an event occurs.
Advantage of Kaplan Meier curve:
Takes into account “censored data” which occurs if a patient withdraws before the outcome is reached.
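The handling of censored data can be made concrete with a bare-bones product-limit calculation. This is a sketch under simple assumptions (one event indicator per subject, and subjects censored at a given time still counted as at risk at that time); the function name and data are hypothetical, and a real analysis would normally use a survival analysis library.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # The estimate only steps down when an event occurs.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without causing a step.
        at_risk -= leaving
    return curve


# Hypothetical data: 5 subjects, 2 of them censored (event = 0).
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```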
Log rank test
Compares the survival distributions of two samples. Non parametric.
Hazard ratio
Ratio of the instantaneous risk of an event at a given point in time between two groups.
Cox proportional hazards model
Multivariate survival analysis used in studies in which participants are followed for unequal amounts of time.
Cox proportional hazards model relies on:
the assumption that the proportional effect of study factors does not change over time
Why are cluster-randomized samples not as good as individual randomization?
Similarity of individuals within clusters reduces variability, thereby reducing the power.
Clustered data results when:
preexisting group structure is used to select participants, but the researcher is interested in individual level data
Intracluster correlation coefficient (rho) is a measure of
Relatedness of clustered data. Ranges from 0 to 1; 1 means all members of a cluster are the same.
Nominal Data
No intrinsic order, e.g., race.
Sensitivity
Positive in disease
Specificity
Negative in health.
Ordinal
Have an order, but no set numerical relationship between values. “Worse, same..”
Interval
Ordered with numerical units, but no true zero, so dividing values does not make sense. E.g., one date of birth cannot be twice another date of birth.
Ratio scales
Ordered, numerical, and with a real zero, so values can be divided. E.g., temperature in Kelvin.
Sensitivity
Chances of test being positive in those with disease.
Specificity
Chances of test being negative in those without disease.
Calculate TP:
Sensitivity x prevalence
Calculate TN
Specificity x (1-prevalence)
Calculate false positive
(1-specificity)x(1-prevalence)
Calculate false negative
(1-sensitivity)x prevalence
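The four cell formulas above give the proportion of the whole population in each cell, and predictive values follow directly from them. A minimal sketch with hypothetical test characteristics (function names are made up), run at two prevalences to show how predictive values shift:

```python
def cell_proportions(sensitivity, specificity, prevalence):
    """Population proportions of TP, FP, FN, and TN."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp, fp, fn, tn


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp, fp, fn, tn = cell_proportions(sensitivity, specificity, prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 90% sensitivity and 90% specificity.
for prevalence in (0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Sensitivity and specificity are held fixed here; only prevalence changes between the two runs.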
In a normal distribution, the mean, median, and mode are?
The same
What happens to PPV as prevalence increases?
It increases.