Biostats Flashcards
Tchebysheff's Theorem
For any value of k ≥ 1, at least 100(1 – 1/k²)% of the data will lie within k standard deviations of the mean. For k = 1 the bound is 100(1 – 1/1²)% = 0%, so the theorem guarantees nothing within one standard deviation; for k = 2 it guarantees at least 75%.
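The bound can be checked numerically. The sketch below is only an illustration; the function name `chebyshev_lower_bound` is made up for this card.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum percentage of data within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the theorem is stated for k >= 1")
    return 100 * (1 - 1 / k**2)

# Bounds for k = 1, 2, 3 standard deviations
for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
```

The bound holds for any distribution, which is why it is much weaker than the percentages that apply to a normal distribution.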
When performing a nonparametric Wilcoxon rank-sum test, the first step is to combine the data values in the two samples and assign a rank of ‘1’ to
the smallest observation
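A minimal sketch of the ranking step, using made-up data. Giving tied values the average of the ranks they occupy is a common convention assumed here; the card itself does not say how ties are handled. The helper name `rank_combined` is hypothetical.

```python
def rank_combined(sample_a, sample_b):
    """Rank the pooled observations (1 = smallest); ties share an average rank."""
    pooled = sorted(list(sample_a) + list(sample_b))
    # Collect the 1-based positions each distinct value occupies in the sorted pool.
    positions = {}
    for position, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(position)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [rank_of[x] for x in sample_a], [rank_of[x] for x in sample_b]

# Made-up samples; the value 6 appears in both, so it gets rank (2 + 3) / 2 = 2.5
ranks_a, ranks_b = rank_combined([7, 5, 6], [8, 12, 6])
print(ranks_a, ranks_b)
```

The ranks in the two samples always add up to N(N + 1)/2 for N pooled observations, which is a quick check on the ranking.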
Contingency table test
Degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns
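As a quick illustration, the degrees of freedom can be read off the shape of the table. The counts below are invented and `contingency_df` is a hypothetical helper name.

```python
def contingency_df(table):
    """Degrees of freedom (r - 1)(c - 1) for an r x c table of counts."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)

# Made-up 2 x 3 table of observed counts
observed = [[10, 20, 30],
            [15, 25, 35]]
df = contingency_df(observed)
print(df)
```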
Dichotomous variables
Only two possible responses
Used to classify participants (e.g. has/does not have attribute of interest)
Ordinal variables
Categorical, ordered variable
Nominal variables
Categorical, unordered variable
Continuous variables
Quantitative/measurement variables; unlimited responses
Standard deviation
Measures how far individual observations deviate from the average
Small = the observed values are close to the mean
Large = the observed values vary widely around the sample mean
Sample variance
Average of squared deviations from the sample mean (dividing by n – 1); it is in squared units and hard to interpret, so use the sample standard deviation instead.
Sample standard deviation
Square root of sample variance
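A short worked example of the two definitions above, on made-up observations. It assumes the usual n – 1 denominator for the sample variance and cross-checks the result against Python's standard `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)
# Sample standard deviation: square root of the sample variance
sample_sd = math.sqrt(sample_var)

# Cross-check against the standard library
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
print(round(sample_var, 3), round(sample_sd, 3))
```

The standard deviation is back in the original units of the data, which is why it is the one usually reported.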
Interquartile range
Difference between 1st and 3rd quartiles
IQR = Q3 – Q1
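A small sketch of the calculation on made-up data. Quartile conventions differ between textbooks and software; the "inclusive" method of `statistics.quantiles` is used here purely as an assumption, so other tools may give slightly different quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up observations

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```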
Sensitivity
True positive fraction; the probability of a diseased person testing positive
Specificity
True negative fraction; the probability of a disease-free person testing negative
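Both quantities can be computed from the four cells of a 2 x 2 screening table. The counts below are hypothetical and only illustrate the arithmetic.

```python
def sensitivity(true_pos, false_neg):
    """True positive fraction: P(test positive | diseased)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negative fraction: P(test negative | disease-free)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical results: 100 diseased people, 200 disease-free people
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=160, false_pos=40)
print(sens, spec)
```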
Z scores
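Used when we cannot apply the standard properties of a normal distribution directly (e.g. the value of interest is not exactly 1, 2, or 3 standard deviations from the mean)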
Converting to a z score means we are standardizing
Z score formula converts x values to a standard normal distribution: Z = (x – μ)/σ
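A minimal sketch of standardizing a value with Z = (x – μ)/σ and then looking up the probability below it. The numbers are invented, and `NormalDist` from Python's standard library stands in for a printed z table.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x using the population mean mu and standard deviation sigma."""
    return (x - mu) / sigma

# Hypothetical population: mean 100, standard deviation 15
z = z_score(x=130, mu=100, sigma=15)

# Probability that a standard normal value falls below z
p_below = NormalDist().cdf(z)
print(z, round(p_below, 4))
```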
Central Limit Theorem
The theorem states that as long as the sample size is sufficiently large (n ≥ 30), the distribution of sample means is approximately normal, regardless of whether the population distribution is normal or skewed
Two exceptions to the n ≥ 30 requirement:
1. If the outcome is normally distributed in the population, the sample means are normally distributed even when n is less than 30
2. If the outcome in the population is dichotomous, the sample size must instead satisfy min[np, n(1 – p)] > 5
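The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the skewed population is an exponential distribution with mean 1 and standard deviation 1 (chosen only for illustration), the seed and repetition count are arbitrary, and `meets_dichotomous_criterion` is a hypothetical helper for the second exception.

```python
import math
import random
import statistics

random.seed(0)   # arbitrary seed so the run is repeatable
n = 30           # size of each sample
reps = 5000      # number of samples drawn

# Draw many samples from a right-skewed population and keep each sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# The sample means centre on the population mean (1) with spread near 1 / sqrt(n)
print(round(mean_of_means, 3), round(sd_of_means, 3), round(1 / math.sqrt(n), 3))

def meets_dichotomous_criterion(n, p):
    """Check min[np, n(1 - p)] > 5 for a dichotomous outcome."""
    return min(n * p, n * (1 - p)) > 5
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly skewed.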
Standard error
Standard deviation of the sample means
Decreases as sample size increases
Variability in sample means is smaller for larger sample sizes (extreme values less likely to impact larger samples)
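For a sample mean, the standard error is commonly computed as the standard deviation divided by √n. That formula is assumed here, since the card gives only the definition. The sketch shows how the standard error shrinks as n grows.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Same standard deviation (10, a made-up value), increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size halves the standard error.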
Confidence interval
Range of values for a population parameter with a level of confidence attached (e.g. 95% confidence = we are 95% confident that the interval contains the unknown parameter)
Confidence Interval estimates
General form: point estimate ± margin of error
Confidence interval starts with the point estimate, then adds and subtracts a margin of error
Margin of error = Z*SE
Z value = Z score value from standard normal distribution based on confidence level (e.g. 90%, 95%, etc.)
SE = standard error of the point estimate (sampling variability)
Confidence level reflects the likelihood that the confidence interval contains the true, unknown parameter
Commonly used values are 90%, 95%, and 99% (Table 1 B in textbook)
Higher confidence levels = larger z values, therefore wider confidence intervals (a 99% CI is wider than a 95% CI because greater certainty of capturing the unknown parameter requires a wider range)
0 = null value for a difference (e.g. a difference in means); if 0 is included in the interval, the results are not statistically significant
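The general form, point estimate ± Z × SE, can be sketched as follows. The sample values are invented, the z value comes from `NormalDist().inv_cdf` instead of a printed table, and `z_confidence_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_confidence_interval(point_estimate, se, confidence=0.95):
    """Return (lower, upper) limits: point estimate +/- z * SE."""
    # Two-sided interval: put half of the leftover probability in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return point_estimate - margin, point_estimate + margin

# Hypothetical sample: mean 120, standard deviation 15, n = 100
se = 15 / math.sqrt(100)
ci_90 = z_confidence_interval(120, se, 0.90)
ci_95 = z_confidence_interval(120, se, 0.95)
ci_99 = z_confidence_interval(120, se, 0.99)
for ci in (ci_90, ci_95, ci_99):
    print(round(ci[0], 2), round(ci[1], 2))
```

The three intervals share the same centre and get wider as the confidence level rises, matching the point about larger z values.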
T distribution
Used for small samples (generally n