Biostatistics Flashcards
Continuous data
data with numeric values
-age, weight, height, A1c, etc
Categorical data
data with categorical values
-gender, race, exposure/disease status
Normal Distribution
symmetrical around the mean, bell shaped
normal distribution: -1σ to +1σ
68%
normal distribution: -2σ to +2σ
95%
normal distribution: -3σ to +3σ
99.7%
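The three percentages above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `coverage` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {coverage(k):.4f}")
```

The 2 SD figure comes out slightly above 95%; an interval covering exactly 95% uses about 1.96 SD.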
Bimodal distribution
suggestive of two different populations
right skewed distribution
mean > median > mode
left skewed distribution
mean < median < mode
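As a rough illustration of how skew pulls the mean away from the median and mode, here is a small sketch on made-up data. In a right-skewed sample the long tail of large values drags the mean upward; mirroring the data produces the left-skewed case. The ordering is a rule of thumb for typical unimodal data, not a guarantee for every dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same sample, which is left-skewed.
left_skewed = [15 - x for x in right_skewed]

def summarize(values):
    """Return (mean, median, mode) of the values."""
    return mean(values), median(values), mode(values)

print("right skewed:", summarize(right_skewed))
print("left skewed: ", summarize(left_skewed))
```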
Descriptive statistics for continuous data
measures of central tendency: mean, median, mode
measures of dispersion: variance, SD
graphic representation: histogram, box plot, line graph
Descriptive statistics for categorical data
frequency
proportion: rate, ratio, prevalence, incidence rate, relative risk, odds ratio, sensitivity, specificity
graphic: pie chart, bar graph
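Several of the proportions listed above come from a 2x2 table of counts. The sketch below assumes the usual layouts (exposure by disease for relative risk and odds ratio, test result by true status for sensitivity and specificity); the function names and counts are invented for illustration.

```python
def relative_risk(a, b, c, d):
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

def sensitivity(tp, fn):
    """Proportion of truly diseased subjects who test positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of truly disease-free subjects who test negative."""
    return tn / (tn + fp)

# Hypothetical cohort: 20/100 exposed and 10/100 unexposed develop disease.
print(relative_risk(20, 80, 10, 90), odds_ratio(20, 80, 10, 90))
# Hypothetical diagnostic test: 90 TP, 10 FN, 80 TN, 20 FP.
print(sensitivity(90, 10), specificity(80, 20))
```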
Measures of central tendency
Mean: sum of values/total # of values
Median: middle value of ranked data
Mode: value that occurs most often
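The three definitions above translate almost directly into code. This is a hand-written sketch for illustration (Python's `statistics` module offers equivalents); the sample values are made up.

```python
from collections import Counter

def mean(values):
    """Sum of values divided by the total number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def mode(values):
    """Value that occurs most often (the first one seen if there is a tie)."""
    return Counter(values).most_common(1)[0][0]

sample = [2, 3, 3, 5, 7, 10]  # hypothetical data
print(mean(sample), median(sample), mode(sample))
```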
Measures of dispersion
Variance: (sum of squared deviations from the mean) divided by (total number of values - 1)
SD: square root of variance
Standard error of mean: SD / (square root of total number of values)
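The formulas above can be written out step by step. A minimal sketch, assuming the sample (n - 1) form of the variance given on this card; the data are invented.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def standard_error(values):
    """SD divided by the square root of the number of values."""
    return standard_deviation(values) / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(sample_variance(data), standard_deviation(data), standard_error(data))
```

The SD describes the spread of individual values, while the standard error describes how precisely the sample mean estimates the population mean, so it shrinks as the sample grows.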
Inferential statistics
to make an inference for a population group from a sample group
Population group
complete collection to be studied
Sample group
Part of the population of interest selected for study
Goal of Statistical Hypothesis Testing (SHT)
make decisions about a population from a sample
Procedure of SHT
- state null hypothesis
- state alternative hypothesis
- select level of significance
- collect and summarize the sample data into a statistic
- refer to a criterion for evaluating the sample evidence, producing a p-value
- make a decision to reject/retain the null hypothesis based on the p-value
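The steps above can be walked through with a simple test. The sketch below uses a one-sample, two-sided z-test, chosen only because it needs nothing beyond the standard library; it assumes the population SD is known, which is rarely true in practice. The data, the null value of 100, and the SD of 15 are all hypothetical.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with known population SD sigma."""
    n = len(sample)
    # Summarize the sample data into a test statistic.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # p-value: probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject H0 when the p-value is below the significance level.
    return z, p_value, p_value < alpha

# H0: mean = 100; H1: mean != 100; significance level 0.05.
scores = [112, 98, 105, 120, 101, 109, 115, 97,
          108, 111, 104, 118, 99, 107, 113, 110]
z, p_value, reject = one_sample_z_test(scores, mu0=100, sigma=15, alpha=0.05)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```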
Significance level
threshold (α) set as the maximum acceptable probability of rejecting a true null hypothesis (false positive)
P value
quantifies how consistent your sample statistics are with the null hypothesis
high p value
your sample results are consistent with a null hypothesis which is true
low p value
your sample results are not consistent with a null hypothesis which is true
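Type I error
false positive: rejecting a null hypothesis that is true
Type II error
false negative: failing to reject a null hypothesis that is false
Power
probability of a true positive: correctly rejecting a null hypothesis that is false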
confidence interval
quantifies the uncertainty in the estimates
a narrower interval implies higher precision with less variability
a wider interval implies lower precision with increased coverage
If CI contains the null value
the test fails to reject the null hypothesis
If CI does not contain the null value
the test rejects the null hypothesis
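The rule above can be illustrated with a confidence interval for a mean. This sketch uses a normal approximation (z of about 1.96 for 95%); for a small sample a t-based interval would be somewhat wider. The data are made up, and the null value is 0 because the values are differences (for ratios such as relative risk or odds ratio the null value is 1).

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

def contains(interval, null_value):
    """True if the null value lies inside the interval."""
    low, high = interval
    return low <= null_value <= high

# Hypothetical before/after differences; null value = 0 (no change).
differences = [1.2, 0.8, 2.1, -0.3, 1.5, 0.9, 1.8, 0.4, 1.1, 1.6]
ci = mean_confidence_interval(differences)
print(ci, "fail to reject H0" if contains(ci, 0) else "reject H0")
```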
Student’s t-test
compares means of 2 independent groups
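As a sketch of what the test computes, the pooled-variance t statistic can be written by hand. This assumes equal variances in the two groups; the data and function name are invented. The standard library has no t distribution, so the statistic is compared here with the tabulated two-sided 5% critical value for 8 degrees of freedom (2.306) rather than converted to a p-value; a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally do that step.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """t statistic and degrees of freedom for two independent groups (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
group_b = [4.2, 4.5, 4.1, 4.8, 4.4]
t_stat, df = pooled_t_statistic(group_a, group_b)
critical_t = 2.306  # two-sided, alpha = 0.05, df = 8 (from a t table)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > critical_t}")
```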
ANOVA
compares means of 3 or more independent groups
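One-way ANOVA compares the variability between group means with the variability within groups. The sketch below computes only the F statistic and its degrees of freedom by hand on made-up data; turning F into a p-value needs the F distribution, which the standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical data
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ more than the within-group scatter would explain.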
Chi-square test
compares the proportions/ratios between independent groups
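A sketch of the Pearson chi-square statistic for a table of counts, without continuity correction. The counts are hypothetical. The p-value helper covers only the 1-degree-of-freedom case (a 2x2 table), where the chi-square variable is the square of a standard normal and the tail probability reduces to `math.erfc`.

```python
import math

def chi_square_statistic(table):
    """Return (chi2, df) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = exposed/unexposed, columns = disease/no disease.
observed = [[20, 80], [10, 90]]
chi2, df = chi_square_statistic(observed)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value_one_df(chi2):.4f}")
```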