Biostatistics Flashcards
Descriptive statistics
the collection, organization, summarization, and analysis of data
Inferential statistics
drawing inferences about a body of data when only a part of the data is observed
population
defined by a sphere of interest
sample
subgroup or subset of the population
parameter
characteristics or measure obtained from a population
statistic
characteristics or measure obtained from a sample
We compute _____ and use them to estimate _____.
We compute statistics and use them to estimate parameters.
nominal scale
The lowest measurement scale.
Used for naming or labeling, not ordering.
Though numbers can be used, the relationship between the numbers is not meaningful.
Ex: Categorical and Dichotomous variables (Marital status, DL #, SSN)
ordinal scale
observations are ranked; level of differences between ranks is unknown
Ex: Low, Medium, High; Likert-type scale
interval scale
observations are ranked; level of differences between ranks is equal; scale is relative
No true zero point, so ratios are meaningless.
Ex: Temperature (F/C) or pH scales (0 does not equal absence of heat/acidity)
ratio scale
observations are ranked; level of differences between ranks is equal;
true zero point exists
Ex: height, length, Kelvin Temperature scale (defines 0K as absolute zero)
Measures of disease frequency
count, ratio, proportion, rate
count
number of cases of a disease or other health condition;
Ex: dorm students with COVID-19
proportion
measure that states a count relative to the size of the group;
numerator/denominator
Ex: dorm students with COVID-19/all dorm students
ratio
one number divided by another number
numerator does not have to be a subset of the denominator
Ex: dorm students with COVID-19/dorm students with flu
rate
similar to ratios and proportions, but includes a time component
Ex: % of dorm students with COVID-19 in 2020
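The four measures above can be compared side by side in a short sketch. All numbers below are hypothetical, chosen only to make the arithmetic easy to follow, and the variable names are illustrative.

```python
# Hypothetical counts for one dorm population in 2020 (not real data).
dorm_students = 600        # everyone in the group of interest
dorm_covid_cases = 30      # dorm students with COVID-19
dorm_flu_cases = 20        # dorm students with flu

# Count: simply the number of cases.
count = dorm_covid_cases

# Proportion: the numerator is a subset of the denominator.
proportion = dorm_covid_cases / dorm_students

# Ratio: the numerator does not have to be a subset of the denominator.
ratio = dorm_covid_cases / dorm_flu_cases

# Rate: a proportion tied to a time period, here cases per 1,000 students in 2020.
rate_per_1000 = 1000 * dorm_covid_cases / dorm_students

print(count, proportion, ratio, rate_per_1000)
```

The proportion can never exceed 1, while the ratio can take any non-negative value.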
Descriptive Study Examples
- case studies/reports
- cross-sectional studies
- ecological studies
Analytical Study Examples
- Case-control Studies
- cohort studies
- randomized controlled trials
Cohort Study
begin with a group of people who are disease free at baseline
Follow over time and classify on exposure; identify incident cases
Measure of association (MOA): Relative risk
Good for prevalent diseases
Case-Control Study
Compare Diseased (cases) to Disease free (controls)
Classify on disease status; collect exposure data retrospectively
MOA: Odds ratio
Good for rare disease
RR or OR = 1
no association between exposure and outcome
RR or OR > 1
exposure increases risk of the outcome
Positive (direct) association
RR or OR < 1
exposure decreases risk of the outcome
Negative (inverse) association
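A minimal sketch of how RR and OR are computed from a 2x2 table and read against 1. The cell labels a, b, c, d follow the common textbook layout, which is an assumption here, and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """OR from the same table: (a/b) / (c/d), i.e. the cross-product ratio."""
    return (a * d) / (b * c)


def interpret(measure):
    """Read an RR or OR against the null value of 1."""
    if measure > 1:
        return "positive (direct) association"
    if measure < 1:
        return "negative (inverse) association"
    return "no association"


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
print(round(rr, 2), interpret(rr))
print(round(odds, 2), interpret(odds))
```

In practice a value is rarely exactly 1, so the confidence interval around the estimate is usually checked as well.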
RR range
0 to infinity (a ratio of two risks cannot be negative)
When interpreting OR, begin with the _____
outcome
When interpreting RR, begin with the _____
exposure
Attributable risk
tells us how much of the disease that occurs can be attributed to a certain exposure
calculate among exposed individuals or an entire population
background risk
the risk of non-exposed people is not zero
Ex: some people who get lung cancer do not smoke
Attributable risk formula
(incidence in exposed) - (incidence in unexposed)
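The formula above as a small sketch. The incidence values are hypothetical and expressed per 1,000 people so the subtraction stays in whole numbers.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Attributable risk: incidence in exposed minus incidence in unexposed."""
    return incidence_exposed - incidence_unexposed


# Hypothetical incidences per 1,000 people (not real data).
incidence_exposed = 28     # e.g., smokers
incidence_unexposed = 17   # background risk among non-smokers

ar = attributable_risk(incidence_exposed, incidence_unexposed)
print(f"{ar} cases per 1,000 exposed people are attributable to the exposure")
```

The unexposed incidence is the background risk, which is why it is subtracted rather than ignored.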
simple random sample
enumerate all members of the population N
select n individuals at random (each has the same probability of being selected)
systematic sampling
- start with sampling frame
- determine sampling interval (N/n)
- select first person at random from first (N/n) and every (N/n) thereafter.
Stratified sampling
organize population into mutually exclusive strata, select individuals at random within each stratum
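The three sampling schemes above can be sketched with Python's standard `random` module. The function names, the population of ID numbers, and the strata labels are all hypothetical; the systematic version assumes N is at least n and uses the integer part of N/n as the interval.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same probability of being selected."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th member."""
    k = len(population) // n          # sampling interval N/n
    start = rng.randrange(k)          # random position within the first k members
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Random selection within each mutually exclusive stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(0)                      # seeded only for reproducibility
population = list(range(1, 101))            # sampling frame: IDs 1..100
strata = {"first_year": population[:50],    # hypothetical strata
          "senior": population[50:]}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
print(srs, systematic, stratified, sep="\n")
```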
binomial distribution
- models # of events out of n observations
- 2 possible outcomes: success or failure
- replications of process are independent
- P(success) is constant for each replication
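Under the four conditions above, the probability of exactly k successes in n observations is C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch, with a hypothetical helper name and example values:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with P(success) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Hypothetical example: 4 independent observations, P(success) = 0.5.
n, p = 4, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(probabilities)          # one probability for each k = 0..4
print(sum(probabilities))     # the probabilities over all k add up to 1
```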
normal distribution
μ = mean; σ = standard deviation
mean = median = mode and are located at the center of the distribution (not skewed)
area under curve = probability of observation
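The "area under the curve = probability" idea can be checked with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Mean, median, and mode coincide at the center of the distribution.
print(dist.mean, dist.median, dist.mode)

# Probability of an observation between 85 and 115 (within one standard
# deviation of the mean) = area under the curve between those two points.
within_one_sd = dist.cdf(115) - dist.cdf(85)
print(round(within_one_sd, 4))

# The total area to the left of the mean is one half.
print(dist.cdf(100))
```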
2 statistical inference methods:
1. Estimation
2. Hypothesis Testing
Estimation
sample statistics are used to generate estimates of the population parameter
Hypothesis Testing
Sample statistics are analyzed to either support or reject the hypothesis about the parameter.
Are statistics from different samples in the same population the same?
No, the sample mean of the second sample is likely to be different from the first sample mean.
sampling distribution
the distribution of a sample statistic (e.g., the sample mean) across many samples of the same size drawn from the same population
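A small simulation can make the sampling distribution concrete. This is a sketch under stated assumptions: the population is made-up uniform data, and the sample size (50) and number of repeated samples (1,000) are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)   # seeded only so the run is reproducible

# Hypothetical population of 10,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]


def draw_sample_mean(population, n, rng):
    """Draw one simple random sample of size n and return its mean."""
    return mean(rng.sample(population, n))


# Repeat the sampling many times; the collection of means approximates
# the sampling distribution of the sample mean.
sample_means = [draw_sample_mean(population, 50, rng) for _ in range(1_000)]

print("first two sample means:", sample_means[0], sample_means[1])
print("population mean:", mean(population))
print("mean of sample means:", mean(sample_means))
print("spread of sample means:", stdev(sample_means))
```

The individual sample means differ from one another, but they cluster around the population mean, and their spread is much smaller than the spread of the population itself.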
point estimate
the “best” single estimate of that parameter
confidence interval
range of plausible values for the population parameter; carries a level of confidence
confidence level
reflects the likelihood that the confidence interval contains the true, unknown parameter;
commonly 90%, 95%, or 99%
Ex: if we repeatedly draw samples from the same population and build a 95% confidence interval from each, about 95% of those intervals will cover the true parameter.
As Confidence Level _____, Confidence Interval _____.
As Confidence Level increases, Confidence Interval widens.
standard error
reflects the variability of the sampling distribution of the sample statistic
estimated standard error formula
s / square root of n
s = sample std. dev. n = sample size
As sample size _____ , standard error _____ .
As sample size increases, standard error decreases.
Small samples have large standard errors.
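The relationship between sample size and standard error can be seen by holding s fixed and varying n. The value s = 10 and the sample sizes are hypothetical.

```python
from math import sqrt


def standard_error(s, n):
    """Estimated standard error of the mean: s / square root of n."""
    return s / sqrt(n)


# Hypothetical sample standard deviation, evaluated at growing sample sizes.
s = 10.0
for n in (25, 100, 400):
    print(n, standard_error(s, n))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.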
population standard deviation can be _____ by sample standard deviation.
replaced
The midpoint of the Confidence Interval is _____.
the sample mean (the point estimate)
margin of error formula
Z * s/square root of n
s = sample std. dev. n = sample size
Z reflects the critical value for _____.
confidence level
Confidence interval formula
Sample mean +/- Z * s/square root of n
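A minimal sketch of the formula above, using `statistics.NormalDist().inv_cdf` to look up the critical value Z for a chosen confidence level. The function name and the example numbers (mean 100, s = 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, s, n, level=0.95):
    """Sample mean +/- Z * s / square root of n."""
    # Two-sided critical value: e.g. level 0.95 leaves 2.5% in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin_of_error = z * s / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Hypothetical sample: mean 100, standard deviation 15, size 36.
for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(100, 15, 36, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}), width {upper - lower:.2f}")
```

The interval stays centered on the sample mean and widens as the confidence level increases.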
null hypothesis (H0)
assumes nothing is going on, usually carries equality
alternative hypothesis (HA)
the “research hypothesis”
reflects the researcher’s belief
Hypothesis Testing: 2 Possible Conclusions
1. Reject the null hypothesis
2. Fail to reject the null hypothesis
Hypothesis Testing: 2 Possible Hypotheses
1. Null hypothesis
2. Alternative hypothesis
Hypothesis Testing Procedures
- Set up a null and research hypothesis
- Determine the significance level (alpha): the acceptable rate at which a Type I error can occur.
- Select test
- Compute test statistic
- Compute p-value
- Compare p-value to alpha
- Draw conclusion + summarize significance
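The procedure above can be walked through in code. This sketch assumes a two-sided, one-sample z test for a mean, which is one possible choice at the "select test" step and is not prescribed by the cards; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, s, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 against HA: mu != mu0."""
    # Compute the test statistic.
    standard_error = s / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Compute the p-value: probability of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Compare the p-value to alpha and draw a conclusion.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical study: H0 says the population mean is 100;
# the sample of 100 people has mean 103 and standard deviation 15.
z, p_value, decision = one_sample_z_test(103, 100, 15, 100, alpha=0.05)
print(f"z = {z:.2f}, p = {p_value:.4f}, conclusion: {decision}")
```

Note that the two possible conclusions are "reject H0" and "fail to reject H0"; the test never proves the null hypothesis true.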