Epidemiology Chapter 2 Flashcards
Estimating population proportion, p, by sample proportion, p hat
p_hat = (number in the sample with the characteristic) / (sample size)
Let x_i = 1 if the ith person has the characteristic in question and 0 otherwise. Then
p_hat = (sum from i=1 to n of x_i) / n
This is a point estimate of p
The sampling distribution of p_hat
If each of our samples is large, the sampling distribution of p hat is approximately normal with expected value p and variance p(1-p)/n, where n is the size of each sample
p hat is an unbiased estimator of p
Standard error of p hat (one sample)
SE(p hat)=sqrt((p(1-p))/n)
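As an illustration of the point estimate and its standard error, here is a minimal Python sketch. The function names and the ten 0/1 observations are hypothetical, invented only for this example. Because the true p is unknown in practice, the sketch plugs p hat into the standard error formula.

```python
import math

def sample_proportion(x):
    """Point estimate p_hat = (sum of the 0/1 indicators) / n."""
    return sum(x) / len(x)

def se_proportion(p, n):
    """Standard error sqrt(p(1-p)/n) of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample: 1 = has the characteristic, 0 = does not.
x = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

p_hat = sample_proportion(x)
# Assumption: p is unknown, so p_hat is substituted for p.
se = se_proportion(p_hat, len(x))
print(p_hat, se)
```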
Inference for a proportion in a one sample case
Confidence interval
Hypothesis
Test statistic
Confidence interval: p̂ ± z_(α/2) sqrt(p̂(1-p̂)/n), with p̂ substituted for the unknown p in the standard error
95%: z_(α/2) = 1.96
90%: z_(α/2) = 1.645
99%: z_(α/2) = 2.5758
Null: p = p_0
Alternative
one-sided: p < p_0 or p > p_0
two-sided: p ≠ p_0
Test statistic
z = (p̂ - p_0) / sqrt(p_0(1-p_0)/n)
- compare z with the standard normal distribution
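The one-sample interval and test above can be sketched in Python as follows. This is only an illustration: the function names, the counts (60 of 200) and the null value 0.25 are made up, and the interval uses p hat in the standard error while the test uses p_0, as on the card.

```python
import math
from statistics import NormalDist

def proportion_ci(m, n, level=0.95):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = m / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def proportion_z_test(m, n, p0):
    """z statistic and two-sided p-value for H0: p = p0."""
    p_hat = m / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 200 people have the characteristic.
m, n = 60, 200
lo, hi = proportion_ci(m, n)
z, p_value = proportion_z_test(m, n, p0=0.25)
print(lo, hi, z, p_value)
```

For a one-sided alternative, the p-value would instead be the single tail area beyond z in the direction of the alternative.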
Inference for a proportion in a two sample case
Notation
Point estimates
                          Population 1   Population 2
Proportion of successes   p_1            p_2
Sample size               n_1            n_2
Number of successes       m_1            m_2
p̂_1 = m_1/n_1, p̂_2 = m_2/n_2
p̂_1 - p̂_2 is a point estimate of p_1 - p_2
Inference for a proportion in a two sample case
Confidence interval
Hypothesis
Test statistic
Confidence interval: (p̂_1 - p̂_2) ± z_(α/2) SE(p̂_1 - p̂_2)
SE(p̂_1 - p̂_2) = sqrt(p̂_1(1-p̂_1)/n_1 + p̂_2(1-p̂_2)/n_2)
Hypothesis
Null: p_1 = p_2, or p_1 - p_2 = 0
Alternative
One-sided: p_1 < p_2 (p_1 - p_2 < 0) or p_1 > p_2 (p_1 - p_2 > 0)
Two-sided: p_1 ≠ p_2, or p_1 - p_2 ≠ 0
Test statistic
Under the null, p_1 = p_2 = p, which is estimated by the pooled proportion
p̂ = (m_1 + m_2)/(n_1 + n_2)
z = (p̂_1 - p̂_2) / sqrt(p̂(1-p̂)(1/n_1 + 1/n_2))
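A short Python sketch of the two-sample case may help show where the two different standard errors come in: the unpooled one for the confidence interval and the pooled one for the test. The function names and the counts (45 of 100 versus 30 of 100) are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_ci(m1, n1, m2, n2, level=0.95):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = m1 / n1, m2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def two_proportion_z_test(m1, n1, m2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 (pooled SE)."""
    p1, p2 = m1 / n1, m2 / n2
    p_pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 45/100 successes in sample 1, 30/100 in sample 2.
lo, hi = two_proportion_ci(45, 100, 30, 100)
z, p_value = two_proportion_z_test(45, 100, 30, 100)
print(lo, hi, z, p_value)
```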
Inference for rates: the one-sample case
Estimated rate
Standard Error
Confidence interval
Let D be the number of events and Y be the total person-time
Estimated rate: λ̂ = D/Y
Standard error: SE(log(λ̂)) = sqrt(1/D)
CI for the log rate
log(λ̂) ± z_(α/2) sqrt(1/D)
CI for the rate
λ̂ e^(±z_(α/2) sqrt(1/D))
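The rate interval is built on the log scale and then exponentiated, which the following Python sketch illustrates. The function name and the figures (30 events over 1200 person-years) are hypothetical.

```python
import math
from statistics import NormalDist

def rate_ci(d, y, level=0.95):
    """Estimated rate D/Y with a CI built on the log scale.

    SE(log rate) = sqrt(1/D); the limits are rate * exp(-/+ z * SE).
    """
    rate = d / y
    se_log = math.sqrt(1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)

# Hypothetical data: 30 events over 1200 person-years.
rate, lo, hi = rate_ci(30, 1200)
print(rate, lo, hi)
```

Because the interval is symmetric on the log scale, the estimated rate is the geometric mean of the two limits rather than their midpoint.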
Inferences about rate ratio (2 samples)
Point estimates
Confidence interval
Standard Error
Suppose we have two groups to compare.
Let D_0 and D_1 denote the number of events in the unexposed and exposed groups respectively
Let Y_0 and Y_1 denote the corresponding person-years
Exposed: λ̂_1 = D_1/Y_1
Unexposed: λ̂_0 = D_0/Y_0
Rate ratio (exposed to unexposed): λ̂_1/λ̂_0
Confidence interval
log rate ratio: log(λ̂_1/λ̂_0) ± z_(α/2) sqrt(1/D_0 + 1/D_1)
rate ratio: (λ̂_1/λ̂_0) e^(±z_(α/2) sqrt(1/D_0 + 1/D_1))
SE(log(λ̂_1/λ̂_0)) = sqrt(1/D_0 + 1/D_1)
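A possible Python sketch of the rate ratio and its confidence interval follows. The function name and the event counts and person-years are invented for illustration only.

```python
import math
from statistics import NormalDist

def rate_ratio_ci(d1, y1, d0, y0, level=0.95):
    """Rate ratio (exposed / unexposed) with a log-scale CI.

    SE(log rate ratio) = sqrt(1/D_0 + 1/D_1).
    """
    ratio = (d1 / y1) / (d0 / y0)
    se_log = math.sqrt(1 / d0 + 1 / d1)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)

# Hypothetical data: 40 events in 1000 exposed person-years,
# 20 events in 1000 unexposed person-years.
ratio, lo, hi = rate_ratio_ci(40, 1000, 20, 1000)
print(ratio, lo, hi)
```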
Testing for equality of rates (2 samples)
Null: λ_1=λ_0 or log(λ_1/λ_0)=0, no difference
Alternative: log(λ_1/λ_0)≠0 or log(λ_1/λ_0)>0 or log(λ_1/λ_0)<0
Test statistic
z = (log(λ̂_1/λ̂_0) - 0) / SE(log(λ̂_1/λ̂_0))
z follows a standard normal distribution under the null hypothesis
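The test of equal rates can be sketched the same way. The function name and the counts below are hypothetical, and the sketch reports a two-sided p-value; a one-sided alternative would use a single tail.

```python
import math
from statistics import NormalDist

def rate_ratio_z_test(d1, y1, d0, y0):
    """z statistic and two-sided p-value for H0: log(rate ratio) = 0."""
    log_ratio = math.log((d1 / y1) / (d0 / y0))
    se_log = math.sqrt(1 / d0 + 1 / d1)
    z = (log_ratio - 0) / se_log
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 40 events in 1000 exposed person-years,
# 20 events in 1000 unexposed person-years.
z, p_value = rate_ratio_z_test(40, 1000, 20, 1000)
print(z, p_value)
```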
Inference for odds
Point estimate for odds
SE(log(odds hat))
Confidence intervals
Odds hat = (number in our sample with the disease) / (number in our sample without the disease) = m/(n-m), where m of the n people sampled have the disease
SE(log(odds hat)) = sqrt(1/m + 1/(n-m))
CI
95% confidence interval for the true log odds
log(odds hat) ± 1.96 SE(log(odds hat))
95% confidence interval for the true odds
e^(log(odds hat) ± 1.96 SE(log(odds hat)))
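As a rough illustration, the odds and the 95% interval can be computed in Python as below. The function name and the counts (25 diseased out of 100) are hypothetical.

```python
import math

def odds_ci_95(m, n):
    """Odds m/(n-m) with a 95% CI computed on the log scale.

    SE(log odds) = sqrt(1/m + 1/(n-m)).
    """
    odds = m / (n - m)
    se_log = math.sqrt(1 / m + 1 / (n - m))
    lo = math.exp(math.log(odds) - 1.96 * se_log)
    hi = math.exp(math.log(odds) + 1.96 * se_log)
    return odds, lo, hi

# Hypothetical data: 25 of 100 people sampled have the disease.
odds, lo, hi = odds_ci_95(25, 100)
print(odds, lo, hi)
```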
Inference for odds ratio
Point estimate
Standard Error
Confidence intervals
            Diseased   Non-diseased
Exposed     a          b
Unexposed   c          d
OR hat = (odds of disease in the exposed group) / (odds of disease in the unexposed group) = (a/b)/(c/d) = ad/(bc)
SE(log(OR hat)) = sqrt(1/a + 1/b + 1/c + 1/d)
Confidence interval for the log odds ratio
log(OR hat) ± z_(α/2) SE(log(OR hat))
Confidence interval for the odds ratio
e^(log(OR hat) ± z_(α/2) SE(log(OR hat)))
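The odds ratio calculation from a 2x2 table might look like this in Python. The function name and the cell counts are made up for the example; a, b, c, d follow the table layout on the card.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio ad/(bc) with a log-scale CI.

    a, b = diseased / non-diseased among the exposed,
    c, d = diseased / non-diseased among the unexposed.
    SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Hypothetical 2x2 table: exposed 30 / 70, unexposed 15 / 85.
or_hat, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(or_hat, lo, hi)
```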
Inferences for Relative Risks
Point estimate
Standard Error
Confidence interval
The ratio of the risk in the exposed group to that in the unexposed group
            Diseased   Non-diseased
Exposed     a          b
Unexposed   c          d
RR hat = (risk in exposed group) / (risk in unexposed group) = [a/(a+b)] / [c/(c+d)] = a(c+d) / (c(a+b))
SE(log(RR hat)) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
Confidence interval for the log relative risk
log(RR hat) ± z_(α/2) SE(log(RR hat))
Confidence interval for the relative risk
e^(log(RR hat) ± z_(α/2) SE(log(RR hat)))
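A minimal Python sketch of the relative risk and its interval is given below, using the same 2x2 layout. The function name and the cell counts are hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(a, b, c, d, level=0.95):
    """Relative risk [a/(a+b)] / [c/(c+d)] with a log-scale CI.

    SE(log RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)).
    """
    rr = (a * (c + d)) / (c * (a + b))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, se_log, lo, hi

# Hypothetical 2x2 table: exposed 30 / 70, unexposed 15 / 85.
rr, se_log, lo, hi = relative_risk_ci(30, 70, 15, 85)
print(rr, se_log, lo, hi)
```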