Proportions Flashcards

Question 1

Q

What are the four conditions of a Binomial random variable?

Answer

A

Fixed number of trials (𝑛)

Independent trials

Two possible outcomes per trial

Same probability of success for each trial

Question 2

Q

How is the sample proportion (𝑝̂) estimated?

Answer

A

𝑝̂ = (Number of successes) / (Total number of trials)

Question 3

Q

What is the formula for the standard error of the sample proportion?

Answer

A

𝑠𝑒(𝑝̂) = sqrt[𝑝̂(1−𝑝̂)/𝑛]

Question 4

Q

What is the formula for a 95% confidence interval for 𝑝?

Answer

A

𝑝̂ ± (1.96 × 𝑠𝑒(𝑝̂))
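
The three formulas so far (sample proportion, standard error, 95% interval) can be chained in a few lines. This is a minimal Python sketch rather than the R workflow the cards refer to; the function name `wald_interval` and the counts 40 out of 100 are made up for illustration.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n                      # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 40 successes in 100 trials.
lower, upper = wald_interval(40, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The multiplier 1.96 is the default because the card asks for a 95% interval; another confidence level would need a different multiplier.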

Question 5

Q

Why do we assume a Normal approximation for the sample proportion?

Answer

A

Because when the sample size is large, the Binomial distribution of the number of successes is well approximated by a Normal distribution, so 𝑝̂ is approximately Normal as well.

Question 6

Q

What R function is used to compute a confidence interval for proportions?

Answer

A

binconf(x, n, alpha=0.05, method="asymptotic") from the Hmisc package.

Question 7

Q

What happens when the sample size is too small for a Normal approximation in confidence intervals?

Answer

A

The confidence interval may extend below zero or above one, which is not possible for probabilities. A different method, such as Wilson’s interval, should be used.

Question 8

Q

What is Wilson’s confidence interval formula used for?

Answer

A

It is used for small sample sizes to ensure the confidence interval does not go below zero or above one.
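
The cards do not write out Wilson's formula, so the sketch below uses the standard Wilson score interval as an assumption about what is meant. It compares that interval with the Normal-approximation interval for a hypothetical small sample of 1 success in 20 trials; both function names are illustrative.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval; can fall outside [0, 1]."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval; always stays inside [0, 1]."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical small sample: 1 success in 20 trials.
print("Wald:  ", wald_interval(1, 20))    # lower limit is negative
print("Wilson:", wilson_interval(1, 20))  # both limits lie between 0 and 1
```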

Question 9

Q

What is the formula for the standard error of the difference in two proportions?

Answer

A

𝑠𝑒(𝑝̂1 − 𝑝̂2) = sqrt[𝑝̂1(1−𝑝̂1)/𝑛1 + 𝑝̂2(1−𝑝̂2)/𝑛2]

Question 10

Q

What are the null and alternative hypotheses for comparing two proportions?

Answer

A

Null Hypothesis (𝐻0): No difference between proportions (𝑝𝐴 - 𝑝𝐶 = 0)

Alternative Hypothesis (𝐻1): There is a difference (𝑝𝐴 - 𝑝𝐶 ≠ 0)

Question 11

Q

Why is a hypothesis test used to compare two proportions?

Answer

A

It assesses whether the observed difference between two sample proportions could plausibly be due to chance (sampling variation) alone, or whether it is evidence of a true difference.

Question 12

Q

Why is Wilson’s interval preferred for small sample sizes?

Answer

A

It avoids impossible probability values (e.g., negative probabilities) and provides more accurate confidence intervals when 𝑝̂ is close to 0 or 1.

Question 13

Q

What does the test statistic measure in a two-proportion z-test?

Answer

A

It measures the observed difference between the sample proportions in units of its standard error, which helps determine statistical significance.

Question 14

Q

How is the test statistic for comparing two proportions calculated?

Answer

A

(data estimate − hypothesized value) / standard error

Question 15

Q

What does a test statistic of 4.82 indicate in a z-test?

Answer

A

It means the observed difference is 4.82 standard errors away from the value specified by the null hypothesis (zero difference), which is strong evidence against 𝐻0.

Question 16

Q

What is the p-value for a test statistic of 4.82 in a z-test?

Answer

A

The probability of obtaining a value at least this extreme under 𝐻0 is very small, around 10⁻⁶, leading to rejection of 𝐻0.
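
The order of magnitude quoted above can be checked with the standard Normal tail area. The sketch assumes a two-sided test, which matches the "≠" alternative on the hypotheses card; the function name is illustrative.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard Normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

p_value = two_sided_p_value(4.82)
print(f"p-value for z = 4.82: {p_value:.2e}")
```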

Question 17

Q

How is the confidence interval for the difference in two proportions calculated?

Answer

A

(𝑝̂1 − 𝑝̂2) ± (𝑧 × 𝑠𝑒(𝑝̂1 − 𝑝̂2))
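
As a rough illustration, the interval can be computed for two independent samples using the independent-samples standard error from the earlier card. The counts (60 of 100 versus 45 of 100) and the function name are hypothetical.

```python
import math

def difference_interval(x1, n1, x2, n2, z=1.96):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical data: 60/100 successes in sample 1, 45/100 in sample 2.
lower, upper = difference_interval(60, 100, 45, 100)
print(f"95% CI for the difference: ({lower:.3f}, {upper:.3f})")
```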

Question 18

Q

What are the three sampling situations for comparing proportions?

Answer

A

Situation A: Independent samples (e.g., comparing two countries).

Situation B: One sample, mutually exclusive categories (e.g., voting choices).

Situation C: One sample, multiple response options (e.g., survey with multiple answers).

Question 19

Q

How is the standard error calculated for Situation A (independent samples)?

Answer

A

sqrt[𝑝̂1(1−𝑝̂1)/𝑛1 + 𝑝̂2(1−𝑝̂2)/𝑛2]

Question 20

Q

When comparing survey responses from two countries, which sampling situation applies?

Answer

A

Situation A (independent samples), since each person belongs to only one country’s sample.

Question 21

Q

How does the choice of standard error formula impact results?

Answer

A

If the wrong formula is used, confidence intervals and hypothesis tests may be incorrect, leading to misleading conclusions.

Question 22

Q

What is the formula for the standard error of the difference between two proportions in Situation B (one sample, mutually exclusive categories)?

Answer

A

𝑠𝑒(𝑝̂1 − 𝑝̂2) = sqrt[(𝑝̂1 + 𝑝̂2 − (𝑝̂1 − 𝑝̂2)²)/𝑛]
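
To see how much the choice of formula matters, the sketch below evaluates this single-sample formula next to the independent-samples formula for the same pair of proportions. The proportions 0.5 and 0.3, the sample size 1000, and both function names are invented for the example.

```python
import math

def se_single_sample(p1, p2, n):
    """SE of p1 - p2 when both proportions come from one sample of size n
    and the two categories are mutually exclusive."""
    return math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

def se_independent(p1, n1, p2, n2):
    """SE of p1 - p2 when the proportions come from two independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical poll: 50% choose option 1 and 30% choose option 2, n = 1000.
print("single sample:", round(se_single_sample(0.5, 0.3, 1000), 4))
print("independent:  ", round(se_independent(0.5, 1000, 0.3, 1000), 4))
```

For these numbers the single-sample standard error is the larger of the two, so applying the independent-samples formula here would make the interval too narrow.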

Question 23

Q

When should Situation B be used in sampling?

Answer

A

Situation B is used when one sample is asked a single question with mutually exclusive response options, such as “agree,” “disagree,” or “don’t know.”

Question 24

Q

How are statistical odds calculated?

Answer

A

Odds = P(success) / P(failure) = 𝑝 / (1 − 𝑝)

Question 25

Q

What does an odds ratio (OR) greater than 1 indicate?

Answer

A

It indicates that the intervention group has higher odds of success compared to the control group.

Question 26

Q

How is an odds ratio (OR) calculated?

Answer

A

θ = (odds in group 1) / (odds in group 2)
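
A small worked example may help connect odds and the odds ratio. The success probabilities 0.75 and 0.5 are hypothetical, and the helper names are not from the cards.

```python
def odds(p):
    """Odds of success for a success probability p, with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(p1) / odds(p2)

# Hypothetical groups: success probability 0.75 in group 1, 0.5 in group 2.
print("odds in group 1:", odds(0.75))
print("odds in group 2:", odds(0.5))
print("odds ratio:     ", odds_ratio(0.75, 0.5))
```

An odds ratio above 1 here reflects the higher odds of success in group 1.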

Question 27

Q

What does an odds ratio (OR) less than 1 indicate?

Answer

A

It indicates that the control group has higher odds of success compared to the intervention group.

Question 28

Q

Why do we use the log of the odds ratio to construct confidence intervals?

Answer

A

Because the distribution of the odds ratio is highly skewed, and taking the log makes it approximately normal.

Question 29

Q

What is the formula for the standard error of the log odds ratio?

Answer

A

seOR = sqrt[1/n11 + 1/n12 + 1/n21 + 1/n22], where n11, n12, n21 and n22 are the four cell counts of the 2×2 table.

Question 30

Q

How do you obtain a confidence interval for an odds ratio?

Answer

A

Compute log(θ̂).
Form the interval on the log scale: log(θ̂) ± 𝑧(1−α/2) × seOR.
Exponentiate the lower and upper limits to return to the odds ratio scale.
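
The three steps above can be sketched as follows. The layout of the 2×2 table is an assumption (rows are groups, columns are success and failure counts), and the counts and the function name are invented for the example.

```python
import math

def odds_ratio_interval(n11, n12, n21, n22, z=1.96):
    """Estimate and confidence interval for the odds ratio of a 2x2 table.

    Assumed layout: n11, n12 = successes, failures in group 1;
                    n21, n22 = successes, failures in group 2.
    """
    theta_hat = (n11 / n12) / (n21 / n22)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta_hat)                  # step 1
    lower_log = log_theta - z * se_log               # step 2
    upper_log = log_theta + z * se_log
    return theta_hat, math.exp(lower_log), math.exp(upper_log)  # step 3

# Hypothetical table: 30 successes / 70 failures vs 15 successes / 85 failures.
theta_hat, lower, upper = odds_ratio_interval(30, 70, 15, 85)
print(f"OR = {theta_hat:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the estimated odds ratio.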

Question 31

Q

What does an odds ratio of exactly 1 indicate?

Answer

A

It indicates no difference between the two groups.

Question 32

Q

What is the formula for the pooled sample proportion when testing the difference between two proportions?

Answer

A

𝑝̂ = (𝑥1 + 𝑥2) / (𝑛1 + 𝑛2)

where 𝑥1 and 𝑥2 are the number of successes in each sample.
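
One common use of the pooled proportion is in the standard error of the z-test statistic under 𝐻0. The cards do not spell out that standard error, so the form used below, sqrt[𝑝̂(1−𝑝̂)(1/𝑛1 + 1/𝑛2)], is the usual textbook choice and should be read as an assumption; the counts are hypothetical.

```python
import math

def pooled_z_statistic(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Assumed pooled standard error under H0 (not stated on the cards).
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical data: 60/100 successes in sample 1, 45/100 in sample 2.
z = pooled_z_statistic(60, 100, 45, 100)
print(f"z = {z:.2f}")
```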

Question 33

Q

How do you interpret a confidence interval for the difference between two proportions?

Answer

A

If 0 is in the interval, there is no statistically significant difference at the corresponding level.

If the interval is entirely positive, there is evidence that 𝑝1 > 𝑝2.

If the interval is entirely negative, there is evidence that 𝑝1 < 𝑝2.

Question 34

Q

Why are odds used instead of probabilities in logistic regression?

Answer

A

Because the log of the odds can take any real value, it can be modelled as a linear function of the predictor variables, whereas a probability is restricted to the range 0 to 1.
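
A short sketch of the log-odds transformation and its inverse shows the mapping between probabilities and the log scale. The function names `logit` and `inverse_logit` are conventional labels rather than anything defined on these cards.

```python
import math

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))

# Probabilities near 0 or 1 map to large negative or positive log odds.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3))
```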

Question 35

Q

What does a log odds ratio of 0 mean?

Answer

A

It means the odds ratio is 1, indicating no difference between the two groups.

Question 36

Q

What transformation is used to make the odds ratio’s distribution approximately normal?

Answer

A

The natural logarithm (log transformation) is applied to the odds ratio.