7 | Statistics for Proportions and Frequencies II Flashcards
(POLL)
On average, out of 100 confidence intervals reported in scientific papers, how many do not contain the true population value?
* never
* 1 time
* 5 times
* 10 times
* always
5 times
(POLL)
Does the 95% confidence interval also allow us to judge significance at the 0.05 significance level?
Yes
The 95% CI can be used to judge statistical significance at the 0.05 significance level.
Here’s how:
Key Idea:
A hypothesis test at α = 0.05 rejects the null hypothesis H₀ if the p-value < 0.05.
A 95% CI gives the range in which the true parameter is likely to lie, with 95% confidence.
Decision Rule:
If the H0 value (e.g., 0 in a difference test) is outside 95% CI → Reject H₀ (p-value < 0.05, significant result).
If the H0 value is inside 95% CI → Fail to reject H₀ (p-value ≥ 0.05, not significant).
Example 1: Significant Result
Suppose a mean difference CI is (2.5, 5.8).
Since 0 is not in the interval, we reject H₀ (p < 0.05).
Example 2: Not Significant
Suppose a CI is (-1.2, 3.5).
Since 0 is inside the interval, we fail to reject H₀ (p ≥ 0.05).
Advantages of CI over p-values:
- CIs show the effect size (not just whether something is significant).
- CIs provide more information than just a yes/no decision.
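The decision rule above can be checked directly in R; a minimal sketch with made-up normal samples (the CI/p-value equivalence holds for any `t.test()` call at matching levels):

```r
# Sketch: CI-based and p-value-based decisions agree (made-up data)
set.seed(1)
a <- rnorm(30, mean = 10)
b <- rnorm(30, mean = 11)
tt <- t.test(a, b)                 # Welch two-sample t-test
ci <- tt$conf.int                  # 95% CI for the mean difference
zero.outside <- ci[1] > 0 | ci[2] < 0
zero.outside == (tt$p.value < 0.05)   # TRUE: both rules give the same verdict
```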
(POLL)
Assuming a dependency between two variables, with the same effect size and the same variance, larger sample sizes give us …
* lower p-values
* larger p-values
* the same p-value
- lower p-values
(POLL)
Can logistic regression be used instead of a Chisq-Test to evaluate the dependency between two variables?
Yes
(POLL)
To compare two distributions with values of 0 and 1 only we can use the …
* Chisq-Test
* Fisher-Test
* Kolmogorov-Smirnov-Test
* T-Test
- Kolmogorov-Smirnov-Test
(a and b could as well be seen as true, but they check whether the variables are independent, not whether the distributions are equal)
(POLL)
To determine the effect size for a 3x2 contingency table we can use …
* Cohen’s d
* Cohen’s h
* Cohen’s w
* Cramer’s v
- Cohen’s w
- Cramer’s v
(Cohen’s h: for 2x2 table only)
(POLL)
We observe an 80/20 ratio in our sample but expected a 50/50 ratio. What is the effect size Cohen’s w?
* 0.1
* 0.2
* 0.4
* 0.6
* 1.0
w = sqrt( ∑ [ (p_obs - p_exp)^2 / p_exp] )
= sqrt(0.36)
= 0.6
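The same computation in R, using the proportions from the card:

```r
# Cohen's w for observed 80/20 vs expected 50/50
p.obs <- c(0.8, 0.2)
p.exp <- c(0.5, 0.5)
w <- sqrt(sum((p.obs - p.exp)^2 / p.exp))
w   # 0.6
```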
(POLL)
Simpson’s paradox can happen
* during watching TV
* in un-balanced experimental design with unequally sized groups
* if a third confounder variable interferes with our two variables of investigation
* and can be usually ignored
* and might alter the interpretation of the result
* and might be removed by splitting the data based on the confounder variable
- in un-balanced experimental design with unequally sized groups
- if a third confounder variable interferes with our two variables of investigation
- and might alter the interpretation of the result
- and might be removed by splitting the data based on the confounder variable
https://www.arte.tv/de/videos/107398-002-A/mathewelten/
(POLL)
You observe zero blood pressure increases in your patients in 30 observations. What 95% confidence interval would you state for the confidence limits to draw conclusions about the population?
* 0, 0.05
* 0, 0.1
* 0, 0.2
* 0.1, 0.5
* 0.25, 0.35
0, 0.1
3/30 = 0.1
rule of three CI95% = [0, 3/N]
_________________________
rule of three:
* if the numerator is zero (zero observations): rule of three, CI95% [0, 3/n]
* if the numerator is one (one observation): rule of five, CI95% [0, 5/n]
* if the numerator is two (two observations): rule of seven, CI95% [0, 7/n]
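The rule of three can be checked against the exact (Clopper-Pearson) interval; a small sketch for the n = 30 example above:

```r
# Rule of three vs exact binomial CI for 0 events in n = 30 observations
n <- 30
rule.of.three <- 3 / n                       # 0.1
exact.upper <- binom.test(0, n)$conf.int[2]  # exact upper limit, about 0.116
c(rule.of.three, exact.upper)
```

The rule of three is a quick approximation of the exact upper limit.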
Confidence interval?
- confidence in statistics: another way to describe probability
- mean of the estimate plus/minus the variation in the estimate
- range of values the estimate is expected to fall in if we tested again, within a certain level of confidence
What can we use to express uncertainty about our inferences? Which is more informative?
- p-values
- confidence intervals (a bit more informative - or did he mean both together?)
What’s going on here?
~~~
> prop.test(6,44) # sample in 2009
1‐sample proportions test with continuity correction
data: 6 out of 44, null probability 0.5
X‐squared = 21.841, df = 1, p‐value = 2.962e‐06
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.05667816 0.28046238
sample estimates:
p
0.1363636
~~~
Summary of prop.test(6,44) Output
* Test: 1-sample proportion test (Chi-square with continuity correction).
* Null Hypothesis (H₀): True proportion p = 0.5 (default)
* Observed Proportion: 6/44 ≈ 0.136 (13.6%).
* Test Statistic: X² = 21.841 (df = 1).
* p-value: 2.96e-06 (very small, reject H₀).
* 95% CI: [0.057, 0.280] (true proportion likely between 5.7% and 28.0%).
* Conclusion: Strong evidence that p ≠ 0.5; observed proportion is significantly lower.
Null Hypothesis: H0 with one group?
- sample proportion not different from known population proportion
- (sample mean not different from known population mean)
alternative hypothesis: - same sentences but without not
Null Hypothesis: H0 with two or more groups?
- groups not different from each other regarding population proportions (or means)
alternative hypothesis: - same sentences but without not
p-value according to D
gives the probability of observing a difference as large as, or larger than, the one seen in the sample if H0 were true
Can we reject the null hypothesis?
Based on our sample data we will never know if the null hypothesis is true or false‼
We almost always observe a difference‼
What does the p-value tell us when testing one group?
We need to know the population mean/proportion!:
Tells us how often random sampling from that population would give a sample at least as different as ours (not how “sure” we are).
Can we reject the null hypothesis?
Based on our sample data we will never know if the null hypothesis is true or false‼
We almost always observe a difference‼
What does the p-value tell us when testing two groups?
how often we would get the observed or a larger difference by random sampling from two populations whose means/proportions are equal.
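This frequency interpretation can be illustrated by simulation; a sketch with assumed values (true proportion 0.5 in both populations, n = 50 per group, and a hypothetical observed difference of 0.2):

```r
# How often does random sampling from two equal populations produce a
# difference at least as large as the one observed?
set.seed(42)
n <- 50; p <- 0.5
obs.diff <- 0.2                      # hypothetical observed difference
diffs <- replicate(10000,
  abs(mean(rbinom(n, 1, p)) - mean(rbinom(n, 1, p))))
mean(diffs >= obs.diff)              # simulated p-value, roughly 0.05
```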
What does the p-value not tell us?
how sure we are that there is really a difference!
Common threshold for significance?
- statistically significant does not necessarily tell us that the observation is “important”
- significance level α: just custom threshold for p-value to claim significance (mostly <0.05)
- α states the probability of falsely rejecting the null hypothesis
p-value of 0.04 → random sampling of two _________ populations would observe in __ of all sampling procedures an _____________ → significant
p-value of 0.04 → random sampling of two identical populations would observe in 4% of all sampling procedures an equal or larger difference → significant
even if something is not significant don’t stop thinking‼ What must one be aware of?
- p-value of 0.06 → not significant
- significant * (<0.05), highly significant ** (<0.01), extremely significant *** (<0.001)??
- it might be that a fivefold change is just barely significant, whereas a 1.05-fold change is extremely significant …
- with same fold changes but larger sample size you can go from significant to highly significant
- doesn’t tell you anything about the strength of the effect !
Rosnow and Rosenthal (1989): “Surely God loves the .06 nearly as much as the .05!”
Significance, Importance and Scatter:
Large _______ can result in ________ even with larger changes, possibly more important results!
Large scatter can result in ‘non-significant’ even with larger changes, possibly more important results!
How does sample size affect significance? Here is a hint:
~~~
> set.seed(1234)
> s1=rnorm(1000,mean=10)
> s2=rnorm(1000,mean=11)
> t.test(s1,s2)$p.value
[1] 2.826639e‐108
> s1=rnorm(100,mean=10)
> s2=rnorm(100,mean=11)
> t.test(s1,s2)$p.value
[1] 7.321056e‐09
> s1=rnorm(20,mean=10)
> s2=rnorm(20,mean=11)
> t.test(s1,s2)$p.value
[1] 0.05413113
~~~
A larger sample size will increase the significance for the same underlying effect. The code shows sampling from two normal distributions whose means differ by one standard deviation.
* Sample size 1000: extremely significant
* Sample size 100: very significant
* Sample size 20: barely significant
R:
CI and p-value for one group with known population proportion. What’s going on here?
~~~
> prop.test(9,60,p=0.298) # http://de.statista.org/
1‐sample proportions test with continuity correction
data: 9 out of 60, null probability 0.298
X‐squared = 5.5948, df = 1, p‐value = 0.01801
alternative hypothesis: true p is not equal to 0.298
95 percent confidence interval:
0.07501469 0.27076912
sample estimates:
p
0.15
> prop.test(9,60,p=0.298)$p.value
[1] 0.01801402
> prop.test(9,60,p=0.298)$conf.int
[1] 0.07501469 0.27076912
attr(,"conf.level")
[1] 0.95
> prop.test(11,60,p=0.298)$p.value # two smokers more
[1] 0.07173252
> prop.test(11,60,p=0.298)$conf.int # ci does cross pop.prop
[1] 0.09932066 0.30853889
attr(,"conf.level")
[1] 0.95
~~~
Smoker students!
1st test: prop.test(9,60, p=0.298)
* example of one-sample proportion test (prop.test()),
* compares observed proportion against hypothesized population proportion.
* H0: true proportion is p=0.298
* H1 true proportion is not 0.298 (two-sided test).
* Chi-squared statistic: 5.5948
* p-value: 0.01801 → less than 0.05 → reject H0, suggesting that observed proportion significantly different from 0.298.
* 95% CI: [0.075,0.271] → 0.298 not in interval, test confirms a significant difference.
2nd Test: prop.test(11, 60, p = 0.298)
* two more smokers are counted in the sample, raising the proportion to 11/60 ≈ 0.183; the p-value (0.0717) is now above 0.05 and the 95% CI [0.099, 0.309] includes 0.298, so H0 is no longer rejected.
R:
What is a prop.test?
- function performs a test of proportions, either a one-sample or two-sample test
- uses Chi-squared test
1 sample: - prop.test(x, n) → 1-sample prop test, checks if obs. prop differs from exp. prop (default = 0.5).
- x → number of observed “successes” (e.g., cases, events, etc.)
- n → total number of trials (or observations).
2 sample: - prop.test(c(x,y), c(a,b))
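A minimal two-sample sketch using the AZT counts that appear later in this deck (76/475 progressed under AZT, 129/461 under placebo):

```r
# 2-sample proportion test: successes vector first, totals vector second
pt <- prop.test(c(76, 129), c(475, 461))
pt$estimate    # 0.16 vs about 0.28
pt$p.value     # well below 0.05
pt$conf.int    # 95% CI for the difference of the two proportions
```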
CI vs p-value? Effect size?
- p-values are only a measure of the role of chance (randomness)
- CIs tell us about the difference in proportions relative to the population or to another group
- CIs are the better measure, but are unfortunately used less often in science
GraphPad-Website 2022: - Consider emphasizing effect size and CI, rather than P value.
- effect size can be difference or ratio or correlation coefficient (or something else)
- with 95% CI to demonstrate how precisely ES has been determined.
- → more useful, in many cases, than P values
R:
Two variables / CI for each group
CI and p-value for difference between groups?
Instead of computing it separately for each group, it is better to compute it jointly
→ 2 sample prop test: prop.test(c(a,b), c(x,y)) or prop.test(aids.azt)
chisq.test
chisq.test vs prop.test?
- same p-value as prop.test for 2x2 tables
- but no CI computed
- chi sq can be used for more than two levels (as for 3x2 tables, one of the two variables has 3 levels)
- compare the output χ2 with tabulated values → p.value
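The p-value equivalence for a 2x2 table can be verified directly (counts taken from the AZT example in this deck):

```r
# chisq.test and prop.test give the same p-value for a 2x2 table
tab <- matrix(c(76, 129, 399, 332), nrow = 2)  # rows: AZT, Placebo
chi <- chisq.test(tab)    # Yates correction on by default
pro <- prop.test(tab)     # same correction, but also returns a CI
all.equal(chi$p.value, pro$p.value)   # TRUE
```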
What is Yates correction? Why is it needed?
χ^2 = Σ_{1≤j≤m} (njo − nje)^2 / nje
χ^2_Yates = Σ_{1≤j≤m} (|njo − nje| − 0.5)^2 / nje
njo = observed count, nje = expected count in class j
Yates’ Continuity Correction in Chi-Square Tests
1. What is Yates’ Correction?
Yates’ continuity correction is an adjustment applied to the Chi-square test for proportions to account for the fact that the Chi-square test is based on a continuous distribution, while proportions (counts) are discrete.
It slightly reduces the absolute difference between observed and expected frequencies, making the test less likely to overestimate statistical significance for small sample sizes.
2. Why is it needed?
* The Chi-square test assumes a continuous distribution, but when dealing with small sample sizes (especially with expected counts below 5), the discrete nature of count data can cause overestimation of significance.
* Yates’ correction reduces the Chi-square value slightly, making the test more conservative.
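A small sketch with made-up counts showing the correction at work:

```r
# Yates' correction increases the p-value (more conservative test)
tab <- matrix(c(5, 10, 12, 6), nrow = 2)       # made-up 2x2 counts
p.corrected   <- chisq.test(tab, correct = TRUE)$p.value
p.uncorrected <- chisq.test(tab, correct = FALSE)$p.value
p.corrected > p.uncorrected   # TRUE: the corrected test is less extreme
```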
What can we use to check if data comes from the same distribution if we have more than 2 variables?
Logistic regression
Logistic Regression / more than 2 variables? Cf chi square? Advantage?
→ we code factors as simple numbers → convert to data frame
→ Significant relation between counts, treatment and progress.
→ Pearson line same p-value and χ2 value as chisq.test!
Advantage: with logistic regression we can easily add more factors/variables in our model.
[[[
→ just an example with two for simplicity reasons
~~~
> options(useFancyQuotes=FALSE)
> sbi$data$aids.azt
DiseaseProgress NoDiseaseProgress
AZT 76 399
Placebo 129 332
> treatment=c(1,0,1,0); progress=c(1,1,0,0)
> df.azt=data.frame(
+ counts=expand.grid(sbi$data$aids.azt),
+ treatment=treatment,progress=progress)
> colnames(df.azt)[1]=’counts’
> df.azt
counts treatment progress
1 76 1 1
2 129 0 1
3 399 1 0
4 332 0 0
> library(MASS)
> with(df.azt,loglm(counts~treatment+progress))
Call:
loglm(formula = counts ~ treatment + progress)
Statistics:
X^2 df P(> X^2)
Likelihood Ratio 19.79944 1 8.600783e‐06
Pearson 19.63833 1 9.357289e‐06
~~~
Chatgpt:
Explanation of the Logistic Regression Example
This example demonstrates a simple log-linear model using logistic regression for categorical data analysis. The goal is to analyze whether treatment (AZT vs. Placebo) affects disease progression.
1. Data Setup
The dataset contains information about whether patients on AZT or a placebo experienced disease progression.
We encode:
* Treatment: 1 = AZT, 0 = Placebo
* Disease Progression: 1 = Progress, 0 = No Progress
~~~
> treatment=c(1,0,1,0); progress=c(1,1,0,0)
~~~
- Treatment 1,0,1,0 corresponds to: (AZT, Placebo, AZT, Placebo)
- Progress 1,1,0,0 corresponds to: (Disease, Disease, No Disease, No Disease)
The data frame df.azt is created using expand.grid():
~~~
> df.azt=data.frame(
+ counts=expand.grid(sbi$data$aids.azt),
+ treatment=treatment,progress=progress)
~~~
Then, we rename the first column:
> colnames(df.azt)[1]='counts'
Resulting in:
~~~
> df.azt
  counts treatment progress
1     76         1        1
2    129         0        1
3    399         1        0
4    332         0        0
~~~
2. Running the Log-Linear Model (Logistic Regression)
The function loglm() from the MASS package is used to fit a log-linear model:
> with(df.azt, loglm(counts ~ treatment + progress))
This models the relationship between counts, treatment, and disease progression.
Key Output:
~~~
Statistics:
                      X^2 df   P(> X^2)
Likelihood Ratio 19.79944  1 8.600783e‐06
Pearson          19.63833  1 9.357289e‐06
~~~
- Likelihood Ratio Test: χ^2 = 19.799, df = 1, p = 8.6e-6
- Pearson: χ^2 = 19.638, df = 1, p = 9.357e-6
Since the p-value is very small (< 0.05), there is a significant relationship between treatment and disease progression.
Key Takeaways
1. Significant Relationship:
o The test suggests that AZT treatment affects disease progression.
2. Pearson’s Chi-Square & Likelihood Ratio:
o The p-values and chi-square values match chisq.test(), showing that log-linear models generalize Chi-square tests.
3. Advantage of Logistic Regression:
o Logistic regression allows adding more variables (e.g., age, gender, other treatments) to the model.
]]]
Kolmogorov-Smirnov test?
- (K-S or KS test)
- nonparametric test of equality of continuous (discontinuous), 1D prob distributions
- can be used to compare sample with reference prob distribution (one-sample K-S test)
- can be used to compare two samples (two-sample K-S test)
- named after Andrey Kolmogorov and Nikolai Smirnov.
(Wikipedia 2020)
When is a ks-test useful?
- No clue what test to apply
- check if two samples are coming from the same distribution
- flexible for different type of problems
- qualitative and quantitative data
- p-value < 0.05 → reject H0 that samples are from pop having same properties (dist)
Testing for distributions: what other way than chi-squared or logistic regression? Give an example in R
Kolmogorov-Smirnov test
Eg: Check for binomial distribution - Comparing Two Identically Distributed Samples:
~~~
> dice6a=rbinom(100,100,1/6) # 100 times with 100 dices
> dice6b=rbinom(100,100,1/6) # and again
> head(dice6b) # amount of ones (or sixes)
[1] 19 23 22 15 16 16
> ks.test(dice6b,dice6a) # same distribution? H0 is yes!
Asymptotic two‐sample Kolmogorov‐Smirnov test
data: dice6b and dice6a
D = 0.09, p‐value = 0.8127
alternative hypothesis: two‐sided
~~~
→ p-value >= 0.05 → we do not reject H0; both samples might come from the same distribution
Eg: Comparing to a Different Distribution:
~~~
> ks.test(dice6b,rbinom(100,100,1/2))
Asymptotic two‐sample Kolmogorov‐Smirnov test
data: dice6b and rbinom(100, 100, 1/2)
D = 1, p‐value < 2.2e‐16
alternative hypothesis: two‐sided
~~~
→ p-value < 0.05 → we reject H0 that they come from the same distribution
→ we accept H1, that the two samples come from different distributions
P-value vs CI? How are they related? Advantages and disadvantages?
What is an effect size? Why is it important?
TLDR:
* Effect size shows “how much” difference there is, not just “whether” there is one.
* Larger effect sizes mean stronger relationships or bigger differences.
* Effect size is crucial for practical interpretation, meta-analysis, and power analysis.
What Is an Effect Size?
* quantitative measure of strength of relationship between variables or magnitude of difference between groups.
* complements p-values by showing how meaningful result is, rather than just whether statistically significant.
Why Is Effect Size Important?
* Practical Significance: A small p-value tells us there is a difference, but effect size tells us whether that difference is meaningful.
* Comparability: It allows comparisons across different studies, even with different sample sizes.
* Power Analysis: Effect size is used to determine the sample size needed for detecting significant effects in hypothesis testing.
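Power analysis in practice: given an assumed effect size w, the required N can be derived in base R (a sketch; the pwr package's pwr.chisq.test gives the same answer):

```r
# Required sample size for a chi-square test with assumed w = 0.3,
# alpha = 0.05, power = 0.80, df = 1 (base-R sketch)
w <- 0.3; alpha <- 0.05; power <- 0.80; df <- 1
crit <- qchisq(1 - alpha, df)
# noncentrality lambda at which the test reaches the target power
lambda <- uniroot(function(l)
  pchisq(crit, df, ncp = l, lower.tail = FALSE) - power,
  c(0.1, 100))$root
N <- lambda / w^2
round(N)   # about 87
```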
Well known effect size?
Correlation: the correlation coefficient r
What is an important question in inferential statistics that is often ignored?
How large is the effect size for a Contingency Table?
See Ferguson, 2009: http://dx.doi.org/10.1037/a0015808
What are some effect sizes for contingency tables? Name 6.
(Note on Cohen’s w: being based on proportions, it doesn’t change when you scale all counts by the same factor. A larger difference in proportions (70/30 vs 30/70) results in a higher Cohen’s w, as the disparity between the two categories is more pronounced. Cohen’s w thus reflects the strength of the association and is not affected by the total size of the contingency table, only by the relative proportions.)
- Cohen’s w (omega) (focus in this course)
- Cohen’s h
- Cramer’s V (phi)
- Odds Ratio
- Relative Risk
- Numbers needed to treat (NNT)
- …
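For the 2x2 case, odds ratio, relative risk and NNT can be read off directly; a sketch using the AZT counts from this deck:

```r
# Odds ratio, relative risk and NNT from a 2x2 table (AZT example)
tab <- matrix(c(76, 129, 399, 332), nrow = 2,
              dimnames = list(c("AZT", "Placebo"),
                              c("Progress", "NoProgress")))
risk <- tab[, "Progress"] / rowSums(tab)        # risk of progression per group
rr  <- unname(risk["AZT"] / risk["Placebo"])    # relative risk, about 0.57
or  <- (tab[1, 1] / tab[1, 2]) / (tab[2, 1] / tab[2, 2])  # odds ratio, about 0.49
nnt <- unname(1 / (risk["Placebo"] - risk["AZT"]))  # patients to treat to prevent one case
c(RR = rr, OR = or, NNT = nnt)
```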
How is Cohen’s ω calculated?
square root of proportions based χ2 value:
ω = √[ Σ_{1≤i≤n} (po,i − pe,i)^2 / pe,i ]
po,i: observed proportion in cell i
pe,i: expected proportion in cell i
n: number of cells of a contingency table
R:
Implementations of Cohen’s ω?
~~~
> cohensW = function (tab) {
+ pe=prop.table(chisq.test(tab)$expected)
+ po=prop.table(tab)
+ w=sqrt(sum(((po‐pe)^2)/pe))
+ return(w[[1]])
+ }
> cohensW(aids.azt)
[1] 0.1448486
> library(pwr)
> po=prop.table(aids.azt)
> pe=prop.table(chisq.test(aids.azt)$expected)
> pwr::ES.w1(pe,po)
[1] 0.1448486
> pwr::ES.w2(prop.table(aids.azt))
[1] 0.1448486
~~~
How do we calculate cohens h? Limitations?
difference of the arcsine-transformed square roots of two proportions:
h = 2(arcsin√pa - arcsin√ pb)
pa : proportion of group a
pb : proportion of group b
only available for 2x2 tables
Implementations of Cohen’s h?
~~~
> cohensH = function (tab) {
+ pt=prop.test(tab)
+ h=2*abs(asin(sqrt(pt$estimate[1]))‐
+ asin(sqrt(pt$estimate[2])))
+ return(h[[1]])
+ }
> cohensH(aids.azt)
[1] 0.2917774
> library(pwr)
> pwr::ES.h(0.16,0.28)
[1] ‐0.292164
> pwr::ES.h(0.28,0.16)
[1] 0.292164
> sbi$cohensH(aids.azt)
[1] 0.2917774
~~~
What is Cramér’s V (φ) and how do we calculate it?
- alternative to Cohen’s w for larger contingency tables
- measure of association between two nominal variables → value between 0 and +1 (inclusive)
- based on Pearson’s chi-squared, published by Harald Cramér in 1946 (Wikipedia)
- V = √[ χ^2 / (N(t − 1)) ]
- N: number of items
- t: number of rows or columns of the contingency table, whichever is smaller
Implementation of Cramers V
~~~
> # use dg.prefix to not overwrite lsr::cramersV
> dg.cramersV = function (tab,correct=TRUE) {
+ t=min(dim(tab))
+ return(sqrt(chisq.test(
+ tab,correct=correct)$statistic/
+ (sum(tab)*(t‐1)))[[1]])
+ }
> dg.cramersV(aids.azt)
[1] 0.1422651
> sbi$cramersV(aids.azt) # is as well in sbi.R
[1] 0.1422651
~~~
Interpretation of Cohen’s d, h, w (ϕ)
Which effect size should we mostly use and why? What are its limits? Rule of thumb?
Use Cohen’s w for contingency tables, as it is similar to r and is available for larger contingency tables as well.
Limits:
.1 small effect
.3 medium effect
.5 large effect
Based on Cohen’s recommendations. But it’s a rule of thumb - what is small, medium
or large, might depend on your problem!
————————————————
r  | variance | d / h | w  | RR   | effect
————————————————
.1 | 1%       | .2    | .1 | 1.49 | small effect
.3 | 9%       | .5    | .3 | 3.45 | medium effect
.5 | 25%      | .8    | .5 | 9.00 | large effect
————————————————
https://en.wikipedia.org/wiki/Effect_size
Ferguson 2009 has other (lower) ranges.
Interpretation of Cramers V - what does the effect size depend on?
df=1 (small=.10, medium=.30, large=.50)
df=2 (small=.07, medium=.21, large=.35)
df=3 (small=.06, medium=.17, large=.29)
df=4 (small=.05, medium=.15, large=.25)
df=5 (small=.05, medium=.13, large=.22)
df is based on the minimum dimension:
3x2 table - df 1
3x3 table - df 2
Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd edition). Hillsdale, NJ: Lawrence Erlbaum Associates
R:
library(vcd) – effect sizes?
Yes, e.g. Cramér’s V and the contingency coefficient
~~~
> library(vcd)
> assocstats(aids.azt)
X^2 df P(> X^2)
Likelihood Ratio 19.799 1 8.6008e‐06
Pearson 19.638 1 9.3573e‐06
Phi‐Coefficient : 0.145
Contingency Coeff.: 0.143
Cramer’s V : 0.145
> library(lsr)
> lsr::cramersV(aids.azt)
[1] 0.1422651
> dg.cramersV(aids.azt,correct=TRUE)
[1] 0.1422651
> cramersV(aids.azt,correct=FALSE)
[1] 0.1448486
~~~
Contingency Coefficient ?
- Used for large Tables, > 5x5
- C = √ [ χ2 / ( N + χ2) ]
- Problem: can’t reach 1.0!
~~~
> contCoeff <‐ function (tab,correct=TRUE) {
+ chisq=chisq.test(tab,correct=correct)$statistic
+ c=sqrt((chisq/(sum(tab)+chisq)))
+ return(c[[1]])
+ }
> contCoeff(aids.azt)
[1] 0.1408469
> contCoeff(aids.azt,correct=FALSE)
[1] 0.1433526
~~~
How to report a Chisq-Test (2x2 table) – what must be included?
- Statement of research question
- result
- statistics
- significance
- confidence intervals
- effect size.
How to report a Chisq-Test (2x2 table)? Eg:
~~~
> ct=chisq.test(aids.azt)
> pt=prop.test(aids.azt) # confidence interval
~~~
A _____ was performed to examine the relation between AZT treatment and AIDS progress based on CD4 helper cell counts in HIV patients. The relation between these variables was ______, ______. Patients with _____ showed _____, _____ less disease progress (_____) than _____treated patients (_____), _____ = _____.
A chi-square test of independence was performed to examine the relation between AZT treatment and AIDS progress based on CD4 helper cell counts in HIV patients. The relation between these variables was significant, χ2(1,N=936) = 18.94, p<1e-05. Patients with AZT treatment showed 11.98%, CI95%[6.51,17.46] less disease progress (16%) than Placebo treated patients (27.98%), Cohen’s w = 0.145.
How could you analyse the nym dataset?
analysis of a contingency table to explore the relationship between age (categorized into quantiles) and gender.
breakdown of the steps and results:
1. Creating the cAge variable: The age data (nym$age) is divided into quartiles using cut(). The labels for these quartiles are set as “1st”, “2nd”, “3rd”, and “4th”.
2. Constructing the contingency table: The table tab is created with nym$gender as the rows and the cAge categories as the columns. The table shows the counts of individuals across different age categories and genders.
3. Proportions within rows: prop.table(tab, 1) calculates the proportions within each gender (rows). This gives insight into the distribution of age categories within each gender.
4. Chi-squared test: The chisq.test() function is applied to test the independence of gender and age category. The test returns a very small p-value (p = 9.374e-11), indicating a significant association between gender and age categories.
5. Association tests using vcd package: assocstats() provides a summary of different association measures, including:
Contingency Coeff. (0.218): Measures the strength of association.
Cramer’s V (0.223): Another measure of association strength, where values closer to 0 suggest weak association, and values closer to 1 suggest strong association.
6. Cohen’s W:
The value of Cohen’s W is 0.2229, which is similar to Cramer’s V. This indicates a small to medium effect size for the association between gender and age category.
7. Pairwise comparisons: The pairwise proportion tests (pairwise.prop.test()) evaluate differences between the proportions of different age categories. P-values are adjusted using the Holm method for the first set and left unadjusted for the second. Significant differences are found, especially between the “1st” and “4th” categories, with p-values smaller than 0.05.
8. Test for equality of proportions between the first two gender groups and the first two age categories:
A chi-squared test for proportions between the first two gender groups and the first two age categories (using prop.test()) results in a p-value of 0.01369, indicating a significant difference between the proportions.
Takeaways:
→ significant association between gender and age, as indicated by Chi-sq, assoc measures.
→ pairwise comparisons show notable differences between certain age categories.
→ effect sizes suggest small to medium strength of association between gender, age cats.
~~~
> nym=sbio$data$nym
> cAge=cut(nym$age,breaks= quantile(nym$age,
+ c(0,0.25,0.5,0.75,1)),ordered=TRUE,
+ include.lowest =TRUE)
> levels(cAge)=c("1st","2nd","3th","4th")
> table(sex=nym$gender,cAge)
cAge
sex 1st 2nd 3th 4th
Female 111 84 60 37
Male 145 175 192 196
> tab=table(sex=nym$gender,cAge)
> prop.table(tab,1)
cAge
sex 1st 2nd 3th 4th
Female 0.3801370 0.2876712 0.2054795 0.1267123
Male 0.2048023 0.2471751 0.2711864 0.2768362
> chisq.test(tab)
Pearson’s Chi‐squared test
data: tab
X‐squared = 49.674, df = 3, p‐value = 9.374e‐11
> library(vcd)
> assoc(tab,shade=TRUE)
> assocstats(tab)
X^2 df P(> X^2)
Likelihood Ratio 50.531 3 6.1582e‐11
Pearson 49.674 3 9.3741e‐11
Phi‐Coefficient : NA
Contingency Coeff.: 0.218
Cramer’s V : 0.223
> cramersV(tab)
[1] 0.2228766
> cohensW(tab)
[1] 0.2228766
> cohensW(t(tab))
[1] 0.2228766
> pairwise.prop.test(t(tab))
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.04107 ‐ ‐
3th 2.5e‐05 0.07731 ‐
4th 4.6e‐10 0.00013 0.07731
P value adjustment method: holm
> prop.test(tab[1:2,1:2])
2‐sample test for equality of proportions with continuity correction
data: tab[1:2, 1:2]
X‐squared = 6.0775, df = 1, p‐value = 0.01369
alternative hypothesis: two.sided
95 percent confidence interval:
0.02363161 0.20857993
sample estimates:
prop 1 prop 2
0.5692308 0.4531250
> pairwise.prop.test(t(tab),p.adjust='none')
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.014 ‐ ‐
3th 4.9e‐06 0.039 ‐
4th 7.6e‐11 3.3e‐05 0.039
P value adjustment method: none
~~~
Larger Contingency Tables – what’s going on here?
~~~
> nym=sbio$data$nym
> cAge=cut(nym$age,breaks= quantile(nym$age,
+ c(0,0.25,0.5,0.75,1)),ordered=TRUE,
+ include.lowest =TRUE)
> levels(cAge)=c("1st","2nd","3th","4th")
> table(sex=nym$gender,cAge)
cAge
sex 1st 2nd 3th 4th
Female 111 84 60 37
Male 145 175 192 196
> tab=table(sex=nym$gender,cAge)
> prop.table(tab,1)
cAge
sex 1st 2nd 3th 4th
Female 0.3801370 0.2876712 0.2054795 0.1267123
Male 0.2048023 0.2471751 0.2711864 0.2768362
> chisq.test(tab)
Pearson’s Chi‐squared test
data: tab
X‐squared = 49.674, df = 3, p‐value = 9.374e‐11
> library(vcd)
> assoc(tab,shade=TRUE)
> assocstats(tab)
X^2 df P(> X^2)
Likelihood Ratio 50.531 3 6.1582e‐11
Pearson 49.674 3 9.3741e‐11
Phi‐Coefficient : NA
Contingency Coeff.: 0.218
Cramer’s V : 0.223
> cramersV(tab)
[1] 0.2228766
> cohensW(tab)
[1] 0.2228766
> cohensW(t(tab))
[1] 0.2228766
> pairwise.prop.test(t(tab))
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.04107 ‐ ‐
3th 2.5e‐05 0.07731 ‐
4th 4.6e‐10 0.00013 0.07731
P value adjustment method: holm
> prop.test(tab[1:2,1:2])
2‐sample test for equality of proportions with continuity correction
data: tab[1:2, 1:2]
X‐squared = 6.0775, df = 1, p‐value = 0.01369
alternative hypothesis: two.sided
95 percent confidence interval:
0.02363161 0.20857993
sample estimates:
prop 1 prop 2
0.5692308 0.4531250
> pairwise.prop.test(t(tab),p.adjust='none')
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.014 ‐ ‐
3th 4.9e‐06 0.039 ‐
4th 7.6e‐11 3.3e‐05 0.039
P value adjustment method: none
~~~
analysis of a contingency table to explore the relationship between age (categorized into quantiles) and gender.
1. Creating the cAge variable
2. Constructing the contingency table
3. Proportions within rows
4. Chi-squared test
5. Association tests using vcd package
6. Cohen’s W
7. Pairwise comparisons
8. Test for equality of proportions between the first two gender groups and the first two age categories
Takeaways:
→ significant association between gender and age, as indicated by Chi-sq, assoc measures.
→ pairwise comparisons show notable differences between certain age categories.
→ effect sizes suggest small to medium strength of association between gender, age cats.
Which values do you need for an assocplot? Draw a vcd style assocplot from the analysis below.
Then write a report.
~~~
> nym=sbio$data$nym
> cAge=cut(nym$age,breaks= quantile(nym$age,
+ c(0,0.25,0.5,0.75,1)),ordered=TRUE,
+ include.lowest =TRUE)
> levels(cAge)=c("1st","2nd","3th","4th")
> table(sex=nym$gender,cAge)
cAge
sex 1st 2nd 3th 4th
Female 111 84 60 37
Male 145 175 192 196
> tab=table(sex=nym$gender,cAge)
> prop.table(tab,1)
cAge
sex 1st 2nd 3th 4th
Female 0.3801370 0.2876712 0.2054795 0.1267123
Male 0.2048023 0.2471751 0.2711864 0.2768362
> chisq.test(tab)
Pearson’s Chi‐squared test
data: tab
X‐squared = 49.674, df = 3, p‐value = 9.374e‐11
> chisq.test(tab)$residuals
cAge
sex 1st 2nd 3rd 4th
Female 4.1924958 0.9626932 -1.5835653 -3.7626720
Male -2.6924487 -0.6182480 1.0169762 2.4164129
> library(vcd)
> assoc(tab,shade=TRUE)
> assocstats(tab)
X^2 df P(> X^2)
Likelihood Ratio 50.531 3 6.1582e‐11
Pearson 49.674 3 9.3741e‐11
Phi‐Coefficient : NA
Contingency Coeff.: 0.218
Cramer’s V : 0.223
> cramersV(tab)
[1] 0.2228766
> cohensW(tab)
[1] 0.2228766
> cohensW(t(tab))
[1] 0.2228766
> pairwise.prop.test(t(tab))
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.04107 ‐ ‐
3th 2.5e‐05 0.07731 ‐
4th 4.6e‐10 0.00013 0.07731
P value adjustment method: holm
> prop.test(tab[1:2,1:2])
2‐sample test for equality of proportions with continuity correction
data: tab[1:2, 1:2]
X‐squared = 6.0775, df = 1, p‐value = 0.01369
alternative hypothesis: two.sided
95 percent confidence interval:
0.02363161 0.20857993
sample estimates:
prop 1 prop 2
0.5692308 0.4531250
> pairwise.prop.test(t(tab),p.adjust='none')
Pairwise comparisons using Pairwise comparison of proportions
data: t(tab)
1st 2nd 3th
2nd 0.014 ‐ ‐
3th 4.9e‐06 0.039 ‐
4th 7.6e‐11 3.3e‐05 0.039
P value adjustment method: none
~~~
See word doc or do in R
Report:
Weak report:
A chi-square test of independence was performed to examine the relation between sex and age class distributions for people running the New York marathon. There was a significant influence of the sex on the age class proportions p<0.001 between all age classes p<0.05.
weak: only p-value, no direction.
Better report:
Report: A chi-square test of independence was performed to examine the relation between sex and age class distributions for people running the New York marathon. There was a significant influence of the sex on the age class proportions, χ2(3, N=1000) = 50, p < 0.001, between all age classes p < 0.05, Cohen’s w = 0.223.
better with χ 2 value, and effect size.
Great report:
A chi-square test of independence was performed to examine the relation between sex and age class distributions for people running the New York marathon. There was a small but significant influence of the sex on the age distribution, χ²(3, N = 1000) = 50, p < 0.001, between all age classes p < 0.05, Cramér's V = 0.223.
better! with χ² value and Cramér's V.
[Could be improved even further by showing all pairwise comparisons in a table! You can use Cohen's w as well.]
Implementation: Pairwise Cohen’s w value?
~~~
> pairwise.cohensW <- function (tab) {
+   # symmetric matrix of Cohen's w for every pair of columns
+   res=matrix(0,nrow=ncol(tab),ncol=ncol(tab))
+   rownames(res)=colnames(res)=colnames(tab)
+   for (i in 1:(ncol(tab)-1)) {
+     for (j in (i+1):ncol(tab)) {
+       res[i,j]=res[j,i]=sbi$cohensW(tab[,c(i,j)])
+     }
+   }
+   return(res)
+ }
What's going on here?
~~~
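The R helper above depends on the course's `sbi$cohensW`. As a language-independent cross-check, here is a minimal Python sketch of the same idea (the names `cohens_w` and `pairwise_cohens_w` are mine), computing Cohen's w as sqrt(χ²/N) from the uncorrected Pearson statistic:

```python
from math import sqrt

def cohens_w(tab):
    """Cohen's w = sqrt(chi2 / N) for a contingency table (list of rows),
    using the uncorrected Pearson chi-squared statistic."""
    n = sum(map(sum, tab))
    rows = [sum(r) for r in tab]
    cols = [sum(c) for c in zip(*tab)]
    chi2 = sum((tab[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))
    return sqrt(chi2 / n)

def pairwise_cohens_w(tab):
    """Cohen's w for every pair of columns of a 2 x k table."""
    k = len(tab[0])
    res = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        for j in range(i + 1, k):
            sub = [[row[i], row[j]] for row in tab]  # keep only columns i and j
            res[i][j] = res[j][i] = cohens_w(sub)
    return res

tab = [[12, 16, 24],   # Healthy
       [22, 44, 43]]   # Ill
print([[round(v, 2) for v in row] for row in pairwise_cohens_w(tab)])
# → [[0.0, 0.09, 0.01], [0.09, 0.0, 0.1], [0.01, 0.1, 0.0]]
```

The off-diagonal values reproduce the 0.09 / 0.01 / 0.10 shown in the R session.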
> tab=matrix(c(12,22,16,44,24,43),nrow=2)
> tab=as.table(tab)
> tab
A B C
A 12 16 24
B 22 44 43
> rownames(tab)=c("Healthy","Ill")
> tab
A B C
Healthy 12 16 24
Ill 22 44 43
> round(pairwise.cohensW(tab),2)
A B C
A 0.00 0.09 0.01
B 0.09 0.00 0.10
C 0.01 0.10 0.00
~~~
Example: Cohen's w value
What’s going on here?
~~~
> tab=matrix(c(60,40,40,60),nrow=2) # group 1: 60/40, group 2: 40/60 (expected 50/50)
> sbi$cohensW(tab)
[1] 0.2
> tab=matrix(c(120,80,80,120),nrow=2) # same proportions, doubled counts
> sbi$cohensW(tab)
[1] 0.2
> tab=matrix(c(70,30,30,70),nrow=2) # stronger split: 70/30 vs 30/70
> sbi$cohensW(tab)
[1] 0.4
> sbi$cohensW(c(70,30),p=0.5)
[1] 0.4
~~~
One sample test (here against p=0.5)
→ We create a theoretical second group with a reverse distribution! Then the
independence table would have 4 values of 50 in each cell.
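This mirroring argument can be checked numerically; a small Python sketch (the one-sample formula w = sqrt(Σ(p_obs − p_null)²/p_null) is Cohen's standard definition):

```python
from math import sqrt

# One-sample Cohen's w against the null proportions p = (0.5, 0.5):
obs = [70, 30]
n = sum(obs)
p_null = [0.5, 0.5]
w_one = sqrt(sum((o / n - p) ** 2 / p for o, p in zip(obs, p_null)))

# Mirror the observed counts into a theoretical second group -> the
# independence table [[70, 30], [30, 70]] has expected value 50 in each cell.
tab = [obs, obs[::-1]]
N = sum(map(sum, tab))
rows = [sum(r) for r in tab]
cols = [sum(c) for c in zip(*tab)]
chi2 = sum((tab[i][j] - rows[i] * cols[j] / N) ** 2 / (rows[i] * cols[j] / N)
           for i in range(2) for j in range(2))
w_two = sqrt(chi2 / N)

print(round(w_one, 2), round(w_two, 2))  # → 0.4 0.4
```

Both routes give 0.4, matching the `sbi$cohensW` output above.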
ChatGPT:
Here’s what’s happening in your code step by step:
First matrix (tab = matrix(c(60,40,40,60), nrow=2)):
This creates a 2x2 contingency table with values:
60 40
40 60
The cohensW() function calculates Cohen's W, a measure of effect size for the association between the variables in the table. The result is 0.2, a small effect size by Cohen's conventions (w = 0.1 small, 0.3 medium, 0.5 large).
Second matrix (tab = matrix(c(120,80,80,120), nrow=2)):
This creates a similar table but with doubled values:
120 80
80 120
Despite the table values being larger, Cohen’s W remains the same at 0.2. Cohen’s W is based on the relative proportions, so multiplying the counts by a constant doesn’t change the effect size.
Third matrix (tab = matrix(c(70,30,30,70), nrow=2)):
Here the table is:
70 30
30 70
The result for Cohen’s W is 0.4, indicating a larger effect size compared to the previous two matrices.
sbi$cohensW(c(70,30), p=0.5):
In this line, you’re passing just two numbers (70 and 30), which seems like an attempt to calculate Cohen’s W from a single comparison (possibly from a 2-group comparison). By providing p=0.5, you’re specifying a 50-50 probability distribution (e.g., under the null hypothesis of no effect). This also results in 0.4, matching the Cohen’s W from the third table.
Key Observations:
Effect size calculation (Cohen’s W):
Cohen’s W is sensitive to the relative proportions between the two groups (rows and columns). When the numbers are similar and the proportions are close, Cohen’s W will be smaller (around 0.2 or less, indicating a small effect). When the proportions differ more (as with 70/30 vs 30/70), Cohen’s W increases, indicating a stronger association.
Why the same effect size for different totals?:
The first two matrices (60/40 vs 40/60 and 120/80 vs 80/120) are proportional to each other; the only difference is the scale of the numbers. Cohen's W, being based on relative proportions rather than raw counts, is therefore identical for both.
What is Relative Risk?
- Another effect size. ___ is ____ times more likely to ___…
- RR=P1/P2
- 16%/28% = 0.57 → subjects with AZT were 0.57 times as likely as placebo subjects to have disease progression
- 28%/16% = 1.75 → subjects with placebo are 1.75 times as likely as AZT subjects to get disease progression
Be careful about the direction!
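In numbers, a minimal Python sketch of the two directions (proportions taken from the slide):

```python
# Relative risk: RR = P1 / P2 -- the direction matters!
p_azt, p_placebo = 0.16, 0.28   # disease-progression proportions from the slide

rr_azt = p_azt / p_placebo      # AZT subjects relative to placebo subjects
rr_placebo = p_placebo / p_azt  # placebo subjects relative to AZT subjects
print(round(rr_azt, 2), round(rr_placebo, 2))  # → 0.57 1.75
```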
Misunderstandings about RR?
How useful is a vaccine with a relative risk of 0.5?
It depends! If we see an RR, we also need to know the number needed to treat [to avoid one case] (NNT)!
Case 1: a very rare disease:
* prevalence of 2 cases in 1 million
* 1 million vaccinations prevent one case → NNT = 1 million
Case 2: more common disease:
* prevalence of 20 cases in 100
* 1 million vaccinations prevent 100,000 cases → NNT = 10
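Both cases follow from NNT = 1/ARR, where ARR is the absolute risk reduction. A small Python sketch (the helper name `nnt` is mine):

```python
def nnt(baseline_risk, rr):
    """Number needed to treat = 1 / absolute risk reduction (ARR).
    ARR = baseline risk minus risk under treatment (= baseline * RR)."""
    arr = baseline_risk - baseline_risk * rr
    return 1 / arr

print(round(nnt(2 / 1_000_000, 0.5)))  # rare disease   → 1000000
print(round(nnt(20 / 100, 0.5)))       # common disease → 10
```

Same RR = 0.5 in both cases, yet the practical value of the vaccine differs by five orders of magnitude.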
What is the Odds / Odds ratio?
Odds:
* odds = event did occur / event did not occur
* odds = probability / (1 – probability)
* ranges from 0 to Inf
* probability of 0.5 == odds of 1.0
* probability of 0.33 == odds of ~0.5
* probability of 0.75 == odds of 3
Odds ratio:
* again from a 2x2 contingency table
* odds1 / odds2 = odds-ratio
* 0.19 (AZT) / 0.39 (Placebo) = 0.49
* OR = O1 / O2
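A few of these conversions checked in Python (the helper name `odds` is mine):

```python
def odds(p):
    """Convert a probability into odds = p / (1 - p)."""
    return p / (1 - p)

print(odds(0.5))              # → 1.0
print(round(odds(0.33), 2))   # → 0.49 (the slide rounds this to 0.5)
print(odds(0.75))             # → 3.0
# Odds ratio from the slide's AZT example:
print(round(0.19 / 0.39, 2))  # → 0.49
```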
Cf. chi-squared vs. Fisher?
- chisq.test works with proportions; fisher.test works with the odds ratio (which can be > 1)
- use fisher.test if an expected count in the 2x2 contingency table is <= 5
How does fisher.test work?
- exact test: enumerates all tables with the observed margins → slower than prop.test
- uses the odds ratio
E.g.:
~~~
> fisher.test(aids.azt)
Fisher’s Exact Test for Count Data
data: aids.azt
p‐value = 9.24e‐06
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.3512693 0.6818650
sample estimates:
odds ratio
0.4905877
~~~
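The enumeration idea can be sketched in a few lines of Python. The `aids.azt` counts are not shown here, so the demo uses Fisher's classic tea-tasting table [[3, 1], [1, 3]]; `fisher_exact_2x2` is my own helper, following the "sum all tables at most as probable as the observed one" rule that fisher.test uses for its two-sided p-value:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    enumerate every table with the observed margins and sum the
    hypergeometric probabilities of all tables at most as likely as
    the observed one."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = comb(n, c1)

    def prob(x):  # probability that cell (1,1) equals x, margins fixed
        return comb(r1, x) * comb(n - r1, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

# Fisher's tea-tasting experiment: 3 of 4 cups classified correctly per row.
print(round(fisher_exact_2x2(3, 1, 1, 3), 3))  # → 0.486
```

The enumeration over all admissible tables is what makes the exact test slower than the asymptotic prop.test.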
Which test for proportions? Compare:
binom.test prop.test fisher.test chisq.test
—————————————————————————————–
counts ?
[CHEAT SHEET]
---------------------------------------------------------------
           | binom.test | prop.test | fisher.test | chisq.test
---------------------------------------------------------------
counts     | >= 0       | >= 5!     | >= 0        | >= 5!
groups     | 1          | 1-2       | 2           | >= 2
speed      | fast       | fast      | slow        | fast
expected   | No         | No        | No          | Yes
conf.int   | prop       | prop      | odds        | none
---------------------------------------------------------------
Which test for proportions? Compare:
binom.test prop.test fisher.test chisq.test
—————————————————————————————–
Groups?
---------------------------------------------------------------
           | binom.test | prop.test | fisher.test | chisq.test
---------------------------------------------------------------
counts     | >= 0       | >= 5!     | >= 0        | >= 5!
groups     | 1          | 1-2       | 2           | >= 2
speed      | fast       | fast      | slow        | fast
expected   | No         | No        | No          | Yes
conf.int   | prop       | prop      | odds        | none
---------------------------------------------------------------
Which test for proportions? Compare:
binom.test prop.test fisher.test chisq.test
—————————————————————————————–
speed ?
---------------------------------------------------------------
           | binom.test | prop.test | fisher.test | chisq.test
---------------------------------------------------------------
counts     | >= 0       | >= 5!     | >= 0        | >= 5!
groups     | 1          | 1-2       | 2           | >= 2
speed      | fast       | fast      | slow        | fast
expected   | No         | No        | No          | Yes
conf.int   | prop       | prop      | odds        | none
---------------------------------------------------------------
Which test for proportions? Compare:
binom.test prop.test fisher.test chisq.test
—————————————————————————————–
expected ?
---------------------------------------------------------------
           | binom.test | prop.test | fisher.test | chisq.test
---------------------------------------------------------------
counts     | >= 0       | >= 5!     | >= 0        | >= 5!
groups     | 1          | 1-2       | 2           | >= 2
speed      | fast       | fast      | slow        | fast
expected   | No         | No        | No          | Yes
conf.int   | prop       | prop      | odds        | none
---------------------------------------------------------------
Which test for proportions? Compare:
binom.test prop.test fisher.test chisq.test
—————————————————————————————–
CI ?
---------------------------------------------------------------
           | binom.test | prop.test | fisher.test | chisq.test
---------------------------------------------------------------
counts     | >= 0       | >= 5!     | >= 0        | >= 5!
groups     | 1          | 1-2       | 2           | >= 2
speed      | fast       | fast      | slow        | fast
expected   | No         | No        | No          | Yes
conf.int   | prop       | prop      | odds        | none
---------------------------------------------------------------
Six points for statistics of proportions and frequencies?
Descriptive:
1. center (mode)
2. spread (proportions)
3. visualization
Inferential:
4. testing (larger tables posthoc test: pairwise.prop.test)
5. effect size calculation, cohensW
6. report
Simpson’s paradox (Simpson 1951; Blyth 1972) is a phenomenon in probability and statistics in which a trend appears in several groups of data but ______________ when the groups are __________.
This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when __________________ are appropriately addressed in the statistical modeling (e.g., through ______________). (Wikipedia 2023)
Simpson’s paradox (Simpson 1951; Blyth 1972) is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined.
This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling (e.g., through cluster analysis). (Wikipedia 2023)
What is Simpson's Paradox?
problem of unbalanced designs or third variable (confounder) effects
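A numeric illustration, using the kidney-stone data (Charig et al. 1986) that is commonly quoted for this paradox:

```python
# Kidney-stone data: treatment A is better within BOTH subgroups, yet worse
# overall, because A was given mostly to the harder (large-stone) cases --
# an unbalanced design with stone size as confounder.
a_small, a_large = (81, 87), (192, 263)   # (successes, patients)
b_small, b_large = (234, 270), (55, 80)

for name, (s, n) in [("A small", a_small), ("B small", b_small),
                     ("A large", a_large), ("B large", b_large)]:
    print(name, round(s / n, 3))  # A wins within each stone size

a_all = (a_small[0] + a_large[0]) / (a_small[1] + a_large[1])  # 273/350
b_all = (b_small[0] + b_large[0]) / (b_small[1] + b_large[1])  # 289/350
print(round(a_all, 2), round(b_all, 2))  # → 0.78 0.83  (the trend reverses)
```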
Calculations
Approx: Rules of three, five and seven …?
- if the numerator is zero (zero observations) rule of three: CI95% [0, 3/n]
- if the numerator is one (one observation) rule of five: CI95% [0, 5/n]
- if the numerator is two (two observations) rule of seven: CI95% [0, 7/n]
Example: 1 out of 40 persons died during therapy; by the rule of five the upper limit of the 95% CI for the population proportion is 5/40:
~~~
> 5/40
[1] 0.125
> prop.test(1,40)
1-sample proportions test with continuity correction
data: 1 out of 40, null probability 0.5
X-squared = 34.225, df = 1, p-value = 4.909e-09
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.001306388 0.147342139
sample estimates:
    p
0.025
~~~
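Where does the "3" in the rule of three come from? With zero events in n trials, the exact one-sided 95% upper bound solves (1 − p)^n = 0.05, and −ln(0.05) ≈ 3. A quick Python check (rule of three only; the 5 and 7 come from the analogous Poisson bounds for one and two events):

```python
from math import log

# Rule of three, derived: with 0 events in n trials, the one-sided 95% upper
# bound p* solves (1 - p*)^n = 0.05, i.e. p* = 1 - 0.05**(1/n).
# Because -ln(0.05) ~ 3.0, p* is approximately 3/n.
n = 40
exact = 1 - 0.05 ** (1 / n)
print(round(exact, 3), round(3 / n, 3))  # → 0.072 0.075
print(round(-log(0.05), 2))              # → 3.0
```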
________________________________________
ChatGPT:
This explanation is about approximate confidence intervals (CIs) for rare events based on small sample sizes, using the rules of three, five, and seven as rough estimates. Let's break it down:
The Rules of Three, Five, and Seven
These rules provide a quick way to estimate the upper bound of a 95% confidence interval (CI) for a proportion when the number of observed cases (numerator) is very small.
- Rule of Three (0 observations): If you observe zero events in a sample of size n, the upper bound of the 95% confidence interval is 3/n.
Example: If no patients died in a study of 40 people, the estimated upper limit for the death rate is 3/40 = 0.075 (or 7.5%).
- Rule of Five (1 observation): If you observe one event in n, the upper bound is 5/n.
Example: If 1 person out of 40 dies, the upper limit of the CI is 5/40 = 0.125 (or 12.5%).
- Rule of Seven (2 observations): If you observe two events in n, the upper bound is 7/n.
Example: If 2 out of 40 die, the estimated upper limit is 7/40 = 0.175 (or 17.5%).
These are simple approximations often used when doing quick back-of-the-envelope calculations for rare events.
________________________________________
Example: 1 out of 40 persons died during therapy
* The rule of five says the upper bound of the 95% CI is: 5/40 = 0.125 (12.5%)
* The actual proportion test (prop.test) gives a 95% confidence interval of [0.0013, 0.1473], which is close to the estimate from the rule of five.
________________________________________
These rules are derived from the Poisson approximation to the binomial distribution, which works well for small numbers of rare events. They’re commonly used in biostatistics, epidemiology, and risk assessment when you have very few observations and need a quick estimate.
(QUIZ 2)
      Girls  Boys
-----------------
Yes       3     2
No       14     9
-----------------
Which of the following tests is appropriate to check for a significant difference between the groups:
* chisq.test
* fisher.test
* ks.test
* prop.test
fisher.test
very few expected counts in the "Yes" row (< 5) → fisher.test is better than chisq.test
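The expected counts behind that judgment are easy to compute by hand (E[i][j] = row_i × col_j / N); a short Python check:

```python
# Expected counts decide chisq.test vs fisher.test: E[i][j] = row_i * col_j / N
tab = [[3, 2],    # Yes row (girls, boys)
       [14, 9]]   # No row
N = sum(map(sum, tab))
rows = [sum(r) for r in tab]
cols = [sum(c) for c in zip(*tab)]
expected = [[rows[i] * cols[j] / N for j in range(2)] for i in range(2)]
print([[round(e, 2) for e in row] for row in expected])
# → [[3.04, 1.96], [13.96, 9.04]] -- two expected counts below 5, prefer fisher.test
```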
(QUIZ 2)
The effect size reported by the Fisher test is the ______. This effect size measure and ______ are only usable for 2x2 contingency tables, whereas ______ is applicable to larger contingency tables as well. The ______ can sometimes be misleading in medicine. It is often more important to know the ______.
The effect size reported by the Fisher test is the odds ratio. This effect size measure and Cohen's h are only usable for 2x2 contingency tables, whereas Cohen's w is applicable to larger contingency tables as well. The relative risk can sometimes be misleading in medicine. It is often more important to know the number needed to treat (NNT).