29‐34 Biostatistics Flashcards
What is a population?
All individuals • Not to be confused with the “study population”, which is simply the final group of individuals selected for a study
What is a sample?
A subset or portion of the full population (“representatives”) • Useful when studying the complete population is not feasible • Random processes commonly utilized to draw sample
What are statistical analyses?
Comparisons made in relation to Null Hypothesis
What inferences will be made about the results?
Inferences will be made about the sample‐derived measurements and their comparisons (in relation to the Null Hypothesis) • Inferences will also be made to the full population of similar subjects (generalizability)
On what variables will data be collected?
Dependent variable(s) [outcome variables] • Independent variables
What is a null hypothesis?
A research perspective which states there will be no (true) difference between the groups being compared • Most conservative and commonly utilized
What are the various statistical perspectives that can be taken by the researcher?
• Superiority • Noninferiority • Equivalency
What is an alternate hypothesis?
A research perspective which states there will be a (true) difference between the groups being compared
What are 2 key attributes of data measurement?
1. Magnitude (or Dimensionality)
2. Consistency of scale (or Fixed Interval): equal, measurable spacing between units
What is another attribute of data measurement?
3. Rational/Absolute Zero • Each attribute can be assessed with a “Yes” or “No” response
What are the 3 categories for data (variables) measurement that ultimately determine the statistical test?
Nominal, Ordinal, Interval/Ratio
Define nominal
o Dichotomous/Binary; Non‐Ranked Named Categories
o No Magnitude / No Consistency of scale / No Rational Zero
o Nominal variables are simply labeled variables without quantitative characteristics
True or false, ALL data that is categorized into two categories is instantly nominal.
True
Define ordinal
Ranked Categories; Non‐Equal‐Distance
o Yes Magnitude / No Consistency of scale / No Rational Zero
Define interval/ratio
Order; Magnitude; Equal Intervals‐of‐scale (units)
o Yes Magnitude / Yes Consistency of scale / No or Yes Rational Zero (No‐Interval; Yes‐Ratio)
o Examples: Number of Living Siblings & Personal Age (in years)
True or false, after data is collected, we can appropriately go down in specificity/detail of data measurement (levels), but never up.
True
Does the ratio level have an absolute zero?
yes
In interval data measurement, what is meaningful?
distance
In what level can attributes be ordered?
ordinal
Which data measurement level is the weakest?
nominal
Which level are the attributes only named?
nominal
Which levels are discrete?
nominal and ordinal
Which levels are continuous?
interval/ratio
True or false, all statistical tests are selected based on level of data being compared.
true
What are the measures of central tendency & dispersion?
Mode / Median / Mean • Outliers • Minimum / Maximum / Range • Interquartile Range (IQR)
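A minimal sketch of the measures listed on this card, using Python's standard `statistics` module. The nine values are hypothetical and are not taken from the flashcards; the quartiles use the module's default ("exclusive") method, which is only one of several conventions.

```python
import statistics

# Hypothetical sample of nine measurements (illustration only).
values = [2, 3, 3, 4, 5, 6, 7, 9, 15]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

minimum, maximum = min(values), max(values)
value_range = maximum - minimum

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                       # interquartile range

print(mean, median, mode, value_range, iqr)
```

Here the single large value (15) pulls the mean above the median, while the median and IQR are much less affected by it.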
Define variance
The average of the squared differences between each individual measurement value and the group’s mean
What is standard deviation?
square root of variance value (restores units of mean)
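The two definitions above can be sketched step by step. The five values are hypothetical, and the sketch assumes the sample form of the variance (dividing by n - 1), which is what `statistics.variance` also computes.

```python
import math
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical measurements

mean = sum(values) / len(values)

# Squared difference between each measurement and the group mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum(squared_deviations) / (len(values) - 1)

# Standard deviation: square root of the variance, back in the units of the mean.
sd = math.sqrt(variance)

print(variance, sd)
print(statistics.variance(values), statistics.stdev(values))  # same results
```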
What does the graphical representation depict?
SHAPE of data
What is normal distribution?
Symmetrical • When a dataset is normally‐distributed the following values (PARAMETERS) are equal/near equal: Mean / Median / Mode • Equal dispersion of curve “tails” to both sides of the mean, median, & mode
What are stats tests useful for normally‐distributed data called?
Parametric tests
What is a positively skewed distribution?
Asymmetrical distribution with one “tail” longer than another • A distribution is skewed anytime the median differs from the mean • When mean is higher than median, “positive skew” • Tail pointing to the right: positive skew (skew to right): mean > median
What is a negatively skewed distribution?
Asymmetrical distribution with one “tail” longer than another • A distribution is skewed anytime the median differs from the mean • When mean is lower than median, “negative skew” • Tail pointing to the left: negative skew (skew to left): mean < median
Define skewness
A measure of the asymmetry of a distribution o The perfectly‐normal distribution is symmetric and has a skewness value of 0
What is kurtosis?
A measure of the extent to which observations cluster around the mean • For a normal distribution, the value of the (excess) kurtosis statistic is 0 • Positive kurtosis: more clustering • Negative kurtosis: less clustering
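Skewness and kurtosis, as defined on the two cards above, can be sketched from the central moments of the data. This is a simplified illustration: it uses plain population-moment formulas, whereas statistical packages usually apply small-sample corrections, so their output may differ slightly. The function names and datasets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """0 for a symmetric distribution, > 0 for a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # one long tail to the right

print(skewness(symmetric))       # zero: no asymmetry
print(skewness(right_tailed))    # positive: skewed to the right
print(excess_kurtosis(symmetric))
```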
How do you handle interval data not normally‐distributed?
o Use a statistical test that does not require the data to be normally‐distributed (non‐parametric tests), or
o Transform data to a standardized value (z‐score or log) hoping the transformation allows the data to be normally‐distributed
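A rough sketch of the two transformations named above, on hypothetical right-skewed values. Note one limitation of the sketch's own arithmetic: a z-score only re-centres and re-scales the data (mean 0, SD 1) without changing the shape of the distribution, while a log transform compresses the long right tail.

```python
import math
import statistics

values = [1, 2, 4, 8, 16, 32, 64]  # hypothetical right-skewed measurements

mean = statistics.mean(values)
sd = statistics.stdev(values)

# z-score: distance from the mean expressed in standard deviations.
z_scores = [(x - mean) / sd for x in values]

# Natural-log transform (only defined for positive values).
log_values = [math.log(x) for x in values]

print(statistics.mean(values), statistics.median(values))          # mean > median
print(statistics.mean(log_values), statistics.median(log_values))  # now equal
```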
What are the three required assumptions of interval data?
1. Normally‐distributed
2. Equal variances
o Multiple tests available to assess for equal variances between groups
3. Randomly‐derived & Independent
True or false, ALWAYS run Descriptive Statistics & Graphs for not normally-distributed interval data.
True
Define Power (1‐β)
The ability of a study design, its methodology, and the selected test statistic to detect a true difference if one truly exists between group‐comparisons, and therefore… The level of accuracy in correctly accepting/rejecting the Null Hypothesis (analogous to Sensitivity in screenings)
True or false, the larger the sample size, the greater the likelihood (ability) of detecting a difference if one truly exists.
true
Sample Size Determination
1. Minimum difference between groups deemed significant
o The smaller the difference between groups necessary to be considered “significant” (important), the greater the number needed (“N”)
2. Expected variation of measurement (known or estimated)
3. Alpha (Type 1) & Beta (Type 2) Error Rates (Power)
Add in anticipated drop‐outs or losses to follow‐up
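The ingredients listed above can be combined in a short sketch. It assumes one particular setting that the flashcards do not specify: two equal-sized independent groups, a comparison of means, a two-sided alpha, and the usual normal-approximation formula n = 2 × ((z_alpha + z_beta) × SD / difference)². The function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(min_difference, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate sample size per group for comparing two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type 1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    # Inflate for the anticipated proportion of drop-outs, then round up.
    return math.ceil(n / (1 - dropout))

print(n_per_group(min_difference=5, sd=10))               # 63
print(n_per_group(min_difference=2.5, sd=10))             # smaller difference, larger N
print(n_per_group(min_difference=5, sd=10, dropout=0.1))  # drop-outs added in
```

Halving the minimum difference roughly quadruples the required N, which is the pattern described in item 1.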
Two Basic Questions for Stats
1. What is the single measurement value most likely to represent the true (yet unknown) difference or relationship between the groups being compared, and what is the probability the difference has occurred by chance?
A. Addressed by the p value derived from a statistical test
2. What is the plausible range of possible difference or relationship within which we believe the true difference or relationship may lie?
A. Addressed by the confidence interval (CI)
Define p value.
Statistical tests determine possible differences or relationships between variables
1. A test statistic value is calculated, then
2. The test statistic value is compared to the appropriate table of probabilities for that test, then
3. A probability (p) value is obtained, based on the probability of observing, due to chance alone, a test statistic value as extreme or more extreme than actually observed if the groups were similar (not different)
The significance threshold (alpha) that the p value is compared against is selected by investigators before the study starts (a priori)
What is statistical significance?
If the p value is lower than the pre‐selected a priori value (customarily 5% (0.05)), then we say the result is statistically significant o Based on an acceptably‐low probability (less than 5%) that the value of the test statistic could be as large as it is by chance alone if the groups were similar
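The sequence "test statistic, then probability, then comparison with the a priori threshold" can be sketched as follows. The sketch assumes a test statistic that follows the standard normal distribution (a z statistic) and a two-sided test; other tests use other reference distributions. The observed value 2.5 is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a z statistic at least this extreme if the groups were similar."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05       # threshold chosen a priori
z_observed = 2.5   # hypothetical test statistic

p = two_sided_p(z_observed)
significant = p < alpha

print(round(p, 4), significant)
```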
What is a type I error?
Rejecting the Null Hypothesis when it is actually TRUE, and you should have accepted it! There really is no true difference between the groups being compared but you (in error) reject the Null Hypothesis thereby ultimately stating that you believe there is a difference between groups (when there really is NOT!) • Analogous to a false positive in medical screenings
What is a type II error?
Not rejecting the Null Hypothesis when it is actually FALSE, and you should have rejected it! There really IS a true difference between the groups being compared but you (in error) do NOT reject the Null Hypothesis thereby ultimately stating that you believe there is no difference between groups (when there really IS!) • Analogous to a false negative in medical screenings
What are the possible interpretations of a pre‐set (a priori) p value?
o The probability of making a Type 1 error if the Null Hypothesis is rejected
o The probability of erroneously claiming a difference between groups when one does not really exist
o The probability of the outcome of the group’s differences occurring by chance
o The probability of obtaining group differences as great or greater if the groups were actually the same/equal
o The probability of obtaining a test statistic as high/higher if the groups were actually the same/equal
Describe confidence intervals.
o Most common selections are 90%, 95%, or 99%
o CIs (a high and a low value) are calculated at an a priori percentage of confidence that the real (yet unknown) difference or relationship resides within them
o Based on: variation in sample (V/SD) and sample size (N)
True or false, journals are moving away from solely reporting p values; or showing them at all.
true
True or false, comparisons of groups generate only a single‐point estimate of the “true” yet unknown difference (0) or relationship (1) between groups.
true
Describe the interpretation of a 95% CI.
We are 95% confident that the “true” difference (0) or relationship (1) between the groups is contained within the confidence interval range.
Describe the interpretation of a 95% CI without a p value.
If the CI crosses 1.0 (for RATIOS: OR/RR/HR) or 0.0 (for other comparisons, e.g., interval variables) = Not Significant (p>0.05)
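A minimal sketch of reading significance from a confidence interval. It assumes a difference in means between two independent groups and a large-sample normal approximation for the interval; the function names and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def ci_for_difference(mean_a, mean_b, sd_a, sd_b, n_a, n_b, level=0.95):
    """Approximate CI for mean_a - mean_b, built from sample variation and sample size."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se

def excludes_null(low, high, null_value):
    """True when the interval does not cross the no-difference value."""
    return not (low <= null_value <= high)

# Difference in means: the no-difference value is 0.0.
low, high = ci_for_difference(7.2, 6.8, 1.0, 1.0, 100, 100)
print(round(low, 3), round(high, 3), excludes_null(low, high, 0.0))

# Hypothetical odds ratio CI of 0.8 to 1.4: the no-difference value is 1.0.
print(excludes_null(0.8, 1.4, 1.0))  # False, so not significant
```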
What is the depicted graph called?

forest plot
When do you ask the following question?
Does “statistical” significance confer meaningful, “clinical” significance?
ALWAYS ask this question when reviewing the findings of a study
What are the 4 KEY QUESTIONS to Selecting the Correct Statistical Test?
- What TYPE OF DATA is being collected/evaluated?
- What TYPE OF COMPARISON/ASSESSMENT is desired?
- *HOW MANY GROUPS are being compared?
- *Is the data INDEPENDENT or RELATED (PAIRED)?
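For group comparisons, the four questions can be read as keys into a lookup table. The sketch below uses the test names these flashcards give and deliberately leaves out correlation, survival, regression, confounder-adjusted tests, and the small-expected-count case; `TESTS` and `select_test` are hypothetical names.

```python
# Keyed by (data level, independent vs. paired); each value holds the test
# for 2 groups followed by the test for 3 or more groups.
TESTS = {
    ("nominal", "independent"): ("Chi-square test", "Chi-square test of Independence"),
    ("nominal", "paired"): ("McNemar test", "Cochran test"),
    ("ordinal", "independent"): ("Mann-Whitney test", "Kruskal-Wallis test"),
    ("ordinal", "paired"): ("Wilcoxon Signed Rank test", "Friedman test"),
    ("interval", "independent"): ("Student t-test", "ANOVA"),
    ("interval", "paired"): ("Paired t-test", "Repeated Measures ANOVA"),
}

def select_test(data_level, n_groups, paired):
    """Return the group-comparison test for the given answers to the key questions."""
    if n_groups < 2:
        raise ValueError("a group comparison needs at least 2 groups")
    relation = "paired" if paired else "independent"
    two_groups, three_or_more = TESTS[(data_level, relation)]
    return two_groups if n_groups == 2 else three_or_more

print(select_test("interval", 2, paired=True))   # Paired t-test
print(select_test("ordinal", 3, paired=False))   # Kruskal-Wallis test
```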
What is an example of an ordinal data measurement?
pain scale with faces or numbers
What are accompanying questions to the 4 key questions?
Does the data have MAGNITUDE? (yes/no)
Does the data have a fixed, measurable INTERVAL along the entire scale? (yes/no)
Is data from the same (paired) or different groups (independent)?
Describe the types of correlation tests.
– Nominal Correlation test = Contingency Coefficient
– Ordinal Correlation test = Spearman Correlation
– Interval Correlation test = Pearson Correlation
– p>0.05 for a Pearson Correlation just means there is no linear correlation; there may still be non‐linear correlations present!
– All Correlations can be run as a “partial correlation” to control for confounding
Describe correlation (r).
Provides a quantitative measure of the strength & direction of a relationship between variables
Values range from ‐1.0 to +1.0
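A small sketch of the Pearson correlation coefficient, written out by hand so the strength-and-direction idea is visible. The function name and the data are hypothetical. The last example shows a perfect non-linear (U-shaped) relationship that still gives r = 0, because r only measures linear association.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to the range -1..+1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]       # perfect positive linear relationship
falling = [-v for v in x]             # perfect negative linear relationship
u_shaped = [(v - 3) ** 2 for v in x]  # strong relationship, but not linear

print(pearson_r(x, rising), pearson_r(x, falling), pearson_r(x, u_shaped))
```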
Describe survival tests.
Compares the proportion of, or time‐to, event occurrences between groups • Commonly represented by a Kaplan‐Meier curve
Describe the types of survival tests.
– Nominal Survival test = Log‐Rank test
– Ordinal Survival test = Cox‐Proportional Hazards test
– Interval Survival test = Kaplan‐Meier test
All can be represented by a Kaplan‐Meier curve
What are regressions?
o Provide a measure of the relationship between variables by allowing the prediction about the dependent, or outcome, variable (DV) knowing the value/category of independent variables (IV’s)
o Also able to calculate OR for a Measure of Association
Describe the types of regression.
– Nominal Regression test = Logistic Regression
– Ordinal Regression test = Multinomial Logistic Regression
– Interval Regression test = Linear Regression
What test is used for nominal data involving 2 groups of independent data?
(Pearson’s) Chi‐square test (χ²)
What test is used for nominal data involving ≥3 groups of independent data?
Chi‐square test of Independence (χ²)
What are the assumptions of the Chi‐square test?
- Usual chi‐square (binomial) distribution for nominal‐type data
- No cell with expected count of <5
What do the (Pearson’s) Chi‐square test and Chi‐square test of Independence compare?
Both tests compare group proportions and whether they differ from those expected by chance
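A sketch of the chi-square calculation for a 2×2 table: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over the cells. It assumes a 2×2 table (1 degree of freedom) and applies no continuity correction; the function name and counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Return (chi-square statistic, p value, expected counts) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    # Count expected in each cell by chance alone.
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, chi-square is a squared standard normal value.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, expected

# Rows: two treatment groups; columns: event yes / no (hypothetical counts).
chi2, p, expected = chi_square_2x2([[20, 30], [30, 20]])
smallest_expected = min(min(row) for row in expected)

print(chi2, round(p, 4), smallest_expected)
```

Checking `smallest_expected` against 5 mirrors the assumption card above.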
What test is used for nominal data involving ≥2 Groups with EXPECTED cell count of <5?
Fisher’s Exact test
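When expected counts are small, the exact probability of each possible 2×2 table can be computed directly. The sketch below assumes a 2×2 table with fixed margins and one common two-sided convention (summing every table that is no more probable than the observed one); the function name and counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided exact p value for a 2x2 table with fixed row and column totals."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

p_exact = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_exact, 4))
```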
Which of the following tests would be most appropriate if the researchers wished to compare the within‐subjects HgbA1c from baseline to end‐of‐study (assume normal distribution & equal variances) for inhaled technosphere insulin compared to subcutaneous regular human insulin?
A. ANOVA
B. Paired t‐test
C. Wilcoxon signed rank test
D. Kendall test
E. Student‐Newman‐Keuls test
B. Paired t test
Which of the following tests would be most appropriate if the researchers wished to compare the mean blood sugar between treatment groups (assume normal distribution & equal variances) for inhaled technosphere insulin compared to subcutaneous regular human insulin?
A. Cochran test
B. Fisher’s exact test
C. Kruskal‐Wallis test
D. Student t‐test
E. Mann‐Whitney test
D. Student t test
Which of the following tests would be most appropriate if the researchers wished to compare, between the 2 treatment groups, the number of days the patient was on therapy before they had a VTE recurrence?
A. ANOVA
B. Chi‐square test
C. Kruskal‐Wallis test
D. Multinomial logistic regression
E. Friedman test
A. ANOVA
Which of the following tests would be most appropriate if the researchers wished to compare the proportion of patients in each of the 2 treatment groups who developed (or didn’t) a recurrent VTE?
A. ANOVA
B. Chi‐square test
C. Kruskal‐Wallis test
D. Multinomial logistic regression
E. Friedman test
B. Chi-square test
If the researchers wished to assess for differences in the time‐to‐event (survival); the event being diagnosis of depression & onset of suicidal ideations, which of the following tests would be most appropriate (assume the data was not normally distributed)?
A. ANOVA
B. Cox proportional hazards test
C. Kaplan‐Meier product‐limit estimate
D. Multinomial logistic regression
E. Friedman test
B. Cox proportional hazards test
Researchers want to conduct a study to identify early predictors of which young children with ADHD are at greatest risk for depression and suicide ideations, judged as present/absent. They performed 7 different psychometric assessments of depression and suicidal behavior. Which of the following tests would be most appropriate for the researchers’ primary purpose?
A. Student t‐test
B. Linear regression
C. ANOVA
D. Logistic regression
E. McNemar test
D. Logistic regression
What is a Validation/Assessment Committee?
o Kappa statistic – agreement between evaluators (consistency of “decisions”, “determinations”)
o Kappa Interpretation: +1 = The observers perfectly “classify” everyone exactly the same way
0 = There is no relationship at all between the observers’ “classifications”, above the agreement that would be expected by chance
‐1 = The observers “classify” everyone exactly the opposite of each other
• Kappa (K) value can be + or ‐; + = good agreement; ‐ = poor agreement
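The three reference points above (+1, 0, ‐1) can be reproduced with a short sketch of Cohen's kappa for two evaluators, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name and the agreement tables are hypothetical.

```python
def cohen_kappa(table):
    """Kappa from a square table: rows = evaluator A's class, columns = evaluator B's."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Proportion of subjects both evaluators classified the same way.
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from each evaluator's marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

print(cohen_kappa([[50, 0], [0, 50]]))    # identical classifications
print(cohen_kappa([[25, 25], [25, 25]]))  # agreement no better than chance
print(cohen_kappa([[0, 50], [50, 0]]))    # exactly opposite classifications
```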
List and describe the interval data post‐hoc tests for 3 or more Group Comparisons.
o Student‐Newman‐Keuls test
Makes all possible pairwise comparisons
All groups must be equal in size
o Dunnett test
Compares each group pairwise against a single control
All groups must be equal in size
o Dunn test
Makes all possible pairwise comparisons
Useful when the groups are not all of equal size
o Tukey or Scheffe tests
Make all possible pairwise comparisons
All groups must be equal in size
- Tukey test is slightly more conservative than the Student‐Newman‐Keuls test
- Scheffe test is less affected by violations in normality and homogeneity of variances; most conservative
o Bonferroni correction
Adjusts the p value for # of comparisons being made
• Very conservative
List and describe the interval data tests for ≥3 groups of paired/related data.
o Repeated Measures ANOVA (1 DV)
Compares the means of all groups (along with intra‐ and inter‐ group variations) of related data against a single DV
o Repeated Measures MANOVA (≥2 DVs)
Compares the means of all groups (along with intra‐ and inter‐ group variations) of related data against multiple DV’s
• If 3+ group comparison significant, must perform a post‐hoc test to determine where differences are…
List and describe the interval data tests for ≥3 Groups of Paired/Related Data w/ Confounders.
o Repeated Measures ANCOVA
Compares the means of all groups (along with intra‐ and inter‐group variations) against a single DV while also controlling for the co‐variance of confounders
o Repeated Measures MANCOVA (≥2 DVs)
Compares the means of all groups (along with intra‐ and inter‐group variations) against multiple DV’s while also controlling for the co‐variance of confounders
List and describe the interval data tests for 2 Groups of Paired/Related Data.
o Paired t‐test
Compares the mean values between groups that are related
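A sketch of the paired t statistic: because the two measurements come from the same subjects, the test works on each subject's difference. The function name and the baseline/end values are hypothetical; the resulting t would then be compared with a t table at n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for related measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # within-subject change
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

baseline = [8.1, 7.9, 8.4, 7.6, 8.0]  # hypothetical values, same five subjects
end = [7.5, 7.6, 7.9, 7.4, 7.3]

t_paired, df_paired = paired_t(baseline, end)
print(round(t_paired, 2), df_paired)
```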
List and describe the interval data tests for ≥3 Groups of Independent Data w/ Confounders.
o Analysis of Co‐Variance (ANCOVA)
Compares the means of all groups (along with intra‐ and inter‐group variations) against a single DV while also controlling for the co‐variance of confounders
o Multivariate Analysis of Co‐Variance (MANCOVA) (≥2 DVs)
Compares the means of all groups (along with intra‐ and inter‐group variations) against multiple DV’s while also controlling for the co‐variance of confounders
List and describe the interval data tests for ≥3 Groups of Independent Data.
o Analysis of Variance (ANOVA) (1 DV)
Compares the means of all groups (along with intra‐ and inter‐group variations) against a single DV
o Multivariate Analysis of Variance (MANOVA) (≥2 DVs)
Compares the means of all groups (along with intra‐ and inter‐group variations) against multiple DV’s
• If 3+ group comparison significant, must perform a post‐hoc test to determine where differences are…
List the interval data test for 2 Groups of Independent Data.
o Student t‐test
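A sketch of the Student t statistic for two independent groups. It assumes equal variances (so the two sample variances are pooled), in line with the parametric assumptions listed earlier; the function name and data are hypothetical.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / standard_error, n_a + n_b - 2

t_value, df = student_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t_value, df)
```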
List and describe the ordinal Post‐hoc Tests for 3 or more Group Comparisons.
o Student‐Newman‐Keuls test
Makes all possible pairwise comparisons
All groups must be equal in size
o Dunnett test
Compares each group pairwise against a single control
All groups must be equal in size
o Dunn test
Makes all possible pairwise comparisons
Useful when the groups are not all of equal size
List and describe the ordinal data tests for ≥3 Groups of Paired/Related Data.
o Friedman test
Compares the median values between groups
• Also effective for Interval data that are non‐normally distributed or don’t meet all parametric requirements
If 3+ group comparison significant, must perform a post‐hoc test to determine where differences are…
What are the KEY WORDS FOR “PAIRED” or “RELATED” DATA?
“Pre‐ vs. Post‐”, “Before vs. After”, “Baseline vs. End”, etc…
List the ordinal data test for 2 Groups of Paired/Related Data.
o Wilcoxon Signed Rank test
List and describe the ordinal data tests for ≥3 Groups of Independent Data.
o Kruskal‐Wallis test
Compares the median values between groups
• Also used for Interval data not meeting parametric requirements
If 3+ group comparison significant, must perform a post‐hoc test to determine where difference(s) is(are)…
List the ordinal data test for 2 Groups of Independent Data
o Mann‐Whitney test
List and describe the nominal data tests for ≥3 Groups of Paired/Related Data.
o Cochran test
Same principle and assumptions as χ², yet mathematically factors in the concept of paired, or related, data
Bonferroni test of Inequality (Bonferroni correction)
- Adjusts the p value for # of comparisons being made
- Very conservative
List the nominal data test for 2 Groups of Paired/Related Data.
o McNemar test
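A sketch of the McNemar statistic for paired yes/no data. Only the discordant pairs (subjects whose answer changed) carry information, so the statistic is (b - c)² / (b + c) with 1 degree of freedom. This version omits the continuity correction that some texts apply; the function name and counts are hypothetical.

```python
import math

def mcnemar(changed_yes_to_no, changed_no_to_yes):
    """Return (chi-square statistic, p value) from the two discordant-pair counts."""
    b, c = changed_yes_to_no, changed_no_to_yes
    chi2 = (b - c) ** 2 / (b + c)
    # 1 degree of freedom: chi-square is a squared standard normal value.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2_paired, p_mcnemar = mcnemar(15, 5)
print(chi2_paired, round(p_mcnemar, 4))
```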
List and describe the nominal data tests for ≥3 Groups of Independent Data.
o For statistically significant findings (p<0.05) in 3 or more comparisons, one must perform subsequent analysis (post‐hoc testing) to determine which groups are different:
Multiple χ² tests are NEVER acceptable
• Risk of Type 1 error increases with each additional test! (with 5 independent tests at 0.05 each, the chance of at least one false positive is already about 23%)
Bonferroni test of Inequality (Bonferroni correction)
- Adjusts the p value for # of comparisons being made
- Very conservative
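A sketch of why repeated testing inflates the Type 1 error and how the Bonferroni correction responds. It assumes the tests are independent, which makes the chance of at least one false positive 1 - (1 - alpha)^k; the function names are hypothetical.

```python
from math import comb

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type 1 error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-comparison threshold after the Bonferroni correction."""
    return alpha / n_tests

n_groups = 4
n_comparisons = comb(n_groups, 2)  # all possible pairwise comparisons

print(n_comparisons)
print(round(familywise_error_rate(0.05, n_comparisons), 3))
print(round(bonferroni_alpha(0.05, n_comparisons), 4))
```

With 4 groups there are 6 pairwise comparisons, so each one is judged against a much stricter threshold than 0.05, which is why the correction is described as very conservative.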