Biostatistics-Epidemiology Flashcards
Wilcoxon Signed Rank Test
The Wilcoxon rank-sum test is used to compare two independent samples, while the Wilcoxon signed-rank test is used to compare two related or matched samples, or to conduct a paired-difference test of repeated measurements on a single sample, to assess whether their population mean ranks differ.
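As a minimal sketch (assuming Python with scipy; all numbers below are hypothetical), the signed-rank test is applied to paired measurements, while the rank-sum (Mann-Whitney U) test is applied to two independent groups:

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

# Hypothetical paired data: the same 8 patients measured before and after a treatment
before = np.array([142, 138, 150, 145, 160, 139, 148, 152])
after  = np.array([135, 136, 147, 140, 151, 138, 145, 149])
stat, p = wilcoxon(before, after)                     # Wilcoxon signed-rank test (paired)
print(f"signed-rank p = {p:.3f}")

# Hypothetical independent groups -> Wilcoxon rank-sum / Mann-Whitney U test instead
group_a = np.array([12, 15, 11, 19, 14])
group_b = np.array([22, 18, 25, 20, 17])
stat, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"rank-sum p = {p:.3f}")
```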
Fisher's Exact Test
Fisher's exact test is a statistical significance test used in the analysis of contingency tables. It is useful for categorical data that result from classifying objects in two different ways.
Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. It is named after its inventor, Ronald Fisher, and is one of a class of exact tests, so called because the significance of the deviation from a null hypothesis (e.g., p-value) can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity, as with many statistical tests.
Fisher is said to have devised the test following a comment from Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup. He tested her claim in the “lady tasting tea” experiment.
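A minimal sketch with scipy (the 2×2 counts are invented, loosely echoing the tea-tasting setup of four milk-first and four tea-first cups):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table:
#                      guessed milk-first   guessed tea-first
# actually milk-first          3                   1
# actually tea-first           1                   3
table = [[3, 1],
         [1, 3]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, exact p = {p_value:.3f}")  # exact, no large-sample approximation
```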
Factorial trial
A factorial trial evaluates two or more treatments simultaneously.
Example - the ISIS-2 trial, which evaluated aspirin and streptokinase in STEMI.
In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or “levels”, and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be called a fully crossed design. Such an experiment allows the investigator to study the effect of each factor on the response variable, as well as the effects of interactions between factors on the response variable.
For the vast majority of factorial experiments, each factor has only two levels. For example, with two factors each taking two levels, a factorial experiment would have four treatment combinations in total, and is usually called a 2×2 factorial design. In such a design, the interaction between the variables is often the most important. This applies even to scenarios where a main effect and an interaction are present.
If the number of combinations in a full factorial design is too high to be logistically feasible, a fractional factorial design may be done, in which some of the possible combinations (usually at least half) are omitted.
Other terms for “treatment combinations” are often used, such as runs (of an experiment), points (viewing the combinations as vertices of a graph), and cells (arising as intersections of rows and columns).
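A small sketch of a 2×2 full factorial layout (numpy and itertools only; the cell means are invented for illustration): it enumerates the four treatment combinations and computes the main effects and the interaction contrast from the cell means.

```python
import itertools
import numpy as np

# Hypothetical mean outcome (e.g. % mortality) in each arm of a 2x2 design:
# factor A = treatment 1 (0 = no, 1 = yes), factor B = treatment 2 (0 = no, 1 = yes)
cell_mean = {(0, 0): 13.2, (0, 1): 10.4, (1, 0): 10.7, (1, 1): 8.0}

# A full factorial design uses every combination of factor levels
for a, b in itertools.product([0, 1], repeat=2):
    print(f"A={a}, B={b}: mean outcome {cell_mean[(a, b)]}")

# Main effect: average change when one factor moves 0 -> 1, averaged over the other factor
main_a = np.mean([cell_mean[(1, b)] - cell_mean[(0, b)] for b in (0, 1)])
main_b = np.mean([cell_mean[(a, 1)] - cell_mean[(a, 0)] for a in (0, 1)])
# Interaction: does the effect of A differ depending on the level of B?
interaction = (cell_mean[(1, 1)] - cell_mean[(0, 1)]) - (cell_mean[(1, 0)] - cell_mean[(0, 0)])
print(f"main effect A = {main_a:.2f}, main effect B = {main_b:.2f}, interaction = {interaction:.2f}")
```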
Confidence interval
The range of values within which the true value lies with a stated probability.
Example -
A 95% CI is the range within which there is a 95% chance that the true value lies.
Similarly, a 95% CI around a difference is the range within which there is a 95% chance that the true difference lies.
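A minimal sketch of a 95% CI for a sample mean, assuming scipy/numpy (the sample values are made up):

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])  # hypothetical measurements

mean = x.mean()
se = stats.sem(x)                                     # standard error of the mean
low, high = stats.t.interval(0.95, len(x) - 1,        # t distribution with n-1 degrees of freedom
                             loc=mean, scale=se)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```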
Forest Plot
Forest plots are most commonly used to present the data of a meta-analysis of many individual trials. Each trial appears as a separate row, so the trials can be compared at a glance. The horizontal lines emerging from the squares represent confidence intervals; the largest studies have the narrowest confidence intervals.
The area of each square is proportional to the study's weight in the meta-analysis, so larger, more precise studies get bigger squares.
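A rough matplotlib sketch of the forest-plot layout (every odds ratio, CI, and weight below is invented purely to illustrate the format):

```python
import matplotlib.pyplot as plt

# Hypothetical per-trial odds ratios with 95% CIs and meta-analysis weights
trials  = ["Trial A", "Trial B", "Trial C", "Pooled"]
point   = [0.80, 0.95, 0.70, 0.82]
ci_low  = [0.55, 0.60, 0.58, 0.70]
ci_high = [1.16, 1.50, 0.85, 0.96]
weights = [20, 10, 70, 100]            # larger weight -> bigger square

for i, name in enumerate(trials):
    plt.plot([ci_low[i], ci_high[i]], [i, i], color="black")   # horizontal CI line
    plt.plot(point[i], i, "s", color="grey",
             markersize=2 * weights[i] ** 0.5)                  # marker area ~ study weight
plt.axvline(1.0, linestyle="--", color="red")                   # line of no effect (OR = 1)
plt.yticks(range(len(trials)), trials)
plt.xlabel("Odds ratio")
plt.gca().invert_yaxis()
plt.show()
```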
Bradford Hill Criteria
The Bradford Hill criteria, otherwise known as Hill’s criteria for causation, are a group of nine principles that can be useful in establishing epidemiologic evidence of a causal relationship between a presumed cause and an observed effect and have been widely used in public health research. They were established in 1965 by the English epidemiologist Sir Austin Bradford Hill.
In 1996, David Fredricks and David Relman remarked on Hill’s criteria in their seminal paper on microbial pathogenesis.
9 Criteria in Bradford Hill Causation
In 1965, the English statistician Sir Austin Bradford Hill proposed a set of nine criteria to provide epidemiologic evidence of a causal relationship between a presumed cause and an observed effect. (For example, he demonstrated the connection between cigarette smoking and lung cancer.) The list of the criteria is as follows:
1. Strength (effect size): A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.
2. Consistency (reproducibility): Consistent findings observed by different persons in different places with different samples strengthen the likelihood of an effect.
3. Specificity: Causation is likely if there is a very specific population at a specific site and disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.[1]
4. Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).
5. Biological gradient (dose–response relationship): Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.[1]
6. Plausibility: A plausible mechanism between cause and effect is helpful (but Hill noted that knowledge of the mechanism is limited by current knowledge).
7. Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect. However, Hill noted that “lack of such [laboratory] evidence cannot nullify the epidemiological effect on associations”.
8. Experiment: “Occasionally it is possible to appeal to experimental evidence”.
9. Analogy: The use of analogies or similarities between the observed association and any other associations.
Some authors[3] also consider Reversibility: if the cause is removed, then the effect should disappear as well.
Scatter Plot
Can suggest various kinds of correlation between variables with a certain degree of confidence.
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.
The correlation may be positive (rising), negative (falling), or null.
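A minimal matplotlib/numpy sketch (simulated data) showing a rising (positive) correlation on a scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(scale=0.5, size=100)   # simulated positive (rising) relationship

plt.scatter(x, y)
plt.xlabel("Variable X")
plt.ylabel("Variable Y")
plt.title(f"Pearson r = {np.corrcoef(x, y)[0, 1]:.2f}")
plt.show()
```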
Student t Test
The Student's t test (also called the t test) is used to compare the means between two groups; no correction for multiple comparisons is needed because a single P value is obtained. ANOVA, by contrast, is used to compare the means among three or more groups.[4,5] ANOVA first gives a single common (omnibus) P value; if it is significant, post-hoc pairwise comparisons follow.
Unpaired t Test: used to compare the average values of two independent groups. Example: patients with the disease versus patients without it.
Paired t Test: used when the members of the two groups are paired. Example: each diseased person matched with a healthy person of the same sex/age, or repeated measurements on the same subjects.
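A minimal scipy sketch of the three situations above, using simulated data:

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel, f_oneway

rng = np.random.default_rng(1)

# Unpaired t test: two independent groups (e.g. diseased vs healthy; values simulated)
diseased = rng.normal(5.5, 1.0, 30)
healthy  = rng.normal(5.0, 1.0, 30)
t, p = ttest_ind(diseased, healthy)
print(f"unpaired t test p = {p:.3f}")

# Paired t test: two measurements on the same (or matched) subjects
before = rng.normal(5.5, 1.0, 30)
after  = before - rng.normal(0.3, 0.5, 30)
t, p = ttest_rel(before, after)
print(f"paired t test p = {p:.3f}")

# Three or more groups -> one-way ANOVA gives a single omnibus P value
g1, g2, g3 = rng.normal(5.0, 1, 20), rng.normal(5.3, 1, 20), rng.normal(5.8, 1, 20)
F, p = f_oneway(g1, g2, g3)
print(f"ANOVA p = {p:.3f}")
```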
Sensitivity
Number of true positives detected by the test divided by the total number of people with the disease (true positives + false negatives) in the population tested, i.e. a/(a+c) in the 2×2 notation used below (see the worked example under Positive Predictive Value).
Number needed to Treat (NNT)
The number of patients who need to be treated for one to benefit (e.g. to prevent one death); NNT = 1 / absolute risk reduction.
Example:
Patients dying on Aspirin Rx - 9.4%
Patients dying not on aspirin - 11.8%
Absolute Risk reduction = 11.8 - 9.4 = 2.4%
Relative Risk reduction = 2.4/11.8 ≈ 0.20 = 20%
NNT = 1/0.024 = 42
meaning that 42 patients need to be treated with aspirin to prevent 1 death
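The same arithmetic as a short Python check (using only the percentages quoted above):

```python
control_risk = 0.118   # deaths without aspirin (11.8%)
treated_risk = 0.094   # deaths on aspirin (9.4%)

arr = control_risk - treated_risk   # absolute risk reduction = 0.024 (2.4%)
rrr = arr / control_risk            # relative risk reduction ~ 0.20 (20%)
nnt = 1 / arr                       # number needed to treat ~ 42

print(f"ARR = {arr:.3f}, RRR = {rrr:.2f}, NNT = {nnt:.0f}")
```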
P value
probability of observing a difference of the observed magnitude (or larger) if the null hypothesis is true, i.e. it measures the compatibility of the data with the null hypothesis.
Lies between 0 and 1.
Close to 0 = low compatibility with the null hypothesis.
Chi Square test
A type of non-parametric test.
Used when data are categorical (e.g. binary), samples are not paired, and the expected count in each cell is > 5 (otherwise use Fisher's exact test).
Any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true; it is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
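A minimal scipy sketch on a hypothetical 2×2 table (counts invented):

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = exposed / unexposed, columns = outcome yes / no
table = [[30, 70],
         [45, 55]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
print("expected counts:\n", expected)   # check the expected count in each cell is > 5
```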
Spearman’s Rank Correlation
A non-parametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function.
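A minimal scipy sketch with a simulated monotonic but non-linear relationship:

```python
import numpy as np
from scipy.stats import spearmanr

x = np.arange(1, 21)
y = np.log(x) + np.random.default_rng(2).normal(scale=0.1, size=20)  # monotonic, not linear

rho, p = spearmanr(x, y)
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
```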
Pearson Coefficient of Linear Correlation
a type of parametric test
Used to measure the linear association between two variables, denoted by r. Indicates how closely the points lie to a straight line. Takes values between -1 and +1.
The closer r is to zero, the weaker the linear association.
Negative values of r indicate that one variable decreases as the other increases (example: CD4 count falls with increasing age).
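A minimal scipy sketch of a negative linear correlation on simulated data (the age/count relationship below is invented for illustration only):

```python
import numpy as np
from scipy.stats import pearsonr

age = np.linspace(20, 80, 40)
count = 1000 - 8 * age + np.random.default_rng(3).normal(scale=50, size=40)  # simulated decline

r, p = pearsonr(age, count)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")   # r close to -1: strong negative linear association
```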
Positive Predictive Value
Proportion of those who test positive who actually have the disease.
Formula = a/(a+b), where a = true positives and b = false positives.
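A short worked sketch tying this card to the Sensitivity card above, using a hypothetical 2×2 table where a = true positives, b = false positives, c = false negatives, d = true negatives:

```python
# Hypothetical screening-test counts
a, b, c, d = 90, 30, 10, 870

sensitivity = a / (a + c)   # true positives / everyone with the disease
specificity = d / (b + d)   # true negatives / everyone without the disease
ppv         = a / (a + b)   # true positives / everyone who tested positive
npv         = d / (c + d)   # true negatives / everyone who tested negative

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```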