Statistics Flashcards
Measures of Central Tendency
- Mode - defined as the value that occurs most frequently in a given set of data. Best for data allocated into distinct categories (nominal data).
- Median - defined as the value at the middle of all values of the variable when they are ordered (half are greater, half are less). Not affected by extreme values. Good for all levels of measurement except nominal data. Especially good for skewed distributions.
- Mean - defined as the arithmetic average. The most frequently used measure of central tendency. Uses all values of the data. Highly sensitive to extreme values (especially in skewed distributions).
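As a quick illustration of how the three measures react to an extreme value, here is a minimal sketch using Python's standard `statistics` module. The data set (`stays`) is hypothetical and chosen only to make the effect visible.

```python
import statistics

# Hypothetical sample: length of hospital stay in days, with one extreme value (30).
stays = [2, 3, 3, 4, 5, 6, 30]

mode = statistics.mode(stays)      # most frequent value (3)
median = statistics.median(stays)  # middle value of the ordered data (4)
mean = statistics.mean(stays)      # arithmetic average, pulled upwards by the 30

# Remove the extreme value and recompute.
trimmed = stays[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"mode={mode}, median={median}, mean={mean:.2f}")
print(f"without the extreme value: median={median_trimmed}, mean={mean_trimmed:.2f}")
```

Dropping the single extreme value moves the mean far more than the median, which is why the median is preferred for skewed data.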
Variance
- A measure of spread, used where the mean is the measure of central tendency.
- Variance is the corrected sum of squares about the mean [$\sum (x - \bar{x})^2 / (n - 1)$, where $\bar{x}$ is the mean and $n$ is the number of observations].
- The variance is a quantity equal to the square of the standard deviation.
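The formula above can be worked through by hand. The sketch below uses a small hypothetical data set and compares the result with `statistics.variance`, which also divides by n − 1.

```python
import math
import statistics

# Hypothetical observations.
data = [4, 8, 6, 5, 7]

n = len(data)
mean = sum(data) / n                                # 30 / 5 = 6
sum_of_squares = sum((x - mean) ** 2 for x in data)  # 4 + 4 + 0 + 1 + 1 = 10
variance = sum_of_squares / (n - 1)                 # 10 / 4 = 2.5

# The standard library uses the same (n - 1) denominator.
assert math.isclose(variance, statistics.variance(data))
print(variance)
```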
Standard deviation
- The square root of the variance (taking the square root gives the same units as the data).
- For reasonably symmetrical bell-shaped data, roughly 68% of the data lie within one standard deviation (SD) of the mean, roughly 95% within two SD and around 99.7% within three SD (Figure 1.2).
- A normal distribution is defined uniquely by two parameters:
- the mean
- the SD of the population.
- Other features of a normal distribution include that it is symmetrical (mean = mode = median) and that the data are continuous.
- The standard deviation is a quantity calculated to indicate the extent of deviation for a group as a whole.
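The 68%, 95% and 99.7% figures can be checked against the normal distribution itself. This is a minimal sketch using `statistics.NormalDist` (Python 3.8 or later); the helper name `fraction_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverage = {k: fraction_within(k) for k in (1, 2, 3)}

for k, fraction in coverage.items():
    print(f"within {k} SD: {fraction:.1%}")
```

The same fractions apply to any normal distribution, because the mean and SD only shift and rescale the curve.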
Standard error (SE) of the mean
Defined as the SD divided by the square root of the sample size (SE = SD / √n). Used in relation to a sample rather than the population as a whole. It can be thought of as being equivalent to the SD for the true mean, i.e. roughly 68% confidence that the population mean lies within one SE of the calculated (sample) mean, 95% confidence that it lies within two SEs, and 99.7% within three SEs. The formula itself does not assume a normal distribution, although these percentages rely on the sample mean being approximately normally distributed.
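A minimal sketch of the SE calculation, using a hypothetical sample of nine measurements:

```python
import math
import statistics

# Hypothetical sample of nine measurements.
sample = [10, 12, 13, 13, 14, 14, 15, 16, 19]

n = len(sample)
sd = statistics.stdev(sample)  # sample SD, using the (n - 1) denominator
se = sd / math.sqrt(n)         # standard error of the mean

print(f"n={n}, SD={sd:.3f}, SE={se:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the SE.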
Confidence interval
Two SEs (more precisely, 1.96 SEs when the sample mean is normally distributed) either side of the sample mean determine the 95% CI of the mean (i.e. we are 95% confident that the true population mean lies within this range of values).
Confidence intervals are preferred to P values (see below) because:
- CIs relate to the sample size;
- a range of values is provided;
- CIs provide a rapid visual impression of significance;
- CIs have the same units as the variable.
When an unknown value is sought, the confidence interval gives the statistician a range of values within which the “true” value is likely to lie. The confidence interval is used to indicate the reliability of an estimate.
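The 95% CI of a mean can be sketched as the sample mean plus or minus about two SEs. The example below uses a hypothetical sample and the normal multiplier (about 1.96). This is an assumption made for simplicity: for a sample this small, a t-distribution multiplier would be somewhat larger and the interval wider.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of nine measurements.
sample = [10, 12, 13, 13, 14, 14, 15, 16, 19]

n = len(sample)
sample_mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

# Two-sided 95% multiplier from the normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

ci_lower = sample_mean - z * se
ci_upper = sample_mean + z * se

print(f"mean={sample_mean}, 95% CI=({ci_lower:.2f}, {ci_upper:.2f})")
```

The interval is expressed in the same units as the measurements, and it narrows as the sample size grows because the SE shrinks.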
Type I error (alpha error)
- A type I (α) error occurs when a difference is found, but in reality there is not a difference (i.e. a false-positive result, and therefore the null hypothesis is rejected incorrectly).
- With a significance level of 0.05, this is one of the 5% of cases where the difference occurred by chance.
- We can protect against type I errors by lowering the significance level, although this increases the risk of a type II error occurring.
- Note that the risk of a type I error decreases as the acceptable P value is lowered, but then bigger study samples are needed in order to protect against a type II error.
Bonferroni correction
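A Bonferroni correction is a post-hoc statistical correction made to P values when several dependent or independent statistical tests are being performed simultaneously on a single data set. The significance level for each individual test is set to the overall significance level divided by the number of tests.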
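In its usual form, the Bonferroni correction compares each P value with the overall significance level divided by the number of tests (equivalently, it multiplies each P value by the number of tests). The sketch below shows this; the function name `bonferroni` and the P values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a list of P values.

    Returns the per-test threshold, the adjusted P values (capped at 1.0)
    and a list of booleans saying which tests remain significant.
    """
    m = len(p_values)
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return threshold, adjusted, significant

# Hypothetical P values from four tests on the same data set.
p_values = [0.001, 0.012, 0.04, 0.20]
threshold, adjusted, significant = bonferroni(p_values)

print(threshold)    # per-test threshold: 0.05 / 4
print(significant)  # 0.04 is below 0.05 but is no longer significant after correction
```

The correction is conservative: it lowers the risk of a type I error across the whole set of tests at the cost of a higher risk of type II errors.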
Type II error (beta error)
- A false-negative result: the null hypothesis is accepted when it should be rejected.
- This often occurs when studies are underpowered. In the example above, the null hypothesis is that repair of the capsule does not reduce dislocations within the first three months. Since the first study did not show a statistically significant difference, the null hypothesis was accepted.
- Since a better-powered study showed that repair of the capsule does reduce dislocations, the null hypothesis should have been rejected in the initial study (had it been adequately powered).
- We can protect against type II errors with statistical power. Note that type I and type II errors are related inversely. Type II errors are common in orthopaedic studies.
Type III (γ) error
Occurs rarely, when the researcher correctly rejects the null hypothesis but incorrectly attributes the cause.
Power of Study
- The study power is defined as the ability of a study to detect the difference between two interventions if one in fact exists.
- The power of a statistical test depends on the magnitude of the treatment effect, the designated type I (alpha) and type II (beta) error rates, and the sample size n.
- The power is equal to (1 - beta), where beta is the false-negative rate.
- The statistical power is therefore the probability that the test will correctly reject the null hypothesis.
- Factors affecting power analyses
- Size of the difference between the means (the larger the difference, the easier it is to detect a difference and the greater the power).
- Spread/variability of the data (the larger the spread, the less likely a difference will be detected).
- Acceptable level of significance (i.e. the P value that is set).
- Sample size (power increases with increasing sample size).
- Experimental design (e.g. within subjects versus between subjects).
- Type of data (parametric versus non-parametric).
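Several of the factors above can be illustrated with a rough power calculation for comparing two means. This is only a sketch under stated assumptions: two equal-sized groups, a common and known SD, a two-sided test, and a normal approximation that ignores the negligible chance of rejecting in the wrong direction. The function name `approx_power` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a true difference in means of `delta`.

    Assumes two equal groups, a common known SD and a two-sided test,
    using a normal approximation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)     # critical value for the chosen alpha
    se_diff = sd * sqrt(2 / n_per_group)  # SE of the difference between two means
    return z.cdf(abs(delta) / se_diff - z_crit)

base = approx_power(delta=5, sd=10, n_per_group=64)            # roughly 0.8
smaller_sample = approx_power(delta=5, sd=10, n_per_group=20)
larger_effect = approx_power(delta=8, sd=10, n_per_group=64)
more_spread = approx_power(delta=5, sd=15, n_per_group=64)
stricter_alpha = approx_power(delta=5, sd=10, n_per_group=64, alpha=0.01)

print(f"{base:.2f} {smaller_sample:.2f} {larger_effect:.2f} {more_spread:.2f} {stricter_alpha:.2f}")
```

Power rises with a larger difference or a larger sample, and falls with more variable data or a stricter significance level.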
Non-parametric data are observations which can be expressed as a dichotomous (yes or no) outcome such as gender.
Features of non-parametric tests
- No assumptions are made about the origins of the data.
- No limitations on types of data.
- Use the rank order of values rather than the values themselves.
- Less likely to be significant. Decreased power for a given n.
- Cannot relate back to any parametric properties of the data.
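Rank-based tests replace each observation with its position in the ordered data. The sketch below shows one common convention, in which tied values share the average of their ranks. The function name `average_ranks` and the data are hypothetical.

```python
def average_ranks(values):
    """Return the rank of each value (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

# Hypothetical data with a tie (5, 5) and an extreme value (100).
values = [12, 5, 100, 5, 7]
ranks = average_ranks(values)
print(ranks)  # [4.0, 1.5, 5.0, 1.5, 3.0]
```

The extreme value 100 simply becomes the highest rank, so its magnitude no longer influences the test. That makes rank tests robust, and it is also why they cannot relate back to parametric properties of the data.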
Parametric (continuous)
- Parametric data, such as age, are observations for which differences between the numbers have meaning on a numerical scale.
Features of parametric tests
- Assumes the data were sampled from a normal population.
- Observations must be independent.
- Populations must have the same variance.
- Can use absolute difference between data points.
- Increased power for a given sample size (n).
Evidence-based medicine (EBM) is an approach to medical practice intended to optimize decision-making by emphasizing the use of evidence from well-designed and well-conducted research.
Five steps of EBM
- formulate an answerable question
- gather the evidence
- appraise the evidence
- implement the evidence
- evaluate the process
Different Levels of Evidence
- Level 1 Randomized controlled trial (RCT) - a study in which patients are randomly assigned to the treatment or control group and are followed prospectively
- Meta-analysis of randomized trials with homogeneous results
- Level 2 Poorly designed RCT - follow-up less than 80%
- Prospective cohort study (therapeutic) - a study in which patient groups are separated non-randomly by exposure or treatment, with exposure occurring after the initiation of the study
- Meta-analysis of Level 2 studies
- Level 3
- Retrospective cohort study - a study in which patient groups are separated non-randomly by exposure or treatment, with exposure occurring before the initiation of the study
- Case-control study - a study in which patient groups are separated by the current presence or absence of disease and examined for the prior exposure of interest
- Meta-analysis of Level 3 studies
- Level 4 Case series - a report of multiple patients with the same treatment, but no control group or comparison group
- Level 5 Case report, Expert opinion, Personal observation
A funnel plot
- A graph designed to check for the existence of publication bias.
- Funnel plots are commonly used in systematic reviews and meta-analyses.
- In the absence of publication bias, studies with high precision are expected to plot near the average, and studies with low precision to spread evenly on both sides of the average, creating a roughly funnel-shaped distribution.
- Deviation from this shape can indicate publication bias.
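The geometry of a funnel plot can be sketched without drawing it. Each study contributes a point (effect estimate against its standard error), the "average" is taken here as an inverse-variance weighted (fixed-effect) pooled estimate, and the funnel is the pooled estimate plus or minus 1.96 SE. The study values are hypothetical, and the choice of a fixed-effect pooled estimate is an assumption made for the example.

```python
from statistics import NormalDist

# Hypothetical studies: (effect estimate, standard error). Smaller SE = higher precision.
studies = [
    (0.30, 0.05),
    (0.28, 0.08),
    (0.35, 0.10),
    (0.10, 0.20),
    (0.55, 0.20),
    (0.60, 0.30),
]

# Inverse-variance weights: precise studies count for more.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / sum(weights)

z = NormalDist().inv_cdf(0.975)  # about 1.96

def funnel_limits(se):
    """Pseudo 95% limits of the funnel at a given standard error."""
    return pooled - z * se, pooled + z * se

# Studies falling outside the funnel at their own level of precision.
outside = [
    (effect, se)
    for effect, se in studies
    if not funnel_limits(se)[0] <= effect <= funnel_limits(se)[1]
]

print(f"pooled effect = {pooled:.3f}")
print(f"studies outside the funnel: {outside}")
```

The limits widen as the SE grows, which produces the funnel shape. With real data, a shortage of low-precision studies on one side of the pooled estimate is the asymmetry that suggests publication bias.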