Stats Flashcards
Bias
Any factor that moves the findings of a study away from the truth
Binary data
Data where there are only two possible values such as survived/died; also known as dichotomous data
Blinding in a randomized controlled trial
When the treatment allocation is concealed from either the subject or the assessor or both
Case-control studies
Observational study that starts with cases with a disease and compares them with controls without the disease to investigate possible risk factors
Chi-squared goodness of fit test
A statistical test used to investigate whether a frequency distribution follows a specific theoretical distribution
Chi-squared test
A statistical test used to investigate the association between two categorical variables
Cluster Analysis
A statistical method used to identify groups or clusters of individuals who have common features in terms of known variables
Cluster randomization
When groups of individuals are allocated to treatments so that all subjects in a group receive the same treatment
Cohort study
Observational study that starts with a sample of individuals who are disease-free and measures possible causal factors at baseline and over time. The cohort of subjects is followed and their disease status is observed to investigate which factors are linked to the disease
Confidence interval (CI)
A range of values that indicates the precision of an estimate; for a 95% CI we can be 95% confident that the interval contains the true value
Continuous data
Data that lie on a continuum and so can take any value between two limits
Cox proportional hazards regression
A multifactorial regression model used with a time-to-event outcome
Crossover trial
A single group study where each patient receives each of two or more treatments in turn so that they act as their own control
Degrees of freedom (DF or df)
A quantity used in statistical testing and modelling that is related to the size of the sample and the number of parameters that have been estimated
Dummy variables
Used in regression modelling to enable a categorical predictor variable to be included, by converting a variable with n categories into n–1 binary variables, where one category is the reference category
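A minimal sketch of the n–1 coding described above, in plain Python. The function name `dummy_code` and the smoking example are hypothetical, chosen only for illustration.

```python
def dummy_code(values, reference):
    """Convert a categorical variable into n-1 binary (0/1) variables."""
    # Every category except the reference gets its own indicator variable.
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical predictor with three categories; "never" is the reference.
smoking = ["never", "current", "ex", "never", "current"]
dummies = dummy_code(smoking, reference="never")
print(dummies)
```

Subjects in the reference category score 0 on every dummy variable, so each regression coefficient compares one category with the reference.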
Equivalence trial
A trial that aims to see if a new treatment is no better or worse than an existing one
Fisher’s exact test
A statistical test that can be used to investigate the association between two categorical variables when the sample is small
Forest plot
A graph used to display individual study estimates and confidence intervals, and the pooled estimate and confidence interval in a meta-analysis
Gold standard test
A diagnostic test that is regarded as definitive, i.e. it gives the correct answer
Funnel plot
A simple graphical method for exploring the results from studies to see if publication bias might be present
Hazard ratio
In survival analysis, the ratio of hazards or risks of outcome in two groups
Heterogeneity
Where there is statistical variability between estimates such as may be found in a meta-analysis
Incidence
The number of new cases of a given condition occurring within a specific time period
Indirect standardization
Gives the standardized mortality ratio (SMR), which is the ratio of the observed number of deaths in the comparison population to the number expected if that population had the same age-specific death rates as the standard population
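As a worked illustration, the SMR can be computed as observed deaths divided by expected deaths. The age bands, population sizes, and rates below are invented for the example, not taken from any real dataset.

```python
def standardized_mortality_ratio(observed_deaths, population_by_age, standard_rates):
    """SMR = observed deaths / deaths expected under the standard population's rates."""
    # Expected deaths: apply each standard age-specific rate to the
    # comparison population's size in that age band, then add up.
    expected = sum(
        population_by_age[age] * standard_rates[age] for age in population_by_age
    )
    return observed_deaths / expected


# Hypothetical comparison population and standard age-specific death rates.
population_by_age = {"<45": 1000, "45-64": 500, "65+": 200}
standard_rates = {"<45": 0.001, "45-64": 0.01, "65+": 0.05}

smr = standardized_mortality_ratio(20, population_by_age, standard_rates)
print(round(smr, 2))
```

An SMR above 1 suggests more deaths were observed than expected from the standard rates; it is often multiplied by 100 when reported.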
Intention to treat analysis
Statistical analysis where patients are analysed in the treatment group to which they were originally randomly allocated even if they did not actually receive that treatment
Logistic regression
A multifactorial regression model used with a binary outcome
Logrank test
A statistical test used to compare time-to-event data in two or more groups
Meta-analysis
A statistical analysis which combines the results of several independent studies examining the same question
Multifactorial methods
Statistical models fitted to datasets with one outcome variable and several predictor variables; used to disentangle effects
Multiple regression
A multifactorial regression model used with a continuous outcome
Negative predictive value
The proportion of those found negative on a diagnostic test who are truly negative
Normal distribution
A continuous probability distribution with a symmetrical bell shape, which is followed by many naturally occurring variables
Number needed to harm
The number of patients who need to be treated in order that one additional patient has a negative outcome
Number needed to treat
The number of patients who need to be treated in order that one additional patient has a positive outcome
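A small sketch of the usual calculation, assuming the number needed to treat is taken as the reciprocal of the absolute difference in the proportion with a positive outcome. The function name and trial counts are hypothetical; `Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction


def number_needed_to_treat(events_treated, n_treated, events_control, n_control):
    """NNT = 1 / (proportion with positive outcome on treatment - proportion on control)."""
    p_treated = Fraction(events_treated, n_treated)
    p_control = Fraction(events_control, n_control)
    benefit = p_treated - p_control  # extra proportion helped by treatment
    if benefit <= 0:
        raise ValueError("treatment shows no benefit, so NNT is undefined")
    return 1 / benefit


# Hypothetical trial: 60/100 recover on treatment, 50/100 recover on control.
nnt = number_needed_to_treat(60, 100, 50, 100)
print(float(nnt))
```

The number needed to harm follows the same arithmetic, using the difference in the proportion with a negative outcome.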
Observational study
A study in which subjects are observed, with exposures and outcomes measured, without any intervention by the researcher
Odds
The probability of an event occurring divided by the probability of it not occurring
Odds ratio
A measure of the difference in odds between two groups, calculated by dividing the odds in one group by the odds in another group
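The two definitions above can be illustrated with a 2×2 table. The counts are made up, and the function names are illustrative only.

```python
def odds(events, non_events):
    """Odds = P(event) / P(no event), which reduces to events / non-events."""
    return events / non_events


def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(events_1, non_events_1) / odds(events_2, non_events_2)


# Hypothetical data: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
or_value = odds_ratio(30, 70, 10, 90)
print(round(or_value, 2))
```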
One-way analysis of variance
A statistical test used to compare the means from three or more independent samples
Parallel group trial
A trial in which subjects are allocated to receive one of two or more possible treatments and the comparison of different treatments is made between treatment groups
Pearson’s correlation
A measure of the strength of linear relationship between two continuous variables
Placebo
An inert treatment which is indistinguishable from the active treatment
Poisson regression
A multifactorial regression model used to model rates
Positive predictive value
The proportion of those found positive on a diagnostic test who are truly positive
Posterior distribution
A probability distribution obtained by combining prior evidence with new information
Power
The probability that a statistical test will find a significant difference if a real difference of a given size exists, i.e. the null hypothesis is not true
Predictor variable
In regression analysis, a variable which is used to predict the value of an outcome variable
Prevalence
The proportion of individuals with a condition within a specific population at a given time (point prevalence) or over a given time period (period prevalence)
Principal components analysis
A statistical method used to reduce a dataset with many inter-correlated variables to a smaller set of uncorrelated variables that explain the overall variability almost as well
Publication bias
A bias that occurs when the papers which are published on a topic are an incomplete subset of all the studies which have been conducted on that topic
Rank correlation
A non-parametric measure of the relationship between two variables, using the ranks of the data rather than the data values themselves
Receiver operating characteristic (ROC) curve
A graph plotting the sensitivity against 1–specificity for a diagnostic test at different cut-off points
Relative risk (RR)
A measure of the difference in risk between two groups, calculated by dividing the risk in the exposed group by the risk in the unexposed group (also known as risk ratio)
Risk ratio
A measure of the difference in risk between two groups, calculated by dividing the risk in the exposed group by the risk in the unexposed group (also known as relative risk)
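A short sketch of the risk ratio calculation, using invented counts. Risk here is the proportion of each group that has the outcome.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical data: 30 of 100 exposed and 10 of 100 unexposed develop the disease.
rr = relative_risk(30, 100, 10, 100)
print(round(rr, 2))
```

With the same counts the odds ratio would be larger than the risk ratio, because the outcome is fairly common in this made-up example.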
Selection bias
A statistical bias introduced by the way in which subjects are selected for a research study
Sensitivity
The proportion of those who have the disease who are correctly identified by the diagnostic test as positive
Sensitivity analysis
A way of testing assumptions made in statistical analyses by doing several analyses based on different assumptions, and comparing the results
Significance level
The probability that a statistical test rejects the null hypothesis when no real difference exists, i.e. the null hypothesis is true (type 1 error)
Simple linear regression
A statistical method to estimate the nature of the linear relationship between two continuous variables
Skewed data
Data that do not follow a symmetrical distribution
Specificity
The proportion of those who do not have the disease who are correctly identified by the diagnostic test as negative
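Sensitivity, specificity, and the two predictive values all come from the same 2×2 table of test result against true disease status. The sketch below uses hypothetical counts and a hypothetical function name.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Summarize a diagnostic test from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # non-diseased who test negative
        "ppv": tp / (tp + fp),          # test positives who are truly diseased
        "npv": tn / (tn + fn),          # test negatives who are truly disease-free
    }


# Hypothetical study: 100 diseased (90 test positive), 200 non-diseased (170 test negative).
summary = diagnostic_summary(tp=90, fp=30, fn=10, tn=170)
for name, value in summary.items():
    print(name, round(value, 3))
```

Sensitivity and specificity are calculated within disease groups, whereas the predictive values are calculated within test-result groups and so change with the prevalence of the disease.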
Standard deviation (SD)
A measure of dispersion used for continuous data; it is equal to the square root of the variance
Standard error (SE)
A measure of precision of an estimated quantity that is equal to the standard deviation of the sampling distribution of the quantity
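To contrast the standard deviation with the standard error, the sketch below uses made-up measurements and assumes the estimated quantity is a sample mean, whose standard error is estimated by SD divided by the square root of n.

```python
import math
import statistics

measurements = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

variance = statistics.variance(measurements)  # sample variance (n - 1 denominator)
sd = math.sqrt(variance)                      # SD is the square root of the variance
se_mean = sd / math.sqrt(len(measurements))   # SE of the mean (assumed estimator)

print(round(sd, 3), round(se_mean, 3))
```

The SD describes the spread of the individual values, while the SE describes how precisely the mean has been estimated and shrinks as the sample size grows.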
Stem and leaf plot
A graph which uses the data values themselves to depict the shape of a frequency distribution
Superiority trial
A trial which aims to see if one treatment is better than another
t test
A statistical test used to compare the means from two independent samples
Transformation
A function applied to a dataset to better fit a specific probability distribution, for example applying a logarithmic transformation to skewed data to make it fit a Normal distribution
Two-way analysis of variance
A statistical method used to investigate the effects of two factors on a continuous outcome
Type 1 error
Getting a significant result in a sample when the null hypothesis is in fact true in the underlying population
Type 2 error
Getting a non-significant result in a sample when the null hypothesis is in fact false in the underlying population (‘false non-significant’ result)
Variable
A quantity that is measured or observed in an individual and which varies from person to person
Washout period
The time interval between the administration of different treatments in subjects in a crossover trial that prevents there being any carry-over effects of the current treatment when the next treatment starts
Wilcoxon matched pairs test
A statistical test comparing ordinal data from paired samples; also known as the Wilcoxon signed rank test
Wilcoxon rank sum test
A statistical test comparing ordinal data from two independent groups; equivalent to the Mann Whitney U test
Z-test for proportions
A statistical test used to compare proportions from two independent samples
Stratification for prognostic factors
When there are important prognostic factors that need to be accounted for in a particular trial, the random allocation can be stratified so that the treatment groups are balanced for the prognostic factors.
Minimization
Allocation takes place in a way that best maintains balance in important prognostic factors. At all stages of recruitment, the next patient is allocated to that treatment which minimizes the overall imbalance in prognostic factors
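One simple way to sketch minimization: for each treatment, count how many earlier patients on that treatment share the new patient's level of each prognostic factor, then allocate to the treatment with the smallest total. This is an illustrative version only; the factor names are hypothetical, and ties are broken deterministically here, whereas a real trial would normally break them at random.

```python
def minimize(new_patient, allocated, treatments=("A", "B")):
    """Pick the treatment that gives the least imbalance for the new patient.

    new_patient: dict mapping prognostic factor -> level
    allocated:   list of (patient_dict, treatment) pairs already in the trial
    """
    totals = {}
    for t in treatments:
        # Count earlier patients on treatment t who share each of the
        # new patient's factor levels, summed over all factors.
        totals[t] = sum(
            1
            for patient, arm in allocated
            if arm == t
            for factor, level in new_patient.items()
            if patient[factor] == level
        )
    # Assumption: ties go to the first listed treatment.
    return min(treatments, key=lambda t: totals[t])


# Hypothetical trial so far, balancing on sex and age group.
allocated = [
    ({"sex": "F", "age": "<50"}, "A"),
    ({"sex": "F", "age": "50+"}, "A"),
    ({"sex": "M", "age": "<50"}, "B"),
]
choice = minimize({"sex": "F", "age": "<50"}, allocated)
print(choice)
```

Here treatment A already has three matches on the new patient's factor levels and treatment B has one, so the new patient goes to B.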
Advantages of parallel group study design
The comparison of the treatments takes place concurrently
Can be used for any condition, especially an acute condition which is cured or self-limiting such as an infection
No problem of carry-over effects
Disadvantages of parallel group study designs
The comparison is between patients and so usually needs a bigger sample size than the equivalent cross-over trial
Advantages of crossover study designs
Treatments are compared within patients and so differences between patients are accounted for explicitly
Usually need fewer subjects than the equivalent parallel group trials
Can be used to test treatments for chronic conditions
Disadvantages of crossover study designs
Cannot be used for many acute illnesses
Carry-over effects need to be controlled
Likely to take longer than the equivalent parallel designs
Statistical analysis is more complicated if subjects do not complete all periods
Zelen Randomised Consent Design
Subjects are randomly allocated to treatment or usual care
Only those subjects who are allocated to treatment are invited to participate and to give their consent
Subjects allocated to usual care (control) are not asked to give their consent
Among the treatment group, some subjects will refuse and so this design results in three treatment groups:
1. Usual care (allocated)
2. Intervention
3. Usual care (but allocated to intervention)
The analysis is performed with patients analysed in their original randomized groups, i.e. 1 versus 2 + 3 (an intention to treat analysis)