Applied Statistics Flashcards
What are the characteristics of the question used to test the hypothesis?
PICOT
- P: Patients or Population
- I: Intervention(s) or Exposure(s)
- C: Comparator
- O: Outcome
- T: Time
What are the 5 fundamental types of clinical questions
- Therapy
- Harm
- Differential diagnosis
- Diagnosis
- Prognosis
Summarise and classify the different types of study designs
EXPERIMENTAL
- RCT
- Pseudo-/Quasi-RCT
- Non-RCT
OBSERVATIONAL
- Descriptive
- Analytical
  - Cohort
  - Cross-sectional
  - Case-control
What type of studies are the lowest level of evidence, and what are they used for? What are their advantages and disadvantages?
Animal studies
- lowest level of evidence
- Used as hypothesis generating studies
ADVANTAGES
- Cheaper
- Adequate physiological/metabolic surrogate
- Limits human suffering due to experimentation
DISADVANTAGES
- Metabolic pathways / pharmacokinetics differ from humans
- Animals are typically young with no comorbidities
- Methodological weaknesses (less rigorous design / slowly manifesting effects)
Distinguish
- Ecological study
- Case report/series
- Cross sectional surveys
- Case-controlled studies
- Cohort studies
- Randomised Controlled Trials
- Systematic review
- Meta-analysis
Ecological
- Observational
- Retrospective
- Looks at occurrence and associations in groups
Case report/series
- Observational
- Descriptive
- No control group
Cross-sectional surveys (snapshot)
- Observational
- Descriptive / Analytical / Diagnostic
- Large series of case reports
- No control group
Case-control studies
- Observational
- Retrospective
- Historical controls used
- Essentially: choose a group with a shared feature and compare it to another group without that feature.
- Uses the odds ratio to quantify risk
Cohort studies
- Observational
- Longitudinal: Retrospective, Concurrent, Prospective
- Observes exposure, then observes development of disease
- Observes an otherwise similar control group without the exposure
- Uses relative risk to quantify risk
Randomised Controlled Trial
- Experimental
- Randomised
- Prospective
- Interventional
- Analytical
Systematic review
- Answers a defined research question by collecting and summarising all empirical evidence that fits pre-specified eligibility criteria
Meta-analysis
- Use of statistical methods to summarise the results of these studies
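The odds ratio and relative risk mentioned for case-control and cohort studies can both be read off a 2x2 table. The sketch below is only an illustration: the function names and the patient counts are invented for this example and do not come from any study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk: risk in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical counts, invented purely for illustration.
a, b, c, d = 20, 80, 10, 90
print(round(odds_ratio(a, b, c, d), 2))     # 2.25
print(round(relative_risk(a, b, c, d), 2))  # 2.0
```

The two measures differ here because the outcome is fairly common (20% in the exposed group); they move closer together as the outcome becomes rarer.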
What are cross-over trials?
Each patient acts as their own control
Patients ‘cross-over’ from one treatment to the next following a ‘washout period’ between treatments
There is usually randomization
What are self controlled studies
Each patient is their own control
Post treatment measurements in each patient are compared to pre-treatment measurements
With regard to data collection (sampling), what two principles are paramount?
Internal validity
- Sampling should be free from selection bias
External validity
- Sample should represent the broader real-world population
Describe and define 4 sampling strategies
- Simple random - Everyone has equal chance of being picked
- Stratified random - Divide into subgroups first, then select randomly within each
- Clustered random - Randomly select whole groups (e.g. schools or ICUs) rather than individuals
- Convenience sample - Non-random selection: participants are taken just as they come
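The four strategies can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the population of 100 patients in 4 wards, the `ward` key and all function names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 patients spread evenly across 4 wards.
population = [{"id": i, "ward": f"ward_{i % 4}"} for i in range(100)]


def simple_random(pop, n):
    """Every individual has the same chance of being picked."""
    return rng.sample(pop, n)


def stratified_random(pop, key, n_per_stratum):
    """Divide into subgroups first, then sample randomly within each."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen


def cluster_random(pop, key, n_clusters):
    """Randomly pick whole groups and include everyone in them."""
    clusters = sorted({person[key] for person in pop})
    picked = set(rng.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in picked]


def convenience(pop, n):
    """Non-random: simply take the first n as they come."""
    return pop[:n]


print(len(stratified_random(population, "ward", 3)))  # 12 (3 from each of 4 wards)
print(len(cluster_random(population, "ward", 2)))     # 50 (2 whole wards of 25)
```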
What does randomisation mean
A representative sample can be chosen by RANDOM sampling, whereby each person is equally likely to be selected.
It means that no systematic bias is introduced and the samples selected should be representative of the populations of interest
What is the CONSORT or STROBE diagram
STROBE - STrengthening the Reporting of OBservational studies in Epidemiology
CONSORT - CONsolidated Standards of Reporting Trials
Figure 1 in any published study: shows the total number of patients eligible versus the total number of patients included. If the number included is low relative to the number eligible, this strongly suggests that the included subset is biased, either through who was selected for the study or through who declined to participate.
What is sampling error
If a study is repeated, a different sample is chosen with slightly different characteristics, and as such the result will differ slightly.
Sampling error becomes smaller as the sample size increases
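A small simulation can make this concrete. The sketch below assumes a made-up population of 10,000 normally distributed values (mean 120, SD 15); it repeats the "study" many times at each sample size and measures how much the sample means vary. All names and numbers are illustrative.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population, e.g. 10,000 systolic blood pressure values.
population = [rng.gauss(120, 15) for _ in range(10_000)]


def spread_of_sample_means(sample_size, repeats=500):
    """Repeat the study many times and measure how much the sample mean varies."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


for n in (10, 100, 1000):
    # The spread of the sample means shrinks as the sample size grows.
    print(n, round(spread_of_sample_means(n), 2))
```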
What is the difference between a parameter and a statistic
A parameter refers to a property of a population
A statistic refers to a property of a sample
What are the conventional symbols for the mean and standard deviation of a population vs a sample
Population
- Mean: mu
- SD : sigma
Sample
- Mean: X-bar (x with a bar on top)
- SD: S
What is a histogram? List and describe its 3 main shapes.
This is a graph that gives an indication of the distribution of data.
- Normally distributed = Gaussian = Bell shaped
- Left skew = long left tail = negative skew
- Right skew = long right tail = positive skew
On what types of data can parametric tests be used
Normally distributed data
(This includes right-skewed data that becomes Gaussian after log transformation)
Unfortunately, left-skewed data cannot be transformed easily.
What is the purpose of a histogram
To show the frequency and shape of continuous data.
Determining whether the data is normally distributed (or can be transformed to normality) allows for the use of parametric tests in data analysis.
Shows:
- Gaps
- Outliers
- Skewed data
What is the kurtosis of the data
This refers to the flat or pointed nature of the distribution of the data
How do you calculate a 95% confidence interval and why is this necessary
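Mean ± 2SD covers approximately 95% of observations in normally distributed data
95% CI of the mean = Mean ± 1.96 × SEM, where SEM = SD / √n
The mean ± 2SD range is used to determine if the data presented is plausible (e.g. a lower limit below zero for a quantity that cannot be negative suggests skewed data)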
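As a rough sketch, the mean ± 2SD range can be computed alongside the standard-error-based interval for the mean so the two can be compared. The sample values, the length-of-stay figures and the helper name `two_sd_range` are hypothetical, and the 1.96 multiplier is the normal approximation (a small sample would normally use a t critical value instead).

```python
import math
import statistics

# Hypothetical sample of 10 measurements, invented for illustration.
values = [4.1, 5.0, 5.6, 4.8, 5.3, 4.7, 5.9, 5.1, 4.4, 5.2]

mean = statistics.mean(values)
sd = statistics.stdev(values)       # sample standard deviation
sem = sd / math.sqrt(len(values))   # standard error of the mean

# Range expected to hold roughly 95% of individual observations (normal data).
spread_low, spread_high = mean - 2 * sd, mean + 2 * sd

# Approximate 95% confidence interval for the mean (normal approximation).
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem


def two_sd_range(mean_value, sd_value):
    """Plausibility check: the mean +/- 2SD range of a reported summary."""
    return mean_value - 2 * sd_value, mean_value + 2 * sd_value


# Hypothetical report: length of stay 5 days (SD 4 days).
# The lower limit is negative, which is impossible, so the data are likely skewed.
print(two_sd_range(5.0, 4.0))  # (-3.0, 13.0)
```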
What is the indication for a Box and Whisker Plot
To graphically represent the median and interquartile range in non-normally distributed data.
Describe the data organisation of a box and whisker plot
Median - thick horizontal line within the box
Length of box represents the interquartile range (25th to 75th percentile)
Whiskers represent range
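Outliers are plotted as individual points; a common convention flags values more than 1.5 box lengths from the upper or lower end of the box, with values more than 3 box lengths away treated as extreme outliers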
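The numbers behind a box and whisker plot can be sketched as follows. The function name, the `fence` parameter and the data are hypothetical; the multiplier is left configurable because conventions differ between packages, and the "inclusive" quartile method is only one of several in use.

```python
import statistics


def box_plot_summary(values, fence=1.5):
    """Median, quartiles, box length (IQR) and points beyond the fences.

    `fence` is the number of box lengths beyond the box at which a value
    is flagged as an outlier (an assumption; conventions vary).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - fence * iqr, q3 + fence * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
summary = box_plot_summary(data)
print(summary["median"], summary["iqr"], summary["outliers"])  # 5.5 4.5 [40]
```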
What is the most common transformation of non-normally distributed data?
Log transformation of positively skewed (right-skewed) data. This creates a normal distribution curve, for which parametric tests can be used during data analysis.
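The effect of a log transformation can be checked numerically. The sketch below assumes simulated log-normal data (so the logged values are normal by construction) and uses a simple moment-based skewness measure; the function name and the data are illustrative only.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based skewness: positive for a long right tail, near 0 if symmetric."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed data drawn from a log-normal distribution.
raw = [rng.lognormvariate(0, 1) for _ in range(2000)]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2))     # clearly positive (long right tail)
print(round(skewness(logged), 2))  # close to 0 (roughly symmetric)
```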
What is the purpose of scatterplots
To provide a visual representation of the relationship between two variables
- Strength of the relationship
- Degree of linearity
- Association positive or negative
- Presence of outliers
What is the indication to use a scatterplot
To understand the nature of the relationship between two continuous variables
What is the correlation coefficient
Numerical value depicting the correlation between two continuous variables:
Expresses both the magnitude (0 to 1) and the direction of the correlation (positive or negative)
What is the coefficient of determination and how is it calculated? Give an example.
Coefficient of determination is the square of the correlation coefficient. If r = 0.7 then coefficient of determination = 0.49.
0.49 means that 49% of the variation in one variable can be explained by the other variable and 51% is due to other factors.
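One way to see why the squared correlation is read as "variation explained" is to fit a straight line and compare the leftover variation with the total. The sketch below uses five invented data points; the function names are hypothetical and the line is an ordinary least-squares fit.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def explained_fraction(xs, ys):
    """Share of the variation in ys accounted for by a least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical paired measurements, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
print(round(r, 2))                           # 0.8
print(round(r ** 2, 2))                      # 0.64
print(round(explained_fraction(xs, ys), 2))  # 0.64
```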
What are the criteria required for causation
- Causative occurrence must precede the effect
- If cause occurs then effect should occur
- If cause does not occur then effect should not occur
Correlation does not imply causality
What are the limitations of correlation and scatterplots
- Correlation does not imply causation
- A lack of linear correlation does not rule out a non-linear relationship between the variables
What is the difference between ordinal and continuous data
Ordinal data is categorical data with a set order to it. The interval between categories is not known
Continuous data is not categorical and exists on an increasing or decreasing scale with known intervals
What are the different types of correlation coefficients
Pearson’s (r) correlation coefficient
- Measures the linear association between two continuous variables
Spearman’s (rho) correlation coefficient
- Measures the association between two ordinal variables OR 1 continuous and 1 ranked variable
Kendall’s (tau) correlation coefficient
- Measures the association between two ordered (ranked) categorical variables
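The three coefficients can be compared on the same data with a short pure-Python sketch. The helper names and the data are hypothetical; Spearman is computed here as Pearson on ranks, and the Kendall function is the simple tau-a form with no correction for ties, which is an assumption of this sketch rather than what statistical packages necessarily report.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (linear association)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs, no tie correction."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical data: ys always rises with xs, but not in a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(pearson_r(xs, ys) < 1)  # True: the relationship is not perfectly linear
print(spearman_rho(xs, ys))   # 1.0: the rank order agrees perfectly
print(kendall_tau(xs, ys))    # 1.0: every pair is concordant
```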