Critical Numbers Flashcards
What is a sample?
- Rarely collect information on everyone of interest
- So we can take a representative sample from the population of interest (population = group of people we are interested in, not whole population)
- We describe our sample using descriptive statistics
- We make inference about our population using inferential statistics
What is bias?
- Arises when imperfections in the research process cause our findings to deviate from the truth
- Can occur in all studies
- Can occur intentionally or unintentionally
- Impacts the validity and reliability of findings
- We should consider it when critically evaluating the research of others
What is sampling bias?
Sample does not represent population of interest
What is recall bias?
Inaccurate recall of past events/exposures/behaviours
What is information bias?
Incorrect measurement, e.g. a miscalibrated machine
What is the Hawthorne effect?
Participants change their behaviours when they know they are being observed
What is attrition bias?
Differential dropout from studies e.g. sicker patients have to drop out so we end up only measuring the healthier participants
What is confounding?
- A confounder is a variable related to both the exposure and the outcome
- Confounders obscure the real effect of an exposure on an outcome
- If unaccounted for, confounding can be a form of bias
What is an experimental study design?
The researchers have intervened in some way
What is an observational study design?
The researchers have not intervened, merely observed
What is a retrospective observational study design?
Looking back into the past
What is a cross-sectional observational study design?
A single snapshot in time
What is a prospective observational study design?
Following up over time
What is a randomised controlled trial?
- Randomly allocate participants to different interventions and follow up
- Experimental and prospective
What is a cluster randomised controlled trial?
Participants randomised in groups (e.g. by GP centre or therapist) rather than at the individual level
What is a crossover randomised controlled trial?
Participants receive both interventions in a randomised order.
What is a multi-arm and factorial randomised controlled trial?
Two or more interventions evaluated in a single study
What is an adaptive randomised controlled trial?
Information accruing during the trial is used to inform pre-planned design adaptations
What are the benefits of a randomised controlled trial?
- Randomisation reduces potential for confounding
- Can reduce bias
- Can determine causal effects
What are the negatives of a randomised controlled trial?
- Randomisation can be unfeasible or unethical
- Require expert management and oversight, especially in ‘high risk’ interventions
- Expensive
What is a cohort study?
- Non-randomised (one group may be exposed, the other unexposed)
- Observational
- Typically prospective
What are the benefits of a cohort study?
- Useful when random allocation not possible
- Can work for rare exposures – select participants on the basis of exposure
- Can examine multiple outcomes
What are the negatives of a cohort study?
- May require long follow-up
- Can be expensive
- Not ideal for rare outcomes
What is a case-control study?
- Non-randomised
- Observational
- Retrospective (start from cases who have the outcome and controls who do not, then look back for the exposure)
What are the benefits of a case-control study?
- Faster: use past data so do not require long follow-up
- Useful for rare outcomes: select participants on the basis of outcome
- Cheaper
What are the negatives of a case-control study?
- More prone to bias or poor quality data
- Harder to show causal relationship
- Not ideal for rare exposures
What is a cross-sectional study?
- Non-randomised
- Observational
- Single time point
- Exposure and outcome status are measured in the sample at the same time (exposed or unexposed, with or without the outcome)
What are the benefits of a cross-sectional study?
- Relatively quick
- Cheap
- Can assess multiple exposures/outcomes
What are the negatives of a cross-sectional study?
- Susceptible to bias
- Cannot prove causality
- Not ideal for rare exposures/outcomes
What is an ecological study and what are the pros and cons?
The unit of observation is group (aggregate) rather than individual
e.g. Electoral ward, country
Some pros:
- Large-scale comparisons
- Can quantify geographical or temporal trends
Some cons:
- Ecological fallacy
- Cannot make inference at the individual level
What can categorical variables be?
- Binary
- Ordinal
- Nominal
What can numeric variables be?
Discrete or continuous
What is binary (categorical data)?
Only two categories (e.g. positive and negative)
What is ordinal (categorical data)?
Categories with natural order (e.g. stage of cancer)
What is nominal (categorical data)?
Categories with no natural order (e.g. blood group)
What is discrete (numeric data)?
Observations can only take certain numerical values (e.g. number of children)
What is continuous (numeric data)?
Observations can take any value within a range (e.g. height)
What is a proportion?
The number with a characteristic or outcome divided by the total number. Used to describe probability or risk (scale 0-1)
What is a percentage?
Proportion multiplied by 100
What are odds?
The number with an exposure or outcome divided by the number without.
The ratio of the probability of an event occurring to the probability of it not occurring
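Worked example: a minimal Python sketch of proportion, percentage and odds. The counts (20 of 100 participants with the outcome) and the function names are made up for illustration.

```python
def proportion(with_outcome, total):
    """Number with the outcome divided by the total number (scale 0-1)."""
    return with_outcome / total


def odds(with_outcome, total):
    """Number with the outcome divided by the number without."""
    return with_outcome / (total - with_outcome)


# Hypothetical counts: 20 of 100 participants have the outcome
p = proportion(20, 100)       # risk of 0.2
pct = p * 100                 # the same risk as a percentage (20%)
o = odds(20, 100)             # 20 with / 80 without = 0.25
odds_from_p = p / (1 - p)     # the same odds, computed from the probability
```

Note that the proportion (0.2) and the odds (0.25) differ even though they describe the same data.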
The incidence of health-related events or outcomes is often presented as a rate. What is a rate?
A rate is the frequency of an event per unit of another measurement (often person-time at risk). This allows us to account for variation in how long individuals are at risk.
Once an outcome has occurred, an individual is no longer at risk, either permanently or for some period of time.
Person-time at risk is not always known and may be approximated
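Worked example: a minimal Python sketch of a rate per 1,000 person-years. The figures and the function name `incidence_rate` are hypothetical, and the person-time is assumed to be known rather than approximated.

```python
def incidence_rate(events, person_years, per=1000):
    """Events per `per` person-years at risk."""
    return events * per / person_years


# Hypothetical: 30 events observed over 12,000 person-years of follow-up
rate = incidence_rate(30, 12_000)   # 2.5 events per 1,000 person-years
```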
What is the risk difference?
Difference in proportions between groups
If there is no difference this will be 0
What is the risk ratio AKA relative risk?
The risk in one group divided by the risk in the other
If there is no difference the ratio will be 1
Ratios >1 indicate higher risk/odds in group of interest
Ratios <1 indicate lower risk/odds in group of interest
The more common the outcome, the more apparent the difference between risk and odds ratios
What is an odds ratio?
Odds in one group divided by the odds in the other
If there is no difference the ratio will be 1
Ratios >1 indicate higher risk/odds in group of interest
Ratios <1 indicate lower risk/odds in group of interest
The more common the outcome, the more apparent the difference between risk and odds ratios
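Worked example: a minimal Python sketch computing the risk difference, risk ratio and odds ratio from a 2x2 table. The cell labels a, b, c, d and both sets of counts are made up for illustration. The two tables have the same risk ratio, but the odds ratio moves further from it when the outcome is common.

```python
def risk_measures(a, b, c, d):
    """2x2 table counts:
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a / b) / (c / d),
    }


# Rare outcome: risks of 2% vs 1% -> risk ratio 2, odds ratio about 2.02
rare = risk_measures(2, 98, 1, 99)

# Common outcome: risks of 60% vs 30% -> risk ratio 2, odds ratio about 3.5
common = risk_measures(60, 40, 30, 70)
```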
What is the mean?
Sum of the values divided by the count
What is the median?
Order the values, then take the midpoint
What is the mode?
The most common value
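Worked example: a minimal sketch using Python's built-in `statistics` module on a small made-up data set.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]   # hypothetical observations

mean = statistics.mean(values)       # sum (30) divided by count (6) = 5
median = statistics.median(values)   # midpoint of the ordered values: (3 + 5) / 2 = 4
mode = statistics.mode(values)       # most common value = 3
```

With an even number of observations the median is taken as the average of the two middle values.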
How is the mean typically reported?
With the standard deviation
How is the median typically reported?
With a central range (e.g. the interquartile range)
What is standard deviation?
- Describes the dispersion of values around the mean
- When describing samples the mean is denoted by x̄ and the SD by s
- When describing populations the mean is denoted by µ and the SD by σ
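Worked example: a minimal Python sketch of the sample SD (s) and the population SD (σ). It assumes the usual convention, not stated on the card, that the sample SD divides by n - 1 while the population SD divides by n. The data are made up.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation s (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

s = sample_sd(values)                # about 2.14
sigma = statistics.pstdev(values)    # population SD (divides by n) = 2.0
```

`statistics.stdev(values)` gives the same result as the hand-written `sample_sd`.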
When reporting the median how do we quantify the variability of the data?
Range – the lowest value and the highest value
Centiles – the median is the 50th centile. We can describe the spread using centiles around it, e.g. the 5th to 95th centiles give the 90% central range
What is the IQR?
Interquartile range:
- the 25th to 75th centile, which gives the 50% range
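Worked example: a minimal Python sketch of the range, median and interquartile range using `statistics.quantiles`. The data are made up, and different software uses slightly different quantile methods, so exact centile values can vary a little between packages.

```python
import statistics

values = [3, 5, 7, 8, 9, 11, 13, 15, 21]   # hypothetical observations

lowest, highest = min(values), max(values)   # the range

# n=4 splits the data into quarters and returns the three cut points:
# 25th centile, 50th centile (the median) and 75th centile
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = (q1, q3)   # the central 50% of the data lies between these values
```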
What is a normal distribution curve?
The Gaussian distribution or the “bell-shaped curve”
If the data are normally distributed, how will the mean and median compare?
They will be the same
What happens to the normal distribution curve if the SD is bigger?
The curve is wider and flatter (the peak is lower)
What does positively/right skewed mean?
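The distribution has a long tail to the right: high values pull the mean up, so the median is lower than the mean
What does negatively/left skewed mean?
The distribution has a long tail to the left: low values pull the mean down, so the median is higher than the mean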
Is the mean affected by skew?
Yes, it is ‘pulled out’ by extreme values
Is the median affected by skew?
No, it is robust to extreme values: it will always have 50% of the data to either side
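Worked example: a minimal Python sketch with two made-up data sets that differ only in their largest value, showing how one extreme value moves the mean but not the median.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]   # same data with one extreme high value

mean_sym = statistics.mean(symmetric)         # 3
median_sym = statistics.median(symmetric)     # 3

mean_skew = statistics.mean(right_skewed)     # 12, pulled up by the extreme value
median_skew = statistics.median(right_skewed) # still 3
```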
What is a parametric/ non-parametric statistical model?
Parametric – make distributional assumptions
Non-parametric – make no distributional assumptions (distribution-free)
What is the shape of the normal distribution?
Symmetric (mean, median and mode are equal)
What is the 68-95-99.7 rule?
In a normal distribution, approximately:
68% of values lie within 1 SD of the mean
95% of values lie within 2 SD of the mean
99.7% of values lie within 3 SD of the mean
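Worked example: a minimal Python sketch that checks the rule against the standard Normal distribution using `statistics.NormalDist`. The helper name `within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def within(k):
    """Proportion of a Normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


one_sd = within(1)     # about 0.683
two_sd = within(2)     # about 0.954
three_sd = within(3)   # about 0.997

# The multiplier that captures exactly 95% is about 1.96 rather than 2
z_95 = standard_normal.inv_cdf(0.975)
```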
What is correlation?
- A measure of the linear relationship between two variables
- Quantified by the correlation coefficient r
- r is bound between -1 and 1
- The closer r is to 1 or -1, the stronger the correlation
- The closer r is to 0, the weaker the correlation
- Can be positive (as one variable increases, so does the other)
- Or negative (as one variable increases, the other decreases)
- The ordering of the variables does not matter
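Worked example: a minimal Python sketch of the correlation coefficient r, assuming r here means Pearson's correlation coefficient. The function name and data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises with x: r is 1 (up to rounding)
y_down = [10, 8, 6, 4, 2]   # falls as x rises: r is -1 (up to rounding)
y_noisy = [2, 1, 4, 3, 5]   # positive but imperfect: r is between 0 and 1

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_noisy = pearson_r(x, y_noisy)
```

Swapping the arguments, `pearson_r(y_noisy, x)`, gives the same value, which is what "the ordering of the variables does not matter" means.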
Why do we take all these measurements on data?
- We can assess Normality
- We can identify outliers (also useful for identifying data entry errors)
- We can determine whether data might benefit from transformation
- We can assess collinearity
- We can choose a method of analysis best suited to our research question and data:
- Parametric – make distributional assumptions
- Non-parametric – make no distributional assumptions (distribution-free)
What is statistical inference?
- Descriptive statistics relate to the sample
- Inferential statistics relate to the population
- We infer properties of the population by using sample statistics to derive estimates of population parameters and test hypotheses
- When making inference from a sample we need to account for uncertainty in our sample estimates
What is the problem with random sampling?
It produces sampling variation: different samples give different estimates, and we need to account for this when making inference
What is the central limit theorem?
If we were to take repeated samples and calculate the mean each time, those sample means would be approximately Normally distributed around the true population mean, even if the population itself is not Normally distributed (provided the samples are reasonably large)
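Worked example: a minimal Python simulation sketch of the central limit theorem. The population (a right-skewed exponential distribution with mean 1 and SD 1), the sample size of 50 and the 2,000 repeats are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(1)   # fixed seed so the simulation is repeatable

SAMPLE_SIZE = 50
REPEATS = 2000


def one_sample_mean():
    """Draw one sample from a skewed population and return its mean."""
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(sample)


sample_means = [one_sample_mean() for _ in range(REPEATS)]

centre = statistics.mean(sample_means)     # close to the population mean (1)
spread = statistics.stdev(sample_means)    # close to 1 / sqrt(50), about 0.14
middle = statistics.median(sample_means)   # close to the mean: roughly symmetric
expected_spread = 1 / math.sqrt(SAMPLE_SIZE)
```

The spread of the sample means is the quantity the standard error estimates.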
What is the standard error?
The standard error is a type of standard deviation
(It is the standard deviation of the sampling distribution)
(Both are measures of spread)
The standard Deviation is for Describing
The standard Error is for Estimating
- The standard error indicates how different a sample mean is likely to be from the population mean
- It tells us the precision of estimation
- The smaller the standard error of the mean, the more precise our estimate of the mean
i.e. the closer it is likely to be to the true population mean
How do we calculate the standard error?
SE = SD / √n
What does the standard error calculation tell us?
The bigger the SD, the bigger the standard error
The bigger the sample size, the smaller the standard error
This makes sense because the less variable the data are, the more precise our estimation.
The more people we sample, the better the representation and therefore the more precise our estimation.
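Worked example: a minimal Python sketch of the standard error of the mean, SD divided by the square root of n. The SD and sample sizes are made up for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


se_small_sample = standard_error(15, 25)    # 15 / 5  = 3.0
se_large_sample = standard_error(15, 100)   # 15 / 10 = 1.5 (bigger n, smaller SE)
se_more_spread = standard_error(30, 100)    # 30 / 10 = 3.0 (bigger SD, bigger SE)
```

Because of the square root, quadrupling the sample size only halves the standard error.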
What is a confidence interval?
We can use the sample mean and standard error of the mean and properties of the Normal distribution to calculate a range of values we can be confident includes the true mean
This is called the confidence interval
We are now no longer just describing our sample – we are now making inference about the population parameter
What factors affect confidence interval width?
- Variability in the sample (SD)
- Sample size (n)
- The desired level of confidence: typically we use 95% but it could be 90%, 99%, etc.
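Worked example: a minimal Python sketch of a confidence interval for a mean, calculated as mean ± z × SE using the Normal distribution. The summary figures are made up. This is a large-sample approximation: for small samples a t distribution is usually used instead, which is not shown here.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# Hypothetical sample: mean 100, SD 15, n = 100
ci_95 = mean_confidence_interval(100, 15, 100)              # about (97.1, 102.9)
ci_99 = mean_confidence_interval(100, 15, 100, level=0.99)  # wider: more confidence
ci_big_n = mean_confidence_interval(100, 15, 400)           # narrower: larger sample
ci_big_sd = mean_confidence_interval(100, 30, 100)          # wider: more variability
```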
What can we calculate a confidence interval for?
Means
Differences in means
Proportions
Differences in proportions
Correlation coefficients
Relative risks
Odds ratios
What is a hypothesis test?
- We can perform a statistical test to assess whether the result we have observed is likely to be ‘real’
- Or if it is more likely there is no true difference and we are just seeing chance variation
- To do this we test the hypothesis of no difference between groups
- We then weigh up the strength of the evidence against that hypothesis
- And come to a conclusion
What is probability?
- Probability values range from 0 to 1
(though as you’ve seen we often x100 to express as a percentage)
- A probability of 0 means an event is impossible
- A probability of 1 means an event is certain
- So the smaller the probability the less likely the outcome
What is the first step in completing a hypothesis test?
- Define the null hypothesis:
- This is typically the theory we want to disprove
- We will assume this hypothesis is true until we see sufficient evidence to the contrary
- Denoted H0
- In our example:
H0 = no difference in mean IQ between groups
What is the second step in completing a hypothesis test?
- Define the alternative hypothesis:
- This is the opposite theory to the null
- Denoted HA or H1
- In our example:
HA = there is a difference in mean IQ between groups
What is the third step in completing a hypothesis test?
Choose a significance level for the test:
- This is how we determine whether our result is statistically significant
- It is also the probability we make a false positive conclusion and reject the null hypothesis when it is in fact true
- So we need to minimise this risk
- Typically it is set around 0.05 (so 5%)
What is the fourth step in completing a hypothesis test?
Perform an appropriate statistical test:
- The test calculates a test statistic from our sample data
- We then compare that test statistic to the distribution we would expect under the null hypothesis and work out the probability of our result if the null were true
What is the fifth step in completing a hypothesis test?
Decision time:
We use the probability value from the statistical test to weigh up the strength of the evidence against the null hypothesis
We call this probability value the p-value
The p-value is the probability of seeing an effect of the observed magnitude or greater if the null hypothesis were true
What happens if the p-value is high?
The result is probable under the null hypothesis, so we do not have sufficient evidence to reject it (this does not prove that the null hypothesis is true)
What happens if the p-value is smaller than our significance level (so < 0.05 in our example)?
We reject the null hypothesis
- The smaller the p-value, the less likely it is we would see our observed result under the null hypothesis
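Worked example: a minimal Python sketch of the five steps for the IQ example, using a two-sample z-test. The choice of test, the function name and the summary figures (means of 105 and 100, SD 15, 100 people per group) are assumptions for illustration. A z-test is a large-sample approximation and other tests may be more appropriate for real data.

```python
import math
from statistics import NormalDist

ALPHA = 0.05   # step 3: significance level


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided z-test of H0: no difference in means between groups."""
    se_difference = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_difference             # step 4: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # probability under H0
    return z, p_value


# Steps 1 and 2: H0 = no difference in mean IQ, HA = there is a difference
z, p_value = two_sample_z_test(105, 15, 100, 100, 15, 100)

# Step 5: z is about 2.36 and p is about 0.018, which is below 0.05
reject_null = p_value < ALPHA
```

If the two group means were identical, z would be 0 and the p-value would be 1.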
What does the confidence interval give us with hypothesis testing?
- Gives our plausible range for the true population difference
- Can be used to determine statistical (and clinical) significance
- Thus is more informative than the p-value alone
What is the difference between clinical and statistical significance?
- Statistical significance just means an observed result is unlikely to be due to chance alone
- Clinical significance means the result is practically important