Interpreting Evidence Flashcards
Why do we interpret evidence?
We interpret evidence so that we can understand and critically appraise quantitative analysis - statistics
What is interpreting evidence in Medicine and when do we use it in practice?
Interpreting evidence is Evidence Based Medicine!
- We evaluate evidence presented in literature from new drug trials or interventions
- We understand and interpret information patients have found on the web
- We investigate the benefits of treatment options for a patient in a particular sub-group
How do we summarise data in statistics, and how can these summaries vary?
- Mean (To find the mean, add all the numbers together then divide by the number of numbers)
- Mode (To find the mode, order the numbers lowest to highest and see which number appears the most often)
- Median (To find the median, order the numbers and see which one is in the middle of the list)
- Standard Deviation
- Interquartile Range
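As a quick sketch (not from the deck - the sample values are made up), these five summaries can be computed with Python's statistics module:

```python
import statistics

# Hypothetical sample of systolic BP readings (made-up values)
data = [118, 121, 121, 125, 130, 134, 140, 152]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle value of the sorted list
mode = statistics.mode(data)      # most frequently occurring value
sd = statistics.stdev(data)       # sample standard deviation

# Interquartile range: Q3 - Q1 (quantiles with n=4 returns Q1, Q2, Q3)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, sd, iqr)
```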
We can plot our data on graphs (frequency against a variable) to see whether the distribution is skewed, and whether the mean, mode and median still represent the data well
A negative skew has the mean shifted to the left of / lower than the peak of the graph (in the negative direction) - mean lower than median
A positive skew has the mean shifted to the right of / higher than the peak of the graph (in the positive direction) - mean higher than median
A skewed distribution is not a normal distribution of data
Few people have the extreme values that pull the mean, so the mean probably isn't the best summary to use
How would a normal distribution look on a histogram?
It looks like a symmetrical bell, slightly pointed at the peak
Why do we calculate standard deviation?
It shows us a measure of variability in the results
Around 68% of observations will lie in the range mean +/- 1 S.D and about 95% of all observations lie within 2 S.D. Someone more than 2 standard deviations from the mean is rare - only about 5% of the population lies that far out (about 2.5% in each tail)
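A minimal simulation (made up, assuming normally distributed data) that checks the 68%/95% rule empirically:

```python
import random
import statistics

random.seed(0)
# Simulate 10,000 observations from a normal distribution (mean 50, SD 10)
sample = [random.gauss(50, 10) for _ in range(10_000)]
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

within_1sd = sum(mean - sd <= x <= mean + sd for x in sample) / len(sample)
within_2sd = sum(mean - 2*sd <= x <= mean + 2*sd for x in sample) / len(sample)
print(f"within 1 SD: {within_1sd:.0%}, within 2 SD: {within_2sd:.0%}")
# Expect roughly 68% and 95%
```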
What is a confidence interval and what level do we usually use?
Confidence interval;
- Range of plausible values for the unknown population ‘parameter’ (the mean in this case)
- Calculated from the standard error
- E.g sample mean = 50, standard error (S.E) = 2
95% confidence interval (the standard) = sample mean +/- 1.96 * S.E (or 2 if doing it in your head)
E.g;
- Lower limit = 50 - (1.96 * 2) = 46.1
- Upper limit = 50 + (1.96 * 2) = 53.9
- Plausible values for the ‘true’ population mean lie between 46.1 and 53.9
- Express this as: Mean (95% CI of mean) = 50 (46.1 to 53.9)
- Can calculate CIs around means, prevalence, RRs, ORs, etc
If we repeat the study 100 times and calculate a 95% CI each time we would expect 95 of these intervals to contain the ‘true’ population mean
It's a range of values we are 95% confident includes the ‘true’ mean of our population (5% of the time the confidence interval will not include the true population parameter)
Powerful tool for making decisions about whether observed differences are likely to be due to chance alone or likely to be a true effect
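The worked example above as a small Python sketch:

```python
# 95% CI = sample mean +/- 1.96 * SE, using the worked example above
mean, se = 50, 2
lower = mean - 1.96 * se  # 46.1 (to 1 d.p.)
upper = mean + 1.96 * se  # 53.9
print(f"Mean (95% CI of mean) = {mean} ({lower:.1f} to {upper:.1f})")
```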
How do you calculate relative risk?
Relative risk (risk ratio) = Risk group 1 / Risk group 2
(If same risk then RR = 1)
Relative risk is independent of the original prevalence
Can be misleading - always state baseline (absolute) risks as well as relative risks
How do you calculate absolute risk reduction?
Absolute Risk Reduction (ARR) = Risk group 1 - Risk group 2
How do you calculate number needed to treat (NNT)?
NNT = 1 / ARR
Ignore whether the ARR is negative or positive - just use the absolute value
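Putting the three formulas together in one short sketch - the trial numbers are hypothetical:

```python
# Hypothetical trial: event rates in two groups (made-up numbers)
risk_treated = 8 / 100   # 8 of 100 treated patients had the event
risk_control = 20 / 100  # 20 of 100 control patients had the event

rr = risk_treated / risk_control   # relative risk = 0.4
arr = risk_control - risk_treated  # absolute risk reduction = 0.12
nnt = 1 / abs(arr)                 # ~8.3, so treat ~9 patients to prevent one event
print(rr, arr, nnt)
```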
What are samples?
Sample;
- In practice can’t measure every individual, measure a smaller sample - preferably a random sample of individuals to represent the population of interest
- Importantly we use the sample to ESTIMATE the ‘true’ measure of a condition or event in the population
- We can hardly ever do full population work
What is a population?
Population;
- The total group of individuals of interest to the research
E.g people with Type 1 diabetes aged 18+ in Latvia
What is hypothesis testing?
Null hypothesis H0;
* there is NO DIFFERENCE in haemoglobin levels between patients treated with oral and intravenous iron supplementation
Only when we reject the Null hypothesis can we accept the alternative one
Alternative hypothesis H1 (Research Hypothesis);
* there IS a difference in haemoglobin levels between patients in the two treatment groups
What are p-values?
P-values come from statistical tests such as t-tests
P-Value = the probability that the observed difference (between systolic BP in my clinic compared with the previous literature) occurred by chance alone… if the Null Hypothesis is true
When the P-value is 5% / 0.05 or below we reject the Null hypothesis
Does a P-value of 0.013 mean that it is likely or unlikely that the difference between BP in my sample and literature is just due to chance if Null hypothesis is true?
We choose an arbitrary cut-off of p<0.05
When the P-value for a test statistic is below 0.05 we ‘reject’ the Null Hypothesis (the Null Hypothesis being that there is no difference between my sample and the findings in the literature)
We then accept the alternative hypothesis that there IS a difference and report that “there is a significant difference”
E.g if comparing 2 statins we must state which one lowers cholesterol significantly more than the other - important!
How do you use the Bonferroni correction and why?
We use Bonferroni to reduce ‘false positive’ results (i.e Type I error)
Solution - we don’t use 5% significance for each test - we are more strict
- i.e we require a more extreme p-value
Bonferroni correction;
- If we do 5 tests then for each test only accept as significant tests with p-value < 0.05/5 = 0.01
- If we do n tests then for each test only accept as significant tests with p-value < 0.05 / n
This means that across all n tests you have only about a 5% chance of a false positive
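A small sketch of the correction - the five p-values are hypothetical:

```python
# Bonferroni: with n tests, only call a result significant if p < 0.05 / n
p_values = [0.04, 0.008, 0.2, 0.011, 0.0005]  # hypothetical results of 5 tests
n = len(p_values)
threshold = 0.05 / n  # 0.01 for 5 tests
significant = [p for p in p_values if p < threshold]
print(threshold, significant)  # only 0.008 and 0.0005 pass the corrected threshold
```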
What are t-tests, when is their use appropriate and how do we interpret the results of such tests?
T-test allows us to statistically compare means between 2 groups;
- 1 dependent continuous variable (e.g height)
- 1 independent binary categorical variable (e.g sex)
T-test is used to determine whether two means are significantly different from each other
Gives a probability (p-value) that such a difference in means (or a greater difference) would be found by chance if the Null Hypothesis is TRUE
E.g compare the height of men and women, compare the mean from your data with published literature, compare blood pressure readings before and after exercise
Example in image
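A minimal sketch using scipy (the heights are made-up values, not the example in the image):

```python
from scipy import stats

# Hypothetical heights (cm) for two groups (made-up values)
men = [178, 182, 175, 180, 185, 177]
women = [165, 170, 162, 168, 171, 166]

# Independent-samples t-test: are the two means significantly different?
t_stat, p_value = stats.ttest_ind(men, women)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05 we reject the null hypothesis of equal means
```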
What are some non-parametric tests and their use?
Non-parametric tests are used for comparing means when the data are not normally distributed (see the later card for the alternatives to each parametric test)
What extensions to the t-test exist for more than two groups?
Be aware of extensions to the t-test for comparing more than two groups: 1- and 2-way ANOVA.
What is correlation?
Measures the strength of relationship between two numerical variables
e.g Is there a relationship between maternal and daughter age at menarche?
- Measured by the correlation coefficient (r)
The correlation coefficient varies between -1 and + 1
- r = -1 a perfect negative correlation (as one variable increases the other decreases)
- r = +1 a perfect positive correlation (as one variable increases so does the other)
- For basic correlation use continuous data
A significant correlation coefficient is indicated by a p-value in statistical packages
Mother and child age at menarche r = 0.41 p<0.01
I.e there is a significant positive correlation between the age at menarche of mothers and children
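A minimal sketch using scipy - the mother/daughter ages are invented for illustration:

```python
from scipy import stats

# Hypothetical mother/daughter ages at menarche in years (made-up values)
mothers = [12.1, 13.0, 11.5, 14.2, 12.8, 13.5, 12.0]
daughters = [12.5, 12.8, 11.9, 13.8, 13.1, 13.0, 12.2]

# Pearson correlation coefficient (r) and its p-value
r, p = stats.pearsonr(mothers, daughters)
print(f"r = {r:.2f}, p = {p:.3f}")  # positive r: as one increases, so does the other
```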
What is Linear regression?
Linear Regression;
- Used to predict relationship between independent variables and an outcome (dependent) variable
- Must be linear relationship between independent and outcome
- Is an example of a model. Want to predict the change in outcome associated with a particular change in the independent variable
- Models estimate the regression coefficient, which can be thought of as the slope of the best-fitting straight line through a scatter plot of the data
- Closely related to correlation
The regression coefficient (B) has a confidence interval
The p-value for this coefficient indicates the probability of seeing this slope if the ‘true’ slope of the line is 0 (i.e the Null hypothesis is NO SLOPE)
A significant p-value (p<0.05) indicates that there is a significant slope (i.e B is not equal to 0)
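A minimal sketch using scipy's linregress - the dose/response values are invented:

```python
from scipy import stats

# Hypothetical data: dose (independent variable) vs response (outcome)
dose = [1, 2, 3, 4, 5, 6]
response = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

result = stats.linregress(dose, response)
print(f"slope B = {result.slope:.2f}, p = {result.pvalue:.4f}")
# A significant p (<0.05) means the slope differs from 0 (reject the 'no slope' null)
```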
How do you calculate the risk of an event?
Risk = Number with event / Number at risk
or
Risk = Number with disease / Total number at risk
What are odds and how are they calculated?
Odds = Number with event / number without event
E.g odds of being a smoker = number of smokers / number of non-smokers
Controls can be alive, age and gender matched
Odds are different from risk but try to estimate risk when we can’t calculate it - the difference is the denominator (number without the event, instead of the number at risk used in the risk calculation)
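A short sketch contrasting the two denominators, reusing the 70 smokers / 250 non-smokers from the odds ratio example later in the deck:

```python
# 70 smokers among 320 people in total
smokers, non_smokers = 70, 250
total = smokers + non_smokers

risk = smokers / total        # 70/320 ~ 0.22 (denominator: everyone at risk)
odds = smokers / non_smokers  # 70/250 = 0.28 (denominator: those without the event)
print(risk, odds)             # odds > risk; they are similar only when the event is rare
```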
How do we estimate from samples?
In practice we usually have a sample of individuals;
- Use the sample mean to ESTIMATE the ‘true’ mean of the population
- Use the Relative Risk from a sample to ESTIMATE the ‘true’ relative risk in the population
- Use the prevalence in a sample to ESTIMATE the ‘true’ proportion of the population
- E.g a sample of 100 patients with asthma used to estimate the rate of inhaler use in Scotland
We will have more confidence that the sample mean/ OR etc. is a good estimate of the population mean/ OR etc… if the sample is large
Larger samples - more confidence
How good is a sample mean as an estimate of the population mean?
If we took repeated samples, the variability of the sample means could be measured
This is called Standard Error (SE) (e.g standard error of the mean)
Standard error shows how variable samples are from each other !
A large SE = there is much variability in sample means, meaning many lie a long way from the population mean
A small SE = there is not much variability between the sample means
We can calculate SE from the sample
SE = SD / √n
(SD = standard deviation)
You will be given SD and n!
(A larger sample leads to a smaller SE - better!)
SE is always smaller than SD because there is less variability between sample means than between individual values
Can also calculate the standard error of a proportion, odds ratio, etc
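A small sketch of the formula (the sample values are made up):

```python
import math
import statistics

# SE = SD / sqrt(n): hypothetical sample (made-up values)
sample = [48, 52, 50, 47, 53, 49, 51, 50]
sd = statistics.stdev(sample)
se = sd / math.sqrt(len(sample))
print(f"SD = {sd:.2f}, SE = {se:.2f}")  # SE < SD, and SE shrinks as n grows
```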
What types of Variability may we see?
Between people;
- e.g differential effectiveness of treatment
- e.g do or do not develop particular side effect
- e.g differential response to environment
Within people;
- e.g measures of blood pressure over a day
- e.g strength of left and right hands
Can we distinguish when we are seeing a real effect of a new treatment from the ‘random’ or ‘natural’ variation that is always going to be present?
If we see a difference between treatment groups - could that just be by chance alone?
If the confidence intervals do not overlap, like in this example, what can we conclude?
The 95% confidence intervals around the two means do not overlap
The mean haemoglobin level under intravenous supplementation is significantly higher than with oral supplementation
How would you interpret the data in the image shown?
Aspirin within the first six weeks significantly reduces the odds of a secondary fatal stroke
The odds ratio is 0.36 with a 95% CI of 0.20 to 0.63 - even at 0.63 it is less than 1 (the control without treatment), showing aspirin decreases fatal strokes after TIA in weeks 0-6
How would you interpret the data in the image shown?
Aspirin within the first 6 weeks significantly reduces the odds of a secondary fatal stroke (as the CI is below 1, the control value)
Aspirin 6-12 weeks after TIA: no evidence that this treatment significantly lowers the odds of a subsequent fatal stroke (as the CI extends above 1, so we cannot rule out no effect)
What are some criteria for choosing statistical tests to compare groups?
Comparisons;
- Comparing our results with a gold standard
- Comparing one sample with another after an intervention
Question: When is a difference statistically significant?
- I.e when do we reject the Null hypothesis?
Answer - ideally we want a simple yes or no answer
We want a test statistic that will allow us to make a decision
- Do we have enough evidence to be able to REJECT the Null hypothesis?
We can choose tests based upon these questions
We have many tests available; the skill is in knowing which is appropriate for your outcome
- Important to understand the type of data (e.g categorical - binary, ordinal, etc - or continuous)
- Important to think about the distribution of the outcome - normal or non-normal
We can use a T-test (parametric) or Chi-square test (non-parametric)!
What are Chi-square tests, when is their use appropriate and how do we interpret the results of such tests?
Chi-square test is a Non parametric test
- Allows us to statistically determine if the difference between the observed and expected numbers (if there was no association) in each cell is significant (given the sample size!)
A difference implies a ‘relationship’ or ‘association’
- i.e values of one variable may influence values of the other
Assesses statistical significance of observed differences in proportions between mutually exclusive categories
Chi-square test;
- Pearson X2 = 4.072 df = 1 p = 0.044
I.e reject the hypothesis that there is no relationship between gender and total cholesterol status
Accept alternative hypothesis that there IS a difference between men and women with respect to cholesterol status
(This is an example of Chi-square test for independence)
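A minimal sketch using scipy - the 2x2 counts here are hypothetical, not the figures behind the X2 = 4.072 example above:

```python
from scipy import stats

# Hypothetical 2x2 table: gender vs cholesterol status (made-up counts)
#         high chol, normal
table = [[30, 70],   # men
         [45, 55]]   # women

chi2, p, df, expected = stats.chi2_contingency(table)
print(f"X2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
# If p < 0.05, reject 'no association' between gender and cholesterol status
```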
What are odds ratios and when do we use them?
Odds Ratios
Used in case-control studies or observational studies and regression models (we don’t have a denominator of people at risk, so we use the odds ratio)
E.g case control study exploring the relationship between smoking (last 5 years) and mortality by age
Want to know if relationship between smoking and early mortality
Odds of being a smoker = number of smokers / number of non-smokers = 70 / 250 = 0.28
Odds Ratio (OR) = Odds of being a smoker if CASE (if died) / Odds of being a smoker if CONTROL (if survived)
= 1.67 (died early) / 0.28 (matched control) = 6
Among those who die by the age of 60, the odds of being a smoker are 6 times higher than in the control group (who lived)
Association (relationship) between smoking and early mortality
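The same arithmetic as a tiny sketch:

```python
# Numbers from the worked example above
odds_cases = 1.67     # odds of being a smoker among cases (died early)
odds_controls = 0.28  # odds of being a smoker among matched controls

odds_ratio = odds_cases / odds_controls
print(round(odds_ratio, 1))  # ~6.0: odds of being a smoker ~6x higher in cases
```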
What are the general rules and outlines for Odds Ratio?
- If odds are equal in case and control group, Odds Ratio (OR) = 1
- Similar to risk, but remember they are not the same
- If events are rare then OR is a good approximation to the RR
- Like RR they are independent of baseline risk (Prevalence)
- Used in some types of regression (logistic) and therefore found in literature frequently
Used a lot, as studies can’t always be done prospectively
What is Analysis of Variance (ANOVA)?
T-tests can only deal with 2 means
Extensions of T-test called Analysis of Variance (ANOVA) can deal with more than 2 means
One way ANOVA - you measure the mean number of cells from replicates of 100 µl cell suspension under 4 different experimental conditions;
- Results: F-test and P-value
Two way ANOVA - two independent variables and one outcome;
- E.g 4 experimental conditions, 3 different labs
- Results: F-test and P-value
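A minimal one-way ANOVA sketch using scipy (the cell counts are made up):

```python
from scipy import stats

# Hypothetical cell counts under 4 experimental conditions (made-up values)
cond_a = [102, 98, 110, 105]
cond_b = [120, 125, 118, 122]
cond_c = [99, 101, 97, 103]
cond_d = [130, 128, 135, 126]

# One-way ANOVA: do the four condition means differ?
f_stat, p_value = stats.f_oneway(cond_a, cond_b, cond_c, cond_d)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```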
What are Non-parametric statistical tests and when do we use them?
T-test, ANOVA - have the underlying assumption that in the population the outcome has a normal distribution
What to do when;
1). May not know the distribution
2). May know the distribution is not normal
3). May have a small sample, which is unlikely to have a normal distribution
4). Using ordinal scales not continuous data
5). Have outliers and therefore not normal
We use Non-parametric statistical alternatives
What Non-parametric alternatives do we have for the following 4 parametric tests?
- Independent samples t-test = Mann-Whitney U test
- Paired t-test = Wilcoxon Signed-Rank test
- ANOVA = Kruskal-Wallis test
- Pearson Correlation = Spearman Correlation
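A minimal sketch of the first of these alternatives, the Mann-Whitney U test, using scipy (the ordinal pain scores are made up):

```python
from scipy import stats

# Hypothetical ordinal pain scores (0-10) for two independent groups
group_a = [3, 5, 4, 6, 2, 5, 7]
group_b = [6, 8, 7, 9, 5, 8, 7]

# Mann-Whitney U: non-parametric alternative to the independent-samples t-test
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```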
What are the 2 types of Chi-square test we can do?
Chi-square test for independence;
- Association between two categorical variables (e.g Is cholesterol status associated with gender?)
Chi-squared test for goodness of fit;
- Tests the difference between frequencies of a single categorical variable and some hypothesised frequency
- E.g is the frequency of depression sufferers in our sample (20%) the same as the proportion quoted in the literature?
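A minimal goodness-of-fit sketch using scipy - the 14% literature proportion is hypothetical, since the card doesn't give one:

```python
from scipy import stats

# Goodness of fit: 20 of 100 in our sample have depression;
# suppose the literature quotes 14%, so expected counts are 14 and 86
observed = [20, 80]
expected = [14, 86]

chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"X2 = {chi2:.2f}, p = {p:.3f}")
# If p < 0.05, our sample frequency differs from the literature proportion
```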
Are multiple tests a good idea and what problems may we run into?
In a sample of 100 men (50 smokers, 50 non-smokers) we test the following (in image), each time using the 5% significance level
Each time we test a difference we use the 5% level of significance;
- i.e 5% of the time we would expect to see that big a difference by chance alone if the Null Hypothesis is in fact true
A Type 1 error is rejecting a true Null Hypothesis
- I.e a “false positive” finding
If we do many tests we increase our chance of a false positive
At the 5% level of significance we could expect one in every 20 tests to be a false positive
So we look to Bonferroni correction for multiple testing