Data Analysis Flashcards
Why should you graph original data and not averages?
You can see for yourself where the average is
You can see for yourself what the spread looks like
You can see if there is anything odd/interesting about the data, e.g. outliers, evidence of skewing, etc.
How can you find true confidence intervals?
Using a t-test
Why do many sets of experimental observations conform approximately to a normal distribution?
The mathematical explanation is the central limit theorem
What kind of data may result in a normal distribution, and why?
If a variable is affected by a lot of different random factors
Each has a small effect
The effects are additive
The distribution will approximate to a normal distribution
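The conditions above can be tried out with a small simulation. This is only a sketch under assumed numbers: each simulated reading is the sum of 50 small random effects, and the choice of 50 factors, the uniform range and the seed are illustrative, not taken from these notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def simulated_reading(n_factors=50):
    """One reading = the sum of many small, independent, additive effects."""
    return sum(random.uniform(-1, 1) for _ in range(n_factors))

readings = [simulated_reading() for _ in range(10_000)]

mean = statistics.fmean(readings)
sd = statistics.stdev(readings)

# For a normal distribution, about 68% of values lie within 1 SD of the
# mean and about 95% lie within 2 SD.
within_1sd = sum(abs(r - mean) <= sd for r in readings) / len(readings)
within_2sd = sum(abs(r - mean) <= 2 * sd for r in readings) / len(readings)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

Although each individual effect is uniformly distributed (not normal), the summed readings should come out close to the 68% and 95% figures expected of a normal distribution.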
Standard deviation (SD)
A statistical tool that tells you how tightly (close together) all the individual values are clustered around the mean in a set of data
It is a more sophisticated indicator of the precision of a set of given measurements
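As a minimal illustration, two made-up sets of repeat measurements with the same mean can have very different SDs. The numbers are hypothetical, and the sample SD (n - 1 in the denominator) is assumed.

```python
import statistics

# Two hypothetical sets of repeat measurements with the same mean (10.0).
tight = [9.8, 10.1, 10.0, 9.9, 10.2]
loose = [7.0, 12.5, 10.0, 8.5, 12.0]

for name, values in (("tight", tight), ("loose", loose)):
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
```

The set with the smaller SD is the more tightly clustered, i.e. the more precise, set of measurements.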
What does it mean if effect size is not large enough compared to variation?
It means that random variations in readings might account for the difference between treated and control
What do statistical tests such as t-tests do?
They ask how the variation compares with the effect size
Why should error bars be treated as a rough indication?
With small samples, they underestimate the uncertainty
Error bars are not additive
All statistics only give a rough indication of confidence
Why is it wrong to think that if error bars don’t overlap the result is significant, and vice versa?
Biological significance is best shown by effect size, not by statistical significance
The idea that error bars just touching is equivalent to p=0.05 does not apply to small sample numbers
Idea is just silly
Confidence intervals
These get round the problem that SEM error bars are not additive.
By using a t test, we can get the best estimate, even when we have a small number of observations
The t-test can calculate the 95% CI for the difference between the means
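A rough sketch of that calculation for a treated vs control comparison is shown below. The data are hypothetical, and the critical t value (2.306 for 8 degrees of freedom) is taken from a standard t table rather than computed; an equal-variance, two-sample t-test is assumed.

```python
import math
import statistics

# Hypothetical measurements, 5 per group (illustrative numbers only).
control = [4.1, 5.0, 4.6, 5.3, 4.5]
treated = [5.9, 6.4, 5.2, 6.8, 6.2]

effect = statistics.fmean(treated) - statistics.fmean(control)

sem_c = statistics.stdev(control) / math.sqrt(len(control))
sem_t = statistics.stdev(treated) / math.sqrt(len(treated))

# SEMs are not additive: the standard error of the difference combines
# them as the square root of the sum of squares, not as sem_c + sem_t.
se_diff = math.sqrt(sem_c**2 + sem_t**2)

# Critical t for 95% confidence (two-tailed) with df = 5 + 5 - 2 = 8,
# looked up in a standard t table.
T_CRIT_DF8 = 2.306

ci_low = effect - T_CRIT_DF8 * se_diff
ci_high = effect + T_CRIT_DF8 * se_diff

print(f"effect size = {effect:.2f}")
print(f"SE of difference = {se_diff:.3f} (sum of SEMs would be {sem_c + sem_t:.3f})")
print(f"95% CI for the difference = {ci_low:.2f} to {ci_high:.2f}")
```

With equal group sizes, this standard error of the difference is the same as the pooled-variance one used by Student's t-test. For other sample sizes the critical t value would need to be looked up for the matching degrees of freedom.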
What is a 95% CI?
The CI gives us an idea of how much larger or smaller than our estimate the real answer could be. It tells us that, 95% of the time, the real answer should lie within an interval calculated this way.
What does a small p value <0.05 mean?
It indicates strong evidence against the null hypothesis, so you reject the null hypothesis. It means that, if the null hypothesis were true, the probability of getting data at least this extreme through chance/random variation alone would be less than 5%
What does a large p value >0.05 indicate?
It indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis
Advantages and disadvantages of confidence intervals
Advantages:
- Combine numerical information on effect size, statistical confidence and possible variation in the ‘real’ effect size
-ideal for simple comparisons such as treated vs control
-now the preferred approach in clinical research and epidemiology
Disadvantages:
- Harder to apply to more complex experiments, e.g. more than one control or more than one treatment
How can the effect size be compared with the variation to assess statistical reliability?
a) by eye from the raw data
b) using SEM error bars
c) more precisely using a 95% confidence interval
CI is more precise than error bars and more informative than a p value
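A small sketch of option (b): computing the SEM for each group and checking whether the mean ± SEM error bars overlap. The readings and the helper name `mean_and_sem` are hypothetical, chosen only for illustration.

```python
import math
import statistics

def mean_and_sem(values):
    """Return the mean and the standard error of the mean (SD / sqrt(n))."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.fmean(values), sem

# Hypothetical readings (illustrative numbers only).
control = [12.1, 11.4, 12.8, 11.9, 12.3, 11.7]
treated = [13.9, 14.6, 13.2, 14.8, 13.5, 14.0]

mean_c, sem_c = mean_and_sem(control)
mean_t, sem_t = mean_and_sem(treated)
effect = mean_t - mean_c

# Mean +/- SEM bars overlap when the gap between the two means is no
# bigger than the two bar lengths added together.
bars_overlap = abs(effect) <= sem_c + sem_t

print(f"control: {mean_c:.2f} +/- {sem_c:.2f}")
print(f"treated: {mean_t:.2f} +/- {sem_t:.2f}")
print(f"effect size = {effect:.2f}, error bars overlap: {bars_overlap}")
```

This gives only the rough indication described in these notes; the 95% confidence interval is the more precise comparison.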
For sample sizes above 20, how do we treat the confidence intervals?
We treat them as weak to moderate as long as the error bars don’t overlap
SEM small compared to effect size
The real answer may be a little larger or smaller than our estimate, but not much
Strong statistical confidence
SEMs are moderate compared to effect size
The real answer may be quite a bit larger or smaller than our estimate, but it is unlikely to be zero
Moderate statistical confidence
If the sample size is over 20: even when you double the larger error bar, it doesn’t overlap with the other one
SEMs are substantial compared to effect size
The real answer may be quite a lot larger or smaller than our own estimate. It could even be negative or zero
Weak statistical confidence
The error bars are close but not overlapping
SEMs are large compared to effect size
The real answer may be a lot larger or smaller than our estimate. It could even be zero or negative
Weak to very weak statistical confidence
Error bars overlapping
What do significance tests do?
They mathematically compare the variation with the effect size and calculate a number called the p value. This is the probability of getting an effect size at least this large through random variation alone, if there were no real effect
P value strength: P = 0.001, P = 0.01, P = 0.05, P > 0.05
P = 0.001: statistically very strong data
P = 0.01: moderately strong data
P = 0.05: statistically weak data
P > 0.05: weak to very weak data
Type 1 error vs type 2 error
A type 1 error is rejecting the null hypothesis (H0) and stating that the effect is significant, when in fact there is no real effect
A type 2 error is failing to reject H0 and declaring that a result is not statistically significant, when in fact there is a real effect
Give 3 reasons why null hypothesis significance testing (NHST) has proved extremely popular. For each reason, explain why this may cause problems with the correct interpretation of the data
People don’t like uncertainty, and NHST appears to give a definite answer. This can lead to type 1 and type 2 errors because it is often the wrong answer
People don’t like making decisions, so NHST lets the computer make the decision. The problem is that you should make the decision yourself, based on all the evidence
People are ambitious: NHST allows you to publish more papers, even if some conclusions are actually wrong
People are often lazy: NHST tells you a result is significant, which implies that you don’t need to do more experiments. Problems can arise when there is only weak evidence
When do we use a Student’s t-test?
When we are studying the difference between two groups of observations
- measurements on treated and control samples
- measurements on patients and normal controls
It is designed for cases where you have small numbers of observations
and you cannot tell the actual distribution of the data. It assumes a normal distribution
It compares effect size with variation and uses this to calculate a 95% confidence interval and a p value for the null hypothesis
A one-tailed t-test gives a smaller p value than a two-tailed test (half the size, when the effect is in the predicted direction)
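The calculation can be sketched using only the Python standard library. This is an illustrative implementation, not a reference one: the data are hypothetical, the helper names are made up, equal variances are assumed, and the p value is obtained by numerically integrating the t distribution with Simpson's rule.

```python
import math
import statistics

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: the area outside +/-|t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central_area

def students_t_test(a, b):
    """Unpaired, equal-variance t-test; returns (t, df, two-tailed p)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(b) - statistics.fmean(a)) / se_diff
    return t, df, two_tailed_p(t, df)

# Hypothetical measurements (illustrative numbers only).
control = [4.1, 5.0, 4.6, 5.3, 4.5]
treated = [5.9, 6.4, 5.2, 6.8, 6.2]

t_value, df, p_two = students_t_test(control, treated)
p_one = p_two / 2  # valid only if the effect is in the predicted direction

print(f"t = {t_value:.2f}, df = {df}")
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

In practice a statistics package would normally do this; the sketch is only meant to show how the effect size, the variation and the sample size feed into the t statistic and the p value.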
When do you use a paired t-test?
By looking at before and after results in single individuals, we can ignore the variation between the individuals and study the effect and the variability of the effect itself
Paired data can be analysed by a paired t test
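A minimal sketch of a paired analysis, assuming made-up before/after readings for six individuals. The critical t value (2.571 for 5 degrees of freedom) is taken from a standard t table.

```python
import math
import statistics

# Hypothetical before/after readings for the same 6 individuals.
before = [72, 85, 64, 90, 78, 69]
after = [68, 80, 63, 84, 75, 64]

# Work with each individual's own change; this removes the variation
# between individuals from the analysis.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = statistics.fmean(diffs)
sem_diff = statistics.stdev(diffs) / math.sqrt(n)
t_value = mean_diff / sem_diff

# Critical t for 95% confidence (two-tailed) with df = n - 1 = 5,
# looked up in a standard t table.
T_CRIT_DF5 = 2.571
ci_low = mean_diff - T_CRIT_DF5 * sem_diff
ci_high = mean_diff + T_CRIT_DF5 * sem_diff

print(f"SD between individuals (before) = {statistics.stdev(before):.2f}")
print(f"SD of the individual changes    = {statistics.stdev(diffs):.2f}")
print(f"mean change = {mean_diff:.2f}, t = {t_value:.2f}, df = {n - 1}")
print(f"95% CI for the change = {ci_low:.2f} to {ci_high:.2f}")
```

In this invented data set the individuals differ from each other far more than each one changes, which is the situation where pairing helps most.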
Why are non-parametric tests not always useful?
Non-parametric tests don’t work well with small numbers of observations. They do not provide a 95% CI
How to draw a confidence interval
Draw x axis with appropriate scale
Draw a line at effect size to represent the mean
The CI is saying that our mean could be as small or big as []
Draw lines at these points
Then join the lines together
When is linear regression used?
Linear regression is used when one variable is precisely fixed in advance, e.g. time or dose
Correlation is used when both variables are measurements which may have random errors or random variation
How many litres are there in one decilitre?
0.1 L in 1 dL
What can you do to reduce the probability of making a type 1 error?
Reduce significance level from 0.05 to 0.01
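A simulation can make this concrete. In the sketch below both groups are drawn from the same population, so the null hypothesis is true and every "significant" result is a type 1 error. The group size, number of experiments and seed are arbitrary choices, and the critical t values for 18 degrees of freedom are taken from a standard t table.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

N_PER_GROUP = 10      # df = 10 + 10 - 2 = 18
T_CRIT_05 = 2.101     # two-tailed critical t for alpha = 0.05, df = 18
T_CRIT_01 = 2.878     # two-tailed critical t for alpha = 0.01, df = 18
N_EXPERIMENTS = 5000

def t_statistic(a, b):
    """Equal-variance t statistic for two groups of the same size."""
    n = len(a)
    se_diff = math.sqrt((statistics.variance(a) + statistics.variance(b)) / n)
    return (statistics.fmean(b) - statistics.fmean(a)) / se_diff

false_pos_05 = 0
false_pos_01 = 0
for _ in range(N_EXPERIMENTS):
    # Same population for both groups: there is no real effect.
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    t = abs(t_statistic(control, treated))
    false_pos_05 += t > T_CRIT_05
    false_pos_01 += t > T_CRIT_01

rate_05 = false_pos_05 / N_EXPERIMENTS
rate_01 = false_pos_01 / N_EXPERIMENTS
print(f"type 1 error rate at 0.05: {rate_05:.3f}")
print(f"type 1 error rate at 0.01: {rate_01:.3f}")
```

The two rates should come out close to 5% and 1% respectively, showing that the significance level is the type 1 error rate you are accepting.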