Theme 3 - stats, how we use it to interpret our data Flashcards
give an example of how biology has an inherent variability in its data
e.g. women aged 50-70 are screened every three years for breast cancer; 4 in 100 will test positive but, on further examination, only 1 in 4 of those will be a confirmed case
what does the standard error of the mean (SEM) tell us?
it allows us to determine how representative a sample mean is of the whole population
what is the definition of standard deviation?
a measure of how far the data values are spread around the mean; the further the data points lie from the average, the larger the SD
how does sample size affect the sample mean?
the larger the sample, the closer the sample mean is to the true mean
what is the central limit theorem?
the idea that if you take many samples, the distribution of the sample means approaches a normal distribution centred on the true mean, and the larger the samples, the closer it gets
how can you turn data into a normal distribution?
by repeatedly sampling the data set, calculating the mean of each sample and plotting the distribution of those sample means
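A quick way to see this in action, sketched in Python with numpy (the skewed population and all sizes here are illustrative, not from the flashcards):

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately skewed "population" (exponential, so clearly not normal)
population = rng.exponential(scale=2.0, size=100_000)

# Repeatedly sample the data set, find each sample's mean, and collect them
sample_means = [rng.choice(population, size=50).mean() for _ in range(1_000)]

# The sample means cluster tightly around the true mean (~2.0), and their
# distribution is approximately normal even though the population is skewed
grand_mean = float(np.mean(sample_means))
```

Plotting `sample_means` as a histogram would show the bell shape the central limit theorem predicts.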
how do you calculate standard error mean (SEM)?
by dividing the SD by the square root of the sample size: SEM = SD/√n
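A minimal sketch of the calculation, using hypothetical measurement values:

```python
import statistics
from math import sqrt

# Hypothetical sample of measurements (e.g. blood glucose, mmol/L)
data = [4.8, 5.1, 5.5, 4.9, 5.2, 5.0, 5.3, 4.7]

n = len(data)
sd = statistics.stdev(data)  # sample standard deviation
sem = sd / sqrt(n)           # standard error of the mean = SD / sqrt(n)

# SEM shrinks as n grows, so larger samples estimate the mean more precisely
```

Note that the divisor is √n, not n: quadrupling the sample size only halves the SEM.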
what is the SEM?
a reflection of how accurately the sample mean estimates the true population mean
what does the length of inferential error bars suggest?
how much uncertainty there is in the data:
-wide bars indicate large error
-short inferential bars indicate high precision
what is n in stats?
the number of independent subjects in an experiment and not the number of replicates
what are the rules for error bars?
-they’re meaningless unless defined in the figure legend
-the n number should always be stated
-they should only be shown for independently repeated experiments and never for replicates
what are the two stats tests that will test for normal distribution and how do you know which to use when?
-Shapiro-Wilk test (for smaller sample sizes)
-Kolmogorov-Smirnov test (for sample sizes above 50)
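Both tests are available in scipy.stats; a sketch, assuming scipy is installed and using made-up normally distributed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
small_sample = rng.normal(loc=10, scale=2, size=20)
large_sample = rng.normal(loc=10, scale=2, size=200)

# Shapiro-Wilk: suited to smaller samples
w_stat, p_sw = stats.shapiro(small_sample)

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
p_ks = stats.kstest(large_sample, 'norm',
                    args=(large_sample.mean(), large_sample.std(ddof=1))).pvalue

# A p value above 0.05 means no significant departure from normality
```

In both tests the null hypothesis is that the data ARE normally distributed, so a low p value is evidence against normality.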
what is the null hypothesis?
there is no difference between the populations/ there’s no effect to be observed
what does the experimental hypothesis state?
there is a difference between our populations/ an effect is observed
what determines the validity of a hypothesis?
the inability to prove that it is false: a hypothesis can never be proven true, it can only survive attempts to disprove it
what is a control group? when might it need to be modified?
-a control group is a group where as many variables as possible are kept the same so that the only thing that is different is the experiment variable (the thing being tested)
-needs to be modified in observational studies by including adjustments in the statistical analysis to account for the confounding variables
how does randomisation in a blind study work?
- details of everyone taking part are put into a computer
- the computer puts each person into a treatment group at random
- the computer programme takes into account details such as age and e.g. stage of cancer to make sure all groups are as similar as possible
what is a statistical test?
a test to determine whether the observed findings are applicable to the wider population and not simply due to chance
what is a type 1 error in statistical inference?
a false positive error: there is no real difference but we see one, so we should have accepted the null hypothesis instead of rejecting it
what is a type 2 error?
a false negative error: there was a real difference but we did not detect it, so we should have rejected the null hypothesis instead of accepting it
what is the alpha value?
the acceptable chance of making a type 1 error, set before the study (the significance level, conventionally 0.05)
how is the alpha value different to the p value in statistical inference?
the alpha value is prior to the study whereas the p value is the observed result after the study is completed
what is the beta value? how can you use this to find the ‘power’ of the study?
the chance of making a type 2 error; the power of the study is 1 − β, the probability of detecting a real effect
why will the p value never be 0?
there is always a chance of making a type 1 error
what is the definition of the p value?
the probability of the data/ observations arising due to chance when the null hypothesis is true
what does it mean when the p value is less than 0.05?
the probability of the result arising by chance is sufficiently low, so we can be confident the effect being seen is real and not due to chance; hence it is statistically significant
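As an illustration of how a p value is used in practice, a sketch using scipy's independent-samples t-test on simulated control and treated groups (all values hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical measurements from a control and a treated group
control = rng.normal(loc=5.0, scale=1.0, size=30)
treated = rng.normal(loc=6.0, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(control, treated)

# Compare the observed p value to the pre-set alpha of 0.05
if p_value < 0.05:
    conclusion = "reject the null hypothesis (statistically significant)"
else:
    conclusion = "fail to reject the null hypothesis"
```

Note that alpha (0.05) is fixed before the experiment, while `p_value` is the observed result, matching the distinction drawn above.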
what is definition of sensitivity?
the proportion of diseased individuals that are correctly identified to have the condition
what is sensitivity as a proportion?
a/(a+c), using the standard 2×2 table (a = true positive, b = false positive, c = false negative, d = true negative)
specificity definition
the proportion of non-diseased individuals that are correctly identified to not have the condition
what is specificity as a proportion?
d/(b+d)
what is the positive predictive value (PPV)?
the proportion of individuals with a positive test result that actually have the disease
what is the equation for proportion of positive predictive value PPV?
a/(a+b)
what is the negative predictive value (NPV)?
the proportion of individuals with a negative test result that actually do not have the disease
what is the equation for negative predictive value (NPV) as a proportion?
d/(c+d)
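The four formulas above can be collected into one helper; the counts below are hypothetical, with a = true positives, b = false positives, c = false negatives and d = true negatives in the standard 2×2 table:

```python
# Standard 2x2 diagnostic table (a = TP, b = FP, c = FN, d = TN)
def diagnostic_metrics(a, b, c, d):
    return {
        "sensitivity": a / (a + c),  # diseased correctly identified
        "specificity": d / (b + d),  # non-diseased correctly identified
        "ppv":         a / (a + b),  # positive results that are true positives
        "npv":         d / (c + d),  # negative results that are true negatives
    }

# Hypothetical screening results: 90 TP, 50 FP, 10 FN, 850 TN
m = diagnostic_metrics(a=90, b=50, c=10, d=850)
# m["sensitivity"] is 90/100 = 0.9; m["ppv"] is 90/140, noticeably lower
```

Notice that even with 90% sensitivity, the PPV here is only about 64%, because the false positives (b) sit in the PPV denominator.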
what should the sensitivities and specificities of a good test include?
high sensitivities and specificities
how do sensitivity and specificity affect PPV and NPV?
-high sensitivity = high NPV (few false negatives, so a negative result reliably rules out disease)
-high specificity = high PPV (few false positives, so a positive result reliably rules in disease)
how are PPV and NPV influenced by the prevalence of the condition?
a high prevalence will increase the PPV and reduce the NPV, while a low prevalence will lower the PPV and increase the NPV
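This prevalence effect can be checked with Bayes' theorem; a sketch assuming a hypothetical test with 95% sensitivity and 95% specificity:

```python
def ppv(sens, spec, prev):
    # P(disease | positive test): true positives over all positives
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

def npv(sens, spec, prev):
    # P(no disease | negative test): true negatives over all negatives
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)

# Same test applied at two different prevalences
ppv_common = ppv(0.95, 0.95, prev=0.20)  # common condition
ppv_rare   = ppv(0.95, 0.95, prev=0.01)  # rare condition
# PPV drops sharply when the condition is rare, even with a good test
```

This is why a screening test in a low-prevalence population confirms only a minority of its positives, as in the breast cancer example at the start of this deck.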