statistics Flashcards
why is data analysed
to separate the truth from the error
what 2 things does error/uncertainty arise from
- Measurements - resolution error or calibration uncertainty
- Sampling
what are the two types of uncertainty
random and systematic
what is and what causes random uncertainty
a scatter of measurements about a best value - caused by poor resolution, equipment noise or operator fatigue
what is systematic uncertainty and what causes it
constant error (bias) caused by poor calibration or methodology mistakes
which type of uncertainty can be removed from data
systematic
define precision
a tendency to have values clustered closely together
define accuracy
a tendency to lie close to the ‘true value’
define reproducibility
the likelihood that a replicate experiment would give the same data
what is precision affected by
the ability to refine measurements e.g. weighing to a certain number of significant figures
what affects accuracy
systematic errors
what affects reproducibility
random error
what is absolute uncertainty
actual magnitude of uncertainty - an approximate value based on precision of measurements
how do you calculate absolute uncertainty
Δx ≈ (xmax - xmin) / n
what is relative uncertainty and how do you calculate it
the absolute uncertainty as a fraction of the measured value: Δx / x - multiply by 100 to give a percentage
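A minimal sketch of the absolute and relative uncertainty calculations, assuming n is the number of repeat measurements and the best value is the mean. The readings are made-up illustrative numbers.

```python
from statistics import mean


def absolute_uncertainty(values):
    """Approximate absolute uncertainty: (x_max - x_min) / n.

    Assumption: n is the number of repeat measurements.
    """
    return (max(values) - min(values)) / len(values)


def relative_uncertainty(values):
    """Absolute uncertainty as a fraction of the best (mean) value."""
    return absolute_uncertainty(values) / mean(values)


# Hypothetical repeat readings of one mass, in grams
readings = [10.1, 10.3, 9.9, 10.2, 10.0]
dx = absolute_uncertainty(readings)
rel = relative_uncertainty(readings)
print(f"absolute uncertainty: {dx:.3f} g")
print(f"relative uncertainty: {rel:.4f} ({rel * 100:.2f}%)")
```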
how do you communicate an uncertainty and what is the exception
round the uncertainty to 1 s.f. and round the related measurement to the same number of d.p. - exception: if the uncertainty starts with a 1, keep 2 s.f.
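The rounding rule can be sketched as below. The function name `report` and the example numbers are hypothetical, and the leading digit is read from the uncertainty before rounding, which is one possible reading of the rule.

```python
def report(value, uncertainty):
    """Round the uncertainty to 1 s.f. (2 s.f. if its leading digit is 1),
    then round the value to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Scientific notation gives the leading digit and the power of ten
    mantissa, exponent_text = f"{uncertainty:.10e}".split("e")
    exponent = int(exponent_text)
    sig_figs = 2 if mantissa[0] == "1" else 1
    # Number of decimal places that keeps sig_figs significant figures
    decimals = sig_figs - 1 - exponent
    return round(value, decimals), round(uncertainty, decimals)


print(report(12.3456, 0.32))    # leading digit 3: uncertainty kept to 1 s.f.
print(report(2.34567, 0.0137))  # leading digit 1: uncertainty kept to 2 s.f.
```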
how can you reduce uncertainty
repeat measurements to form series
remove outliers
define an outlier
a value that deviates significantly from the rest of the data - it has to be the largest or smallest value
how can we highlight outliers
plot values on a scatter plot to reveal those that separate from the cluster
what are the three types of statistical distributions
- Normal (parametric)
- Binomial (non-normal, non-parametric)
- Poisson (non-normal, non-parametric)
how is most continuous biological data distributed
normally
what data falls under binomial distribution
data in proportions or counts that have only two states e.g. dead or alive
what data falls under poisson distribution
rare events or very large samples with data in counts
how do you calculate frequency, and frequency density from a histogram
frequency = area of column
frequency density = column height
how do you calculate frequency density
frequency / width of the class interval
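A small sketch of the histogram relationships, using hypothetical class intervals and frequencies: the column height is the frequency density, and height times width gives back the frequency.

```python
def frequency_density(frequency, lower, upper):
    """Frequency divided by the width of the class interval."""
    width = upper - lower
    if width <= 0:
        raise ValueError("upper bound must be greater than lower bound")
    return frequency / width


# Hypothetical classes: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]
for lower, upper, freq in classes:
    height = frequency_density(freq, lower, upper)
    area = height * (upper - lower)  # column area recovers the frequency
    print(f"{lower}-{upper}: density = {height}, area = {area}")
```

Note that the widest class (20-40) has the lowest column even though its frequency is not the smallest.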
what can frequency statistics show
whether the data is sharp or broad, symmetric or skewed, and unimodal or bimodal
where can most data be found in normal distribution
in the middle - around the mean
how much data lies within 1 standard deviation of the mean when it is normally distributed
about 2/3 (roughly 68%)
how much data lies within 2 SDs of the mean when it is normally distributed
about 95%
how much data lies within 3 SDs of the mean when it is normally distributed
about 99.7%
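The fractions quoted on these cards are rounded. As a quick check, the values for an ideal normal distribution can be computed with Python's standard library; `fraction_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```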
how do we test for normal distribution
check whether the mean ± 2 SDs stays within the possible range of the variable (e.g. it should not fall below zero for a quantity that cannot be negative)
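This rough check could be sketched as follows. It assumes the sample standard deviation is used, and the data sets and the function name are hypothetical. It is only a plausibility check, not a formal normality test.

```python
from math import inf
from statistics import mean, stdev


def two_sd_range_is_possible(values, lowest_possible, highest_possible):
    """True if mean - 2 SD and mean + 2 SD both lie in the possible range."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation
    return lowest_possible <= m - 2 * s and m + 2 * s <= highest_possible


# Hypothetical heights in cm: cannot be negative, no upper limit assumed
heights = [150, 160, 165, 170, 175]
print(two_sd_range_is_possible(heights, 0, inf))

# Hypothetical skewed counts: mean - 2 SD falls below zero
counts = [0, 0, 1, 1, 2, 9]
print(two_sd_range_is_possible(counts, 0, inf))
```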
define probability
how likely an outcome is - 0 (never) to 1 (always)
what is the equation for probability
number of selected outcomes / total number of possible outcomes
define independent events
one event does not influence the probability of another event
what affects the combination of probabilities
whether the events are independent or not
what is P(A∩B) for independent events
P(A) x P(B)
what is P(A∪B)
P(A) + P(B) - P(A∩B), which reduces to P(A) + P(B) when the events are mutually exclusive
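A minimal sketch that checks the combination rules by counting outcomes for two fair dice (an illustrative example, not from the cards). The product rule holds because the two dice are independent; the plain sum P(A) + P(B) only holds for mutually exclusive events, so the general "or" rule subtracts the overlap.

```python
from fractions import Fraction
from itertools import product

# Sample space: every outcome of rolling two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """Number of selected outcomes / total number of possible outcomes."""
    selected = [o for o in outcomes if event(o)]
    return Fraction(len(selected), len(outcomes))


def first_is_six(outcome):
    return outcome[0] == 6


def second_is_six(outcome):
    return outcome[1] == 6


p_a = probability(first_is_six)
p_b = probability(second_is_six)
p_and = probability(lambda o: first_is_six(o) and second_is_six(o))
p_or = probability(lambda o: first_is_six(o) or second_is_six(o))

print(p_and == p_a * p_b)         # product rule for independent events
print(p_or == p_a + p_b - p_and)  # general addition rule
```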
why is probability important
it is used to calculate how likely a piece of evidence is under guilt or under innocence
what is the likelihood ratio
how strongly the evidence supports guilt rather than innocence
high LR (above 1) = evidence supports guilt
low LR (below 1) = evidence supports innocence
what is the likelihood ratio equation
probability of evidence given guilt / probability of evidence given innocence
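The likelihood ratio equation can be sketched as below. The probabilities are made-up numbers for illustration, and the function name is hypothetical.

```python
def likelihood_ratio(p_evidence_given_guilt, p_evidence_given_innocence):
    """P(evidence | guilt) / P(evidence | innocence)."""
    for p in (p_evidence_given_guilt, p_evidence_given_innocence):
        if not 0 <= p <= 1:
            raise ValueError("probabilities must lie between 0 and 1")
    if p_evidence_given_innocence == 0:
        raise ValueError("P(evidence | innocence) must be greater than 0")
    return p_evidence_given_guilt / p_evidence_given_innocence


# Hypothetical case: evidence very likely if guilty, rare if innocent
lr = likelihood_ratio(0.9, 0.01)
print(f"LR = {lr:.1f}")
print("supports guilt" if lr > 1 else "supports innocence")
```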
what is the one exception to the likelihood ratio and why
DNA evidence as DNA is specific to individuals