Lecture 11: Statistics Flashcards
Data analysis and error
> Data is analysed to separate the truth from the error
Error/uncertainty occurs from:
- Measurements – resolution error or calibration uncertainty
○ Reduce error by taking more accurate readings
- Sampling – reduce error by enlarging number studied
Types of uncertainty: random
> scatter of measurements about a best value
From poor resolution, noise of equipment, fatigue
Cannot be removed, but can be reduced by repeating measurements
Types of uncertainty: systematic
> From poor calibration or a mistake in methodology, e.g. equipment errors that change with temperature
Gives constant error called bias
Can be removed
The 3 factors affecting error
> Precision
Accuracy
Reproducibility
Factors affecting error: Precision
> Precision is the tendency for values to cluster closely together
- Significant figures
- Affected by ability to refine measurement e.g. weighing to 1g or 0.001g requires different balances
Factors affecting error: accuracy
> Accuracy is the tendency to match the “true value”
- Affected by systematic error e.g. contamination
- Not easily verified
- Agreement between methods?
Factors affecting error: reproducibility
> Reproducibility is “repeatability”
- Affected by random error
- Affects sensitivity/discrimination
- Estimated by replication
Measurement of uncertainty
- Absolute uncertainty is actual magnitude of uncertainty
- Is approximate value based on precision of measurements
> Calculate the change in values, n is the number of values
> Relative uncertainty is fraction or percentage of the measured value
- Is approximate value based on precision of measurements
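The relationship between absolute and relative uncertainty can be sketched in a few lines. This is only an illustration: the function name and the example reading (2.50 g with an absolute uncertainty of 0.05 g) are assumptions, not values from the lecture.

```python
def relative_uncertainty(value: float, absolute_uncertainty: float) -> float:
    """Return the relative uncertainty as a fraction of the measured value."""
    if value == 0:
        raise ValueError("relative uncertainty is undefined for a zero value")
    return abs(absolute_uncertainty / value)


# Hypothetical reading: a mass of 2.50 g with an absolute uncertainty of 0.05 g.
fraction = relative_uncertainty(2.50, 0.05)
percent = fraction * 100  # the same quantity expressed as a percentage
print(f"relative uncertainty = {fraction:.3f} ({percent:.1f}%)")
```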
Communicating uncertainty
> Quote an uncertainty rounded to 1 s.f. and then round the related measurement to this level of significance
Except for uncertainties beginning with a 1 where a further figure may be quoted
If no uncertainty given, implied uncertainty is next significant figure
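A minimal sketch of this rounding rule, assuming values are passed as decimal strings to avoid floating-point surprises. The function name `quote` and the example numbers are hypothetical; "a further figure" is interpreted here as exactly 2 s.f.

```python
from decimal import Decimal, ROUND_HALF_UP


def quote(value: str, uncertainty: str) -> str:
    """Round the uncertainty to 1 s.f. (2 s.f. if it begins with a 1),
    then round the value to the same decimal place."""
    v = Decimal(value)
    u = Decimal(uncertainty)
    if u <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = u.adjusted()                   # power of ten of the leading digit
    leading_digit = int(u.scaleb(-exponent))  # leading digit, 1 to 9
    sig_figs = 2 if leading_digit == 1 else 1
    quantum = Decimal(1).scaleb(exponent - sig_figs + 1)
    u_rounded = u.quantize(quantum, rounding=ROUND_HALF_UP)
    v_rounded = v.quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{v_rounded:f} +/- {u_rounded:f}"


print(quote("12.3456", "0.0372"))  # uncertainty rounds to 0.04
print(quote("12.3456", "0.0143"))  # begins with 1, so 0.014 is kept
```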
How to remove uncertainty/error
> Repeat measurements to form series
- Random errors cause numbers to cluster around the mean
Some values significantly deviate
- Called outliers
- Plot values on a scatter plot to show outliers
» Find those separate from the clustered values
Types of statistical distribution
> Normal (parametric) data
- Most continuous biological data is normally distributed
Non-normal (non-parametric) data
- Binomial
» Data in proportions or counts
» There are only 2 states
- Poisson
» Data is in counts
» Rare events or very large samples
Frequency distribution
> Frequency = count the occurrences of each distinct outcome
For a range of outcomes, add the frequencies together
Show in histogram
- Narrow spaces between columns for clarity
- Area of column equal to frequency
Column height is frequency density
Shows whether the data is sharp or broad, symmetrical or skewed, unimodal or bimodal
Frequency equation
> Frequency density = frequency / width of frequency interval
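As a rough illustration of the frequency density calculation for a histogram with unequal class widths, the sketch below uses made-up class intervals (lower bound, upper bound, frequency). The data and function name are assumptions.

```python
def frequency_density(frequency: float, width: float) -> float:
    """Frequency density = frequency / width of the frequency interval."""
    if width <= 0:
        raise ValueError("interval width must be positive")
    return frequency / width


# Hypothetical class intervals: (lower bound, upper bound, frequency).
classes = [(150, 160, 5), (160, 165, 10), (165, 170, 12), (170, 190, 8)]

densities = [frequency_density(f, hi - lo) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    # Column height is the density; column area (height x width) recovers f.
    print(f"{lo}-{hi}: frequency={f}, density={d:.2f}, area={d * (hi - lo):.1f}")
```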
Normal distribution and frequency data
> Continuous quantitative data
- Length, height, weight etc.
- Plot frequency (y-axis) against variable (x-axis)
- Fewer data points at edges
Most data in middle around mean
Variables X and Y are related through mean and standard deviation
Standard deviation and the normal distribution
> Approx. 2/3 (68%) of data lie within 1 SD of the mean
Approx. 95% of data lie within 2 SD of the mean
Approx. 99.7% of data lie within 3 SD of the mean
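These proportions can be checked numerically. For an ideal normal distribution, the fraction of data within k SD of the mean is erf(k/√2); the sketch below uses only the standard library, and the function name is an assumption.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of an ideal normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
```

Real data sets only approximate these figures, and small samples can differ noticeably.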
How to test for normal distribution
> Check whether the mean ± 2 SD falls within the possible range for the variable, e.g. a count cannot be negative
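A minimal sketch of this quick check, assuming the sample standard deviation and hypothetical data. It is a rule-of-thumb screen only, not a formal normality test, and the function name and bounds are assumptions.

```python
import statistics


def two_sd_within_range(data, lower: float, upper: float) -> bool:
    """Return True if mean - 2 SD and mean + 2 SD both lie inside
    the range of values the variable can possibly take."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return lower <= mean - 2 * sd and mean + 2 * sd <= upper


# Hypothetical heights in cm: mean - 2 SD is still a plausible height.
heights = [168, 170, 172, 174, 176]
print(two_sd_within_range(heights, 0, 300))

# Hypothetical counts: mean - 2 SD is negative, which a count cannot be,
# so the data are unlikely to be normally distributed.
counts = [0, 0, 1, 1, 2, 9]
print(two_sd_within_range(counts, 0, float("inf")))
```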
Interpretive databases
> Forensic science often based on small sets of experimental data
Can use data from database of surveys or technical information from manufacturers
Compare your data to database
Probability
> All outcomes equally likely
Count the number of outcomes
Probability is between 0 (outcome never occurs) and 1 (outcome always occurs)
Expressed as fraction, decimal or %
Probability equation
> Probability = number of selected outcomes / total number of possible outcomes
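The counting definition of probability can be sketched with exact fractions. It assumes all outcomes are equally likely; the card-deck example and function name are illustrations, not from the lecture.

```python
from fractions import Fraction


def probability(selected_outcomes: int, total_outcomes: int) -> Fraction:
    """Probability = selected outcomes / total possible outcomes,
    assuming every outcome is equally likely."""
    if total_outcomes <= 0 or not 0 <= selected_outcomes <= total_outcomes:
        raise ValueError("need 0 <= selected_outcomes <= total_outcomes, total > 0")
    return Fraction(selected_outcomes, total_outcomes)


# Hypothetical example: drawing an ace from a standard 52-card deck.
p_ace = probability(4, 52)
print(p_ace, float(p_ace), f"{float(p_ace) * 100:.1f}%")  # fraction, decimal, %
```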
Probability of specified outcomes
> Probability of outcome A written as P(A)
Probability of outcome B written as P(B)
P(A and B) = P(A) x P(B), if A and B are independent
P(A or B) = P(A) + P(B), if A and B are mutually exclusive
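A short sketch of these combination rules using a fair six-sided die (a hypothetical example). The product rule assumes the events are independent and the simple sum rule assumes they cannot both happen; the general form, P(A or B) = P(A) + P(B) - P(A and B), is included for overlapping events. All function names are assumptions.

```python
from fractions import Fraction


def p_and_independent(p_a: Fraction, p_b: Fraction) -> Fraction:
    """P(A and B) for independent events."""
    return p_a * p_b


def p_or_exclusive(p_a: Fraction, p_b: Fraction) -> Fraction:
    """P(A or B) for mutually exclusive events."""
    return p_a + p_b


def p_or_general(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> Fraction:
    """P(A or B) when A and B can both occur."""
    return p_a + p_b - p_a_and_b


one_face = Fraction(1, 6)  # probability of any single face of a fair die

two_sixes = p_and_independent(one_face, one_face)  # six on each of two rolls
one_or_two = p_or_exclusive(one_face, one_face)    # a 1 or a 2 on one roll
at_least_one_six = p_or_general(one_face, one_face, two_sixes)  # over two rolls

print(two_sixes, one_or_two, at_least_one_six)
```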
Why is probability important?
> Use it to calculate likelihoods of finding evidence
- Probability of evidence given guilt
- Probability of evidence given innocence
Ratio of these is called likelihood ratio
LR = probability of evidence given guilt / probability of evidence given innocence
High number (LR well above 1) suggests guilt
Low number (LR well below 1) suggests innocence
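The likelihood ratio can be sketched as below. The two probabilities are invented purely for illustration, and the function name is an assumption; in practice they would come from an interpretive database or survey data.

```python
def likelihood_ratio(p_evidence_given_guilt: float,
                     p_evidence_given_innocence: float) -> float:
    """LR = P(evidence | guilt) / P(evidence | innocence)."""
    if p_evidence_given_innocence <= 0:
        raise ValueError("P(evidence | innocence) must be greater than zero")
    return p_evidence_given_guilt / p_evidence_given_innocence


# Hypothetical figures: the evidence is very likely if the suspect is guilty
# and rare if the suspect is innocent.
lr = likelihood_ratio(0.9, 0.001)
print(f"LR is roughly {lr:.0f}")
```

An LR of 1 means the evidence is equally likely under both propositions and so does not favour either.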