STATS Flashcards
Population
A population is the entire collection of objects or outcomes about which information is sought
Sample
A sample is a subset of a population, containing the objects or outcomes that are actually observed
Simple random sample
A simple random sample (SRS) of size n is a sample chosen by a method in which each collection of n population items is equally likely to comprise the sample
n
n is the number of values in your sample. If you measure the heights of students in a class of 27, then n = 27
Median
Median – the middle number of the ordered set of values
– If n is odd, then the median is the middle number
* 1,2,4,5,6 median = 4
– If n is even, then the median is the average of the two values in the middle positions
* 2,3,4,7,9,10 median = (4+7)/2 = 11/2 = 5.5
Mode
Mode – the most frequently occurring value in a sample
– 2,2,2,3,3,3,3,5,5,6,6 mode = 3
Range
Range – the difference between the largest and smallest values in a sample
– 23, 33, 35, 55, 70 range is 70-23=47
Mean
The sum of the values divided by the number of values, written X̄ (X-bar)
– 1,3,4,5,7 sum = 20 X̄ = 20/5 = 4
Means are not always meaningful by themselves!
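The worked examples on the median, mode, range, and mean cards can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and the data are the numbers from the cards.

```python
import statistics

# Data copied from the cards above
odd_sample = [1, 2, 4, 5, 6]
even_sample = [2, 3, 4, 7, 9, 10]
mode_sample = [2, 2, 2, 3, 3, 3, 3, 5, 5, 6, 6]
range_sample = [23, 33, 35, 55, 70]
mean_sample = [1, 3, 4, 5, 7]

median_odd = statistics.median(odd_sample)    # middle value of an odd-sized sample
median_even = statistics.median(even_sample)  # average of the two middle values
mode_value = statistics.mode(mode_sample)     # most frequent value
sample_range = max(range_sample) - min(range_sample)
mean_value = statistics.mean(mean_sample)     # sum of values / number of values

print(median_odd, median_even, mode_value, sample_range, mean_value)
```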
Variance
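A measure of how far a set of numbers is spread out from its mean: the average of the squared deviations from the mean (dividing by n - 1 for a sample)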
Standard deviation
the square root of the variance
discrete data
A count of whole events, objects, or persons. For example, the number of people with a certain illness is a discrete quantity.
Continuous data
The measure of a quantity such as length, volume, or time, which can occur at any value. For example, the concentration of glucose in the blood is a continuous quantity. Even if the instrument you are using rounds off values to whole numbers, these quantities are still continuous.
Standard deviation
Expresses the degree to which the data results tend to vary about the mean value
– SD is the square root of the variance of the sample
– SD - measures precision
– SD - used to set confidence limits upon which control result acceptability is determined
Standard deviation formula
SD = sqrt( Σ(x - X̄)² / (n - 1) ), where x is each value, X̄ is the mean, and n is the number of values in the sample
Range of standard deviation
Normal range = average for the population +/- 2 SD
Example: glucose
Average = 90 mg/dL
1 SD = 10 mg/dL
90 + 2 SD = 110 mg/dL
90 - 2 SD = 70 mg/dL
Normal range = 70-110 mg/dL
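The glucose arithmetic above can be written as a small helper. This is a sketch; `reference_range` and its parameter `k` are hypothetical names, not part of the cards.

```python
def reference_range(mean, sd, k=2):
    """Return (low, high) limits as mean -/+ k standard deviations."""
    return mean - k * sd, mean + k * sd

# Glucose example from the card: mean 90 mg/dL, 1 SD = 10 mg/dL
low, high = reference_range(90, 10)
print(f"normal range = {low}-{high} mg/dL")
```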
Exceptions to Range
When any detectable value is abnormal
- drug screens, disease markers, morphology
When the test is qualitative
- HIV, throat culture, genetic testing
When monitoring response to medication
- refer to therapeutic range
Levey-Jennings chart
– Graphically displays the assay (QC) values of replicated controls vs. time or consecutive runs
– Confidence limits are calculated from the mean and SD. It is customary to use +/- 2 SD as the confidence limits. (95% confidence limit)
Quality control charts are used to record the results of measurements on control samples, to determine if there are systematic or random errors in the method being used.
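A minimal sketch of the check behind such a chart: compute the +/- 2 SD limits from the control's established mean and SD, then flag any run that falls outside them. The control mean, SD, and daily values below are made-up illustration data, and `flag_out_of_control` is a hypothetical helper name.

```python
def flag_out_of_control(values, mean, sd, k=2):
    """Return (run number, value) pairs that fall outside mean +/- k*SD."""
    lower, upper = mean - k * sd, mean + k * sd
    return [(run, v) for run, v in enumerate(values, start=1)
            if not lower <= v <= upper]

# Hypothetical control: established mean 100, SD 4, so limits are 92 to 108
control_mean, control_sd = 100.0, 4.0
daily_values = [101, 97, 104, 109, 99, 91, 100]  # made-up consecutive runs

flags = flag_out_of_control(daily_values, control_mean, control_sd)
print(flags)  # runs 4 and 6 are outside the limits
```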
Accuracy
How close a value comes to the true value; established by calibration
Precision
The reproducibility of a value
- evaluated by use of QC materials- evaluates the degree of fluctuation in the measurements
To be reliable
A method must be both accurate and precise
Measures of precision
– Variance
– Standard Deviation
– Coefficient of Variation
– F-Test
Measures of accuracy
– T-Test
– Linear Regression Analysis
Sensitivity
– the probability that a test correctly identifies actual positives
= # true positives/[# true positives + # of false negatives]
Specificity
– the probability that a test correctly identifies actual negatives
= # true negatives/[# of true negatives + # of false positives]
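The two ratios can be sketched as plain functions. The counts used here (90 true positives, 10 false negatives, 80 true negatives, 20 false positives) are hypothetical and only illustrate the arithmetic.

```python
def sensitivity(true_pos, false_neg):
    """True positives / (true positives + false negatives)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negatives / (true negatives + false positives)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for illustration only
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```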
Sampling errors
one of the major difficulties in obtaining reliable results involves the sample collection procedure
– Time of day the sample is obtained
– The patient’s position, state of physical activity
– Storage condition and aging of sample
Procedural errors
– Aging of chemicals/reagents
– Personal bias (limited experience)
– Laboratory bias (because of variations in standards, reagents, environment, methods, and equipment)
– Experimental error (resulting from change in method, instruments, or personnel)
Outliers
-Outliers are data points that are much larger or smaller than the mean of sample points
* Outliers should not be deleted without considerable thought and documentation.
Evaluating QC data - Co-efficient of Variation (CV)
Allows comparison of different test methods, and of data from one laboratory with that of another, by expressing the SD of each set as a percentage of the mean
CV formula
- CV is expressed as a percent %.
- CV = (SD/X)*100
– SD = SD of a procedure
– X=mean
Acceptable CV of 5% or less (most labs use 3%)
Using CV- Index of variability
- Procedures with increasing CV values demonstrate decreased precision, since a higher CV reflects greater variability among the replicate samples
Example of CV index of variability
– Procedure A has a SD of 3mg/dl and a mean value of 100.
– Procedure B has a SD of 5mg/dl and a mean of 250
Which procedure would you recommend and why?
– CV Procedure A = (3/100) × 100 = 3%
– CV Procedure B = (5/250) × 100 = 2%
– Procedure B would be recommended, based on the lower CV value, indicating greater precision of the test with less variability.
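The comparison above, sketched in code. The function name `cv_percent` is illustrative; the SD and mean values are the ones given for Procedures A and B.

```python
def cv_percent(sd, mean):
    """Coefficient of variation: SD expressed as a percentage of the mean."""
    return sd * 100 / mean

cv_a = cv_percent(sd=3, mean=100)  # Procedure A
cv_b = cv_percent(sd=5, mean=250)  # Procedure B

# The lower CV indicates the more precise procedure
more_precise = "A" if cv_a < cv_b else "B"
print(f"CV A = {cv_a}%, CV B = {cv_b}%, recommend Procedure {more_precise}")
```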
Evaluation of Peer QC data - SDI
Standard Deviation Index—Useful to evaluate performance when comparing to another lab’s performance
Standard Deviation Index formula
SDI = (lab mean - peer group mean) / peer group standard deviation
Acceptable results= -1.0 to +1.0 SDI
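A short sketch of the SDI calculation and the acceptability check. The lab mean, peer group mean, and peer group SD below are hypothetical values chosen for illustration.

```python
def sdi(lab_mean, peer_mean, peer_sd):
    """Standard deviation index: (lab mean - peer group mean) / peer group SD."""
    return (lab_mean - peer_mean) / peer_sd

def is_acceptable(sdi_value, limit=1.0):
    """Acceptable when the SDI lies between -limit and +limit."""
    return -limit <= sdi_value <= limit

# Hypothetical values: lab mean 102, peer group mean 100, peer group SD 4
lab_sdi = sdi(lab_mean=102, peer_mean=100, peer_sd=4)
print(lab_sdi, is_acceptable(lab_sdi))
```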
methods comparison stats
- Compare the new method against old
- Are differences significant?
- Two types:
– Graphs
– T-Test
Scatterplots
A graph that can be used to give a rough impression of the shape of a sample, giving a good indication of where the sample values are concentrated and where the gaps are.
Histograms
a graphical display that gives an idea of sample “shape”, indicating regions where sample points are concentrated and regions where they are sparse
Symmetry and skewness
- A histogram is symmetric if its right half mirrors its left half; the mean and median are then approximately equal.
- Only one peak is termed “unimodal”, 2 peaks is “bimodal”
- Histograms that are not symmetric are skewed
Right or positively skewed
- A histogram with a long right-hand tail is said to be skewed to the right or positively skewed
– The mean is greater than the median
Left or negatively skewed
A histogram with a long left-hand tail is said to be skewed to the left or negatively skewed
– The mean is less than the median
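The mean-versus-median rule for skewed samples can be seen with two small made-up data sets, one with a long right tail and one with a long left tail. The numbers are hypothetical.

```python
import statistics

# Made-up samples: one large value pulls the mean to the right,
# one small value pulls the mean to the left
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print(right_mean, right_median)  # mean above median: skewed right
print(left_mean, left_median)    # mean below median: skewed left
```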
Student's t-Test
- Comparison of data
- Are they “significantly” different?
- Is the difference “statistically significant”?
- Possibility that differences are due to chance?
Student's t-Test
Null hypothesis
(H0) No significant difference between the numbers (differences are due to chance)
Alternate Hypothesis
(Ha) There IS a significant difference between the methods (differences are not due to chance alone)
- We ran a serum cortisol low control (20 μg/mL) for nine consecutive days on two different instruments
17, 21, 23, 18, 19, 20, 18, 22, 23
16, 19, 24, 23, 17, 19, 23, 21, 24
Are these two sets of numbers close enough?
- If they are, we ACCEPT the Null Hypothesis (the differences are due to chance and are not significant): p > 0.05
- If they are not, we REJECT the Null Hypothesis (the differences are NOT due to chance, which means that the two methods are not equal; alternative hypothesis): p < 0.05
Probability
- Can be different for different analyses
- Usually use 95% probability (a significance level of 0.05):
– If p is 5% or greater, the differences could plausibly be due to pure chance (random variance)
– If p falls below 5%, the differences are no longer attributed to random variance alone
– In other words, if you calculate a t-Test value and p < 0.05 (p < 5%), then you REJECT the Null Hypothesis (the differences ARE statistically significant)
- Old method for glucose:
100, 112, 125, 111, 77, 89
Mean = 102
Variance = 302
- New method for glucose:
101, 113, 120, 88, 105, 93
Mean = 103
Variance = 144
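These summary values can be reproduced with the standard `statistics` module, assuming the sample (n - 1) form of the variance, which is the form that matches the numbers on the cards. Without intermediate rounding the old-method variance comes out near 301.5 rather than 302; the difference seems to come from the card rounding the mean to 102 before squaring.

```python
import statistics

old_method = [100, 112, 125, 111, 77, 89]
new_method = [101, 113, 120, 88, 105, 93]

def summarize(values):
    """Return (mean, sample variance, sample SD) for a list of values."""
    return (statistics.mean(values),
            statistics.variance(values),  # divides by n - 1
            statistics.stdev(values))     # square root of the sample variance

old_mean, old_var, old_sd = summarize(old_method)
new_mean, new_var, new_sd = summarize(new_method)

print(f"old: mean={old_mean:.1f} variance={old_var:.1f} SD={old_sd:.1f}")
print(f"new: mean={new_mean:.1f} variance={new_var:.1f} SD={new_sd:.1f}")
```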
Variance
A measure of how far a set of numbers are from their mean.
Standard deviation
Old method for glucose: 100, 112, 125, 111, 77, 89
Mean = 102 Variance = 302 SD = 17
CV
* Old method for glucose:
100, 112, 125, 111, 77, 89
Mean = 102
Variance = 302
SD = 17
%CV = 16.7%
T-tests
Old method for glucose 100, 112, 125, 111, 77, 89
New method 101, 113, 120, 88, 105, 93
p=0.887
p>0.05, so there is no significant difference
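The card does not say which form of the t-test was used. The sketch below assumes a paired, two-sided t-test, since that form appears to reproduce the reported p of about 0.887. It uses only the standard library and integrates the t density numerically with Simpson's rule; in practice a statistics package (for example SciPy's `ttest_rel`) would normally do this. The helper names are illustrative.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=1000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

old_method = [100, 112, 125, 111, 77, 89]
new_method = [101, 113, 120, 88, 105, 93]

t_stat, df = paired_t(old_method, new_method)
p_value = two_sided_p(t_stat, df)
significant = p_value < 0.05

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}, significant: {significant}")
```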
Advanced analysis
- Linear Regression analysis - Accuracy
- More than one sample – Analysis of variance (ANOVA)
- Goodness of Fit – Chi squared test
- Multivariate analysis