Evidence Based Medicine Flashcards
Allows us to draw conclusions about the general population from a sample
Statistics
An efficient way to draw conclusions when gathering all of the data is impractical or too costly
Taking Samples
Assume that an infinitely large population of values exists and that your sample was randomly selected from a large subset of that population. Now use the rules of probability to
Make inferences about the general population
States that the sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough
The Central limit theorem
What does the Central Limit Theorem say?
The sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough
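A minimal simulation sketch of the theorem, assuming an exponential population (mean 1, SD 1) and a sample size of 50; both choices, the seed, and the helper name sample_means are illustrative assumptions, not part of the original cards. Even though the population is strongly skewed, the sample means cluster symmetrically around the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(draw, sample_size, n_samples):
    """Return the means of n_samples independent samples of size sample_size."""
    return [
        statistics.fmean([draw() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

# A clearly non-Gaussian population: exponential with mean 1 and SD 1.
means = sample_means(lambda: random.expovariate(1.0), sample_size=50, n_samples=2000)

centre = statistics.fmean(means)  # expected to be close to the population mean, 1
spread = statistics.stdev(means)  # expected to be close to 1 / sqrt(50), about 0.14

# Fraction of sample means within one spread of the centre.
# For a bell-shaped distribution this is roughly 0.68.
within_one = sum(abs(m - centre) <= spread for m in means) / len(means)

print(centre, spread, within_one)
```

Increasing sample_size should make the distribution of the means narrower and closer to Gaussian.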
If samples are large enough, the sampling distribution of the mean will be
Bell shaped (Gaussian)
Statistics come in what two basic flavors?
Parametric and Non-parametric
A class of statistical procedures that rely on assumptions about the shape of the distribution (e.g. a normal distribution) in the underlying population and about the form or parameters (e.g. mean and std. dev) of the assumed distribution
Parametric Statistics
A class of statistical procedures that does not rely on assumptions about the shape or form of the probability distribution from which the data were drawn
Non-parametric Statistics
Summarize the main features of the data without testing hypotheses or making any predictions
Descriptive statistics
Descriptive statistics can be divided into what two classes?
Measures of location and measures of dispersion
A typical or central value that best describes the data
Measures of location
What are the measures of location?
- Mean
- Median
- Mode
Describe spread (variation) of the data around that central value
Measures of dispersion
What are the measures of dispersion?
- Range
- Variance
- Std. Dev
- Std. Error
- Confidence Interval
No single parameter can fully describe the distribution of data in the
Sample
The sum of the data points divided by the number of data points
- More commonly referred to as “the average”
- Most representative when the data are approximately normally distributed
Mean
What are often better measures of location if the data is not normally distributed?
Median and Mode
The value which has half the data smaller than that point and half the data larger
Median
When choosing the median for an odd number of data points, you first
Rank the values in order, then pick the middle number
When choosing the median for an even number of data points, you
1) Rank the numbers
2) Find the middle two numbers
3) Add the two middle numbers and divide by 2
Less sensitive for extreme data points and is thus useful for skewed data
Median
The value of the sample which occurs most frequently
Mode
The mode is a good measure of
Central Tendency
Not all data sets have a single mode; some data sets can be
Bimodal
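A short sketch of the three measures of location using Python's standard statistics module. The data values are made up for illustration; the value 40 is included to show how an extreme point pulls the mean but not the median.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 40]  # hypothetical values; 40 is an extreme point

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value after ranking (odd count)
mode = statistics.mode(data)      # most frequent value

# Even number of points: the median is the average of the two middle values.
even = [1, 3, 5, 7]
even_median = statistics.median(even)

# A data set with two equally frequent values has two modes.
bimodal = [1, 1, 2, 3, 3]
modes = statistics.multimode(bimodal)

print(mean, median, mode, even_median, modes)
```

Here the mean is more than twice the median, which is why the median is often preferred for skewed data.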
On a box plot, 50% of the data falls between Q1 (25th percentile) and Q3 (75th percentile); the area encompassing this 50% is called the
Interquartile range (= Q3-Q1)
Used to display summary statistics
Box plots
To find the quartiles, put the list of numbers in order, then cut the list into four equal parts; the quartiles are at the
Cuts
The second quartile is equal to the
Median
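A sketch of the quartiles and interquartile range, assuming Python's statistics.quantiles with the "inclusive" method. Several quartile conventions exist and they can give slightly different cut points on small data sets, so the choice of method here is an assumption. The data values are hypothetical.

```python
import statistics

data = [9, 1, 4, 7, 2, 8, 3, 6, 5]  # hypothetical values, deliberately unordered

# quantiles() ranks the data itself and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                     # interquartile range: the middle 50% of the data
median = statistics.median(data)  # should agree with the second quartile

print(q1, q2, q3, iqr, median)
```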
Do not provide information on the spread or variability of the data
Measures of location
Describe the spread or variability within the data
Measures of dispersion
Two distinct samples can have the same mean but completely different levels of
Variability
The difference between the largest and the smallest sample values
-Depends only on extreme values and provides no information about how the remaining data is distributed
Range
Is the range a reliable measure of the dispersion of the whole data set?
No
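A small illustration, using made-up samples, of why the mean and the range alone can mislead: samples a and b share a mean but differ in spread, while b and c share both a mean and a range yet still differ in how the remaining points are distributed.

```python
import statistics

def value_range(data):
    """Difference between the largest and smallest values."""
    return max(data) - min(data)

a = [48, 49, 50, 51, 52]  # tightly bunched
b = [10, 30, 50, 70, 90]  # evenly spread out
c = [10, 50, 50, 50, 90]  # same extremes as b, but the middle values coincide

for name, sample in (("a", a), ("b", b), ("c", c)):
    print(name, statistics.mean(sample), value_range(sample), statistics.stdev(sample))
```

All three samples have a mean of 50; the range separates a from b but cannot separate b from c, whereas the standard deviation can.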
The average of the squared distance of each value from the mean
Variance
Makes the bigger differences stand out and makes all of the numbers positive, so that negative differences cannot cancel positive ones and understate the variance
Squaring the differences from the mean
When calculating the variance, what is the difference between using N vs. N-1 as the denominator?
When the data are a sample, N gives a biased estimate of the population variance, whereas (N-1) gives an unbiased estimate
In the calculation for variance, what does N represent?
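N = the number of data points; dividing by N is appropriate only when the data are the entire population (biased for a sample)
In the calculation for variance, what does (N-1) represent?
(N-1) = the number of data points minus one (the degrees of freedom); dividing by (N-1) gives an unbiased estimate from a sample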
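A worked sketch of the two denominators on a made-up data set, computed by hand and checked against the standard library (statistics.pvariance divides by N, statistics.variance divides by N-1).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n

# Squaring makes every deviation positive and emphasises the larger ones.
squared_deviations = [(x - mean) ** 2 for x in data]

var_n = sum(squared_deviations) / n              # divide by N
var_n_minus_1 = sum(squared_deviations) / (n - 1)  # divide by N-1

# The standard deviation is the square root of the variance.
sd_sample = math.sqrt(var_n_minus_1)

print(var_n, var_n_minus_1, sd_sample)
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by N-1 always gives the larger value, and the gap shrinks as the number of data points grows.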
The most common and useful measure of dispersion
Standard deviation
Tells us how tightly the data points are clustered around the mean
Standard deviation
When samples are tightly bunched together, the Gaussian curve is narrow and the standard deviation is
Small
When the samples are spread apart, the Gaussian curve is flat and the standard deviation is
Large
Means and standard deviations should ONLY be used when data are
Normally distributed
How can we determine if the data are normally distributed?
Calculate the mean plus or minus twice the standard deviation. If either value is outside of the possible range, then the data are unlikely to be normally distributed
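The rule of thumb above can be sketched as a small check. The function name plausibly_normal and the example numbers (a length of stay in days and a blood pressure reading, neither of which can be negative) are hypothetical.

```python
def plausibly_normal(mean, sd, lowest_possible, highest_possible):
    """Rough screen: does mean +/- 2 SD stay inside the possible range of values?

    Returning True does not prove normality; returning False suggests the
    data are unlikely to be normally distributed.
    """
    lower = mean - 2 * sd
    upper = mean + 2 * sd
    return lowest_possible <= lower and upper <= highest_possible

# Hypothetical length of stay: mean 5 days, SD 4 days.
# mean - 2 SD = -3 days, which is impossible, so the data are probably skewed.
stay_ok = plausibly_normal(5, 4, lowest_possible=0, highest_possible=float("inf"))

# Hypothetical systolic pressure: mean 120, SD 10, so 100 to 140 is all possible.
pressure_ok = plausibly_normal(120, 10, lowest_possible=0, highest_possible=float("inf"))

print(stay_ok, pressure_ok)
```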
Approximately what percentage of data lies within:
1) 1 standard deviation of the mean
2) 2 standard deviations of the mean
3) 3 standard deviations of the mean
1) 68.3%
2) 95.4%
3) 99.7%
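These percentages can be recovered from the cumulative distribution function of a normal distribution. A minimal sketch using statistics.NormalDist from the standard library; the helper name fraction_within is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```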
If data is skewed, we should use
Median
What are two more sophisticated, yet more complex, methods of determining normality?
D’Agostino & Pearson omnibus and Shapiro-Wilk Normality tests
D’Agostino & Pearson omnibus and Shapiro-Wilk Normality tests are not very
Useful
What we want is a test that tells us whether the deviations from the Gaussian ideal are severe enough to invalidate statistical methods that assume a
-Normality tests don’t do this
Gaussian distribution
How can we determine whether our mean is precise?
Find the Standard Error
A measure of how far the sample mean is likely to be from the population mean
Standard error
The standard error of the mean (SEM) gets smaller as
Sample size gets larger
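A sketch of why the SEM shrinks, assuming the usual formula SEM = SD / sqrt(n). The data are made up; the larger sample simply repeats the smaller one so that the scatter stays similar while n grows.

```python
import math
import statistics

def standard_error(data):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

small = [2, 4, 4, 4, 5, 5, 7, 9]
large = small * 4  # same values repeated: similar scatter, but n = 32

sem_small = standard_error(small)
sem_large = standard_error(large)

print(statistics.stdev(small), sem_small)
print(statistics.stdev(large), sem_large)
```

The SD barely changes between the two samples, but the SEM is roughly halved when n is quadrupled.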
If the scatter in data is caused by biological variability and you want to show that variability, use
Standard Deviation (SD)
If the variability is caused by experimental imprecision and you want to show the precision of the calculated mean, use
Standard Error of the mean (SEM)
Say we aliquot 10 plates each with a different cell line and measure the integrin expression of each, would we want to use SD or SEM?
SD
Say we aliquot 10 plates of the same cell line and measure the integrin expression of each, would we want to use SD or SEM?
SEM
An estimate of the range that is likely to contain the true population mean
- Combines the scatter in the sample with the size of the sample
Confidence intervals
Generates an interval in which the probability that the sample mean reflects the population mean is high
Confidence intervals
Means that there is a 95% chance that the confidence interval you calculated contains the true population mean
95% confidence interval
If zero is included in a confidence interval for a change in a disease due to a drug, then it means we cannot exclude the possibility that
There was no true change
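A minimal sketch of a 95% confidence interval for a mean, assuming the normal approximation (mean plus or minus about 1.96 SEM). For small samples a t-distribution multiplier is normally used instead and gives a wider interval, so treat this as an approximation. The function name and both data sets are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def confidence_interval(data, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    mean = fmean(data)
    sem = stdev(data) / math.sqrt(len(data))
    return mean - z * sem, mean + z * sem

# Hypothetical changes in a disease score after treatment.
no_clear_change = [-3, 1, -2, 4, 0, -1, 2, -4, 3, 1]
clear_drop = [-5, -6, -4, -7, -5, -6, -5, -4, -6, -5]

low_a, high_a = confidence_interval(no_clear_change)
low_b, high_b = confidence_interval(clear_drop)

print(low_a, high_a)  # spans zero: no true change cannot be excluded
print(low_b, high_b)  # entirely below zero
```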
An observation that is numerically distant from the rest of the data
An outlier
Can be caused by systematic error, flaw in the theory that generated the data point, or by natural variability
An outlier
What is one popular method to test for an outlier?
The Grubbs test
How do we use the Z value obtained by the Grubbs test to test for an outlier?
Compare the Grubbs test Z with a table listing the critical value of Z at the 95% probability level. If the Grubbs Z is greater than the value from the table, then you can delete the outlier
To test for an outlier, we compare the Grubbs test Z with a table listing the critical value of Z at the 95% probability level. If the Grubbs Z is greater than the value from the table, then the P value is
Less than 5% and we can delete the outlier
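A sketch of the comparison described above, assuming the usual Grubbs statistic Z = |value - mean| / SD, with the mean and SD computed from all values including the suspect one. The data are made up, and the critical value of 2.29 for N = 10 is an assumption for illustration; in practice it should be looked up in a published table for the actual sample size.

```python
import statistics

def grubbs_z(data):
    """Return the value farthest from the mean and its Grubbs Z.

    The mean and SD are computed from all values, including the suspect one.
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    suspect = max(data, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / sd

# Hypothetical measurements with one value far from the rest.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]

# Assumed critical Z for N = 10 at the 95% level; take it from a table in practice.
CRITICAL_Z = 2.29

suspect, z = grubbs_z(data)
is_outlier = z > CRITICAL_Z

print(suspect, z, is_outlier)
```

The test addresses only the single most extreme value; it says nothing about whether the cause is error or natural variability.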
What constitutes “good quality” data?
Data must be: reliable and valid
What measurements assess data reliability?
Precision, accuracy, repeatability, and reproducibility