Review (2) Flashcards
normal distribution
- symmetrical bell shape
- arises when data depend on many, often small, random factors
- if a data set is large, chances are that a parametric test will apply, since the data should follow a symmetrical bell curve
- small data sets may be skewed to one side, so parametric tests may not apply (use their non-parametric counterparts)
properties of a normally distributed variable: mean 1SD 2SD 3SD
■ mean = median
■ 1SD (1 Z score) on each side of the mean encompasses 34.1% of all values (68.3% for both sides)
■ 2SDs (2 Z scores) encompass 47.7% on each side (95.4% for both sides; the familiar "95%" corresponds to 1.96 SDs)
● the mean + 1.96 SDs is also described as the 97.5th percentile
■ 3SDs encompass 49.9% on each side (99.7% for both sides)
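These percentages can be checked directly with Python's standard-library `statistics.NormalDist` (a quick sketch; no external packages needed):

```python
# Check the 68-95-99.7 rule and the 1.96-SD percentile on a standard normal.
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)   # probability mass within k SDs of the mean
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

print(f"{z.cdf(1.96):.3f}")  # 0.975 -> mean + 1.96 SD is the 97.5th percentile
```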
Central Limit Theorem
● Under certain conditions, the sum of a large number of random variables will have an approximately normal distribution
● other distributions (chi-square, t, etc.) can be approximated by the normal distribution
■ the median separates the lower and upper halves of the observations
■ the CLT provides an estimate of the uncertainty of the mean of a population based upon a sample mean
Central Limit Theorem: Properties
■ distribution of sample means will be approximately normal regardless of whether the distribution of the original values in the population is normal or not
■ mean of the means of all possible samples equals the population mean
■ the standard deviation of the distribution of the means of all samples (standard error of the mean) is equal to the standard deviation of the population divided by the square root of the sample size
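The standard-error property can be verified by simulation — a sketch using only the standard library, with a made-up normal population (mean 50, SD 10):

```python
# Simulate the CLT: the SD of many sample means ~= population SD / sqrt(n).
import random
from statistics import fmean, stdev

random.seed(1)
n = 25            # sample size
pop_sd = 10.0     # population SD (hypothetical normal population, mean 50)

# Draw 5000 samples of size n and record each sample mean
means = [fmean(random.gauss(50, pop_sd) for _ in range(n)) for _ in range(5000)]

observed_se = stdev(means)          # empirical SD of the sample means
predicted_se = pop_sd / n ** 0.5    # sigma / sqrt(n) = 2.0
print(round(observed_se, 2), predicted_se)  # observed value lands near 2.0
```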
Difference between standard deviation and standard error (of means)
■ standard error is NOT a measure of variability among individual observations; it measures the precision of the sample mean as an estimate of the population mean
■ standard deviation is a measure of variability among individual observations
Degree of Freedom
● in standard deviation: DoF = n - 1 for a sample, DoF = n for a population
● one less than the total number of values in a sample
● definition: number of values that are free to vary in a sample
○ if there are 100 values and you know the mean, then once 99 of the values are known the 100th is determined. Therefore, it is said that there is 1 fixed value and the rest are free to vary.
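The "1 fixed value" idea above can be made concrete (hypothetical numbers):

```python
# Once the mean is fixed, the last value is determined by the others.
values = [4.0, 7.0, 5.0, 8.0]        # hypothetical sample
mean = sum(values) / len(values)     # 6.0

known = values[:-1]                  # suppose we only know the first n - 1 values
recovered = mean * len(values) - sum(known)   # the total must equal mean * n
print(recovered)  # 8.0 -> only n - 1 values were free to vary
```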
Degrees of freedom for ANOVA test
○ DFtotal = total number of observations - 1
○ DFbetween groups = number of groups - 1
○ DFwithin groups = number of groups × (number of observations per group - 1)
○ DFtotal=DFwithin groups+DFbetween groups
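A quick arithmetic check of the DF partition, for a hypothetical design with 3 groups of 10 observations each:

```python
# ANOVA degree-of-freedom bookkeeping for 3 groups x 10 observations.
groups = 3
per_group = 10
total_obs = groups * per_group               # 30

df_total = total_obs - 1                     # 29
df_between = groups - 1                      # 2
df_within = groups * (per_group - 1)         # 27

assert df_total == df_between + df_within    # the partition always holds
print(df_total, df_between, df_within)       # 29 2 27
```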
Underlying concept behind tests of significance
■ differences can be “explainable”
● effect of being in a particular group
■ differences can be “unexplainable”
● due to natural variation or unmeasured group difference
■ if explained variation is significantly higher than unexplained variation, we can conclude that the groups really are different
■ based upon the ratio of explained variation to unexplained variation
● if ratio is large, groups are statistically different
● if small, groups are not statistically different.
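The explained/unexplained ratio described above is exactly the F statistic of one-way ANOVA; here is a from-scratch sketch on hypothetical data:

```python
# One-way ANOVA F ratio computed by hand: explained (between-group) variation
# over unexplained (within-group) variation, each scaled by its DF.
from statistics import fmean

groups = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [10.0, 11.0, 12.0, 11.0],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = fmean(all_values)

# Explained (between-group) sum of squares
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
# Unexplained (within-group) sum of squares
ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1            # 2
df_within = len(all_values) - len(groups)   # 9

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(round(f_ratio, 2))  # 10.75 -> a large ratio suggests real group differences
```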
Parametric vs. Non-Parametric
parametric: assumes the data follow a normal distribution
● usually compares group means
non-parametric: does not assume the data follow a normal distribution
● usually compares group medians
Non-parametric test advantages and disadvantages
● Non-parametric test advantages
○ fewer assumptions to fulfill
■ variables do not have to follow a particular distribution
○ useful for dealing with outliers
○ intuitive and easier to do by hand with smaller samples
○ can be used for categorical data
● Non-parametric test disadvantages
○ less efficient than parametric counterpart
■ following a distribution allows you to take advantage of its properties
■ lack of power
○ emphasize hypothesis testing over effect estimation (no natural effect-size estimate)
○ many tied values are problematic (rank-based methods handle ties poorly)
Null vs. alternative hypotheses
● Null Hypothesis (Ho): hypothesize that there is no difference between the two groups
○ Four elements of the sound clinical trial: PICO
■ Patient population or problem
■ Intervention (Tx, usually)
■ Comparative Intervention (if necessary)
■ Outcomes (precisely defined)
● Alternate Hypothesis (H1): hypothesize that the two groups are different
p-value
probability of observing the group difference (or a larger one) if the difference occurred by natural variation alone (p-value)
if this probability is sufficiently low, then conclude there is a group effect
the p-value cutoff depends on the question being asked
When the p-value is less than alpha, then we can
reject the null hypothesis
alpha vs. beta
Alpha - "the chance we are willing to accept of being wrong by finding a difference between two treatments when none really exists" (often alpha = 0.05)
Beta - "the chance we are willing to accept of being wrong by not finding a difference between two treatments when there really is a difference" (power = 1 - beta)
P-value fallacy
ex. if we assume a p-value of 0.04:
incorrect interpretation: assuming that the p-value equals the chance of there being no difference b/t the Tx groups
(really, the p-value = the chance of obtaining the observed difference in means, or a greater one, if the null hypothesis is true)
correct interpretation: P(results | Ho) - "Assuming the null hypothesis is true, there is a 4% chance of obtaining the difference in means we observed or a greater difference in the trial"
therefore, it is unlikely we would observe the measured difference between the groups if they were really equivalent, so we can infer that the two groups are different
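One way to make P(results | Ho) concrete is a permutation test: under the null, group labels are exchangeable, so shuffling the labels shows how often a difference at least as large as the observed one arises by chance. A sketch with hypothetical data:

```python
# Permutation test: shuffle group labels and count how often the shuffled
# difference in means is at least as large as the observed one.
import random
from statistics import fmean

random.seed(7)
group_a = [5.1, 6.0, 5.8, 6.4, 5.5]   # hypothetical measurements
group_b = [6.8, 7.1, 6.5, 7.4, 6.9]

observed = abs(fmean(group_a) - fmean(group_b))
pooled = group_a + group_b
n_a = len(group_a)

extreme = 0
trials = 10000
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
    if diff >= observed:
        extreme += 1

p_value = extreme / trials   # chance of a difference this large if groups were equivalent
print(p_value)               # a small proportion here plays the role of the p-value
```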
Confidence Interval
○ measure of precision of estimate of the difference based on the results of the study
○ measure of the magnitude of the difference
○ the true diff b/t the two population means lies in an interval
■ 95% CI = intervals constructed this way will contain the true difference in means 95 out of 100 times
○ the size of the interval depends on the difference b/t sample means, the level of significance (and corresponding t value), & the standard error of the diff in sample means
○ "If the procedure were to be repeated multiple times (repeated sampling), the results should fall within the interval 95% of the time"
■ estimate of precision
○ “We can be 95% sure that the true value will fall within this range”
○ statistic that quantifies the uncertainty in measurement
○ If the CI of RR excludes 1, then the RR is statistically significant, if it includes 1, then the RR is statistically non significant
○ 90% CI is narrower than 95%CI
○ larger sample size → narrower CI
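The repeated-sampling interpretation can be checked by simulation (standard library only; the population values are made up):

```python
# Build mean +/- 1.96*SE intervals from many samples of a known population
# and count how often they cover the true mean.
import random
from statistics import fmean, stdev

random.seed(3)
true_mean, true_sd, n = 100.0, 15.0, 30

covered = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    m = fmean(sample)
    se = stdev(sample) / n ** 0.5
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        covered += 1

# Coverage lands close to 0.95 (slightly below, since 1.96 is the z rather
# than the t cutoff for n = 30).
print(covered / trials)
```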
how CI is related to p-value; advantages of CI over p-value
● CIs are more versatile than p-values
● the two always agree: if the CI includes the value of no difference, the p-value is also non-significant (and vice versa)
● Advantages of confidence interval over p-value
○ Provide a measure of precision and magnitude of estimates.
○ Less prone to misinterpretation.
○ In general, preferred way to express statistical significance of results of studies of therapies.
PICO Method → Developing a clinical question
P = Patient population or problem
I = Intervention (usually a Tx)
C = Comparison Intervention (if necessary)
O = 1+ precisely defined outcomes
Sample size
n = 16/(phi)^2
An approximate sample size for each group is obtained with the formula n = 16/phi^2, where phi is the non-centrality parameter.
The non-centrality parameter phi is the minimum magnitude of effect worth detecting divided by the standard deviation of the outcome variable: phi = magnitude of effect / standard deviation, so n = 16 / (magnitude of effect / standard deviation)^2.
If we want to detect a difference of 10 and our SD = 14, then phi = 10/14 ≈ 0.71 and n = 16/0.71^2 ≈ 31, i.e., about 32 subjects per group.
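Worked out in code (this n = 16/phi^2 rule of thumb, Lehr's approximation, corresponds to roughly 80% power at two-sided alpha = 0.05):

```python
# Lehr's quick sample-size approximation: n per group = 16 / phi**2,
# with phi = (minimum effect worth detecting) / (SD of the outcome).
effect = 10.0
sd = 14.0

phi = effect / sd               # ~0.714
n_per_group = 16 / phi ** 2     # = 16 * (sd / effect) ** 2
print(round(n_per_group, 1))    # 31.4 -> round up to 32 subjects per group
```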