Lecture 15: Statistical Analysis Flashcards
Why is statistics used frequently in epi?
-Epi studies often expose differences in the risk of disease between groups
-Need to demonstrate statistical association not due to chance, using the right tools and interpreting the results correctly
What is a mean and why is it used?
-Measure of central tendency (aka average)
-It has limitations: it doesn’t tell the whole story, because outliers can skew it, so how useful it is depends on how the values are distributed
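A minimal sketch of the outlier problem. The values are made up for illustration (they are not from the lecture): one extreme value pulls the mean well away from where most of the data sit.

```python
from statistics import mean

# Hypothetical values, ex days of illness for seven patients.
# The last value (40) is an outlier.
values = [2, 3, 3, 4, 4, 5, 40]

mean_with_outlier = mean(values)           # 61 / 7, roughly 8.7
mean_without_outlier = mean(values[:-1])   # 21 / 6 = 3.5

print(f"mean with outlier:    {mean_with_outlier:.1f}")
print(f"mean without outlier: {mean_without_outlier:.1f}")
```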
What is the central limit theorem (CLT)?
-If you repeatedly took a sample of size n and calculated its mean, you would get a sampling distribution of the mean
-The sampling distribution of the mean is approximately normal (when n is large) and has its own mean, variance (overall variation around the mean) and standard deviation (square root of the variance, ie the typical distance of values from the mean)
-Basis of the Z-test
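The CLT can be checked with a small simulation. This is only a sketch under stated assumptions: the population (an exponential distribution with mean 1 and SD 1), the sample size and the number of repetitions are all chosen here for illustration. If the CLT holds, the sample means should centre on the population mean and have an SD close to (population SD) / sqrt(n).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample (assumed)
repetitions = 2000  # number of samples drawn (assumed)

# Assumed population: exponential with rate 1, which is strongly skewed.
# Its mean is 1 and its SD is 1.
sample_means = []
for _ in range(repetitions):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(mean(sample))

mean_of_means = mean(sample_means)   # should be close to the population mean (1)
sd_of_means = stdev(sample_means)    # should be close to 1 / sqrt(n)
predicted_sd = 1 / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

Even though the individual values are skewed, a histogram of sample_means would look roughly bell-shaped.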
What are the limitations of the CLT?
-Need a large sample size
-Need to know SD of the population which is often unknown
What is a t-test?
-To test means
-We use the t-test to test if a single mean is statistically different from zero, and to test if two means are significantly different from one another
-Can include negative values
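A sketch of how the two t statistics could be computed by hand. The function names and data are hypothetical, and the two-sample version uses Welch's form (unequal variances), which is an assumption since the notes do not say which form was taught.

```python
from math import sqrt
from statistics import mean, variance

def one_sample_t(values, null_mean=0.0):
    """t statistic for testing whether the mean of `values` differs from null_mean."""
    standard_error = sqrt(variance(values) / len(values))
    return (mean(values) - null_mean) / standard_error

def two_sample_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]

t_single = one_sample_t(group_a)        # is the mean of group_a different from 0?
t_two = two_sample_t(group_a, group_b)  # negative, because group_a has the smaller mean

print(f"one-sample t = {t_single:.3f}")
print(f"two-sample t = {t_two:.3f}")
```

Turning a t statistic into a p-value needs the t distribution, which is not in the Python standard library; a statistics package such as SciPy (`scipy.stats.ttest_1samp`, `scipy.stats.ttest_ind`) would normally be used for that step.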
What are ratios?
-A ratio is a comparison of two quantities, ex the probability of an event occurring vs not occurring (dice)
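A quick worked version of the dice example, assuming a fair six-sided die and taking "rolling a six" as the event (both are assumptions made here for illustration):

```python
from fractions import Fraction

# Assumed event: rolling a six with a fair six-sided die.
p_six = Fraction(1, 6)     # probability of the event occurring
p_not_six = 1 - p_six      # probability of it not occurring (5/6)

# Ratio of occurring to not occurring (the odds).
odds_of_six = p_six / p_not_six

print(f"odds of a six = {odds_of_six.numerator}:{odds_of_six.denominator}")
```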
What are proportions?
-A proportion is an equality of two ratios, A:B = C:D
What is a chi-square test?
-Used to test proportions and ratios
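A sketch of a chi-square test on a 2x2 table, under stated assumptions: the counts are invented, the function name is hypothetical, and no continuity correction is applied. For a 2x2 table there is 1 degree of freedom, so the p-value can be obtained from the normal distribution via `math.erfc`.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if there were no association.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = exposed / unexposed, columns = diseased / not diseased.
table = [[30, 70],
         [15, 85]]

chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Here 30% of the exposed and 15% of the unexposed are diseased, so the test is comparing two proportions.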
How do you know what test to use: t-test vs chi-square?
-T-test: means or average ex a researcher wants to test a new drug to determine if it improves AVERAGE life expectancy in cancer patients
-Chi-square: proportions or ratios, ex the data are given as percentages (%)
What are p-values?
-The probability that a test statistic (ex the chi-square or t statistic) would be as large as, or larger than, the one observed if the null hypothesis were true
-Probability that the observed differences (between groups) might have arisen due to chance or sampling variation
-DOES NOT say anything about the magnitude of the difference b/w 2 groups or the importance of the results
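One way to see the "as large or larger than observed, if the null were true" idea directly is an exact permutation test. This is not a test named in the lecture; it is swapped in here only as an illustration, with made-up data and a hypothetical function name. If group labels do not matter (the null), every way of splitting the pooled values into two groups is equally likely, so the p-value is the share of splits whose difference in means is at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    total = 0
    # Try every possible way of relabelling the pooled values.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total

# Hypothetical measurements with no overlap between the groups.
group_a = [6, 7, 8, 9, 10]
group_b = [1, 2, 3, 4, 5]

p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
```

Only 2 of the 252 possible splits are as extreme as the observed one, so p = 2/252. The p-value says how surprising the split is under the null; the size of the difference (5 units) has to be read off separately.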
What is the null hypothesis?
-No difference b/w groups
-No association b/w exposure and outcome
-No difference b/w a measure and zero
-Therefore the p-value is the probability of seeing the test statistic we’ve calculated (or a more extreme one) if the null hypothesis were true (ie there is no true difference b/w groups)
What does a high vs low p-value mean?
-High p-value (>0.05): the sample results are consistent with a true null hypothesis, so we CAN’T reject the null, ie we have no evidence of a difference/association (this does not prove there is none)
-Low p-value (<0.05): the sample results are NOT consistent with the null hypothesis, so we REJECT the null and state there IS a difference/association
What does this 0.05 or 5% represent when talking about p-values?
-Alpha is the significance level: the probability of making a type 1 error, ie rejecting the null when we shouldn’t (saying there is a difference when there isn’t)
-Need a rule to decide whether an observed difference is unlikely to be due to chance (ie significant) or likely due to chance (ie not significant)
-Alpha is often set to 0.05; confidence = 1 - alpha (ex 1 - 0.05 = 95%)
-5% is ‘industry standard’
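A simulation sketch of what alpha = 0.05 means in practice. Everything here is an assumption chosen for illustration: the population is normal with mean 0 and known SD 1 (so the null hypothesis is true by construction), and a two-sided Z-test is used. Roughly 5% of the experiments should still come out "significant", and each of those is a type 1 error.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

alpha = 0.05
n = 30              # sample size per experiment (assumed)
experiments = 4000  # number of simulated experiments (assumed)
standard_normal = NormalDist()

false_positives = 0
for _ in range(experiments):
    # The null hypothesis is TRUE: the population mean really is 0 (SD = 1).
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = mean(sample) / (1 / sqrt(n))                  # Z statistic with known SD = 1
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))   # two-sided p-value
    if p_value < alpha:
        false_positives += 1                          # rejected a true null

type_1_error_rate = false_positives / experiments
print(f"share of true nulls rejected: {type_1_error_rate:.3f}")
```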
What are confidence intervals (CIs)?
-Confidence interval: a reasoned statement about the true population measure (how confident we are about where it lies)
-Range of values around the sample estimate we calculated, within which the true population measure is expected to lie
-Usually 95%, ex we are 95% confident that the true measure in the population is somewhere b/w 20% and 28%, with a 5% chance that the true measure is <20% or >28%, ie outside those values
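A sketch of how an interval like 20-28% might arise. The counts (96 events out of 400) and the function name are hypothetical, and the normal-approximation (Wald) formula is one common choice among several, assumed here for simplicity.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(events, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = events / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for 95% confidence
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z times the standard error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 96 of 400 people have the outcome (24%).
low, high = proportion_ci(96, 400)
print(f"estimate 24%, 95% CI {low:.1%} to {high:.1%}")
```

A larger sample with the same 24% would give a narrower interval, because the standard error shrinks as n grows.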
How does odds ratio (OR) relate to confidence intervals?
-If OR = 1 there is NO effect/association; if the confidence interval includes 1, the OR is also not statistically significant, ex CI: 0.3-2.4 includes 1, so the OR is not significant
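A sketch of the "does the CI include 1?" check. The 2x2 counts and function names are hypothetical, and the interval uses the log (Woolf) method with z = 1.96, which is an assumption since the notes do not say how the CI was calculated.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (log method) for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = exp(log(odds_ratio) - z * se_log_or)
    high = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

def is_significant(low, high):
    """The OR is significant at this level only if the CI excludes 1."""
    return not (low <= 1 <= high)

# Two hypothetical studies: a small one and a larger one.
for counts in [(4, 6, 5, 7), (30, 70, 15, 85)]:
    odds_ratio, low, high = odds_ratio_ci(*counts)
    verdict = "significant" if is_significant(low, high) else "not significant"
    print(f"OR = {odds_ratio:.2f}, 95% CI {low:.2f}-{high:.2f}: {verdict}")
```

The small study gives a wide interval that straddles 1, while the larger one gives an interval entirely above 1.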