Correlation, regression, sample size & power Flashcards
Advantage(s) of a 2x2 table
Advantages:
- Ease of interpretation
- No distributional assumptions
- Can easily stratify by other variables
- Can calculate the odds ratio (OR) or relative risk (RR)
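As a rough sketch of the last point, OR and RR can be computed directly from the four cell counts. The table layout (exposure in rows, disease in columns), the function names and the counts below are assumptions made for illustration only.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                 diseased   not diseased
    exposed         a            b
    unexposed       c            d
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Made-up counts, for illustration only.
a, b, c, d = 20, 80, 10, 90
or_value = odds_ratio(a, b, c, d)     # (20 * 90) / (80 * 10)
rr_value = relative_risk(a, b, c, d)  # (20 / 100) / (10 / 100)
print(or_value, rr_value)
```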
Disadvantage(s) of a 2x2 table
Sometimes requires arbitrary grouping of a continuous variable (loss of information)
Advantage(s) of Correlation and Regression
Advantages:
- Maintain continuity of data
- Model one variable as a function of the other
Disadvantage(s) of Correlation and Regression
Disadvantages:
- Only measure a linear relationship
- Only useful when both variables are continuous
Pearson Correlation Coefficient
Used to determine whether two continuous variables (X and Y) are linearly related. The correlation coefficient:
- measures linear relationship between X and Y
- ranges between -1 (perfect negative correlation) and 1 (perfect positive correlation).
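A minimal sketch of the calculation, using the usual sums-of-squares formula. The function name and the data are made up for illustration. The third data set shows a perfect curved relationship that still gives r = 0, because r only captures linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # perfect positive linear relationship
y_down = [10, 8, 6, 4, 2]   # perfect negative linear relationship
y_curve = [4, 1, 0, 1, 4]   # y = (x - 3)^2, related but not linearly

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_curve = pearson_r(x, y_curve)
print(r_up, r_down, r_curve)
```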
Coefficient of determination (r²)
r² is the proportion of the total variability in Y that can be explained by the linear association between Y and X
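One way to see this definition, sketched for simple linear regression only: fit the least-squares line, then compare the leftover (residual) variability with the total variability in Y. The function name and data are invented for illustration.

```python
def r_squared(x, y):
    """1 - (residual sum of squares / total sum of squares) for the line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Made-up data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(x, y)
print(round(r2, 4))
```

With one predictor, this value equals the square of the Pearson correlation coefficient.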
Multiple Linear Regression
One dependent continuous variable (Y), several independent variables (X1, X2, …). Allows us to predict Y based on several variables.
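A bare-bones sketch of how the coefficients can be estimated by least squares, solving the normal equations with plain Python. In practice a statistics package would be used. The function name and data are hypothetical, and the data were generated exactly from Y = 1 + 2·X1 + 3·X2 so the fit can be checked by eye.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # Design matrix with a leading 1 in each row for the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up data: each row is (X1, X2); Y = 1 + 2*X1 + 3*X2 with no noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
coefs = fit_multiple_regression(rows, y)

# Predict Y for a new observation with X1 = 2, X2 = 2.
predicted = coefs[0] + coefs[1] * 2 + coefs[2] * 2
print([round(c, 6) for c in coefs], round(predicted, 6))
```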
Logistic Regression (either simple or multiple)
Similar to linear regression, except that the dependent variable Y is dichotomous (e.g. Y=1 for diseased, Y=0 for not diseased), and we model the probability that Y=1.
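A small sketch of what "modelling the probability that Y=1" looks like with one predictor: the linear part gives the log-odds, and the logistic function turns it into a probability between 0 and 1. The coefficients here are made up, not estimated from any data.

```python
import math


def predicted_probability(intercept, slope, x):
    """P(Y = 1 | x) under a simple logistic model."""
    log_odds = intercept + slope * x
    return 1 / (1 + math.exp(-log_odds))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


# Hypothetical coefficients, for illustration only.
b0, b1 = -3.0, 0.5

p_at_6 = predicted_probability(b0, b1, 6)  # log-odds = 0, so probability = 0.5
p_at_7 = predicted_probability(b0, b1, 7)

# A one-unit increase in x multiplies the odds by exp(b1).
odds_ratio_per_unit = odds(p_at_7) / odds(p_at_6)
print(p_at_6, round(odds_ratio_per_unit, 4), round(math.exp(b1), 4))
```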
Type I error
Incorrectly rejecting H0 when H0 is true, i.e. finding a statistically significant association based on a sample of data when there is truly no association.
Type II error
Incorrectly failing to reject H0 when H1 is true, i.e. finding no statistically significant association based on a sample of data when there truly is an association.
α
Probability of a type I error. Also called significance level.
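A quick simulation sketch of this idea: if H0 is true and we test at α = 0.05, about 5% of samples should still reject H0 by chance. The setup (a two-sided one-sample z-test with known standard deviation 1, n = 25, 4000 simulated samples, fixed seed) is an assumption chosen for illustration.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=25, n_trials=4000, seed=1):
    """Fraction of simulated samples that reject H0: mean = 0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        # Data generated under H0: true mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


rate = type_i_error_rate()
print(rate)  # should land close to 0.05
```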
β
Probability of a type II error.
Power
Power = 1 - β: the chance of detecting a difference if the difference really exists
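As an illustration, power can be computed for a two-sided comparison of two group means under a normal approximation. The assumptions here are equal group sizes and a known common standard deviation sigma. The function name and the numbers are invented for the example.

```python
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a true difference delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sigma * (2 / n_per_group) ** 0.5
    shift = delta / se
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Made-up planning values: true difference 5, standard deviation 10.
power_63 = power_two_sample_z(delta=5, sigma=10, n_per_group=63)
power_20 = power_two_sample_z(delta=5, sigma=10, n_per_group=20)
power_null = power_two_sample_z(delta=0, sigma=10, n_per_group=63)
print(round(power_63, 3), round(power_20, 3), round(power_null, 3))
```

When the true difference is zero, the "power" is simply α, and power grows as the sample size increases.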
Sample size determination steps
- Type of alternative hypothesis (1 or 2 sided)
- Significance level (α)
- The difference between treatments that you wish to detect (delta) (i.e. the minimum difference that you consider clinically significant).
- Power: the chance of rejecting the null hypothesis if the true difference is delta.
- Standard deviation: an estimate of the standard deviation of the variable of interest (e.g. standard deviation of the mean difference between treatments). Obtained from a small pilot study or the literature.
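The inputs above can be combined in a common normal-approximation formula for comparing two means with equal group sizes: n per group = 2 (z_α + z_β)² σ² / delta². The sketch below assumes sigma is the standard deviation of the outcome within each group. The function name and the planning values are made up.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of subjects per group needed to detect a difference delta."""
    z = NormalDist().inv_cdf
    # The critical value depends on whether the alternative is 1- or 2-sided.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole subject


# Made-up planning values: detect a difference of 5 with standard deviation 10.
n_two_sided = sample_size_per_group(delta=5, sigma=10)
n_one_sided = sample_size_per_group(delta=5, sigma=10, two_sided=False)
print(n_two_sided, n_one_sided)
```

A smaller delta, a larger standard deviation, a smaller α or a higher power all push the required sample size up.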