Bio-statistics linear regression Flashcards
The relationship between outcome (Y) and a covariate (X) can be
either linear or non‐linear
the outcome is ……
and the exposure is ……
The outcome is continuous.
The exposure can be continuous or categorical.
A scatter‐plot can help determine:
Is the relationship between outcome & covariate linear?
How strong is the relationship?
correlation coefficient
The relationship can be negative, zero or positive.
Its direction and strength are assessed by the correlation coefficient.
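As a rough illustration of what the correlation coefficient measures, here is a minimal sketch that computes Pearson's r by hand. The function name `pearson_r` and the data are hypothetical and chosen only for illustration; in practice a stats package would be used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: exposure (X) and outcome (Y) lying exactly on a rising line
exposure = [1, 2, 3, 4, 5]
outcome = [2, 4, 6, 8, 10]
r = pearson_r(exposure, outcome)  # perfect positive relationship, so r is 1
```

A value near +1 or -1 means the points lie close to a straight line; a value near 0 means there is little linear relationship.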
“Outcome” (Y) other names
Response variable
Dependent variable
“Exposure” (X) other names
Covariate
Independent variable
Predictor
Explanatory variable
Risk factor
Similarities between CORRELATION AND REGRESSION
Create a scatter plot of outcome vs exposure. Observe the pattern.
Outcome is continuous
Exposure: continuous or categorical
Hypothesis test used for both.
Correlation: r = 0 vs r ≠ 0.
Regression: B = 0 vs B ≠ 0.
AIM: To find if there is an association between the chosen exposure and outcome.
Differences between CORRELATION AND REGRESSION
Correlation: r ranges from -1 to +1. Measures the strength of the relationship.
Regression: B-coefficient can be any value. Gives an equation relating outcome & exposure. Predicts the value of the outcome for a given exposure value.
Two types of regression: simple and multiple.
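To make the "equation" and "predict" points concrete, here is a minimal least-squares sketch for simple linear regression. The equation form Y = a + B × X, the intercept name `a`, the function names and the data are assumptions made for this example.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of Y = a + B * X. Returns (a, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("exposure values are all identical; slope is undefined")
    slope = sxy / sxx                      # the B-coefficient
    intercept = mean_y - slope * mean_x    # a: predicted outcome when X = 0
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted outcome for a given exposure value x."""
    return intercept + slope * x

# Hypothetical data lying exactly on Y = 1 + 2X
exposure = [0, 1, 2, 3, 4]
outcome = [1, 3, 5, 7, 9]
a, b = fit_simple_regression(exposure, outcome)
predicted = predict(a, b, 10)  # outcome predicted for an exposure of 10
```

Unlike r, the B-coefficient is in the units of the outcome per unit of exposure, which is why it is not restricted to the range -1 to +1.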
LINEAR REGRESSION – STEPS
- Graph the data. Check linear relationship.
- Calculate correlation coefficient
- Do linear regression analysis
- Evaluate the model
  - Coefficient of determination (R²)
  - Residual plot
  - Normal probability plot
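For the "evaluate the model" step, the coefficient of determination can be sketched as one minus the ratio of residual variation to total variation. This is an illustrative sketch for a simple (one-covariate) model; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination (R squared) for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    # Variation left unexplained by the line vs. total variation in the outcome
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data with some scatter around a rising line
exposure = [1, 2, 3, 4]
outcome = [2, 4, 5, 9]
r2 = r_squared(exposure, outcome)  # roughly 0.93
```

An R² of about 0.93 would be read as the line explaining roughly 93% of the variation in the outcome. For a simple model, R² equals the square of the correlation coefficient r.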
CORRELATION COEFFICIENT
Correlation coefficient, r, quantifies the linear relationship between a pair of variables.
The correlation coefficient can be between -1 and +1.
A stats package (GraphPad, SPSS, Stata) is used to obtain “r”.
Degrees of freedom: n - 2
What is the hypothesis test for correlation?
Null: Correlation = 0
Alternative: Correlation ≠ 0
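One standard way to carry out this test is a t-test with n - 2 degrees of freedom, where t = r × sqrt((n - 2) / (1 - r²)). The sketch below computes the statistic only; the function name and the example values are hypothetical, and the p-value would still come from a t table or a stats package.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic and degrees of freedom for testing correlation = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 is positive")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical example: r = 0.5 observed in a sample of 27 pairs
t, df = correlation_t_statistic(0.5, 27)  # t is about 2.89 with 25 degrees of freedom
```

The larger the absolute value of t for a given number of degrees of freedom, the stronger the evidence against the null hypothesis.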
HOW TO INTERPRET A CORRELATION COEFFICIENT?
r < 0.00 (Negative numbers)
Negative relationship. As X increases, Y decreases.
r > 0.00 (Positive numbers)
Positive relationship. As X increases, Y also increases.
Ranges of r (magnitude)
0 to 0.3 = fairly weak
0.3 to 0.7 = fairly strong
0.7 to 0.9 = strong
Above 0.9 = very strong
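The sign and magnitude rules can be combined into a small lookup. This is only a sketch: the card does not say which band a boundary value such as 0.3 belongs to, so the code assumes each boundary falls in the lower band (consistent with "above 0.9" for the top band). The function name is hypothetical.

```python
def describe_correlation(r):
    """Return (direction, strength) labels for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    magnitude = abs(r)  # strength depends on the size of r, not its sign
    # Assumption: a boundary value (0.3, 0.7, 0.9) belongs to the lower band
    if magnitude <= 0.3:
        strength = "fairly weak"
    elif magnitude <= 0.7:
        strength = "fairly strong"
    elif magnitude <= 0.9:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength

label = describe_correlation(-0.8)  # ("negative", "strong")
```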
THREE ASSUMPTIONS OF LINEAR REGRESSION
The outcome (Y) variable follows a normal distribution.
Check by histogram or boxplot.
The relationship between outcome (Y) and covariate (X) is linear.
Check with a Scatterplot.
There is constant variance of the outcome across different values of the covariate.
Check with a residual plot
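A residual plot is judged by eye, but the idea behind the constant-variance check can be sketched numerically: fit the line, then compare the typical size of the residuals at low and high values of the covariate. This is a crude heuristic assumed here for illustration, not a formal test, and the function name and data are hypothetical.

```python
import math

def residual_spread_by_half(xs, ys):
    """Root-mean-square residual for the lower and upper halves of X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Residual = observed outcome minus fitted outcome, ordered by covariate value
    ordered = sorted(zip(xs, ys))
    residuals = [y - (intercept + slope * x) for x, y in ordered]
    half = n // 2
    lower, upper = residuals[:half], residuals[half:]

    def rms(values):
        return math.sqrt(sum(v ** 2 for v in values) / len(values))

    return rms(lower), rms(upper)

# Hypothetical data whose scatter widens as X increases (a "fan" shape)
exposure = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [1.1, 1.9, 3.1, 3.9, 7, 4, 9, 6]
low_spread, high_spread = residual_spread_by_half(exposure, outcome)
```

If the two spreads differ greatly, as they do for this fan-shaped data, the constant-variance assumption is doubtful and the residual plot deserves a closer look.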
Two types of linear regression models:
Simple – one risk factor.
Multiple – at least two risk factors.
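To show how a multiple model differs from a simple one, here is a minimal sketch that fits Y = b0 + b1 × X1 + ... + bk × Xk by solving the least-squares normal equations in plain Python. The function name, the coefficient names and the data are hypothetical; a stats package would normally do this and also report standard errors and p-values.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for several risk factors.

    rows: one tuple of risk-factor values per subject; ys: the outcomes.
    """
    # Design matrix with a leading 1 in each row for the intercept b0
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(p)]
        + [sum(d[i] * y for d, y in zip(design, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("risk factors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row_idx in range(p):
            if row_idx != col:
                factor = aug[row_idx][col] / aug[col][col]
                aug[row_idx] = [a - factor * b for a, b in zip(aug[row_idx], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no noise
risk_factors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
outcomes = [1, 3, 4, 6, 8]
coefficients = fit_multiple_regression(risk_factors, outcomes)
```

With a single risk factor per row, the same function reduces to the simple model: one intercept and one B-coefficient.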