Multiple linear regression Flashcards
What is Multiple Linear Regression (MLR)?
MLR models the relationship between one dependent variable (y) and multiple independent variables (x1, x2, …, xn)
The equation for a multiple regression model is not provided in the text.
What is the difference between Simple Linear Regression and Multiple Linear Regression?
Simple Linear Regression involves one dependent variable (y) and one independent variable (x), while MLR involves one dependent variable and multiple independent variables
Simple Linear Regression is a foundational concept for understanding MLR.
What is confounding in the context of regression analysis?
Confounding occurs when the relationship between an explanatory variable and an outcome is distorted by another variable
Example: Free time influences both exercise and weight.
How can confounding affect the interpretation of regression results?
Confounding can lead to attributing the entire observed relationship to causation when it’s actually influenced by another variable
Example: Increased free time leads to more exercise and reduced weight, which can falsely suggest exercise alone causes weight loss.
What are the steps to identify confounding in regression models?
- Check if the confounder is associated with both the independent and dependent variables. 2. Use regression models to quantify these associations
Example: Analyzing the relationship between poverty, crime rate, and percentage of single parents.
What is the significance of sample size in multiple regression models?
A sufficient sample size is crucial; typically, at least 10 observations per variable are recommended
This helps eliminate bias and improve causal inference.
What does the Coefficient of Determination (R²) represent in multiple regression?
R² indicates the proportion of variance in the dependent variable that can be explained by the independent variables
A higher R² value suggests a better fit of the model.
What is the first assumption of Multiple Regression?
Linearity: The relationship between the dependent variable and each continuous independent variable must be linear.
What is the second assumption of Multiple Regression?
Normal Distribution of Residuals: The residuals should be normally distributed.
What is the third assumption of Multiple Regression?
Homoscedasticity: Residuals should have constant variance across all predicted values.
What is the fourth assumption of Multiple Regression?
Independence of Observations: Each observation in the dataset must be independent of others.
How can you check for homoscedasticity in regression analysis?
By plotting standardized residuals against standardized predicted values and looking for a random scatter with no discernible pattern.
What is the implication of finding heteroscedasticity in a regression model?
It indicates that the variance of residuals is not constant, which violates the assumption of homoscedasticity
Remedies include transforming the dependent variable or using robust standard errors.
In the context of regression, what does a P-P plot indicate?
A P-P plot helps assess the normality of residuals by showing how closely the points align with the diagonal
Points closely aligned with the diagonal support the assumption of normality.
What is a practical implication of violations in regression assumptions?
Violations can lead to inaccurate conclusions and necessitate adjustments in the model.