Linear Regression Theory Flashcards
What is Linear Regression?
Linear Regression is a supervised learning algorithm used for predicting a continuous (numerical) output based on one or more input features. It models the relationship between the dependent variable (Y) and independent variable(s) (X) by fitting a straight line (or a hyperplane when there are multiple features).
What is the equation of Linear Regression?
y = mx + c
where:
y = dependent variable
x = independent variable
m = slope (change in Y per unit increase in X)
c = intercept (value of Y when X = 0)
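A minimal NumPy sketch of this equation; the slope and intercept values here are made-up illustrations, not fitted from data:

```python
import numpy as np

# Hypothetical fitted parameters (illustrative values only)
m, c = 2.5, 1.0   # slope and intercept

x = np.array([0.0, 1.0, 2.0, 3.0])  # independent variable
y_pred = m * x + c                   # predicted dependent variable

print(y_pred)  # [1.  3.5 6.  8.5]
```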
What is the basic idea of Linear Regression?
The key idea in Linear Regression is to find the optimal values of the coefficients (m and c here, or b0, b1, b2… in the multiple-feature case) such that the predicted values ŷ align as closely as possible with the actual values y, minimizing the error.
The slope m controls the steepness of the straight line.
The intercept c determines the starting position of the line (where it crosses the Y-axis).
The optimal m and c values can be found by Gradient Descent (see the sketch below) or by the closed-form least-squares solution.
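A minimal gradient-descent sketch on synthetic data; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 100)  # true m = 3, c = 2, plus noise

m, c = 0.0, 0.0
lr = 0.01
for _ in range(1000):
    error = (m * x + c) - y
    # Gradients of MSE with respect to m and c
    dm = (2 / len(x)) * np.sum(error * x)
    dc = (2 / len(x)) * np.sum(error)
    m -= lr * dm
    c -= lr * dc

print(m, c)  # should approach 3 and 2
```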
What are the 5 key assumptions of Linear Regression?
Linearity – The relationship between X and Y is linear.
Independence – Residuals are independent.
Homoscedasticity – Constant variance of residuals.
Normality of Errors – Residuals are normally distributed.
No Multicollinearity – Independent variables are not highly correlated.
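Two of these assumptions can be sanity-checked numerically; a sketch using statsmodels and scipy on synthetic data (the thresholds in the comments are conventional rules of thumb):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

# Independence: Durbin-Watson near 2 suggests uncorrelated residuals
print("Durbin-Watson:", durbin_watson(resid))

# Normality of errors: a Shapiro-Wilk p-value > 0.05 is consistent with normality
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)
```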
What is the cost function used in Linear Regression?
Mean Squared Error (MSE)
J(θ) = (1/n) ∑ᵢ (yᵢ − ŷᵢ)²
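Computing MSE directly in NumPy on made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

# Mean of squared residuals
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```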
What are the two main types of Linear Regression?
Simple Linear Regression – One independent variable.
Multiple Linear Regression – More than one independent variable.
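Both types fit the same way with scikit-learn; only the number of feature columns changes. A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simple: one independent variable (one column)
X_simple = rng.uniform(0, 10, (100, 1))
y_simple = 3 * X_simple[:, 0] + 2 + rng.normal(0, 1, 100)
LinearRegression().fit(X_simple, y_simple)

# Multiple: several independent variables (several columns)
X_multi = rng.normal(size=(100, 3))
y_multi = X_multi @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, 100)
model = LinearRegression().fit(X_multi, y_multi)
print(model.coef_, model.intercept_)
```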
What are key evaluation metrics for Linear Regression?
MSE (Mean Squared Error)
RMSE (Root Mean Squared Error)
R² Score (Coefficient of Determination)
Adjusted R² (for multiple regression models)
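A sketch computing all four metrics with scikit-learn and NumPy; scikit-learn has no built-in adjusted R², so it is computed by hand here, and n and p are made-up illustrative values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.9, 9.3])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Adjusted R² for n samples and p predictors (hypothetical n = 4, p = 2)
n, p = len(y_true), 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse, rmse, r2, adj_r2)
```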
What is Overfitting and Underfitting?
Overfitting: The model learns noise, performs well on training but poorly on new data.
Underfitting: The model is too simple and fails to capture patterns in the data.
What is the Bias-Variance Tradeoff?
High Bias = Underfitting
High Variance = Overfitting
The goal is to find a balance between both.
How can you detect multicollinearity?
Variance Inflation Factor (VIF):
A VIF above 5 (some practitioners use a cutoff of 10) indicates high multicollinearity.
Correlation Matrix:
High correlations between independent variables suggest multicollinearity.
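A VIF sketch using statsmodels on synthetic data, with one pair of features made deliberately correlated:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# Skip index 0 (the constant column); report VIF for each feature
for i in range(1, X.shape[1]):
    print(X.columns[i], variance_inflation_factor(X.values, i))
# x1 and x2 should show VIF well above 5; x3 should sit near 1
```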
How do outliers affect Linear Regression?
They can distort coefficients and predictions by pulling the fitted line towards them.
Solutions: Remove outliers, use Robust Regression, or apply Log Transformation.
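One possible robust option is scikit-learn's HuberRegressor; a sketch on synthetic data with a few injected outliers at large x, which drag the OLS slope upward:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (50, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, 50)
y[X[:, 0] > 9] += 50  # extreme outliers at the right edge

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)

print("OLS slope:  ", ols.coef_[0])    # pulled toward the outliers
print("Huber slope:", huber.coef_[0])  # closer to the true slope of 3
```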
What are the three types of Gradient Descent?
Batch Gradient Descent – Uses the entire dataset for each update.
Stochastic Gradient Descent (SGD) – Updates after each individual data point.
Mini-Batch Gradient Descent – Uses small random batches for updates.
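A mini-batch variant of the earlier gradient-descent sketch; batch size, learning rate, and epoch count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 1000)

m, c = 0.0, 0.0
lr, batch_size = 0.01, 32
for epoch in range(50):
    idx = rng.permutation(len(x))       # reshuffle each epoch
    for start in range(0, len(x), batch_size):
        b = idx[start:start + batch_size]
        error = (m * x[b] + c) - y[b]
        # Gradient step using only the current mini-batch
        m -= lr * (2 / len(b)) * np.sum(error * x[b])
        c -= lr * (2 / len(b)) * np.sum(error)

print(m, c)  # should approach 3 and 2
```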
What is the purpose of regularization in Linear Regression?
Regularization helps prevent overfitting by adding a penalty to the loss function, which reduces the magnitude of the regression coefficients.
What does Ridge Regression L2 do?
Ridge Regression adds an L2 penalty (sum of squared coefficients) to shrink the coefficients, reducing model complexity and preventing overfitting.
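A quick comparison of OLS and Ridge coefficients with scikit-learn, where alpha plays the role of λ and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(0, 1, 100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha is sklearn's name for λ

print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)  # shrunk toward zero, but rarely exactly zero
```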
What happens when the regularization parameter λ is increased in Ridge Regression?
Higher λ → coefficients shrink more, so the model becomes too simple.
High bias, low variance (underfitting risk).
Lower λ → coefficients stay close to ordinary least squares, so the penalty barely changes the fitted line.
Low bias, high variance (overfitting risk).
λ = 0 → Ridge Regression becomes ordinary least squares (OLS).
A balanced λ is needed to obtain a model that generalizes well (see the sweep below).
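A sketch of how the total coefficient magnitude responds as alpha (λ) grows, on the same kind of synthetic data as above:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(0, 1, 100)

for alpha in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: |coef| sum = {np.abs(coefs).sum():.3f}")
# Coefficient magnitudes shrink as alpha grows; alpha = 0 reproduces OLS
```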
What is Lasso Regression?
Lasso Regression adds an L1 penalty (sum of absolute coefficients) that forces some coefficients to become exactly zero, performing feature selection.
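A sketch showing Lasso zeroing out irrelevant features on synthetic data (alpha plays the role of λ, and its value here is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence y
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(0, 1, 100)

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # coefficients of the irrelevant features driven to exactly 0.0
```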
What is the importance of coefficients in Linear Regression?
They indicate the impact of each feature on the target variable: the sign gives the direction of the effect and the magnitude its strength (magnitudes are comparable only when features are on similar scales).
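A tiny illustration with hypothetical feature names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.1]) + rng.normal(0, 0.5, 100)

model = LinearRegression().fit(X, y)
# "area", "rooms", "age" are made-up names for the three columns
for name, coef in zip(["area", "rooms", "age"], model.coef_):
    print(f"{name}: {coef:+.2f}")  # sign and size show each feature's effect on y
```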
What happens when the regularization parameter λ is increased in Lasso Regression?
Higher λ → More coefficients become exactly zero, leading to feature selection.
Lower λ → Less shrinkage, similar to standard regression.
λ = 0 → Lasso Regression becomes ordinary least squares (OLS).
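A sweep over alpha (λ) on synthetic data, counting how many coefficients are driven exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coefs = np.array([3.0, -2.0, 1.0, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coefs + rng.normal(0, 1, 100)

for alpha in [0.01, 0.1, 0.5, 1.0, 2.0]:
    n_zero = np.sum(Lasso(alpha=alpha).fit(X, y).coef_ == 0.0)
    print(f"alpha={alpha}: {n_zero} of 10 coefficients are exactly zero")
# More coefficients tend to hit exactly zero as alpha (λ) increases
```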