Linear Regression Flashcards
What is Linear Regression?
Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation.
What is the equation of Simple Linear Regression?
The equation is Y = β0 + β1X + ε, where Y is the dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope, and ε is the error term.
What is the difference between Simple and Multiple Linear Regression?
Simple Linear Regression has one independent variable, while Multiple Linear Regression has two or more independent variables.
What is the objective of Linear Regression?
The objective is to find the best-fitting line, i.e., the one that minimizes the sum of squared differences between predicted and actual values, typically via the least squares method.
What is the cost function used in Linear Regression?
Linear Regression uses the Mean Squared Error (MSE) as the cost function: MSE = (1/n) * Σ (y_i - ŷ_i)^2.
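As a minimal sketch, MSE can be computed directly with NumPy (y_true and y_pred below are illustrative arrays, not from the source):

import numpy as np

y_true = np.array([3.0, 5.0, 7.0])     # example actual values (illustrative)
y_pred = np.array([2.5, 5.5, 7.0])     # example predictions (illustrative)
mse = np.mean((y_true - y_pred) ** 2)  # (1/n) * Σ (y_i - ŷ_i)^2
print(mse)  # ≈ 0.167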
How do you implement Linear Regression using Scikit-Learn?
from sklearn.linear_model import LinearRegression

model = LinearRegression()           # create the model
model.fit(X_train, y_train)          # learn the coefficients from the training data
predictions = model.predict(X_test)  # predict on unseen data
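A self-contained sketch of the same workflow, assuming synthetic data from make_regression stands in for a real dataset:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; substitute your own X and y
X, y = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)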
What does the fit() method do in Scikit-Learn’s LinearRegression?
It trains the model by finding the optimal coefficients (weights) for the linear equation.
What does the predict() method do in Scikit-Learn’s LinearRegression?
It uses the trained model to predict output values for given input features.
What is R-squared (R²) in Linear Regression?
R² measures the proportion of the variance in the dependent variable that the model explains. A value close to 1 indicates a good fit.
What are the assumptions of Linear Regression?
Linear Regression assumes: 1) Linearity, 2) Independence, 3) Homoscedasticity, 4) Normality of residuals, and 5) No multicollinearity.
What is Multicollinearity in Linear Regression?
Multicollinearity occurs when independent variables are highly correlated, making it difficult to determine their individual effects on the dependent variable.
How can you detect Multicollinearity?
Using the Variance Inflation Factor (VIF). A VIF > 10 suggests high multicollinearity.
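A hedged sketch of a VIF check using statsmodels, with a deliberately collinear synthetic DataFrame (the column names x1, x2, x3 are made up for illustration):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(42)
X = pd.DataFrame({"x1": rng.random(100), "x2": rng.random(100)})
X["x3"] = 2 * X["x1"] + rng.normal(scale=0.01, size=100)  # nearly collinear with x1

Xc = sm.add_constant(X)  # add an intercept column so each VIF is computed correctly
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vif)  # x1 and x3 should show VIF >> 10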
How can you handle Multicollinearity?
By removing highly correlated features, using Principal Component Analysis (PCA), or Ridge Regression.
What is the difference between OLS and Gradient Descent?
Ordinary Least Squares (OLS) computes the optimal coefficients directly in closed form (via the normal equations), while Gradient Descent adjusts the coefficients iteratively to minimize the cost function.
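To make the contrast concrete, here is a sketch comparing the two on synthetic data (the learning rate and iteration count are illustrative choices, not prescriptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
Xb = np.column_stack([np.ones(len(X)), X])  # design matrix with an intercept column

# OLS: solve the normal equations (XᵀX)β = Xᵀy in closed form
beta_ols = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Gradient Descent: repeatedly step against the MSE gradient
beta_gd = np.zeros(2)
learning_rate = 0.1
for _ in range(5000):
    gradient = (2 / len(y)) * Xb.T @ (Xb @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print(beta_ols)  # ≈ [3.0, 2.0]
print(beta_gd)   # converges to nearly the same coefficients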
How do you evaluate a Linear Regression model?
Using metrics like R², Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
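A brief sketch of computing these metrics with Scikit-Learn, assuming y_test and predictions come from the fitted model above:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)  # RMSE is in the same units as the target
r2 = r2_score(y_test, predictions)
print(f"MSE={mse:.3f}, RMSE={rmse:.3f}, R²={r2:.3f}")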
How do you check for Homoscedasticity?
By plotting residuals vs. fitted values. A random pattern indicates homoscedasticity, while a funnel shape suggests heteroscedasticity.
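A minimal sketch of that diagnostic plot with matplotlib, assuming model, X_train, and y_train from the earlier cards:

import matplotlib.pyplot as plt

fitted = model.predict(X_train)
residuals = y_train - fitted  # residual = actual - predicted

plt.scatter(fitted, residuals, alpha=0.6)
plt.axhline(0, color="red", linestyle="--")  # reference line at zero residual
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. Fitted")
plt.show()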
What is the purpose of the intercept in Linear Regression?
The intercept (β0) represents the expected value of Y when all independent variables are zero.
How do you extract model coefficients in Scikit-Learn?
print(model.coef_)       # Prints the slope coefficients
print(model.intercept_)  # Prints the intercept
How do you split data into training and testing sets in Scikit-Learn?
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
What is Ridge Regression and how does it help?
Ridge Regression is a type of Linear Regression that adds an L2 penalty on the coefficients to the cost function, shrinking them toward zero and reducing overfitting.
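A short sketch with Scikit-Learn's Ridge, assuming the train/test split from earlier; alpha=1.0 is just an illustrative starting value to tune:

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)  # alpha scales the L2 penalty; larger alpha shrinks coefficients more
ridge.fit(X_train, y_train)
ridge_predictions = ridge.predict(X_test)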