Linear Regression & Regression Diagnostics Flashcards
Systematically introducing and reinforcing key concepts in Linear Regression, Regression Diagnostics, and Statistical Modeling
What is statistical modeling?
A process of using mathematical models to represent real-world data relationships.
It helps understand and predict outcomes.
Define inferential modeling.
The use of statistical models to draw inferences about a population and the relationships between its variables based on a sample, rather than purely to predict outcomes.
Common in regression analysis.
What is linear regression?
A statistical method used to model the relationship between a dependent variable and one or more independent variables.
With a single predictor, the equation is y = β₀ + β₁X + ε.
What is simple linear regression?
A linear regression model with only one independent variable.
Example: Predicting house price based on square footage.
What is multiple linear regression?
A linear regression model with two or more independent variables.
Example: Predicting sales based on advertising spend and product price.
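A minimal sketch of fitting such a model with StatsModels; the simulated data and the column names ad_spend, price, and sales are hypothetical, chosen to match the example above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data with hypothetical columns: ad_spend, price, sales.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ad_spend": rng.uniform(1, 10, 100),
    "price": rng.uniform(5, 20, 100),
})
df["sales"] = 3 + 2.5 * df["ad_spend"] - 0.8 * df["price"] + rng.normal(0, 1, 100)

X = sm.add_constant(df[["ad_spend", "price"]])  # adds the intercept column (β0)
y = df["sales"]

model = sm.OLS(y, X).fit()   # ordinary least squares fit
print(model.summary())       # coefficients, R², F-test, p-values, Durbin-Watson, etc.
```

Later sketches in this deck reuse df, X, y, and the fitted model from this example.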
What does the slope in linear regression represent?
The rate at which the dependent variable changes with respect to the independent variable.
A positive slope indicates a positive relationship.
Fill in the blank:
The y-intercept (β₀) represents __________.
The predicted value of y when x = 0.
Often not meaningful when x = 0 lies outside the range of the observed data.
What is the main assumption of simple linear regression?
That there is a linear relationship between the dependent variable and the independent variable.
Checked using scatter plots.
What does the error term (ε) in regression represent?
The difference between the observed and predicted values.
Its estimates from the fitted model are known as residuals.
What is the coefficient of determination (R²)?
A measure of how well the model explains the variance in the dependent variable.
R² values range from 0 to 1.
What is Adjusted R²?
A modified R² that adjusts for the number of predictors in the model.
Unlike R², it penalizes adding unnecessary predictors.
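As a sketch, assuming the fitted model from the earlier OLS example, both quantities are available as attributes:

```python
# R² and Adjusted R² of the fitted OLS result from the earlier sketch.
print(model.rsquared)      # proportion of variance in y explained by the model
print(model.rsquared_adj)  # R² penalized for the number of predictors
```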
What is residual analysis in regression?
The process of analyzing the differences between observed and predicted values.
Helps detect model issues.
What is a normal Q-Q plot used for?
Checking if residuals follow a normal distribution.
Points falling close to the 45-degree reference line suggest normality.
What is homoscedasticity?
When residuals have constant variance.
Checked using residual vs. fitted plots.
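A minimal sketch of both diagnostic plots, assuming the fitted model from the earlier example:

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

resid = model.resid          # residuals of the earlier OLS fit
fitted = model.fittedvalues  # predicted values

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(fitted, resid)   # residual vs. fitted: look for constant spread
axes[0].axhline(0, color="red")
axes[0].set_xlabel("Fitted values")
axes[0].set_ylabel("Residuals")

sm.qqplot(resid, line="45", fit=True, ax=axes[1])  # normal Q-Q plot of residuals
plt.tight_layout()
plt.show()
```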
What is heteroscedasticity?
When residual variance is not constant.
Indicates a violation of homoscedasticity.
How can you fix heteroscedasticity?
Common remedies include:
- transforming the dependent variable (e.g., log or square root)
- using weighted least squares regression (both sketched below).
Non-constant variance can distort regression results.
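A sketch of both remedies, reusing the hypothetical df, X, and y from the earlier example; the 1/ad_spend weights are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

# 1) Weighted least squares: down-weight observations assumed to have larger error variance.
wls_model = sm.WLS(y, X, weights=1.0 / df["ad_spend"]).fit()

# 2) Variance-stabilizing transformation (only applicable when y is strictly positive).
if (y > 0).all():
    log_model = sm.OLS(np.log(y), X).fit()
```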
What is multicollinearity?
When two or more independent variables are highly correlated.
Leads to unstable coefficient estimates.
How do you detect multicollinearity?
Using Variance Inflation Factor (VIF).
VIF > 5 or 10 suggests high multicollinearity.
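A sketch using StatsModels' variance_inflation_factor on the design matrix X from the earlier example (the constant column's VIF can be ignored):

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns)}
print(vifs)  # predictor VIFs above ~5–10 suggest multicollinearity
```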
How can you fix multicollinearity?
Remove one of the correlated variables, combine them with PCA, or use ridge regression.
Keeping highly correlated features can distort model interpretation.
What is the normality assumption in regression?
Residuals should be normally distributed.
Checked using Q-Q plots or Shapiro-Wilk test.
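A sketch of the Shapiro-Wilk test on the residuals of the earlier fitted model (the Q-Q plot sketch appears above):

```python
from scipy.stats import shapiro

stat, p_value = shapiro(model.resid)
print(p_value)  # p < 0.05 suggests the residuals deviate from normality
```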
What is the Durbin-Watson test used for?
Detecting autocorrelation in regression residuals.
A value close to 2 suggests no autocorrelation.
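A sketch using StatsModels' durbin_watson on the residuals of the earlier fit (the statistic also appears in model.summary()):

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(model.resid)
print(dw)  # ~2: no autocorrelation; <2: positive; >2: negative autocorrelation
```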
What is Cook’s Distance?
A measure to identify influential outliers.
Points with Cook’s Distance >1 may be problematic.
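A sketch computing Cook's Distance from the earlier fitted model via its influence measures:

```python
influence = model.get_influence()      # OLS influence diagnostics
cooks_d, _ = influence.cooks_distance  # distances and their p-values
print((cooks_d > 1).sum())             # number of potentially influential points
```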
What is Ridge Regression?
A regression technique that adds an L2 penalty to shrink coefficients.
Helps in handling multicollinearity.
What is LASSO Regression?
A regression technique that adds an L1 penalty, shrinking some coefficients to zero.
Helps with feature selection.
What is Elastic Net Regression?
A mix of Ridge and LASSO regression.
Uses both L1 and L2 penalties.
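A sketch of all three penalized regressions with Scikit-learn, reusing the hypothetical df and y; the alpha values are illustrative, not tuned:

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

features = df[["ad_spend", "price"]]  # scikit-learn fits the intercept itself

ridge = Ridge(alpha=1.0).fit(features, y)                    # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(features, y)                    # L1 penalty can zero coefficients out
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(features, y)  # blend of L1 and L2
print(ridge.coef_, lasso.coef_, enet.coef_)
```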
What is the purpose of polynomial regression?
To model a non-linear relationship using higher-degree terms.
Example: y = β₀ + β₁X + β₂X² + ε.
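A sketch of a quadratic fit with StatsModels, using the hypothetical ad_spend column from the earlier example as X:

```python
import numpy as np
import statsmodels.api as sm

x = df["ad_spend"]
X_poly = sm.add_constant(np.column_stack([x, x**2]))  # columns: const, X, X²
poly_model = sm.OLS(y, X_poly).fit()
print(poly_model.params)  # β0, β1, β2
```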
When should you use logistic regression instead of linear regression?
When predicting a binary outcome.
Example: Predicting yes/no responses.
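A sketch of a logistic regression with StatsModels; the binary label purchased is fabricated from the earlier simulated data purely for illustration:

```python
import statsmodels.api as sm

# Hypothetical 0/1 outcome derived from the simulated data.
purchased = (df["sales"] > df["sales"].median()).astype(int)

logit_model = sm.Logit(purchased, X).fit()  # models P(purchased = 1)
print(logit_model.summary())
```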
What is cross-validation in regression?
A method to assess how well the model generalizes to new data.
Common technique: k-fold cross-validation.
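A sketch of 5-fold cross-validation with Scikit-learn on the hypothetical data:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

features = df[["ad_spend", "price"]]
scores = cross_val_score(LinearRegression(), features, y, cv=5, scoring="r2")
print(scores.mean())  # average out-of-sample R² across the 5 folds
```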
How can overfitting be prevented in regression?
By using regularization techniques like Ridge or LASSO.
Overfitting leads to poor generalization.
What is the main goal of regression diagnostics?
To verify if model assumptions are met.
Helps improve model reliability.
What is the F-test used for in regression?
To check if at least one predictor variable is significantly contributing to the model.
A low p-value (<0.05) suggests significance.
What does a small p-value for a regression coefficient mean?
That the predictor variable is significantly contributing to the model.
Typically, p < 0.05 is considered significant.
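Both the overall F-test and the per-coefficient p-values are available on the fitted StatsModels result from the earlier sketch:

```python
print(model.fvalue, model.f_pvalue)  # overall F-statistic and its p-value
print(model.pvalues)                 # p-value for each coefficient, including the intercept
```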
What is mean squared error (MSE)?
The average squared difference between actual and predicted values.
Lower MSE means better model performance.
What is root mean squared error (RMSE)?
The square root of MSE, measuring average prediction error in the same units as the dependent variable.
More interpretable than MSE.
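A sketch computing both metrics for the earlier fit, using Scikit-learn's mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y, model.fittedvalues)  # average squared prediction error
rmse = np.sqrt(mse)                              # same units as the dependent variable
print(mse, rmse)
```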
What does it mean if residuals are large?
The model’s predictions are not very accurate.
Large residuals suggest potential outliers or a poor model fit.
What is the main assumption of the least squares method?
That the errors have zero mean, constant variance, and are uncorrelated; normality is additionally assumed for inference.
Used for estimating regression coefficients.
Why should you be cautious about extrapolating in linear regression?
The model is only valid within the range of observed data.
Predictions outside this range may be unreliable.
What is Bayesian regression?
A type of regression that incorporates prior beliefs using probability distributions.
Used when dealing with small datasets or uncertainty.
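One sketch of this idea is Scikit-learn's BayesianRidge, which places priors on the coefficients and returns predictive uncertainty; the choice of model and the reuse of the earlier data are illustrative:

```python
from sklearn.linear_model import BayesianRidge

features = df[["ad_spend", "price"]]
bayes = BayesianRidge().fit(features, y)
mean_pred, std_pred = bayes.predict(features, return_std=True)  # predictive mean and std
```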
What is robust regression?
A regression method that reduces the influence of outliers.
More resistant to outliers and to violations of the normality assumption.
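A sketch using StatsModels' RLM, which applies Huber weighting by default to down-weight outliers; it reuses the earlier design matrix X and response y:

```python
import statsmodels.api as sm

robust_model = sm.RLM(y, X).fit()  # robust linear model with Huber weighting
print(robust_model.params)
```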
What is the main advantage of using StatsModels for regression?
Provides detailed statistical summaries and diagnostics.
Offers richer statistical inference output (p-values, confidence intervals, diagnostic tests) than Scikit-learn.