Linear Regression & Regression Diagnostics Flashcards

Systematically introducing and reinforcing key concepts in Linear Regression, Regression Diagnostics, and Statistical Modeling.

1
Q

What is statistical modeling?

A

A process of using mathematical models to represent real-world data relationships.

It helps understand and predict outcomes.

2
Q

Define inferential modeling.

A

The use of statistical techniques to make predictions or inferences about a population based on a sample.

Common in regression analysis.

3
Q

What is linear regression?

A

A statistical method used to model the relationship between a dependent variable and one or more independent variables.

The equation is y = β₀ + β₁X + ε.

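A minimal sketch of fitting the equation above with StatsModels; the data and variable names below are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)                # hypothetical predictor
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)  # hypothetical response with noise

X = sm.add_constant(x)        # adds the intercept column (beta_0)
model = sm.OLS(y, X).fit()    # ordinary least squares fit
print(model.params)           # estimated [intercept, slope]
```
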
4
Q

What is simple linear regression?

A

A linear regression model with only one independent variable.

Example: Predicting house price based on square footage.

5
Q

What is multiple linear regression?

A

A linear regression model with two or more independent variables.

Example: Predicting sales based on advertising spend and product price.

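A hedged sketch of the sales example above using StatsModels' formula API; the numbers and column names are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":    [120, 150, 170, 200, 210, 250],   # hypothetical data
    "ad_spend": [10, 15, 18, 22, 25, 30],
    "price":    [9.5, 9.0, 8.8, 8.5, 8.4, 8.0],
})

# sales modeled on advertising spend and product price
model = smf.ols("sales ~ ad_spend + price", data=df).fit()
print(model.params)
```
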
6
Q

What does the slope in linear regression represent?

A

The rate at which the dependent variable changes with respect to the independent variable.

A positive slope indicates a positive relationship.

7
Q

Fill in the blank:

The y-intercept (β₀) represents __________.

A

The predicted value of y when x = 0.

Not always meaningful, for example when x = 0 lies outside the observed data range.

8
Q

What is the main assumption of simple linear regression?

A

That there is a linear relationship between the dependent and independent variable.

Checked using scatter plots.

9
Q

What does the error term (ε) in regression represent?

A

The difference between the observed and predicted values.

Estimated by the residuals in a fitted model.

10
Q

What is the coefficient of determination (R²)?

A

A measure of how well the model explains the variance in the dependent variable.

R² values range from 0 to 1.

11
Q

What is Adjusted R²?

A

A modified R² that adjusts for the number of predictors in the model.

Unlike R², it penalizes adding unnecessary predictors.

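A sketch computing both R² and Adjusted R² by hand and checking them against StatsModels; `y`, `X`, and `model` are assumed to come from an OLS fit like the earlier sketch.

```python
import numpy as np

y_hat = model.predict(X)
ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

n = X.shape[0]                # number of observations
p = X.shape[1] - 1            # number of predictors (excluding the constant)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(r2, model.rsquared)          # should match
print(adj_r2, model.rsquared_adj)  # should match
```
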
12
Q

What is residual analysis in regression?

A

The process of analyzing the differences between observed and predicted values.

Helps detect model issues.

13
Q

What is a normal Q-Q plot used for?

A

Checking if residuals follow a normal distribution.

Points falling close to the 45-degree reference line suggest normality.

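A sketch of a normal Q-Q plot of the residuals, assuming `model` is a fitted StatsModels OLS result as in the earlier sketch.

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

sm.qqplot(model.resid, line="45", fit=True)  # residuals vs. a normal distribution
plt.title("Normal Q-Q plot of residuals")
plt.show()
```
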
14
Q

What is homoscedasticity?

A

When residuals have constant variance.

Checked using residual vs. fitted plots.

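A sketch of the residual-vs-fitted check, again assuming a fitted StatsModels OLS result named `model`.

```python
import matplotlib.pyplot as plt

plt.scatter(model.fittedvalues, model.resid, alpha=0.6)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```
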
15
Q

What is heteroscedasticity?

A

When residual variance is not constant.

Indicates a violation of homoscedasticity.

16
Q

How can you fix heteroscedasticity?

A

Apply one of the following:

  • a log or square-root transformation of the dependent variable
  • weighted least squares regression

Non-constant variance can distort regression results.

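A sketch of both remedies, assuming a DataFrame `df` with a positive response column `y` and a predictor `x` (names are hypothetical); the weighting scheme is only illustrative.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# 1) Transform the response (log or square root) and refit.
df["log_y"] = np.log(df["y"])
log_model = smf.ols("log_y ~ x", data=df).fit()

# 2) Weighted least squares: down-weight high-variance observations.
ols_model = smf.ols("y ~ x", data=df).fit()
weights = 1.0 / (ols_model.resid ** 2)      # crude variance proxy, for illustration only
X = sm.add_constant(df["x"])
wls_model = sm.WLS(df["y"], X, weights=weights).fit()
```
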
17
Q

What is multicollinearity?

A

When two or more independent variables are highly correlated.

Leads to unstable coefficient estimates.

18
Q

How do you detect multicollinearity?

A

Using Variance Inflation Factor (VIF).

VIF > 5 or 10 suggests high multicollinearity.
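
A sketch of the VIF calculation with StatsModels, assuming `df` holds the hypothetical predictors from the earlier sketch.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["ad_spend", "price"]])   # hypothetical predictors
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)   # values above 5-10 flag potential multicollinearity; ignore the 'const' row
```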

19
Q

How can you fix multicollinearity?

A

Remove one of the correlated variables, use PCA, or ridge regression.

Keeping highly correlated features can distort model interpretation.

20
Q

What is the normality assumption in regression?

A

Residuals should be normally distributed.

Checked using Q-Q plots or Shapiro-Wilk test.
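
A sketch of the Shapiro-Wilk test on the residuals, with `model` assumed to be a fitted OLS result as above.

```python
from scipy import stats

stat, p_value = stats.shapiro(model.resid)
print(p_value)   # p > 0.05: no strong evidence against normality
```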

21
Q

What is the Durbin-Watson test used for?

A

Detecting autocorrelation in regression residuals.

A value close to 2 suggests no autocorrelation.
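
A sketch using the Durbin-Watson helper in StatsModels (the statistic also appears in the OLS summary); `model` is assumed fitted as above.

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(model.resid)
print(dw)   # ~2: little autocorrelation; well below 2: positive autocorrelation
```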

22
Q

What is Cook’s Distance?

A

A measure to identify influential outliers.

Points with Cook’s Distance >1 may be problematic.
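
A sketch of Cook's Distance from a fitted StatsModels OLS result (`model` assumed fitted as above).

```python
influence = model.get_influence()       # influence diagnostics for each observation
cooks_d, _ = influence.cooks_distance   # distances and their p-values
print((cooks_d > 1).sum(), "potentially influential observations")
```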

23
Q

What is Ridge Regression?

A

A regression technique that adds an L2 penalty to shrink coefficients.

Helps in handling multicollinearity.

24
Q

What is LASSO Regression?

A

A regression technique that adds an L1 penalty, shrinking some coefficients to zero.

Helps with feature selection.

25
Q

What is Elastic Net Regression?

A

A mix of Ridge and LASSO regression.

Uses both L1 and L2 penalties.

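A sketch of the three penalized regressions from the last three cards, using scikit-learn; `X` and `y` are assumed to be plain feature and target arrays, and the alpha values are arbitrary.

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)                    # L1 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2
print(lasso.coef_)   # some coefficients may be exactly zero (feature selection)
```
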
26
Q

What is the purpose of polynomial regression?

A

To model a non-linear relationship using higher-degree terms.

Example: y = β₀ + β₁X + β₂X² + ε.

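A sketch of a quadratic fit by adding a squared term, assuming 1-D arrays `x` and `y` as in the first sketch.

```python
import numpy as np
import statsmodels.api as sm

X_poly = sm.add_constant(np.column_stack([x, x ** 2]))  # columns: const, x, x^2
poly_model = sm.OLS(y, X_poly).fit()
print(poly_model.params)   # estimated [beta_0, beta_1, beta_2]
```
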
27
Q

When should you use logistic regression instead of linear regression?

A

When predicting a binary outcome.

Example: Predicting yes/no responses.

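A sketch of a logistic regression with StatsModels, assuming a hypothetical 0/1 outcome array `y_binary` and a design matrix `X` that includes a constant column.

```python
import statsmodels.api as sm

logit_model = sm.Logit(y_binary, X).fit()
print(logit_model.params)   # coefficients on the log-odds scale
```
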
28
Q

What is cross-validation in regression?

A

A method to assess how well the model generalizes to new data.

Common technique: k-fold cross-validation.

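A sketch of 5-fold cross-validation with scikit-learn, assuming feature and target arrays `X` and `y`.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print(-scores.mean())   # average out-of-fold MSE
```
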
29
Q

How can overfitting be prevented in regression?

A

By using regularization techniques like Ridge or LASSO.

Overfitting leads to poor generalization.

30
Q

What is the main goal of regression diagnostics?

A

To verify if model assumptions are met.

Helps improve model reliability.

31
Q

What is the F-test used for in regression?

A

To check if at least one predictor variable contributes significantly to the model.

A low p-value (<0.05) suggests significance.

32
Q

What does a small p-value for a regression coefficient mean?

A

That the predictor variable is significantly contributing to the model.

Typically, p < 0.05 is considered significant.

33
Q

What is mean squared error (MSE)?

A

The average squared difference between actual and predicted values.

Lower MSE means better model performance.

34
Q

What is root mean squared error (RMSE)?

A

The square root of MSE, measuring average prediction error in the same units as the dependent variable.

More interpretable than MSE.

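A sketch computing MSE and RMSE from actual vs. predicted values; the arrays are illustrative.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])

mse = np.mean((y_true - y_pred) ** 2)   # average squared error -> 0.4375
rmse = np.sqrt(mse)                     # back in the units of y -> ~0.66
print(mse, rmse)
```
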
35
Q

What does it mean if residuals are large?

A

The model’s predictions are not very accurate.

Large residuals suggest potential outliers or a poor model fit.

36
Q

What are the main assumptions of the least squares method?

A

That residuals are independent, have constant variance, and are normally distributed (the last is needed for inference).

Used for estimating regression coefficients.

37
Q

Why should you be cautious about extrapolating in linear regression?

A

The model is only valid within the range of observed data.

Predictions outside this range may be unreliable.

38
Q

What is Bayesian regression?

A

A type of regression that incorporates prior beliefs using probability distributions.

Used when dealing with small datasets or uncertainty.

39
Q

What is robust regression?

A

A regression method that reduces the influence of outliers.

More resistant to violations of normality and homoscedasticity.

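A sketch of robust regression with StatsModels' RLM and Huber weighting, assuming `y` and `X` (with a constant column) as in the earlier sketches.

```python
import statsmodels.api as sm

robust_model = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(robust_model.params)   # outlying observations are down-weighted
```
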
40
Q

What is the main advantage of using StatsModels for regression?

A

Provides detailed statistical summaries and diagnostics.

Unlike Scikit-learn, it reports inferential output such as p-values and confidence intervals.

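A sketch of the full diagnostic summary from a StatsModels OLS fit, assuming `y` and `X` as in the earlier sketches.

```python
import statsmodels.api as sm

results = sm.OLS(y, X).fit()
print(results.summary())   # coefficients, p-values, R², F-test, Durbin-Watson, etc.
```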