3: Regression Flashcards

1
Q

What is a multiple regression?

A

The regression analysis of a dataset with multiple independent variables.

2
Q

What type of relationship can exist between dependent and independent variables in a regression analysis?

A

The relationship between the dependent variable and the independent variable(s), which the regression analysis estimates, can be either linear or nonlinear.

3
Q

What is the main objective in linear and nonlinear regression?

A

The objective is to find the optimum values for the coefficients.

4
Q

What steps does a regression analysis follow?

A

In general, the regression analysis follows 4 steps: model selection, model fitting, model prediction, and model evaluation. (SFPE)
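The four steps can be sketched with scikit-learn. This is a minimal example; the dataset and the choice of a simple linear model are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: roughly y = 2x + 1 plus a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# 1. Model selection: choose a simple linear model
model = LinearRegression()
# 2. Model fitting: estimate the coefficients from the data
model.fit(X, y)
# 3. Model prediction: estimate the target for an unseen input
y_pred = model.predict(np.array([[6.0]]))
# 4. Model evaluation: check how well the model fits the data
score = r2_score(y, model.predict(X))
print(y_pred, score)
```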

5
Q

What is model selection about?

A

The choice of the model (i.e., relationship) shape. This means to decide whether the regression is linear or nonlinear, and simple or multiple.

6
Q

What is model fitting about?

A

Finding the unknown coefficients of the chosen model.

7
Q

What is model prediction about?

A

Estimating the target variable for some other hitherto unseen dataset elements.

8
Q

What is model evaluation about?

A

Checking how close the model’s predictions are to the desired target values.

9
Q

What is regression?

A

Regression is a supervised learning approach, where the given dataset is labeled (i.e., has one or more target variables). The aim of regression analysis is to model the relationship between the target variable (i.e., the dependent variable) and the independent variables in the dataset.

10
Q

What is a residual?

A

The residual is the difference between the predicted value and the actual observation.

11
Q

What is the cost function in regression?

A

The evaluation metric for the simple regression model is the sum of the squared residuals (i.e., the sum of squared errors), which we aim to minimize.
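The cost can be computed directly; here is a tiny example on hypothetical predictions:

```python
import numpy as np

# Hypothetical observed targets and model predictions
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

residuals = y_true - y_pred        # [0.5, -0.5, 1.0]
sse = np.sum(residuals ** 2)       # 0.25 + 0.25 + 1.0 = 1.5
print(sse)  # 1.5
```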

12
Q

What is the coefficient of determination (R²) employed for?

A

It is used to measure the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It tells you how well the regression model fits the data. It ranges from zero to one, where a value close to one indicates a good quality of fit of the regression model.
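scikit-learn's `r2_score` computes this directly; the targets and predictions below are made up:

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]  # predictions close to the observations

r2 = r2_score(y_true, y_pred)
print(r2)  # close to 1, indicating a good fit
```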

13
Q

Please give some metrics that can be used for evaluating regression models.

A

R squared (R²), RMSE (root mean squared error), MSE (mean squared error), and MAE (mean absolute error).
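All of these are available in scikit-learn; a small sketch on hypothetical values:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.5]  # each prediction is off by 0.5

mse = mean_squared_error(y_true, y_pred)    # mean of the squared errors
rmse = np.sqrt(mse)                         # root of MSE, in the target's units
mae = mean_absolute_error(y_true, y_pred)   # mean of the absolute errors
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)
```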

14
Q

What is the logistic regression model?

A

Logistic regression is used when the preconditions of linear regression are not met, i.e., when the dependent variable follows a binomial distribution or takes on categorical values (yes/no). Example: loan approval (YES/NO).
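A minimal sketch of the loan-approval example with scikit-learn; the income figures and labels are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: feature = income (in thousands), label = loan approved (1) or not (0)
X = np.array([[20.0], [25.0], [30.0], [55.0], [60.0], [65.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

preds = clf.predict([[22.0], [62.0]])  # predicted classes for new applicants
print(preds)                           # low income -> 0, high income -> 1
```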

15
Q

Please give range of values of a logistic function.

A

The values of a logistic function will range from 0 to 1.
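This is easy to verify numerically with the standard logistic (sigmoid) function:

```python
import numpy as np

def logistic(z):
    """Standard logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Outputs approach 0 for large negative z, equal 0.5 at z = 0,
# and approach 1 for large positive z -- always within (0, 1).
print(logistic(-10), logistic(0), logistic(10))
```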

16
Q

What is the difference in the analysis between the Quantile and linear regression?

A

Instead of minimizing the sum of the squares of the residuals, the objective in quantile regression is to minimize a “weighted” sum of the absolute errors at each quantile of the dependent variable.

17
Q

Please describe how quantile regression is performed on a given dataset.

A

In quantile regression, the dependent variable is divided into segments (i.e., quantiles) from its lowest value to its highest value, and a linear regression model is developed for each quantile.

18
Q

What are regularization techniques used for?

A

To address overfitting and improve robustness, regularization techniques are used, which add a penalty term to the loss function.

19
Q

What is Ridge Regression?

A

Ridge regression adds a penalty term to the loss function that penalizes the sum of the squares of the model coefficients. This penalty is controlled by a constant, denoted as λ. A higher value of λ places greater emphasis on reducing the magnitudes of the coefficients, which may result in higher residuals but helps to prevent overfitting.
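The shrinkage effect can be seen by comparing ordinary least squares with ridge on the same made-up data (scikit-learn calls the constant `alpha` rather than λ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(0, 0.5, size=30)

ols = LinearRegression().fit(X, y)
# alpha plays the role of lambda: a larger alpha shrinks the coefficients more
ridge = Ridge(alpha=10.0).fit(X, y)

# The ridge coefficient vector has a smaller overall magnitude than OLS
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```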

20
Q

Explain Ridge Regression in simple terms.

A

Control with a constant: there’s a constant (let’s call it λ) that controls how strict the penalty on large coefficients is. If λ is big, the model really wants to keep the coefficients small, even if it means that the predictions might not be perfect (it’s okay with some mistakes).
Goal: the main goal is to keep the model from being too complex and to avoid overfitting, which happens when the model learns the training data too well but doesn’t perform well on new data.
So, in simple terms, ridge regression helps keep the model simpler and more general by discouraging large coefficients, leading to better performance on new data.

21
Q

What library is used for regression?

A

Scikit-learn

22
Q

What is Lasso regression about?

A

Lasso regression is similar to ridge regression but with one key difference: it minimizes the absolute values of the model’s coefficients rather than their squares. This approach reduces both large and small coefficients, often driving some coefficients to exactly zero. As a result, lasso regression creates simpler, sparse models by selecting only the most important variables. Choosing the right value of the regularization parameter λ is crucial, as it balances model complexity and performance; evaluating the model across different λ values helps to prevent overfitting.
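The sparsity effect can be demonstrated with scikit-learn's `Lasso` on made-up data where only some features matter (again, `alpha` plays the role of λ):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
# The coefficients of the irrelevant features are driven to exactly zero,
# so the lasso performs feature selection as a side effect.
print(lasso.coef_)
```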

23
Q

Please explain lasso and ridge regularizations.

A

L1 or LASSO regularization: here, the absolute values of the coefficients are added to the cost function. This regularization technique gives sparse results, which leads to feature selection as well.
L2 or Ridge regularization: here, the squares of the coefficients are added to the cost function.