3: Regression Flashcards

1
Q

What is a multiple regression?

A

The regression analysis of a dataset with multiple independent variables.

2
Q

What type of relationship can exist between dependent and independent variables in a regression analysis?

A

The relationship between the dependent variable and the independent variable(s), which the regression analysis estimates, can be either linear or nonlinear.

3
Q

What is the main objective in linear and nonlinear regression?

A

The objective is to find the optimum values for the coefficients.

4
Q

What steps does a regression analysis follow?

A

In general, the regression analysis follows 4 steps: model selection, model fitting, model prediction, and model evaluation. (SFPE)
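The four steps can be sketched with scikit-learn. This is a minimal example; the dataset and the choice of a simple linear model are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: roughly y = 2x + 1 plus a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# 1. Model selection: choose a simple linear model
model = LinearRegression()
# 2. Model fitting: estimate the coefficients from the data
model.fit(X, y)
# 3. Model prediction: estimate the target for an unseen input
y_pred = model.predict(np.array([[6.0]]))
# 4. Model evaluation: check how well the model fits the data
score = r2_score(y, model.predict(X))
print(y_pred, score)
```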

5
Q

What is model selection about?

A

The choice of the model (i.e., relationship) shape. This means to decide whether the regression is linear or nonlinear, and simple or multiple.

6
Q

What is model fitting about?

A

Finding the unknown coefficients of the chosen model.

7
Q

What is model prediction about?

A

Estimating the target variable for some other hitherto unseen dataset elements.

8
Q

What is model evaluation about?

A

Checking how close the model’s predictions are to the desired target values.

9
Q

What is regression?

A

Regression is a supervised learning approach, where the given dataset is labeled (i.e., has one or more target variables). The aim of regression analysis is to model the relationship between the target variable (i.e., the dependent variable) and the independent variables in the dataset.

10
Q

What is a residual?

A

The residual is the difference between the predicted value and the actual observation.

11
Q

What is the cost function in regression?

A

The evaluation metric for the simple regression model is the sum of the squared residuals (i.e., the sum of squared errors), which we aim to minimize.
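The cost can be computed directly; here is a tiny example on hypothetical predictions:

```python
import numpy as np

# Hypothetical observed targets and model predictions
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

residuals = y_true - y_pred        # [0.5, -0.5, 1.0]
sse = np.sum(residuals ** 2)       # 0.25 + 0.25 + 1.0 = 1.5
print(sse)  # 1.5
```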

12
Q

What is the coefficient of determination (R²) employed for?

A

It is used to measure the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It tells you how well the regression model fits the data. It ranges from zero to one, where a value close to one indicates a good quality of fit of the regression model.
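scikit-learn's `r2_score` computes this directly; the targets and predictions below are made up:

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]  # predictions close to the observations

r2 = r2_score(y_true, y_pred)
print(r2)  # close to 1, indicating a good fit
```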

13
Q

Please give some metrics that can be used for evaluating regression models.

A

R squared (R²), RMSE (root mean squared error), MSE (mean squared error), and MAE (mean absolute error).
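All of these are available in scikit-learn; a small sketch on hypothetical values:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.5]  # each prediction is off by 0.5

mse = mean_squared_error(y_true, y_pred)    # mean of the squared errors
rmse = np.sqrt(mse)                         # root of MSE, in the target's units
mae = mean_absolute_error(y_true, y_pred)   # mean of the absolute errors
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)
```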

14
Q

What is the logistic regression model?

A

Logistic regression is used when the preconditions of linear regression are not met, i.e., when the dependent variable follows a binomial distribution or takes on categorical values (yes/no). Example: loan approval (YES/NO).
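A minimal sketch of the loan-approval example with scikit-learn; the income figures and labels are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: feature = income (in thousands), label = loan approved (1) or not (0)
X = np.array([[20.0], [25.0], [30.0], [55.0], [60.0], [65.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

preds = clf.predict([[22.0], [62.0]])  # predicted classes for new applicants
print(preds)                           # low income -> 0, high income -> 1
```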

15
Q

Please give range of values of a logistic function.

A

The values of a logistic function will range from 0 to 1.
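This is easy to verify numerically with the standard logistic (sigmoid) function:

```python
import numpy as np

def logistic(z):
    """Standard logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Outputs approach 0 for large negative z, equal 0.5 at z = 0,
# and approach 1 for large positive z -- always within (0, 1).
print(logistic(-10), logistic(0), logistic(10))
```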

16
Q

What is the difference in the analysis between the Quantile and linear regression?

A

Instead of minimizing the sum of the squares of the residuals, the objective in quantile regression is to minimize a “weighted” sum of the absolute errors at each quantile of the dependent variable.

17
Q

Please describe how quantile regression is performed on a given dataset.

A

In quantile regression, the dependent variable is divided into segments (i.e., quantiles) from its lowest value to its highest value, and a linear regression model is developed for each quantile.

18
Q

What are regularization techniques used for?

A

To address overfitting and improve robustness, regularization techniques are used, which add a penalty term to the loss function.

19
Q

What is Ridge Regression?

A

Ridge regression adds a penalty term to the loss function that penalizes the sum of the squares of the model coefficients. This penalty is controlled by a constant, denoted as λ. A higher value of λ places greater emphasis on reducing the magnitudes of the coefficients, which may result in higher residuals but helps to prevent overfitting.
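The shrinkage effect can be seen by comparing ordinary least squares with ridge on the same made-up data (scikit-learn calls the constant `alpha` rather than λ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(0, 0.5, size=30)

ols = LinearRegression().fit(X, y)
# alpha plays the role of lambda: a larger alpha shrinks the coefficients more
ridge = Ridge(alpha=10.0).fit(X, y)

# The ridge coefficient vector has a smaller overall magnitude than OLS
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```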

20
Q

Explain Ridge Regression in simple terms.

A

Control with a constant: there’s a constant (let’s call it λ) that controls how strict the penalty on large coefficients is. If λ is big, the model really wants to keep the coefficients small, even if it means that the predictions might not be perfect (it’s okay with some mistakes).
Goal: the main goal is to keep the model from being too complex and to avoid overfitting, which happens when the model learns the training data too well but doesn’t perform well on new data.
So, in simple terms, ridge regression helps keep the model simpler and more general by discouraging large coefficients, leading to better performance on new data.

21
Q

What library is used for regression?

A

Scikit-learn

22
Q

What is Lasso regression about?

A

Lasso regression is similar to ridge regression but with one key difference: it minimizes the absolute values of the model’s coefficients rather than their squares. This approach reduces both large and small coefficients, often driving some coefficients to exactly zero. As a result, lasso regression creates simpler, sparse models by selecting only the most important variables. Choosing the right value of the regularization parameter λ is crucial, as it balances model complexity and performance; evaluating the model across different λ values helps to prevent overfitting.
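The sparsity effect can be demonstrated with scikit-learn's `Lasso` on made-up data where only some features matter (again, `alpha` plays the role of λ):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
# The coefficients of the irrelevant features are driven to exactly zero,
# so the lasso performs feature selection as a side effect.
print(lasso.coef_)
```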

23
Q

Please explain lasso and ridge regularizations.

A

L1 or LASSO regularization: here, the absolute values of the coefficients are added to the cost function. This regularization technique gives sparse results, which leads to feature selection as well.
L2 or Ridge regularization: here, the squares of the coefficients are added to the cost function.