Supervised - regression Flashcards
What is Regression analysis
A type of predictive modeling technique used to find the relationship between a dependent variable (usually known as the “Y” variable) and either one independent variable (the “X” variable) or a series of independent variables.
Objective / what is a Regression model for?
It estimates the nature of the relationship between the independent and dependent variables
❖ Strength of the relationship and its significance
Explanatory variables?
independent variables
variables to be explained?
dependent variables
Regression analysis can be used for
- Predicting specific outcomes from changes, like estimating production needs for a product.
- Projecting future trends or values, such as forecasting stock prices.
- Assessing the influence of different factors on an outcome, like measuring the impact of advertising on sales during an event.
The simplest model is
linear regression
linear regression objective
Fit the data with the best hyperplane (a straight line in the simple case) which “goes through” the points
In the following regression model: y = β0 + β1x + ε
1. What is the relationship between x and y?
2. What are the two parameters to estimate?
3. What is ε?
- A linear or straight-line relationship
- The slope of the line β1 and the y-intercept β0 (estimated by least squares)
- ε is the unexplained, random, or error component
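A minimal sketch in Python of fitting this model by least squares; the synthetic data, noise level, and “true” coefficients below are assumptions for illustration only:

```python
import numpy as np

# Synthetic sample: y depends linearly on x plus random noise (the error component)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # assumed "true" b0 = 2.0, b1 = 0.5

# Least-squares estimates of the slope (b1) and the y-intercept (b0)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Residuals approximate the unexplained, random error component
residuals = y - (b0_hat + b1_hat * x)
print(f"b0 ~ {b0_hat:.3f}, b1 ~ {b1_hat:.3f}")
```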
The estimates are determined by
❖ Drawing a sample from the population of interest
❖ Calculating sample statistics
❖ Producing a straight line that cuts into the data
The best line is
the one that minimizes the Sum of Squared Differences (SSD) between the points and the line
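As a short, self-contained illustration (toy data assumed, not from the source) that the least-squares line scores a smaller SSD than any other line you might draw:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])  # assumed toy data

def ssd(b0, b1):
    """Sum of squared differences between the points and the line y = b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# np.polyfit with deg=1 returns the least-squares [slope, intercept]
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

print("SSD of the least-squares line:", ssd(b0_hat, b1_hat))
print("SSD of an arbitrary line     :", ssd(0.0, 1.5))  # noticeably larger
```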
The regression line’s representation of the data is evaluated through several methods
Coefficient of Determination (R-squared): This metric quantifies how much of the variance in the dependent variable is explained by the independent variable. Higher R-squared values, closer to 1, indicate better explanatory power.
Residual Analysis: By examining the differences between actual and predicted values (residuals), we assess how well the model captures the data’s variability. Smaller residuals suggest a better fit.
Visual Inspection: Plotting the regression line on a scatter plot allows for a visual assessment of its fit. A good fit is indicated by the line passing through the center of data points, capturing the overall trend.
Significance of Regression Coefficients: Evaluating the significance of coefficients, particularly the slope, determines if the relationship between variables is statistically meaningful. If coefficients are significant, the model provides valid insights into the relationship between X and Y.
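A hedged Python sketch of how these checks might be run with statsmodels; the synthetic data and coefficients here are assumptions, not values from the card:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(0, 2, size=100)  # assumed synthetic relationship

X = sm.add_constant(x)              # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.rsquared)               # Coefficient of Determination (R-squared)
print(model.resid[:5])              # residuals: actual minus predicted values
print(model.pvalues)                # significance of the intercept and the slope
# Plotting x vs. y with model.fittedvalues overlaid gives the visual inspection step.
```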
What is the Coefficient of Determination?
A measure of how well the regression line represents the data
❖ The percentage of variability in Y that can be explained by variability in X
❖ The further the line is away from the points, the less of the variability it can explain
❖ The Coefficient of Determination lies between 0 and 1
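As a small numeric sketch (the actual and predicted values below are made up for illustration), the Coefficient of Determination can be computed as one minus the unexplained variability over the total variability in Y:

```python
import numpy as np

y     = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # actual values (assumed)
y_hat = np.array([2.0, 3.0, 4.0, 5.0, 6.0])   # predictions from the line (assumed)

ss_res = np.sum((y - y_hat) ** 2)              # variability left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)           # total variability in Y
r_squared = 1 - ss_res / ss_tot                # close to 1 when the line explains most of the variability
print(round(r_squared, 3))
```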
Outliers, Investigate possibilities, Identify outliers from, residual, standard residual
Outliers: Unusually small or large observations
Investigate possibilities: recording error, sample membership, validity
Identify outliers from scatter diagram
Suspect outlier if |standard residual| > 2
Residual: Difference between actual value and estimated value
Standard residual: Residual divided by standard deviation of residuals
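A minimal Python sketch of this rule; the data are assumed, with one value planted far off the line so the flagging step has something to find:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 14.0, 8.1])  # 14.0 is a planted outlier

# Fit the line, then compute residuals (actual value minus estimated value)
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Standard residual = residual divided by the standard deviation of the residuals
standard_residuals = residuals / residuals.std(ddof=1)

# Suspect an outlier where |standard residual| > 2
print(np.where(np.abs(standard_residuals) > 2)[0])  # flags the planted point
```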
One of the most challenging aspects of machine learning is
finding the right set of features, or variables, that can accurately capture the relationship between inputs and outputs
define feature selection
The process of selecting a subset of relevant features from the original set of features to improve model performance
❖ In essence, it is about identifying the most informative features that can help the model make accurate predictions
popular techniques for feature selection
stepwise regression
what is stepwise regression
a method that iteratively adds or removes features from a model based on their statistical significance
When is stepwise regression stopped?
The process is repeated until a set of features that maximizes model performance is identified
advantages of stepwise
It is useful when dealing with a large number of features, as it can help reduce the number of features in the model without sacrificing accuracy
limitation of stepwise
It assumes that the relationship between the features and the target variable is linear, which may not always be the case in real-world scenarios
Types of Stepwise Regression
- Forward selection: Starts with empty feature set, adds most statistically significant feature iteratively until model performance can’t be improved further.
- Backward elimination: Begins with full feature set, removes least statistically significant feature iteratively until model performance can’t be improved further.
- Bidirectional elimination: Combines forward and backward selection, alternates between adding and removing features until no further improvements in model performance can be made.
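A hedged scikit-learn sketch of forward vs. backward selection using SequentialFeatureSelector; note that this selector adds or removes features based on cross-validated score rather than statistical significance, so it is a variation on the stepwise idea above, and the dataset and feature count are assumptions for illustration:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
lr = LinearRegression()

# Forward selection: start with an empty set and add features one at a time
forward = SequentialFeatureSelector(lr, n_features_to_select=4, direction="forward").fit(X, y)

# Backward elimination: start with the full set and remove features one at a time
backward = SequentialFeatureSelector(lr, n_features_to_select=4, direction="backward").fit(X, y)

print(forward.get_support())    # boolean mask of the features each run kept
print(backward.get_support())
```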
Forward Selection steps
- Start with empty or intercept-only model.
- Conduct separate regressions for each predictor.
- Select predictor with strongest relationship to target.
- Add selected predictor to model.
- Repeat with remaining predictors.
- Stop based on predefined rule or validation metrics.
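A minimal sketch of those steps with statsmodels; the function name, the 0.05 significance threshold, and the p-value stopping rule are illustrative assumptions rather than the only way to define the rule:

```python
import pandas as pd
import statsmodels.api as sm

def forward_selection(X: pd.DataFrame, y: pd.Series, alpha: float = 0.05) -> list:
    """Greedy forward selection: add the most significant remaining predictor each pass."""
    selected, remaining = [], list(X.columns)
    while remaining:
        # Conduct a separate regression for each candidate predictor
        pvals = {}
        for col in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            pvals[col] = model.pvalues[col]
        # Select the predictor with the strongest relationship to the target
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:          # stopping rule: no significant predictor left
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each pass fits one regression per remaining predictor, keeps the one with the smallest p-value, and stops once nothing clears the threshold, mirroring steps 2, 3, and the stopping rule above.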