Supervised - regression Flashcards

1
Q

What is regression analysis?

A

A predictive modeling technique used to estimate the relationship between a dependent variable (usually called the “Y” variable) and either one independent variable (the “X” variable) or several independent variables

2
Q

What is the objective of a regression model?

A

It estimates the nature of the relationship
between the independent and dependent variables
❖ This includes the strength of the relationship and its statistical significance

3
Q

What are explanatory variables?

A

independent variables

4
Q

What are the variables to be explained?

A

dependent variables

5
Q

Regression analysis can be used for

A
  1. Predicting specific outcomes from changes, like estimating production needs for a product.
  2. Projecting future trends or values, such as forecasting stock prices.
  3. Assessing the influence of different factors on an outcome, like measuring the impact of advertising on sales during an event.
6
Q

The simplest model is

A

linear regression

7
Q

What is the objective of linear regression?

A

Fit the data with the best hyperplane (a straight line when there is a single predictor), the one that “goes through” the points as closely as possible

8
Q

In the regression model y = β0 + β1x + ε:
1. What kind of relationship between x and y is assumed?
2. What are the two parameters to estimate?
3. What is ε?

A
  • A linear (straight-line) relationship
  • The slope of the line, β1, and the y-intercept, β0 (both estimated by least squares)
  • ε is the unexplained, random, or error component
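
A minimal sketch of how the two parameters could be estimated by least squares for a single predictor. The function name `fit_simple_linear` and the sample data are illustrative assumptions, not part of the original card; the formulas are the standard closed-form least-squares estimates.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1): least-squares estimates of the intercept and slope."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x, so the error term is zero.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)
```

With real data the points would not lie exactly on a line, and the leftover differences between each y and the fitted value are the estimates of the error component.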
9
Q

The estimates are determined by

A

❖ Drawing a sample from the population of interest
❖ Calculating sample statistics
❖ Producing a straight line that passes through the sample data

10
Q

The best line is

A

the one that minimizes the Sum of Squared Differences (SSD), that is, the sum of the squared vertical distances between the observed points and the line
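
Written out for the one-predictor model y = β0 + β1x + ε, the criterion can be sketched as follows. The sample size n and the index i are notation introduced here for illustration.

```latex
\mathrm{SSD}(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2,
\qquad
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\, \beta_1} \mathrm{SSD}(\beta_0, \beta_1)
```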

11
Q

How is the regression line’s representation of the data evaluated?

A

Coefficient of Determination (R-squared): This metric quantifies how much of the variance in the dependent variable is explained by the independent variable. Higher R-squared values, closer to 1, indicate better explanatory power.

Residual Analysis: By examining the residuals (the differences between actual and predicted values), we assess how well the model captures the data’s variability. Smaller residuals with no systematic pattern suggest a better fit.

Visual Inspection: Plotting the regression line on a scatter plot allows for a visual assessment of its fit. A good fit is indicated by the line passing through the center of data points, capturing the overall trend.

Significance of Regression Coefficients: Testing whether the coefficients, particularly the slope, differ significantly from zero shows whether the relationship between the variables is statistically meaningful. Significant coefficients support the existence of a relationship between X and Y, but they do not by themselves establish that the model is correctly specified or that the relationship is causal.

12
Q

What is the coefficient of determination?

A

A measure of how well the regression line represents the data
❖ The proportion (often quoted as a percentage) of the variability in Y that can be explained by variability in X
❖ The further the line is from the points, the less of the variability it can explain
❖ For a least-squares line with an intercept, the coefficient of determination lies between 0 and 1
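
A small sketch of how the coefficient of determination could be computed from actual and predicted values, using the usual definition R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are illustrative assumptions.

```python
def r_squared(ys, preds):
    """Share of the variability in ys that the predictions account for."""
    y_mean = sum(ys) / len(ys)
    # Unexplained variation: squared gaps between actual and predicted values.
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    # Total variation: squared gaps between actual values and their mean.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


ys = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
perfect = r_squared(ys, ys)            # predictions hit every point
baseline = r_squared(ys, [2.5] * 4)    # always predicting the mean of ys
print(perfect, baseline)
```

The two calls show the ends of the range: predictions that match every point explain all of the variability, while always predicting the mean explains none of it.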

13
Q

Outliers: what are they, which possibilities should be investigated, how are they identified, and what are the residual and the standard residual?

A

Outliers: unusually small or large observations
Investigate possibilities: a recording error, a point that does not belong in the sample, or a valid but unusual observation
Identify outliers from the scatter diagram
Suspect an outlier if |standard residual| > 2
Residual: difference between the actual value and the estimated value
Standard residual: residual divided by the standard deviation of the residuals
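
The rule of thumb above can be sketched in a few lines. This follows the card's simplified definition (residual divided by the sample standard deviation of the residuals); statistical packages often use a more refined version that also accounts for leverage. The function name `flag_outliers` and the data are hypothetical.

```python
import statistics


def flag_outliers(actual, predicted, threshold=2.0):
    """Return (indices of suspected outliers, standard residuals)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    spread = statistics.stdev(residuals)  # sample standard deviation
    standardized = [r / spread for r in residuals]
    flagged = [i for i, z in enumerate(standardized) if abs(z) > threshold]
    return flagged, standardized


# Hypothetical fitted values and observations; the last observation is far off.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
actual = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.0, 8.1, 8.9, 13.0]
outliers, z_scores = flag_outliers(actual, predicted)
print(outliers)
```

A flagged point is only a suspect: it still has to be checked against the possibilities listed above before it is corrected, removed, or kept.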

14
Q

One of the most challenging aspects of machine learning is

A

finding the
right set of features, or variables, that can accurately capture the
relationship between inputs and outputs

15
Q

Define feature selection

A

The process of selecting a subset of relevant features
from the original set of features to improve model performance
❖ In essence, it is about identifying the most informative features that can
help the model make accurate predictions

16
Q

popular techniques for feature selection

A

Stepwise regression

17
Q

What is stepwise regression?

A

A method that iteratively adds or removes
features from a model based on their statistical significance

18
Q

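When is stepwise regression stopped?

A

The add/remove step is repeated until no further change improves model performance.
The resulting feature set is the best one the search found, not necessarily the best possible one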

19
Q

Advantages of stepwise regression

A

It is useful when dealing with a large
number of features, as it can reduce the number of features in the
model, often with little or no loss of accuracy

20
Q

Limitation of stepwise regression

A

It assumes that the relationship between
the features and the target variable is linear, which may not always be the
case in real-world scenarios

21
Q

Types of Stepwise Regression

A
  • Forward selection: Starts with empty feature set, adds most statistically significant feature iteratively until model performance can’t be improved further.
  • Backward elimination: Begins with full feature set, removes least statistically significant feature iteratively until model performance can’t be improved further.
  • Bidirectional elimination: Combines forward and backward selection, alternates between adding and removing features until no further improvements in model performance can be made.
22
Q

Forward Selection steps

A
  • Start with an empty or intercept-only model.
  • Conduct a separate regression for each candidate predictor.
  • Select the predictor with the strongest relationship to the target.
  • Add the selected predictor to the model.
  • Repeat with the remaining predictors.
  • Stop based on a predefined rule or validation metrics.
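
The steps above can be sketched with NumPy. This is one possible reading, not a definitive implementation: "strongest relationship" is measured by R², and the stopping rule is an assumed minimum R² gain (`min_gain`), whereas the cards also mention statistical significance and validation metrics as criteria. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_sse(X, y):
    """Least-squares fit with an intercept; return the sum of squared errors."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, min_gain=0.01):
    """Greedily add the feature that raises R^2 most; stop when the gain is small."""
    selected, remaining = [], list(range(X.shape[1]))
    sst = float(((y - y.mean()) ** 2).sum())
    best_r2 = 0.0  # R^2 of the intercept-only model
    while remaining:
        # Try each remaining predictor alongside those already selected.
        scores = {j: 1 - fit_sse(X[:, selected + [j]], y) / sst for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break  # assumed stopping rule: improvement too small
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic data: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)
selected, r2 = forward_selection(X, y)
print(selected, round(r2, 3))
```

Because the search is greedy, each step only looks one feature ahead, so the order in which features enter can matter when predictors are correlated.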