11 - Regression Modeling Flashcards
What are the two remaining tasks in the Modeling Phase?
Estimation task and Association task
What is the most widespread method for performing the estimation task?
Linear regression
What does simple linear regression approximate?
The relationship between a numeric predictor and a continuous target using a straight line
What does multiple regression modeling approximate?
The relationship between a set of p > 1 predictors and a single continuous target using a p-dimensional plane or hyperplane
What is the usual form of the multiple regression model equation?
y = β0 + β1x1 + β2x2 + … + βpxp
What do the x’s and β’s represent in the multiple regression model?
x’s represent predictor variables and β’s represent unknown model parameters
What approach does the Data Science Methodology use to validate model results?
Cross-validation
What does the regression equation in descriptive regression modeling look like?
ŷ = b0 + b1x1 + b2x2 + … + bpxp
What is the provisional regression equation for estimating Sales_per_Visit?
Sales per Visit = b0 + b1(Days) + b2(CC) + b3(Web)
What is the typical p-value cutoff for retaining variables in a regression model?
0.05
Which predictor variable was determined to not belong in the model?
Web Account
What is the final regression model after removing Web Account?
Sales per Visit = 73.6209 + 0.1637(Days) + 22.1357(CC)
How much does having a store credit card increase Sales per Visit?
$22.1357
What is the estimated increase in Sales per Visit for each additional day between purchases?
$0.1637
What method is used to fit a multiple regression model in Python?
Ordinary Least Squares (OLS)
Which command is used to add a constant term to predictor variables in Python?
sm.add_constant(X)
What is the purpose of the summary() command in regression modeling?
To obtain the results of the regression model
In R, how do you specify the formula for a linear regression model?
lm(formula = Target ~ Predictors, data = dataset)
What is the standard error of the estimate denoted as?
s
What is the Mean Absolute Error (MAE) used for?
To measure the average distance between actual and predicted values
What does R² represent in regression modeling?
The proportion of variability in the response accounted for by the predictors
What does Radj² penalize in multiple regression models?
Having too many unhelpful predictors
What was the Radj² value for the regression model discussed?
0.064
Fill in the blank: The estimated sales per visit for a customer base is $73.6209 plus _______ times the number of days between purchases.
$0.1637