Data Science using Python and R - 11 Flashcards

1
Q

What are the two remaining tasks in the Modeling Phase?

A

Estimation task and Association task

2
Q

What is the most widespread method for performing the estimation task?

A

Linear regression

3
Q

What does simple linear regression approximate?

A

The relationship between a numeric predictor and a continuous target using a straight line

4
Q

What does multiple regression modeling approximate?

A

The relationship between a set of p > 1 predictors and a single continuous target using a p-dimensional plane or hyperplane

5
Q

What is the usual form of the multiple regression model equation?

A

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$, where $\varepsilon$ is the error term

6
Q

What do the x’s and β’s represent in the multiple regression model?

A

x’s represent predictor variables and β’s represent unknown model parameters

7
Q

What approach does the Data Science Methodology use to validate model results?

A

Cross-validation
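
Here, validating a model means fitting it on a training set and then checking it against a separate test set. The following is a minimal sketch of the partition step in plain Python. The 75/25 split, the seed, and the function name are illustrative assumptions, not values taken from the source.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=7):
    """Randomly partition records into a training set and a test set.

    The default 25% test fraction and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(records)       # copy so the caller's data is not mutated
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training, test)

train, test = train_test_split(range(100))
print(len(train), len(test))  # 75 25
```

The model is then fitted on `train` only, and its predictions are scored on `test`.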

8
Q

What does the regression equation in descriptive regression modeling look like?

A

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$, where the b's are estimates of the β's computed from the data

9
Q

What is the provisional regression equation for estimating Sales_per_Visit?

A

Estimated Sales per Visit = $b_0 + b_1(\text{Days}) + b_2(\text{CC}) + b_3(\text{Web})$

10
Q

What is the typical p-value cutoff for retaining variables in a regression model?

A

0.05

11
Q

Which predictor variable was determined to not belong in the model?

A

Web Account

12
Q

What is the final regression model after removing Web Account?

A

Estimated Sales per Visit = 73.6209 + 0.1637(Days) + 22.1357(CC)
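
The final equation can be written as a small Python function. This is a sketch using the coefficients on this card; the function name and the 0/1 coding of CC (1 = has a store credit card) are assumptions. Comparing two predictions that differ in only one predictor recovers that predictor's coefficient, which is how the coefficients are interpreted.

```python
def estimate_sales_per_visit(days, cc):
    """Estimated Sales per Visit from the final model.

    days: number of days between purchases
    cc:   1 if the customer has a store credit card, else 0 (assumed coding)
    """
    return 73.6209 + 0.1637 * days + 22.1357 * cc

# Effect of the credit card, holding Days fixed at a hypothetical 50 days
cc_effect = estimate_sales_per_visit(50, 1) - estimate_sales_per_visit(50, 0)
# Effect of one extra day between purchases, holding CC fixed
day_effect = estimate_sales_per_visit(51, 0) - estimate_sales_per_visit(50, 0)
print(round(cc_effect, 4), round(day_effect, 4))  # 22.1357 0.1637
```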

13
Q

How much does having a store credit card increase Sales per Visit?

A

By an estimated $22.1357, holding Days constant

14
Q

What is the estimated increase in Sales per Visit for each additional day between purchases?

A

$0.1637, holding credit card status constant

15
Q

What method is used to fit a multiple regression model in Python?

A

Ordinary Least Squares (OLS)

16
Q

Which command is used to add a constant term to predictor variables in Python?

A

sm.add_constant(X)
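
To show what adding a constant and fitting by OLS accomplish, here is a NumPy-only sketch; it is not the statsmodels API itself. A column of ones is prepended, as sm.add_constant(X) does by default, so the model gets an intercept, and the coefficients are then found by least squares, which is what OLS computes. The data are hypothetical and noise-free, so the fit recovers the generating coefficients 5, 2, and -3.

```python
import numpy as np

# Hypothetical, noise-free training data generated from y = 5 + 2*x1 - 3*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 5 + 2 * X[:, 0] - 3 * X[:, 1]

# Prepend a column of ones so the model has an intercept b0
# (the role sm.add_constant(X) plays in statsmodels)
X_const = np.column_stack((np.ones(len(X)), X))

# Ordinary least squares: choose b to minimize the sum of squared residuals
b, *_ = np.linalg.lstsq(X_const, y, rcond=None)
print(np.round(b, 6))   # b0, b1, b2
```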

17
Q

What is the purpose of the summary() command in regression modeling?

A

To display the results of the fitted regression model, including the coefficient estimates, their p-values, and fit statistics such as R²

18
Q

In R, how do you specify the formula for a linear regression model?

A

lm(formula = Target ~ Predictors, data = dataset), where multiple predictors are joined with +, e.g. Target ~ Pred1 + Pred2

19
Q

What is the standard error of the estimate denoted as?

A

s

20
Q

What is the Mean Absolute Error (MAE) used for?

A

To measure the average distance between actual and predicted values
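
MAE is the average of the absolute differences between actual and predicted values. The following is a minimal plain-Python sketch; the function name and the input values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all records."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted values: errors of 10, 5, and 0
print(mean_absolute_error([100, 80, 120], [90, 85, 120]))  # 5.0
```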

21
Q

What does R² represent in regression modeling?

A

The proportion of variability in the response accounted for by the predictors

22
Q

What does Radj² penalize in multiple regression models?

A

Having too many unhelpful predictors

23
Q

What was the Radj² value for the regression model discussed?

A

0.064

24
Q

Fill in the blank: The estimated sales per visit for a customer without a store credit card is $73.6209 plus _______ times the number of days between purchases.

A

$0.1637

25
Q

True or False: The typical size of the prediction error is given by the statistic s.

A

True

26
Q

What does Radj² represent in a regression model?

A

The proportion of the variability in the response accounted for by the predictors, adjusted for the number of predictors.
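
R² and Radj² can be computed directly from their standard formulas, R² = 1 - SSE/SST and Radj² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n records and p predictors. The sketch below uses hypothetical values. Keeping R² fixed while raising p lowers Radj², which is the penalty for extra predictors.

```python
def r_squared(actual, predicted):
    """R² = 1 - SSE/SST: share of the response's variability accounted for."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_y) ** 2 for a in actual)                # total
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Radj² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n records and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical actual and fitted values
actual = [1, 2, 3, 4, 5]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(actual, fitted)
print(round(r2, 4))                                  # 0.99
print(round(adjusted_r_squared(r2, n=5, p=2), 4))    # 0.98
print(round(adjusted_r_squared(r2, n=5, p=3), 4))    # 0.96: same R², more predictors
```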

27
Q

What is the value of Radj² in the regression model discussed?

A

0.064

28
Q

What does a Radj² value of 0.064 indicate about the model’s predictors?

A

6.4% of the variability in Sales per Visit is accounted for by the predictors Days since Purchase and Credit Card.

29
Q

How do you specify variable values for a customer in a Python regression model?

A

Use the np.column_stack() command with the values in the same order as the columns used to fit the model: Constant, CC, Days.
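
With a plain NumPy array there are no column names to match on, so each value is tied to its coefficient only by position. The following NumPy sketch computes the linear predictor by hand. It assumes, as the card states, that the model was fitted with columns in the order Constant, CC, Days. The customer's values are hypothetical, and the coefficients are those of the final model.

```python
import numpy as np

# Final-model coefficients, assumed to be stored in the fitting order
# Constant, CC, Days (the order given on the card above)
coef = np.array([73.6209, 22.1357, 0.1637])

# Hypothetical customer: has a store credit card and 30 days between purchases.
# Each argument becomes one column of a single-row array.
cust01 = np.column_stack(([1.0], [1], [30]))   # Constant, CC, Days
estimate = cust01 @ coef                      # 73.6209 + 22.1357*1 + 0.1637*30

# Same values with CC and Days swapped: the array is still valid,
# but each value now meets the wrong coefficient
swapped = np.column_stack(([1.0], [30], [1])) @ coef
print(estimate, swapped)
```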

30
Q

What command is used to predict sales using the regression model in Python?

A

model02.predict(cust01)

31
Q

How do you obtain predicted values for all customers in the test data set in Python?

A

Pass the test set's predictor data frame, X_test, to the predict() command instead of a single customer's values.

32
Q

What command calculates the standard error of the estimate in Python?

A

np.sqrt(model02.scale)
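
The statistic s is the square root of SSE / (n - p - 1). For an OLS fit, statsmodels stores this mean squared error in the fitted model's scale attribute by default, which is why taking its square root gives s. Below is a plain-Python sketch of the same calculation. The function name and the values are hypothetical.

```python
import math

def standard_error_of_estimate(actual, predicted, p):
    """s = sqrt(SSE / (n - p - 1)) for a model with p predictors plus an intercept.

    s**2 is the mean squared error, the quantity statsmodels reports as
    .scale for an OLS fit by default.
    """
    n = len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    return math.sqrt(sse / (n - p - 1))

# Hypothetical actual and fitted values from a model with p = 2 predictors:
# residuals -1, 1, -1, 1, 0 give SSE = 4 and n - p - 1 = 2
s = standard_error_of_estimate([10, 12, 9, 15, 14], [11, 11, 10, 14, 14], p=2)
print(round(s, 4))   # 1.4142
```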

33
Q

What is the purpose of the MAE in regression analysis?

A

To measure the average magnitude of the errors between predicted and actual values.

34
Q

What is the calculated MAE from the regression model discussed?

35
Q

What command is used to run stepwise regression in R?

A

stepAIC(object = model01)

36
Q

What does stepwise regression do?

A

It adds helpful predictors to the model one at a time and, after each addition, checks whether all the predictors in the model still belong.
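
The following NumPy sketch illustrates the add-then-recheck idea. Like R's stepAIC(), it scores candidate models with AIC, but it is a simplified illustration and not a reimplementation of MASS::stepAIC. In particular, it starts from the intercept-only model. The AIC form n·log(SSE/n) + 2k (constant terms dropped), the helper names, and the data are all assumptions. Because each step makes only the single best move, the search is greedy and is not guaranteed to find the best subset.

```python
import numpy as np

def aic(X, y, cols):
    """AIC of an OLS fit on the listed columns of X plus an intercept,
    in the form n*log(SSE/n) + 2k, where k is the number of coefficients."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = float(np.sum((y - design @ b) ** 2))
    return n * np.log(sse / n) + 2 * design.shape[1]

def stepwise(X, y):
    """Greedy bidirectional selection, starting from the intercept-only model.

    Every step tries adding each excluded predictor and dropping each included
    one, then makes the single move with the lowest AIC; it stops as soon as
    no move lowers AIC.
    """
    selected = []
    current = aic(X, y, selected)
    while True:
        candidates = []
        for c in range(X.shape[1]):
            if c in selected:
                trial = [s for s in selected if s != c]   # try dropping c
            else:
                trial = selected + [c]                    # try adding c
            candidates.append((aic(X, y, trial), trial))
        best_aic, best_cols = min(candidates, key=lambda t: t[0])
        if best_aic >= current:
            return sorted(selected)
        selected, current = best_cols, best_aic

# Hypothetical data: y depends on columns 0 and 1; column 2 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
print(stepwise(X, y))
```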

37
Q

What is a common baseline model for regression comparisons?

A

The simple $\hat{y} = \bar{y}$ model, which predicts the mean of the target for every record.

38
Q

How do you compute MAE for the baseline model?

A

$\text{MAE}_{\text{Baseline}} = \frac{\sum |y - \bar{y}|}{n}$
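
The baseline MAE uses $\bar{y}$ as the prediction for every record. The plain-Python sketch below assumes that $\bar{y}$ is the mean of the training target and that the errors are averaged over the test records; the card does not say which set each comes from. It then applies the comparison rule: the regression model outperforms the baseline when its MAE is lower. The function names and values are hypothetical.

```python
def baseline_mae(y_train, y_test):
    """MAE of the baseline model y-hat = y-bar, which predicts the training
    mean of the target for every test record (assumed setup)."""
    y_bar = sum(y_train) / len(y_train)
    return sum(abs(y - y_bar) for y in y_test) / len(y_test)

def outperforms_baseline(mae_regression, mae_baseline):
    """The regression model is preferred when its MAE is lower."""
    return mae_regression < mae_baseline

# Hypothetical targets: y-bar = 20, test errors 5, 5, 0
mae_base = baseline_mae(y_train=[10, 20, 30], y_test=[15, 25, 20])
print(round(mae_base, 4))                     # 3.3333
print(outperforms_baseline(2.5, mae_base))    # True
```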

39
Q

What conclusion can be drawn if $\text{MAE}_{\text{Regression}}$ is less than $\text{MAE}_{\text{Baseline}}$?

A

The regression model outperforms the baseline model.

40
Q

What is the value of $\text{MAE}_{\text{Baseline}}$ calculated in the example?

41
Q

What is the relationship between Customer A and Customer B in terms of estimated sales per visit?

A

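Customer A has 100 more days between purchases than Customer B. Assuming both have the same credit card status, Customer A's estimated Sales per Visit is 100 × $0.1637 = $16.37 higher.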

42
Q

What is the significance of the variable names in the data frame when predicting in R?

A

They must exactly match the names of the predictor variables in the model.

43
Q

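True or False: Stepwise regression always finds the optimal set of predictors.

A

False. Stepwise regression is a greedy search that adds or drops one predictor at a time, so it is not guaranteed to find the best possible set of predictors.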

44
Q

What is the first step to perform stepwise regression using R?

A

Install and load the MASS package, e.g. with library(MASS).

45
Q

What command in R generates the predicted number of sales per visit?

A

predict(object = model02, newdata = cust01)

46
Q

What does the ‘Residual standard error’ indicate in the output from summary() in R?

A

It represents the standard error of the estimate.

47
Q

What is the typical result of using stepwise regression on a dataset?

A

It converges on a final model by removing unhelpful predictors.

48
Q

What is the MAE calculated from the regression model based on the test data set?

49
Q

What does a higher MAE for the baseline model indicate?

A

It indicates that the baseline model performed worse than the regression model.

50
Q

In stepwise regression, what happens to a previously added predictor that is no longer helpful?

A

It may be dropped from the model.

51
Q

What is the importance of verifying that the regression model outperforms the baseline model?

A

To ensure that the predictors are helpful in estimating the response.

52
Q

What is the command to calculate MAE in R using the MLmetrics package?

A

MAE(y_pred = ypred, y_true = ytrue)

53
Q

What is the significance of the variable ‘Web’ in the context of the regression model?

A

It may not belong in the model based on stepwise regression results.

54
Q

What does the command summary(model01_step) provide?

A

It gives the full summary of the final model after stepwise regression.

55
Q

What should you do after running a regression model with the training set?

A

Validate the model using the test data set.

56
Q

What does the coefficient for Sex represent in regression analysis?

A

If Sex is coded as a 0/1 indicator variable, its coefficient is the estimated difference in the dependent variable between the group coded 1 and the group coded 0, holding all other variables constant.

57
Q

What does the coefficient for Education indicate in regression analysis?

A

The coefficient for Education shows the expected change in the dependent variable for a one-unit increase in the Education variable, holding all other variables constant.

58
Q

What is the significance of the value of s² in regression analysis?

A

s² is the mean squared error, SSE / (n - p - 1). It estimates the variance of the error term, and its square root s measures the typical size of a prediction error.

59
Q

What does Radj² represent in regression analysis?

A

Radj² (adjusted R²) measures the proportion of variance explained by the model while adjusting for the number of predictors used.

60
Q

How do you determine if the regression model outperformed its baseline model?

A

Compare $\text{MAE}_{\text{Baseline}}$ and $\text{MAE}_{\text{Regression}}$; if $\text{MAE}_{\text{Regression}}$ is lower than $\text{MAE}_{\text{Baseline}}$, the regression model outperformed the baseline.

61
Q

What is the purpose of using the white_wine_training and white_wine_test data sets?

A

These data sets are used to train and validate the regression model predicting Quality based on Alcohol and Sugar.

62
Q

What should you do after running a regression predicting Quality based on Alcohol and Sugar?

A

Obtain a summary of the model to evaluate whether both predictors belong in the model.

63
Q

Complete the sentence: ‘The estimated Quality equals….’

A

The estimated Quality equals $b_0 + b_1(\text{Alcohol}) + b_2(\text{Sugar})$, with $b_0$, $b_1$, and $b_2$ read from the fitted model's summary output.

64
Q

What does the coefficient for Alcohol indicate in the regression model?

A

The coefficient for Alcohol indicates the expected change in Quality for a one-unit increase in Alcohol, holding Sugar constant.

65
Q

What does the coefficient for Sugar indicate in the regression model?

A

The coefficient for Sugar indicates the expected change in Quality for a one-unit increase in Sugar, holding Alcohol constant.