Section 1: General Model Building Steps Flashcards

1
Q

What is Descriptive Analytics?

A

Focus: What happened in the past?
Aim: to “describe” or interpret observed trends by identifying relationships between variables

2
Q

What is Predictive Analytics?

A

Focus: What will happen in the future?
Aim: to make accurate “predictions”

3
Q

What is Prescriptive Analytics?

A

Focus: The impacts of different “prescribed” decisions (assumptions)
Aim: to answer the “what if?” and “what is the best course of action?” questions

4
Q

Why do we need relevant data?

A

need to ensure that the data is unbiased (i.e., representative of the environment where the model will operate)

5
Q

What is Random Sampling?

A

“Randomly” draw observations from the underlying population without replacement; each record is equally likely to be sampled
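
A minimal pandas sketch (the DataFrame df and its columns are made-up stand-ins, not from the source):

```python
import pandas as pd

# Hypothetical population data
df = pd.DataFrame({"age": [25, 37, 41, 52, 29, 63],
                   "income": [40, 55, 61, 72, 38, 80]})

# Draw 3 records without replacement; each record is equally likely to be sampled
sample = df.sample(n=3, replace=False, random_state=42)
print(sample)
```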

6
Q

What is Stratified Sampling?

A

Divide the underlying population into a number of non-overlapping “strata” non-randomly, then randomly sample a set number of observations from each stratum (this helps you get a more representative sample)
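
A sketch under the same assumptions (hypothetical data; the stratum column "region" is illustrative):

```python
import pandas as pd

# Hypothetical population divided non-randomly into two strata by region
df = pd.DataFrame({
    "region": ["A"] * 6 + ["B"] * 4,
    "income": [40, 55, 61, 72, 38, 80, 45, 50, 66, 58],
})

# Randomly sample a set number of observations (here 2) from each stratum
sample = df.groupby("region", group_keys=False).sample(n=2, random_state=42)
print(sample)
```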

7
Q

What is a special case of Stratified Sampling?

A

Systematic sampling: draw observations according to a set pattern; no random mechanism controlling which observations are sampled

8
Q

What is Granularity?

A

refers to how precisely a variable is measured (i.e., the level of detail of the information contained by the variable)

9
Q

What are examples of data quality issues? (name at least 3)

A
  1. Reasonableness (ex. variables such as age, time, and income are non-negative)
  2. Consistency (ex. same measurement unit for numeric variables, same coding scheme for categorical variables)
  3. Sufficient Documentation (ex. clear description of each variable)
  4. Personally identifiable info (PII)
  5. Variables with legal/ethical concerns
  6. Target leakage (see the next flashcard)

10
Q

What is Target Leakage?

A

when predictors in a model “leak” information about the target variable that would not be available when the model is deployed in practice

11
Q

Univariate exploration tools (numeric and categorical)

A

Numeric - mean, median, variance, min/max. Visuals: histograms, boxplots

Categorical - class frequencies. Visuals: bar charts
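
A pandas/matplotlib sketch of both cases (the dataset and column names are invented for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset with one numeric and one categorical variable
df = pd.DataFrame({
    "income": [40, 55, 61, 72, 38, 80, 45, 300],   # 300 is an outlier
    "region": ["A", "B", "A", "B", "A", "A", "B", "A"],
})

# Numeric: summary statistics, then a histogram and a boxplot
print(df["income"].describe())      # mean, std, min/max, quartiles
df["income"].plot.hist()
plt.show()
df["income"].plot.box()
plt.show()

# Categorical: class frequencies, then a bar chart
print(df["region"].value_counts())
df["region"].value_counts().plot.bar()
plt.show()
```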

12
Q

Bivariate exploration tools (numeric × numeric, numeric × categorical, categorical × categorical)

A

Numeric × Numeric - correlations. Visuals: scatterplots

Numeric × Categorical - mean/median of the numeric variable split by the levels of the categorical variable. Visuals: split boxplots, histograms

Categorical × Categorical - two-way frequency table. Visuals: bar charts
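
A sketch of all three pairings (hypothetical data; column names are illustrative only):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset
df = pd.DataFrame({
    "age":    [25, 37, 41, 52, 29, 63, 45, 33],
    "income": [40, 55, 61, 72, 38, 80, 66, 50],
    "region": ["A", "B", "A", "B", "A", "A", "B", "B"],
    "owner":  ["Y", "N", "Y", "Y", "N", "Y", "N", "Y"],
})

# Numeric x numeric: correlation and a scatterplot
print(df["age"].corr(df["income"]))
df.plot.scatter(x="age", y="income")
plt.show()

# Numeric x categorical: numeric summaries split by level, plus split boxplots
print(df.groupby("region")["income"].agg(["mean", "median"]))
df.boxplot(column="income", by="region")
plt.show()

# Categorical x categorical: two-way frequency table and a bar chart
tab = pd.crosstab(df["region"], df["owner"])
print(tab)
tab.plot.bar()
plt.show()
```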

13
Q

What are the three common data issues for numeric variables?

A
  1. Highly correlated predictors
  2. Skewness (esp. right skewness due to outliers)
  3. Should they be converted to a factor?

14
Q

Highly correlated predictors (problems)

A

a) difficult to separate out the individual effects of different predictors on the target variable
b) For GLMs, coefficient estimates can vary widely in sign and magnitude, making them difficult to interpret

15
Q

Highly correlated predictors (solutions)

A

a) drop one of the strongly correlated predictors
b) Use PCA to compress the correlated predictors into a few PCs
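
Solution (b) might look like the following scikit-learn sketch (the two predictors are simulated; in practice the predictors would usually be standardized first):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two strongly correlated hypothetical predictors
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2])

# Compress the two correlated predictors into a single PC
pca = PCA(n_components=1)
pc1 = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # near 1: one PC retains most of the variation
```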

16
Q

Skewness (problems)

A

Extreme values:
a) exert a disproportionate effect on model fit
b) distort visualizations (e.g., axes expanded inordinately to accommodate the outliers)

17
Q

Skewness (solutions)

A

a) transformations to reduce right skewness (log, square root)
b) options to handle outliers (remove, keep, modify, use robust model forms)
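
A numpy sketch of the transformations in (a) (the values are invented; note the domain restrictions):

```python
import numpy as np

# Hypothetical right-skewed values (e.g., claim sizes)
x = np.array([1.0, 2.0, 2.5, 3.0, 100.0])

log_x  = np.log(x)    # log transform (requires strictly positive values)
sqrt_x = np.sqrt(x)   # square-root transform (requires non-negative values)

# Both transformed scales pull in the extreme value
print(log_x.max() / log_x.min(), sqrt_x.max() / sqrt_x.min())
```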

18
Q

A common issue for categorical predictors

A

Sparse levels (reduce the robustness of models and may cause overfitting)
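
One common fix, sketched with a hypothetical pandas Series (the threshold of 5 is arbitrary), is to fold sparse levels into a catch-all "Other" level:

```python
import pandas as pd

# Hypothetical categorical variable with two sparse levels (C and D)
s = pd.Series(["A"] * 50 + ["B"] * 45 + ["C"] * 3 + ["D"] * 2)

# Combine levels appearing fewer than 5 times into "Other"
counts = s.value_counts()
sparse_levels = counts[counts < 5].index
s2 = s.where(~s.isin(sparse_levels), "Other")
print(s2.value_counts())
```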

19
Q

What is an interaction?

A

relationship between a predictor and the target variable depends on the value/level of another predictor

20
Q

Regression problems are used when a target is __________.

A

numeric (quantitative)

21
Q

Classification problems are used when a target is __________.

A

categorical (qualitative)

22
Q

metrics on the training set measure

A

the goodness of fit to the training data

23
Q

metrics on the test set measure

A

the prediction performance on new, unseen data

24
Q

What does a loss function do?

A

captures the discrepancy between the actual and predicted values for each observation of the target variable

25
Q

examples of loss functions

A

a) Square loss (most common for numeric targets)
b) Absolute loss
c) Zero-one loss (mostly for categorical targets)
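
A numpy sketch of the three losses (function names and the toy values are mine, not from the source):

```python
import numpy as np

def square_loss(y, y_hat):
    # most common loss for numeric targets
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return np.abs(y - y_hat)

def zero_one_loss(y, y_hat):
    # mostly for categorical targets: 1 if misclassified, 0 if correct
    return (y != y_hat).astype(int)

y, y_hat = np.array([3.0, 1.0, 4.0]), np.array([2.5, 1.5, 4.0])
print(square_loss(y, y_hat).mean(), absolute_loss(y, y_hat).mean())

y_cls, y_cls_hat = np.array(["pos", "neg", "pos"]), np.array(["pos", "pos", "pos"])
print(zero_one_loss(y_cls, y_cls_hat).mean())
```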

26
Q

Confusion matrices

A

table showing prediction versus reference (actual) counts

27
Q

Accuracy

A

proportion of correctly classified obs

28
Q

classification error rate

A

proportion of misclassified obs.

29
Q

Sensitivity

A

proportion of +ve obs. correctly classified as +ve

30
Q

Specificity

A

proportion of -ve obs. correctly classified as -ve

31
Q

Precision

A

proportion of +ve predictions truly belonging to +ve class

32
Q

accuracy weighted average relation

A

accuracy = (n₋/n) × specificity + (n₊/n) × sensitivity, where n₋ and n₊ are the numbers of -ve and +ve observations and n = n₋ + n₊
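
A sketch tying the last few cards together (the confusion-matrix counts are invented): compute each metric from a 2×2 confusion matrix and verify the weighted-average identity numerically.

```python
# Hypothetical confusion-matrix counts (prediction vs. reference)
TP, FN = 30, 10   # +ve observations classified as +ve / -ve
TN, FP = 50, 10   # -ve observations classified as -ve / +ve

n_pos, n_neg = TP + FN, TN + FP
n = n_pos + n_neg

accuracy    = (TP + TN) / n
error_rate  = (FP + FN) / n
sensitivity = TP / n_pos        # +ve obs. correctly classified as +ve
specificity = TN / n_neg        # -ve obs. correctly classified as -ve
precision   = TP / (TP + FP)    # +ve predictions truly belonging to +ve class

# accuracy = (n-/n) * specificity + (n+/n) * sensitivity
assert abs(accuracy - ((n_neg / n) * specificity + (n_pos / n) * sensitivity)) < 1e-12
print(accuracy, error_rate, sensitivity, specificity, precision)
```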

33
Q

common uses of CV

A

a) Model assessment (can assess model performance without a separate test set)
b) Hyperparameter tuning
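
A minimal scikit-learn sketch of both uses (Ridge regression and the bundled diabetes data are stand-ins, not from the source):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_diabetes(return_X_y=True)

# (a) Model assessment: estimate prediction performance via 5-fold CV,
#     without setting aside a separate test set
print(cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean())

# (b) Hyperparameter tuning: choose alpha by cross-validated performance
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```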

34
Q

Hyperparameters

A

parameters with values supplied in advance; not optimized by the model fitting algorithm

35
Q

considerations when selecting the best model

A

a) prediction performance
b) interpretability
c) ease of implementation

36
Q

Unbalanced data for binary targets (problems)

A

a) the classifier implicitly places more weight on the majority class and tries to fit those observations well, but the minority class may be the +ve class of interest
b) a high accuracy can be deceptive

37
Q

Unbalanced data for binary targets (solution)

A

a) Undersampling - keep all obs. from the minority class but draw fewer obs. from the majority class (con: less data to train on)
b) Oversampling - keep all obs. from the majority class, but draw more obs. (with replacement) from the minority class (con: more data, heavier computational burden)
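
A pandas sketch of both approaches (the 90/10 class split and column name "y" are invented):

```python
import pandas as pd

# Hypothetical unbalanced binary data: 90 -ve obs., 10 +ve obs.
df = pd.DataFrame({"y": ["neg"] * 90 + ["pos"] * 10})
pos, neg = df[df["y"] == "pos"], df[df["y"] == "neg"]

# Undersampling: keep all minority obs., draw fewer majority obs.
under = pd.concat([pos, neg.sample(n=len(pos), random_state=1)])

# Oversampling: keep all majority obs., draw minority obs. with replacement
over = pd.concat([neg, pos.sample(n=len(neg), replace=True, random_state=1)])

print(under["y"].value_counts(), over["y"].value_counts(), sep="\n")
```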

38
Q

Effects of undersampling and oversampling on model results

A

+ve class becomes more prevalent in the balanced data → predicted probabilities for the +ve class will increase → for a fixed cutoff, sensitivity increases but specificity decreases

39
Q

what is overfitting?

A

Model is trying too hard to capture not only the signal, but also the noise specific to the training data

40
Q

What indicates overfitting?

A

Small training error, but large test error

41
Q

What problems come from overfitting?

A

An overfitted model fits the training data well, but does not generalize well to new, unseen data (poor predictions)

42
Q

relationship between complexity, bias, variance, training error and test error.

A

as complexity increases, variance increases, bias decreases, training error decreases, and the test error has a U-shape

43
Q

mathematical definition of bias

A

difference between the expected value of the prediction and the true value

44
Q

mathematical definition of variance

A

amount of variability of the prediction

45
Q

significance of Bias in PA

A

part of the test error caused by the model not being flexible enough to capture the signal (underfitting)

46
Q

significance of Variance in PA

A

part of the test error caused by the model being too complex (overfitting)

47
Q

dimensionality applicability

A

specific to categorical variables

48
Q

granularity applicability

A

applies to both numeric and categorical variables

49
Q

dimensionality comparability

A

two categorical variables can always be ordered by dimension

50
Q

granularity comparability

A

not always possible to order two variables by granularity

51
Q

What is the aim of model validation?

A

to check that the selected model has no obvious deficiencies and the model assumptions are largely satisfied

52
Q

for a “nice” GLM, the deviance residuals should:

A

1) (purely random) have no systematic patterns
2) (homoscedasticity) have approximately constant variance upon standardization
3) (normality) be approximately normal (for most target distributions)
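
These checks might be carried out as in the following statsmodels sketch (the Poisson data and model are simulated purely for illustration):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical Poisson GLM
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# 1) and 2): plot deviance residuals against fitted values; look for
# no systematic pattern and roughly constant spread
plt.scatter(fit.fittedvalues, fit.resid_deviance)
plt.axhline(0)
plt.show()

# 3): normality check via a Q-Q plot of the deviance residuals
sm.qqplot(fit.resid_deviance, line="45")
plt.show()
```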

53
Q

what are the two validation methods based on the test set?

A

1) predicted vs actual values of the target - the two sets of values should be close (can check this quantitatively or graphically)
2) benchmark model - show that the recommended model outperforms a benchmark model, if one exists (e.g., intercept-only GLM, purely random classifier), on the test set

54
Q

Next steps after model validation

A

1) Adjust the business problem - changes in external factors may cause initial assumptions to shift, so we modify the business problem to incorporate the new conditions
2) consult with subject matter experts - seek validation of model results from external subject matter experts
3) gather additional data - enlarge the training data with new obs. and/or variables, and retrain the model to improve robustness
4) apply new types of models
5) refine existing models
6) field test proposed model