Modeling definitions Flashcards

1
Q

Modeling Considerations - Before

A

Descriptive analytics focuses on studying the past to identify relationships and patterns among the variables

Predictive analytics focuses on anticipating the future by using models to make accurate predictions

Prescriptive analytics focuses on the outcome of decisions

A project can involve all three types, but it is important to recognize which ones are essential

2
Q

Modeling Considerations - During

A

When things do not go according to plan:
-adjust the business problem
-consult an expert
-collect more data
-attempt different models
-refine current models

3
Q

Modeling Considerations - After

A

Modeling work concludes by either implementing or abandoning the model

Implementation is seldom straightforward. When others need to be convinced of the model’s ability, consider conducting a field test, in which the model is applied in a real setting but its results are not acted upon

If it becomes clear that something critical for success is missing, abandoning the model avoids wasting further resources

4
Q

Supervised vs Unsupervised Learning

A

Supervised learning studies data with a target -> Everything centers on analyzing the target through the predictors, which is why it is the focus of predictive analytics

Unsupervised learning analyzes data without a target -> The idea is to identify patterns that may exist in the data, but there is no clear objective and no direct way to verify the quality of the findings.
–Unsupervised techniques can be used to create features
–Features used to model the predictors' impact on a target should NOT be derived from the target, since that results in target leakage: the model would need to know the target in order to predict the target, which is inappropriate.
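The leakage point can be illustrated with a minimal sketch. The data, the feature names, and the "older than 45" rule below are all hypothetical and exist only to show the contrast between a feature that peeks at the target and one built from predictors alone.

```python
# Hypothetical toy data: (age, income, target). Not from any real dataset.
rows = [
    (25, 40, 0),
    (32, 55, 0),
    (47, 80, 1),
    (51, 62, 0),
    (38, 45, 0),
    (60, 90, 1),
]

def leaky_feature(age, income, target):
    # BAD: derived from the target itself, so it cannot be computed for a
    # new observation whose target is still unknown.
    return target

def safe_feature(age, income):
    # OK: derived from predictors only (hypothetical "older than 45" flag).
    return 1 if age > 45 else 0

def accuracy(predictions, targets):
    # Share of predictions that match the observed target.
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return hits / len(targets)

targets = [t for _, _, t in rows]
leaky_acc = accuracy([leaky_feature(a, i, t) for a, i, t in rows], targets)
safe_acc = accuracy([safe_feature(a, i) for a, i, _ in rows], targets)

print(leaky_acc)  # looks perfect on past data only because it uses the target
print(safe_acc)   # lower, but the feature is available for new observations
```

The leaky feature appears flawless on historical data, yet it is unusable in practice because the target is exactly what is missing at prediction time.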

5
Q

Regression vs Classification

A

Many supervised learning problems can be divided into two types
-Regression involves continuous or count targets
-Classification involves a categorical target, most commonly a binary one

Sometimes, we might prefer to reframe a classification problem in terms of a regression problem, such as treating a binary target as numeric

6
Q

Decomposing the Target

A

Studying the target by its two parts
1. systematic component
-describes the value that the target gravitates towards (f)
-when viewing the target as a random variable, f is its mean
-want f to be a function of the predictors
-the premise is that the mean target depends on the predictors, so f captures the systematic relationship between the target and the predictors
-in practice, f is unknown, and the first step to making model predictions is to estimate f; the estimate is denoted f^

2. random component
-captures things about the target that cannot be explained with any predictor, sometimes denoted as e (epsilon)

Y = f(x_1, ..., x_p) + e, i.e. 'signal' + 'noise'
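A small simulation can make the decomposition concrete. This is only a sketch: the linear form of f and the standard normal noise are assumptions chosen for illustration, not something the card specifies.

```python
import random

random.seed(0)

def f(x):
    # Assumed systematic component (hypothetical): the mean target given x.
    return 2.0 + 3.0 * x

def simulate_y(x):
    # Target = signal + noise, with zero-mean Gaussian noise as the
    # assumed random component e.
    return f(x) + random.gauss(0.0, 1.0)

x = 1.5
draws = [simulate_y(x) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print(f(x))         # the value the target gravitates towards at x = 1.5
print(sample_mean)  # average of many simulated targets, close to f(x)
```

Individual draws scatter around f(x) because of e, but their average settles near f(x), which is the sense in which f is the mean of the target.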

7
Q

Parametric vs Non-Parametric

A

Parametric - specifies a functional form for f that includes free parameters
->data is used to estimate these parameters; the downside to this approach is that choosing a functional form can be arbitrary, so the chosen form may be significantly different from the true f
->(MLR, Stepwise selection and Regularization, GLM)

Non-parametric - makes no assumption about f’s functional form, so there are no parameters of such a form to estimate
->having no functional form to depend on means that f^ relies solely on the data; these methods require an abundance of observations to be effective
->(regular decision trees, ensembles of decision trees)
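The contrast can be sketched in a few lines. The parametric side below is a straight line fitted by least squares. For the non-parametric side, a nearest-neighbour average is swapped in purely because it is short to write; it is not one of the tree-based methods listed above. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    # Parametric: assume f(x) = intercept + slope * x and estimate the
    # two free parameters by least squares.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def knn_predict(xs, ys, x_new, k=2):
    # Non-parametric: no functional form; average the targets of the
    # k training points closest to x_new.
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
line_pred = intercept + slope * 10.0      # the fitted form extends beyond the data
knn_pred = knn_predict(xs, ys, 10.0, k=2)  # can only reuse nearby observations

print(intercept, slope)
print(line_pred, knn_pred)
```

Far from the observed x values, the line keeps following its assumed form while the neighbour average stays tied to the targets it has seen, which is one reason non-parametric methods need plenty of observations.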

8
Q

Flexibility

A

Flexibility describes how closely f^ is able to follow the data. A more flexible f^ follows the data more closely than a less flexible f^.

-For parametric methods, higher flexibility often comes from having more free parameters in the functional form.

-Non-parametric methods typically have a flexibility measure that is unique to each method. They are generally considered more flexible than parametric methods because they are not confined to a functional form for f.

Flexibility and accuracy do not always go hand-in-hand. It may be possible to create an f^ that makes perfect predictions on past data, but that is not the objective. An f^ with good predictive ability on future data is usually one that is not overly flexible.
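The gap between fitting past data and predicting new data can be sketched with a simulation. Everything here is an assumption made for illustration: the true f, the noise level, and the use of a nearest-neighbour average, where a smaller k means a more flexible f^.

```python
import random

random.seed(1)

def true_f(x):
    # Assumed systematic component for the simulation (hypothetical).
    return 2.0 * x

def knn_predict(train, x_new, k):
    # Average the targets of the k training points closest to x_new.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    # Mean squared error of the k-neighbour predictions on `data`.
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

# Training data on a grid; test data from the same process at shifted x values.
train = [(i / 10, true_f(i / 10) + random.gauss(0, 1)) for i in range(30)]
test = [(i / 10 + 0.05, true_f(i / 10 + 0.05) + random.gauss(0, 1)) for i in range(30)]

for k in (1, 5):
    train_error = mse(train, train, k)
    test_error = mse(train, test, k)
    print(k, round(train_error, 3), round(test_error, 3))
```

With k = 1 every training point is its own nearest neighbour, so the training error is exactly zero, yet that perfect fit on past data says nothing about how well the noise-chasing f^ will do on the test set.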

9
Q

Interpretability

A

Interpretability describes how easy it is to understand f^. When f^ has complicated components, the relationship between the target and predictors becomes more difficult to understand.

10
Q

Flexibility vs Interpretability

A

Flexibility is often inversely related to interpretability. A highly flexible f^ can follow the data closely, but the complex mathematical parts that enable it are often challenging to interpret.

Less flexible, more interpretable -> Stepwise selection and Regularization

Moderately flexible and interpretable -> MLR, GLM, Regular decision trees

More flexible, less interpretable -> ensembles of decision trees

11
Q

Comparing models

A

glm_lasso is a simpler model. It only considers whether the animal is a cat or dog, its age, and whether it arrived via Public Assist. glm_drop includes three additional predictors, making it more cumbersome to explain to a non-technical audience at the animal shelter.

The higher AUC suggests that glm_drop is slightly better at classifying adoptions in this train/test data partition. However, the glm_lasso model has fewer predictors, protecting against overfitting and adding confidence that the model performance will be stable with unseen data.

The interpretability and robustness of glm_lasso outweigh the slight decrease in predictive performance.

In general, if a more flexible model has a worse test metric than a less flexible model, the more flexible model is likely overfit to the training data.
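For reference, AUC can be computed by hand as the share of (positive, negative) pairs in which the positive case receives the higher score. The sketch below uses made-up labels and scores; they are not the outputs of glm_lasso or glm_drop.

```python
def auc(labels, scores):
    # Probability that a randomly chosen positive case is scored above a
    # randomly chosen negative case; ties count as half a win.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test-set labels and predicted probabilities from two models.
labels = [0, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8]
scores_b = [0.2, 0.3, 0.6, 0.7]

auc_a = auc(labels, scores_a)
auc_b = auc(labels, scores_b)
print(auc_a, auc_b)
```

A difference in AUC on a single train/test partition is only one input to the comparison; as the card argues, a small gain may not justify a model that is harder to explain.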
