ML misc Flashcards

1
Q

What is a Decision Tree model? What is its primary hyperparameter and what does it DO?

A

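A Decision Tree is a supervised model (regression or classification) that recursively splits the observations using threshold rules on one feature at a time. max_depth is the primary hyperparameter: it caps how many splits can be stacked between the root and any leaf.

With a given max_depth, the tree splits the observations into at most 2^max_depth groups, called leaves (e.g., up to 4 for max_depth of 2, or up to 8 for max_depth of 3); it can end up with fewer if a node has nothing useful left to split. Which observations land in each group is decided greedily: at every node the tree picks the feature and threshold that most reduce the cost function (e.g. squared error for regression, Gini impurity or entropy for classification). In other words, it optimizes WHERE each split falls, which in a one-feature plot sets the LENGTH of each “horizontal line” of constant prediction (with several features, each group is an axis-aligned box in feature space rather than a line segment).

Each group is then deterministically assigned a flat predicted Y: the average Y of its training observations for regression, or their most common class for classification.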
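
The splitting idea can be sketched in plain Python for the one-feature regression case. This is a minimal illustration, not how any particular library implements it: the helper names (sse, build_tree, predict, count_leaves) and the toy data are made up for this card, and squared error is assumed as the cost function.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (the regression cost assumed here)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(xs, ys, max_depth):
    """Greedy one-feature regression tree, returned as nested dicts."""
    # Stop when the depth budget is used up or there is nothing left to separate.
    if max_depth == 0 or len(set(xs)) < 2:
        return {"leaf": True, "value": mean(ys)}
    values = sorted(set(xs))
    best_cost, best_t = None, None
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    left = [(x, y) for x, y in zip(xs, ys) if x <= best_t]
    right = [(x, y) for x, y in zip(xs, ys) if x > best_t]
    return {
        "leaf": False,
        "threshold": best_t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    }

def predict(tree, x):
    """Walk down the splits; the leaf's stored average is the flat prediction."""
    while not tree["leaf"]:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]

def count_leaves(tree):
    if tree["leaf"]:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Toy data (hypothetical): four rough "steps" in y.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 3.0, 3.2, 7.0, 7.4, 9.0, 9.2]
tree = build_tree(xs, ys, max_depth=2)
print(count_leaves(tree), predict(tree, 1), predict(tree, 8))
```

With max_depth=2 this toy tree has 4 leaves, and every x in a leaf gets the same prediction (the leaf's average Y). Raising max_depth beyond 3 cannot create more than 8 leaves here, because there are only 8 distinct x values, which is why 2^max_depth is an upper bound rather than an exact count.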

2
Q

What are two ways to model what you suspect is a non-linear relationship between your predictors and outcome variable?

A
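  1. Linear regression with quadratic or higher-order polynomial terms (the model stays linear in its coefficients, but the fitted curve is not a straight line).
  2. Decision Tree / Random Forest model, which models nonlinear behavior with many localized flat (piecewise-constant) approximations rather than one global curve.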
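
The first option can be illustrated with a small plain-Python sketch. The helpers fit_line and r_squared and the toy data are assumptions made for this card; to keep it to one variable, the "polynomial" model here uses only the x^2 term, whereas a full quadratic fit would normally include both x and x^2.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(ys, preds):
    """Share of the variance in ys explained by the predictions."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Curved toy data (hypothetical): y = x^2 + 1
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 + 1 for x in xs]

# Straight line in x: cannot follow the curve.
a, b = fit_line(xs, ys)
r2_linear = r_squared(ys, [a + b * x for x in xs])

# The same linear regression, but on the transformed feature x^2.
squared = [x ** 2 for x in xs]
a2, b2 = fit_line(squared, ys)
r2_quadratic = r_squared(ys, [a2 + b2 * s for s in squared])

print(r2_linear, r2_quadratic)
```

On this symmetric toy data the straight line explains none of the variance, while the regression on x^2 recovers the curve exactly. The fitting procedure is identical in both cases; only the feature changed.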
3
Q

What is a primary difference between ML vs inferential statistics?

A

ML is much more focused on PREDICTIONS for new data, whereas inferential statistics is more focused on EXPLAINING the existing data (e.g. estimating the effect of each predictor and how uncertain that estimate is).

4
Q

What’s the difference between KMeans and K Nearest Neighbors?

A

They have almost nothing in common:

KMeans is unsupervised (clustering); the K stands for the # of clusters (which you choose).

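K Nearest Neighbors is supervised (regression or classification): it predicts the y for a new data point from the K training points whose X’s are closest to it in space and whose y’s are already known (majority vote for classification, average for regression). Here the K stands for the # of neighbors consulted (which you also choose).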
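
The contrast can be sketched side by side in plain Python. The function names, the toy points, and the hand-picked starting centers below are assumptions for illustration; the clustering loop is a bare-bones version of the standard iterative procedure (Lloyd's algorithm), not a library implementation.

```python
from collections import Counter
from math import dist

def knn_classify(train_X, train_y, point, k):
    """Supervised: majority vote among the k LABELED points nearest to `point`."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def kmeans(points, centers, n_iter=10):
    """Unsupervised: group points around len(centers) centers; no labels involved."""
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins its nearest center.
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else old
            for cl, old in zip(clusters, centers)
        ]
    return centers, clusters

# Toy data (hypothetical): two well-separated blobs.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]

# KNN needs the labels y; K = 3 neighbors vote.
label = knn_classify(X, y, (2, 2), k=3)

# KMeans never sees y; K = 2 is the number of clusters (two starting centers).
centers, clusters = kmeans(X, centers=[(0, 0), (10, 10)])
print(label, centers)
```

Note that knn_classify takes both X and y, while kmeans takes only the points: that difference in inputs is the supervised vs. unsupervised distinction in miniature.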
