ML misc Flashcards

Question 1

Q

What is a Decision Tree model? What is its primary hyperparameter and what does it DO?

Answer

A

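A Decision Tree is a supervised model that predicts Y by passing each observation through a sequence of yes/no threshold tests on its predictors. max_depth is the primary hyperparameter: it caps how many successive splits any one observation can pass through.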

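A Decision Tree splits the observations into at most 2^max_depth groups, called leaves (e.g., up to 4 for max_depth of 2, or up to 8 for max_depth of 3). Which observations land in each group depends on the predictor and threshold chosen at each split, and each split is picked greedily to reduce the cost function (e.g., squared error for regression, Gini impurity for classification). With a single predictor the fitted model is a step function, so in effect the tree decides where each “horizontal line” starts and ends; with N predictors each flat piece sits over a box-shaped region of predictor space rather than an interval.

Each group is then deterministically assigned a flat predicted Y: the average Y of its training observations for regression, or their most common class for classification.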
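To make the card concrete, here is a minimal sketch of a one-predictor regression tree in plain Python. It is not how any particular library implements trees; the function names (build_tree, predict, count_leaves) and the toy data are invented for illustration, and the cost is assumed to be the sum of squared errors.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (the cost being reduced)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(xs, ys, max_depth):
    """Fit a 1-D regression tree and return it as a nested dict."""
    # Stop when the depth budget is spent or no further split is possible.
    if max_depth == 0 or len(set(xs)) < 2:
        return {"leaf": True, "value": mean(ys)}
    values = sorted(set(xs))
    best_cost, best_t = None, None
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    left = [(x, y) for x, y in zip(xs, ys) if x <= best_t]
    right = [(x, y) for x, y in zip(xs, ys) if x > best_t]
    return {
        "leaf": False,
        "threshold": best_t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    }

def predict(tree, x):
    """Walk down the splits until a leaf is reached; return its flat value."""
    while not tree["leaf"]:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]

def count_leaves(tree):
    if tree["leaf"]:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Toy data (made up): two flat levels with a jump between x=4 and x=5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]
tree = build_tree(xs, ys, max_depth=2)
print(count_leaves(tree), tree["threshold"], predict(tree, 1))
```

With this data the first split falls at the jump, and the tree ends up with 4 leaves, the maximum for max_depth of 2. With fewer distinct x values than 2^max_depth, the tree simply stops early and has fewer leaves.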

Question 2

Q

What are two ways to model what you suspect is a non-linear relationship between your predictors and outcome variable?

Answer

A

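Linear regression with quadratic or higher-order polynomial terms of the predictors (the model is still linear in its coefficients).
Decision Tree / Random Forest model, which approximates nonlinear behavior with many localized flat (piecewise-constant) segments rather than one global curve.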
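A small sketch of the first option, in plain Python with made-up, noise-free data. As a simplification it fits only one feature at a time (x, then x squared); a full quadratic regression would include both terms together. The helper names fit_line and sse are hypothetical.

```python
def fit_line(zs, ys):
    """Ordinary least squares for y = intercept + slope * z (one feature)."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    slope = (
        sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
        / sum((z - z_bar) ** 2 for z in zs)
    )
    return y_bar - slope * z_bar, slope

def sse(zs, ys, intercept, slope):
    """Sum of squared residuals of the fitted line."""
    return sum((y - (intercept + slope * z)) ** 2 for z, y in zip(zs, ys))

# Synthetic data following y = 2x^2 + 1 exactly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [2 * x ** 2 + 1 for x in xs]

# Straight line in x: cannot follow the curve.
b0, b1 = fit_line(xs, ys)
linear_error = sse(xs, ys, b0, b1)

# Same estimator, but the feature is x^2: the relationship becomes linear.
squared = [x ** 2 for x in xs]
c0, c1 = fit_line(squared, ys)
quadratic_error = sse(squared, ys, c0, c1)

print(linear_error, quadratic_error)
```

The estimator is identical in both fits; only the feature changes. That is the sense in which polynomial regression is still linear regression.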

Question 3

Q

What is a primary difference between ML vs inferential statistics?

Answer

A

ML focuses much more on PREDICTING outcomes for new data, whereas inferential statistics focuses more on EXPLAINING the existing data (e.g., estimating effects and quantifying their uncertainty).

Question 4

Q

What’s the difference between KMeans and K Nearest Neighbors?

Answer

A

They have almost nothing in common:

KMeans is unsupervised (clustering); the K stands for the # of clusters (which you choose).

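K Nearest Neighbors is supervised (regression or classification); the K stands for the # of neighbors consulted (which you also choose). It predicts the y for a new data point from the y’s of the K training points whose X’s are closest to it in space: their average for regression, or their majority class for classification.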
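The contrast can be sketched side by side in plain Python. This is an illustrative toy, not a library implementation: the function names and data are invented, distances are assumed Euclidean, and the KMeans starting centroids are passed in by hand (real implementations usually pick them randomly or with a smarter seeding scheme).

```python
from collections import Counter
from math import dist

def knn_classify(train_X, train_y, query, k):
    """Supervised: majority vote among the k labeled points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans(points, initial_centroids, n_iter=10):
    """Unsupervised: alternate assigning points and recomputing centroids.

    The number of clusters K is len(initial_centroids). No labels are used.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Toy data (made up): two well-separated blobs.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]

# KNN needs the labels y; K = 3 neighbors.
label = knn_classify(X, y, query=(1.1, 1.0), k=3)

# KMeans never sees y; K = 2 clusters.
centroids, clusters = kmeans(X, initial_centroids=[X[0], X[3]])

print(label, [len(c) for c in clusters])
```

Note that knn_classify takes the known y’s as input while kmeans does not, which is the supervised vs. unsupervised distinction on the card.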

(4 cards)