05_supervised learning methods Flashcards

1
Q

What is a linear model?

A

they assume linearity in the underlying data

rather simple, but they convey many of the concepts used in other, more complex models

e.g. linear regression, linear classification

2
Q

What does a linear regression do?

A

find weights w0 and w1
so that the linear function f(x) = w1·x + w0,
with input x and output y = f(x),
best fits the data containing ground-truth values y’

–> how can we learn w0 and w1 from the data?

3
Q

How do linear regression models learn the weights for function f?

A

minimize the squared error of the prediction
with respect to the ground truth
for each data point:

least-squares fitting

for data point j: [yj’ − f(xj; w0, w1)]^2

4
Q

How does least squares fitting work?

A

1) define a loss (objective) function that is the sum of the squared errors over all data points

2) find the best-fit model parameters by minimizing the loss function with respect to those two model parameters
(first derivative –> closed-form expression for best-fit w0 and w1)

least squares + a linear model function: the resulting minimum of the loss function is GLOBAL (because of this combination)
–> the model immediately learns the best possible solution
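
A minimal sketch of this closed-form fit, assuming NumPy (not named in the card) and made-up toy data:

import numpy as np

# toy data roughly following y' = 2x + 1 (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.random.normal(0.0, 0.1, size=x.shape)

# closed-form least-squares solution for f(x) = w1*x + w0
X = np.column_stack([np.ones_like(x), x])      # design matrix with columns [1, x]
w0, w1 = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizes the sum of squared errors
print(w0, w1)                                  # close to (1, 2)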

5
Q

When can linear functions be used as a classifier?

A

When the data is linearly separable (and only then!)

6
Q

How do linear functions work as a classifier?

A

1) define decision boundary
f(x, w) = w · x = w0 + w1·x1 + w2·x2

such that class 1: f(x,w) ≥ 0
and class 0: f(x,w) < 0

2) we can define class assignments through a threshold function
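
A minimal sketch of such a threshold classifier, assuming NumPy; the weight values are made up for illustration:

import numpy as np

def linear_classify(x, w):
    # w = [w0, w1, w2], x = [x1, x2]; threshold function: class 1 if f(x, w) >= 0, else class 0
    f = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1 if f >= 0 else 0

w = np.array([-1.0, 2.0, 0.5])          # hypothetical weights defining the decision boundary
print(linear_classify([1.0, 0.0], w))   # 1, since f = -1 + 2*1 + 0.5*0 = 1 >= 0
print(linear_classify([0.0, 0.0], w))   # 0, since f = -1 < 0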

7
Q

What is the perceptron learning rule?

A

weights are adjusted by a step size that is called the LEARNING RATE.
by iteratively running this algorithm over your training data multiple times, the weights can be learned so that the model performs properly

–> solution is learned iteratively

–> does not imply that the model is learning something useful (eg dataset might not be suitable)
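
A minimal sketch of the classic perceptron update rule, w ← w + η·(y’ − ŷ)·x, assuming NumPy and a bias folded into the weight vector; the data is made up:

import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=10):
    # X: samples with a leading 1 for the bias term, y: labels in {0, 1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                  # iterate over the training data multiple times
        for xj, yj in zip(X, y):
            pred = 1 if w @ xj >= 0 else 0   # threshold function on the current weights
            w += lr * (yj - pred) * xj       # adjust weights by the learning rate
    return w

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])                   # linearly separable AND problem
print(perceptron_train(X, y))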

8
Q

Why can linear models often not be applied?

A

they have low predictive capacity
–> can only be applied to data that is linearly distributed (regression) or linearly separable (classification)

9
Q

How can linear models be extended for more capacity?

A

the base function can be changed to a polynomial:

f(x) = w0 + w1·x + w2·x^2 + … = Σ wi·x^i

the resulting regression problem is still linear in the weights to be found, therefore the same properties apply:

we can compute the parameters wi that minimize the loss with a closed-form expression; this is by construction the best possible solution
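
A minimal sketch of a polynomial fit that is still linear in the weights, assuming NumPy; the toy data is made up:

import numpy as np

x = np.linspace(-1.0, 1.0, 20)
y = 0.5 - x + 2.0 * x**2 + np.random.normal(0.0, 0.05, size=x.shape)

p = 2                                               # polynomial degree
X = np.column_stack([x**i for i in range(p + 1)])   # features 1, x, x^2 -> still linear in the weights
w = np.linalg.lstsq(X, y, rcond=None)[0]            # same closed-form least squares as before
print(w)                                            # close to [0.5, -1, 2]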

10
Q

What can be changed for a polynomial linear function model in order to get more capacity?

A

p, the polynomial degree (it determines how many weights and powers of x we have; we have to find the best p)

–> when p is too low, the fit is barely better than a constant; when p is too high, the fit is also poor on unseen data (underfitting vs. overfitting)

11
Q

What does Occam’s Razor mean?

A

among models that explain the data equally well, the one with fewer parameters is to be preferred

12
Q

What is the goal of any regression model?

A

to minimize the loss over the data, i.e. the prediction errors

13
Q

What is a way to prevent a model from overfitting?

A

regularize the loss based on the learned weights

L’(x,w) = mean loss over N samples + regularization term
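
A minimal sketch of such a regularized loss, assuming NumPy; the L2 penalty shown here is one possible choice, not prescribed by the card:

import numpy as np

def regularized_loss(y_true, y_pred, w, alpha=0.1):
    mean_loss = np.mean((y_true - y_pred) ** 2)   # mean loss over the N samples
    penalty = alpha * np.sum(w ** 2)              # regularization term based on the learned weights (L2 here)
    return mean_loss + penalty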

14
Q

What is the L2-norm?

A

||w||2^2 = w · w

15
Q

What happens when you add a L2 regularization term to a polynomial model?

A

the regularization term adds the (squared) weights to the loss function

–> the fit can no longer drive the loss as low as it wants by using large weights

with increasing alpha, all coefficients wi drop in magnitude, leading to smoother fits
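
A minimal sketch of how increasing alpha shrinks polynomial coefficients, assuming scikit-learn (which the card does not name):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
y = np.sin(3 * x).ravel() + np.random.normal(0.0, 0.1, size=30)

X = PolynomialFeatures(degree=9).fit_transform(x)    # high-capacity polynomial features
for alpha in (1e-4, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(model.coef_).max())           # coefficient magnitudes drop as alpha grows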

16
Q

What is L2 regularization also called?

A

ridge regression

17
Q

What is L1 regularization also called?

A

LASSO regression

least absolute shrinkage and selection operator

18
Q

What is the L1-norm?

A

||w||1 = Σ |wi|

19
Q

What does a regularization term consist of?

A

alpha multiplied by the L2- or L1-norm of the weights

alpha: regularization parameter (controls the strength of the regularization)

20
Q

What happens when you add a L1 regularization term to a polynomial model?

A

while L2 regularization modulates all coefficients wi in the same way,
L1 regularization aims to set less meaningful coefficients to zero

–> performs feature selection

tries to bring as many of the coefficients to 0 as it can
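
A minimal sketch of this feature-selection effect, assuming scikit-learn; the data is made up:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1]             # only the first two features carry information

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)                         # coefficients of the three uninformative features are driven to ~0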

21
Q

How can we get closest to the minimum loss with a L2 norm in a 2D loss space spanned by w1 and w2?

A

we can only reach points on a circle; we look for the point on the circle that puts us closest to the global minimum of the unregularized loss

this would be suboptimal if the global minimum lies inside the circle, since we could not reach it exactly

22
Q

How can we get closest to the minimum loss with a L1 norm in a 2D loss space spanned by w1 and w2?

A

we can only reach points on a diamond (a square rotated by 45°) defined by |w1| + |w2| = const; its corners lie on the axes, which is why individual weights tend to be set exactly to zero

23
Q

Which type of regularization should we use? L1 or L2?

A
  • L2 regularization prevents the model from overfitting by modulating the impact of its input features in a homogeneous way
  • L1 regularization prevents the model from overfitting by focusing on those features which seem to be most important

both can be deployed in any machine learning model that minimizes a loss function

24
Q

What are pros for linear models? (4)

A
  • easy to understand and implement; resource efficient even for large and sparse data sets
  • least squares method always provides best-fit results if the data is appropriate
  • good interpretability due to linear nature of the model
  • easy to regularize
25
Q

What are cons for linear models? (2)

A
  • limited flexibility: data distribution must be brought into a form that is linear (regression) or linearly separable (classification)
  • susceptible to overfitting if not combined with a regularizer
26
Q

What is especially important when working with nearest-neighbor models?

A

Scaling of the data!

because the model relies on distances between data points

27
Q

What are nearest neighbor models?

A

they are non-parametric and simply rely on distances between data points

distances are defined through a metric

nearest-neighbor methods use these distances for classification and regression tasks

28
Q

What is a common distance metric in nearest neighbor models?

A

Euclidean distance

d = √(Δx1^2 + Δx2^2)

29
Q

What is k-nearest neighbor classification?

A

k-nearest neighbor (knn) classifiers predict class affiliation of an unseen data point based on majority voting of its k nearest neighbors in a seen data set with ground-truth labels

they are not trained in the usual sense: the distance of each unseen data point to all seen data points is calculated
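
A minimal sketch of knn classification, assuming scikit-learn; the seen data is made up:

from sklearn.neighbors import KNeighborsClassifier

X_seen = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]        # seen data with ground-truth labels
y_seen = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_seen, y_seen)     # "fitting" just stores the seen data
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))                      # majority vote of the 3 nearest neighbors -> [0 1]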

30
Q

What is the impact of hyperparameter k on knn-classification?

A

k has an impact on how well the model generalizes to unseen data:
we have to perform a hyperparameter search

31
Q

What can nearest-neighbor models also be implemented as?

A

as regressors that are able to interpolate between and smooth the available data
(you only have to be aware that this exists)

32
Q

How do we regularize nearest neighbor methods?

A

regularization is done by varying the hyperparameter k

a low k is susceptible to small-scale variations and noise

a high k may miss local details

33
Q

What are pros for knn classification? (3)

A
  • easy to understand, implement and results are highly interpretable
  • non-parametric
  • works reliably even with small data sets
34
Q

What are cons for knn classification? (2)

A
  • calculating distances computationally intensive for large data sets
  • performs poorly on sparse data sets; prone to the curse of dimensionality
35
Q

What is the curse of dimensionality?

A

the number of data points needed grows exponentially with the data dimensionality

if the feature space is insufficiently sampled, the model does not have enough data points to train properly –> this happens when we have too many features for the available data

36
Q

What is a decision tree?

A

a decision tree is a rule-based structure for prediction of scalar output from (potentially) multi-dimensional input data

37
Q

What are decision tree properties?

A

hyperparameters:

  • tree depth (how many layers do we have)
  • number of leaves (outputs, how many classes do we have)
38
Q

What is decision tree learning? How do decision trees learn?

A

a greedy divide-and-conquer strategy is adopted to train decision trees on a training data set in a recursive fashion (see the sketch after this list):

1) identify the “most important feature” (greedy)
2) split the samples across this feature (divide)
3) if all samples of a branch are of the same class, create a leaf, unless the maximum number of leaves has been reached, and stop
4) if not all samples of a branch are of the same class, recursively apply the algorithm to that branch until the maximum depth is reached
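
A minimal sketch of training a depth- and leaf-limited decision tree, assuming scikit-learn (the card does not prescribe a library); the toy data is made up:

from sklearn.tree import DecisionTreeClassifier

X = [[1, 0], [1, 1], [0, 0], [0, 1]]     # toy samples; only the first feature matters
y = [1, 1, 0, 0]

# tree depth and number of leaves as hyperparameters
tree = DecisionTreeClassifier(max_depth=2, max_leaf_nodes=4).fit(X, y)
print(tree.predict([[1, 1], [0, 0]]))    # [1 0]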

39
Q

What is the “most important feature” in decision trees?

A

generally, it means the feature that makes the most difference to the classification of a single sample

there are different implementations of this definition eg utilizing information entropy or other useful measures

40
Q

Can single decision trees generalize well?

A

No

Single decision trees typically generalize only to some extent due to their limited depth and size;
they have low model capacity

41
Q

How can decision trees be optimized so that they perform better?

A

ensemble methods - group the trees, which increases their capacity

by combining a large number of decision trees and letting them make decisions in an averaged vote or majority vote, we increase their capacity

42
Q

What are random forests?

A

trees in a random forest are intentionally kept shallower than in other decision tree models; they therefore act as “weak learners” that perform badly by themselves.

however, combining a large number of weak learners performs much better than the individual trees.

the intuition behind this is that weak learners “on average” compensate for each other’s individual shortcomings.
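
A minimal sketch of an ensemble of shallow trees, assuming scikit-learn; the data set and parameters are chosen only for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# many intentionally shallow ("weak") trees combined by majority vote
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
print(cross_val_score(forest, X, y).mean())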

43
Q

What are gradient-boosted tree-based models?

A

decision tree ensembles that are built successively in such a way that every newly created tree compensates for the shortcomings of the previous trees

gradient boosting refers to the fact that new base learners (individual decision trees) are fitted to the model’s pseudo-residuals, based on the gradient of the loss of the ensemble

–> the loss decreases with each added tree

44
Q

What tasks can gradient-boosted models do?

A

they are very successful in regression and classification tasks
and still represent state-of-the-art in traditional ML

common implementations:
- XGBoost
- LightGBM
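
A minimal sketch of a gradient-boosted tree model, using scikit-learn’s GradientBoostingClassifier purely for illustration (the card names XGBoost and LightGBM instead); the data set is made up:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# each new tree is fitted to the pseudo-residuals of the current ensemble
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbt.fit(X, y)
print(gbt.score(X, y))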

45
Q

What are pros of tree-based models? (4)

A
  • extremely versatile and robust
  • can be trained on small amounts of data
  • non-parametric
  • interpretability: tree-based models are able to compute “feature importances”
46
Q

What are cons of the tree-based models? (1)

A
  • decision boundaries and regression predictions may be discrete instead of continuous