7 - Tools for Machine Learning Flashcards

1
Q

What was the initial proportion of incorrect decisions in the prior authorization process?

A

50 percent

2
Q

What is the current proportion of incorrect decisions after model improvement?

A

30 percent

3
Q

In an ideal world, what percentage of incorrect prior authorization decisions does Kamala hope to achieve?

A

0 percent

4
Q

What approach does David prefer when building models in data science?

A

Start with a simple model

5
Q

What is a residual in predictive modeling?

A

The difference between the true value of the outcome and what the model predicted

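A quick sketch of this definition with made-up numbers:

```python
# Hypothetical true and predicted health care expenditures (dollars)
y_true = [10_000, 25_000, 7_500]
y_pred = [12_000, 20_000, 9_000]

# A residual is the true outcome minus the model's prediction
residuals = [t - p for t, p in zip(y_true, y_pred)]

# The average absolute residual summarizes how far off the model is
mean_abs_residual = sum(abs(r) for r in residuals) / len(residuals)
```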
6
Q

What was the average absolute difference between the predicted and actual health care expenditure in the initial model?

A

$12,000

7
Q

What features did the linear regression model initially include?

A
  • Demographic data
  • Medical claims
  • Prior authorization report
8
Q

What were the top features selected by the linear regression model?

A
  • Prior history of back pain
  • Prior medication usage
  • Age
9
Q

How did the first team improve the model’s performance?

A

By examining residuals and adding more job-related features

10
Q

What was the average residual after adding new features related to patients’ jobs?

A

$9,000

11
Q

What is the purpose of interaction terms in feature engineering?

A

To represent the relationship between two main terms

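A minimal sketch of building an interaction term; the feature names and values here are hypothetical:

```python
# Hypothetical main effects: age and a manual-labor job indicator
ages = [35, 52, 41]
manual_labor = [1, 0, 1]

# The interaction term is the product of the two main effect variables
age_x_labor = [a * m for a, m in zip(ages, manual_labor)]
```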
12
Q

What transformation techniques did the second team apply to improve model fitting?

A
  • Square transformation
  • Square root transformation
  • Log transformation
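These transformations are one-liners; a sketch with hypothetical feature values:

```python
import math

# Hypothetical feature values to transform before model fitting
values = [100.0, 400.0, 2500.0]

squared = [v ** 2 for v in values]          # square transformation
square_root = [math.sqrt(v) for v in values]  # square root transformation
logged = [math.log(v) for v in values]        # log transformation
```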
13
Q

What is a general rule of thumb when including polynomial terms in a regression model?

A

Always include the lower-order terms

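A sketch of the rule: when a squared term enters the model, the lower-order term stays in the design matrix alongside it:

```python
x = [1.0, 2.0, 3.0]

# When adding x**2 to a regression, keep the lower-order terms too:
# each row holds [intercept, x, x**2]
design_matrix = [[1.0, xi, xi ** 2] for xi in x]
```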
14
Q

What strategy did the third team use to address the issue of unrepresentative training data?

A

Weighted regression

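A minimal weighted least squares sketch for a single feature, with made-up data and weights. The closed form below is the standard textbook formula, not the team's actual code:

```python
# Toy data where y = 2x exactly; the weights upweight observations
# from a hypothetical under-represented group
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
ws = [1.0, 1.0, 3.0, 3.0]

wsum = sum(ws)
xbar = sum(w * x for w, x in zip(ws, xs)) / wsum
ybar = sum(w * y for w, y in zip(ws, ys)) / wsum

# Closed-form weighted least squares for one feature
slope = (sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs)))
intercept = ybar - slope * xbar
```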
15
Q

What are outliers in the context of health care expenditure?

A

Patients who incur significantly higher health care spending than the majority

16
Q

How do outliers affect least squares linear regression?

A

The fitted model is affected by outliers more than by non-outliers, which can skew its predictions.

17
Q

What was the goal of the hackathon organized by David?

A

To improve the performance of the model predicting health care expenditure

18
Q

What did the team find regarding the residuals of patients with different job types?

A

Higher residuals for patients in manual labor jobs

19
Q

Fill in the blank: A _______ is a special term used to represent the relationship between two main effect variables.

A

interaction term

20
Q

True or False: The model’s performance improved after adding features about patients’ jobs.

A

True

21
Q

What are outliers in the context of health care expenditure?

A

Outliers are patients who incur significantly higher health care expenditures compared to the majority, such as those with hundreds of thousands of dollars in back pain-related spending over 3 years.

22
Q

Why is least squares linear regression problematic when dealing with outliers?

A

Least squares is affected by outliers more than by non-outliers, which skews the fit and results in worse predictions for non-outliers.

23
Q

Explain the concept of squaring residuals in least squares regression.

A

Squaring residuals means that a residual of 4 is considered four times as bad as a residual of 2, disproportionately affecting the model’s predictions.

24
Q

What is least absolute deviation?

A

Least absolute deviation is a method that uses the absolute value of residuals instead of squaring them, making it less sensitive to outliers.

25
Q

What is quantile regression?

A

Quantile regression focuses on estimating specific quantiles of the outcome, such as the median, which are robust to outliers.
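Both least absolute deviation and median regression rest on the fact that the median minimizes total absolute deviation; a sketch with a made-up outlier:

```python
# Skewed toy data with one large outlier
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mean = sum(data) / len(data)           # pulled up by the outlier
median = sorted(data)[len(data) // 2]  # ignores the outlier's size

def total_abs_dev(center):
    # The quantity that least absolute deviation minimizes
    return sum(abs(x - center) for x in data)
```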
26
Q

How does mean wealth differ from median wealth when outliers are present?

A

Mean wealth can be significantly affected by outliers, while median wealth remains largely unchanged, providing a more stable measure.
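A sketch of this contrast with hypothetical wealth figures:

```python
wealth = [40_000, 45_000, 50_000, 55_000, 60_000]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s, n = sorted(xs), len(xs)
    return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

# Adding one extreme outlier moves the mean dramatically
# but barely moves the median
with_outlier = wealth + [10_000_000]
```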
27
Q

What is K-nearest neighbor (K-NN) regression?

A

K-NN regression predicts outcomes based on the K most similar observations, using local information instead of the entire data set.

28
Q

How is the parameter K defined in K-NN regression?

A

K is the number of observations used to make a prediction; for example, K = 20 means the model looks at the 20 most similar data points.

29
Q

What is the common approach to defining similarity in K-NN?

A

The most common approach is to calculate the Euclidean distance between data points.

30
Q

What are the advantages of K-NN regression?

A

K-NN can handle nonlinear relationships and does not assume linearity, making it flexible for various types of data.
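A minimal K-NN regression sketch using Euclidean distance; all data are made up:

```python
import math

def knn_predict(train_X, train_y, query, k):
    # Euclidean distance from the query to every training observation
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Average the outcomes of the k most similar observations
    return sum(y for _, y in ranked[:k]) / k

# Hypothetical one-feature training data
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
```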
31
Q

What is Naive Bayes?

A

Naive Bayes is a probabilistic machine learning model used for classification problems, based on Bayes' theorem.

32
Q

What assumption does Naive Bayes make about features?

A

Naive Bayes assumes that each feature is independent of all other features, conditional on the class.

33
Q

What is the primary output of a Naive Bayes model?

A

Naive Bayes predicts the probability of a binary outcome, such as whether health expenditure will exceed $20,000.

34
Q

What is the training process for Naive Bayes?

A

Training involves estimating the prior probability of each class and the conditional probability of each feature given the class.
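A sketch of that training process on a toy data set with one binary feature; all counts are hypothetical:

```python
# Toy binary data: feature = prior back pain, class = high spender
features = [1, 1, 0, 0, 1, 0]
classes = [1, 1, 0, 0, 0, 0]

n = len(classes)

# Priors: P(class)
prior_1 = classes.count(1) / n
prior_0 = classes.count(0) / n

# Conditionals: P(feature = 1 | class), estimated by frequency
cond_1 = sum(f for f, c in zip(features, classes) if c == 1) / classes.count(1)
cond_0 = sum(f for f, c in zip(features, classes) if c == 0) / classes.count(0)

# Scoring a new patient with feature = 1 (unnormalized posteriors)
score_1 = prior_1 * cond_1
score_0 = prior_0 * cond_0
prediction = 1 if score_1 > score_0 else 0
```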
35
Q

What is a Classification and Regression Tree (CART)?

A

CART modeling divides the population into smaller subpopulations and makes predictions on those smaller groups.

36
Q

How does the CART algorithm work?

A

CART starts with the entire population and repeatedly searches for the best variable and value to split the population until stopping rules are met.
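A sketch of the core split search for one variable, minimizing squared error; this is a simplification of the full recursive algorithm:

```python
def best_split(xs, ys):
    # Sum of squared errors around a group's mean
    def sse(group):
        if not group:
            return 0.0
        m = sum(group) / len(group)
        return sum((y - m) ** 2 for y in group)

    best_score, best_threshold = None, None
    # Try every observed value as a candidate threshold
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_score, best_threshold = score, t
    return best_threshold
```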
37
Q

What are the advantages of CART modeling?

A
  • No assumptions required about features
  • Can predict various types of variables
  • Less sensitive to outliers
  • Can handle missing data
  • Easily interpretable

38
Q

What are the limitations of CART modeling?

A

CART models can be sensitive to small changes in the data, do not produce coefficients like regression models, and require careful evaluation of variable importance.

39
Q

What is a random forest?

A

A random forest is a collection of multiple CART models built from random samples of the data.

40
Q

What is the significance of randomness in random forests?

A

Randomness is applied in two ways: in creating random samples of the data and in selecting candidate input variables at each split.
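A sketch of both sources of randomness; the feature names are hypothetical:

```python
import random

random.seed(0)
rows = list(range(10))
features = ["age", "claims", "job_type", "medication"]

# Randomness 1: a bootstrap sample of rows, drawn with replacement
bootstrap_rows = [random.choice(rows) for _ in rows]

# Randomness 2: only a random subset of features is considered at a split
split_candidates = random.sample(features, k=2)
```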
41
Q

What is the goal of boosting, bagging, and ensembling?

A

These techniques combine multiple machine learning models to improve predictive performance.

42
Q

What are ensembling methods?

A

Techniques for combining the predictions of several models into a single, more accurate prediction.

## Footnote
Ensembling methods can improve the robustness of predictions.

43
Q

What is model stacking?

A

A method where different models are created and their predictions are used as input variables to a new model that makes the final predictions.

## Footnote
The first models are known as level 1 models, and the final model is called the level 2 model.

44
Q

What is a level 1 model?

A

An initial model created in model stacking whose predictions serve as inputs to the level 2 model.

45
Q

What is the function of a level 2 model in stacking?

A

To combine the predictions from level 1 models to produce a final result.
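A deliberately oversimplified stacking sketch; a real level 2 model learns its weights from data rather than using a fixed blend, and all numbers here are hypothetical:

```python
# Hypothetical level 1 predictions for three patients (dollars)
linear_preds = [18_000.0, 9_000.0, 30_000.0]
knn_preds = [16_000.0, 11_000.0, 26_000.0]

def level2(p1, p2, w=0.5):
    # A trivially simple level 2 model: a fixed blend of level 1 outputs
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

stacked = level2(linear_preds, knn_preds)
```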
46
Q

What is bagging?

A

A special case of ensembling, called bootstrap aggregation, that involves resampling data sets from the original data set.

## Footnote
Bagging can improve prediction accuracy and help avoid overfitting.

47
Q

What does bootstrapping refer to?

A

Creating several data sets from the original data set by resampling observations with replacement.
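A sketch of bootstrapping and aggregation using a trivially simple "model" (the sample mean) in place of a real learner:

```python
import random

random.seed(1)
data = [3.0, 5.0, 7.0, 9.0, 11.0]  # true mean is 7.0

# Bootstrapping: build many data sets by resampling with replacement
samples = [[random.choice(data) for _ in data] for _ in range(200)]

# Bagging: fit one simple model per resample, then average their outputs
bagged_estimate = sum(sum(s) / len(s) for s in samples) / len(samples)
```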
48
Q

What is gradient boosted machine learning (GBML)?

A

A technique where models learn by giving misclassified observations more weight in the next training iteration.

49
Q

What is the purpose of boosting in machine learning?

A

To improve predictions by focusing on difficult-to-predict observations while maintaining performance on easier ones.
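A sketch of one boosting round with a hypothetical doubling rule; real algorithms such as AdaBoost use a principled weight update rather than a flat factor of two:

```python
# Observations the current model got wrong are doubled in weight
weights = [1.0, 1.0, 1.0, 1.0]
correct = [True, False, True, False]

raised = [w * (1.0 if ok else 2.0) for w, ok in zip(weights, correct)]
total = sum(raised)
next_weights = [w / total for w in raised]  # renormalized for the next round
```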
50
Q

Name two commonly used boosting algorithms.

A
  • AdaBoost
  • Arcboost

51
Q

What models were included in the ensemble described by David?

A
  • Linear regression model
  • K-NN model
  • Regression tree model

52
Q

What was the average residual achieved by the stacked ensemble?

A

$5,000

53
Q

True or False: The best individual model had an average residual of $8,500 before ensembling.

A

True

54
Q

Fill in the blank: The process of combining multiple models to improve prediction accuracy is called _______.

A

ensembling

55
Q

How does ensembling improve model performance?

A

By leveraging the strengths of different models to produce a more accurate and robust prediction.

56
Q

What analogy did David use to explain the benefits of ensembling?

A

A trivia game show partnership where two individuals complement each other's knowledge.

57
Q

What should be examined when analyzing the residuals of a model?

A

Patterns in the residuals

58
Q

What is a potential benefit of using metalearning methods like boosting, bagging, and ensembling?

A

Improved model performance.

59
Q

What is a common challenge when using K-nearest neighbor approaches?

A

Determining whether it is superior to parametric regression approaches in specific scenarios.

60
Q

What type of terms might be included in modeling based on data knowledge?

A
  • Polynomial terms
  • Interaction terms