Machine Learning Flashcards by Marisa Triana

Method for computer-aided, automated learning of models that
represent relationships between the features 𝑋1, … , 𝑋𝑝 and the
associated target variable 𝑌 .

Machine Learning

The three types of learning that Machine Learning combines.

Unsupervised learning, supervised learning and reinforcement learning.

The two types of supervised learning tasks.

Classification and regression

A qualitative / categorical supervised learning task is a ___ task.

Classification

Which type of task should we use to predict the outcome of a COVID test (positive or negative), the decision of a spam filter, the creditworthiness of a customer, or the result of a match (win, tie, lose)?

A Classification task

A quantitative / numerical supervised learning task is a ___ task.

Regression

Which type of task should we use to predict the mileage of electric cars, the traffic volume at a street segment, an expected waiting time (e.g., until an e-scooter is fully charged), climate change (incl. global warming or the emission of greenhouse gases), or the amount of energy produced by wind parks?

A Regression task

The two machine learning intentions.

Prediction and explanation

It defines what should be modeled within the ML pipeline; it comprises the feature data and the target variable.

A task

An algorithm that trains the model based on training data; it encapsulates the actual supervised learning model.

A learner

It is an instantiation of the learner to the data (with model parameters that are optimal for the given data).

A model

(True or false) Hyperparameters are defined within the learner.

True. A learner is a (data-independent) general construct defining which ML algorithm (and which hyperparameters) will be used.
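
The split between task, learner and model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mlr3 API: the class names `Task`, `LineLearner` and `LineModel` and the hyperparameter `fit_intercept` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """What should be modeled: feature data plus the target variable."""
    x: list  # one numeric feature per observation
    y: list  # target values


@dataclass
class LineModel:
    """A learner instantiated on data: holds the fitted parameters."""
    slope: float
    intercept: float

    def predict(self, x):
        return [self.slope * xi + self.intercept for xi in x]


@dataclass
class LineLearner:
    """Data-independent: fixes the algorithm and its hyperparameters."""
    fit_intercept: bool = True  # hypothetical hyperparameter, set before training

    def train(self, task):
        n = len(task.x)
        # Center the data only if an intercept should be fitted.
        mx = sum(task.x) / n if self.fit_intercept else 0.0
        my = sum(task.y) / n if self.fit_intercept else 0.0
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(task.x, task.y))
        sxx = sum((xi - mx) ** 2 for xi in task.x)
        slope = sxy / sxx  # least-squares slope
        return LineModel(slope=slope, intercept=my - slope * mx)


task = Task(x=[0.0, 1.0, 2.0, 3.0], y=[1.0, 3.0, 5.0, 7.0])
learner = LineLearner(fit_intercept=True)  # no data seen yet
model = learner.train(task)                # parameters fitted to this task
print(model.slope, model.intercept)
```

The learner exists before any data is seen; only `train` produces a model whose parameters depend on the task.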

Performance measure that quantifies deviation between true and predicted values of the target variable.

Loss function

The aggregation of all losses across (training) data.

Empirical risk
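
As a small sketch of how a loss function and the empirical risk relate, the snippet below uses the squared loss and aggregates by taking the mean. Both choices are assumptions made for illustration; the cards do not fix a particular loss, and a plain sum is another common aggregation.

```python
def squared_loss(y_true, y_pred):
    """Loss for a single observation: squared deviation."""
    return (y_true - y_pred) ** 2


def empirical_risk(loss, y_true, y_pred):
    """Aggregate the per-observation losses over the data (here: the mean)."""
    losses = [loss(t, p) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
risk = empirical_risk(squared_loss, y_true, y_pred)
print(risk)  # per-observation losses are 1, 0 and 4
```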

The principle used to find the best parametrization of the learner when training a model.

Risk minimization
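
A minimal sketch of risk minimization, assuming the simplest possible model (a constant prediction `theta`) and the mean squared loss. The candidate grid is a hypothetical stand-in for a real optimizer.

```python
def empirical_risk(theta, y):
    """Mean squared loss of the constant prediction theta."""
    return sum((yi - theta) ** 2 for yi in y) / len(y)


y = [2.0, 4.0, 9.0]

# Candidate parameter values 0.0, 0.1, ..., 10.0 (a coarse grid search).
candidates = [i / 10 for i in range(0, 101)]

# Risk minimization: pick the parameter with the smallest empirical risk.
best_theta = min(candidates, key=lambda t: empirical_risk(t, y))
print(best_theta, empirical_risk(best_theta, y))
```

For the squared loss, the minimizer of a constant model is the mean of the targets, which is why the search settles on the mean of `y`.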

mlr3 learner that, for each (test) data point x = (𝑥1, … , 𝑥𝑝), i.e., each observation for which we want to make a prediction, computes the neighborhood 𝑁𝑘(𝑥1, … , 𝑥𝑝) of its 𝑘 nearest neighbors.

K Nearest Neighbor Classifier

(True or false) There is no training step for k-NN models, just storing the training data to process it during the predict step.

True
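
The two k-NN cards above can be sketched as follows. This is a plain-Python illustration, not the mlr3 learner: the class name `KNNClassifier`, the Euclidean distance and the majority vote are assumptions made for this example.

```python
import math
from collections import Counter


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k  # hyperparameter: neighborhood size

    def fit(self, X, y):
        # "Training" only stores the data; all work happens at predict time.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Rank all stored points by Euclidean distance to the query point.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
        # Majority vote among the k nearest neighbors.
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = ["a", "a", "a", "b", "b", "b"]
knn = KNNClassifier(k=3).fit(X, y)
print(knn.predict_one((2, 2)), knn.predict_one((6, 5)))
```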

If a model has learned too many details from the training data, which don’t necessarily generalize to unseen data sets, it is called an ____ model.

Overfitted

If a model is too general to capture all the relevant patterns from the data, then it is called an ____ model.

Underfitted

What kind of fit does a model have if it has low error on the training data as well as on the test data?

Good fit

One of the main disadvantages of this learner is that it can be computationally expensive for large datasets, because it must store the entire training set and compute distances to the stored observations for every prediction.

KNN Learner

Trick that lets classifiers such as k-NN and SVMs implicitly map the problem into a higher-dimensional space, classify the transformed problem, and then use the outcome of that classification for the original problem.

The kernel trick
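
A small numerical sketch of why the trick works: a kernel function returns the dot product in the higher-dimensional space without ever building that space explicitly. The degree-2 polynomial kernel and the feature map `phi` below are chosen for illustration and are not taken from the cards.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit map from 2 dimensions to 3 dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # dot product after mapping to 3-D
implicit = poly_kernel(x, z)    # same value, no mapping needed
print(explicit, implicit)
```

Any classifier that only needs dot products or distances between observations can swap in `poly_kernel` and behave as if it were working on `phi(x)`.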

The name given to data in which classes are distributed unevenly.

Imbalanced data.

Two typical countermeasures for handling imbalanced data.

Over-/undersampling and tuning class thresholds
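
Both countermeasures can be sketched in plain Python. The function names `random_oversample` and `classify`, the fixed seed and the example threshold values are assumptions made for this illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority observations until classes are even."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [xi for xi, yi in zip(X, y) if yi == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def classify(prob_positive, threshold=0.5):
    """Lowering the threshold makes the (rare) positive class easier to predict."""
    return "pos" if prob_positive >= threshold else "neg"


X = [[i] for i in range(10)]
y = ["neg"] * 8 + ["pos"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
print(classify(0.3), classify(0.3, threshold=0.2))
```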

A performance measurement for machine learning classification problems where the output can be two or more classes. For binary classification, it is a table with the 4 different combinations of predicted and actual values.

Confusion matrix
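
For the binary case, the four cells can be counted directly. The helper name `confusion_counts` and the toy labels are hypothetical and used only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four combinations of actual and predicted class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)

sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])  # true positive rate
specificity = counts["TN"] / (counts["TN"] + counts["FP"])  # true negative rate
print(counts, sensitivity, specificity)
```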

A plot of test sensitivity as the y coordinate versus 1 − specificity, i.e., the false positive rate (FPR), as the x coordinate. It is an effective method of evaluating the performance of diagnostic tests.

Receiver Operating Characteristics (ROC) Curve

In the receiver operating characteristics (ROC) curve, the ideal classifier lies at the ___ corner.

Top left
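
The points of a ROC curve can be sketched by sweeping the decision threshold over the predicted scores. The helper `roc_points` and the toy scores are assumptions for this example; each point is (FPR, sensitivity).

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)

# A classifier that ranks every positive above every negative
# passes through the top-left corner (FPR = 0, TPR = 1).
perfect = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
print((0.0, 1.0) in perfect)
```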

(True or false) In the context of regression, larger values of adjusted R² are better.

True
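
As a reminder of what is being compared, here is a short sketch using the standard definition of adjusted R², 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p features. The toy predictions are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R² for the number of features p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y_true, y_pred)
adj = adjusted_r_squared(r2, n=len(y_true), p=1)
print(r2, adj)
```

Adding features that do not improve the fit lowers the adjusted value, which is why it is preferred over plain R² when comparing models of different size.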

(True or false) Decision boundaries are defined for regression problems.

False. Decision boundaries are defined for classification problems only.

(True or false) Both supervised and unsupervised machine learning models can have hyperparameters.

True

When your data is noisy, what would you do if you wanted to apply k-NN?

Increase k
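
A toy sketch of why a larger k helps with noise: a single mislabeled training point decides the prediction for k = 1 but is outvoted for larger k. The one-dimensional data and the helper `knn_predict` are made up for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


train = [
    (1.0, "a"), (1.2, "a"), (1.6, "a"), (1.8, "a"),
    (1.4, "b"),  # noisy point: labeled "b" inside the "a" cluster
    (5.0, "b"), (5.2, "b"), (5.4, "b"),
]

query = 1.45
print(knn_predict(train, query, k=1))  # follows the noisy neighbor
print(knn_predict(train, query, k=5))  # the noise is outvoted
```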

Machine Learning Flashcards

(31 cards)