Machine Learning Flashcards

1
Q

Method for computer-aided, automated learning of models that
represent relationships between the features 𝑋1, …, 𝑋𝑝 and the
associated target variable 𝑌.

A

Machine Learning

2
Q

The three types of learning that Machine Learning combines.

A

Unsupervised learning, supervised learning and reinforcement learning.

3
Q

The two types of supervised learning tasks.

A

Classification and regression

4
Q

A qualitative / categorical supervised learning task is a ___ task.

A

Classification

5
Q

Which type of task should we use to predict the outcome of a COVID test (positive or negative), whether an email is spam (spam filter), the creditworthiness of a customer, or the result of a match (win, tie, lose)?

A

A Classification task

6
Q

A quantitative / numerical supervised learning task is a ___ task.

A

Regression

7
Q

Which type of task should we use to predict the mileage of electric cars, the traffic volume on a street segment, the expected waiting time (e.g., until an e-scooter is fully charged), climate change (incl. global warming or the emission of greenhouse gases), or the amount of energy produced by wind parks?

A

A Regression task

8
Q

The two machine learning intentions.

A

Prediction and explanation

9
Q

It defines what should be modeled within the ML pipeline; it comprises the feature data and the target variable.

A

A task

10
Q

An algorithm that trains the model based on training data; it encapsulates the actual supervised learning model.

A

A learner

11
Q

It is an instantiation of the learner fitted to the data (with model parameters that are optimal for the given data).

A

A model

12
Q

(True or false) Hyperparameters are defined within the learner.

A

True. A learner is a (data-independent) general construct defining which ML algorithm (and
hyperparameters) will be used.
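To make the learner / model split concrete, here is a minimal Python sketch. It is not mlr3's API (mlr3 is an R package); the classes `RidgeLearner` and `RidgeModel` and the hyperparameter `lam` are hypothetical. The learner only fixes the algorithm (1-D ridge regression without intercept) and its hyperparameter; calling `train` on data returns a model that holds the fitted parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RidgeModel:
    """A trained model: the learner fitted to data, holding a learned parameter."""
    theta: float  # model parameter estimated from the training data

    def predict(self, xs: List[float]) -> List[float]:
        return [self.theta * x for x in xs]


@dataclass(frozen=True)
class RidgeLearner:
    """A data-independent learner: it only fixes the algorithm and its hyperparameters."""
    lam: float = 0.0  # hyperparameter (regularization strength), set before any data is seen

    def train(self, xs: List[float], ys: List[float]) -> RidgeModel:
        # Closed-form 1-D ridge regression without intercept:
        # theta = sum(x * y) / (sum(x^2) + lam)
        sxy = sum(x * y for x, y in zip(xs, ys))
        sxx = sum(x * x for x in xs)
        return RidgeModel(theta=sxy / (sxx + self.lam))


learner = RidgeLearner(lam=1.0)  # no data involved yet
model = learner.train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(model.theta)  # 28 / 15, approximately 1.867
```

The same learner object can be trained on different data sets and yields a different model each time, while its hyperparameter stays unchanged.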

13
Q

Performance measure that quantifies deviation between true and predicted values of the target variable.

A

Loss function
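Two standard regression losses and the usual classification loss, as a small Python sketch (these are common textbook choices, not necessarily the ones used in the course; function names are illustrative):

```python
def squared_loss(y_true: float, y_pred: float) -> float:
    """Regression loss: squared deviation between true and predicted value."""
    return (y_true - y_pred) ** 2


def absolute_loss(y_true: float, y_pred: float) -> float:
    """Regression loss that penalizes large deviations less than the squared loss."""
    return abs(y_true - y_pred)


def zero_one_loss(y_true: str, y_pred: str) -> int:
    """Classification loss: 1 for a wrong class label, 0 for a correct one."""
    return int(y_true != y_pred)


print(squared_loss(3.0, 2.5))        # 0.25
print(absolute_loss(3.0, 2.5))       # 0.5
print(zero_one_loss("spam", "ham"))  # 1
```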

14
Q

The aggregation of all losses across (training) data.

A

Empirical risk
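Written as a formula, assuming 𝑛 training observations (𝐱⁽ⁱ⁾, 𝑦⁽ⁱ⁾), a prediction function 𝑓 and a loss function 𝐿 (this notation is an assumption, not taken from the card). Many texts divide by 𝑛 to get the average loss instead; this does not change which model has the smallest risk.

```latex
\mathcal{R}_{\text{emp}}(f) = \sum_{i=1}^{n} L\big(y^{(i)}, f(\mathbf{x}^{(i)})\big)
```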

15
Q

The process of finding the best parametrization of the learner when training a model.

A

Risk minimization
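A deliberately simple Python sketch of risk minimization, assuming the model 𝑓(𝑥) = θ·𝑥, the squared loss, and a grid search over candidate values of θ (real learners use closed-form solutions or gradient-based optimizers; the function names are hypothetical):

```python
from typing import List, Sequence


def empirical_risk(theta: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of squared losses of the model f(x) = theta * x over the training data."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))


def minimize_risk(xs: Sequence[float], ys: Sequence[float], candidates: List[float]) -> float:
    """Return the candidate parameter with the smallest empirical risk."""
    return min(candidates, key=lambda theta: empirical_risk(theta, xs, ys))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]
grid = [i / 100 for i in range(401)]  # candidate values 0.00, 0.01, ..., 4.00
theta_hat = minimize_risk(xs, ys, grid)
print(theta_hat)  # 1.99; the exact minimizer is sum(x*y) / sum(x^2) = 27.9 / 14, about 1.993
```

The grid only approximates the optimum; a finer grid or the closed-form solution gets closer.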

16
Q

mlr3 learner that, for each (test) data point x = (𝑥1, …, 𝑥𝑝), i.e., each observation for which we want to make a prediction, computes the neighborhood 𝑁𝑘(𝑥1, …, 𝑥𝑝) of its 𝑘 nearest
neighbors.

A

K Nearest Neighbor Classifier
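A plain-Python sketch of the idea, not the mlr3 implementation. It assumes Euclidean distance and a majority vote over the neighborhood 𝑁𝑘; the class and method names are hypothetical. Note that `fit` estimates nothing and only stores the data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class KNNClassifier:
    """Minimal k-nearest-neighbors classifier using Euclidean distance."""

    def __init__(self, k: int = 3) -> None:
        self.k = k  # hyperparameter: size of the neighborhood N_k
        self.X: List[Point] = []
        self.y: List[str] = []

    def fit(self, X: Sequence[Point], y: Sequence[str]) -> "KNNClassifier":
        # No parameters are estimated: "training" just stores the data.
        self.X, self.y = list(X), list(y)
        return self

    def neighborhood(self, x: Point) -> List[int]:
        # Indices of the k training points closest to x.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
        return order[: self.k]

    def predict_one(self, x: Point) -> str:
        # Majority vote among the labels in the neighborhood.
        votes = Counter(self.y[i] for i in self.neighborhood(x))
        return votes.most_common(1)[0][0]


X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = ["A", "A", "A", "B", "B", "B"]
knn = KNNClassifier(k=3).fit(X_train, y_train)
print(knn.predict_one((1.2, 1.4)))  # A
print(knn.predict_one((6.8, 6.2)))  # B
```

Choosing an odd `k` avoids ties in two-class problems; with more classes, ties are broken here by `Counter.most_common`, which is an arbitrary but deterministic choice.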

17
Q

(True or false) There is no training step for k-NN models; training just stores the training data, which is then processed during the predict step.

A

True. k-NN is a lazy learner: all the computation happens at prediction time.

18
Q

If a model has learned too many details from the training data, which don’t necessarily generalize to unseen data sets, it is called an ____ model.

A

Overfitted

19
Q

If a model is too general to capture all the relevant patterns from the data, then it is called an ____ model.

A

Underfitted

20
Q

What kind of fit does a model have if it has a low error on the training data as well as on the test data?

A

A good fit (the model is neither overfitted nor underfitted).

21
Q

One of the main disadvantages of this learner is that it can be computationally expensive for large datasets, as it has to store the entire training data and compute the distance from each new observation to every training observation.

A

KNN Learner

22
Q

Trick that lets classifiers such as k-NN and SVMs work in a higher-dimensional space: the problem is (implicitly) mapped into that space, classified there, and the outcome of this classification is then used for the original problem.

A

The kernel trick
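The "trick" is that the mapping never has to be computed explicitly: a kernel function returns the inner product in the higher-dimensional space directly from the original inputs. A small Python check, using the degree-2 polynomial kernel on 2-D inputs as an illustrative choice:

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def phi(x: Vec2) -> Tuple[float, float, float]:
    """Explicit feature map into 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x: Vec2, z: Vec2) -> float:
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly_kernel(x, z)                           # same value without mapping anything
print(implicit)                                        # 121.0
print(math.isclose(explicit, implicit))                # True
```

For kernels such as the Gaussian (RBF) kernel, the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.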

23
Q

The name given to data in which classes are distributed unevenly.

A

Imbalanced data.

24
Q

Two typical countermeasures for handling imbalanced data.

A
  • Over-/undersampling,
  • Tuning class thresholds
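Both countermeasures fit in a few lines of Python. This is a sketch under simple assumptions: random oversampling duplicates minority-class rows until all classes are equally frequent, and threshold tuning lowers the probability cut-off for the rare class. The function names and the `"pos"` / `"neg"` labels are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_oversample(X: List[float], y: List[str], seed: int = 0) -> Tuple[List[float], List[str]]:
    """Duplicate randomly drawn rows of smaller classes until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.choices(rows, k=target - count):  # sample with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def classify(prob_positive: float, threshold: float = 0.5) -> str:
    """Threshold tuning: a lower threshold predicts the rare class more often."""
    return "pos" if prob_positive >= threshold else "neg"


X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]
y = ["neg", "neg", "neg", "neg", "neg", "pos"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))                     # 5 'neg' and 5 'pos'
print(classify(0.3), classify(0.3, 0.2))  # neg pos
```

Undersampling works the other way round: rows of the majority class are dropped instead of minority rows being duplicated.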
25
Q

A performance measure for machine learning classification problems where the output can be two or more classes. For a binary problem, it is a table with the 4 different combinations of predicted and actual values.

A
Confusion matrix
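A minimal Python sketch that counts every (actual, predicted) pair, which works for any number of classes; for the binary case the four cells are read off as TP, FN, FP and TN (the labels `"pos"` / `"neg"` are illustrative):

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Count every (actual, predicted) pair; works for two or more classes."""
    return dict(Counter(zip(actual, predicted)))


actual = ["pos", "pos", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "neg", "pos"]
cm = confusion_matrix(actual, predicted)
tp = cm.get(("pos", "pos"), 0)  # true positives
fn = cm.get(("pos", "neg"), 0)  # false negatives
fp = cm.get(("neg", "pos"), 0)  # false positives
tn = cm.get(("neg", "neg"), 0)  # true negatives
print(tp, fn, fp, tn)  # 2 1 1 2
```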
26
Q

A plot of a test's sensitivity (true positive rate) as the y coordinate versus its false positive rate (FPR = 1 − specificity) as the x coordinate; it is an effective method for evaluating the performance of diagnostic tests.

A
Receiver Operating Characteristics (ROC) Curve
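The curve is obtained by sweeping a threshold over the classifier's scores and recording (FPR, TPR) at each step. A Python sketch, assuming label 1 marks the positive class and an observation is predicted positive when its score is at least the threshold (function name and data are illustrative):

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / neg, tp / pos))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest threshold predicts everything as positive, so the curve always ends at (1, 1); a perfect classifier would pass through (0, 1).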
27
Q

In the receiver operating characteristics (ROC) curve, the ideal classifier lies at the ___ corner.

A
Top left
28
Q

(True or false) In the context of regression, larger values of adjusted R² are better.

A
True
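Adjusted R² penalizes R² for the number of features 𝑝 relative to the number of observations 𝑛, using the standard formula 1 − (1 − R²)(𝑛 − 1)/(𝑛 − 𝑝 − 1). A small Python sketch (the function name is illustrative):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), for n observations and p features."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r2(0.80, n=50, p=5))   # about 0.777
print(adjusted_r2(0.80, n=50, p=20))  # about 0.662: same R^2, more features, larger penalty
```

Unlike plain R², which never decreases when features are added, adjusted R² can drop when an extra feature does not improve the fit enough.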
29
Q

(True or false) Decision boundaries are defined for regression problems.

A
False. Decision boundaries are defined for classification problems only.
30
Q

(True or false) Both supervised and unsupervised machine learning models can have hyperparameters.

A
True
31
Q

When your data is noisy, what would you do if you wanted to apply k-NN?

A
Increase k
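Why a larger 𝑘 helps: averaging over more neighbors outvotes individual noisy labels (although a 𝑘 that is too large can underfit). A 1-D Python sketch with one deliberately mislabeled training point; the data and function name are made up for illustration:

```python
from collections import Counter
from typing import List


def knn_predict(x: float, X: List[float], y: List[str], k: int) -> str:
    """1-D k-NN: majority label among the k training points closest to x."""
    nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Class "A" lies around 1 to 3, class "B" around 7 to 9; the point at 2.1 carries a noisy "B" label.
X = [1.0, 1.5, 2.0, 2.1, 2.5, 3.0, 7.0, 8.0, 9.0]
y = ["A", "A", "A", "B", "A", "A", "B", "B", "B"]
print(knn_predict(2.15, X, y, k=1))  # B: follows the single noisy neighbor
print(knn_predict(2.15, X, y, k=5))  # A: the noisy label is outvoted
```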