Machine Learning Flashcards

1
Q

Method for computer-aided, automated learning of models that
represent relationships between the features 𝑋1, …, 𝑋𝑝 and the
associated target variable 𝑌.

A

Machine Learning

2
Q

The three types of learning that Machine Learning combines.

A

Unsupervised learning, supervised learning and reinforcement learning.

3
Q

The two types of supervised learning tasks.

A

Classification and regression

4
Q

A qualitative / categorical supervised learning task is a ___ task.

A

Classification

5
Q

Which type of task should we use to predict the outcome of a COVID test (positive or negative), a spam filter, the credit worthiness of a customer, the result of a match (win, tie, lose)?

A

A Classification task

6
Q

A quantitative / numerical supervised learning task is a ___ task.

A

Regression

7
Q

Which type of task should we use to predict mileage of electric cars, traffic volume at a street segment, expected waiting time (e.g., until e-scooter is fully charged), climate change (incl. global warming, or emission of greenhouse gases), amount of energy that is produced by wind parks?

A

A Regression task

8
Q

The two machine learning intentions.

A

Prediction and explanation

9
Q

It defines what should be modeled within the ML pipeline; it comprises the feature data and the target variable.

A

A task

10
Q

An algorithm that trains the model based on training data; it encapsulates the actual supervised learning model.

A

A learner

11
Q

It is an instantiation of the learner to the data (with model parameters that are optimal for the given data).

A

A model

12
Q

(True or false) Hyperparameters are defined within the learner.

A

True. A learner is a (data independent) general construct defining which ML algorithm (and
hyperparameters) will be used.

13
Q

Performance measure that quantifies deviation between true and predicted values of the target variable.

A

Loss function

14
Q

The aggregation of all losses across (training) data.

A

Empirical risk
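
As a rough illustration of cards 13 and 14, the sketch below computes a per-observation loss and aggregates it into an empirical risk. The squared loss and the mean as aggregation are assumptions chosen for the example (the cards do not fix either), and the function names are hypothetical.

```python
def squared_loss(y_true, y_pred):
    """Loss for one observation: squared deviation (an assumed choice)."""
    return (y_true - y_pred) ** 2


def empirical_risk(y_true, y_pred, loss=squared_loss):
    """Aggregate the per-observation losses; here the aggregation is the mean."""
    losses = [loss(t, p) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Three observations with losses 1, 0 and 4, so the mean is 5/3.
risk = empirical_risk([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(risk)
```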

15
Q

The principle used to find the best parametrization of the learner when training a model.

A

Risk minimization

16
Q

mlr3 learner that, for each (test) data point x = (𝑥1, …, 𝑥𝑝), i.e., each observation for which we want to make a prediction, computes the neighborhood 𝑁𝑘(𝑥1, …, 𝑥𝑝) of its 𝑘 nearest neighbors.

A

K Nearest Neighbor Classifier
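
The idea can be sketched in a few lines of plain Python. This is not the mlr3 implementation; it assumes Euclidean distance and a majority vote over the neighborhood, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Sort all training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["a", "a", "a", "b", "b"]
pred = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(pred)
```

Note that nothing is fitted in advance: the training data is only stored and then scanned at prediction time.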

17
Q

(True or false) There is no training step for k-NN models, just storing the training data to process it during the predict step.

A

True

18
Q

If a model has learned too many details from the training data, which don't necessarily generalize to unseen data sets, it is called an ____ model.

A

Overfitted

19
Q

If a model is too general to capture all the relevant patterns from the data, then it is called an ____ model.

A

Underfitted

20
Q

What kind of fit does a model have if it has low error on the training data as well as on the test data?

A

A good fit

21
Q

One of the main disadvantages of this learner is that it can be computationally expensive for large datasets, because it must store the entire training set and compare each new observation against it at prediction time.

A

KNN Learner

22
Q

A trick that helps classifiers such as k-NN and SVMs: it maps the problem into a higher-dimensional space, classifies the transformed problem, and then uses the outcome of that classification for the original problem.

A

The kernel trick

23
Q

The name given to data in which classes are distributed unevenly.

A

Imbalanced data.

24
Q

Two typical countermeasures for handling imbalanced data.

A

  • Over-/undersampling
  • Tuning class thresholds
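
To make the first countermeasure concrete, here is a minimal sketch of random oversampling: minority-class observations are duplicated at random until every class is as frequent as the largest one. The helper `random_oversample` and its `seed` parameter are hypothetical names for this example, not part of any particular library.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority observations until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(indices)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0], [1], [2], [3], [4]]
y = ["neg", "neg", "neg", "neg", "pos"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```
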
25
Q

A performance measurement for machine learning classification problems where the output can be two or more classes. For binary classification, it is a table with the 4 different combinations of predicted and actual values.

A

Confusion matrix
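
For the binary case, the four cells can be counted directly. The sketch below assumes labels coded as 1 (positive) and 0 (negative); `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four combinations of predicted and actual values."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)
```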

26
Q

A plot of a test's sensitivity (true positive rate) as the y coordinate versus its false positive rate (FPR, i.e., 1 − specificity) as the x coordinate; it is an effective method of evaluating the performance of diagnostic tests.

A

Receiver Operating Characteristics (ROC) Curve

27
Q

In the receiver operating characteristics (ROC) curve, the ideal classifier lies at the ___ corner.

A

Top left
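
A small sketch of how ROC points arise: each classification threshold yields one (FPR, TPR) pair. It assumes labels coded as 1/0 and scores where higher means "more likely positive"; `roc_points` is a hypothetical helper name. A classifier that separates the classes perfectly reaches the point (0, 1), the top left corner.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


points = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
perfect = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print(points)
print(perfect)
```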

28
Q

(True or false) In the context of regression, larger values of adjusted R² are better.

A

True
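
For reference, the commonly used definition of adjusted R² is given below. The symbols are assumptions not stated on the card: n is the number of observations and p the number of features. The correction factor penalizes additional features, so the value only increases when a new feature improves the fit by more than would be expected by chance.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1}
```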

29
Q

(True or false) Decision boundaries are defined for regression problems.

A

False. Decision boundaries are defined for classification problems only.

30
Q

(True or false) Both supervised and unsupervised machine learning models can have hyperparameters.

A

True

31
Q

When your data is noisy, what would you do if you wanted to apply k-NN?

A

Increase k