mlr3 Flashcards

1
Q

A unified and hence user-friendly interface to (almost) all implementations of machine learning algorithms in R, giving access to more than 100 supervised learning techniques.

A

mlr3

2
Q

They wrap a DataBackend and store meta-information, such as the role of the individual columns in the DataBackend.

A

mlr3 Tasks

3
Q

An object to transparently interface different data storage types.

A

DataBackend

4
Q

The conceptual description of the machine learning algorithm. An “abstract hull” of the model.

A

Learner

5
Q

Which R function should we use to get a task from the mlr3 task dictionary?

A

tsk()
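For instance, a built-in task can be retrieved from the dictionary by its key — a short sketch, assuming the bundled "iris" task:

```r
library(mlr3)

# Retrieve the built-in iris task from the mlr_tasks dictionary
task <- tsk("iris")

# Inspect basic meta-information
print(task)
task$nrow  # number of observations
task$ncol  # number of columns (features + target)
```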

6
Q

What is the notation to create a new classification task in R?

A

task <- TaskClassif$new()
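As a sketch, a classification task could be created from base R's `iris` data frame (the id "iris_custom" is an arbitrary example):

```r
library(mlr3)

# Create a classification task from a data.frame;
# the target column must be specified explicitly
task <- TaskClassif$new(
  id      = "iris_custom",  # arbitrary identifier
  backend = iris,           # wrapped in a DataBackend internally
  target  = "Species"
)
print(task)
```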

7
Q

What is the notation to add a created task to the task dictionary in R?

A

mlr_tasks$add()
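A minimal sketch of registering a custom task so it can later be retrieved with tsk(); this assumes mlr_tasks$add() accepts the task object directly, and the key "my_iris" is an arbitrary example:

```r
library(mlr3)

# Create a task, then register it in the task dictionary
task <- TaskClassif$new(id = "my_iris", backend = iris, target = "Species")
mlr_tasks$add("my_iris", task)

# The task is now retrievable by its key
tsk("my_iris")
```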

8
Q

Concept of mlr3 that consists of training a supervised learning model on a task.

A

“train”

9
Q

Concept of mlr3 that consists of using trained models to make predictions (on new data).

A

“predict”
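The two stages can be sketched together — a minimal train/predict workflow, assuming the decision-tree learner "classif.rpart" is available:

```r
library(mlr3)

task    <- tsk("iris")
learner <- lrn("classif.rpart")

# "train": fit the model on the task
learner$train(task)

# "predict": apply the fitted model, here directly on the task
prediction <- learner$predict(task)
print(prediction)
```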

10
Q

Two concepts of mlr3 that assess a model’s quality.

A

Performance & resampling

11
Q

Comparison of ≥ 1 learner(s) across ≥ 1 task(s)

A

Benchmarking

12
Q

Optimizing the learner’s hyperparameters

A

Tuning

13
Q

Looking for a ‘better’ subset of features

A

Feature selection

14
Q

(True or false) When creating a task, the target variable has to be specified.

A

True

15
Q

Five most popular machine learning methods for mlr3

A

* linear and logistic regression
* k-nearest neighbor methods
* support vector machines
* gradient boosting
* random forests

16
Q

This function randomly splits a task into two disjoint sets: a training set (by default 67% of the data) and a test set (the remaining 33%, i.e. the data not part of the training set).

A

partition()
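A sketch of a holdout split with partition() (0.67 is already the default ratio, shown here for clarity):

```r
library(mlr3)

task  <- tsk("iris")
split <- partition(task, ratio = 0.67)  # list with $train and $test row IDs

learner <- lrn("classif.rpart")
learner$train(task, row_ids = split$train)                  # fit on the training set
prediction <- learner$predict(task, row_ids = split$test)   # predict on the test set
prediction$score(msr("classif.ce"))                         # classification error
```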

17
Q

This task method restricts a task to a given set of features (columns).

A

$select()

18
Q

This task method restricts a task to a given set of rows (observations).

A

$filter() with the row IDs
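Both restrictions modify the task in place — a sketch using the iris task (column names as in base R's `iris`):

```r
library(mlr3)

task <- tsk("iris")

# Keep only two feature columns
task$select(c("Sepal.Length", "Sepal.Width"))

# Keep only the first 100 observations (by row ID)
task$filter(1:100)

task$ncol  # target plus the 2 remaining features
task$nrow  # 100
```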

19
Q

What are the two roles a column of a dataset can have in mlr3?

A

Targets and features

20
Q

Which information (6) is stored in a learner?

A
* Model
* Parameters
* Packages
* Predict type
* Feature type
* Properties

21
Q

With which function can you access a learner’s hyperparameters?

A

$param_set
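A sketch of inspecting and setting hyperparameters on a decision-tree learner (the parameter names cp and minsplit come from the underlying rpart package):

```r
library(mlr3)

learner <- lrn("classif.rpart")

# Inspect the available hyperparameters
learner$param_set

# Set hyperparameters explicitly
learner$param_set$values <- list(cp = 0.01, minsplit = 20)

# Equivalently, set them at construction time
learner <- lrn("classif.rpart", cp = 0.01, minsplit = 20)
```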

22
Q

(True or false) A model allows a user to set their own hyperparameters.

A

False. A model automatically learns its own (internal) parameters such as the splits of a decision tree or a neural network’s weights.

23
Q

(True or false) The learner’s $model element is always empty before training.

A

True

24
Q

(True or false) At the predict stage, all parameters and hyperparameters of the model are fixed.

A

True

25
Q

What are the two types of prediction?

A

Directly on the task or on a new dataset

26
Q

Loss as measured when using the same data set for training and assessing a model

A

Apparent error (or resubstitution error)

27
Q

This type of resampling splits data into a training set and a test set.

A

Holdout approach

28
Q

Helps to assess model quality when data set is too small for simple holdout approach.

A

Resampling

29
Q

Split data randomly into k blocks of (roughly) equal size, then create k training-test splits (called folds); each of the k blocks acts as test data exactly once.

A

k-fold cross-validation
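A sketch of 5-fold cross-validation with mlr3's resample():

```r
library(mlr3)

task       <- tsk("iris")
learner    <- lrn("classif.rpart")
resampling <- rsmp("cv", folds = 5)  # 5 blocks, each used as test data once

rr <- resample(task, learner, resampling)
rr$aggregate(msr("classif.ce"))  # mean classification error over the 5 folds
```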

30
Q

Variant of cross-validation in which every observation is a block by itself.

A

Leave-one-out (LOO) cross-validation

31
Q

Variant of cross-validation that repeats k-fold CV multiple times (with different data-to-fold assignments for the different replications).

A

Repeated cross-validation

32
Q

Create B samples of training-test splits; each of the B splits is created by drawing n observations randomly with replacement from the original data set (of size n).

A

Bootstrap Sampling

33
Q

Combines concepts of bootstrap and holdout: creates B samples of training-test splits, each created by drawing a predefined fraction of observations randomly without replacement from the original data set.

A

Subsampling

34
Q

Function used when one wants to compare multiple learners across one (or more) task(s).

A

benchmark()
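A minimal benchmarking sketch comparing two learners on one task ("classif.featureless" is a baseline learner shipped with mlr3):

```r
library(mlr3)

# Define the full cross-product of tasks, learners, and resamplings
design <- benchmark_grid(
  tasks       = tsk("iris"),
  learners    = lrns(c("classif.rpart", "classif.featureless")),
  resamplings = rsmp("cv", folds = 3)
)

bmr <- benchmark(design)
bmr$aggregate(msr("classif.ce"))  # compare mean classification errors
```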

35
Q

True or false? Leave-one-out cross-validation is equal to n-fold cross-validation if n is the size of the data set.

A

True

36
Q

True or false? The models which are trained during resampling should be used to make predictions.

A

False. Resampling is a method to estimate the generalization error of a model. To make predictions, another model should be trained using the whole data set.

37
Q

True or false? The resampling method is a tunable model parameter.

A

False. The resampling method is not a parameter of the machine learning model.