mlr3 Flashcards
The unified interface to (almost) all implementations of machine learning algorithms in R; it provides user-friendly access to more than 100 supervised learning techniques.
mlr3
They wrap a DataBackend and store meta-information, such as the role of the individual columns in the DataBackend.
mlr3 Tasks
An object to transparently interface different data storage types.
DataBackend
The conceptual description of the machine learning algorithm. An “abstract hull” of the model.
Learner
Which R function should we use to get a task from the mlr3 task dictionary?
tsk()
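A minimal sketch, assuming the mlr3 package is installed and using its built-in "iris" task:

```r
library(mlr3)

# retrieve a predefined task from the mlr_tasks dictionary
task <- tsk("iris")
task  # prints the task's dimensions, target, and feature types
```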
What is the notation to create a new classification task in R?
task <- TaskClassif$new()
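For example, a minimal sketch using the `iris` data frame from base R (the id "iris_manual" is an arbitrary choice):

```r
library(mlr3)

# create a classification task from a data.frame;
# the target column must be named explicitly
task <- TaskClassif$new(id = "iris_manual", backend = iris, target = "Species")
```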
What is the notation to add a created task to the task dictionary in R?
mlr_tasks$add()
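A sketch continuing from a manually created task (whether the dictionary accepts a ready-made task object, rather than a constructor, may depend on the mlr3 version; the key "iris_manual" is an arbitrary example):

```r
library(mlr3)

task <- TaskClassif$new(id = "iris_manual", backend = iris, target = "Species")

# register the task under a key so it can later be retrieved with tsk()
mlr_tasks$add("iris_manual", task)
tsk("iris_manual")
```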
Concept of mlr3 that consists of training a supervised learning model on the task.
“train”
Concept of mlr3 that consists of using the trained models to make predictions (on new data).
“predict”
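The two stages can be sketched together (a minimal example assuming mlr3's built-in decision-tree learner "classif.rpart"):

```r
library(mlr3)

task <- tsk("iris")
learner <- lrn("classif.rpart")  # an "abstract hull" until trained

learner$train(task)                   # "train": fit the model on the task
prediction <- learner$predict(task)   # "predict": apply the fitted model
prediction$score(msr("classif.acc"))  # accuracy on the same data
```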
Two concepts of mlr3 that assess a model’s quality.
Performance & resampling
Comparison of ≥ 1 learner(s) across ≥ 1 task(s)
Benchmarking
Optimizing the learner’s hyperparameters
Tuning
Looking for a ‘better’ subset of features
Feature selection
(True or false) When creating a task, the target variable has to be specified.
True
Five most popular machine learning methods for mlr3
* linear and logistic regression
* k-nearest neighbor methods
* support vector machines
* gradient boosting
* random forests
This function randomly splits the task into two disjoint sets: a training set (67% of the total data, the default) and a test set (33% of the total data, the data not part of the training set).
partition()
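A sketch of the holdout split described above (ratio gives the share of training data, 0.67 being the default):

```r
library(mlr3)

task <- tsk("iris")
splits <- partition(task, ratio = 0.67)  # list with $train and $test row IDs

learner <- lrn("classif.rpart")
learner$train(task, row_ids = splits$train)
prediction <- learner$predict(task, row_ids = splits$test)
```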
This function restricts tasks to a given set of features.
$select()
This function restricts tasks to a given set of rows (observations).
$filter() with the row IDs
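Both operations can be sketched on the built-in iris task; note that they mutate the task in place (tasks are R6 objects):

```r
library(mlr3)

task <- tsk("iris")
task$select(c("Sepal.Length", "Sepal.Width"))  # keep only these two features
task$filter(1:100)                             # keep only rows with IDs 1..100
task  # now 100 observations, 2 features
```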
Which are the two roles a column of a dataset can have in mlr3?
Targets and features
Which six pieces of information are stored in a learner?
- Model
- Parameters
- Packages
- Predict Type
- Feature type
- Properties
Through which field can you access a learner's hyperparameters?
$param_set
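A sketch using the rpart learner (cp is one of rpart's complexity hyperparameters):

```r
library(mlr3)

learner <- lrn("classif.rpart")
learner$param_set                    # overview of available hyperparameters
learner$param_set$values$cp <- 0.01  # set a hyperparameter

# equivalently, set it at construction time:
learner <- lrn("classif.rpart", cp = 0.01)
```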
(True or false) A model allows a user to set their own hyperparameters.
False. Hyperparameters are set on the learner; the model automatically learns its own (internal) parameters, such as the splits of a decision tree or a neural network's weights.
(True or false) The learner’s $model element is always empty before training.
True
(True or false) At the predict stage, all parameters and hyperparameters of the model are fixed.
True
What are the two types of prediction?
Directly on the task or on a new dataset
Loss as measured when using the same data set for training and assessing a model
Apparent error (or resubstitution error)
This type of resampling splits data into a training set and a test set.
Holdout approach
Helps to assess model quality when the data set is too small for a simple holdout approach.
Resampling
Split the data randomly into k blocks of (roughly) equal size, then create k training-test splits (called folds) in which each of the k blocks acts as test data exactly once.
k-fold cross-validation
Variant of cross-validation in which every observation is a block of its own.
Leave-one-out (LOO) cross-validation
Variant of cross-validation that repeats k-fold CV multiple times (with different data-to-fold assignments in each replication).
Repeated cross-validation
Create B training-test splits, each created by drawing n observations randomly with replacement from the original data set (of size n).
Bootstrap Sampling
Combines concepts of bootstrap and holdout: creates B training-test splits, each created by drawing a predefined fraction of observations randomly without replacement from the original data set.
Subsampling
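The resampling strategies above are available as dictionary entries; a sketch of 5-fold CV (the keys "holdout", "loo", "repeated_cv", "bootstrap", and "subsampling" follow the same pattern):

```r
library(mlr3)

resampling <- rsmp("cv", folds = 5)
rr <- resample(tsk("iris"), lrn("classif.rpart"), resampling)
rr$aggregate(msr("classif.ce"))  # estimated generalization error
```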
Function used when one wants to compare multiple learners across one (or more) task(s).
benchmark()
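A sketch of a benchmark comparing two learners on two built-in tasks ("classif.featureless" serves as a baseline):

```r
library(mlr3)

design <- benchmark_grid(
  tasks = tsks(c("iris", "sonar")),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  resamplings = rsmp("cv", folds = 3)
)
bmr <- benchmark(design)
bmr$aggregate(msr("classif.ce"))  # mean misclassification error per combination
```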
True or false? Leave-one-out cross-validation is equal to n-fold cross-validation if n is the size of the data set.
True
True or false? The models which are trained during resampling should be used to make predictions.
False. Resampling is a method to estimate the generalization error of a model. To make predictions, another model should be trained using the whole data set.
True or false? The resampling method is a tunable model parameter.
False. The resampling method is not a parameter of the machine learning model.