Unit 1 Flashcards

1
Q

How do Explicit Models work?

A

Use explicit knowledge to design a model deductively

2
Q

What are the pros of Explicit Models?

A

Pros:
□ Knowledge about behavior of model and environment/problem
□ Knowledge about restrictions of model and reasons for design choices

3
Q

What are the cons of Explicit Models?

A

Cons:
□ Sometimes problem is too complex to model
□ Consequences of simplifications of problem/model hard to assess
□ Insufficient knowledge about problem/environment

4
Q

How do Inductive Models work?

A

Machine Learning: Use previously observed data to create a model inductively

5
Q

What are the pros of Inductive Models?

A

Pros:
□ Problem can be solved without (exhaustive) knowledge about problem
□ Predictions/Insights are created directly from data
□ Can handle complex problems and profit from big data

6
Q

What are the cons of Inductive Models?

A

Cons:
□ Data is required (sometimes a lot of data!)
□ Complex models (deep learning) can end up being a black box
□ Naive application might lead to biases

7
Q

☆☆

How does Supervised Machine Learning work?

A

■ Learning a function that maps an input to an output (target value).
■ Learning is based on example input values with corresponding target values (also called supervisory signals)
□ E.g. image + object type, DNA sequence + phenotype, …
■ Typical usage: predictive modeling
□ Train model on dataset with input+target values
□ Use trained model to predict target values for other (new) inputs

■ Classification: target value is class label (discrete attribute, e.g. integer, letter, word)
■ Regression: target value is numerical value (real number)
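A minimal sketch of this workflow in Python, assuming scikit-learn is available (the dataset and model choice are only illustrative, not prescribed by the course):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Example inputs with corresponding target values (supervisory signals)
X, y = load_iris(return_X_y=True)

# Step 1: train a model on the dataset with input + target values
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Step 2: use the trained model to predict target values for (new) inputs
print(model.predict(X[:3]))  # discrete class labels -> classification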

8
Q

☆☆

What is a Model?

A

parameterized function/method with specific parameter values (e.g. a trained neural network)

9
Q

☆☆

What is a Model Class?

A

the class of models in which we search for the model (e.g. neural networks, SVMs, etc)

10
Q

☆☆

What are Parameters?

A

representations of concrete models inside the given model class (e.g. network weights)

11
Q

☆☆

What are Hyperparameters?

A

parameters controlling model complexity or the training procedure (e.g. network learning rate, the number of hidden layers, etc)

12
Q

☆☆

What is Model selection/training?

A

process of finding a model from the model class

13
Q

How does the Feature Selection process work?

A

■ What data do we have?
■ Removal of redundant features
■ Removal of features the model class cannot utilize
■ (Deep Learning: Feature selection mainly done by neural network)
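As a small illustration of removing redundant features, one might drop columns that are almost perfectly correlated with an earlier column (the 0.95 threshold and the toy data are assumptions of this sketch):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))
X[:, 4] = 2.0 * X[:, 0] + 0.01 * rng.random(100)  # feature 4 duplicates feature 0

corr = np.corrcoef(X, rowvar=False)               # d x d feature correlation matrix
upper = np.triu(np.abs(corr), k=1)                # upper triangle without diagonal
redundant = np.unique(np.where(upper > 0.95)[1])  # columns correlated with an earlier one
X_reduced = np.delete(X, redundant, axis=1)
print(X.shape, "->", X_reduced.shape)             # (100, 5) -> (100, 4)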

14
Q

What is done during Preprocessing?

A

■ Contrast and brightness correction
■ Segmentation
■ Alignment
■ Normalization
■ …
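For example, normalization (standardizing each feature to zero mean and unit variance) is a one-liner in numpy; this sketch uses made-up measurements:

import numpy as np

X = np.array([[180.0, 2.1],
              [165.0, 1.8],
              [172.0, 2.5]])       # 3 samples, 2 features

# Standardize per feature (column); in practice, compute mean/std on the
# training set only and reuse those statistics for validation/test data.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # ~0 and ~1 per feature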

15
Q

How does Input Representation work?

A

■ We can represent each object by a vector of feature values (i.e. a feature vector) of length d: x = (x(1), …, x(d))^T
■ An object described by a feature vector is also referred to as a sample
■ Individual x(j) may be
□ group descriptions: categorical variables/features (e.g. x(3) = name of the boat with which the fish was caught)
□ numbers: numerical variables/features (e.g. fish length in cm)
■ Assume our dataset consists of l objects with feature vectors x1, …, xl, each of length d
■ Then we can write the feature vectors of all objects as a feature matrix X = (x1, …, xl)^T with l rows (samples) and d columns (features)
■ Assume we are given a target value yi ∈ R for each sample xi
■ Then all target values constitute the target/label vector y = (y1, …, yl)^T
■ Often we write our dataset, including input features and targets, as data matrix Z with rows zi = (xi^T, yi)

■ Note: The target of each sample can itself be a vector; then we get a target value matrix Y (multi-label classification). Don't confuse this with multi-class classification (more than 2 possible label values).
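In code, the feature matrix X, target vector y, and data matrix Z could be assembled like this (the numbers are invented for illustration):

import numpy as np

# l = 4 samples, d = 3 features: row i is the feature vector xi^T
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.3, 3.3, 6.0],
              [5.8, 2.7, 5.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])  # one target value yi per sample

# Data matrix Z: row i is zi = (xi^T, yi)
Z = np.column_stack([X, y])
print(Z.shape)                      # (4, 4), i.e. l x (d + 1)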

16
Q

☆☆

How does the Loss function work?

A

■ Assume we have a model g, parameterized by w
■ g(x;w) maps an input vector x to an output value ŷ (the prediction)
■ We want the prediction ŷ to be as close as possible to the true target value y
■ We can use a loss (cost) function L(y, g(x;w)) to measure how close our prediction is to the true target for a given sample z = (x^T, y)^T
■ The smaller the loss (cost), the better our prediction

■ Many loss functions available with different justifications
■ Not every loss function is suitable for every task
■ Choice of loss function depends on data, task, and model class
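Two common concrete examples, sketched in numpy (pairing squared error with regression and cross-entropy with classification is a typical choice, in line with the note that the loss depends on the task):

import numpy as np

def squared_error(y, y_hat):
    # Regression loss: L(y, g(x;w)) = (y - y_hat)^2
    return (y - y_hat) ** 2

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # Classification loss for y in {0, 1} and predicted probability y_hat
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

print(squared_error(2.0, 1.5))         # 0.25
print(binary_cross_entropy(1.0, 0.9))  # ~0.105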

17
Q

☆☆

What is the generalization error/risk?

A

The generalization error or risk is the expected loss on future data for a given model g(.;w):

R(g(.;w)) = E[L(y, g(x;w))], where the expectation is taken over (x,y) ~ p(x,y)

■ In practice, we hardly have any knowledge about p(x,y)
■ → We have to estimate the generalization error
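If p(x,y) were known, the expectation could be approximated by sampling; this toy sketch invents a "true" distribution purely to make the definition concrete:

import numpy as np

rng = np.random.default_rng(0)

def g(x, w=1.5):                   # a fixed model g(x;w) = w * x
    return w * x

# Invented "true" distribution: x ~ N(0, 1), y = 2x + small noise
x = rng.normal(size=100_000)
y = 2.0 * x + 0.1 * rng.normal(size=100_000)

# Monte Carlo estimate of R(g(.;w)) = E[(y - g(x;w))^2]
risk = np.mean((y - g(x)) ** 2)
print(risk)                        # ~ (2 - 1.5)^2 + 0.01 = 0.26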

18
Q

☆☆

What is Empirical Risk Minimization (ERM) ?

A

Empirical Risk Minimization (ERM) is a fundamental principle in machine learning used to minimize the error (or “risk”) of a model by optimizing its performance based on a given training dataset.

■ We do not know the true p(x,y), but we have access to a subset of l data samples (i.e. our dataset)
■ We estimate the (true) risk by the empirical risk Remp on our dataset: Remp(g(.;w)) = (1/l) Σ_{i=1..l} L(yi, g(xi;w))
■ Assume that the data points are i.i.d. (independent and identically distributed)
■ Strong law of large numbers: Remp(g(.;w)) → R(g(.;w)) for l → ∞
■ Goal: Empirical Risk Minimization (ERM), i.e. find the parameters w that minimize Remp(g(.;w))
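A minimal ERM sketch: a linear model class with squared-error loss, minimizing Remp by gradient descent (the model class, learning rate, and data are assumptions of this sketch):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # l = 100 samples, d = 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)     # noisy targets

w = np.zeros(3)                                 # parameters of g(x;w) = w^T x
for _ in range(500):
    y_hat = X @ w
    grad = -2.0 / len(X) * (X.T @ (y - y_hat))  # gradient of Remp w.r.t. w
    w -= 0.1 * grad                             # descend toward argmin_w Remp

print(w)                                        # close to w_true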

19
Q

☆☆

What is the problem of overfitting?

A

If the model is too complex, it might memorize the training data rather than generalize to unseen data.

■ With ERM we can optimize our model by minimizing the risk on our (training) dataset
■ Problem: We might fit our parameters to noise specific to our training dataset (i.e. overfitting)
■ → We need to get a better estimate of the (true) risk
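A toy illustration of overfitting using polynomial fits (degrees and noise level are arbitrary choices): the high-degree model nearly memorizes the 15 training points but does poorly on fresh data.

import numpy as np

rng = np.random.default_rng(0)
x_tr = rng.uniform(-1, 1, size=15)
y_tr = np.sin(3 * x_tr) + 0.2 * rng.normal(size=15)
x_te = rng.uniform(-1, 1, size=1000)
y_te = np.sin(3 * x_te) + 0.2 * rng.normal(size=1000)

for degree in (3, 12):
    coeffs = np.polyfit(x_tr, y_tr, degree)    # ERM on the training data only
    mse_tr = np.mean((y_tr - np.polyval(coeffs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(degree, mse_tr, mse_te)  # degree 12: tiny training error, larger test error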

20
Q

☆☆

What are the 3 subsets used in Machine Learning?

A

Training set: subset used to train a model, i.e. to optimize/fit model parameters
Validation set: subset used to find the best hyperparameters
Test set: subset used to estimate risk
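One common (illustrative) way to create the three disjoint subsets, here a 60/20/20 split; the ratios are an assumption, not prescribed by the course:

import numpy as np

rng = np.random.default_rng(0)
l = 1000
idx = rng.permutation(l)                   # shuffle the sample indices

train_idx = idx[:int(0.6 * l)]             # fit model parameters (ERM)
val_idx = idx[int(0.6 * l):int(0.8 * l)]   # choose hyperparameters
test_idx = idx[int(0.8 * l):]              # estimate the risk, touched only once

print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200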

21
Q

☆☆

What is the purpose of a training set?

A

Training set: A subset with m samples we perform ERM on (i.e. optimize parameters on) to train a model

22
Q

☆☆

What is the purpose of a test set?

A

Test set: A subset with l − m samples we use to estimate the risk. Used neither for model selection, nor for hyperparameter search, nor for training.

■ Our estimate Remp on the test set will show whether we overfit to noise in the training set

23
Q

How can we avoid overlaps between training and test sets?

A

■ Solution: Cross Validation (CV)
□ Split dataset into n disjoint folds
□ Use n−1 folds as training set, left-out fold as test set
□ Train n times, each time leaving out a different fold as test set
□ Average over the n estimated risks on the test sets to get a better estimate of the generalization capability

■ Nested Cross Validation
□ We can apply another (inner) CV procedure within each training set of the original (outer) CV → allows for evaluation of the model selection procedure
■ Getting a risk estimate on the selected model:
1. Apply cross validation on the training set (withhold the test set)
2. Use the test set to estimate the risk for the model selected via CV
■ In practice, the found model is often trained further or re-trained on the complete dataset for best performance
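A sketch of the (outer) CV loop using scikit-learn's KFold (assuming scikit-learn is available; the model class and toy data are placeholders):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

risks = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])   # train on n-1 folds
    y_hat = model.predict(X[test_idx])                # evaluate on the left-out fold
    risks.append(np.mean((y[test_idx] - y_hat) ** 2))

print(np.mean(risks))   # average over the n risk estimates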

24
Q

What are some common pitfalls in Machine Learning?

A

Underfitting: model is too simple/coarse to fit training or test data (too low model complexity)
Overfitting: model fits (too) well to training data but not well to future/test data (too high model complexity)
Unbalanced datasets: datasets biased toward a single class need to be evaluated properly (balanced accuracy, ROC AUC, loss weighting, …)
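For unbalanced datasets, plain accuracy can be misleading; balanced accuracy averages the per-class recalls instead. A small sketch with made-up binary labels:

import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # 80% of samples are class 0
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # one of two positives missed

accuracy = np.mean(y_true == y_pred)               # 0.9, looks deceptively good
recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
balanced_accuracy = np.mean(recalls)               # (1.0 + 0.5) / 2 = 0.75
print(accuracy, balanced_accuracy)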