Previous Exam Questions Flashcards

1
Q

If you are given m data points and use half for training and half for testing, the difference between training error and test error (normalized with respect to the number of data points) decreases as m increases. True or False? Why?

A

True. As m grows, the training error typically increases (the model can no longer fit every point) while the test error decreases, and both converge to the true generalization error, so the gap between them shrinks.
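A minimal sketch of this effect, assuming scikit-learn; the synthetic dataset, the logistic-regression model, and the sample sizes are illustrative choices, not part of the original question.

```python
# Sketch: watch the train/test error gap shrink as the number of data points m grows.
# Assumes scikit-learn; the synthetic dataset and logistic-regression model are
# illustrative choices, not prescribed by the exam question.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

for m in (50, 200, 1000, 5000, 20000):
    X, y = make_classification(n_samples=m, n_features=20, random_state=0)
    # Half for training, half for testing, as in the question.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    train_err = 1 - clf.score(X_tr, y_tr)
    test_err = 1 - clf.score(X_te, y_te)
    print(f"m={m:6d}  train_err={train_err:.3f}  test_err={test_err:.3f}  "
          f"gap={test_err - train_err:.3f}")
```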

2
Q

Assume we have two equal vectors X and Z in our training set (that is, all attributes of X and Z including the labels are exactly the same). Can removing Z from our training data change the decision tree we learn for this dataset? Explain briefly

A

In general, no, the decision tree will most likely not change. Impurity measures depend on the class proportions of the labels at a node rather than on the raw count of points, and a node is only considered pure when a class fraction is 0 or 1 (impurity 0), so removing an exact duplicate (same attributes and same label) will most likely not change which splits are chosen.
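A small sketch of the idea, assuming scikit-learn; the tiny toy dataset below is invented purely for illustration. It fits a tree with and without an exact duplicate of one point and prints both structures for comparison.

```python
# Sketch: fit a decision tree with and without an exact duplicate of one training
# point and compare the learned trees. Assumes scikit-learn; the toy dataset is
# invented for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1])

# Duplicate the last point (same attributes, same label), like Z duplicating X.
X_dup = np.vstack([X, X[-1]])
y_dup = np.append(y, y[-1])

tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

print("Without duplicate:\n", export_text(tree_a))
print("With duplicate:\n", export_text(tree_b))
```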

3
Q

What target values are used in different methods?

A

{0, 1}
Examples of Methods: Logistic Regression, Neural Networks
Benefits: Probabilistic interpretation, works naturally with a sigmoid output.

{-1, 1}
Examples of Methods: SVM, Perceptron
Benefits: Symmetry simplifies mathematical formulations (e.g., margins).

1-of-k (One-Hot Encoding)
Examples of Methods: Neural Networks, Cross-Entropy Loss
Benefits: Suitable for multiclass probability models, enables softmax activation.

{0, 1, …, k-1}
Examples of Methods: Decision Trees
Benefits: Simple encoding for discrete class labels.

Continuous Real Values
Examples of Methods: Regression, Linear Models
Benefits: Models directly predict continuous values.
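
A short sketch, using plain NumPy, of how one set of class labels can be converted between the encodings listed above; the label vector is invented for illustration.

```python
# Sketch: converting one set of class labels between the target encodings above.
# Plain NumPy; the label vector is invented for illustration.
import numpy as np

labels = np.array([0, 2, 1, 0, 2])         # {0, 1, ..., k-1} encoding (here k = 3)
k = labels.max() + 1

# {0, 1} and {-1, 1}: only meaningful for a binary task, e.g. "is class 2 or not".
binary01 = (labels == 2).astype(int)        # {0, 1}
binary_pm1 = 2 * binary01 - 1               # {-1, 1}

# 1-of-k (one-hot): each row has a single 1 in the column of its class.
one_hot = np.eye(k, dtype=int)[labels]

print(binary01)
print(binary_pm1)
print(one_hot)
```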
