Previous Exam Questions — Flashcards
If you are given m data points and use half for training and half for testing, the difference between training error and test error (normalized with respect to the number of data points) decreases as m increases. True or false? Why?
True. As m grows, the training error tends to increase (the model can no longer fit every point), the test error tends to decrease, and both converge to the true generalization error, so the gap between them shrinks.
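A minimal sketch of this effect, using a hypothetical toy task (binary labels drawn with probability 0.7) and a deliberately simple majority-vote "learner"; the names `gap`, `majority`, and the distribution are all illustrative assumptions, not part of the question:

```python
import random

random.seed(0)

def gap(m):
    """Average |test error - train error| when a majority-vote learner
    is trained on half of m points (averaged over trials to reduce noise)."""
    total = 0.0
    trials = 200
    for _ in range(trials):
        # Hypothetical data: binary labels, true class probability 0.7.
        labels = [random.random() < 0.7 for _ in range(m)]
        train, test = labels[:m // 2], labels[m // 2:]
        majority = sum(train) * 2 >= len(train)  # majority label on the train half
        train_err = sum(y != majority for y in train) / len(train)
        test_err = sum(y != majority for y in test) / len(test)
        total += abs(test_err - train_err)
    return total / trials

# The gap shrinks roughly like 1/sqrt(m) as m grows.
print(gap(20), gap(2000))
```

Running this shows a much smaller train/test gap at m = 2000 than at m = 20, matching the flashcard's claim.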
Assume we have two equal vectors X and Z in our training set (that is, all attributes of X and Z including the labels are exactly the same). Can removing Z from our training data change the decision tree we learn for this dataset? Explain briefly
In general, no, the decision tree will most likely not change. Since X and Z are identical in every attribute and in label, they always fall on the same side of any candidate split, so removing one copy changes only the counts at each node, not which splits are possible or their relative impurity ranking. Impurity measures depend on the class proportions at a node, and a node is pure exactly when its impurity is 0.
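A small sketch illustrating the point with Gini impurity on a made-up 1-D dataset (the helper names `gini` and `best_split` and the data values are assumptions for illustration): the best threshold is the same with and without the duplicate row.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions; 0 means pure.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    # rows: list of (x, label); candidates are thresholds x <= t.
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not left or not right:
            continue
        # Weighted impurity of the two children.
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or w < best[0]:
            best = (w, t)
    return best[1]

data = [(1, 'A'), (2, 'A'), (3, 'B'), (4, 'B')]
dup = data + [(2, 'A')]  # exact duplicate of an existing point
print(best_split(data), best_split(dup))  # same threshold in both cases
```

Both calls pick the threshold x <= 2: the duplicate changes the counts on the left side but not which split separates the classes perfectly.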
What target values are used in different methods?
{0, 1}
Examples of Methods: Logistic Regression, Neural Networks
Benefits: Probabilistic interpretation; the output of a sigmoid/logistic unit can be read directly as P(y = 1 | x).
{-1, 1}
Examples of Methods: SVM, Perceptron
Benefits: Symmetry simplifies mathematical formulations (e.g., margins).
1-of-k (One-Hot Encoding)
Examples of Methods: Neural Networks with a softmax output layer trained with cross-entropy loss
Benefits: Suitable for multiclass probability models, enables softmax activation.
{0, 1, …, k-1}
Examples of Methods: Decision Trees
Benefits: Simple encoding for discrete class labels.
Continuous Real Values
Examples of Methods: Regression, Linear Models
Benefits: Models directly predict continuous values.
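The encodings above can be sketched as two tiny conversion helpers (the names `one_hot` and `to_signed` are illustrative, not from any particular library):

```python
def one_hot(label, k):
    # 1-of-k encoding: a length-k vector with a single 1 at position `label`.
    return [1 if i == label else 0 for i in range(k)]

def to_signed(y):
    # Map a {0, 1} target to {-1, +1}, as used by SVMs and the perceptron.
    return 2 * y - 1

print(one_hot(2, 4))            # class 2 of 4 as a one-hot vector
print(to_signed(0), to_signed(1))
```

For example, `one_hot(2, 4)` gives `[0, 0, 1, 0]`, and `to_signed` maps 0 to -1 and 1 to +1, which makes margin expressions like y * w.x work symmetrically.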