test Flashcards
In cross validation, the training data and the validation data should be preprocessed with the same parameters. TRUE / FALSE
True: Using separate statistics for the training and validation sets might cause a mismatch between the expected values of features in the training set and the validation set. We should use training-set statistics for both.
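A minimal numpy sketch of this rule (toy data, hypothetical variable names): the mean and standard deviation are computed on the training split only and then applied to both splits.

```python
import numpy as np

# Toy example: standardize a validation split with TRAINING statistics.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Compute mean/std on the training set only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# ...and apply the SAME parameters to both splits.
X_train_scaled = (X_train - mu) / sigma
X_val_scaled = (X_val - mu) / sigma

# The training split is exactly standardized; the validation split is
# only approximately so, because it was scaled with train statistics.
print(np.allclose(X_train_scaled.mean(axis=0), 0.0))
```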
Bagging increases the variance of decision trees. TRUE / FALSE
False: Bagging creates less correlated predictors by training on bootstrap subsets of the data and aggregating their predictions, which reduces the variance.
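A toy numpy illustration of the variance-reduction effect (not a real decision tree): each "weak predictor" estimates the mean of y from one bootstrap resample, and the bagged predictor averages B of them. Across many repetitions, the bagged estimate varies less.

```python
import numpy as np

rng = np.random.default_rng(42)

def single_predictor(y, rng):
    # High-variance base learner: mean of one bootstrap resample.
    boot = rng.choice(y, size=len(y), replace=True)
    return boot.mean()

def bagged_predictor(y, rng, B=50):
    # Bagging: average the predictions of B bootstrap-trained learners.
    return np.mean([single_predictor(y, rng) for _ in range(B)])

singles, bagged = [], []
for _ in range(200):
    y = rng.normal(size=30)
    singles.append(single_predictor(y, rng))
    bagged.append(bagged_predictor(y, rng))

# Aggregating over bootstrap resamples shrinks the bootstrap part
# of the variance, so the bagged estimator fluctuates less.
print(np.var(bagged) < np.var(singles))
```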
Solving logistic regression for Maximum Likelihood (ML) and Maximum a Posteriori (MAP) does not lead to the same solution. TRUE / FALSE
True: MAP introduces a regularization term on the parameters θ (coming from the prior), so in general it leads to a different solution than ML.
What is the main objective of Principal Components Analysis (PCA)?
Dimensionality reduction: The main objective of PCA is to identify the subspace in which the data approximately lies, so that the data can be projected onto a subspace of lower dimension than the original data.
Describe the normalization of the input data in PCA
First, subtract the average of the samples. Second, divide by the standard deviation of the data.
Explain why the normalization step is needed
By subtracting the mean we avoid taking the mean as one of the dominant eigenvectors (basis vectors); by dividing by the standard deviation we remove relative scalings between different features and make them more comparable.
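The normalize-then-project pipeline can be sketched in a few lines of numpy (toy data; the choice of two components is arbitrary):

```python
import numpy as np

# Toy data with very different feature scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])

# Normalization: subtract the mean, divide by the standard deviation.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the (symmetric) covariance matrix.
cov = np.cov(Xn, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
W = eigvecs[:, order[:2]]                # top-2 principal directions

Z = Xn @ W                               # projected data, shape (200, 2)
print(Z.shape)
```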
Definition of cross-entropy and the loss function in decision trees
Cross-entropy: H(p, q) = −Σ_x p(x) log q(x). For a decision-tree node with class proportions p_i, the entropy impurity is H = −Σ_i p_i log p_i, and a candidate split is scored by the weighted average impurity of the resulting child nodes.
If we normalize the data before applying k-means clustering, will we get the same cluster assignments as without normalization?
No. The issue is with the scaling applied when moving to unit variance: centroid assignments are computed according to Euclidean distance, and changing the scale of a variable changes its influence on that distance.
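A tiny illustration of this effect with hand-picked points: the nearest centroid under Euclidean distance flips once the large-scale feature is divided by its scale.

```python
import numpy as np

# One point and two candidate centroids; feature 2 lives on a much
# larger scale than feature 1.
x = np.array([0.0, 100.0])
c1 = np.array([0.0, 0.0])
c2 = np.array([3.0, 100.0])

def nearest(x, c1, c2):
    return 1 if np.linalg.norm(x - c1) < np.linalg.norm(x - c2) else 2

# Raw scale: feature 2 dominates the distance, so x is closer to c2.
raw = nearest(x, c1, c2)

# After dividing feature 2 by its scale, feature 1 dominates instead.
scale = np.array([1.0, 100.0])
scaled = nearest(x / scale, c1 / scale, c2 / scale)

print(raw, scaled)  # the assignment changes
```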
List the KKT conditions
- Primal feasibility (inequality and equality constraints satisfied)
- Dual feasibility (Lagrange multipliers non-negative)
- Complementary slackness (multiplier × inequality constraint = 0)
- Stationarity (gradient of the Lagrangian = 0)
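The conditions above can be written compactly; a standard formulation for min f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0, with multipliers α_i, β_j:

```latex
\begin{aligned}
& g_i(x^*) \le 0, \quad h_j(x^*) = 0 && \text{(primal feasibility)} \\
& \alpha_i \ge 0 && \text{(dual feasibility)} \\
& \alpha_i \, g_i(x^*) = 0 && \text{(complementary slackness)} \\
& \nabla_x \mathcal{L}(x^*, \alpha, \beta) = 0 && \text{(stationarity)}
\end{aligned}
```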
What are the effects of the complementary slackness condition on the optimal SVM classifier
From the complementary slackness condition, the points with αi > 0 are the support vectors x(i). These are the only vectors that remain in the sum defining the optimal classifier, and their functional margin is exactly equal to one.
What property should the objective function have to allow us to use the kernel trick?
The objective function should be expressible purely in terms of inner products between data points; each inner product can then be replaced by a kernel evaluation.
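A small numpy sketch of why this works: the degree-2 polynomial kernel k(x, z) = (xᵀz)² equals an inner product under an explicit feature map φ(x) = (x1², x2², √2·x1x2), so any objective written via inner products can swap in k directly without ever computing φ.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2D.
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_direct = (x @ z) ** 2        # kernel evaluation: cheap, O(d)
k_explicit = phi(x) @ phi(z)   # inner product in feature space

print(np.isclose(k_direct, k_explicit))
```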
Definition/Formula of (Co-)Variance
Var(X) = E[(X − E[X])²]; Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. Variance is the covariance of a variable with itself: Var(X) = Cov(X, X).
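The covariance formula can be checked numerically against numpy's implementation (population version, ddof=0) on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])], computed by hand...
manual = np.mean((x - x.mean()) * (y - y.mean()))
# ...and via numpy's covariance matrix (off-diagonal entry).
builtin = np.cov(x, y, ddof=0)[0, 1]

print(np.isclose(manual, builtin))
```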
Logistic regression does not have a closed-form solution, so it gives a local minimum. TRUE/FALSE
FALSE: Logistic regression indeed has no closed-form solution, but its negative log-likelihood is convex, so iterative methods (e.g. gradient descent or Newton's method) converge to the global minimum.
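A minimal gradient-descent sketch on toy separable data (numpy only; learning rate and iteration count are arbitrary choices): no closed form is used, yet plain gradient descent on the convex log loss recovers an accurate classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])
y = (X @ w_true > 0).astype(float)   # labels separable by w_true

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))   # sigmoid predictions
    grad = X.T @ (p - y) / len(y)    # gradient of the average log loss
    w -= lr * grad                   # convex objective: no local minima

p = 1 / (1 + np.exp(-(X @ w)))
accuracy = np.mean((p > 0.5) == y)
print(accuracy)
```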
Describe the difference between Gaussian Discriminative Analysis (GDA) and the Gaussian Mixture Model (GMM).
GDA is a supervised classifier that models each class-conditional distribution as a Gaussian, with the class labels observed. GMM is an unsupervised model that represents the data as a mixture of Gaussians with latent component assignments; it is typically fit with EM and used for clustering or flexible density estimation.
When normalizing the data, it is important to normalize the train and test sets separately. TRUE/FALSE
TRUE: the normalization statistics must be computed from the training set alone, because the model should not have any information about the test set during training, including the scale of its features. The test set is then transformed using those training-set statistics, not statistics computed on the test set itself.