test Flashcards

1
Q

In cross validation, the training data and the validation data should be preprocessed with the same parameters. TRUE / FALSE

A

True : Using separate statistics for the training and validation sets might cause a mismatch between the expected values of features from the training set and the validation set. We should use training-set statistics for both.
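A minimal NumPy sketch of the idea (data and shapes are illustrative): the normalization statistics are computed on the training fold only and then reused on the validation fold.

```python
import numpy as np

# Illustrative data: 100 training samples, 20 validation samples, 3 features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

mu = X_train.mean(axis=0)         # mean from the TRAINING set only
sigma = X_train.std(axis=0)       # std from the TRAINING set only

X_train_n = (X_train - mu) / sigma
X_val_n = (X_val - mu) / sigma    # reuse the training-set statistics
```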

2
Q

Bagging increases the variance of decision trees. TRUE / FALSE

A

False : Bagging creates less correlated predictors by training on bootstrap subsets of the data and aggregating their predictions, which reduces the variance.
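A toy NumPy illustration of the variance-reduction effect (not real trees: each "predictor" is simply the mean of a bootstrap resample, and all numbers are illustrative). Averaging many bootstrap predictors varies less across datasets than a single one does.

```python
import numpy as np

rng = np.random.default_rng(1)

def predictor(data, rng):
    # A stand-in base learner: the mean of one bootstrap resample
    boot = rng.choice(data, size=data.size, replace=True)
    return boot.mean()

single, bagged = [], []
for _ in range(500):
    data = rng.normal(size=50)           # a fresh dataset each trial
    single.append(predictor(data, rng))  # one bootstrap predictor
    bagged.append(np.mean([predictor(data, rng) for _ in range(25)]))

# The variance of the bagged predictor across trials is smaller
print(np.var(single), np.var(bagged))
```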

3
Q

Solving logistic regression for Maximum Likelihood (ML) and Maximum a Posteriori (MAP) does not lead to the same solution. TRUE / FALSE

A

True : MAP places a prior on the parameters θ, which acts as a regularization term, so in general the two solutions differ.

4
Q

What is the main objective of Principal Components Analysis (PCA)?

A

Dimensionality reduction: the main objective of PCA is to identify the subspace in which the data approximately lies, so the data can be projected onto a subspace of lower dimension than the original data.

5
Q

Describe the normalization of the input data in PCA

A

First, subtract the mean of the samples. Second, divide by the standard deviation of each feature.
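Both steps can be sketched in NumPy (the data here is illustrative), followed by extracting the principal components via an SVD of the normalized data:

```python
import numpy as np

# Illustrative data with very different feature scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 0.1, 5.0])

# Step 1: subtract the mean; Step 2: divide by the standard deviation
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Principal components are the right singular vectors of the normalized data
U, S, Vt = np.linalg.svd(X_norm, full_matrices=False)
explained_var = S**2 / X_norm.shape[0]
```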

6
Q

Explain why the normalization step is needed

A

By subtracting the mean we avoid the mean direction dominating the eigenvectors (basis vectors); by dividing the data by the standard deviation we remove relative scalings between different features and make them more comparable.

7
Q

Definition of cross-entropy and the loss function in a decision tree

A

Cross-entropy of node m with class proportions p̂_mk:
  H(m) = − Σ_k p̂_mk log p̂_mk
Loss of the whole tree: the sample-weighted sum over the leaves,
  L = Σ_m (N_m / N) · H(m)
where N_m is the number of samples in leaf m and N the total number of samples.
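A small sketch of both formulas in NumPy (the leaf counts and class proportions are toy numbers):

```python
import numpy as np

def node_entropy(p):
    # Cross-entropy of a node from its class proportions (0*log 0 taken as 0)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Two leaves: one pure, one maximally mixed
leaves = [
    {"n": 40, "p": [1.0, 0.0]},   # pure leaf -> entropy 0
    {"n": 60, "p": [0.5, 0.5]},   # 50/50 leaf -> entropy log 2
]
N = sum(leaf["n"] for leaf in leaves)
tree_loss = sum(leaf["n"] / N * node_entropy(leaf["p"]) for leaf in leaves)
print(tree_loss)  # 0.6 * log(2)
```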

8
Q

If we normalize the data before applying k-means clustering, will we get the same cluster assignments as without normalization?

A

No. The issue is the scaling applied when moving to unit variance: centroid assignments are computed according to Euclidean distance, and changing the scale of one of the variables changes these distances, and hence possibly the assignments.

9
Q

List the KKT conditions

A
  1. Primal feasibility (inequality constraints satisfied)
  2. Dual feasibility (Lagrange multipliers non-negative)
  3. Complementary slackness (multiplier × constraint value = 0)
  4. Stationarity (gradient of the Lagrangian = 0)
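For the standard problem min f(x) subject to g_i(x) ≤ 0 with multipliers α_i, the four conditions can be written as (a LaTeX sketch):

```latex
\begin{aligned}
&\text{Primal feasibility:} && g_i(x^\ast) \le 0 \\
&\text{Dual feasibility:} && \alpha_i \ge 0 \\
&\text{Complementary slackness:} && \alpha_i \, g_i(x^\ast) = 0 \\
&\text{Stationarity:} && \nabla f(x^\ast) + \textstyle\sum_i \alpha_i \nabla g_i(x^\ast) = 0
\end{aligned}
```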
10
Q

What are the effects of the complementary slackness condition on the optimal SVM classifier

A

From the complementary slackness condition, the points with αi > 0 are the support vectors x(i). These are the only vectors left in the sum defining the optimal classifier, and their functional margin is exactly equal to one.

11
Q

What property should the objective function have to allow us to use the
kernel trick?

A

The objective function should be written in terms of inner products between data points, enabling the use of the kernel trick.
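A minimal sketch of the substitution (the RBF kernel and gamma value are illustrative): wherever the objective uses the Gram matrix of inner products ⟨x_i, x_j⟩, a kernel matrix k(x_i, x_j) can take its place.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise squared distances, then the Gaussian (RBF) kernel
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

G_linear = X @ X.T       # Gram matrix of plain inner products
G_rbf = rbf_kernel(X)    # same role, implicit feature space
```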

12
Q

Definition/Formula of (Co-)Variance

A

Var(X) = E[(X − E[X])²] = E[X²] − E[X]²
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
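The sample versions of these formulas can be checked against NumPy's built-ins (toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)

# Sample variance and covariance via the definitions (1/N normalization)
var_x = np.mean((x - x.mean())**2)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

print(var_x, cov_xy)
```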

13
Q

Logistic regression does not have a closed form
solution so it gives a local minimum. TRUE/FALSE

A

FALSE: Logistic regression has no closed-form solution, but its negative log-likelihood is convex, so iterative methods such as gradient descent or Newton's method converge to the global minimum.
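A sketch of the point (toy data; the step size and iteration count are illustrative): there is no closed form, but plain gradient descent on the convex loss steadily approaches the global minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
for _ in range(2000):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the NLL
    w -= 0.5 * grad

print(w)  # recovers the sign pattern of w_true
```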

14
Q

Describe the difference between Gaussian Discriminative Analysis (GDA)
and the Gaussian Mixture Model (GMM).

A

GDA is a supervised classifier that assumes Gaussian class-conditional distributions with observed labels, while GMM is typically used for unsupervised learning and clustering: its component assignments are latent and the mixture of Gaussians is fit (e.g., with EM), giving a more flexible representation of the data distribution.

15
Q

When normalizing the data, it is important to normalize the train and test sets separately.
TRUE/FALSE

A

TRUE: The normalization statistics must be computed on the training set only, because the model should not have any information about the test set during training, including the scale of the features; those training-set statistics are then applied to the test set.

16
Q

For any dataset, there exists an SVM classifier that can achieve zero training error.
TRUE / FALSE

A

FALSE: Achieving zero training error depends on the nature of the data, its separability in the chosen feature space, and the appropriate choice of kernel and parameters. For example, if two identical points have different labels, no classifier can separate them. Driving training error to zero also risks overfitting.