test Flashcards
In cross validation, the training data and the validation data should be preprocessed with the same parameters. TRUE / FALSE
True: Using separate statistics for the training and validation sets might cause a mismatch between the expected values of features in the training set and the validation set. We should use training-set statistics for both.
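A minimal numpy sketch of this rule (toy data, hypothetical variable names): the mean and standard deviation are computed on the training split only and then applied to both splits.

```python
import numpy as np

# Toy example: standardize a validation split with TRAINING statistics.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Compute mean/std on the training set only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# ...and apply the SAME parameters to both splits.
X_train_scaled = (X_train - mu) / sigma
X_val_scaled = (X_val - mu) / sigma

# The training split is exactly standardized; the validation split is
# only approximately so, because it was scaled with train statistics.
print(np.allclose(X_train_scaled.mean(axis=0), 0.0))
```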
Bagging increases the variance of decision trees. TRUE / FALSE
False: Bagging creates less correlated predictors by training on bootstrap subsets of the data and aggregating their predictions, which reduces the variance.
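A toy numpy illustration of the variance-reduction effect (not a real decision tree): each "weak predictor" estimates the mean of y from one bootstrap resample, and the bagged predictor averages B of them. Across many repetitions, the bagged estimate varies less.

```python
import numpy as np

rng = np.random.default_rng(42)

def single_predictor(y, rng):
    # High-variance base learner: mean of one bootstrap resample.
    boot = rng.choice(y, size=len(y), replace=True)
    return boot.mean()

def bagged_predictor(y, rng, B=50):
    # Bagging: average the predictions of B bootstrap-trained learners.
    return np.mean([single_predictor(y, rng) for _ in range(B)])

singles, bagged = [], []
for _ in range(200):
    y = rng.normal(size=30)
    singles.append(single_predictor(y, rng))
    bagged.append(bagged_predictor(y, rng))

# Aggregating over bootstrap resamples shrinks the bootstrap part
# of the variance, so the bagged estimator fluctuates less.
print(np.var(bagged) < np.var(singles))
```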
Solving logistic regression for Maximum Likelihood (ML) and Maximum a Posteriori (MAP) does not lead to the same solution. TRUE / FALSE
True: MAP introduces a regularization term on the parameters θ (coming from the prior), so in general it leads to a different solution than ML.
What is the main objective of Principal Components Analysis (PCA)?
Dimensionality reduction: The main objective of PCA is to identify the subspace in which the data approximately lies, so that the data can be projected onto a subspace of lower dimension than the original data.
Describe the normalization of the input data in PCA
First, subtract the average of the samples. Second, divide by the standard deviation of the data.
Explain why the normalization step is needed
By subtracting the mean we avoid taking the mean as one of the dominant eigenvectors (basis vectors); by dividing by the standard deviation we remove relative scalings between different features and make them more comparable.
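The normalize-then-project pipeline can be sketched in a few lines of numpy (toy data; the choice of two components is arbitrary):

```python
import numpy as np

# Toy data with very different feature scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])

# Normalization: subtract the mean, divide by the standard deviation.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the (symmetric) covariance matrix.
cov = np.cov(Xn, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
W = eigvecs[:, order[:2]]                # top-2 principal directions

Z = Xn @ W                               # projected data, shape (200, 2)
print(Z.shape)
```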
Definition of cross-entropy and the loss function in decision trees
Cross-entropy: H(p, q) = −Σ_x p(x) log q(x). For a decision-tree node with class proportions p_i, the entropy impurity is H = −Σ_i p_i log p_i, and a candidate split is scored by the weighted average impurity of the resulting child nodes.
If we normalize the data before applying k-means clustering, will we get the same cluster assignments as without normalization?
No. The issue is with the scaling applied when moving to unit variance: centroid assignments are computed according to Euclidean distance, and changing the scale of a variable changes its influence on that distance.
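A tiny illustration of this effect with hand-picked points: the nearest centroid under Euclidean distance flips once the large-scale feature is divided by its scale.

```python
import numpy as np

# One point and two candidate centroids; feature 2 lives on a much
# larger scale than feature 1.
x = np.array([0.0, 100.0])
c1 = np.array([0.0, 0.0])
c2 = np.array([3.0, 100.0])

def nearest(x, c1, c2):
    return 1 if np.linalg.norm(x - c1) < np.linalg.norm(x - c2) else 2

# Raw scale: feature 2 dominates the distance, so x is closer to c2.
raw = nearest(x, c1, c2)

# After dividing feature 2 by its scale, feature 1 dominates instead.
scale = np.array([1.0, 100.0])
scaled = nearest(x / scale, c1 / scale, c2 / scale)

print(raw, scaled)  # the assignment changes
```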
List the KKT conditions
- Primal feasibility (inequality and equality constraints satisfied)
- Dual feasibility (Lagrange multipliers non-negative)
- Complementary slackness (multiplier × inequality constraint = 0)
- Stationarity (gradient of the Lagrangian = 0)
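The conditions above can be written compactly; a standard formulation for min f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0, with multipliers α_i, β_j:

```latex
\begin{aligned}
& g_i(x^*) \le 0, \quad h_j(x^*) = 0 && \text{(primal feasibility)} \\
& \alpha_i \ge 0 && \text{(dual feasibility)} \\
& \alpha_i \, g_i(x^*) = 0 && \text{(complementary slackness)} \\
& \nabla_x \mathcal{L}(x^*, \alpha, \beta) = 0 && \text{(stationarity)}
\end{aligned}
```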
What are the effects of the complementary slackness condition on the optimal SVM classifier
From the complementary slackness condition, the points with αi > 0 are the support vectors x(i). These are the only vectors that remain in the sum defining the optimal classifier, and their functional margin is exactly equal to one.
What property should the objective function have to allow us to use the kernel trick?
The objective function should be expressible purely in terms of inner products between data points; each inner product can then be replaced by a kernel evaluation.
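A small numpy sketch of why this works: the degree-2 polynomial kernel k(x, z) = (xᵀz)² equals an inner product under an explicit feature map φ(x) = (x1², x2², √2·x1x2), so any objective written via inner products can swap in k directly without ever computing φ.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2D.
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_direct = (x @ z) ** 2        # kernel evaluation: cheap, O(d)
k_explicit = phi(x) @ phi(z)   # inner product in feature space

print(np.isclose(k_direct, k_explicit))
```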
Definition/Formula of (Co-)Variance
Var(X) = E[(X − E[X])²]; Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. Variance is the covariance of a variable with itself: Var(X) = Cov(X, X).
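The covariance formula can be checked numerically against numpy's implementation (population version, ddof=0) on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])], computed by hand...
manual = np.mean((x - x.mean()) * (y - y.mean()))
# ...and via numpy's covariance matrix (off-diagonal entry).
builtin = np.cov(x, y, ddof=0)[0, 1]

print(np.isclose(manual, builtin))
```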
Logistic regression does not have a closed-form solution, so it gives a local minimum. TRUE/FALSE
FALSE: Logistic regression indeed has no closed-form solution, but its negative log-likelihood is convex, so iterative methods (e.g. gradient descent or Newton's method) converge to the global minimum.
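A minimal gradient-descent sketch on toy separable data (numpy only; learning rate and iteration count are arbitrary choices): no closed form is used, yet plain gradient descent on the convex log loss recovers an accurate classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])
y = (X @ w_true > 0).astype(float)   # labels separable by w_true

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))   # sigmoid predictions
    grad = X.T @ (p - y) / len(y)    # gradient of the average log loss
    w -= lr * grad                   # convex objective: no local minima

p = 1 / (1 + np.exp(-(X @ w)))
accuracy = np.mean((p > 0.5) == y)
print(accuracy)
```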
Describe the difference between Gaussian Discriminative Analysis (GDA) and the Gaussian Mixture Model (GMM).
GDA is a supervised classifier that models each class-conditional distribution as a Gaussian, with the class labels observed. GMM is an unsupervised model that represents the data as a mixture of Gaussians with latent component assignments; it is typically fit with EM and used for clustering or flexible density estimation.
When normalizing the data, it is important to normalize the train and test sets separately. TRUE/FALSE
TRUE: the normalization statistics must be computed from the training set alone, because the model should not have any information about the test set during training, including the scale of its features. The test set is then transformed using those training-set statistics, not statistics computed on the test set itself.