Exam Flashcards
Feature Transformation is the process of obtaining new features (output) from existing features (input). Which of the following is not an example of that?
(a) Shift from Cartesian to polar coordinates.
(b) Scaling the data to a specific interval as in normalizing.
(c) Computing the distance between two features with a given distance metric.
(d) Dropping some of the features in the original data set.
D
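For contrast with (d), which only selects existing features, here is a minimal sketch of option (a) in Python with numpy (the data points are made up for illustration):

```python
# Minimal sketch: transforming Cartesian features (x, y) into polar
# features (r, theta) -- new features computed from existing ones.
import numpy as np

xy = np.array([[1.0, 1.0], [0.0, 2.0], [-3.0, 4.0]])  # hypothetical points
r = np.hypot(xy[:, 0], xy[:, 1])        # radius = sqrt(x^2 + y^2)
theta = np.arctan2(xy[:, 1], xy[:, 0])  # angle in radians

polar = np.column_stack([r, theta])     # the transformed feature matrix
print(polar)
```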
Which of the following statements about Principal Component Analysis (PCA) is true?
(a) PCA is a high-dimensional clustering method.
(b) PCA is an unsupervised learning method.
(c) PCA reduces dimensionality.
(d) PCA enhances prediction performance.
B
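A minimal scikit-learn sketch (assuming scikit-learn is installed) of why (b) holds: PCA is fit on X alone, without any target labels.

```python
# Minimal sketch: PCA is unsupervised -- fit receives only X, never y.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # hypothetical 5-dimensional data

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # no labels anywhere
print(X_reduced.shape)               # (100, 2)
```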
Which of the following is/are true?
(I) Using too many features might result in learning exceptions that are specific to the training data, which might not generalize to the real world.
(II) Pearson’s coefficient being 0 implies that there is no dependency between the two input variables.
(III) The death rate among intensive care patients is high. However, we cannot deduce that being in intensive care causes the deaths, because correlation does not imply causation.
(a) Only I
(b) I and II
(c) I and III
(d) I, II, and III
C
II is not right: if the variables are independent, Pearson’s correlation coefficient is 0, but the converse does not hold, because the correlation coefficient detects only linear dependencies between two variables.
https://en.wikipedia.org/wiki/Correlation_and_dependence
See Correlation and independence
And/or: the flaw may also lie in the phrase “two input variables”, since input variables are X variables, while the correlation in question is between an X variable and the Y variable.
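A minimal numpy sketch of the standard counterexample: y = x² on a symmetric interval is fully determined by x, yet Pearson’s r is 0 because the dependency is nonlinear.

```python
# Minimal sketch: perfect nonlinear dependence, zero Pearson correlation.
import numpy as np

x = np.linspace(-1, 1, 101)
y = x ** 2                       # y depends entirely on x
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))               # ~0.0 despite the dependence
```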
Which of the following are true about k-means clustering?
(I) k refers to the size of each cluster.
(II) For a given data set, the ideal number of the clusters is independent of the problem statement or relevant features.
(III) Imagine a data set that has weight (in kg), size (in m³) and value (in euros) for each parcel of a cargo carrier. It is likely that the chosen number of clusters for insurance purposes (based on value only) will differ from the number of clusters based on size (weight and volume).
(IV) If the average silhouette coefficient in a cluster is close to 1, then the points in the cluster are tightly grouped together.
(a) II, IV
(b) I, II
(c) III, IV
(d) I, II, III, IV
C
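A minimal scikit-learn sketch of statement (IV), using made-up blobs: tight, well-separated clusters yield a silhouette coefficient close to 1.

```python
# Minimal sketch: silhouette near 1 <=> tightly grouped, well separated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)),   # tight blob around (0, 0)
               rng.normal(5, 0.1, (50, 2))])  # tight blob around (5, 5)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))            # close to 1
```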
Which of the following is required by k-means clustering?
(a) a defined distance metric
(b) number of clusters
(c) initial guess as to cluster centroids
(d) all of the above.
D
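A minimal scikit-learn sketch of the three requirements. Note that scikit-learn’s KMeans hard-codes the Euclidean metric (the “defined distance metric” of the algorithm), while k and the initial centroid guess are explicit parameters.

```python
# Minimal sketch: k (n_clusters) and the initial centroid guess (init)
# are explicit; the distance metric is Euclidean by definition of k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))              # hypothetical data

init_centroids = X[:3]                    # an explicit initial guess
km = KMeans(n_clusters=3, init=init_centroids, n_init=1).fit(X)
print(km.cluster_centers_)
```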
Which of the following is true about feature vectors?
(a) Prediction performance improves with the number of features in the feature vector.
(b) Prediction performance worsens if the number of features included in the feature vector decreases.
(c) Prediction performance depends on the balance between too few and too many features.
(d) Prediction performance benefits from the curse of dimensionality.
C
What is the result of a Principal Component Analysis transformation?
(a) A reduced set of features by linearly ranking the most important features.
(b) A new set of correlated features extracted by weighting the feature distances.
(c) A new set of features extracted by linearly recombining the original features.
(d) A reduced set of features extracted by rotating the feature matrix.
C
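A minimal sketch verifying (c) with scikit-learn: each principal-component score is a linear combination of the centered original features.

```python
# Minimal sketch: PCA output = (X - mean) @ components.T,
# i.e. a linear recombination of the original features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))

pca = PCA(n_components=2).fit(X)
manual = (X - pca.mean_) @ pca.components_.T   # recombine by hand
print(np.allclose(manual, pca.transform(X)))   # True
```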
What is the Curse of Dimensionality?
(a) The vector distances between your instances decrease with the number of features.
(b) The number of required features grows almost exponentially with the amount of data.
(c) The amount of required data grows almost exponentially with the number of features.
(d) In high-dimensional feature space, data tends to be normally distributed.
C
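A related symptom that is easy to demonstrate (a minimal sketch, assuming numpy and scipy are available): as the number of features grows, pairwise distances concentrate, so a fixed amount of data covers the space ever more sparsely.

```python
# Minimal sketch: relative spread of pairwise distances shrinks with d.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
for d in (2, 10, 100, 1000):
    dists = pdist(rng.uniform(size=(200, d)))   # all pairwise distances
    print(d, round((dists.max() - dists.min()) / dists.mean(), 3))
```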
When using a wrapper strategy, what is the best motivation for including feature selection as a hyperparameter of a predictive model?
(a) It helps different models select the same set of optimal features.
(b) A certain set of optimal features can be determined independently of the model.
(c) Reducing dimensions guarantees better generalization.
(d) A certain set of optimal features can be determined dependent on the model.
D
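A minimal sketch of (d), using scikit-learn’s SequentialFeatureSelector as the wrapper: the subset it selects depends on the model it wraps.

```python
# Minimal sketch: a wrapper scores feature subsets with the model itself,
# so the chosen subset is model-dependent.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(model, n_features_to_select=3).fit(X, y)
print(sfs.get_support())   # boolean mask of features chosen for this model
```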
You are testing various predictive algorithms and find that they are very likely to overfit due to the curse of dimensionality. Which strategy below would most likely solve this problem?
(a) Dimensionality Reduction
(b) Feature selection
(c) Both (a) and (b) may solve this problem.
(d) Neither (a) nor (b) can be applied to solve this problem.
C
What is the difference between dimensionality reduction and clustering?
(a) Clustering groups objects together based on their similarity in the feature space, while dimensionality reduction rotates them in the feature space.
(b) Clustering is a type of classification, while dimensionality reduction is a particular way of carrying out regression.
(c) Clustering simplifies data by grouping data points together based on their similarity in the feature space, while dimensionality reduction simplifies the data by projecting the data points onto a lower-dimensional space.
(d) Clustering is supervised, while dimensionality reduction is unsupervised.
C
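A minimal sketch of the contrast in (c): clustering returns one group label per point, while dimensionality reduction returns new coordinates with fewer columns.

```python
# Minimal sketch: labels of shape (n,) vs. projected data of shape (n, k).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))

labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # (100,)
X_low = PCA(n_components=2).fit_transform(X)             # (100, 2)
print(labels.shape, X_low.shape)
```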
Given the data points x1 = (1, 1) and x2 = (8, 8), what is the Euclidean and the cosine distance between these points?
(a) Euclidean = 9.8995, cosine = 0
(b) Euclidean = 3.3416, cosine = 0
(c) Euclidean = 9.8995, cosine = 9
(d) Euclidean = 3.3416, cosine = 9
A
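A worked check (assuming scipy is available): x1 and x2 point in the same direction, so the cosine distance is 0, while the Euclidean distance is sqrt(7² + 7²) = sqrt(98) ≈ 9.8995.

```python
# Worked check of answer (a).
import numpy as np
from scipy.spatial.distance import cosine, euclidean

x1, x2 = np.array([1.0, 1.0]), np.array([8.0, 8.0])
print(euclidean(x1, x2))   # sqrt(98) = 9.8995...
print(cosine(x1, x2))      # 0.0 -- parallel vectors
```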
For a classification task, you apply Principal Component Analysis (PCA) to a data set with 10-dimensional feature vectors. You notice that the performance on your validation set is lower for 9 than for 10 components. Which one of the following statements applies?
(a) The classifier trained on 9 components is overfitting.
(b) The 10-th component contains information relevant for prediction.
(c) The 10 components contain considerable amounts of noise.
(d) All of the above.
B
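A minimal sketch of the experimental setup on made-up data: if the score with 10 components beats the score with 9, the 10th component evidently carries information relevant for prediction.

```python
# Minimal sketch: compare validation scores for 9 vs. 10 PCA components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for k in (9, 10):
    clf = make_pipeline(PCA(n_components=k), LogisticRegression(max_iter=1000))
    print(k, clf.fit(X_tr, y_tr).score(X_val, y_val))
```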
What is the main difference between the decision boundary generated by logistic regression and the decision boundary generated by a linear Support Vector Machine (SVM)?
(a) In contrast to the decision boundary of logistic regression, the decision boundary of the SVM can be nonlinear.
(b) In contrast to the decision boundary of logistic regression, the decision boundary of the SVM can be linear.
(c) In contrast to the decision boundary of logistic regression, the decision boundary of the SVM is high dimensional.
(d) In contrast to the decision boundary of logistic regression, the decision boundary of the SVM is optimal.
D
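A minimal sketch: both boundaries are linear, but the SVM’s is “optimal” in the specific sense of maximizing the margin to the nearest training points.

```python
# Minimal sketch: two linear boundaries, generally similar but not identical.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

lr = LogisticRegression().fit(X, y)          # linear boundary
svm = LinearSVC(max_iter=10000).fit(X, y)    # linear, margin-maximizing
print(lr.coef_, lr.intercept_)
print(svm.coef_, svm.intercept_)
```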
Consider a multiclass e-mail classification task that tries to predict calendar categories (e.g. meeting, festival, delivery) based on the content of an e-mail. We train a Naive Bayes classifier for this task. The category festival occurs infrequently compared to the other categories. Which statement is true regarding this category?
(a) If all words in an e-mail had equal probabilities between classes, the overall probability for this category would be lower.
(b) Regardless of the probabilities for the words in an e-mail, this category will generally always have low prediction probabilities compared to the others.
(c) For this category to get classified by the model, high-frequency words would need to occur in the e-mail.
(d) As the probabilities for the words under this class will all be low, this category will generally always have low prediction probabilities compared to the others.
A
The posterior for this category will always be lower due to its prior: the categories do not occur equally often.
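A minimal sketch of the prior’s effect (the counts are hypothetical): with identical word likelihoods across classes, the posterior simply follows the class priors.

```python
# Minimal sketch: rarer class -> lower prior -> lower posterior,
# all else (word likelihoods) being equal.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.ones((10, 3))             # identical word counts for every e-mail
y = np.array([0] * 9 + [1])      # class 1 ("festival") is infrequent

nb = MultinomialNB().fit(X, y)
print(nb.predict_proba(np.ones((1, 3))))   # ~[0.9, 0.1], i.e. the priors
```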
Consider the features and prediction descriptions below. Which one of these is an example of information leakage?
(a) Predicting tweet sentiment (positive or negative) and using words such as good and bad as features.
(b) Predicting survival rates for a sinking ship and using date of birth of the passenger as one of the features.
(c) Predicting the severity of an incoming hurricane and using the monetary worth of the damage it caused as one of the features.
(d) Predicting daily ticket sales for a theme park and using yesterday’s amount of visitors and ticket price as one of the features.
C
Data leakage is when information from outside the training data set is used to create the model. This additional information can allow the model to learn or know something that it otherwise would not know, and in turn invalidate the estimated performance of the model being constructed.
If you want to predict a hurricane and you use the damage the hurricane caused after it has passed… you cannot know that yet at prediction time.