Week 4 Flashcards
Dimensionality reduction
The transformation of data from a high-dimensional space into a low-dimensional space so that meaningful properties are kept.
PCA (abbreviation)
Principal component analysis
LDA (abbreviation)
Linear discriminant analysis
What happens in feature selection?
You choose k<d important features (for a dataset with d features) and ignore the rest.
When is feature selection preferred instead of feature extraction?
When features are individually powerful or meaningful.
What happens in feature extraction?
The original d dimensions x.i, i = 1, …, d, are projected onto k < d new dimensions.
When is feature extraction preferred over feature selection?
When features are individually weak and have similar variance.
What does Lagrangian relaxation do?
It relaxes the equality constraints of a constrained optimization problem by introducing a Lagrangian multiplier vector lambda.
What is the projection of x on the direction of w in PCA?
z = w.T * x
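As a quick illustration (not part of the cards), a minimal NumPy sketch of this projection with made-up numbers and an assumed unit-length w:
```python
import numpy as np

# Unit direction w and a data point x (made-up numbers, purely illustrative).
w = np.array([1.0, 1.0]) / np.sqrt(2.0)   # ||w|| = 1
x = np.array([3.0, 1.0])

z = w @ x          # projection z = w.T * x
print(z)           # 2.828..., i.e. (3 + 1) / sqrt(2)
```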
What kind of problems does LDA solve?
multi-class classification problems
What does LDA do?
It separates classes through dimensionality reduction: it maximizes the distance between the class means while minimizing the variance within each individual class.
What does a high eigenvalue mean in LDA?
That the associated eigenvector is more important for discriminating between the classes.
What matrices contain the eigenvectors calculated from the data in LDA?
1) between-class scatter matrix
2) within-class scatter matrix
What does the between-class scatter matrix contain in LDA?
How the class means are spread relative to each other (the spread between classes).
What does the within-class scatter matrix contain in LDA?
The data spread within each class.
What is the goal in LDA?
To find a w that maximizes
J(w) = |w.T (m.1 - m.2)|^2 / (w.T S.W w)
What solution does LDA give for w?
w = c * S.W^-1 * (m.1 - m.2)
with c a constant
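A minimal NumPy sketch of this two-class solution (an assumed implementation with synthetic, made-up data, not part of the cards): the class means and within-class scatter are estimated from the samples, and w is taken proportional to S.W^-1 (m.1 - m.2).
```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (made up for illustration), rows = samples.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices.
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher's solution, up to a constant: w proportional to S_W^-1 (m1 - m2).
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)   # normalise; the scale of w does not change J(w)

print(w)
```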
What is the first step in PCA?
Transform the M-dimensional data to have zero mean by subtracting the sample mean ȳ from each object y.n.
What is the second step in PCA?
(After you’ve made the data have zero mean)
Compute the sample covariance matrix C.
C= 1/N * SUM(n=1 to N) y.n*y.n.T
What is the third step in PCA?
(After you’ve made the mean of data 0 & computed the sample covariance matrix C)
Find the M eigenvector/eigenvalue pairs of the covariance matrix.
What is the 4th step of PCA?
(After you’ve made the data mean 0, computed covariance matrix C and found eigenvalue/eigenvector pairs of C)
Find the eigenvectors corresponding to the D highest eigenvalues
What is the 5th and last step of PCA?
(After you’ve made mean 0, found C, calculated eigenvector/eigenvalue pairs of C and found highest pairs)
Create the d-th dimension for object n in the projection by calculating
x.nd = w.d.T * y.n
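Putting the five steps together, a minimal NumPy sketch (an assumed implementation following these cards, with made-up data; Y holds the raw M-dimensional objects as rows and D is the target dimensionality):
```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 5))   # N = 200 objects, M = 5 dimensions (made up)
D = 2                           # target dimensionality

# Step 1: make the data zero mean.
Y = Y - Y.mean(axis=0)

# Step 2: sample covariance matrix C = (1/N) * sum_n y.n y.n.T.
N = Y.shape[0]
C = (Y.T @ Y) / N

# Step 3: eigenvector/eigenvalue pairs of C (eigh, since C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: keep the eigenvectors with the D highest eigenvalues.
order = np.argsort(eigvals)[::-1][:D]
W = eigvecs[:, order]           # columns are w.1, ..., w.D

# Step 5: project each object: x.nd = w.d.T * y.n.
X = Y @ W
print(X.shape)                  # (200, 2)
```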
How do you make the mean of the data 0 when performing PCA?
Calculate the mean ȳ and then compute y.n - ȳ for each data point.
What does the pair with the highest eigenvalue of the covariance matrix C tell us in PCA?
It corresponds to the projection with the maximal variance.
Clustering
The partitioning of data objects into a finite number of disjoint groups such that objects in the same group are similar to one another.
Suppose we have a continuous dataset and multiple different priors. When using Bayes’ theorem, why do we not have to condition on a prior in the likelihood, in the p(t|w)?
Because, once the value of w is fixed, the choice of prior does not affect the probability of the data:
the prior choice m is conditionally independent of t given w.
Hyperprior
A prior distribution placed on the hyperparameters of a prior (a distribution over the parameters of another distribution).
MAP method (abbreviation)
maximum a posteriori estimate
How do you get the eigenvalues of a matrix?
Take the determinant of K - lambda * I and set it to zero, i.e. solve the characteristic equation
|K - lambda * I| = 0 for lambda.
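A small worked example (a made-up 2x2 matrix, not from the cards), written in LaTeX:
```latex
% Characteristic equation for an illustrative 2x2 matrix K
K = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(K - \lambda I) = (2-\lambda)^2 - 1 = 0
\;\Rightarrow\; \lambda_1 = 1,\ \lambda_2 = 3.
```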