Week 4 Flashcards
Dimensionality reduction
The transformation of data from a high-dimensional space into a low-dimensional space so that meaningful properties are kept.
PCA (abbreviation)
Principal component analysis
LDA (abbreviation)
Linear discriminant analysis
What happens in feature selection?
You choose k<d important features (for a dataset with d features) and ignore the rest.
When is feature selection preferred instead of feature extraction?
When features are individually powerful or meaningful.
What happens in feature extraction?
The original d dimensions x.i, i = 1, …, d, are projected onto k < d new dimensions.
When is feature extraction preferred over feature selection?
When features are individually weak and have similar variance.
What does Lagrangian relaxation do?
It relaxes the equality constraints of a constrained optimization problem by introducing a Lagrangian multiplier vector lambda.
What is the projection of x on the direction of w in PCA?
z = w.T * x
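As a quick illustration (not part of the cards), a minimal NumPy sketch of this projection with made-up numbers and an assumed unit-length w:
```python
import numpy as np

# Unit direction w and a data point x (made-up numbers, purely illustrative).
w = np.array([1.0, 1.0]) / np.sqrt(2.0)   # ||w|| = 1
x = np.array([3.0, 1.0])

z = w @ x          # projection z = w.T * x
print(z)           # 2.828..., i.e. (3 + 1) / sqrt(2)
```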
What kind of problems does LDA solve?
multi-class classification problems
What does LDA do?
It separates classes through dimensionality reduction: it maximizes the distance between the class means while minimizing the variance within each individual class.
What does a high eigenvalue mean in LDA?
That the associated eigenvector is more important for discriminating between the classes.
What matrices contain the eigenvectors calculated from the data in LDA?
1) between-class scatter matrix
2) within-class scatter matrix
What does the between-class scatter matrix contain in LDA?
How the class means are spread relative to each other (the spread between classes).
What does the within-class scatter matrix contain in LDA?
The data spread within each class.
What is the goal in LDA?
To find a w that maximizes
J(w) = |w.T (m.1 - m.2)|^2 / (w.T S.W w)
What solution does LDA give for w?
w = c * S.W^-1 * (m.1 - m.2)
with c a constant
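A minimal NumPy sketch of this two-class solution (an assumed implementation with synthetic, made-up data, not part of the cards): the class means and within-class scatter are estimated from the samples, and w is taken proportional to S.W^-1 (m.1 - m.2).
```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (made up for illustration), rows = samples.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices.
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher's solution, up to a constant: w proportional to S_W^-1 (m1 - m2).
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)   # normalise; the scale of w does not change J(w)

print(w)
```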
What is the first step in PCA?
Transform the M-dimensional data to have zero mean by subtracting the sample mean ȳ from each object y.n.
What is the second step in PCA?
(After you’ve made the data have zero mean)
Compute the sample covariance matrix C.
C= 1/N * SUM(n=1 to N) y.n*y.n.T
What is the third step in PCA?
(After you’ve made the mean of data 0 & computed the sample covariance matrix C)
Find the M eigenvector/eigenvalue pairs of the covariance matrix.
What is the 4th step of PCA?
(After you’ve made the data mean 0, computed covariance matrix C and found eigenvalue/eigenvector pairs of C)
Find the eigenvectors corresponding to the D highest eigenvalues
What is the 5th and last step of PCA?
(After you’ve made mean 0, found C, calculated eigenvector/eigenvalue pairs of C and found highest pairs)
Create the d-th dimension for object n in the projection by calculating
x.nd = w.d.T * y.n
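Putting the five steps together, a minimal NumPy sketch (an assumed implementation following these cards, with made-up data; Y holds the raw M-dimensional objects as rows and D is the target dimensionality):
```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 5))   # N = 200 objects, M = 5 dimensions (made up)
D = 2                           # target dimensionality

# Step 1: make the data zero mean.
Y = Y - Y.mean(axis=0)

# Step 2: sample covariance matrix C = (1/N) * sum_n y.n y.n.T.
N = Y.shape[0]
C = (Y.T @ Y) / N

# Step 3: eigenvector/eigenvalue pairs of C (eigh, since C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: keep the eigenvectors with the D highest eigenvalues.
order = np.argsort(eigvals)[::-1][:D]
W = eigvecs[:, order]           # columns are w.1, ..., w.D

# Step 5: project each object: x.nd = w.d.T * y.n.
X = Y @ W
print(X.shape)                  # (200, 2)
```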
How do you make the mean of the data 0 when performing PCA?
Calculate the mean ȳ and then compute y.n - ȳ for each data point.
What does the pair with the highest eigenvalue of the covariance matrix C tell us in PCA?
It corresponds to the projection with the maximal variance.
Clustering
The partitioning of data objects into a finite number of disjoint groups such that objects in the same group are similar to one another.
Suppose we have a continuous dataset and multiple different priors. When using Bayes’ theorem, why do we not have to condition on a prior in the likelihood, in the p(t|w)?
Because, once the value of w is fixed, the choice of prior does not affect the probability of the data:
the prior choice m is conditionally independent of t given w.
Hyperprior
A prior distribution placed on the hyperparameters of a prior (a distribution over the parameters of another distribution).
MAP method (abbreviation)
maximum a posteriori estimate
How do you get the eigenvalues of a matrix?
Take the determinant of K - lambda * I and set it to zero, i.e. solve the characteristic equation
|K - lambda * I| = 0 for lambda.
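A small worked example (a made-up 2x2 matrix, not from the cards), written in LaTeX:
```latex
% Characteristic equation for an illustrative 2x2 matrix K
K = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(K - \lambda I) = (2-\lambda)^2 - 1 = 0
\;\Rightarrow\; \lambda_1 = 1,\ \lambda_2 = 3.
```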