Week 4 Flashcards

1
Q

Dimensionality reduction

A

The transformation of data from a high-dimensional space into a low-dimensional space so that meaningful properties are kept.

2
Q

PCA (abbreviation)

A

Principal component analysis

3
Q

LDA (abbreviation)

A

Linear discriminant analysis

4
Q

What happens in feature selection?

A

You choose k<d important features (for a dataset with d features) and ignore the rest.
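
A minimal NumPy sketch of this idea (the variance-based scoring is just an assumed example criterion, not the course's prescribed one):

import numpy as np

def select_top_k_features(X, k):
    # Score each of the d features individually (here: by its variance).
    scores = X.var(axis=0)
    # Keep the indices of the k best-scoring features, ignore the rest.
    keep = np.argsort(scores)[::-1][:k]
    return X[:, keep], keep

X = np.random.rand(100, 5)          # hypothetical dataset with d = 5 features
X_reduced, kept = select_top_k_features(X, k=2)
print(X_reduced.shape, kept)        # (100, 2) and the indices of the 2 kept features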

5
Q

When is feature selection preferred over feature extraction?

A

When features are individually powerful or meaningful.

6
Q

What happens in feature extraction?

A

The original dimensions x.i, i = 1, …, d, are projected onto k<d new dimensions.

7
Q

When is feature extraction preferred over feature selection?

A

When features are individually weak and have similar variance.

8
Q

What does Lagrangian relaxation do?

A

It relaxes the equality constraints of a constrained optimization problem by introducing a Lagrange multiplier vector lambda.
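
As a generic sketch (not a specific problem from the course material): to minimize f(x) subject to h(x) = 0, the relaxed problem works with the Lagrangian

L(x, lambda) = f(x) + lambda.T * h(x)

where the multiplier vector lambda weights how strongly the equality constraints are enforced.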

9
Q

What is the projection of x on the direction of w in PCA?

A

z = w.T * x
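
A minimal NumPy sketch of this projection (x and w are made-up example vectors; w is assumed to have unit length):

import numpy as np

x = np.array([2.0, 1.0, 0.5])   # a data point in d = 3 dimensions
w = np.array([1.0, 0.0, 0.0])   # a unit-length projection direction
z = w.T @ x                     # z = w.T x, the scalar projection of x onto w
print(z)                        # 2.0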

10
Q

What kind of problems does LDA solve?

A

multi-class classification problems

11
Q

What does LDA do?

A

It separates classes through dimensionality reduction. It maximizes the distance between the means of the two classes while minimizing the variance within the individual classes.

12
Q

What does a high eigenvalue mean in LDA?

A

That the associated eigenvector is more important for separating the classes.

13
Q

What matrices contain the eigenvectors calculated from the data in LDA?

A

1) between-class scatter matrix
2) within-class scatter matrix

14
Q

What does the between-class scatter matrix contain in LDA?

A

How the class means are spread relative to each other (the spread between the classes).

15
Q

What does the within-class scatter matrix contain in LDA?

A

The data spread within each class.

16
Q

What is the goal in LDA?

A

To find a w that maximizes
J(w) = |w.T (m.1 - m.2)|^2 / (w.T S.w w)

17
Q

What solution does LDA give for w?

A

w = c * S.w^-1 * (m.1 - m.2)
with c a constant
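
A minimal NumPy sketch with made-up two-class data, taking the constant c = 1:

import numpy as np

# Hypothetical two-class data, one sample per row
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter matrix S.w: sum of the per-class scatter matrices
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# LDA direction w = S.w^-1 (m.1 - m.2), up to the constant c
w = np.linalg.solve(S_w, m1 - m2)
print(w)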

18
Q

What is the first step in PCA?

A

Transform the M-dimensional data to have zero mean by subtracting the mean ȳ from each object.
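
A minimal NumPy sketch of this centering step (Y is a hypothetical N x M data matrix with one object per row):

import numpy as np

Y = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])   # N = 4, M = 2
y_bar = Y.mean(axis=0)               # the mean over all objects
Y_centered = Y - y_bar               # subtract the mean from every object
print(Y_centered.mean(axis=0))       # ~0 in every dimension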

19
Q

What is the second step in PCA?

(After you’ve made the data have zero mean)

A

Compute the sample covariance matrix C.

C = (1/N) * SUM(n=1 to N) y.n * y.n.T
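
A minimal NumPy sketch of this step (the data matrix is hypothetical and already centred to zero mean):

import numpy as np

Y = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])
Y = Y - Y.mean(axis=0)          # zero-mean data from the previous step
N = Y.shape[0]
C = (Y.T @ Y) / N               # same as (1/N) * sum over n of y.n * y.n.T
print(C)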

20
Q

What is the third step in PCA?

(After you’ve made the mean of data 0 & computed the sample covariance matrix C)

A

Find the M eigenvector/eigenvalue pairs of the covariance matrix.
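
A minimal NumPy sketch (C is a hypothetical 2 x 2 covariance matrix; np.linalg.eigh is used because C is symmetric):

import numpy as np

C = np.array([[0.62, 0.61],
              [0.61, 0.72]])                    # hypothetical covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)   # M eigenvalue/eigenvector pairs
print(eigenvalues)         # eigenvalues in ascending order
print(eigenvectors)        # column eigenvectors[:, i] pairs with eigenvalues[i]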

21
Q

What is the 4th step of PCA?

(After you’ve made the data mean 0, computed covariance matrix C and found eigenvalue/eigenvector pairs of C)

A

Find the eigenvectors corresponding to the D highest eigenvalues.
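
A minimal NumPy sketch of picking the top-D pairs (the eigenvalues and eigenvectors below are made-up placeholders):

import numpy as np

eigenvalues = np.array([0.05, 1.28])            # hypothetical eigenvalues of C
eigenvectors = np.array([[-0.74, 0.68],
                         [0.68, 0.74]])         # column i pairs with eigenvalues[i]
D = 1
order = np.argsort(eigenvalues)[::-1]           # indices sorted by decreasing eigenvalue
W = eigenvectors[:, order[:D]]                  # the D eigenvectors with the highest eigenvalues
print(W)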

22
Q

What is the 5th and last step of PCA?

(After you’ve made mean 0, found C, calculated eigenvector/eigenvalue pairs of C and found highest pairs)

A

Create the d-th dimension for object n in the projection by calculating

x.nd = w.d.T * y.n
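
A compact NumPy sketch tying the five steps together on a made-up N x M data matrix:

import numpy as np

Y = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Y = Y - Y.mean(axis=0)                  # step 1: zero mean
C = (Y.T @ Y) / Y.shape[0]              # step 2: sample covariance matrix
values, vectors = np.linalg.eigh(C)     # step 3: eigenvalue/eigenvector pairs
order = np.argsort(values)[::-1]        # step 4: sort by decreasing eigenvalue
D = 1
W = vectors[:, order[:D]]               # the D leading eigenvectors w.d
X = Y @ W                               # step 5: x.nd = w.d.T * y.n for every n and d
print(X)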

23
Q

How do you make the mean of the data 0 when performing PCA?

A

Calculate the mean ȳ and then compute y.n - ȳ for each data point.

24
Q

What does the pair with the highest eigenvalue of the covariance matrix C tell us in PCA?

A

It corresponds to the projection with the maximal variance.

25
Q

Clustering

A

The partitioning of data objects into a finite number of disjoint groups such that objects in the same group are similar to one another.

26
Q

Suppose we have a continuous dataset and multiple different priors. When using Bayes’ theorem, why do we not have to condition on a prior in the likelihood, in the p(t|w)?

A

Because the choice of prior doesn't matter for the probability of the data once the value of w is fixed:
t is conditionally independent of the prior choice m given w.

27
Q

Hyperprior

A

A prior distribution placed on the hyperparameters of another prior.

28
Q

MAP method (abbreviation)

A

maximum a posteriori estimate

29
Q

How do you get the eigenvalues of a matrix?

A

Take the determinant of K - lambda * I and solve |K - lambda * I| = 0 for lambda.
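
A small 2 x 2 example (K is a made-up matrix) working the determinant out by hand and checking it with NumPy:

import numpy as np

K = np.array([[4.0, 1.0],
              [2.0, 3.0]])
# |K - lambda*I| = (4 - lambda)(3 - lambda) - 1*2
#                = lambda^2 - 7*lambda + 10 = (lambda - 5)(lambda - 2)
# so the eigenvalues are lambda = 5 and lambda = 2.
print(np.linalg.eigvals(K))    # [5. 2.] (order may vary)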
