Course Session 4 (Dimensionality Reduction) Flashcards

1
Q

What is the curse of dimensionality?

A

The volume of the input space grows exponentially with the dimensionality, so exponentially more data is needed to cover it.

Classification accuracy depends on the dimensionality (and the amount of training data), and the computational complexity also grows with the dimensionality.
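A minimal Monte Carlo sketch of the effect: the fraction of a unit hypercube that lies inside its inscribed ball collapses towards zero as the dimensionality grows, so uniformly sampled data quickly become sparse.

```python
import numpy as np

# Estimate the fraction of the unit hypercube [0, 1]^d that lies inside
# the inscribed ball of radius 0.5; it vanishes as d grows.
rng = np.random.default_rng(0)
n = 100_000
for d in (1, 2, 5, 10, 20):
    x = rng.random((n, d))                         # uniform points in the cube
    inside = np.linalg.norm(x - 0.5, axis=1) < 0.5
    print(f"d={d:2d}  fraction inside ball: {inside.mean():.4f}")
```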

2
Q

Why do we want to reduce the dimensionality?

A

Reduce time complexity: less computation.
Reduce space complexity: fewer parameters.
Simpler models are more robust on small datasets.
More interpretable; simpler explanation.
Data visualization (structure, groups, outliers, etc.) when plotted in 2 or 3 dimensions.

3
Q

What are the two general methods of doing dimensionality reduction?

A

Feature selection, which tries to find a subset of the original variables. This is done by choosing M < D important features and ignoring the remaining D − M features.

Feature extraction, which applies a mapping from the multidimensional space to a space of fewer dimensions. This is done by projecting the original dimensions x_i, i = 1, …, D onto new dimensions z_j, j = 1, …, M, with M < D.
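A toy numpy sketch contrasting the two approaches; the chosen column indices and the projection matrix W are hypothetical placeholders (in practice W would come from, e.g., PCA or LDA).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # 100 samples with D = 5 original features

# Feature selection: keep M = 2 of the original columns unchanged.
selected = X[:, [0, 3]]           # hypothetical "important" features

# Feature extraction: map all D features to M = 2 new dimensions z = W^T x.
W = rng.normal(size=(5, 2))       # placeholder projection; PCA/LDA would learn W
extracted = X @ W

print(selected.shape, extracted.shape)   # (100, 2) (100, 2)
```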

4
Q

Name the methods of feature selection

A

Forward search

Hill-climbing algorithm

Backward search

Floating search

5
Q

How does forward search work?

A

The method adds the best feature at each step.
Let the set of selected features F initially be empty, F = ∅. At each iteration, find the best new feature
j = argmin_i E(F ∪ {x_i}),
where E(F) denotes the error when only the features in F are used.
Add x_j to F if E(F ∪ {x_j}) < E(F).
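A minimal sketch of the procedure, assuming a black-box function E(F) that returns the validation error of a model trained on the feature subset F:

```python
def forward_search(E, n_features):
    """Greedy forward selection. E(F) is an assumed black-box returning
    the validation error when only the features in the set F are used."""
    F = set()
    while True:
        candidates = [i for i in range(n_features) if i not in F]
        if not candidates:
            return F
        j = min(candidates, key=lambda i: E(F | {i}))  # argmin_i E(F ∪ {x_i})
        if E(F | {j}) < E(F):     # add x_j only if it lowers the error
            F = F | {j}
        else:
            return F
```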

6
Q

How does the hill-climbing algorithm work?

A

Start with an arbitrary or random solution. Identify neighboring states of the current solution by making small adjustments. If one of the neighboring states offers a better solution (according to some evaluation function), move to this new state. Repeat until no neighboring state is better than the current one; at that point, a local maximum has been reached.
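A generic sketch, assuming problem-specific callbacks neighbors(s) (the small adjustments) and score(s) (the evaluation function):

```python
def hill_climb(initial, neighbors, score):
    """Move to a better neighboring state until none exists."""
    current = initial
    while True:
        better = [s for s in neighbors(current) if score(s) > score(current)]
        if not better:
            return current        # no better neighbor: local maximum reached
        current = max(better, key=score)
```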

7
Q

How does backward search work?

A

It starts from the full feature set and iteratively removes, one at a time, the feature that is least predictive of the target variable, i.e., the one whose removal hurts the error the least.
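A sketch mirroring the forward-search code above, again assuming a black-box validation-error function E(F):

```python
def backward_search(E, n_features):
    """Backward elimination: start from all features and repeatedly drop
    the feature whose removal hurts the validation error the least."""
    F = set(range(n_features))
    while len(F) > 1:
        j = min(F, key=lambda i: E(F - {i}))  # cheapest feature to remove
        if E(F - {j}) <= E(F):    # remove only while the error does not rise
            F = F - {j}
        else:
            return F
    return F
```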

8
Q

How does floating search work?

A

It adds k features and then removes l features. Both k and l are varied, thus searching for the best feature subset at each dimension.

9
Q

What are the two main feature extraction methods?

A

Principal component analysis (PCA)
Linear discriminant analysis (LDA)

10
Q

What is principal component analysis?

A

It is an unsupervised linear method for dimensionality reduction.

It can be defined in two equivalent ways:
* The linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections.
* The direction in data space with the highest variance; the eigenvector corresponding to the largest eigenvalue of the covariance matrix is the first principal component.

The principal components must be mutually orthogonal.

11
Q

What does the eigenvalue in principal component analysis mean?

A

The magnitude of an eigenvalue indicates how much variance its principal component explains.

12
Q

How is principal component analysis performed in practice?

A

The covariance matrix of the (mean-centered) data is constructed, the eigenvectors of this matrix are computed, and the data are projected onto the leading eigenvectors.
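A minimal numpy sketch of these steps (mean-centering included, since the covariance is computed about the mean):

```python
import numpy as np

def pca(X, M):
    """Project the rows of X onto the top-M principal components."""
    Xc = X - X.mean(axis=0)               # center the data
    S = np.cov(Xc, rowvar=False)          # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
    W = eigvecs[:, order[:M]]             # top-M eigenvectors as columns
    return Xc @ W, eigvals[order]         # projected data, sorted eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
Z, eigvals = pca(X, M=2)
print(Z.shape)                            # (200, 2)
```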

13
Q

How is the proportion of variance used to choose M < D in principal component analysis?

A

The proportion of variance (PoV) is defined as the fraction of the sum of the M largest eigenvalues over the sum of all D eigenvalues:
PoV = (λ_1 + … + λ_M) / (λ_1 + … + λ_D).
M is typically chosen as the smallest value for which the PoV exceeds roughly 0.9.
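Continuing the pca() sketch above, M can be picked from the sorted eigenvalues; the 0.9 threshold is the usual rule of thumb, and the spectrum below is hypothetical.

```python
import numpy as np

eigvals = np.array([4.2, 2.1, 0.9, 0.5, 0.3])  # hypothetical sorted spectrum
pov = np.cumsum(eigvals) / eigvals.sum()       # PoV for M = 1, ..., D
M = int(np.argmax(pov > 0.9)) + 1              # smallest M with PoV > 0.9
print(pov.round(3), M)                         # [0.525 0.788 0.9 0.962 1.] -> M = 4
```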

14
Q

Define the Fisher criterion.

A

It is the projection of the data that maximizes the ratio of between-class variance to within-class variance.
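In the standard two-class notation (m_1, m_2 the class means, S_B and S_W the between- and within-class scatter matrices), the criterion maximized by the projection w is:

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top}\mathbf{S}_B\,\mathbf{w}}{\mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w}},
\qquad
\mathbf{S}_B = (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\top},
\qquad
\mathbf{S}_W = \sum_{k=1}^{2}\sum_{n\in\mathcal{C}_k}(\mathbf{x}_n-\mathbf{m}_k)(\mathbf{x}_n-\mathbf{m}_k)^{\top}
```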

15
Q

Define linear discriminant analysis.

A

Find a low-dimensional space such that, when x is projected onto it, the classes are well separated.

For two classes, this reduces to maximizing the Fisher criterion.
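A two-class sketch, assuming labels y ∈ {0, 1} and an invertible within-class scatter matrix; the optimal direction is w ∝ S_W⁻¹(m_1 − m_0).

```python
import numpy as np

def fisher_lda(X, y):
    """Two-class Fisher LDA direction w, maximizing the Fisher criterion."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Within-class scatter: unnormalized covariance summed over both classes
    Sw = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
             for c in (0, 1))
    w = np.linalg.solve(Sw, m1 - m0)      # w ∝ S_W^{-1} (m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(fisher_lda(X, y))                   # direction separating the classes
```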

16
Q

Is principal component analysis an unsupervised method?

A

Yes. It uses only the input data; class labels are not needed.

17
Q

Is linear discriminant analysis a supervised method?

A

Yes. It uses the class labels to find a projection that separates the classes.

18
Q

What is the reconstruction error of principal component analysis?

A

The variance not captured by the retained components: the sum of the discarded eigenvalues, λ_{M+1} + … + λ_D. Equivalently, it is the mean squared distance between the data points and their projections.
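A small numerical check of this identity, reusing the covariance and eigendecomposition steps from the PCA card; the two printed values agree up to the 1/(N−1) versus 1/N normalization of the covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # correlated data
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]                  # keep M = 2 components
X_hat = Xc @ W @ W.T                       # project, then reconstruct
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
print(mse, eigvals[order][2:].sum())       # approximately equal
```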