Dimensionality Reduction (Brainscape) Flashcards
What are the main motivations for reducing a dataset’s dimensionality (3)? What are the main drawbacks (4)?
1) To speed up subsequent training algorithms
2) To visualize the data and gain insight into the most important features
3) To save space (data compression)
The main drawbacks are:
1) Some information is lost
2) Can be computationally intensive
3) It adds some complexity to your machine learning pipelines.
4) Transformed features are often hard to interpret.
What is the curse of dimensionality?
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
For example, randomly sampled high-dimensional vectors are generally very sparse, which increases the risk of overfitting.
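A quick way to see this sparsity is to compare average distances between random points as the dimensionality grows. The NumPy sketch below (the sample sizes and dimensions are arbitrary illustrative choices) shows the mean distance between random pairs increasing with the number of dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Average distance between random points grows with the number of dimensions,
# so a fixed-size training set becomes increasingly sparse.
for d in (2, 100, 10_000):
    points = rng.random((1_000, d))              # 1,000 points in the unit hypercube
    a, b = points[:500], points[500:]
    mean_dist = np.linalg.norm(a - b, axis=1).mean()
    print(f"dim={d:>6}  mean distance between random pairs = {mean_dist:.2f}")
```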
Once a dataset’s dimensionality has been reduced, is it possible to reverse the operation? If so, how? If not, why not?
It is almost always impossible to perfectly reverse the operation because some information gets lost during dimensionality reduction. But it is possible to estimate with good accuracy what the original dataset looked like.
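As a sketch of such an estimate, scikit-learn's PCA exposes inverse_transform, which maps the reduced data back to the original space; the digits dataset and the 95% variance threshold below are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                            # 64-dimensional digit images

pca = PCA(n_components=0.95)                      # keep ~95% of the variance
X_reduced = pca.fit_transform(X)
X_recovered = pca.inverse_transform(X_reduced)    # approximate reconstruction

# The mean squared reconstruction error measures the information that was lost.
print(X_reduced.shape, np.mean((X - X_recovered) ** 2))
```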
Does it make any sense to chain two different dimensionality reduction algorithms?
It can absolutely make sense. A common example is using PCA to quickly get rid of a large number of useless dimensions, then applying another, much slower dimensionality reduction algorithm, such as LLE.
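A minimal sketch of such a chain, assuming scikit-learn's PCA and LocallyLinearEmbedding (the component counts and neighbor count are arbitrary):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import Pipeline

X = load_digits().data

# PCA quickly drops most of the near-useless dimensions; LLE then performs the
# slower non-linear reduction on the smaller intermediate representation.
chain = Pipeline([
    ("pca", PCA(n_components=30)),
    ("lle", LocallyLinearEmbedding(n_components=2, n_neighbors=10)),
])
X_2d = chain.fit_transform(X)
print(X_2d.shape)                                 # (1797, 2)
```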
What is the main problem of dimensionality reduction?
You lose some information.
What is the main idea of manifold learning?
Manifold learning is an approach to non-linear dimensionality reduction. Its algorithms are based on the idea that the dimensionality of many datasets is only artificially high.
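For instance, the classic Swiss roll is a 2-D surface rolled up in 3-D space; a manifold learning algorithm such as LLE can unroll it. A sketch using scikit-learn (the sample size and n_neighbors value are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# The Swiss roll is a 2-D surface rolled up in 3-D space, so its 3 dimensions
# are "artificially" high; LLE unrolls it back down to 2 dimensions.
X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=0)
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12, random_state=0)
X_unrolled = lle.fit_transform(X)
print(X.shape, "->", X_unrolled.shape)            # (1000, 3) -> (1000, 2)
```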
Applications of dimensionality reduction
Customer relationship management
Text mining
Image retrieval
Microarray data analysis
Protein classification
Face recognition
Handwritten digit recognition
Intrusion detection
Feature Selection
A process that chooses an optimal subset of features according to an objective function
Objectives: reduce dimensionality and remove noise; improve the speed of learning, predictive accuracy, and simplicity
Think stepwise / forward / backward regressions
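A sketch of forward selection using scikit-learn's SequentialFeatureSelector (the estimator, dataset, and number of features to keep are illustrative choices, not prescriptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: greedily add the feature that most improves
# cross-validated accuracy; direction="backward" removes features instead.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
)
selector.fit(X, y)
print(selector.get_support())                     # boolean mask of chosen features
```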
Feature Extraction
The mapping of the original high-dimensional data to a lower-dimensional space
Goals can change based on end usage:
Unsupervised learning - minimize information loss (PCA)
Supervised learning - maximize class discrimination (LDA)
Think PCA
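A short sketch contrasting the two on the same data, assuming scikit-learn's PCA and LinearDiscriminantAnalysis (the iris dataset and two components are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: PCA keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: LDA keeps the directions that best separate the classes,
# so it needs the labels y.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)                   # (150, 2) (150, 2)
```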
Pros of feature reduction
All original features are used, although not necessarily in their original form: they are combined linearly.
In feature selection, only a subset of the original features is selected.
Feature selection methods
Remove features with missing values
Remove features with low variance
Remove highly correlated features
Univariate feature selection
Feature selection using SelectFromModel
Filter methods
Wrapper methods
Embedded methods
Hybrid methods
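A sketch combining three of these methods with scikit-learn and pandas (the dataset, the 0.95 correlation cutoff, and the random forest settings are illustrative, not prescriptive):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold, SelectFromModel

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# 1) Remove (near-)constant features.
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)
print("after variance filter:", X_var.shape)

# 2) Drop one feature from each highly correlated pair.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_decorr = X.drop(columns=to_drop)

# 3) Keep the features a fitted model considers important (SelectFromModel).
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sfm.fit(X_decorr, y)
print("kept by SelectFromModel:", list(X_decorr.columns[sfm.get_support()]))
```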
Filter Methods for Feature Selection
Filter based on:
Information gain
Chi-squared test
Fisher's score
Correlation coefficient
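As one example, a chi-squared filter can be written with scikit-learn's SelectKBest; the dataset and k=10 below are illustrative (chi2 requires non-negative feature values):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)

# Chi-squared filter: score each (non-negative) feature against the target
# independently of any model, then keep the k best-scoring features.
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
print(X_new.shape)                                # (569, 10)
```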
Information gain
Calculates the reduction in entropy (uncertainty) that results from transforming the dataset, e.g. splitting it on a given feature.
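A related, model-free filter score is mutual information, often used as an information-gain-style criterion; a sketch with scikit-learn's mutual_info_classif (the dataset choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer()
X, y = data.data, data.target

# Mutual information estimates how much knowing a feature reduces the
# entropy (uncertainty) of the target, independently of any model.
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(data.feature_names, scores), key=lambda p: -p[1])[:5]:
    print(f"{name:<25} {score:.3f}")
```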
Fisher Score
Fisher's score is one of the most widely used supervised feature selection methods.
The algorithm returns the ranks of the variables based on their Fisher's score (the ratio of between-class variance to within-class variance per feature).
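A minimal NumPy sketch of that ratio; the iris dataset and the helper name fisher_score are my own illustrative choices, not a standard library function:

```python
import numpy as np
from sklearn.datasets import load_iris

def fisher_score(X, y):
    """Per-feature Fisher score: between-class variance / within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / within

X, y = load_iris(return_X_y=True)
scores = fisher_score(X, y)
print(np.argsort(scores)[::-1])                   # feature indices, best to worst
```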