Chapter 6 - Unsupervised Learning Techniques Flashcards

1
Q

Reasons why unsupervised learning techniques are more challenging than supervised learning (2; 6)

A

1) Less clearly defined objectives - no simple goal such as predicting a response
2) Less objective evaluation - no target variable against which to assess model quality

2
Q

Uses of unsupervised learning techniques (2; 6)

A

1) Exploratory data analysis - these techniques lend themselves to high-dimensional datasets, where pairwise (bivariate) data exploration would be futile
2) Feature generation - new features can be generated as a byproduct of exploratory data analysis

3
Q

Principal components analysis (6.1.1)

A

Advanced data analytic technique that transforms a high-dimensional dataset into a smaller, much more manageable set of representative variables, known as principal components (PCs).

PCs are linear combinations of existing variables.

Particularly useful when dealing with highly correlated data.
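
A minimal sketch of the idea (assuming Python with numpy and scikit-learn; the data below are made up purely for illustration):

# PCA on a toy dataset with two highly correlated features
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)   # strongly correlated with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])                 # n = 200 observations, p = 3 features

pca = PCA(n_components=2)             # keep m = 2 principal components
scores = pca.fit_transform(X)         # n x 2 matrix of PC scores
print(pca.components_)                # 2 x 3 matrix of loadings (one row per PC)
print(pca.explained_variance_ratio_)  # most of the variance sits in the first PC

Because x1 and x2 move together, the first PC captures most of the information in the three original variables.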

4
Q

PCA - Technical details (6.1.1)

A

Consider an n × p matrix X, whose n rows are observations and whose p columns are features - the features should be centered so that each has mean zero

We then create m PCs (m ≤ p), each a linear combination of the features. The coefficients are called “loadings”, so each PC has its own p × 1 vector of loadings

The PC’s “scores” are then calculated by applying the loadings to the feature values: multiplying X (n × p) by the loading vector (p × 1) gives an n × 1 vector of scores, one per observation
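
In symbols (a minimal sketch, assuming ISLR-style notation rather than anything quoted from the source): the mth loading vector is \phi_m = (\phi_{1m}, \phi_{2m}, \ldots, \phi_{pm})^T, and the score of observation i on the mth PC is

z_{im} = \phi_{1m} x_{i1} + \phi_{2m} x_{i2} + \cdots + \phi_{pm} x_{ip}

Stacking over the n observations, z_m = X \phi_m is the n \times 1 vector of scores for that PC.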

5
Q

Selecting PC loadings (3; 6.1.1)

A

1) Selected in a way that maximizes the sample variance of the PC scores. A normalization constraint is applied so that the variance cannot be increased by arbitrarily inflating the loadings (the optimization is written out after this list).

2) Geometrically, the p loadings define a line that we can interpret as:
a) Maximal variance direction - the line along which the data vary the most (i.e. PC scores spread out as much as possible)
b) Line with minimal distance - the line as close as possible to the observations (minimizes the sum of squared perpendicular distances between the observations and the line)

3) After choosing the first PC, the second PC is chosen to maximize variance, subject to the additional criterion that its scores are uncorrelated with the scores of the first PC - this forces the second PC to measure a different aspect of the variables. Geometrically, the second PC's line is perpendicular to the first PC's line.
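
A sketch of the first PC's maximization in ISLR-style notation (symbols assumed, not quoted from the source):

\max_{\phi_{11}, \ldots, \phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{p} \phi_{j1} x_{ij} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1

The objective is the sample variance of the first PC's scores (the features are centered, so the scores have mean zero), and the unit-norm constraint is the normalization that stops the variance from being inflated by scaling up the loadings. Each subsequent PC solves the same problem with the added condition that its scores are uncorrelated with those of all earlier PCs.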

6
Q

Application of PCA (2; 6.1.1)

A

1) Data visualization - PCA reduces dimensionality from p variables to m PCs, making the data much easier to visualize. With 2 PCs, we can plot the scores against each other to produce a two-dimensional view of the data

2) Feature generation - the PCs are new features which are mutually uncorrelated by definition, removing collinearity issues (both uses are sketched below)
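
A minimal sketch of both uses (assuming Python with numpy, scikit-learn, and matplotlib; the feature matrix is synthetic and purely illustrative):

# Visualization and feature generation with the first two PCs
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))                           # stand-in for an n x p feature matrix

X_scaled = StandardScaler().fit_transform(X)            # center and scale the features first
scores = PCA(n_components=2).fit_transform(X_scaled)    # n x 2 matrix of PC scores

# 1) Data visualization: plot PC1 scores against PC2 scores
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1 score")
plt.ylabel("PC2 score")
plt.show()

# 2) Feature generation: the two score columns are mutually uncorrelated
#    and can be used as predictors in place of the original five features
new_features = scores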

7
Q

Proportion of variance explained (6.1.2)

A

PVE = variance explained by the mth PC / total variance

1) Always between 0 and 1; the PVEs sum to 1 across all PCs
2) Monotonically decreasing - PVE is highest for the first PC, which explains the most variance
3) Centering and scaling - if all variables are adjusted to have zero mean and scaled to have unit standard deviation, the math simplifies so that PVE = sum of squared PC scores / (n × p) (written out below)

If no scaling occurs, PCA will heavily favor variables whose scale gives them a large variance
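
Written out (a sketch assuming centered features and ISLR-style notation, not quoted from the source):

\text{PVE}_m = \frac{\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^2}

If each feature is also scaled to unit variance (so that \frac{1}{n} \sum_{i=1}^{n} x_{ij}^2 = 1 for every j), the denominator equals n \times p, giving \text{PVE}_m = \sum_{i=1}^{n} z_{im}^2 / (np), matching point 3) above.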

8
Q

Drawbacks of PCA (4; 6.1.2)

A

1) Interpretability - the largest drawback; the new features are difficult to interpret
2) Not well suited to non-linear relationships, since PCs are linear combinations of the original variables
3) PCA is not feature selection (PCs are linear combinations of all predictors, none of which are dropped)
4) The target variable is ignored when PCA is used within a supervised learning problem
