PCA Flashcards

1
Q

Why do we need data reduction techniques?

A
  • Because of mammoth data sets – modern datasets have enormous numbers of observations and variables
  • Often only a few important features contribute most of the useful information in the dataset
2
Q

What does data reduction allow?

A
  • Data reduction allows us to extract the necessary information from a huge volume of data and discard what is redundant
  • It also removes noise
3
Q

2 types of data reduction techniques

A
  • Dimensionality reduction – reduce the number of input variables in a dataset
  • Numerosity reduction – reduce data volume by using suitable forms of data representation
4
Q

Examples of dimensionality reduction

A
  • Wavelet transform
  • Attribute subset selection
  • Principal component analysis
5
Q

Examples of numerosity reduction

A
  • Histogram
  • Sampling
  • Clustering
6
Q

Wavelet transform

A
  • Can be applied to ECG signals
  • Helps convert the ECG signal into a form that makes it much easier for QRS peak-detection algorithms to find the peaks
7
Q

Attribute subset selection

A
  • Find a minimum set of attributes that yields the same solution as the full set
  • This reduces cost because there are fewer variables
  • Makes pattern recognition easier
8
Q

Principal component analysis

A
  • A variable reduction technique
  • Reduces a larger set of variables into a smaller set of ‘artificial variables’, called principal components, that account for most of the variance in the original variables
9
Q

PCA can be used to solve 3 major problems:

A
  • Removing unrelated variables
  • Reducing redundancy in a set of variables
  • Removing multicollinearity
10
Q

Assumptions for PCA

A
  • You have multiple continuous variables
  • There are linear relationships between all variables
  • There are no significant outliers
  • The sample size is large enough for PCA to produce a reliable result
11
Q

Steps of principal component analysis:

A
  1. Calculate covariance matrix
  2. Compute the eigenvectors and eigenvalues of the covariance matrix to identify principal components
  3. Choose which components to keep (the ones with high eigenvalues)
  4. Reorient the data from the original axes onto the axes represented by the retained principal components (i.e., project the data onto their eigenvectors)
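
These four steps can be sketched in a few lines of NumPy (a minimal illustration under my own variable names, not a production implementation – real code would also validate inputs and handle scaling):

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA following the four steps above (rows = observations)."""
    Xc = X - X.mean(axis=0)                        # centre the data first
    C = np.cov(Xc, rowvar=False)                   # 1. covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # 2. eigen-decomposition
    order = np.argsort(eigenvalues)[::-1]          # 3. rank by eigenvalue, keep the top ones
    components = eigenvectors[:, order[:n_components]]
    return Xc @ components, eigenvalues[order]     # 4. project onto the kept components

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # toy data: 50 observations, 4 variables
scores, eigenvalues = pca(X, 2)
print(scores.shape)               # (50, 2) – each observation now has 2 coordinates
```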
12
Q

Covariance matrix

A

  • A square matrix whose entries are the covariances of each pair of variables; the diagonal holds the variances
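For example, NumPy's `np.cov` builds this matrix directly (the toy data here is made up for illustration):

```python
import numpy as np

# Toy data: 5 observations of 3 variables (rows = observations).
X = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 0.0],
              [4.0, 7.0, 3.0],
              [5.0, 8.0, 2.0],
              [6.0, 9.0, 4.0]])

# Entry (i, j) is the covariance of variable i with variable j;
# the diagonal entries are the variances of the variables.
C = np.cov(X, rowvar=False)

print(C.shape)               # (3, 3) – square: one row/column per variable
print(np.allclose(C, C.T))   # True – covariance matrices are symmetric
```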

13
Q

Principal component

A
  • To perform a PCA you need to find the axis of greatest variance, which is the line of best fit
  • This line is the first principal component
  • You then project your data points onto the first principal component
  • Before, each person was represented by two values (e.g., lung function and oxygen); now each is represented by a single score on that principal component
  • The second principal component accounts for the next highest variance
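
A sketch of this projection, using synthetic stand-ins for the lung-function/oxygen example (the data and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated measurements per person (synthetic):
lung = rng.normal(0.0, 1.0, 100)
oxygen = 0.8 * lung + rng.normal(0.0, 0.3, 100)
X = np.column_stack([lung, oxygen])
X = X - X.mean(axis=0)                            # centre before PCA

C = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)
pc1 = eigenvectors[:, np.argmax(eigenvalues)]     # axis of greatest variance

# Project: each person is now a single number, their position along PC1.
scores = X @ pc1
print(scores.shape)   # (100,) – one value per person instead of two
```

The variance of these scores equals the largest eigenvalue, which is why PC1 is "the axis of greatest variance".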
14
Q

Eigenvalues and eigenvectors

A
  • A matrix represents a linear transformation – it contains a set of rules for moving data points around
  • One type of linear transformation is shearing, where data points are sheared by multiplying them by the matrix
  • An eigenvector corresponds to a direction
  • An eigenvalue is how far along that line a data point has moved – it corresponds to distance
  • Eigenvalues represent the total amount of variance that can be explained by a given principal component
  • Ranking the eigenvectors by their eigenvalues, highest to lowest, gives the principal components in order of significance
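
This ranking step might look like the following (a small sketch; the 2-D sample data is arbitrary):

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
              [1.5, 1.6], [1.1, 0.9]])

C = np.cov(X, rowvar=False)

# eigh is the solver for symmetric matrices (like a covariance matrix);
# it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Rank highest to lowest: after reordering, column 0 of `eigenvectors`
# is the first principal component, column 1 the second, and so on.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
```

A useful sanity check: the eigenvalues sum to the trace of the covariance matrix, i.e. the total variance in the data.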
15
Q

Rotations

A
  • The goal of rotation is to improve the interpretability of the factor solution by reaching a simple structure
  • Rotation makes the PCA results easier to interpret
16
Q

2 types of rotations:

A
  • Orthogonal rotation (e.g., VARIMAX) – assumes components are independent or uncorrelated with each other
  • Oblique rotation (e.g., direct oblimin) – components are not independent and are correlated
17
Q

Putting it all together - PCA

A

PCA:
  • Calculate the covariance matrix of the data
  • Calculate the eigenvectors of the covariance matrix = the principal components
  • The eigenvector with the largest eigenvalue = the first principal component
  • Eigenvalues represent the total variance that can be explained by a given principal component
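
Since each eigenvalue is the variance explained by its component, dividing by the total gives the explained-variance ratio – a quick way to decide how many components to keep (the eigenvalues below are invented for illustration):

```python
import numpy as np

# Suppose the eigenvalues of the covariance matrix came out as:
eigenvalues = np.array([4.0, 2.5, 1.0, 0.5])

# Each eigenvalue over the total = fraction of variance explained
# by that principal component.
explained = eigenvalues / eigenvalues.sum()   # PC1: 50%, PC2: 31.25%, ...

# Cumulative sum shows that keeping just the first two components
# retains over 80% of the original variance.
print(explained.cumsum())
```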