PCA Flashcards
1
Q
Why do we need data reduction techniques?
A
- Because of mammoth datasets – these datasets contain huge numbers of observations and variables
- Often only a few important features contribute most of the useful information in the dataset
2
Q
What does data reduction allow?
A
- Data reduction allows us to extract the necessary information from a huge array of data and remove redundant data
- Removes noise
3
Q
2 types of data reduction techniques
A
- Dimensionality reduction – reduce the number of input variables in a dataset
- Numerosity reduction – reduce data volume by using suitable forms of data representation
4
Q
Examples of dimensionality reduction
A
- Wavelet transform
- Attribute subset selection
- Principal component analysis
5
Q
Examples of numerosity reduction
A
- Histogram
- Sampling
- Clustering
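A minimal NumPy sketch of two of these ideas; the data, bin count, and sample size are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=100_000)   # large 1-D dataset

# Histogram: summarise 100,000 values with 20 bin counts
counts, edges = np.histogram(data, bins=20)

# Sampling: keep a random 1% subset instead of every observation
sample = rng.choice(data, size=1_000, replace=False)

print(counts.sum(), sample.shape)   # 100000 (1000,)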
6
Q
Wavelet transform
A
- Can be applied to ECG signals
- Helps convert the ECG signal into a form that makes it much easier for QRS peak-finding algorithms to detect the peaks (see the sketch below)
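A hedged sketch of a multilevel wavelet decomposition using the PyWavelets library; the synthetic signal, the 'db4' wavelet, and the decomposition level are assumptions for illustration, not a prescribed ECG pipeline:

import numpy as np
import pywt  # PyWavelets

t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(1024)  # stand-in for an ECG trace

# Multilevel discrete wavelet decomposition: one approximation band plus
# several detail bands where sharp features (like QRS peaks) stand out
coeffs = pywt.wavedec(signal, 'db4', level=4)
for i, band in enumerate(coeffs):
    print(f"band {i}: {len(band)} coefficients")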
7
Q
Attribute subset selection
A
- Find a minimum set of attributes that yields the same result as the full set
- Reduces cost, because there are fewer variables
- Makes pattern recognition easier
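A minimal sketch of attribute subset selection using scikit-learn's SelectKBest; the iris dataset and the choice k=2 are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)           # 4 attributes
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)    # keep the 2 most informative

print(X.shape, "->", X_reduced.shape)       # (150, 4) -> (150, 2)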
8
Q
Principal component analysis
A
- A variable reduction technique
- Reduces a larger set of variables into a smaller set of 'artificial variables', called principal components, that account for most of the variance in the original variables
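A minimal sketch of this with scikit-learn's PCA; the dataset and n_components=2 are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 4 original variables
pca = PCA(n_components=2)
scores = pca.fit_transform(X)                # 2 'artificial variables'

print(scores.shape)                          # (150, 2)
print(pca.explained_variance_ratio_.sum())   # share of variance retained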
9
Q
PCA can be used to solve 3 major problems:
A
- Removing unrelated variables
- Reducing redundancy in a set of variables
- Removing multicollinearity
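To see the multicollinearity point concretely, a sketch with synthetic, nearly collinear data (the data itself is an assumption for illustration):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)   # nearly collinear with x1
X = np.column_stack([x1, x2])

scores = PCA(n_components=2).fit_transform(X)
print(np.corrcoef(X, rowvar=False).round(2))       # large off-diagonal entries
print(np.corrcoef(scores, rowvar=False).round(2))  # ~identity: no multicollinearity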
10
Q
Assumptions for PCA
A
- You have multiple continuous variables
- There are linear relationships between the variables
- There are no significant outliers
- The sample size is large enough for PCA to produce a reliable result
11
Q
Steps of principal component analysis:
A
- Calculate the covariance matrix
- Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
- Choose which components to keep (the ones with high eigenvalues)
- Reorient the data from the original axes onto the axes represented by the retained principal components, i.e. project the data onto their eigenvectors (see the sketch below)
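A step-by-step NumPy sketch of these four stages; the random data and the choice to keep k=1 component are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)              # centre the data before projecting

cov = np.cov(Xc, rowvar=False)       # 1. covariance matrix
vals, vecs = np.linalg.eigh(cov)     # 2. eigenvalues and eigenvectors

order = np.argsort(vals)[::-1]       # 3. rank by eigenvalue, keep the top k
k = 1
top = vecs[:, order[:k]]

scores = Xc @ top                    # 4. reorient: project onto the components
print(scores.shape)                  # (200, 1)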
12
Q
Covariance matrix
A
- A square matrix that shows the covariance between each pair of variables
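A minimal NumPy sketch; the two-variable data is made up for illustration:

import numpy as np

X = np.array([[2.0, 4.1],
              [3.0, 6.2],
              [4.0, 7.9],
              [5.0, 10.1]])          # columns = variables

cov = np.cov(X, rowvar=False)        # 2x2 square matrix
print(cov)   # diagonal: variances; off-diagonal: covariance of the pair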
13
Q
Principal component
A
- To perform a PCA you first find the axis of greatest variance, which is the line of best fit through the data
- This line is the first principal component
- You then project the data points onto the first principal component
- Before, each person was represented by two variables (lung function and oxygen); now each is represented by a single principal component score
- The second principal component accounts for the next highest variance (see the sketch below)
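A sketch of this projection in NumPy, mirroring the two-variable example above; the generated values are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
lung = rng.normal(size=100)                       # stand-in for lung function
oxygen = 0.8 * lung + 0.2 * rng.normal(size=100)  # stand-in for oxygen
X = np.column_stack([lung, oxygen])
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
pc1 = vecs[:, np.argmax(vals)]       # axis of greatest variance (line of best fit)

scores = Xc @ pc1                    # each person: one number instead of two
print(scores.shape)                  # (100,)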
14
Q
Eigenvalues and eigenvectors
A
- A matrix represents a linear transformation, meaning it contains a set of rules for moving data points around
- One type of linear transformation is shearing, where data points are sheared by multiplying them by the matrix
- An eigenvector corresponds to a direction that the transformation leaves unchanged
- An eigenvalue tells you how far along that line a data point has moved, so it corresponds to distance
- For a covariance matrix, each eigenvalue represents the total amount of variance that can be explained by the corresponding principal component
- Ranking the eigenvectors by their eigenvalues, highest to lowest, gives the principal components in order of significance (see the sketch below)
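A short NumPy sketch of eigenvectors and eigenvalues for a shear matrix; the specific matrix is an illustrative assumption:

import numpy as np

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])       # shears points horizontally

vals, vecs = np.linalg.eig(shear)
print(vals)        # both 1: no stretching along the eigenvector direction
print(vecs[:, 0])  # ~[1, 0]: the x-axis direction is unchanged by the shear

# For a covariance matrix, each eigenvalue is the variance explained along
# its eigenvector; sorting high-to-low orders the principal components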
15
Q
Rotations
A
- The goal of rotation is to improve the interpretability of the factor solution by reaching a simple structure
- Rotation makes the results of a PCA easier to interpret
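A hedged sketch of a varimax rotation using scikit-learn's FactorAnalysis (rotation='varimax' requires scikit-learn >= 0.24; the dataset and number of factors are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)
fa = FactorAnalysis(n_components=2, rotation='varimax').fit(X)

# After rotation each variable tends to load strongly on one factor,
# which is the 'simple structure' that aids interpretation
print(fa.components_.round(2))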