wk6 Flashcards
Define variance
the average squared deviation of the samples from the mean
What is covariance
how the data in one dimension varies (deviates from its mean) jointly with the data in another dimension
What does a positive, negative, and zero covariance values mean
Positive:
the two dimensions tend to increase/decrease together; when one goes up, so does the other (positive correlation)
Negative:
when one dimension increases, the other tends to decrease, and vice versa
Zero:
the dimensions have no linear relationship (they may still be related non-linearly)
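A quick illustrative sketch (not from the cards) of the three cases, using toy variables built so their covariances come out positive, negative, and near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# np.cov returns a 2x2 matrix; [0, 1] is the covariance of the pair
pos = np.cov(x, x + 0.1 * rng.normal(size=1000))[0, 1]   # moves with x
neg = np.cov(x, -x + 0.1 * rng.normal(size=1000))[0, 1]  # moves against x
ind = np.cov(x, rng.normal(size=1000))[0, 1]             # unrelated noise

print(pos > 0, neg < 0, abs(ind) < 0.2)
```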
Why is measuring covariance difficult for something like medical imaging
Data often lives in a very high-dimensional space, so visualising it or inspecting a correlation matrix directly is largely infeasible
What does PCA do at a high level
It maps the dataset, modelled as vectors, onto a lower-dimensional subspace (a new coordinate system) by finding the eigenvectors of the covariance matrix, ordered from highest to lowest eigenvalue. Keeping fewer eigenvectors than the original number of dimensions projects the data into that lower-dimensional subspace.
This way, we preserve most of the variance in the data while making it easier to work with. We may also uncover otherwise non-obvious relationships
What are the steps to computing PCA
1) Mean-centre the data and compute the covariance matrix
2) Compute the eigenvectors and eigenvalues of the covariance matrix
3) Sort the eigenvectors by eigenvalue from high to low and keep only enough of them to explain the majority of the original variance in the data
4) Project the data onto the new eigenspace by multiplying the (mean-centred) data by the kept eigenvectors
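The four steps above can be sketched in a few lines of numpy (toy data; the choice of keeping k = 2 components is arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))            # toy data: 100 samples, 5 dims

Xc = X - X.mean(axis=0)                  # 1) mean-centre
C = Xc.T @ Xc / (len(Xc) - 1)            # 1) covariance matrix (5 x 5)
vals, vecs = np.linalg.eigh(C)           # 2) eigenvalues/eigenvectors
order = np.argsort(vals)[::-1]           # 3) sort high -> low
vals, vecs = vals[order], vecs[:, order]
k = 2                                    # 3) keep the top-k components
proj = Xc @ vecs[:, :k]                  # 4) project into the subspace

print(proj.shape)  # (100, 2)
```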
What is a quick way of computing a covariance matrix
1) mean-centre the data matrix X
2) compute X . X^T / (n - 1), with dimensions as rows and samples as columns
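A small sketch of the shortcut, checked against numpy's own covariance routine (which also assumes dimensions as rows by default):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])     # 2 dims x 4 samples
Xc = X - X.mean(axis=1, keepdims=True)   # 1) mean-centre each dimension
C = Xc @ Xc.T / (X.shape[1] - 1)         # 2) X . X^T, scaled by n - 1

print(np.allclose(C, np.cov(X)))  # True
```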
what does it mean if some eigenvalues are 0
The corresponding directions carry no variance: those dimensions are linearly dependent on (highly correlated with) other dimensions and so individually explain none of the variance in the original data
If an eigenvalue is zero, or close to zero within some threshold, discard it and its eigenvector
What do eigenvalues and eigenvectors of a matrix depict
Eigenvalues = the magnitude of the variance along the corresponding eigenvector's direction
Eigenvectors = the directions of greatest variation
what is the relation of two eigenvectors with one another
they are orthogonal (the covariance matrix is symmetric, so its eigenvectors form an orthogonal set)
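A quick numerical check of this (sketch): build a symmetric matrix, as a covariance matrix always is, and verify its eigenvectors are mutually orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
C = A @ A.T                       # symmetric, like a covariance matrix
_, vecs = np.linalg.eigh(C)       # columns are eigenvectors

# orthonormal basis: V^T V should be the identity
print(np.allclose(vecs.T @ vecs, np.eye(4)))  # True
```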
What are eigenfaces
a decomposition of a large dataset of faces into a set of characteristics which best describe the most variable features of a face
how can we do facial recognition via PCA
1) gather a large dataset of human faces
2) compute a covariance matrix of the human faces. First re-arrange the dataset so that each row corresponds to an image and each column to a pixel of that image
3) Find the eigenvectors ordered by eigenvalues of the covariance matrix and only keep those which explain some significant variance
4) now take each image one by one and re-arrange it into a single row with pixels as columns
5) multiply the image by the eigenvectors (the eigenfaces) to find the image's corresponding weights
6) to reconstruct the image, multiply the eigenfaces by the weights
7) given a new image, find its weights and then the closest corresponding weights of another face, and determine that face as the most likely match
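The pipeline above can be sketched end to end. This is a toy: the "faces" are random vectors standing in for flattened images, and the eigenvectors are taken via SVD of the mean-centred data (equivalent to eigendecomposing the covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n_faces, n_pixels = 20, 64
faces = rng.normal(size=(n_faces, n_pixels))   # 1-2) rows = images, cols = pixels

mean_face = faces.mean(axis=0)
A = faces - mean_face                          # mean-centre
# 3) top eigenvectors of the covariance via SVD; rows of Vt are eigenfaces
_, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
eigenfaces = Vt[:k]                            # keep the top-k eigenfaces

weights = A @ eigenfaces.T                     # 4-5) weights for every image
recon = mean_face + weights @ eigenfaces       # 6) reconstruction from weights

# 7) match a new image (here: a slightly noisy copy of face 7) by nearest weights
query = faces[7] + 0.01 * rng.normal(size=n_pixels)
qw = (query - mean_face) @ eigenfaces.T
match = np.argmin(np.linalg.norm(weights - qw, axis=1))
print(match)
```

Comparing in weight space rather than pixel space is the point of the method: distances between k-dimensional weight vectors are far cheaper than distances between full images.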