3. Visual Healthcare Analytics Flashcards
what is a parallel coordinates plot
- a visualisation of multidimensional data
- given an N x M table with N patients and M clinical variables, M equally spaced vertical axes are drawn, each with its own range; every patient becomes a line that crosses each axis at that patient's value
- the lines can then be filtered to explore correlations between variables across patients
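the construction above can be sketched in plain Python. this is only an illustration: the patient values, the column names and the helper names `parallel_coordinate_lines` and `filter_range` are all made up, and the code only computes where each line crosses each axis rather than drawing anything.

```python
def parallel_coordinate_lines(table, columns):
    """Return one polyline per row as (axis_index, height) points."""
    # Each axis has its own range: the min and max of that variable.
    ranges = {
        c: (min(row[c] for row in table), max(row[c] for row in table))
        for c in columns
    }
    lines = []
    for row in table:
        points = []
        for axis_index, c in enumerate(columns):
            lo, hi = ranges[c]
            # Scale the value to 0..1 along its own axis
            # (0.5 if the variable is constant).
            height = 0.5 if hi == lo else (row[c] - lo) / (hi - lo)
            points.append((axis_index, height))
        lines.append(points)
    return lines


def filter_range(table, column, low, high):
    """Keep only the rows whose value on one axis lies in [low, high]."""
    return [row for row in table if low <= row[column] <= high]


# Hypothetical patient table (N = 3 patients, M = 3 clinical variables).
patients = [
    {"age": 40, "systolic_bp": 120, "glucose": 5.0},
    {"age": 60, "systolic_bp": 140, "glucose": 7.0},
    {"age": 50, "systolic_bp": 150, "glucose": 6.0},
]
columns = ["age", "systolic_bp", "glucose"]

lines = parallel_coordinate_lines(patients, columns)
older = filter_range(patients, "age", 50, 60)
```

each entry of `lines` could be handed to a plotting library as the x and y values of one patient's line; `older` shows the filtering step on a single axis.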
draw a parallel coordinates plot
what is a chord visualisation plot
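a circular diagram illustrating the connections between different variables: the variables are arranged around a circle and linked by chords, with thicker chords indicating stronger connections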
draw a chord visualisation plot
what are 2 dimensionality reduction algorithms
- PCA (principal component analysis)
- t-SNE (t-distributed stochastic neighbour embedding)
what is t-SNE
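a nonlinear dimensionality reduction algorithm that measures similarity between instances in both the high-dimensional and the low-dimensional space. it converts the pairwise distances in each space into similarity scores, then adjusts the low-dimensional points to minimise a cost function (the KL divergence) between the two sets of similarities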
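the two similarity measures and the cost can be sketched with NumPy. this is a simplified illustration, not a full t-SNE: it assumes a single fixed `sigma` (real t-SNE tunes one per point from a perplexity setting), it skips the optimisation loop, and the data and function names are made up.

```python
import numpy as np


def pairwise_sq_dists(X):
    # Squared Euclidean distance between every pair of rows.
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)


def high_dim_similarities(X, sigma=1.0):
    # Gaussian similarities in the original space.
    # Assumption: one fixed sigma for all points.
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbour
    return P / P.sum()


def low_dim_similarities(Y):
    # Heavy-tailed Student-t similarities in the embedding space.
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_cost(P, Q, eps=1e-12):
    # KL divergence KL(P || Q): the quantity t-SNE tries to minimise.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


# Hypothetical data: two tight clusters in 3 dimensions.
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
              [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
P = high_dim_similarities(X)

# Two candidate 2D layouts: one keeps neighbours together, one splits them.
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Y_bad = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]])

cost_good = kl_cost(P, low_dim_similarities(Y_good))
cost_bad = kl_cost(P, low_dim_similarities(Y_bad))
```

the layout that keeps neighbours together gets the lower cost, which is the signal the optimiser follows when it moves the low-dimensional points.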
what is pca
- unsupervised linear dimensionality reduction
- visualisation technique
how is pca calculated at a high level (variance)
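the direction along which the projected data (the scalar projections) has the greatest variance becomes the first PC (coordinate axis); the direction with the second greatest variance, orthogonal to the first, becomes the second PC, and so on.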
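a minimal sketch of one common way to compute this, via an eigen-decomposition of the covariance matrix, assuming NumPy is available. the patient values and the function name `pca` are made up for illustration.

```python
import numpy as np


def pca(X, n_components):
    # Centre each variable so variance is measured around its mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the variables (rows are patients).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    # Scalar projection of every patient onto each kept PC.
    scores = Xc @ components
    return scores, components, eigvals


# Hypothetical table: 5 patients, 3 clinical variables.
X = np.array([
    [120.0, 80.0, 5.1],
    [130.0, 85.0, 5.6],
    [110.0, 70.0, 4.9],
    [150.0, 95.0, 6.8],
    [140.0, 90.0, 6.0],
])

scores, components, eigvals = pca(X, n_components=2)

# The variance of the data along PC1 equals the largest eigenvalue.
pc1_variance = np.var(scores[:, 0], ddof=1)
```

the eigenvalues are the variances along each PC, so sorting them in descending order gives the "greatest variance first" ordering described above.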
how can we determine the amount of preserved information after pca is applied
variance: compare the total variance of all original dimensions with the variance retained by the reduced dimensions
after performing pca, two variables have a variance of 1.46 (A) and 0.2 (B). the entire dataset itself has a variance of 2.06. what can be inferred from this
we can infer that A alone explains most of the information (variance) in the dataset, while B contributes comparatively little
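the arithmetic behind this can be checked directly. the numbers come from the card; the variable names are just for illustration.

```python
var_a = 1.46           # variance of A
var_b = 0.20           # variance of B
total_variance = 2.06  # variance of the entire dataset

ratio_a = var_a / total_variance
ratio_b = var_b / total_variance
retained = (var_a + var_b) / total_variance

print(f"A: {ratio_a:.1%}, B: {ratio_b:.1%}, A+B: {retained:.1%}")
# prints: A: 70.9%, B: 9.7%, A+B: 80.6%
```

so A on its own accounts for roughly 71% of the total variance, and adding B raises that only to about 81%.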
how is pca calculated
how is the number of pca dimensions chosen
- consider using a scree plot. it shows the variance explained by each PC, plotted against the PC number; keep the PCs that come before the curve flattens out (the elbow)
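the numbers behind a scree plot can be sketched in plain Python. the eigenvalues below are invented, and the 90% cumulative-variance threshold is an assumed rule of thumb for illustration, not something stated on the card.

```python
from itertools import accumulate


def explained_variance_ratios(eigenvalues):
    # Share of the total variance carried by each PC.
    total = sum(eigenvalues)
    return [v / total for v in eigenvalues]


def components_needed(eigenvalues, threshold=0.9):
    # Smallest number of PCs whose cumulative share reaches the threshold.
    cumulative = accumulate(explained_variance_ratios(eigenvalues))
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical PC variances, already sorted from largest to smallest.
eigenvalues = [4.2, 1.1, 0.4, 0.2, 0.1]
ratios = explained_variance_ratios(eigenvalues)
k = components_needed(eigenvalues)

# Text version of the scree plot: one bar per PC.
for i, r in enumerate(ratios, start=1):
    print(f"PC{i} {'#' * round(r * 40)} {r:.2f}")
```

with these made-up values the bars drop sharply after the first PC, and `k` reports how many PCs are needed to reach the assumed threshold.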
draw a scree plot
when should we not use PCA
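when the structure in the data is nonlinear. pca is linear: each PC is a linear combination of the original variables (a projection onto a straight-line direction), so it cannot capture curved or otherwise nonlinear relationships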
what to use instead of pca
if the relationship is not linear, consider t-distributed stochastic neighbour embedding (t-SNE)