Multivariate Data Flashcards

1
Q

Multivariate Data

A

When multiple variables or features are measured for each observation
If data has many features, it may be referred to as ‘high dimensional’ data

2
Q

Dimensionality Reduction

A

Tries to find a reduced number of features that represent the dataset while preserving its structure
Provides new axes on which the data are drawn, which allows one to understand how the original features link together

3
Q

Dimensionality Reduction Usefulness

A

Allows the visualisation of multivariate data
Allows its features to be analysed/interpreted
Helps in understanding the structure of the original variables in terms of these latent features

4
Q

Summary of Multivariate Data

A

More than one variable/feature of the data for each observation
Widespread in many areas
When there are many variables/features, dimensionality reduction can help with visualisation/analysis/interpretation
Dimensionality reduction is a form of unsupervised learning

5
Q

Matrix of Scatterplots

A

Diagonal elements = distribution of each variable
Off-diagonal below = scatterplot between each variable
Off-diagonal above = correlation coefficient of variables
All datapoints plotted
With large datasets, plotting every datapoint can make the data much harder to visualise (see the sketch below)
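A minimal sketch of this layout, using seaborn's PairGrid on a hypothetical DataFrame (the corr_annotate helper is our own, not a library function):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical multivariate data: 100 observations of 3 features, two correlated
rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({"a": x,
                   "b": x + rng.normal(scale=0.5, size=100),
                   "c": rng.normal(size=100)})

def corr_annotate(x, y, **kwargs):
    """Our helper: write the correlation coefficient in the upper panels."""
    r = np.corrcoef(x, y)[0, 1]
    ax = plt.gca()
    ax.annotate(f"r = {r:.2f}", xy=(0.5, 0.5), xycoords="axes fraction", ha="center")

g = sns.PairGrid(df)
g.map_diag(sns.histplot)      # diagonal: distribution of each variable
g.map_lower(sns.scatterplot)  # below diagonal: scatterplot for each pair
g.map_upper(corr_annotate)    # above diagonal: correlation coefficient
plt.show()
```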

6
Q

Heat Map

A

Visualise relationship between many variables
Colour relates to correlation coefficient
Example: representational similarity analysis - distinct representations of value and of where subjects are attending
Structure in data can ‘pop out’
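A minimal sketch of such a heatmap with matplotlib and NumPy, using hypothetical random data with some built-in correlation:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 200 observations of 5 variables, two of them correlated
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 5))
data[:, 1] += data[:, 0]

corr = np.corrcoef(data, rowvar=False)  # 5 x 5 correlation matrix

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")  # colour = correlation
fig.colorbar(im, ax=ax, label="correlation coefficient")
plt.show()
```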

7
Q

Calculating a Covariance/Correlation Matrix

A

Captures the relationships between all the different variables
The correlation matrix contains the correlations between every pair of variables
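A minimal NumPy sketch, assuming a hypothetical data matrix X with rows as observations and columns as variables:

```python
import numpy as np

# Hypothetical data: 100 observations of 3 variables, two made to covary
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + rng.normal(scale=0.3, size=100)

cov = np.cov(X, rowvar=False)        # covariance of every variable with every other
corr = np.corrcoef(X, rowvar=False)  # the same relationships, rescaled to [-1, 1]
print(cov.shape)  # (3, 3): one entry per pair of variables
```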

8
Q

Summary of Visualisation

A

Multivariate data commonly visualised to examine how different variables correlate (covary) with each other
Heatmaps provide an intuitive way to examine structure when there are many variables
Covariance matrix is a matrix that contains the covariance of all variables with each other

9
Q

Principal Component Analysis

A

Examines the correlations between the different variables
Regression fits a line by minimising the sum of squared residuals in the y axis, emphasising one variable
PCA instead minimises the Euclidean (perpendicular) distance from the data points to the axis, treating all variables equally

10
Q

Principal Component 1

A

Axis explaining the most variance in the data

11
Q

Regression vs PCA

A

Regression minimises residuals in y
PCA minimises the Euclidean (perpendicular) distance to the axis
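A minimal sketch contrasting the two fits on the same hypothetical 2D data (the names slope_ols and slope_pca are our own):

```python
import numpy as np

# Hypothetical 2D data with noise only in y
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.8, size=200)
Xc = np.column_stack([x, y])
Xc -= Xc.mean(axis=0)  # centre the data

# Regression: slope minimising squared residuals in y
slope_ols = np.polyfit(Xc[:, 0], Xc[:, 1], deg=1)[0]

# PCA: direction minimising perpendicular (Euclidean) distance,
# i.e. the eigenvector of the covariance matrix with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]
slope_pca = pc1[1] / pc1[0]

print(slope_ols, slope_pca)  # the two criteria give different slopes
```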

12
Q

Variables Correlated

A

When the variables are correlated, the data can be described using a single vector - it explains how much each of x and y contributes to the component
Defining the axis means stating how much each original variable contributes to PC1 (its loading)
Some variance is not explained

13
Q

Principal Component 2

A

Vector explaining most variance after the contribution of PC1 has been removed
Points in a different direction, orthogonal to PC1
Length of the vector reflects the amount of variance explained in the original data

14
Q

Final Principal Component

A

Resulting principal components depend on identifying the axes of covariance between the variables
Scree plot shows the variance explained by each principal component

15
Q

PCA in Practice

A
  1. Subtract the mean of each column, and divide by its standard deviation
  2. Calculate the covariance matrix between these columns
  3. Calculate the eigenvectors and eigenvalues of the covariance matrix
  4. Sort the eigenvectors by their eigenvalues (see the sketch below)
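A minimal NumPy sketch of these four steps, on hypothetical data:

```python
import numpy as np

# Hypothetical data: 100 observations of 4 features, with some covariance
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]

# 1. subtract each column's mean and divide by its standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. covariance matrix of the standardised columns
C = np.cov(Z, rowvar=False)

# 3. eigenvectors and eigenvalues of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)

# 4. sort from largest to smallest eigenvalue (most to least variance)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = 100 * eigvals / eigvals.sum()  # scree plot values (% variance)
scores = Z @ eigvecs                       # data projected onto the new axes
```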
16
Q

Eigenvalues

A

% explained variance: each eigenvalue, divided by the sum of all eigenvalues, gives the proportion of total variance explained by that component
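A minimal sketch, assuming hypothetical eigenvalues already sorted in descending order:

```python
import numpy as np

eigvals = np.array([2.5, 1.0, 0.4, 0.1])  # hypothetical eigenvalues
pct = 100 * eigvals / eigvals.sum()       # % explained variance per component
print(pct)  # [62.5 25.  10.   2.5]
```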

17
Q

Eigenvectors

A

Loadings/weights - how much each original variable contributes to each component

18
Q

Overall Summary

A

Goal of PCA = find new set of axes explaining maximum variance in the original data
Vectors are obtained from the covariance structure between the different variables (the eigenvectors of the covariance matrix)
Resulting vectors sorted from explaining most to least variance
Vectors are all orthogonal to each other
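A minimal check of the orthogonality claim, using NumPy on hypothetical data:

```python
import numpy as np

# Eigenvectors of a (symmetric) covariance matrix are orthonormal
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
_, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(np.allclose(vecs.T @ vecs, np.eye(3)))  # True: the axes are orthogonal
```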