Week 2 - Visualising your data and models Flashcards

1
Q

Model-in-the-data-space

A

Assessing model fit by plotting the model in the space of the data

Straightforward in low dimensions, but challenging in high dimensions
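
A minimal Python sketch of this idea (my own illustration, using simulated data): fit a simple one-predictor model and draw it on top of a scatterplot of the data.

```python
# Model-in-the-data-space: overlay a fitted model on a scatterplot of the data.
# Easy with one predictor; with many predictors this view breaks down.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)    # simulated data

coef = np.polyfit(x, y, deg=1)               # fit a simple linear model
grid = np.linspace(0, 10, 200)
fit = np.polyval(coef, grid)

plt.scatter(x, y, alpha=0.5, label="data")
plt.plot(grid, fit, color="red", label="fitted model")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```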

2
Q

Data-in-the-model-space

A

Plot the data using the model’s perspective to see how well it aligns with predictions
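
One common way to look at data in the model space, and only my interpretation of what the card intends, is to plot the data against quantities the model produces, such as fitted values and residuals:

```python
# Data-in-the-model-space: view the data through quantities the model produces,
# e.g. observed vs fitted values and residuals (one possible such view).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)    # simulated data

coef = np.polyfit(x, y, deg=1)
fitted = np.polyval(coef, x)
residuals = y - fitted

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].scatter(fitted, y, alpha=0.5)
axes[0].set_xlabel("fitted values")
axes[0].set_ylabel("observed y")
axes[1].scatter(fitted, residuals, alpha=0.5)
axes[1].axhline(0, color="red")
axes[1].set_xlabel("fitted values")
axes[1].set_ylabel("residuals")
plt.tight_layout()
plt.show()
```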

3
Q

Scatterplot matrix

A

Showcases:
- Linear association (correlation)
- Clumping (well-separated groups of points)
- Clustering (regions of higher point density that are not fully separated)
- Outliers

4
Q

Scatterplot matrix (with supervised data)

A

When the data is supervised and includes a response variable, always include the response in the plot

Clusters and linear relationships with the response become much clearer
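
A minimal sketch with seaborn's pairplot (one of several ways to draw a scatterplot matrix); colouring the points by the class/response variable covers the supervised case described here:

```python
# Scatterplot matrix of the iris data, coloured by the class (response) variable.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True).frame        # 4 numeric variables + 'target'
iris["species"] = iris["target"].map(dict(enumerate(load_iris().target_names)))

sns.pairplot(iris.drop(columns="target"), hue="species")
plt.show()
```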

5
Q

Scatterplot matrix drawbacks

A

Difficult to plot large numbers of variables

May not be able to detect outliers that are only visible in higher-dimensional combinations of variables

6
Q

Perception

A

The aspect ratio of a scatterplot needs to be equal (square), because an unequal aspect ratio adversely affects the perception of correlation and association between variables
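
A small matplotlib sketch of the point above: the same simulated data reads quite differently when the axes are stretched versus forced to an equal aspect ratio.

```python
# The same scatterplot with a stretched vs an equal (square) aspect ratio.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y, alpha=0.5)
axes[0].set_title("default aspect ratio")
axes[1].scatter(x, y, alpha=0.5)
axes[1].set_aspect("equal")                  # equal data units on both axes
axes[1].set_title("equal aspect ratio")
plt.show()
```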

7
Q

Parallel coordinate plot

A

x-axis shows the variables and y-axis shows their values
Each line represents an observation

Examine the direction and orientation of the lines to perceive multivariate relationships:
- Crossing lines indicate negative association
- Lines with the same slope indicate positive association
- Outliers follow a different pattern from the rest of the lines
- Groups of lines with the same pattern indicate clustering

Can plot many more variables than a scatterplot matrix
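
A minimal sketch using pandas' built-in parallel_coordinates helper on the iris data (my choice of example data):

```python
# Parallel coordinate plot: variables on the x-axis, one line per observation.
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True).frame
iris["species"] = iris["target"].map(dict(enumerate(load_iris().target_names)))

# Scaling the variables first often gives a clearer picture (see the Scaling card).
parallel_coordinates(iris.drop(columns="target"), class_column="species",
                     colormap="viridis")
plt.show()
```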

8
Q

Parallel coordinate plot drawbacks

A

Disadvantages:
- Hard to follow individual lines
- Order of variables matters (different orderings give different views, so apparent patterns may come from the view rather than the data)
- Scaling of variables matters

9
Q

Scaling

A

Used to make variables comparable

Standardising -> (data value - variable mean) / variable standard deviation, which gives mean 0 and standard deviation 1

Min-max scaling -> (data value - variable minimum) / (variable maximum - variable minimum), which gives values between 0 and 1
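
A minimal numpy sketch of the two formulas above, applied column-wise to a toy data matrix (sklearn's StandardScaler and MinMaxScaler do the same thing):

```python
# Standardising vs min-max scaling, applied column-wise to a data matrix X.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=50, scale=10, size=(100, 3))     # toy data, 3 variables

# Standardise: (value - column mean) / column standard deviation -> mean 0, sd 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max: (value - column min) / (column max - column min) -> values in [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std.mean(axis=0).round(2), X_std.std(axis=0).round(2))
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```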

10
Q

High-dimensions

A

Each higher dimension adds a new orthogonal axis. In machine learning, data often lives in far more dimensions than we can view directly

11
Q

Data Matrix

A

n*p (n observations in rows, p variables in columns)

12
Q

Projection Matrix

A

p*d

p matches the number of variables in the data; d is the target dimension, most often 2 so that the projection can be plotted

13
Q

Projected Data Matrix

A

n*d (the n observations projected into d dimensions)
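
A numpy sketch tying the three matrix cards together: an n*p data matrix multiplied by a p*d projection matrix gives an n*d projected data matrix (the sizes here are arbitrary).

```python
# Project an n x p data matrix onto d = 2 dimensions with a p x d projection matrix.
import numpy as np

rng = np.random.default_rng(4)
n, p, d = 100, 5, 2
X = rng.normal(size=(n, p))                  # data matrix, n x p

A = rng.normal(size=(p, d))
A, _ = np.linalg.qr(A)                       # orthonormalise columns -> a valid projection basis
Y = X @ A                                    # projected data matrix, n x d

print(X.shape, A.shape, Y.shape)             # (100, 5) (5, 2) (100, 2)
```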

14
Q

Dimension reduction

A

Chooses an optimal projection, i.e. one that preserves as much of the interesting structure in the data as possible

15
Q

Principal component analysis (PCA)

A

Produces a low-dimensional representation of a dataset

It finds a sequence of linear combinations of the variables that have maximal variance, and are mutually uncorrelated

It is an unsupervised learning method
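
A minimal sketch of PCA with scikit-learn on the (standardised) iris data, my choice of example:

```python
# PCA on standardised data with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                          # 150 x 4 data matrix
X_std = StandardScaler().fit_transform(X)     # mean 0, variance 1 per variable

pca = PCA()                                   # keep all components by default
Z = pca.fit_transform(X_std)                  # projected data (scores), 150 x 4

print(pca.components_.shape)                  # (4, 4): each row is a PC direction
print(pca.explained_variance_ratio_)          # proportion of variance explained per PC
```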

16
Q

First principal component

A

The direction along which the data has the highest variance

Important in PCA because:
- It is a linear combination of the original variables.
- It represents the axis along which the data is most spread out.
- It helps reduce dimensionality while preserving the most critical information.
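
In symbols, using standard PCA notation (not written on the card itself) and assuming the variables X_1, ..., X_p are centred:

```latex
% First principal component: the unit-length linear combination with maximal variance
Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p,
\qquad
\phi_1 = \arg\max_{\|\phi\|_2 = 1} \operatorname{Var}(X \phi)
```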

17
Q

Second principal component

A

Next most important direction in data after the first principal component

Second highest variance in data

It is orthogonal (perpendicular) to PC1, ensuring it provides new information not already explained by PC1

It helps in better understanding the structure of high-dimensional data

18
Q

Total variance

A

The sum of the variances of all original features in the dataset. It represents the total amount of information (spread) present in the data before transformation.

PCA redistributes this variance among the principal components (PCs):
- The first principal component (PC1) captures the highest variance
- The second principal component (PC2) captures the next highest variance, and so on
- The sum of variances of all principal components equals the total variance of the original data (assuming no dimensionality reduction)

This concept is useful for deciding how many principal components to keep—by retaining those that explain most of the total variance
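
In symbols, using standard notation (an addition, not on the card): with centred variables, where the lambda_j are the eigenvalues of the covariance matrix,

```latex
% Total variance is preserved by PCA: the PCs just redistribute it
\text{total variance}
  = \sum_{j=1}^{p} \operatorname{Var}(X_j)
  = \sum_{j=1}^{p} \operatorname{Var}(Z_j)
  = \sum_{j=1}^{p} \lambda_j
```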

19
Q

To choose k (the number of principal components)

A

Select k components that retain the largest proportion of variance in the data

Use the Proportion of Variance Explained (PVE) to measure how much variance each component captures

Examine the scree plot (variance explained vs. number of components) and look for the elbow point, where adding more components gives minimal additional variance
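
A minimal sketch producing a scree-style plot of PVE with scikit-learn (standardised iris again, my choice of example):

```python
# PVE and scree plot for choosing k (number of principal components).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X_std)

pve = pca.explained_variance_ratio_           # proportion of variance explained per PC
k = np.arange(1, len(pve) + 1)

plt.plot(k, pve, "o-", label="PVE")
plt.plot(k, np.cumsum(pve), "s--", label="cumulative PVE")
plt.xlabel("number of principal components")
plt.ylabel("proportion of variance explained")
plt.legend()
plt.show()                                    # look for the 'elbow' in the PVE curve
```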

20
Q

Delectable details

A

PCA summarises linear relationships, and might not see other interesting dependencies.
Projection pursuit is a generalisation that can find other interesting patterns

Outliers can affect results, because direction of outliers will appear to have larger variance

Scaling of variables matters; typically you would first standardise each variable to have mean 0 and variance 1

21
Q

Tour

A

Explores high-dimensional data by projecting it into lower-dimensional space and animating transitions between these projections.

Helps us understand structure and relationships with high-dimensional data

Projection matrix dimension stays the same, but the values within the projection matrix change over time, creating an animation effect
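
The usual tool for tours is the tourr package in R; the numpy sketch below shows only the core idea, a sequence of p*2 orthonormal projection matrices applied to the same data, and leaves out the smooth geodesic interpolation between bases that makes a real tour animate.

```python
# Crude tour idea: the projection matrix stays p x 2, but its values change over 'time'.
import numpy as np

rng = np.random.default_rng(5)
n, p, d = 200, 6, 2
X = rng.normal(size=(n, p))                   # high-dimensional data

for step in range(5):                         # a real tour interpolates smoothly between bases
    A = rng.normal(size=(p, d))
    A, _ = np.linalg.qr(A)                    # random orthonormal p x 2 projection basis
    Y = X @ A                                 # n x 2 projection to plot for this frame
    print(f"frame {step}: projected data shape {Y.shape}")
```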

22
Q

Eigenvalues

A

Special numbers associated with a square matrix that tell us how much the matrix stretches or shrinks particular vectors (its eigenvectors)
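
A minimal numpy check of this (my own example matrix):

```python
# Eigenvalues/eigenvectors: A v = lambda v, so A only stretches (or shrinks) v.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # a small symmetric matrix

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                         # 3 and 1 (order is not guaranteed)

v = eigenvectors[:, 0]                     # eigenvector paired with eigenvalues[0]
print(A @ v)                               # ...equals eigenvalues[0] * v
print(eigenvalues[0] * v)
```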