Distance & Scaling Flashcards
Euclidean Measure
Square root of the sum of squared coordinate differences: sqrt((x2 - x1)^2 + (y2 - y1)^2)
Manhattan Distance
Sum of the absolute differences between the coordinates of two points
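A minimal sketch of the two definitions above, in plain Python. The function names and the sample points are illustrative assumptions, not part of the original cards.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)   # coordinate differences are 3 and 4
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```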
Minkowski Distance is
A generalization of the Manhattan and Euclidean distances, controlled by a parameter r
Minkowski Distance r = 1
Manhattan
Minkowski Distance r = 2
Euclidean
Minkowski Distance r = inf
Supremum distance (the largest absolute difference across the coordinates)
As r increases in Minkowski distance
More weight goes to the larger coordinate differences, emphasizing outliers
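The Minkowski cards above can be checked with a short sketch. The helper name `minkowski` and the sample points are assumptions made for illustration; r = infinity is handled as a separate case because it is the limit of the formula rather than a value that can be plugged in.

```python
import math

def minkowski(p, q, r):
    """Minkowski distance of order r; r = math.inf gives the supremum distance."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)

p, q = (0, 0), (3, 4)
for r in (1, 2, 10, math.inf):
    # r = 1 matches Manhattan, r = 2 matches Euclidean, and larger r
    # moves the result toward the largest single difference (4 here).
    print(r, minkowski(p, q, r))
```

Running it with growing r shows the distance shrinking toward the largest coordinate difference, which is the sense in which higher r emphasizes large differences.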
Mahalanobis Distance
Uses the variances and covariances of the data to measure how close two points are, so differences along high-variance directions count less
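A rough sketch of the idea for the two-dimensional case only, written without external libraries. The function name `mahalanobis_2d`, the use of the sample covariance (dividing by n - 1), and the toy data set are all assumptions for illustration.

```python
import math

def mahalanobis_2d(p, q, data):
    """Mahalanobis distance between 2-D points p and q, using the covariance of data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix entries (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    # d^2 = diff^T * inverse(covariance) * diff, written out for a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Toy data: the second feature is spread ten times wider than the first.
data = [(0, 0), (2, 0), (0, 20), (2, 20)]
print(mahalanobis_2d((0, 0), (2, 0), data))   # a step of 2 along the narrow feature
print(mahalanobis_2d((0, 0), (0, 20), data))  # a step of 20 along the wide feature
```

Both calls give the same distance, because a step of 20 along the widely spread feature is as typical as a step of 2 along the narrow one.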
Standardization
Transforms data to have mean = 0 and std. dev. = 1 (z-score)
Normalization
Scales a variable to have values between 0 and 1 (min-max scaling)
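Both transformations can be sketched with the standard library. The function names and the `ages` sample are hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption; some tools use the sample standard deviation instead. Neither function guards against a constant column, which would cause a division by zero.

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std. dev. (an assumption)
    return [(v - mean) / std for v in values]

def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 50, 60]
print(normalize(ages))    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))  # values centered on 0 with std. dev. 1
```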
When do you want to standardize before computing distances?
When the scales of the features differ significantly
One Hot Encoding
Technique to convert categorical data into a binary matrix (numerical representation)
What happens if we don't scale?
Some algorithms will be slow in converging
Features with high magnitudes will dominate the distance calculations
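A small illustration of a high-magnitude feature dominating the distance. The three records (age in years, income in dollars) are made-up values, and `min_max_columns` is a hypothetical helper that rescales each feature to the range 0 to 1.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_columns(rows):
    """Rescale every column (feature) of rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical records: (age in years, income in dollars).
a, b, c = (25, 50_000), (60, 50_500), (26, 51_000)

# Unscaled: income dominates, so b looks closer to a despite the 35-year age gap.
print(euclidean(a, b), euclidean(a, c))

# Scaled: both features contribute, and c becomes the closer point.
sa, sb, sc = min_max_columns([a, b, c])
print(euclidean(sa, sb), euclidean(sa, sc))
```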
In one hot encoding, each category is represented as
a binary vector with a single 1 and all other positions as 0
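One-hot encoding can be sketched in a few lines. The function name `one_hot`, the alphabetical column order, and the color values are assumptions for illustration; libraries may order the columns differently.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    matrix = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # a single 1 marks the value's category
        matrix.append(row)
    return categories, matrix

colors = ["red", "green", "blue", "green"]
categories, matrix = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(matrix)      # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```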
When we scale, do we lose geometric representation of each point with respect to its neighbors?
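No, each feature is only shifted and rescaled, so the points keep their relative order and spacing along every feature (the coordinate values and absolute distances do change)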