Distance & Scaling Flashcards

1
Q

Euclidean Distance

A

sqrt((x2 - x1)^2 + (y2 - y1)^2): the square root of the sum of squared coordinate differences

2
Q

Manhattan Distance

A

Sum of the absolute differences between the coordinates of two points

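The two answers above can be sketched in a few lines of Python; the `euclidean` and `manhattan` helper names are illustrative, not library calls:

```python
import math

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # sum of the absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
print(manhattan((1, 2), (4, 6)))  # 7
```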
3
Q

Minkowski Distance is

A

A generalization of Euclidean and Manhattan distances; the parameter r controls the order of the norm

4
Q

Minkowski Distance r = 1

A

Manhattan

5
Q

Minkowski Distance r = 2

A

Euclidean

6
Q

Minkowski Distance r = inf

A

Supremum (Chebyshev) Distance

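The three special cases in the cards above (r = 1 is Manhattan, r = 2 is Euclidean, r = inf is supremum) can be checked with a short sketch; the `minkowski` helper is illustrative:

```python
def minkowski(p, q, r):
    if r == float("inf"):
        # supremum (Chebyshev) distance: the largest coordinate difference
        return max(abs(a - b) for a, b in zip(p, q))
    # general Minkowski form: (sum of |differences|^r) ^ (1/r)
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

p, q = (1, 2), (4, 6)
print(minkowski(p, q, 1))             # 7.0  -> Manhattan
print(minkowski(p, q, 2))             # 5.0  -> Euclidean
print(minkowski(p, q, float("inf")))  # 4    -> supremum
```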
7
Q

As r increases in Minkowski distance

A

Gives more weight to larger differences, emphasizing outliers

8
Q

Mahalanobis Distance

A

Measures how close two points are while accounting for the variance and correlation of the data (via the inverse covariance matrix)

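A minimal sketch of this idea, assuming NumPy is available; `mahalanobis` is an illustrative helper, and the dataset is made up for the example:

```python
import numpy as np

def mahalanobis(x, y, data):
    # the dataset's covariance matrix captures variance and correlation;
    # its inverse reweights each direction in the quadratic form below
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # sqrt(diff^T . Sigma^-1 . diff)
    return float(np.sqrt(diff @ cov_inv @ diff))

data = np.array([[1, 2], [2, 3], [3, 5], [4, 4], [5, 6]], dtype=float)
print(mahalanobis((1, 2), (3, 5), data))
```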
9
Q

Standardization

A

Transforms data to mean = 0 and std. dev. = 1 (z-score)

10
Q

Normalization

A

Scales a variable to take values between 0 and 1

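The two transformations above can be sketched in plain Python; `standardize` and `min_max` are illustrative names:

```python
from statistics import mean, pstdev

def standardize(values):
    # z-score: subtract the mean, divide by the standard deviation
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def min_max(values):
    # rescale so the smallest value maps to 0 and the largest to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [10, 20, 30, 40, 50]
print(standardize(data))  # mean 0, std. dev. 1
print(min_max(data))      # [0.0, 0.25, 0.5, 0.75, 1.0]
```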
11
Q

When should you standardize features before computing distances?

A

If the features' scales differ significantly

12
Q

One Hot Encoding

A

Technique to convert categorical data into a binary matrix (numerical representation)

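A minimal sketch of one-hot encoding; the `one_hot` helper and the color categories are illustrative:

```python
def one_hot(values):
    # each category becomes one column; each row is a binary vector
    # with a single 1 in that category's position
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# columns are blue, green, red (sorted order)
print(one_hot(["red", "green", "red", "blue"]))
# [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```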
13
Q

What happens if we don't scale?

A

Some algorithms will converge slowly
Features with large magnitudes will dominate the distance calculations

14
Q

In one hot encoding, each category is represented as

A

a binary vector with a single 1 and all other positions as 0

15
Q

When we scale, do we lose geometric representation of each point with respect to its neighbors?

A

No; the relative positions of the points are preserved

16
Q

Should we scale with decision trees?

A

No. Decision trees don't need scaling, and scaling loses the original predictor values, which makes the decisions less explainable

17
Q

Scale then split or split then scale?

A

Split then scale, so that you don't leak information from the test set into the training process
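A minimal sketch of the split-then-scale order, assuming z-score scaling; the data values are made up for illustration:

```python
from statistics import mean, pstdev

data = [12, 15, 11, 18, 20, 14, 90, 13]
train, test = data[:6], data[6:]

# fit the scaling parameters on the training split only
m, s = mean(train), pstdev(train)
train_scaled = [(v - m) / s for v in train]
# reuse the SAME training parameters on the test split, so no
# test-set statistics ever influence the transformation
test_scaled = [(v - m) / s for v in test]

print(train_scaled)
print(test_scaled)
```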

18
Q

Discretization

A

Sort numbers, create split points, map split values to discrete categorical variables
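One common variant of these steps is equal-width binning; a minimal sketch (`discretize` and the bin labels are illustrative):

```python
def discretize(values, n_bins):
    # create equal-width split points over the observed range,
    # then map each value to a categorical bin label
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the max into the last bin
        bins.append(f"bin{b}")
    return bins

print(discretize([1, 5, 9, 10], 2))  # ['bin0', 'bin0', 'bin1', 'bin1']
```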

19
Q

Standardization is essentially

A

Feature Scaling

20
Q

Standardization formula

A

Replace each value with its z-score, (x - mean) / std. dev., giving mean 0 and std. dev. 1

21
Q

Mean Normalization

A

Rescales values to the range [-1, 1] with mean = 0: (x - mean) / (max - min)

22
Q

Min-max scaling

A

Rescales values to the range [0, 1]: (x - min) / (max - min)
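A minimal sketch of mean normalization from the card above; the `mean_normalize` name is illustrative:

```python
def mean_normalize(values):
    # (x - mean) / (max - min): results lie in [-1, 1] with mean 0
    m = sum(values) / len(values)
    lo, hi = min(values), max(values)
    return [(v - m) / (hi - lo) for v in values]

print(mean_normalize([10, 20, 30, 40, 50]))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```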