Distance & Scaling Flashcards

Question 1

Q

Euclidean Distance

Answer

A

Square root of the sum of the squared coordinate differences: sqrt((x2 - x1)^2 + (y2 - y1)^2)
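
A minimal sketch of the formula above, generalized to any number of coordinates. The function name is a hypothetical chosen for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, then take the square root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 3-4-5 right triangle, prints 5.0
```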

Question 2

Q

Manhattan Distance

Answer

A

Sum of the absolute differences between the coordinates of two points
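
As a small illustration of this definition (the function name is a hypothetical, not from the deck):

```python
def manhattan_distance(p, q):
    """Sum of the absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(b - a) for a, b in zip(p, q))

print(manhattan_distance((1, 2), (4, 6)))  # |4 - 1| + |6 - 2|, prints 7
```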

Question 3

Q

Minkowski Distance is

Answer

A

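A generalization of the Manhattan and Euclidean distances, controlled by an order parameter r: (sum of |xi - yi|^r)^(1/r)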

Question 4

Q

Minkowski Distance r = 1

Answer

A

Manhattan

Question 5

Q

Minkowski Distance r = 2

Answer

A

Euclidean

Question 6

Q

Minkowski Distance r = inf

Answer

A

Supremum distance (also called Chebyshev distance): the largest absolute difference along any single coordinate

Question 7

Q

As r increases in Minkowski distance

Answer

A

Gives more weight to larger differences, emphasizing outliers
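
The Minkowski cards can be tied together in one sketch. It assumes r >= 1 and treats r = infinity as the largest coordinate difference. The function name and sample points are made up for illustration.

```python
import math

def minkowski_distance(p, q, r):
    """Minkowski distance of order r (r >= 1, or math.inf for the supremum distance)."""
    diffs = [abs(b - a) for a, b in zip(p, q)]
    if math.isinf(r):
        # Limit as r grows: only the largest difference matters.
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)

p, q = (0, 0), (3, 4)
for r in (1, 2, 10, math.inf):
    # r = 1 is Manhattan, r = 2 is Euclidean, r = inf is the supremum distance.
    print(r, minkowski_distance(p, q, r))
```

For these points the result shrinks from 7 (r = 1) toward 4 (r = infinity), the largest single difference, which is what "more weight to larger differences" means in practice.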

Question 8

Q

Mahalanobis Distance

Answer

A

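Measures how far apart two points are after accounting for the variance of each feature and the covariance between features (it uses the inverse of the covariance matrix)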
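
A rough sketch for two-dimensional data only, assuming the sample covariance (dividing by n - 1) and a closed-form inverse of the 2x2 covariance matrix. All names and the sample data are hypothetical.

```python
import math

def covariance_2d(points):
    """Return (var_x, cov_xy, var_y) of a list of (x, y) pairs, using n - 1."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(p, q, points):
    """Mahalanobis distance between p and q under the covariance of `points`."""
    sxx, sxy, syy = covariance_2d(points)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # d^T S^-1 d, with S^-1 = (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)

# y is spread out twice as widely as x in this made-up sample.
sample = [(0, 0), (2, 0), (0, 4), (2, 4)]
d_along_x = mahalanobis_2d((0, 0), (2, 0), sample)
d_along_y = mahalanobis_2d((0, 0), (0, 4), sample)
print(d_along_x, d_along_y)
```

The two pairs are 2 and 4 units apart in Euclidean terms, yet their Mahalanobis distances match, because the step along y is measured against y's larger spread.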

Question 9

Q

Standardization

Answer

A

Transforms data to have mean = 0 and std. dev. = 1 (z-score)

Question 10

Q

Normalization

Answer

A

Scales a variable so that its values lie between 0 and 1

Question 11

Q

When do you want to standardize before computing distances?

Answer

A

If the scales differ significantly

Question 12

Q

One Hot Encoding

Answer

A

Technique to convert categorical data into a binary matrix (numerical representation)

Question 13

Q

What happens if we don’t scale?

Answer

A

Some algorithms (for example, those trained by gradient descent) will be slow to converge
Features with high magnitudes will dominate the distance calculations
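
To make the second point concrete, here is a small sketch with made-up people and made-up feature ranges (all names and numbers are assumptions for illustration). Before scaling, income swamps age in the distance. After min-max scaling, age counts again.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

def min_max(value, lo, hi):
    return (value - lo) / (hi - lo)

# Hypothetical people described by (age in years, income in dollars).
a, b, c = (25, 50_000), (60, 50_500), (26, 52_000)

raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed feature ranges, invented for this example.
AGE_RANGE = (18, 90)
INCOME_RANGE = (0, 200_000)

def scale(person):
    age, income = person
    return (min_max(age, *AGE_RANGE), min_max(income, *INCOME_RANGE))

scaled_ab = euclidean(scale(a), scale(b))
scaled_ac = euclidean(scale(a), scale(c))

print(raw_ab < raw_ac)        # unscaled: b looks closer to a despite a 35-year age gap
print(scaled_ac < scaled_ab)  # scaled: c, who is nearly the same age, is closer
```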

Question 14

Q

In one hot encoding, each category is represented as

Answer

A

a binary vector with a single 1 and all other positions as 0
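
A minimal sketch of one hot encoding in plain Python. It assumes the column order is simply the sorted category names. The function name is hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, matrix) where each row has a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    matrix = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this value's category
        matrix.append(row)
    return categories, matrix

categories, matrix = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(matrix)      # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```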

Question 15

Q

When we scale, do we lose geometric representation of each point with respect to its neighbors?

Answer

A

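No. Each feature is shifted and rescaled linearly, so the ordering and relative spacing of the points along each feature stay the same; only the units of the axes change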

Question 16

Q

Should we scale with decision trees?

Answer

A

No. Decision trees don’t need scaling, and scaling replaces the original values of the predictor, which would make the decisions less explainable

Question 17

Q

Scale then split or split then scale?

Answer

A

Split then scale. Fit the scaler on the training set only, so that you don’t leak information from the test set into training
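
A sketch of the order of operations, assuming z-score scaling and a plain slice as the split (a real split would usually shuffle first). Function names and data are made up for illustration.

```python
import statistics

def fit_scaler(train_values):
    """Learn the scaling parameters from the training data only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        raise ValueError("cannot scale a constant feature")
    return mean, std

def transform(values, mean, std):
    """Apply already-fitted parameters to any data set."""
    return [(v - mean) / std for v in values]

data = [10.0, 12.0, 14.0, 16.0, 18.0, 100.0]

# 1. Split first.
train, test = data[:4], data[4:]

# 2. Fit on the training set only, then reuse the same parameters for the test set.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

print(mean, std)
print(test_scaled)
```

The extreme test value 100.0 never influences the mean or standard deviation, which is the point of splitting first.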

Question 18

Q

Discretization

Answer

A

Sort the values, choose split points, and map each resulting interval to a discrete category
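
One possible sketch of the mapping step, assuming the split points are chosen by hand and that a value equal to a split point falls into the upper bin. The function name, split points, and labels are hypothetical.

```python
import bisect

def discretize(values, split_points, labels):
    """Map each numeric value to the label of the interval it falls in."""
    if len(labels) != len(split_points) + 1:
        raise ValueError("need exactly one more label than split points")
    cuts = sorted(split_points)
    # bisect_right counts how many split points are <= the value,
    # which is the index of the interval the value belongs to.
    return [labels[bisect.bisect_right(cuts, v)] for v in values]

ages = [5, 17, 18, 42, 65, 80]
print(discretize(ages, [18, 65], ["minor", "adult", "senior"]))
# ['minor', 'minor', 'adult', 'adult', 'senior', 'senior']
```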

Question 19

Q

Standardization is essentially

Answer

A

Feature Scaling

Question 20

Q

Standardization formula

Answer

A

z = (x - mean) / std. dev. Each value is replaced with its z-score, giving a mean of 0 and a std. dev. of 1
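
A minimal sketch of the z-score calculation. It assumes the population standard deviation (dividing by n); the card does not say which variant is meant. The function name is hypothetical.

```python
import statistics

def standardize(values):
    """Replace each value with its z-score: (x - mean) / std. dev."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std. dev., an assumption
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5, std. dev. is 2
print(z)
```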

Question 21

Q

Mean Normalization

Answer

A

Rescales values to lie within the range [-1, 1] with mean = 0: x' = (x - mean) / (max - min)
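
A short sketch, assuming the common definition x' = (x - mean) / (max - min). The function name is made up for illustration.

```python
def mean_normalize(values):
    """Center on the mean and divide by the range (max - min)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize a constant feature")
    mean = sum(values) / len(values)
    return [(v - mean) / (hi - lo) for v in values]

print(mean_normalize([10, 20, 30, 40, 50]))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```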

Question 22

Q

Min-max scaling

Answer

A

Rescales values to the range [0, 1]: x' = (x - min) / (max - min)
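
A short sketch, assuming the usual formula x' = (x - min) / (max - min). The function name is hypothetical.

```python
def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30, 40, 50]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```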

(22 cards)