Week 3 Flashcards
Describe several properties of Vector Similarity.
- The more non-empty dimensions two vectors share, the more similar they are.
- The closer the values are in each shared dimension, the greater the similarity.
- Must handle real values.
- Normalization is often required.
- Vector dimensions must be aligned to calculate similarity.
How is the similarity dot product defined?
Given X = [x1, x2, …, xp] and Y = [y1, y2, …, yp]:
sim(X, Y) = sum_{i=1..p} xi * yi
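A minimal Python sketch (not part of the original card) of this definition, assuming the vectors are already dimension-aligned:

```python
# Dot-product similarity of two equal-length, dimension-aligned vectors.
def dot_similarity(x, y):
    assert len(x) == len(y), "vector dimensions must be aligned"
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot_similarity([1.0, 2.0, 3.0], [4.0, 0.0, 1.0]))  # 1*4 + 2*0 + 3*1 = 7.0
```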
How is Manhattan Distance defined?
d(X, Y) = sum_{i=1..p} |xi - yi|
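A quick Python sketch (my own illustration, not from the card):

```python
# Manhattan (L1) distance: sum of absolute per-dimension differences.
def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

print(manhattan([1.0, 2.0, 3.0], [4.0, 0.0, 1.0]))  # |1-4| + |2-0| + |3-1| = 7.0
```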
How is Euclidean Distance defined?
d(X, Y) = sqrt( sum_{i=1..p} (xi - yi)^2 )
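A quick Python sketch (my own illustration, not from the card):

```python
import math

# Euclidean (L2) distance: square root of the sum of squared differences.
def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([0.0, 0.0], [3.0, 4.0]))  # sqrt(9 + 16) = 5.0
```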
What are the advantages of Manhattan or Euclidean distance measures?
1. Symmetry: d(X, Y) = d(Y, X)
2. Non-negativity: d(X, Y) >= 0
3. Identity: d(X, X) = 0
4. Triangle inequality: d(X, Y) <= d(X, Z) + d(Z, Y)
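A small sanity check of these four properties in Python (my own example vectors, not from the card):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

X, Y, Z = [0.0, 0.0], [3.0, 4.0], [1.0, 1.0]
assert euclidean(X, Y) == euclidean(Y, X)                     # symmetry
assert euclidean(X, Y) >= 0                                   # non-negativity
assert euclidean(X, X) == 0                                   # identity
assert euclidean(X, Y) <= euclidean(X, Z) + euclidean(Z, Y)   # triangle inequality
```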
What are some disadvantages of Manhattan and Euclidean distance measures?
1. Vectors may not be normalized.
2. Vectors with larger values are likely to yield larger distance values.
3. The dot product suffers from the same issues.
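A tiny demonstration of the scale issue (my own example, not from the card): scaling one vector by 10 multiplies the dot-product score by 10 even though the direction is unchanged.

```python
def dot_similarity(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
print(dot_similarity(x, [1.0, 2.0, 3.0]))     # 14.0
print(dot_similarity(x, [10.0, 20.0, 30.0]))  # 140.0 -- same direction, 10x the score
```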
What are the properties of Cosine Similarity?
- Cos-Sim measures the angle between two vectors, not the magnitude.
- The smaller the angle, the greater the similarity.
- Cos-Sim is the normalized dot product.
What is the equation for Cosine Similarity?
cos(X, Y) = sum_{i=1..p} xi * yi / ( sqrt( sum_{i=1..p} xi^2 ) * sqrt( sum_{i=1..p} yi^2 ) )
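A minimal Python sketch of this equation (my own illustration, not from the card): the dot product divided by the product of the two vector norms.

```python
import math

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

print(cosine([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # ~1.0 -- same direction, magnitude ignored
```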
What are the advantages of Cosine Similarity?
- Symmetric
- Bounded in [-1, 1]
- Non-negative vectors result in cos >= 0.
- Orthogonal vectors result in cos = 0; this usually indicates no relation.
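A short check of these bounds and the orthogonal case (my own example vectors, not from the card):

```python
import math

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (math.sqrt(sum(xi ** 2 for xi in x)) * math.sqrt(sum(yi ** 2 for yi in y)))

print(cosine([1.0, 0.0], [0.0, 1.0]))    # 0.0   -- orthogonal, no relation
print(cosine([1.0, 2.0], [-1.0, -2.0]))  # ~-1.0 -- opposite direction (lower bound)
print(cosine([1.0, 2.0], [2.0, 4.0]))    # ~1.0  -- same direction (upper bound)
```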
What are the benefits of the Pearson Correlation Coefficient?
- Measures the linear correlation between two vectors.
- Computed by subtracting the mean from each vector (centering on the mean), then applying Cosine Similarity.
- Closely related to Cosine Similarity.
- The mean of a vector indicates its overall trend.
- Useful if dimensions are homogeneous (e.g. rating of some product)
- Not meaningful if dimensions are heterogeneous (e.g. height vs weight)
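A minimal Python sketch of Pearson as the cosine similarity of mean-centered vectors (my own rating vectors used as example data, not from the card), cross-checked against NumPy:

```python
import math
import numpy as np

def pearson(x, y):
    cx = [xi - sum(x) / len(x) for xi in x]   # center x on its mean
    cy = [yi - sum(y) / len(y) for yi in y]   # center y on its mean
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / (math.sqrt(sum(a ** 2 for a in cx)) * math.sqrt(sum(b ** 2 for b in cy)))

ratings_a = [5.0, 3.0, 4.0, 1.0]
ratings_b = [4.0, 2.0, 5.0, 1.0]
print(pearson(ratings_a, ratings_b))            # ~0.855
print(np.corrcoef(ratings_a, ratings_b)[0, 1])  # matches NumPy's result
```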