vDBMS Flashcards
What are Vector DBMS specialized for?
Storing, indexing, and searching high-dimensional vector data.
Why are embeddings important in Vector DBMS?
They mathematically represent objects or concepts in a high-dimensional vector space.
What types of machine learning data can embeddings represent?
Text, images, audio, and graphs.
How do Vector DBMS find related items?
By measuring distances between vectors (similarity scores).
What is the function used to define similarity in Vector DBMS?
f : R^D × R^D → R, which takes two D-dimensional vectors and outputs a similarity score.
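A minimal sketch of three common choices for f, assuming vectors are plain Python lists of floats. The function names are illustrative and not tied to any particular Vector DBMS.

```python
import math

def dot(a, b):
    # Dot product: larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Euclidean distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b, in [-1, 1].
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Each function maps a pair of D-dimensional vectors to a single real number, which matches the signature on the card.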
Name the four mathematical properties of a metric.
Identity (d(x, x) = 0), positivity (d(x, y) > 0 for x ≠ y), symmetry (d(x, y) = d(y, x)), and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)).
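The four properties can be spot-checked numerically on a small set of distinct points. The sketch below is only an empirical check on sample data, not a proof, and `check_metric_properties` is a hypothetical helper name.

```python
import itertools
import math

def euclidean(a, b):
    return math.dist(a, b)

def check_metric_properties(dist, points, tol=1e-9):
    # `points` must be pairwise distinct for the positivity check to make sense.
    results = {"identity": True, "positivity": True,
               "symmetry": True, "triangle": True}
    for x in points:
        if abs(dist(x, x)) > tol:            # d(x, x) = 0
            results["identity"] = False
    for x, y in itertools.permutations(points, 2):
        if dist(x, y) <= 0:                  # d(x, y) > 0 for x != y
            results["positivity"] = False
        if abs(dist(x, y) - dist(y, x)) > tol:   # d(x, y) = d(y, x)
            results["symmetry"] = False
    for x, y, z in itertools.permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:   # triangle inequality
            results["triangle"] = False
    return results

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
report = check_metric_properties(euclidean, points)
```

Passing the plain dot product as `dist` fails the identity check on these points, since the dot product of a non-zero vector with itself is not zero.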
Why is the dot product used despite not being a formal metric?
It is simple, computationally efficient, and differentiable, and on normalized (unit-length) vectors it equals cosine similarity.
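A short sketch of why normalization matters: once both vectors are scaled to unit length, the dot product and cosine similarity give the same score. The example vectors are arbitrary.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale v to unit length (assumes v is not the zero vector).
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
score_dot = dot(normalize(a), normalize(b))
score_cos = cosine_similarity(a, b)
```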
What is the ‘curse of dimensionality’?
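The phenomenon where data points tend to be far apart in high-dimensional spaces, so distances to the nearest and farthest neighbors become nearly equal and less useful for ranking.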
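One common way to illustrate this effect is to measure how much the farthest point differs from the nearest one, relative to the nearest distance. The sketch below assumes points drawn uniformly from the unit cube; `relative_contrast` is an illustrative name.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    # (farthest - nearest) / nearest distance from a random query point.
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
```

In this setup the contrast shrinks sharply as `dim` grows, which is why nearest-neighbor rankings become less discriminative in high dimensions.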
What does query alignment mean in Vector DBMS?
Aligning queries with documents, especially when the query and document don’t share the same words.
How do KD-trees partition vector space?
By recursively splitting the space along one coordinate axis at a time, producing regions that allow efficient searching for nearby vectors.
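A minimal KD-tree sketch, assuming points are tuples of equal length and the split axis cycles with depth (one common convention). Nodes are plain dictionaries to keep the example short.

```python
import math

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median point splits the region
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

The pruning test on `abs(diff)` is what saves work in low dimensions. In high dimensions it rarely succeeds, so most branches end up being visited.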
What are the drawbacks of KD-trees in high-dimensional data?
Inefficiency, imbalance due to data drift, and low recall near leaf borders.
How does Locality-Sensitive Hashing (LSH) work?
By hashing vectors so that similar ones land in the same bucket with high probability, which enables approximate nearest neighbor search.
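A sketch of one LSH family, random-hyperplane hashing, chosen here as an assumption since the card does not name a specific scheme. Each bit of the key records which side of a random hyperplane the vector falls on.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    # One random normal vector per hash bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vector, hyperplanes):
    # Bit i is 1 if the vector lies on the non-negative side of hyperplane i.
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0)
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    # Group vector ids by bucket key; a query scans only its own bucket.
    buckets = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[lsh_key(v, hyperplanes)].append(i)
    return buckets

planes = make_hyperplanes(dim=4, n_bits=8)
vectors = [[0.5, -1.0, 2.0, 0.25], [1.0, -2.0, 4.0, 0.5], [-0.5, 1.0, -2.0, -0.25]]
index = build_index(vectors, planes)
```

Vectors pointing in the same direction always share a key, while vectors at a small angle share it with high probability. That is the source of the approximation.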
What is the trade-off of LSH?
High precision comes at a high storage cost.
How does quantization compress vectors?
By replacing each vector with the index of its nearest centroid in a predefined codebook (e.g., one learned with k-means).
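A minimal sketch of codebook quantization. The codebook values below are made up for illustration; in practice they would be learned, for example with k-means.

```python
import math

def encode(vector, codebook):
    # Store only the index of the nearest centroid.
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def decode(code, codebook):
    # Reconstruction is the centroid itself, so some detail is lost.
    return codebook[code]

codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]   # hypothetical centroids
vectors = [(0.1, -0.2), (0.9, 1.2), (4.7, 5.1)]

codes = [encode(v, codebook) for v in vectors]
errors = [math.dist(v, decode(c, codebook)) for v, c in zip(vectors, codes)]
```

Each vector shrinks to a single small integer, and `errors` holds the reconstruction error that this compression introduces.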
What is a drawback of quantization in vector databases?
Susceptibility to data drift and lack of error guarantees.