VDBMS Flashcards
What are Vector DBMS specialized for?
Handling high-dimensional vector data.
Why are embeddings important in Vector DBMS?
They mathematically represent objects or concepts in a high-dimensional vector space.
What types of machine learning data can embeddings represent?
Text, images, audio, and graphs.
How do Vector DBMS find related items?
By measuring distances between vectors (similarity scores).
What is the function used to define similarity in Vector DBMS?
f : R^D x R^D -> R, which outputs a similarity score.
Name the four mathematical properties of a metric.
Identity (d(x, y) = 0 if and only if x = y), non-negativity, symmetry, and the triangle inequality.
Why is the dot product used despite not being a formal metric?
It is simple, computationally efficient, and differentiable, and on unit-normalized vectors it equals cosine similarity.
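The relationship between the dot product and normalized vectors can be checked numerically. The sketch below is only an illustration; the helper names (dot, normalize, cosine_similarity) and the sample vectors are assumptions, not part of any particular Vector DBMS.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical sample vectors, both of length 5.
u = [3.0, 4.0]
v = [4.0, 3.0]

raw_dot = dot(u, v)                         # depends on vector lengths
cos = cosine_similarity(u, v)               # depends only on the angle
unit_dot = dot(normalize(u), normalize(v))  # dot product after normalization
```

On the raw vectors the dot product (24.0) and the cosine similarity (0.96) differ, but after normalization the dot product matches the cosine similarity.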
What is the ‘curse of dimensionality’?
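The phenomenon where distances concentrate in high-dimensional spaces: data points tend to be far apart and nearly equidistant, so the nearest neighbor is barely closer than the farthest.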
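A small experiment can make this effect visible. The sketch below assumes uniformly random points in the unit cube and a hypothetical helper, relative_contrast, that compares the farthest and nearest distances from a random query; real embeddings are not uniform, so treat it as an illustration only.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(dim=2)      # large: the nearest point is much closer
high = relative_contrast(dim=1000)  # small: all points look about equally far
```

A small contrast in high dimensions means distance-based pruning has little to work with, which is one reason exact indexes degrade there.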
What does query alignment mean in Vector DBMS?
Mapping queries and documents into the same embedding space so that relevant documents score as similar, even when the query and document don’t share the same words.
How do KD-trees partition vector space?
By recursively splitting the space along one coordinate axis at a time, so a search can skip regions that cannot contain nearby vectors.
What are the drawbacks of KD-trees in high-dimensional data?
Inefficiency, imbalance due to data drift, and low recall near leaf borders.
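The KD-tree partitioning and search described above can be sketched in a few lines. This is a minimal illustration assuming median splits and a dictionary-based node layout (both are choices made here, not details from the cards). The pruning test in nearest is the step that stops helping in high dimensions, because the splitting plane is almost always closer than the best match found so far.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split points at the median along one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Exact nearest-neighbor search with branch pruning."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

# Hypothetical 2-D sample data.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
result = nearest(tree, (9, 2))
```

For the query (9, 2) the search returns (8, 1), the same answer a brute-force scan gives.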
How does Locality-Sensitive Hashing (LSH) work?
By hashing vectors so that similar ones land in the same bucket with high probability, which narrows approximate nearest neighbor searches to a few buckets.
What is the trade-off of LSH?
High precision requires many hash tables, which means a high storage cost.
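One common LSH family for angular similarity uses random hyperplanes; the cards do not say which family is meant, so the choice here is an assumption, as are all function names. The sketch also shows where the storage cost comes from: every extra table stores another full set of buckets.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_tables(vectors, n_tables, n_bits, dim):
    """Build n_tables independent hash tables mapping bucket key -> vector indices."""
    tables = []
    for t in range(n_tables):
        planes = make_hyperplanes(n_bits, dim, seed=t)
        buckets = {}
        for idx, vec in enumerate(vectors):
            buckets.setdefault(lsh_key(vec, planes), []).append(idx)
        tables.append((planes, buckets))
    return tables

def candidates(query, tables):
    """Union of the buckets the query hashes to, one bucket per table."""
    found = set()
    for planes, buckets in tables:
        found.update(buckets.get(lsh_key(query, planes), []))
    return found

# Hypothetical data: vectors 0 and 1 point in exactly the same direction.
vectors = [[1.0, 0.0, 0.5], [2.0, 0.0, 1.0], [-1.0, 0.2, -0.5]]
tables = build_tables(vectors, n_tables=4, n_bits=8, dim=3)
cands = candidates([1.0, 0.0, 0.5], tables)
```

Vectors 0 and 1 always share a bucket because they lie on the same side of every hyperplane. Whether vector 2 shows up depends on the random planes.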
How does quantization compress vectors?
By assigning each vector to the nearest centroid in a predefined codebook (e.g., one learned with k-means) and storing the centroid’s ID in place of the vector.
What is a drawback of quantization in vector databases?
Susceptibility to data drift and lack of error guarantees.
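Nearest-centroid assignment and the effect of data drift can be sketched as follows. The codebook values are made up for illustration (imagine them as the output of an earlier k-means run), and the function names are hypothetical.

```python
import math

def quantize(vector, codebook):
    """Return the index of the nearest centroid, which serves as the vector's code."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def reconstruct(code, codebook):
    """Approximate the original vector by its centroid."""
    return codebook[code]

# Hypothetical codebook, standing in for centroids learned on older data.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
vectors = [(0.1, -0.2), (0.9, 1.2), (4.0, 5.5)]

codes = [quantize(v, codebook) for v in vectors]
errors = [math.dist(v, reconstruct(c, codebook)) for v, c in zip(vectors, codes)]

# A vector from a region the codebook never saw still gets a code,
# but its reconstruction error is much larger.
drifted = (10.0, 10.0)
drift_error = math.dist(drifted, reconstruct(quantize(drifted, codebook), codebook))
```

Nothing in the codes themselves signals that drift_error is large, which is the practical meaning of having no error guarantee.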
What are k-Nearest Neighbor graphs (kNNG)?
Graphs that link each vector to its k nearest neighbors, so a search can hop from neighbor to neighbor toward the query.
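A toy version of such a graph and a greedy walk over it might look like this. The brute-force construction and the greedy_search routine are simplifying assumptions for illustration; production graph indexes use more elaborate construction and search procedures.

```python
import math

def build_knn_graph(points, k):
    """Brute-force kNN graph: node index -> indices of its k nearest neighbors."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph

def greedy_search(graph, points, query, start=0):
    """Move to the neighbor closest to the query until no neighbor improves."""
    current = start
    while True:
        best = min(graph[current] + [current],
                   key=lambda j: math.dist(query, points[j]))
        if best == current:
            return current
        current = best

# Hypothetical points on a line.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
graph = build_knn_graph(points, k=2)
found = greedy_search(graph, points, query=(3.9, 0.1), start=0)
```

Starting at node 0, the walk visits nodes 2 and 3 before stopping at node 4, the true nearest neighbor. In general a greedy walk can stop at a local optimum, so the result is approximate.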
What is DiskANN?
A hybrid solution combining the Vamana graph algorithm with Product Quantization.
What is SPANN?
A hybrid memory-disk solution that uses hierarchical clustering to build an inverted index, keeping only the centroids in memory and the posting lists on disk to reduce in-memory cost.
What is the advantage of hybrid search techniques?
They combine similarity search with exact matching to support complex queries.
How does pre-filtering work in hybrid search?
By using exact match predicates to reduce the candidate set before similarity search.
How does post-filtering work in hybrid search?
By performing similarity search first, then filtering results based on exact match predicates.
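The difference between the two filtering orders shows up even on a tiny dataset. The sketch below uses a brute-force top_k as a stand-in for the similarity search; the item fields, the is_shoe predicate, and all function names are hypothetical.

```python
import math

# Hypothetical items: a vector plus one exact-match attribute.
items = [
    {"id": 1, "category": "shoes", "vec": (0.9, 0.1)},
    {"id": 2, "category": "hats", "vec": (1.0, 0.0)},
    {"id": 3, "category": "shoes", "vec": (0.0, 1.0)},
    {"id": 4, "category": "hats", "vec": (0.8, 0.2)},
]

def top_k(candidates, query, k):
    """Brute-force similarity search: the k candidates closest to the query."""
    return sorted(candidates, key=lambda it: math.dist(it["vec"], query))[:k]

def pre_filter(items, query, predicate, k):
    """Apply the exact-match predicate first, then search the survivors."""
    return top_k([it for it in items if predicate(it)], query, k)

def post_filter(items, query, predicate, k):
    """Search everything first, then drop results that fail the predicate."""
    return [it for it in top_k(items, query, k) if predicate(it)]

def is_shoe(item):
    return item["category"] == "shoes"

query = (1.0, 0.0)
pre = [it["id"] for it in pre_filter(items, query, is_shoe, k=2)]
post = [it["id"] for it in post_filter(items, query, is_shoe, k=2)]
```

Here pre-filtering returns items 1 and 3, while post-filtering returns only item 1: the other top-2 hit is a hat and gets discarded, so post-filtering can return fewer than k results.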
What are native VDBMS systems?
Systems specifically designed for vector data, like Milvus and Pinecone.
What are extended VDBMS systems?
Extensions of existing NoSQL or relational databases with vector search, like Elasticsearch.
What is a key feature of Vespa.ai in hybrid search?
It combines in-memory HNSW with centroids for efficiency.
Name three types of queries supported by Vector DBMS.
Exact nearest-neighbor search queries (NN), approximate nearest-neighbor search queries (ANN), and hybrid queries.