vDBMS Flashcards by Luciano Imbimbo

What are Vector DBMS specialized for?

Handling high-dimensional vector data.

How well did you know this?

Not at all

Perfectly

Why are embeddings important in Vector DBMS?

They mathematically represent objects or concepts in a high-dimensional vector space.

What types of machine learning data can embeddings represent?

Text, images, audio, and graphs.

How do Vector DBMS find related items?

By measuring distances between vectors (similarity scores).

What is the function used to define similarity in Vector DBMS?

f: ℝ^D × ℝ^D → ℝ, which maps a pair of D-dimensional vectors to a similarity score.
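
The card gives only the signature of f. A minimal sketch of three common choices, assuming vectors are plain Python lists (the helper names are illustrative, not from any particular system). Note that Euclidean distance is a dissimilarity: a smaller value means more similar.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product: larger means more similar."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Dot product of the length-normalized vectors; lies in [-1, 1]."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def euclidean_distance(x: Vector, y: Vector) -> float:
    """L2 distance: a dissimilarity, so smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


q, v = [1.0, 0.0], [1.0, 1.0]
print(dot(q, v))                           # 1.0
print(round(cosine_similarity(q, v), 4))   # 0.7071
print(round(euclidean_distance(q, v), 4))  # 1.0
```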

Name the four mathematical properties of a metric.

Identity, positivity, symmetry, and triangle inequality.
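
Written out for a distance function d: ℝ^D × ℝ^D → ℝ and any x, y, z ∈ ℝ^D, the four properties are:

```latex
\begin{aligned}
&\text{Identity:}            && d(x, y) = 0 \iff x = y \\
&\text{Positivity:}          && d(x, y) \ge 0 \\
&\text{Symmetry:}            && d(x, y) = d(y, x) \\
&\text{Triangle inequality:} && d(x, z) \le d(x, y) + d(y, z)
\end{aligned}
```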

Why is the dot product used despite not being a formal metric?

It is simple, computationally efficient, and differentiable; on normalized (unit-length) vectors it equals cosine similarity and ranks neighbors in the same order as Euclidean distance.
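
A small sketch of both halves of this card, using plain lists and illustrative helper names. The dot product breaks the metric properties (a vector's score with itself is not 0, and another vector can score higher than the vector itself). On unit vectors, however, ‖u − v‖² = 2 − 2·(u · v), so ranking by largest dot product gives the same order as ranking by smallest Euclidean distance.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalize(x):
    """Scale x to unit length (assumes x is not the zero vector)."""
    norm = math.sqrt(dot(x, x))
    return [a / norm for a in x]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [2.0, 0.0]
# Identity fails: dot(x, x) is 4, not 0, and [3, 0] scores higher than x itself.
print(dot(x, x))           # 4.0
print(dot(x, [3.0, 0.0]))  # 6.0

# On unit vectors, dot-product ranking matches Euclidean-distance ranking.
q = normalize([1.0, 2.0])
docs = [normalize(v) for v in ([2.0, 1.0], [1.0, 3.0], [-1.0, 0.5])]
by_dot = sorted(range(len(docs)), key=lambda i: -dot(q, docs[i]))
by_l2 = sorted(range(len(docs)), key=lambda i: euclidean(q, docs[i]))
print(by_dot == by_l2)     # True
```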

What is the ‘curse of dimensionality’?

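The phenomenon where, as the number of dimensions grows, data points tend to be far apart and their distances to a query become nearly equal, so the nearest neighbor is barely closer than the farthest one.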
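
A quick simulation, assuming points drawn uniformly from the unit hypercube (a hypothetical setup, not data from the source). It measures the relative contrast (farthest − nearest) / nearest from a random query; the value shrinks as the dimension grows, which is what makes "nearest" less informative.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from one random query
    to n_points drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```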

What does query alignment mean in Vector DBMS?

Matching a query to the documents that are relevant to it, even when the query and the document don't share the same words.

How do KD-trees partition vector space?

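By recursively splitting the space with axis-aligned hyperplanes (one dimension per level, typically at the median), so a search can skip regions that cannot contain vectors close to the query.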
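
A minimal 2-D sketch of the idea, assuming the classic construction (cycle through the axes, split at the median point) and exact 1-nearest-neighbor search with pruning. Names are illustrative; production KD-trees differ in details such as leaf bucket size.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split on one axis per level (cycling through axes), at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Exact 1-NN search that prunes subtrees which cannot beat the best so far."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2))[1])  # (8, 1)
```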

What are the drawbacks of KD-trees in high-dimensional data?

Search degrades toward a linear scan as dimensionality grows, the tree becomes imbalanced under data drift, and recall is low for queries near leaf borders.

How does Locality-Sensitive Hashing (LSH) work?

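By using hash functions under which similar vectors land in the same bucket with high probability, so an approximate nearest neighbor search only compares the query against the vectors in its bucket(s).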
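
The card does not name a hash family; one common choice, swapped in here for illustration, is random-hyperplane hashing for cosine similarity. Vectors separated by a small angle fall on the same side of most random hyperplanes and therefore tend to share a bit signature. A minimal single-table sketch with hypothetical helper names:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian normal vectors; each one contributes one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(v: Sequence[float], planes: List[List[float]]) -> Signature:
    """Bit i records which side of hyperplane i the vector falls on."""
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)


def build_index(vectors: Dict[str, Sequence[float]],
                planes: List[List[float]]) -> Dict[Signature, List[str]]:
    buckets: Dict[Signature, List[str]] = defaultdict(list)
    for key, v in vectors.items():
        buckets[signature(v, planes)].append(key)
    return buckets


def candidates(query: Sequence[float], buckets: Dict[Signature, List[str]],
               planes: List[List[float]]) -> List[str]:
    """Approximate search: only vectors sharing the query's bucket are returned."""
    return buckets.get(signature(query, planes), [])


planes = make_hyperplanes(dim=3, n_bits=4)
vectors = {"a": [1.0, 0.9, 0.0], "b": [1.0, 1.0, 0.1], "c": [-1.0, -1.0, 0.0]}
index = build_index(vectors, planes)
print(candidates([1.0, 1.0, 0.0], index, planes))
```

Real deployments typically use several such tables and re-rank the candidates with the exact similarity function.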

What is the trade-off of LSH?

Ensuring high precision requires many hash tables, which means a high storage cost.
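
Assuming the standard LSH amplification scheme (not spelled out in the card): each table concatenates k hash functions that individually collide for a true neighbor with probability p, and L independent tables are queried. A neighbor is then found with probability 1 − (1 − p^k)^L, while index size grows linearly with L. A sketch of that arithmetic:

```python
def collision_probability(p: float, k: int, n_tables: int) -> float:
    """Chance that a true neighbor shares a bucket with the query in at least
    one of n_tables tables, each using k concatenated hash functions that
    individually collide with probability p."""
    return 1.0 - (1.0 - p ** k) ** n_tables


# More tables raise the hit probability, but storage grows with n_tables.
for n_tables in (1, 5, 10, 20):
    print(n_tables, round(collision_probability(0.9, 8, n_tables), 3))
```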

How does quantization compress vectors?

By replacing each vector with the ID of its nearest centroid in a codebook that is learned in advance (e.g., with k-means).
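
A minimal sketch, assuming plain Lloyd's k-means to learn the codebook and Euclidean distance for assignment (helper names are hypothetical). Each vector is stored as one small integer code, and decoding returns its centroid, so storage shrinks at the cost of reconstruction error.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def encode(v: Sequence[float], codebook: List[Vector]) -> int:
    """Compress v to the index of its nearest centroid."""
    return min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))


def decode(code: int, codebook: List[Vector]) -> Vector:
    """Lossy reconstruction: the centroid stands in for the original vector."""
    return codebook[code]


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; the resulting centroids form the codebook."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            clusters[encode(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.0, 0.0]]
codebook = kmeans(data, k=3)
codes = [encode(v, codebook) for v in data]
print(codes)
```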

What is a drawback of quantization in vector databases?

The codebook is susceptible to data drift (it can go stale as the data changes), and the compression error comes with no formal guarantees.

What are k-Nearest Neighbor graphs (kNNG)?

Graphs in which each vector is a node linked directly to its nearest neighbors; a search walks the graph from node to node toward the query.
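
A minimal sketch, assuming brute-force O(n²) graph construction and a single greedy walk from a fixed entry point (function names are hypothetical). Practical graph indexes build the graph more cleverly and search with a candidate list rather than one walk, but the core move is the same: hop to whichever neighbor is closer to the query.

```python
import math
from typing import Dict, List, Sequence


def build_knn_graph(points: List[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Link every point to its k nearest other points (brute force)."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_search(graph: Dict[int, List[int]], points: List[Sequence[float]],
                  query: Sequence[float], start: int = 0) -> int:
    """Move to the neighbor closest to the query until no neighbor is closer.
    Approximate: the walk can stop at a local minimum."""
    current = start
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, points[j]))
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current
        current = best


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1)]
g = build_knn_graph(pts, k=2)
print(greedy_search(g, pts, query=(4, 0.9)))  # 5
```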

What is DiskANN?

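A hybrid memory/disk solution combining the Vamana graph algorithm with Product Quantization (PQ): the graph and full-precision vectors reside on SSD, while PQ-compressed vectors are kept in memory.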
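
The Product Quantization half of this idea can be sketched as follows, assuming plain k-means codebooks and Euclidean distance. The class and method names are hypothetical and this is not DiskANN's code. Each vector is split into m sub-vectors, each sub-vector is replaced by a centroid ID from its own small codebook, and distances are estimated from those compact codes.

```python
import math
import random
from typing import List, Sequence


def kmeans(data: List[List[float]], k: int, iters: int = 15, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means returning k centroids."""
    rng = random.Random(seed)
    cents = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        groups: List[List[List[float]]] = [[] for _ in range(k)]
        for v in data:
            groups[min(range(k), key=lambda j: math.dist(v, cents[j]))].append(v)
        cents = [[sum(c) / len(g) for c in zip(*g)] if g else cents[j]
                 for j, g in enumerate(groups)]
    return cents


class ProductQuantizer:
    """Quantize each of m sub-vectors against its own codebook of k centroids;
    with k <= 256 every code fits in one byte."""

    def __init__(self, data: List[List[float]], m: int, k: int, seed: int = 0):
        self.m, self.sub = m, len(data[0]) // m  # assumes D is divisible by m
        self.books = [kmeans([v[i * self.sub:(i + 1) * self.sub] for v in data], k, seed=seed)
                      for i in range(m)]

    def split(self, v: Sequence[float]) -> List[Sequence[float]]:
        return [v[i * self.sub:(i + 1) * self.sub] for i in range(self.m)]

    def encode(self, v: Sequence[float]) -> List[int]:
        return [min(range(len(book)), key=lambda j: math.dist(s, book[j]))
                for s, book in zip(self.split(v), self.books)]

    def approx_distance(self, query: Sequence[float], codes: List[int]) -> float:
        """Asymmetric distance: the exact query vs. the centroids named by codes."""
        return math.sqrt(sum(math.dist(s, book[c]) ** 2
                             for s, book, c in zip(self.split(query), self.books, codes)))


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(100)]
pq = ProductQuantizer(data, m=4, k=16)
codes = [pq.encode(v) for v in data]
query = data[0]
print(min(range(len(data)), key=lambda i: pq.approx_distance(query, codes[i])))  # 0
```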

What is SPANN?

A hybrid memory/disk solution based on clustering: the cluster centroids are kept in memory and the posting lists of vectors are stored on disk, reducing in-memory cost.

What is the advantage of hybrid search techniques?

They combine similarity search with exact matching to support complex queries.

How does pre-filtering work in hybrid search?

By using exact match predicates to reduce the candidate set before similarity search.

How does post-filtering work in hybrid search?

By performing similarity search first, then filtering results based on exact match predicates.
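
Pre-filtering and post-filtering, sketched with brute-force similarity search standing in for the vector index and a hypothetical `lang` attribute as the exact-match predicate. The example shows the usual trade-off: post-filtering can return fewer than k results when most of the fetched neighbors fail the predicate.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Item = Tuple[Sequence[float], Dict[str, object]]  # (vector, attributes)
Predicate = Callable[[Dict[str, object]], bool]


def top_k(query: Sequence[float], items: List[Item], k: int) -> List[int]:
    """Exact similarity search: indices of the k closest vectors."""
    return heapq.nsmallest(k, range(len(items)), key=lambda i: math.dist(query, items[i][0]))


def pre_filter(query: Sequence[float], items: List[Item], k: int, pred: Predicate) -> List[int]:
    """Apply the exact-match predicate first, then search only the survivors."""
    allowed = [i for i, (_, attrs) in enumerate(items) if pred(attrs)]
    return heapq.nsmallest(k, allowed, key=lambda i: math.dist(query, items[i][0]))


def post_filter(query: Sequence[float], items: List[Item], k: int,
                pred: Predicate, fetch: int) -> List[int]:
    """Search first (fetching `fetch` hits), then drop hits failing the predicate."""
    return [i for i in top_k(query, items, fetch) if pred(items[i][1])][:k]


def is_en(attrs: Dict[str, object]) -> bool:
    return attrs["lang"] == "en"


items = [([0.0, 0.0], {"lang": "en"}), ([0.1, 0.0], {"lang": "de"}),
         ([0.3, 0.0], {"lang": "de"}), ([5.0, 5.0], {"lang": "en"})]
print(pre_filter([0.1, 0.0], items, 2, is_en))           # [0, 3]
print(post_filter([0.1, 0.0], items, 2, is_en, fetch=2))  # [0]
```

Fetching more candidates before filtering (a larger `fetch`) recovers the missing result here, at the cost of a larger search.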

What are native VDBMS systems?

Systems specifically designed for vector data, like Milvus and Pinecone.

What are extended VDBMS systems?

Extensions of existing NoSQL or relational databases with vector search, like Elasticsearch.

What is a key feature of Vespa.ai in hybrid search?

It combines in-memory HNSW with centroids for efficiency.

Name three types of queries supported by Vector DBMS.

Exact nearest neighbor (NN) search queries, approximate nearest neighbor (ANN) search queries, and hybrid queries.

(25 cards)