VDBMS Flashcards

1
Q

What are Vector DBMS specialized for?

A

Handling high-dimensional vector data.

2
Q

Why are embeddings important in Vector DBMS?

A

They mathematically represent objects or concepts in a high-dimensional vector space.

3
Q

What types of machine learning data can embeddings represent?

A

Text, images, audio, and graphs.

4
Q

How do Vector DBMS find related items?

A

By measuring distances between vectors (similarity scores).

5
Q

What is the function used to define similarity in Vector DBMS?

A

f : R^D x R^D -> R, which outputs a similarity score.
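
As an illustration, three common choices for f can be sketched in plain Python. The function names and the sample vectors are assumptions made for this example, not part of the original card.

```python
import math

def dot(u, v):
    """Dot product: larger means more similar."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v, in [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

q = [1.0, 0.0]  # hypothetical query vector
x = [3.0, 4.0]  # hypothetical stored vector
print(dot(q, x))                # 3.0
print(cosine_similarity(q, x))  # 0.6
print(euclidean_distance(q, x))
```

Each function takes two D-dimensional vectors and returns one real number, matching the signature on the card.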

6
Q

Name the four mathematical properties of a metric.

A

Non-negativity, identity (d(x, y) = 0 only when x = y), symmetry, and the triangle inequality.
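
A rough way to test these properties is to look for a violation on a small sample of points. The helper below, `first_violation`, is a hypothetical name introduced for this sketch. A sample can only refute a candidate distance, never prove that it is a metric.

```python
import itertools
import math

def first_violation(d, points, tol=1e-9):
    """Return the name of the first metric property that d breaks on points, or None."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < -tol:
            return "non-negativity"
        if (d(x, y) <= tol) != (x == y):
            return "identity"
        if abs(d(x, y) - d(y, x)) > tol:
            return "symmetry"
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return "triangle inequality"
    return None

def squared_euclidean(x, y):
    return math.dist(x, y) ** 2

points = [(0.0,), (1.0,), (2.0,)]
print(first_violation(math.dist, points))          # None
print(first_violation(squared_euclidean, points))  # triangle inequality
```

Squared Euclidean distance fails here because d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2.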

7
Q

Why is the dot product used despite not being a formal metric?

A

It is simple, computationally efficient, and differentiable, and on unit-normalized vectors it equals cosine similarity.
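
The link between the dot product and normalized vectors can be checked numerically. This is a minimal sketch with made-up vectors. `normalize` is a helper defined only for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

u = [3.0, 4.0]
v = [4.0, 3.0]

# Cosine similarity computed directly from the definition: 24 / 25.
cosine = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Plain dot product after normalizing both vectors once up front.
unit_dot = dot(normalize(u), normalize(v))

print(abs(cosine - unit_dot) < 1e-12)  # True
```

If vectors are normalized when they are stored, each comparison at query time needs only the cheap dot product.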

8
Q

What is the ‘curse of dimensionality’?

A

The phenomenon where data points in high-dimensional spaces tend to be far apart and nearly equidistant, so distance comparisons become less informative.
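
One way to observe the effect is to measure how much farther the farthest point is than the nearest one as the dimension grows. The sketch below assumes uniformly random points in the unit cube. `relative_contrast` and its parameters are names chosen for this example.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should shrink as the dimension increases, which is why a "nearest" neighbor carries less meaning in very high dimensions.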

9
Q

What does query alignment mean in Vector DBMS?

A

Matching queries to relevant documents even when the query and the document don’t share the same words.

10
Q

How do KD-trees partition vector space?

A

By dividing the space into regions to allow efficient searching for nearby vectors.
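
A minimal KD-tree sketch, assuming a median split that cycles through the axes. The dict-based node layout and the function names are illustrative choices, not a reference implementation.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median along one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Exact nearest neighbor search with pruning of far-away regions."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Cross the splitting plane only if a closer point could lie beyond it.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

The pruning test is what makes the search efficient in low dimensions. In high dimensions it rarely succeeds, so most regions end up being visited.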

11
Q

What are the drawbacks of KD-trees in high-dimensional data?

A

Inefficiency, imbalance due to data drift, and low recall near leaf borders.

12
Q

How does Locality-Sensitive Hashing (LSH) work?

A

By hashing vectors so that similar vectors land in the same bucket with high probability, which enables approximate nearest neighbor search.
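
One well-known LSH family for angular similarity uses random hyperplanes: each hash bit records which side of a hyperplane a vector falls on. The sketch below assumes that scheme. The function names, the seed, and the sample vectors are made up for illustration.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0)
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group vector IDs by signature; each distinct signature is a bucket."""
    buckets = {}
    for i, v in enumerate(vectors):
        buckets.setdefault(signature(v, hyperplanes), []).append(i)
    return buckets

planes = make_hyperplanes(dim=3, n_bits=8)
vectors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
index = build_index(vectors, planes)
# Vectors 0 and 1 point in the same direction, so they share a bucket.
print(index[signature(vectors[0], planes)])  # [0, 1]
```

At query time only the bucket matching the query's signature is scanned, which is what makes the search approximate.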

13
Q

What is the trade-off of LSH?

A

Ensuring precision requires many hash tables, and therefore a high storage cost.

14
Q

How does quantization compress vectors?

A

By assigning each vector to its nearest centroid in a predefined codebook (e.g., one learned with k-means) and storing the centroid's ID instead of the full vector.
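
The encode and decode steps can be sketched as follows. The three-entry codebook is a hand-picked assumption for this example. In practice it would be learned from the data.

```python
import math

# Hypothetical codebook; a real one would come from clustering the dataset.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

def encode(vector, codebook):
    """Return the index of the nearest centroid (the compressed code)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def decode(code, codebook):
    """Return the centroid that stands in for the original vector."""
    return codebook[code]

code = encode((0.9, 1.2), codebook)
print(code)                    # 1
print(decode(code, codebook))  # (1.0, 1.0)
```

The gap between the original vector and its decoded centroid is the quantization error, which grows if the data drifts away from the codebook.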

15
Q

What is a drawback of quantization in vector databases?

A

Susceptibility to data drift and lack of error guarantees.

16
Q

What are k-Nearest Neighbor graphs (kNNG)?

A

Graphs that directly index nearest neighbors for efficient search.

17
Q

What is DiskANN?

A

A hybrid solution combining the Vamana graph algorithm with Product Quantization.

18
Q

What is SPANN?

A

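A hybrid memory-disk solution built on an inverted index with hierarchical clustering: only the cluster centroids are kept in memory, while the posting lists are stored on disk to reduce in-memory cost.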

19
Q

What is the advantage of hybrid search techniques?

A

They combine similarity search with exact matching to support complex queries.

20
Q

How does pre-filtering work in hybrid search?

A

By using exact match predicates to reduce the candidate set before similarity search.

21
Q

How does post-filtering work in hybrid search?

A

By performing similarity search first, then filtering results based on exact match predicates.
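
Pre-filtering and post-filtering can be compared side by side on a toy dataset. This sketch uses brute-force search for clarity. The item fields, the function names, and the predicate are assumptions made for the example.

```python
import math

items = [
    {"id": "a", "vec": (0.0, 0.0), "color": "red"},
    {"id": "b", "vec": (1.0, 0.0), "color": "blue"},
    {"id": "c", "vec": (2.0, 0.0), "color": "red"},
    {"id": "d", "vec": (3.0, 0.0), "color": "blue"},
]

def top_k(candidates, query, k):
    """Brute-force similarity search: the k candidates closest to the query."""
    return sorted(candidates, key=lambda it: math.dist(query, it["vec"]))[:k]

def pre_filter_search(items, query, k, predicate):
    """Apply the exact-match predicate first, then search the survivors."""
    return top_k([it for it in items if predicate(it)], query, k)

def post_filter_search(items, query, k, predicate):
    """Search everything first, then drop results that fail the predicate."""
    return [it for it in top_k(items, query, k) if predicate(it)]

def is_red(item):
    return item["color"] == "red"

query = (0.0, 0.0)
print([it["id"] for it in pre_filter_search(items, query, 2, is_red)])   # ['a', 'c']
print([it["id"] for it in post_filter_search(items, query, 2, is_red)])  # ['a']
```

Post-filtering can return fewer than k results, as it does here, because some of the nearest neighbors fail the predicate.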

22
Q

What are native VDBMS systems?

A

Systems specifically designed for vector data, like Milvus and Pinecone.

23
Q

What are extended VDBMS systems?

A

Extensions of existing NoSQL or relational databases with vector search, like Elasticsearch.

24
Q

What is a key feature of Vespa.ai in hybrid search?

A

It combines in-memory HNSW with centroids for efficiency.

25
Q

Name three types of queries supported by Vector DBMS.

A

Exact nearest-neighbor (NN) search queries, approximate nearest-neighbor (ANN) search queries, and hybrid queries.