Vector Search Flashcards

1
Q

What does FAISS stand for?

A

Facebook AI Similarity Search

2
Q

Describe the vector search problem

A

We have a set of vectors, and for some query vector, we want to find the vectors that are most similar.

3
Q

What is a Voronoi diagram?

A

It’s a division of the plane into cells, where each cell is all the space closer to one particular point than any others. It’s that point’s “territory.”

4
Q

Conceptually, if some query point falls in a Voronoi cell, what does that mean?

A

It means you know which point is closest.

5
Q

What does “probing” mean?

A

It means searching nearby cells, not just the one you landed on.

6
Q

What does IVF stand for?

A

InVerted File

7
Q

What is the IVF strategy?

A

It’s a strategy for vector search. You pick some set of key points, make a Voronoi diagram of them, and then find the Voronoi cell for your query vector and search all the points in that cell – plus maybe probe some nearby ones.
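
A minimal FAISS sketch of this strategy, on toy data (nlist and nprobe are assumed values):

```python
# A minimal FAISS IVF sketch on toy data; nlist/nprobe are assumed values.
import faiss
import numpy as np

d, nlist = 64, 100
xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)                 # used to assign vectors to cells
index = faiss.IndexIVFFlat(quantizer, d, nlist)  # one inverted list per Voronoi cell
index.train(xb)                                  # learns the "key points" (centroids)
index.add(xb)

index.nprobe = 4                                 # probe the 4 nearest cells, not just one
D, I = index.search(xb[:1], 5)                   # 5 nearest neighbors of the first vector
```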

8
Q

What does PQ stand for?

A

Product Quantization

9
Q

What is the point of PQ?

A

It’s to compress the vectors themselves so they’re smaller, which saves memory and makes distances like L2 cheaper to calculate.

10
Q

How do you do Product Quantization?

A

First, chunk each vector up into subvectors. Then, for each subvector subspace, do clustering, so you get centroids. Finally, replace each subvector with the id of its nearest centroid.
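
A toy sketch of those three steps, using scikit-learn’s KMeans (all sizes are assumptions):

```python
# A toy PQ encoder using scikit-learn's KMeans; all sizes are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
xb = rng.standard_normal((1_000, 16)).astype("float32")  # 1000 vectors in R^16

m, k = 4, 32                          # 4 subvectors, 32 centroids per subspace
subspaces = np.split(xb, m, axis=1)   # step 1: chunk the vectors into subvectors

codebooks, codes = [], []
for s in subspaces:
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(s)  # step 2: cluster
    codebooks.append(km.cluster_centers_)       # the codebook for this subspace
    codes.append(km.labels_.astype("uint8"))    # step 3: subvector -> cluster id

pq_codes = np.stack(codes, axis=1)    # each vector is now just m small ids
print(pq_codes.shape)                 # (1000, 4)
```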

11
Q

PQ maps from what space size to what space size?

A

It maps from the original vector space, like R^100, to a space with one dimension per subvector, like R^4 if you’re breaking up into 4 subvectors.

12
Q

What are codewords?

A

They’re the centroids from Product Quantization.

13
Q

What is a codebook?

A

The set of all centroids for a particular subvector subspace in Product Quantization.

14
Q

What is Indexing?

A

It’s the strategy you use to store your vectors – you put them in an index, and then you need to be able to search within that index.

15
Q

What is a “flat index”?

A

It’s one where you don’t modify the vectors at all and just store them directly.
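
A minimal FAISS sketch, on toy data:

```python
# A minimal FAISS flat index on toy data: vectors stored unmodified.
import faiss
import numpy as np

d = 64
xb = np.random.default_rng(0).standard_normal((1_000, d)).astype("float32")

index = faiss.IndexFlatL2(d)     # no training, no compression
index.add(xb)                    # vectors go in exactly as-is
D, I = index.search(xb[:1], 5)   # exhaustive search over the whole index
```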

16
Q

How would you lookup in a flat index?

A

You would literally just do KNN with the entire index.

17
Q

What does KNN stand for?

A

K Nearest Neighbors.

18
Q

How do you do KNN?

A

Calculate distance between the query vector and every single other vector. Then return the top K closest.
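
A brute-force sketch in NumPy, on toy data (k=5 assumed):

```python
# Brute-force KNN in NumPy on toy data.
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((1_000, 64))    # the index
xq = rng.standard_normal(64)             # the query

dists = np.linalg.norm(xb - xq, axis=1)  # distance to every single vector
top_k = np.argsort(dists)[:5]            # ids of the 5 closest
print(top_k, dists[top_k])
```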

19
Q

What are the pros and cons of a flat index?

A

Pros: Search quality is maximized since you are literally just doing KNN. Cons: it’s slow – the search is exhaustive, so query time grows linearly with the size of the index.

20
Q

What does ANN stand for?

A

Approximate Nearest Neighbors

21
Q

What does LSH stand for?

A

Locality-Sensitive Hashing

22
Q

How do you do LSH?

A

Write some hash function that puts similar vectors in the same hash bucket. Then at lookup time, hash the query vector to get its group of similar vectors.
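
A toy sketch of the idea, using a random-hyperplane sign hash as the bucketing function (one common choice of hash, not the only one):

```python
# A toy LSH sketch; the hash is a random-hyperplane sign hash (one common choice).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, nbits = 64, 8
planes = rng.standard_normal((nbits, d))   # random hyperplane normals

def lsh_hash(v):
    # similar vectors tend to land on the same side of each hyperplane
    return tuple((planes @ v > 0).astype(int))

buckets = defaultdict(list)
xb = rng.standard_normal((1_000, d))
for i, v in enumerate(xb):                 # index time: bucket every vector
    buckets[lsh_hash(v)].append(i)

xq = rng.standard_normal(d)
candidates = buckets[lsh_hash(xq)]         # lookup time: hash the query
print(len(candidates))
```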

23
Q

What does it mean “maximize collisions” in the context of LSH?

A

It means you want the hash function to put similar vectors in the same bucket (collide them).

24
Q

What are the pros and cons of LSH?

A

Pros: It’s a middle ground on everything – decent performance but also decent speed and storage. Cons are that it’s not amazing at any one thing, and also susceptible to the curse of dimensionality.

25
Q

What does NSW stand for?

A

Navigable Small World

26
Q

What is an NSW graph?

A

It’s one where each node is connected to each of its nearest neighbors.

27
Q

Where does the SW in NSW come from?

A

It’s “Small World” – the idea that any two nodes in a network are connected by a surprisingly short path. A Facebook study, for example, found that any two people in the world are connected by an average of just 3.57 steps.

28
Q

What does the H in HNSW mean?

A

“Hierarchical” – it means you break the graph up into different layers of granularity and go from one to the next.
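
A minimal FAISS sketch (M=32 and efSearch=64 are assumed values):

```python
# A minimal FAISS HNSW sketch; M=32 and efSearch=64 are assumed values.
import faiss
import numpy as np

d = 64
index = faiss.IndexHNSWFlat(d, 32)   # 32 = M, the links per node
index.hnsw.efSearch = 64             # breadth of the search at the bottom layer
xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")
index.add(xb)                        # no training step needed
D, I = index.search(xb[:1], 5)
```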

29
Q

What are the pros and cons of HNSW?

A

Pros are that it is both high-quality and pretty fast. Cons are it eats up tons of memory.

30
Q

What are the pros and cons of IVF?

A

Pros are it excels at all three of quality, memory, and speed. Not sure it has cons, other than still being approximate relative to the KNN lookup of a flat index.

31
Q

What is Jaccard Similarity?

A

It’s a measure of the similarity of two sets – the cardinality of their intersection divided by the cardinality of their union.
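
As a one-liner in Python:

```python
def jaccard(a: set, b: set) -> float:
    # |A intersection B| / |A union B|
    return len(a & b) / len(a | b)

print(jaccard({"and", "ndr"}, {"and", "dre"}))  # 1/3
```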

32
Q

What is k-shingling? Give an example.

A

It’s when you slide a window of size k along a string to create tokens of size k. Example: Andrew w/ k=3 is And, ndr, dre, rew.
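
A tiny Python version, reproducing the example:

```python
def shingles(text: str, k: int = 3) -> set:
    # slide a window of size k along the string
    return {text[i:i + k] for i in range(len(text) - k + 1)}

print(shingles("Andrew"))  # {'And', 'ndr', 'dre', 'rew'}
```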

33
Q

What is MinHashing related to?

A

It’s a step in LSH: it’s how you compress sparse vectors into dense signatures in a way that preserves Jaccard similarity, so similar sets end up with similar signatures.

34
Q

What is random projection?

A

It’s when you create a bunch of random hyperplanes, and for each hyperplane the vector gets a 1 or a 0 depending on which side of it the vector falls (found using the dot product). The resulting string of 1s and 0s is the vector’s binary representation.

35
Q

How can you use the dot product to find which side of a hyperplane a vector is on?

A

Let the normal vector be the vector orthogonal to the hyperplane, and take the dot product between it and your vector. The dot product is positive when the angle between them is <90° and negative when it is >90°. Since 90° lies exactly on the hyperplane, the sign tells you which side of the hyperplane the vector is on.
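
A worked example in NumPy (the vectors are made up):

```python
# Which side of a hyperplane is v on? (made-up vectors)
import numpy as np

normal = np.array([1.0, -2.0, 0.5])  # orthogonal to the hyperplane
v = np.array([0.3, 0.1, 2.0])

dot = np.dot(normal, v)              # >0: angle < 90 degrees; <0: angle > 90
print("positive side" if dot > 0 else "negative side")
```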

36
Q

What is the Hamming distance?

A

It’s the number of mismatches between two vectors, e.g. 1110 and 0110 have one mismatch.
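
As a tiny Python function, reproducing the example:

```python
def hamming(a: str, b: str) -> int:
    # count positions where the two (equal-length) vectors disagree
    return sum(x != y for x, y in zip(a, b))

print(hamming("1110", "0110"))  # 1
```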

37
Q

What is Quantization?

A

It’s a generic term for compressing data by mapping it to a smaller set of possible values.

38
Q

What is the difference between dimensionality reduction and quantization?

A

Dimensionality reduction is trying to get fewer dimensions. Quantization is trying to reduce the scope of each dimension, like from 32 bits required to describe a particular dimension down to 4.

39
Q

What is IVFPQ?

A

It’s IVF (Inverted File, the Voronoi-cell indexing strategy) but with PQ first, so that you’re doing IVF just on the PQ centroids.
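
A minimal FAISS sketch (nlist, m, and nbits are assumed values):

```python
# A minimal FAISS IVFPQ sketch; nlist, m, and nbits are assumed values.
import faiss
import numpy as np

d, nlist, m, nbits = 64, 100, 8, 8   # 8 subvectors, 2^8 centroids each
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")
index.train(xb)                      # learns both the IVF cells and the PQ codebooks
index.add(xb)
index.nprobe = 8
D, I = index.search(xb[:1], 5)
```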

40
Q

What is recall? In the context of vector search?

A

It’s the number of true positives divided by all the real positives. So it’s the % of real positives that you were able to identify as positive. In the context of vector search this means the % of the K-nearest-neighbors that you were able to correctly identify with your ANN approach.
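
A small sketch of computing it, assuming the ground-truth ids come from a flat index and the ANN ids from an approximate one:

```python
# A sketch of recall@k for ANN: compare against exact results from a flat index.
def recall_at_k(true_ids, ann_ids):
    # fraction of the true k nearest neighbors the ANN search actually returned
    return len(set(true_ids) & set(ann_ids)) / len(true_ids)

# e.g. true_ids from IndexFlatL2.search, ann_ids from an IVF/HNSW index
print(recall_at_k([1, 2, 3, 4], [1, 2, 9, 4]))  # 0.75
```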

41
Q

What are the pros and cons of PQ?

A

Pros are that it takes like 97% less memory and makes ANN 5-10x faster. Cons are it lowers your recall rate.

42
Q

In IVFPQ what do you do first, the IVF or the PQ?

A

The PQ.

43
Q

What does OPQ stand for?

A

Optimized Product Quantization

44
Q

What are you doing in OPQ?

A

It’s where you transform the vectors to maximize the variance in each subvector space, before running PQ.

45
Q

What problem does OPQ try to solve?

A

The effectiveness of PQ depends on how the subvectors are broken up.

46
Q

If I write OPQ32_128, what do those numbers mean?

A

OPQ dimensionality reduction to R^128, followed by PQ with 32 subvectors.
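
A sketch of what that string builds via the FAISS index factory (the input dimension and IVF size are assumptions):

```python
# What the string builds via the FAISS index factory; the input
# dimension (512) and IVF size are assumptions.
import faiss

index = faiss.index_factory(512, "OPQ32_128,IVF256,PQ32")
```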

47
Q

What is a probability skip list?

A

It’s a linked list with multiple layers, like maybe the first layer is just 1 -> 5 -> 10. You first jump through that layer to get relatively close to where you’re going, and then search just the segment you’ve found.

48
Q

How are probability skip lists related to HNSW?

A

The “hierarchical” aspect is kinda like a skip list, where the top level is just a few nodes that get you to the right neighborhood, and then you search with more and more granularity.

49
Q

What is a composite index?

A

It’s when you combine a bunch of different components, like legos, to build a sophisticated index.

50
Q

What are the four types of components you can combine into a composite index, and what are they?

A

Vector transform (preprocessing vectors before indexing), Coarse quantizer (organizing vectors into subdomains to reduce search scope), Fine quantizer (compressing vectors into smaller domains to save space), Refinement (ranking results at search time).
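
A sketch of one such combination via the FAISS index factory, one component per slot (all sizes are assumptions, and RFlat is the factory’s exact-distance refinement step):

```python
# One FAISS index-factory token per component (all sizes are assumptions):
#   vector transform: OPQ32_128 | coarse quantizer: IVF256
#   fine quantizer:   PQ32      | refinement: RFlat (re-rank with exact distances)
import faiss

index = faiss.index_factory(512, "OPQ32_128,IVF256,PQ32,RFlat")
```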

51
Q

Give two examples of vector transform

A

PCA and OPQ

52
Q

Give three examples of coarse quantizer

A

IVF, HNSW, Flat

53
Q

Give two examples of fine quantizer

A

PQ, LSH

54
Q

What does IVF-ADC stand for?

A

It’s IVF (Inverted File) with ADC (Asymmetric Distance Computation)

55
Q

What are you doing in IVF-ADC?

A

You are doing IVF followed by PQ – but the PQ comes after the IVF.

56
Q

Why is it called ADC?

A

ADC means Asymmetric Distance Computation, and it’s Asymmetric because at search time you don’t quantize the query vector, you find its nearest neighbors based on its full representation.

57
Q

What does IMI stand for?

A

Inverted Multi-Index

58
Q

What are you doing in IMI?

A

It’s IVF, where you create Voronoi cells, but first you split the vector into subvectors, and then do IVF for each subvector space.
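
A sketch via the FAISS index factory (sizes are assumptions; the IMI2x10 string means 2 halves with 2^10 cells each):

```python
# An IMI sketch via the FAISS index factory: "IMI2x10" splits the vector into
# two halves, each with its own 2^10-cell Voronoi decomposition (sizes assumed).
import faiss

index = faiss.index_factory(128, "IMI2x10,PQ16")
```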

59
Q

What do you do at search time for IMI?

A

IMI creates subvector spaces with their own Voronoi diagrams, so you pull the candidate nearest neighbors for each subvector.

60
Q

What is the relationship between Multi-D-ADC and IMI?

A

They’re the same thing. Multi-D-ADC (ADC means Asymmetric Distance Computation) is the name because the Voronoi cells are split across many dimensions.

61
Q

How does the IVF+HNSW strategy work?

A

You first do IVF to get centroids of Voronoi cells. Then you make an HNSW graph with those.
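
A sketch via the FAISS index factory, assuming the IVF<nlist>_HNSW<M> string form (sizes are assumptions):

```python
# IVF whose centroids are searched with an HNSW graph, via the FAISS index
# factory string "IVF<nlist>_HNSW<M>" (sizes are assumptions).
import faiss

index = faiss.index_factory(128, "IVF1024_HNSW32,Flat")
```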

62
Q

What does RAG stand for?

A

Retrieval-Augmented Generation

63
Q

What is hallucination?

A

It’s when the LLM doesn’t know an answer, so it dreams up a convincing-sounding lie.

64
Q

What is RAG?

A

It’s when you fetch new information from an external database, and make it available to the LLM.

65
Q

What are two key types of information you’d want to get with RAG?

A

Up-to-date information, and context-specific data.

66
Q

Why use RAG when you can just fine-tune?

A

Fine-tuning (adding context-specific training examples and retraining the model on them) is very expensive and difficult – you would have to fully retrain the model every time there’s important new information, which could be very frequent.

67
Q

What’s the most basic way to do RAG?

A

Convert proprietary knowledge into embedded vectors and add them to the vector database and its index.
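
A minimal sketch of that flow; embed() and llm() are hypothetical stand-ins for an embedding model and an LLM client, and the index-and-retrieve pattern is the point:

```python
# A minimal RAG sketch. embed() and llm() are hypothetical stand-ins for an
# embedding model and an LLM client; the index-and-retrieve flow is the point.
import faiss
import numpy as np

def embed(text: str) -> np.ndarray:   # hypothetical embedding model
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64).astype("float32")

def llm(prompt: str) -> str:          # hypothetical LLM call
    return f"(answer grounded in) {prompt[:60]}..."

docs = ["Q3 revenue grew 12%.", "The API rate limit is 100 req/min."]
index = faiss.IndexFlatL2(64)
index.add(np.stack([embed(d) for d in docs]))     # proprietary knowledge -> index

def answer(query: str, k: int = 1) -> str:
    _, ids = index.search(embed(query)[None, :], k)
    context = "\n".join(docs[i] for i in ids[0])  # retrieved source knowledge
    return llm(f"Context:\n{context}\n\nQuestion: {query}")

print(answer("What is the API rate limit?"))
```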

68
Q

What does SELF-RAG stand for?

A

Self-Reflective RAG

69
Q

What is SELF-RAG doing?

A

It’s creating “reflection tokens” for a particular query to decide if it needs to get new information, and then “critique tokens” to grade the relevance and quality of that new information.

70
Q

What does CRAG stand for?

A

Corrective RAG

71
Q

What is CRAG doing?

A

It has an evaluator that grades the quality of the retrieved documents; if they’re insufficient, it falls back to a web search to get more.

72
Q

What is RAG Fusion doing?

A

Derive multiple new queries from the input query, do document retrieval for all of them, and combine the ranked results.

73
Q

What is the purpose of SELF-RAG?

A

It’s to increase the factuality of the responses.

74
Q

What is the purpose of RAG Fusion?

A

It’s to get more comprehensive answers.

75
Q

What is an agent?

A

It’s an LLM with built-in logic that lets it reason about the answers the main LLM is giving, and decide if it needs more information.

76
Q

What is a “canonical form”?

A

It’s a class of user intent, like a certain type of question. For instance “questions about LLMs”.

77
Q

What is RAG with guardrails?

A

It’s where you detect certain types of questions, or queries, and deterministically trigger certain actions.

78
Q

In what context are canonical forms used?

A

RAG with guardrails. The canonical forms are the classes of queries that you’re comparing the query to, to see if it matches with some sort of guardrailed scenario.

79
Q

What is the difference between a vector database and a vector index?

A

A vector index is just for organizing and storing vector embeddings. The database lets you manage those vector embeddings, like adding metadata and doing real-time updates.

80
Q

What is the main question you have to answer with filtering?

A

Do you filter before the vector search, or after?

81
Q

What are three categories of evaluation for a vector database?

A

Technology, Dev Experience, and Enterprise-Readiness

82
Q

What are four factors to consider when evaluating the technology of a vector db?

A

Performance, Relevance, Scalability, Cost-efficiency

83
Q

What is the Euclidean distance?

A

The straight-line distance between two points.

84
Q

What is the problem with Euclidean distance (sometimes)?

A

It is sensitive to magnitudes – if you don’t care about the specific numbers, it’s not what you want.

85
Q

What is the difference between cosine similarity and dot product?

A

Cosine similarity totally ignores the magnitude of the vectors.

86
Q

What is the output of cosine similarity?

A

It’s a number from -1 to 1, comparing two vectors. 1 if they are the same direction, 0 if they are orthogonal, -1 if they are opposite directions.
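
A tiny NumPy version:

```python
import numpy as np

def cosine_similarity(a, b):
    # only the angle matters; the magnitudes cancel out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1, 0]), np.array([0, 1])))  # 0.0 (orthogonal)
print(cosine_similarity(np.array([1, 1]), np.array([2, 2])))  # 1.0 (same direction)
```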

87
Q

What is chunking?

A

It’s how you break text up into segments.

88
Q

What do you want when you are doing chunking?

A

You want each chunk to make sense on its own, without any surrounding context.

89
Q

What is fixed-size chunking?

A

It’s when you break the text up into segments where each segment has the exact same size.
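
A tiny sketch (the chunk size, and the overlap – a common variant – are assumptions):

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20):
    # slide a fixed-size window; the overlap keeps a little shared context
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```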

90
Q

What is context-aware chunking?

A

It’s where you break the text up based on some semantic consideration, like one chunk per sentence.

91
Q

What is “parametric knowledge”?

A

It’s the knowledge that is learned during training.

92
Q

What is “source knowledge”?

A

It’s the knowledge that you would pull in when you are doing RAG.

93
Q

Why don’t you want to stuff the context window when doing RAG?

A

Research shows that the more documents you stuff, the more performance degrades.

94
Q

What is conversational memory?

A

It’s where you send the entire conversation up to this point, rather than the most recent query.

95
Q

How do you avoid context stuffing when doing conversational memory?

A

You use an LLM to summarize the conversation up to this point, and just add that to the context.

96
Q

What is the ReAct framework?

A

It’s a framework for agents: Reason and Act. The agent reads what it gets from your LLM, reasons about it, and decides what further actions to take.

97
Q

What does EBR stand for?

A

Embedding-Based Retrieval

98
Q

What are you doing in EBR?

A

Use embeddings to represent both the queries and the documents, and then do NN search in the embedding space.

99
Q

What is the main challenge of using EBR for search, other than the scale?

A

You want to do term-based matching in addition to semantic matching.

100
Q

What are the two layers of search engines?

A

Retrieval (getting a set of relevant documents) and Ranking.

101
Q

What is the layer called where you do retrieval, in a search engine?

A

The recall layer

102
Q

What happens in the recall layer, in a search engine?

A

It’s where you do retrieval.

103
Q

What is the layer called where you do ranking, in a search engine?

A

The precision layer

104
Q

What happens in the precision layer, in a search engine?

A

It’s where you do ranking.

105
Q

What is the Facebook unified embedding?

A

It’s a model where you add the user information and social context to the text.

106
Q

What is triplet loss?

A

It’s where you have some anchor point, a positive example, and a negative example. The loss wants the negative example to be at least m (for some constant margin m) farther from the anchor than the positive example is. If that’s not the case, the loss measures how far short of that margin you are.
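
A minimal sketch, using L2 distance for simplicity (the next card notes the Facebook model used cosine similarity):

```python
# A minimal triplet-loss sketch with L2 distance for simplicity
# (the Facebook EBR model used cosine similarity instead).
import numpy as np

def triplet_loss(anchor, positive, negative, m=0.2):
    # max(0, d(a, p) - d(a, n) + m): zero once the negative is at least
    # margin m farther from the anchor than the positive
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + m)
```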

107
Q

What is the loss function for the Facebook EBR model?

A

Triplet loss, with cosine similarity as the distance metric.

108
Q

How did Facebook get positive examples for their EBR model?

A

It was just the results that people clicked on.

109
Q

What was the main problem Facebook ran into while training their EBR model?

A

Trying to get good negative examples.

110
Q

What is Unicorn?

A

It’s the Facebook retrieval engine – seems kinda like SQL or SPARQL.

111
Q

What is “hybrid retrieval” in the context of Facebook EBR?

A

It’s when you do both term matching and embedding matching.