Vector Search Flashcards

1
Q

What does FAISS stand for?

A

Facebook AI Similarity Search

2
Q

Describe the vector search problem

A

We have a set of vectors, and for some query vector, we want to find the vectors that are most similar.

3
Q

What is a Voronoi diagram?

A

It’s a division of the plane into cells, where each cell is all the space closer to one particular point than any others. It’s that point’s “territory.”

4
Q

Conceptually, if some query point falls in a Voronoi cell, what does that mean?

A

It means you know which point is closest.

5
Q

What does “probing” mean?

A

It means searching nearby cells, not just the one you landed on.

6
Q

What does IVF stand for?

A

InVerted File

7
Q

What is the IVF strategy?

A

It’s a strategy for vector search. You pick some set of key points, make a Voronoi diagram of them, and then find the Voronoi cell for your query vector and search all the points in that cell – plus maybe probe some nearby ones.
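
A minimal FAISS sketch of this strategy, on toy data (nlist and nprobe are assumed values):

```python
# A minimal FAISS IVF sketch on toy data; nlist/nprobe are assumed values.
import faiss
import numpy as np

d, nlist = 64, 100
xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)                 # used to assign vectors to cells
index = faiss.IndexIVFFlat(quantizer, d, nlist)  # one inverted list per Voronoi cell
index.train(xb)                                  # learns the "key points" (centroids)
index.add(xb)

index.nprobe = 4                                 # probe the 4 nearest cells, not just one
D, I = index.search(xb[:1], 5)                   # 5 nearest neighbors of the first vector
```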

8
Q

What does PQ stand for?

A

Product Quantization

9
Q

What is the point of PQ?

A

It’s to compress the vectors themselves so they’re smaller, which saves memory and makes distances like L2 cheaper to calculate.

10
Q

How do you do Product Quantization?

A

First, chunk each vector up into subvectors. Then, for each subvector subspace, do clustering, so you get centroids. Finally, replace each subvector with the id of its nearest centroid.
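
A toy sketch of those three steps, using scikit-learn’s KMeans (all sizes are assumptions):

```python
# A toy PQ encoder using scikit-learn's KMeans; all sizes are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
xb = rng.standard_normal((1_000, 16)).astype("float32")  # 1000 vectors in R^16

m, k = 4, 32                          # 4 subvectors, 32 centroids per subspace
subspaces = np.split(xb, m, axis=1)   # step 1: chunk the vectors into subvectors

codebooks, codes = [], []
for s in subspaces:
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(s)  # step 2: cluster
    codebooks.append(km.cluster_centers_)       # the codebook for this subspace
    codes.append(km.labels_.astype("uint8"))    # step 3: subvector -> cluster id

pq_codes = np.stack(codes, axis=1)    # each vector is now just m small ids
print(pq_codes.shape)                 # (1000, 4)
```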

11
Q

PQ maps from what space size to what space size?

A

It maps from the original vector space, like R^100, to a space with one dimension per subvector, like R^4 if you’re breaking up into 4 subvectors.

12
Q

What are codewords?

A

They’re the centroids from Product Quantization.

13
Q

What is a codebook?

A

The set of all centroids for a particular subvector subspace in Product Quantization.

14
Q

What is Indexing?

A

It’s the strategy you use to store your vectors – you put them in an index, and then you need to be able to search within that index.

15
Q

What is a “flat index”?

A

It’s one where you don’t modify the vectors at all and just store them directly.
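
A minimal FAISS sketch, on toy data:

```python
# A minimal FAISS flat index on toy data: vectors stored unmodified.
import faiss
import numpy as np

d = 64
xb = np.random.default_rng(0).standard_normal((1_000, d)).astype("float32")

index = faiss.IndexFlatL2(d)     # no training, no compression
index.add(xb)                    # vectors go in exactly as-is
D, I = index.search(xb[:1], 5)   # exhaustive search over the whole index
```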

16
Q

How would you lookup in a flat index?

A

You would literally just do KNN with the entire index.

17
Q

What does KNN stand for?

A

K Nearest Neighbors.

18
Q

How do you do KNN?

A

Calculate distance between the query vector and every single other vector. Then return the top K closest.
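
A brute-force sketch in NumPy, on toy data (k=5 assumed):

```python
# Brute-force KNN in NumPy on toy data.
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((1_000, 64))    # the index
xq = rng.standard_normal(64)             # the query

dists = np.linalg.norm(xb - xq, axis=1)  # distance to every single vector
top_k = np.argsort(dists)[:5]            # ids of the 5 closest
print(top_k, dists[top_k])
```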

19
Q

What are the pros and cons of a flat index?

A

Pros: Search quality is maximized since you are literally just doing KNN. Cons: it’s slow – the search is exhaustive, so query time grows linearly with the size of the index.

20
Q

What does ANN stand for?

A

Approximate Nearest Neighbors

21
Q

What does LSH stand for?

A

Locality-Sensitive Hashing

22
Q

How do you do LSH?

A

Write some hash function that puts similar vectors in the same hash bucket. Then at lookup time, hash the query vector to get its group of similar vectors.
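
A toy sketch of the idea, using a random-hyperplane sign hash as the bucketing function (one common choice of hash, not the only one):

```python
# A toy LSH sketch; the hash is a random-hyperplane sign hash (one common choice).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, nbits = 64, 8
planes = rng.standard_normal((nbits, d))   # random hyperplane normals

def lsh_hash(v):
    # similar vectors tend to land on the same side of each hyperplane
    return tuple((planes @ v > 0).astype(int))

buckets = defaultdict(list)
xb = rng.standard_normal((1_000, d))
for i, v in enumerate(xb):                 # index time: bucket every vector
    buckets[lsh_hash(v)].append(i)

xq = rng.standard_normal(d)
candidates = buckets[lsh_hash(xq)]         # lookup time: hash the query
print(len(candidates))
```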

23
Q

What does it mean “maximize collisions” in the context of LSH?

A

It means you want the hash function to put similar vectors in the same bucket (collide them).

24
Q

What are the pros and cons of LSH?

A

Pros: It’s a middle ground on everything – decent performance but also decent speed and storage. Cons are that it’s not amazing at any one thing, and also susceptible to the curse of dimensionality.

25
Q

What does NSW stand for?

A

Navigable Small World

26
Q

What is an NSW graph?

A

It’s one where each node is connected to each of its nearest neighbors.

27
Q

Where does the SW in NSW come from?

A

It’s “Small World” – the idea that any two nodes in a network are connected by a surprisingly short path. A Facebook study, for example, found that any two people in the world are connected by an average of just 3.57 steps.

28
Q

What does the H in HNSW mean?

A

“Hierarchical” – it means you break the graph up into different layers of granularity and go from one to the next.
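
A minimal FAISS sketch (M=32 and efSearch=64 are assumed values):

```python
# A minimal FAISS HNSW sketch; M=32 and efSearch=64 are assumed values.
import faiss
import numpy as np

d = 64
index = faiss.IndexHNSWFlat(d, 32)   # 32 = M, the links per node
index.hnsw.efSearch = 64             # breadth of the search at the bottom layer
xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")
index.add(xb)                        # no training step needed
D, I = index.search(xb[:1], 5)
```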

29
Q

What are the pros and cons of HNSW?

A

Pros are that it is both high-quality and pretty fast. Cons are it eats up tons of memory.

30
Q

What are the pros and cons of IVF?

A

Pros are it excels at all three of quality, memory, and speed. Not sure it has cons, other than still being approximate relative to the KNN lookup of a flat index.

31
Q

What is Jaccard Similarity?

A

It’s a measure of the similarity of two sets – the cardinality of their intersection divided by the cardinality of their union.
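
As a one-liner in Python:

```python
def jaccard(a: set, b: set) -> float:
    # |A intersection B| / |A union B|
    return len(a & b) / len(a | b)

print(jaccard({"and", "ndr"}, {"and", "dre"}))  # 1/3
```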

32
Q

What is k-shingling? Give an example.

A

It’s when you slide a window of size k along a string to create tokens of size k. Example: Andrew w/ k=3 is And, ndr, dre, rew.
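
A tiny Python version, reproducing the example:

```python
def shingles(text: str, k: int = 3) -> set:
    # slide a window of size k along the string
    return {text[i:i + k] for i in range(len(text) - k + 1)}

print(shingles("Andrew"))  # {'And', 'ndr', 'dre', 'rew'}
```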

33
Q

What is MinHashing related to?

A

It’s a step in LSH: it’s how you compress sparse vectors into dense signatures in a way that preserves Jaccard similarity, so similar sets end up with similar signatures.

34
Q

What is random projection?

A

It’s when you create a bunch of random hyperplanes, and for each hyperplane the vector gets a 1 or a 0 depending on which side of it the vector falls (found using the dot product). The resulting string of 1s and 0s is the vector’s binary representation.

35
Q

How can you use the dot product to find which side of a hyperplane a vector is on?

A

Let the normal vector be the vector orthogonal to the hyperplane, and take the dot product between it and your vector. The dot product is positive when the angle between them is <90° and negative when it is >90°. Since 90° lies exactly on the hyperplane, the sign tells you which side of the hyperplane the vector is on.
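
A worked example in NumPy (the vectors are made up):

```python
# Which side of a hyperplane is v on? (made-up vectors)
import numpy as np

normal = np.array([1.0, -2.0, 0.5])  # orthogonal to the hyperplane
v = np.array([0.3, 0.1, 2.0])

dot = np.dot(normal, v)              # >0: angle < 90 degrees; <0: angle > 90
print("positive side" if dot > 0 else "negative side")
```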

36
Q

What is the Hamming distance?

A

It’s the number of mismatches between two vectors, e.g. 1110 and 0110 have one mismatch.
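
As a tiny Python function, reproducing the example:

```python
def hamming(a: str, b: str) -> int:
    # count positions where the two (equal-length) vectors disagree
    return sum(x != y for x, y in zip(a, b))

print(hamming("1110", "0110"))  # 1
```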

37
Q

What is Quantization?

A

It’s a generic term for compressing data by mapping it to a smaller set of possible values.

38
Q

What is the difference between dimensionality reduction and quantization?

A

Dimensionality reduction is trying to get fewer dimensions. Quantization is trying to reduce the scope of each dimension, like from 32 bits required to describe a particular dimension down to 4.

39
Q

What is IVFPQ?

A

It’s IVF (Inverted File, the Voronoi-cell indexing strategy) but with PQ first, so that you’re doing IVF just on the PQ centroids.
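
A minimal FAISS sketch (nlist, m, and nbits are assumed values):

```python
# A minimal FAISS IVFPQ sketch; nlist, m, and nbits are assumed values.
import faiss
import numpy as np

d, nlist, m, nbits = 64, 100, 8, 8   # 8 subvectors, 2^8 centroids each
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

xb = np.random.default_rng(0).standard_normal((10_000, d)).astype("float32")
index.train(xb)                      # learns both the IVF cells and the PQ codebooks
index.add(xb)
index.nprobe = 8
D, I = index.search(xb[:1], 5)
```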

40
Q

What is recall? In the context of vector search?

A

It’s the number of true positives divided by all the real positives. So it’s the % of real positives that you were able to identify as positive. In the context of vector search this means the % of the K-nearest-neighbors that you were able to correctly identify with your ANN approach.
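
A small sketch of computing it, assuming the ground-truth ids come from a flat index and the ANN ids from an approximate one:

```python
# A sketch of recall@k for ANN: compare against exact results from a flat index.
def recall_at_k(true_ids, ann_ids):
    # fraction of the true k nearest neighbors the ANN search actually returned
    return len(set(true_ids) & set(ann_ids)) / len(true_ids)

# e.g. true_ids from IndexFlatL2.search, ann_ids from an IVF/HNSW index
print(recall_at_k([1, 2, 3, 4], [1, 2, 9, 4]))  # 0.75
```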

41
Q

What are the pros and cons of PQ?

A

Pros are that it takes like 97% less memory and makes ANN 5-10x faster. Cons are it lowers your recall rate.

42
Q

In IVFPQ what do you do first, the IVF or the PQ?

A

The PQ.

43
Q

What does OPQ stand for?

A

Optimized Product Quantization

44
Q

What are you doing in OPQ?

A

It’s where you transform the vectors to maximize the variance in each subvector space, before running PQ.

45
Q

What problem does OPQ try to solve?

A

The effectiveness of PQ depends on how the subvectors are broken up.

46
Q

If I write OPQ32_128, what do those numbers mean?

A

OPQ dimensionality reduction to R^128, followed by PQ with 32 subvectors.
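
A sketch of what that string builds via the FAISS index factory (the input dimension and IVF size are assumptions):

```python
# What the string builds via the FAISS index factory; the input
# dimension (512) and IVF size are assumptions.
import faiss

index = faiss.index_factory(512, "OPQ32_128,IVF256,PQ32")
```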

47
Q

What is a probability skip list?

A

It’s a linked list with multiple layers, like maybe the first layer is just 1 -> 5 -> 10. You first jump through that layer to get relatively close to where you’re going, and then search just the segment you’ve found.

48
Q

How are probability skip lists related to HNSW?

A

The “hierarchical” aspect is kinda like a skip list, where the top level is just a few nodes that get you to the right neighborhood, and then you search with more and more granularity.

49
Q

What is a composite index?

A

It’s when you combine a bunch of different components, like legos, to build a sophisticated index.

50
Q

What are the four types of components you can combine into a composite index, and what are they?

A

Vector transform (preprocessing vectors before indexing), Coarse quantizer (organizing vectors into subdomains to reduce search scope), Fine quantizer (compressing vectors into smaller domains to save space), Refinement (ranking results at search time).
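
A sketch of one such combination via the FAISS index factory, one component per slot (all sizes are assumptions, and RFlat is the factory’s exact-distance refinement step):

```python
# One FAISS index-factory token per component (all sizes are assumptions):
#   vector transform: OPQ32_128 | coarse quantizer: IVF256
#   fine quantizer:   PQ32      | refinement: RFlat (re-rank with exact distances)
import faiss

index = faiss.index_factory(512, "OPQ32_128,IVF256,PQ32,RFlat")
```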

51
Q

Give two examples of vector transform

A

PCA and OPQ

52
Q

Give three examples of coarse quantizer

A

IVF, HNSW, Flat

53
Q

Give two examples of fine quantizer

A

PQ, LSH

54
Q

What does IVF-ADC stand for?

A

It’s IVF (Inverted File) with ADC (Asymmetric Distance Computation)

55
Q

What are you doing in IVF-ADC?

A

You are doing IVF followed by PQ – but the PQ comes after the IVF.

56
Q

Why is it called ADC?

A

ADC means Asymmetric Distance Computation, and it’s Asymmetric because at search time you don’t quantize the query vector, you find its nearest neighbors based on its full representation.

57
Q

What does IMI stand for?

A

Inverted Multi-Index

58
Q

What are you doing in IMI?

A

It’s IVF, where you create Voronoi cells, but first you split the vector into subvectors, and then do IVF for each subvector space.
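
A sketch via the FAISS index factory (sizes are assumptions; the IMI2x10 string means 2 halves with 2^10 cells each):

```python
# An IMI sketch via the FAISS index factory: "IMI2x10" splits the vector into
# two halves, each with its own 2^10-cell Voronoi decomposition (sizes assumed).
import faiss

index = faiss.index_factory(128, "IMI2x10,PQ16")
```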

59
Q

What do you do at search time for IMI?

A

IMI creates subvector spaces with their own Voronoi diagrams, so you pull the candidate nearest neighbors for each subvector.

60
Q

What is the relationship between Multi-D-ADC and IMI?

A

They’re the same thing. Multi-D-ADC (ADC means Asymmetric Distance Computation) is the name because the Voronoi cells are split across many dimensions.

61
Q

How does the IVF+HNSW strategy work?

A

You first do IVF to get centroids of Voronoi cells. Then you make an HNSW graph with those.
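
A sketch via the FAISS index factory, assuming the IVF<nlist>_HNSW<M> string form (sizes are assumptions):

```python
# IVF whose centroids are searched with an HNSW graph, via the FAISS index
# factory string "IVF<nlist>_HNSW<M>" (sizes are assumptions).
import faiss

index = faiss.index_factory(128, "IVF1024_HNSW32,Flat")
```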

62
Q

What does RAG stand for?

A

Retrieval-Augmented Generation

63
Q

What is hallucination?

A

It’s when the LLM doesn’t know an answer, so it dreams up a convincing-sounding lie.

64
Q

What is RAG?

A

It’s when you fetch new information from an external database, and make it available to the LLM.

65
Q

What are two key types of information you’d want to get with RAG?

A

Up-to-date information, and context-specific data.

66
Q

Why use RAG when you can just fine-tune?

A

Fine-tuning (adding context-specific training examples and retraining the model on them) is very expensive and difficult – you would have to fully retrain the model every time there’s important new information, which could be very frequent.

67
Q

What’s the most basic way to do RAG?

A

Convert proprietary knowledge into embedded vectors and add them to the vector database and its index.
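
A minimal sketch of that flow; embed() and llm() are hypothetical stand-ins for an embedding model and an LLM client, and the index-and-retrieve pattern is the point:

```python
# A minimal RAG sketch. embed() and llm() are hypothetical stand-ins for an
# embedding model and an LLM client; the index-and-retrieve flow is the point.
import faiss
import numpy as np

def embed(text: str) -> np.ndarray:   # hypothetical embedding model
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64).astype("float32")

def llm(prompt: str) -> str:          # hypothetical LLM call
    return f"(answer grounded in) {prompt[:60]}..."

docs = ["Q3 revenue grew 12%.", "The API rate limit is 100 req/min."]
index = faiss.IndexFlatL2(64)
index.add(np.stack([embed(d) for d in docs]))     # proprietary knowledge -> index

def answer(query: str, k: int = 1) -> str:
    _, ids = index.search(embed(query)[None, :], k)
    context = "\n".join(docs[i] for i in ids[0])  # retrieved source knowledge
    return llm(f"Context:\n{context}\n\nQuestion: {query}")

print(answer("What is the API rate limit?"))
```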

68
Q

What does SELF-RAG stand for?

A

Self-Reflective RAG

69
Q

What is SELF-RAG doing?

A

It’s creating “reflection tokens” for a particular query to decide if it needs to get new information, and then “critique tokens” to grade the relevance and quality of that new information.

70
Q

What does CRAG stand for?

A

Corrective RAG

71
Q

What is CRAG doing?

A

It has an evaluator that grades the quality of the retrieved documents; if they’re insufficient, it falls back to a web search to get more.

72
Q

What is RAG Fusion doing?

A

Derive multiple new queries from the input query, do document retrieval for all of them, and combine the ranked results.

73
Q

What is the purpose of SELF-RAG?

A

It’s to increase the factuality of the responses.

74
Q

What is the purpose of RAG Fusion?

A

It’s to get more comprehensive answers.

75
Q

What is an agent?

A

It’s an LLM with built-in logic that lets it reason about the answers the main LLM is giving, and decide if it needs more information.

76
Q

What is a “canonical form”?

A

It’s a class of user intent, like a certain type of question. For instance “questions about LLMs”.

77
Q

What is RAG with guardrails?

A

It’s where you detect certain types of questions, or queries, and deterministically trigger certain actions.

78
Q

In what context are canonical forms used?

A

RAG with guardrails. The canonical forms are the classes of queries that you’re comparing the query to, to see if it matches with some sort of guardrailed scenario.

79
Q

What is the difference between a vector database and a vector index?

A

A vector index is just for organizing and storing vector embeddings. The database lets you manage those vector embeddings, like adding metadata and doing real-time updates.

80
Q

What is the main question you have to answer with filtering?

A

Do you filter before the vector search, or after?

81
Q

What are three categories of evaluation for a vector database?

A

Technology, Dev Experience, and Enterprise-Readiness

82
Q

What are four factors to consider when evaluating the technology of a vector db?

A

Performance, Relevance, Scalability, Cost-efficiency

83
Q

What is the Euclidean distance?

A

The straight-line distance between two points.

84
Q

What is the problem with Euclidean distance (sometimes)?

A

It is sensitive to magnitudes – if you don’t care about the specific numbers, it’s not what you want.

85
Q

What is the difference between cosine similarity and dot product?

A

Cosine similarity totally ignores the magnitude of the vectors.

86
Q

What is the output of cosine similarity?

A

It’s a number from -1 to 1, comparing two vectors. 1 if they are the same direction, 0 if they are orthogonal, -1 if they are opposite directions.
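
A tiny NumPy version:

```python
import numpy as np

def cosine_similarity(a, b):
    # only the angle matters; the magnitudes cancel out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1, 0]), np.array([0, 1])))  # 0.0 (orthogonal)
print(cosine_similarity(np.array([1, 1]), np.array([2, 2])))  # 1.0 (same direction)
```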

87
Q

What is chunking?

A

It’s how you break text up into segments.

88
Q

What do you want when you are doing chunking?

A

You want each chunk to make sense on its own, without any surrounding context.

89
Q

What is fixed-size chunking?

A

It’s when you break the text up into segments where each segment has the exact same size.
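
A tiny sketch (the chunk size, and the overlap – a common variant – are assumptions):

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20):
    # slide a fixed-size window; the overlap keeps a little shared context
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```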

90
Q

What is context-aware chunking?

A

It’s where you break the text up based on some semantic consideration, like one chunk per sentence.

91
Q

What is “parametric knowledge”?

A

It’s the knowledge that is learned during training.

92
Q

What is “source knowledge”?

A

It’s the knowledge that you would pull in when you are doing RAG.

93
Q

Why don’t you want to stuff the context window when doing RAG?

A

Research shows that the more documents you stuff, the more performance degrades.

94
Q

What is conversational memory?

A

It’s where you send the entire conversation up to this point, rather than the most recent query.

95
Q

How do you avoid context stuffing when doing conversational memory?

A

You use an LLM to summarize the conversation up to this point, and just add that to the context.

96
Q

What is the ReAct framework?

A

It’s a framework for agents: Reason and Act. The agent reads what it gets from your LLM, reasons about it, and decides what further actions to take.

97
Q

What does EBR stand for?

A

Embedding-Based Retrieval

98
Q

What are you doing in EBR?

A

Use embeddings to represent both the queries and the documents, and then do NN search in the embedding space.

99
Q

What is the main challenge of using EBR for search, other than the scale?

A

You want to do term-based matching in addition to semantic matching.

100
Q

What are the two layers of search engines?

A

Retrieval (getting a set of relevant documents) and Ranking.

101
Q

What is the layer called where you do retrieval, in a search engine?

A

The recall layer

102
Q

What happens in the recall layer, in a search engine?

A

It’s where you do retrieval.

103
Q

What is the layer called where you do ranking, in a search engine?

A

The precision layer

104
Q

What happens in the precision layer, in a search engine?

A

It’s where you do ranking.

105
Q

What is the Facebook unified embedding?

A

It’s a model where you add the user information and social context to the text.

106
Q

What is triplet loss?

A

It’s where you have some anchor point, a positive example, and a negative example. The loss wants the negative example to be at least m (for some constant margin m) farther from the anchor than the positive example is. If that’s not the case, the loss measures how far short of that margin you are.
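
A minimal sketch, using L2 distance for simplicity (the next card notes the Facebook model used cosine similarity):

```python
# A minimal triplet-loss sketch with L2 distance for simplicity
# (the Facebook EBR model used cosine similarity instead).
import numpy as np

def triplet_loss(anchor, positive, negative, m=0.2):
    # max(0, d(a, p) - d(a, n) + m): zero once the negative is at least
    # margin m farther from the anchor than the positive
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + m)
```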

107
Q

What is the loss function for the Facebook EBR model?

A

Triplet loss, with cosine similarity as the distance metric.

108
Q

How did Facebook get positive examples for their EBR model?

A

It was just the results that people clicked on.

109
Q

What was the main problem Facebook ran into while training their EBR model?

A

Trying to get good negative examples.

110
Q

What is Unicorn?

A

It’s the Facebook retrieval engine – seems kinda like SQL or SPARQL.

111
Q

What is “hybrid retrieval” in the context of Facebook EBR?

A

It’s when you do both term matching and embedding matching.