Vector Queries Flashcards

1
Q

Vector Database

A

A vector database is a type of database specifically designed to store, manage, and retrieve data represented as vectors, which are numerical arrays capturing information in multi-dimensional space.

2
Q

Key Characteristics of Vector DBs

A

Data (such as text, images, or audio) is encoded into vectors, typically by machine learning models: for example, embeddings from NLP models or feature vectors from image recognition models.

Each vector represents a “point” in multi-dimensional space, and similar data points (e.g., similar sentences, images) are often close to each other in this space.

Vector databases perform similarity searches using distance metrics, such as cosine similarity, Euclidean distance, or dot product.
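
The three metrics named above can be written out in a few lines. This is a minimal sketch in plain Python, not how any particular vector database implements them; the function names and the toy vectors are assumptions made for illustration.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

# Toy 3-dimensional vectors, made up for illustration.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length

print(dot_product(a, b))         # 18.0
print(euclidean_distance(a, b))  # 3.0
print(cosine_similarity(a, b))   # 1.0
```

Note that cosine similarity ignores vector length (a and b score 1.0), while Euclidean distance and dot product do not.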

3
Q

Approximate Nearest Neighbor (ANN)

A

Vector databases use specialized indexing structures, such as Approximate Nearest Neighbor (ANN) indexing techniques, to efficiently handle high-dimensional data and speed up similarity searches.

Examples of ANN algorithms include HNSW (Hierarchical Navigable Small World) and LSH (Locality-Sensitive Hashing).
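
To make the LSH idea concrete, here is a rough sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is a teaching toy under stated assumptions, not the scheme used by any specific database: the helper names, the 8-bit signature length, and the sample vectors are all invented here, and production systems typically use several hash tables rather than one.

```python
import random

def make_hyperplanes(num_bits, dims, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_buckets(vectors, hyperplanes):
    """Group vector ids by signature so a query only scans one bucket."""
    buckets = {}
    for vec_id, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(vec_id)
    return buckets

planes = make_hyperplanes(num_bits=8, dims=3)
vectors = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],      # same direction as "a"
    "a_flipped": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(vectors, planes)

# The query points the same way as "a", so it hashes to the same bucket.
query = [4.0, 8.0, 12.0]
candidates = buckets.get(signature(query, planes), [])
print(candidates)
```

Only the vectors in the query's bucket are compared exactly, which is where the speed-up (and the "approximate" in ANN) comes from: a true neighbor that lands in a different bucket is missed.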

4
Q

Syntactical vs. Semantic Search

A

In a syntactical search for “apple alcoholic beverage”, the engine looks for documents containing that exact phrase. If a document doesn’t have the words “apple”, “alcoholic”, and “beverage” in close proximity or in that specific order, it may rank low or not appear in the results at all. This method is limited because it is tied strictly to the syntax of the query and can miss contextually relevant documents.

A semantic search for “apple alcoholic beverage” doesn’t just return documents containing that exact phrase. It matches the meaning of the query and fetches documents related to “appletini”, “apple brandy”, “apple bourbon”, and more.
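
The limitation of syntactical matching is easy to reproduce. The sketch below uses a deliberately naive exact-phrase check; the function name and the three sample documents are hypothetical, and real engines use tokenization and ranking rather than a plain substring test.

```python
def exact_phrase_match(query, document):
    """Return True only if the query appears verbatim (case-insensitive)."""
    return query.lower() in document.lower()

# Hypothetical documents, made up for illustration.
documents = [
    "Hard cider is an apple alcoholic beverage made by fermentation.",
    "An appletini mixes vodka with apple liqueur.",
    "Apple brandy is distilled from cider.",
]

query = "apple alcoholic beverage"
hits = [doc for doc in documents if exact_phrase_match(query, doc)]
print(hits)
```

Only the first document is returned, even though the other two are clearly about apple-based alcoholic drinks.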

5
Q

Why is Vector Search Crucial for Semantic Search?

A

Words, phrases, or even entire sentences can be represented as vectors in a high-dimensional space. In this vector space, the “distance” between vectors indicates semantic similarity. Words or phrases with similar meanings will have vectors closer to each other.
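
As a rough illustration of "closer vectors mean closer meanings", the sketch below ranks a few labels by Euclidean distance to a query vector. All of the numbers are hand-made assumptions (think of the two axes as "apple-ness" and "alcohol-ness"); real embeddings come from a trained model and usually have hundreds of dimensions.

```python
import math

# Hand-made 2-D "embeddings", invented for illustration only.
toy_embeddings = {
    "appletini": [0.9, 0.9],
    "apple brandy": [0.8, 0.95],
    "apple pie": [0.9, 0.1],
    "bicycle": [0.1, 0.1],
}

def rank_by_distance(query_vector, embeddings):
    """Return labels sorted from nearest to farthest (Euclidean distance)."""
    return sorted(
        embeddings,
        key=lambda label: math.dist(query_vector, embeddings[label]),
    )

# Hypothetical vector for the query "apple alcoholic beverage".
query_vector = [0.85, 0.9]
print(rank_by_distance(query_vector, toy_embeddings))
```

The two drinks rank first without sharing the query's exact words, which is the behavior semantic search relies on.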

6
Q

dense_vector

A

Elasticsearch’s dense_vector datatype is designed to store vectors of float values. These vectors are often employed in machine learning, especially for embeddings where items are represented as vectors in high-dimensional space.

To store a vector, you can define a mapping like:

{
  "properties": {
    "text-vector": {
      "type": "dense_vector",
      "dims": 512
    }
  }
}

Here, dims denotes the number of dimensions in the vector.
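
The same mapping can be assembled and sanity-checked in Python before it is sent anywhere. This is a sketch only: `build_mapping` and `check_vector` are hypothetical helpers, and wrapping the properties in a top-level "mappings" key assumes the body is used to create an index. No Elasticsearch client is called here.

```python
import json

def build_mapping(field_name, dims):
    """Build an index body that declares one dense_vector field."""
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "dense_vector", "dims": dims}
            }
        }
    }

def check_vector(vector, mapping, field_name):
    """Raise ValueError if the vector length differs from the mapped dims."""
    dims = mapping["mappings"]["properties"][field_name]["dims"]
    if len(vector) != dims:
        raise ValueError(f"expected {dims} values, got {len(vector)}")

mapping = build_mapping("text-vector", 512)
print(json.dumps(mapping, indent=2))

check_vector([0.0] * 512, mapping, "text-vector")  # passes silently
```

Checking the length client-side is optional, but it surfaces a mismatch between the embedding model's output size and dims before indexing fails.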

7
Q

script_score

A

To perform vector similarity searches, we need to measure how close a given vector is to other vectors in the database. A common method for this is to compute the dot product between vectors. The script_score query in Elasticsearch allows us to compute custom scores for documents based on a script. By employing this functionality, we can compute the dot product between our query vector and the vectors stored in our database.

{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "dotProduct(params.queryVector, 'text-vector') + 1.0",
        "params": {
          "queryVector": […]
        }
      }
    }
  }
}

Here, params.queryVector is the vector you’re searching with, and ‘text-vector’ refers to the field in which the vectors are stored. Adding 1.0 keeps the score from going negative for unit-length vectors, since Elasticsearch rejects negative scores.
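
To see what the script computes per document, the scoring expression can be mirrored in plain Python. This is an emulation for intuition only, not Elasticsearch code: the `script_score` function below is a local stand-in for the script, and the stored vectors are hypothetical unit-length examples.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def script_score(query_vector, stored_vector):
    """Mirror of the script: dotProduct(queryVector, field) + 1.0."""
    return dot_product(query_vector, stored_vector) + 1.0

# Hypothetical unit-length vectors standing in for indexed documents.
stored = {
    "doc-1": [1.0, 0.0],
    "doc-2": [0.6, 0.8],
    "doc-3": [-1.0, 0.0],
}
query_vector = [1.0, 0.0]

# Highest score first, as a search response would be ordered.
ranked = sorted(
    stored,
    key=lambda doc_id: script_score(query_vector, stored[doc_id]),
    reverse=True,
)
for doc_id in ranked:
    print(doc_id, script_score(query_vector, stored[doc_id]))
```

The vector pointing the opposite way (doc-3) ends up at the bottom with a score of 0.0 rather than a negative value. Because the outer query is match_all, every document is scored this way, so the cost grows linearly with the index size.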

8
Q

Important Link

A

https://www.elastic.co/search-labs/blog/elastic-vector-database-practical-example
