Elasticsearch Flashcards

Question 1

Q

What is a document

Answer

A

A document is the individual unit of data being searched over. It is just a JSON object, for example:

{
  "id": "XYZ123",
  "title": "The Great Gatsby",
  "author": "F. Scott Fitzgerald",
  "price": 10.99,
  "createdAt": "2024-01-01T00:00:00.000Z"
}

Question 2

Q

What are indices

Answer

A

A collection of documents. Each document is associated with a unique ID and a set of fields, which are key-value pairs that contain the data you’re searching over.

Question 3

Q

What are mappings and fields

Answer

A

Mappings are the schema of the index. Mappings define the fields that the index will have, the data type of each field, and any other properties like how a field is indexed.

An example of a mapping:

{
  "properties": {
    "id": { "type": "keyword" },
    "title": { "type": "text" },
    "author": { "type": "text" },
    "price": { "type": "float" },
    "createdAt": { "type": "date" }
  }
}
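When an index is created through the REST API, a mapping like the one above is typically sent under a `mappings` key in the body of a create-index request (`PUT /<index>`). The sketch below only builds that body as a Python dict. The index name `books`, the helper `create_index_body`, and the shard/replica counts are illustrative assumptions, and no request is actually sent.

```python
import json

# Mapping from the card above: field name -> field definition
BOOK_PROPERTIES = {
    "id": {"type": "keyword"},
    "title": {"type": "text"},
    "author": {"type": "text"},
    "price": {"type": "float"},
    "createdAt": {"type": "date"},
}

def create_index_body(properties, shards=1, replicas=1):
    """Build the JSON body for a create-index request such as PUT /books.

    The shard and replica counts are illustrative, not recommendations.
    """
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {"properties": properties},
    }

body = create_index_body(BOOK_PROPERTIES)
print(json.dumps(body, indent=2))
```

A `keyword` field is indexed as a single exact value (suited to IDs and filters), while a `text` field is analyzed into individual terms for full-text search.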

Question 4

Q

What is a shard

Answer

A

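A shard is a subset of an index's documents. Each shard is a self-contained Lucene index, so shards map 1:1 to Lucene indexes. Splitting an index into shards lets its data be spread across multiple nodes.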
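To decide which shard a document belongs to, Elasticsearch hashes the document's routing value (by default its `_id`) and takes the result modulo the number of primary shards. The sketch below is simplified: Elasticsearch uses Murmur3 plus a `routing_num_shards` factor, whereas here `zlib.crc32` stands in as the hash, and the function name `shard_for` is hypothetical.

```python
import zlib

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document ID.

    Simplified: zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses,
    and the routing_num_shards factor is ignored.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

ids = ["XYZ123", "ABC456", "DEF789", "GHI012"]
for doc_id in ids:
    # The same ID always lands on the same shard for a given shard count.
    print(doc_id, "-> shard", shard_for(doc_id, 3))
```

Changing the shard count changes the modulus, so most IDs would map to different shards. This is why the number of primary shards is fixed when an index is created, and changing it requires reindexing or the split/shrink APIs.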

Question 5

Q

What is a replica

Answer

A

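A replica is a copy of a primary shard. Elasticsearch allows zero or more replicas per primary shard (set per index with number_of_replicas). Replicas provide redundancy if a node fails and can also serve search requests.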
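Elasticsearch never allocates a replica to the same node as its primary (or as another copy of the same shard), since losing that one node would then lose every copy. The round-robin `allocate` function below is a hypothetical sketch of that single constraint; the real allocator also balances shards across nodes and considers disk usage, among other factors.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each shard's primary and replicas on distinct nodes, round-robin.

    A copy that cannot get a node of its own is left unassigned (None).
    """
    allocation = {}
    for shard in range(num_shards):
        placed = []
        for copy_num in range(1 + num_replicas):  # copy 0 is the primary
            if copy_num < len(nodes):
                # Consecutive offsets from the same start never repeat a node.
                placed.append(nodes[(shard + copy_num) % len(nodes)])
            else:
                placed.append(None)  # no free node left for this copy
        allocation[shard] = placed
    return allocation

plan = allocate(num_shards=3, num_replicas=1, nodes=["node-a", "node-b", "node-c"])
for shard, (primary, *replicas) in plan.items():
    print(f"shard {shard}: primary={primary} replicas={replicas}")
```

In a real cluster, a replica with no eligible node likewise stays unassigned, and the cluster health turns yellow.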

Question 6

Q

What is TF-IDF (Term Frequency-Inverse Document Frequency)

Answer

A

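It is a measure of how important a word is to a document within a collection. The term frequency (TF) is the number of times a term occurs in a document. The document frequency (DF) is the number of documents the term occurs in, and the inverse document frequency (IDF) inverts it, commonly as log(N / DF), where N is the total number of documents. A term's weight is TF × IDF.

A term with a high DF (e.g. "the") is common and therefore not very important, so it gets a low IDF. Conversely, a term with a low DF is rarer and considered more important. Elasticsearch's default scoring uses BM25, a refinement of this idea.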
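A minimal TF-IDF sketch over three short titles, using raw term counts for TF and the common log(N / DF) form of IDF. Tokenization is a naive whitespace split, and other weighting variants exist, so this illustrates the idea rather than Elasticsearch's exact scoring formula.

```python
import math
from collections import Counter

docs = {
    "d1": "the great gatsby",
    "d2": "the great escape",
    "d3": "the old man and the sea",
}

# Naive tokenization on whitespace (a real analyzer does much more).
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# Document frequency: the number of documents containing each term.
df = Counter(term for terms in tokens.values() for term in set(terms))
N = len(docs)

def tf_idf(term: str, doc_id: str) -> float:
    tf = tokens[doc_id].count(term)  # term frequency in this document
    if df[term] == 0:
        return 0.0                   # term never seen in the collection
    idf = math.log(N / df[term])     # rarer terms get a larger weight
    return tf * idf

print(tf_idf("the", "d1"))     # 0.0, because "the" appears in every document
print(tf_idf("great", "d1"))   # log(3/2), about 0.405
print(tf_idf("gatsby", "d1"))  # log(3), about 1.099
```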

Question 7

Q

How is pagination handled

Answer

A

From/Size Pagination

from: the starting index of results
size: the number of results to return
This becomes inefficient for deep pagination, because every shard must collect and sort from + size hits on every request. By default, from + size cannot exceed 10,000 (the index.max_result_window setting)
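The cost of from/size grows with the page number because each shard must hand its top `from + size` hits to be merged. The hypothetical helpers below build a from/size request body and estimate the total hits collected across shards; the 10,000 limit mirrors the default `index.max_result_window` setting.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def from_size_body(page: int, page_size: int) -> dict:
    """Request body for a zero-based page number using from/size."""
    start = page * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; use search_after")
    return {"from": start, "size": page_size, "query": {"match_all": {}}}

def hits_collected(page: int, page_size: int, num_shards: int) -> int:
    """Each shard collects from + size hits, and all of them are merged."""
    return num_shards * (page * page_size + page_size)

print(from_size_body(2, 10))                  # {'from': 20, 'size': 10, 'query': {'match_all': {}}}
print(hits_collected(2, 10, num_shards=5))    # 150
print(hits_collected(990, 10, num_shards=5))  # 49550
```

Both requests return just 10 results to the client, but the deep page makes the shards collect roughly 330 times as many hits.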

Search After Pagination

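search_after: use the sort values of the last hit on the previous page as the starting point for the next page
Resumes from a fixed position in the sort order rather than an offset, so pages don't shift when documents are inserted (the sort needs a unique tiebreaker field for this to hold)
You must keep the last sort values client side, and documents inserted after you fetched earlier pages are missed if they sort before your current position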
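The in-memory sketch below imitates search_after over a small, made-up list of documents sorted by `(price, id)`: `id` acts as the unique tiebreaker, and each page starts strictly after the last sort values the client kept. The `page_after` function is hypothetical; in Elasticsearch those values would be sent in the request's `search_after` array.

```python
from bisect import bisect_right

docs = [
    {"id": "A1", "price": 10.99},
    {"id": "B2", "price": 5.00},
    {"id": "C3", "price": 10.99},
    {"id": "D4", "price": 7.50},
    {"id": "E5", "price": 12.00},
]

def sort_key(doc):
    # Price first, then the unique id as a tiebreaker so the order is total.
    return (doc["price"], doc["id"])

ordered = sorted(docs, key=sort_key)
keys = [sort_key(d) for d in ordered]

def page_after(last_sort_values, size):
    """Next page of documents after last_sort_values (None means the first page)."""
    start = 0 if last_sort_values is None else bisect_right(keys, tuple(last_sort_values))
    return ordered[start:start + size]

cursor = None
while True:
    page = page_after(cursor, size=2)
    if not page:
        break
    print([d["id"] for d in page])
    cursor = list(sort_key(page[-1]))  # the state the client keeps between requests
```

Because the cursor is a position in the sort order rather than an offset, a document inserted later with a lower price than the cursor would never be returned.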

Cursors

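Create a point in time (PIT), pass the PIT ID in each search request, and close the PIT when finished
Every request sees the index as it was when the PIT was opened, so results stay consistent across pages even while documents are indexed, updated, or deleted
Usually combined with search_after, so each request only collects the hits after the previous page instead of re-collecting the skipped results
Keeping a PIT open keeps old segments alive that would otherwise be merged away, which costs extra disk space and file handles, so close it when done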
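A PIT is usually paired with search_after. The sketch below only builds request bodies and assumes a PIT was already opened (e.g. with `POST /books/_pit?keep_alive=1m`, where `books` is a hypothetical index) and its ID returned. `next_page_body` and `close_pit_body` are hypothetical helpers, not client-library functions. A search using a PIT is sent to `/_search` without an index name, because the PIT already pins the index.

```python
def next_page_body(pit_id, size, last_hit=None):
    """Body for POST /_search that pages through a point in time."""
    body = {
        "size": size,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": "1m"},  # extends the PIT on each request
        "sort": [{"createdAt": "asc"}],
    }
    if last_hit is not None:
        # Each hit in a sorted response carries its sort values in hit["sort"].
        body["search_after"] = last_hit["sort"]
    return body

def close_pit_body(pit_id):
    """Body for DELETE /_pit, which releases the resources held by the PIT."""
    return {"id": pit_id}

first = next_page_body("example-pit-id", size=10)
# Made-up sort values for the last hit of page one. With a PIT, Elasticsearch
# appends an implicit _shard_doc tiebreaker, hence the second value.
second = next_page_body("example-pit-id", size=10,
                        last_hit={"sort": [1704067200000, 42]})
print(second["search_after"])  # [1704067200000, 42]
```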

Question 8

Q

What are the different node types

Answer

A

Master Node
Data Node
Coordinating Node
Ingest Node
Machine Learning Node

Question 9

Q

What is a master node

Answer

A

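A node responsible for managing the cluster state (one master is elected from the master-eligible nodes)
Tracks nodes joining and leaving the cluster
Creates and deletes indices and decides which nodes their shards are allocated to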

Question 10

Q

What is a data node

Answer

A

A data node is responsible for storing the data
Large clusters will have many data nodes
Data nodes house indices, which are composed of primary shards and their replicas
Each shard is a single Lucene index

Question 11

Q

What is a coordinating node

Answer

A

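A node that coordinates search requests across the cluster; every node acts as a coordinating node by default, and a coordinating-only node has no other roles
It receives the search request from the client, forwards it to the data nodes holding the relevant shards, then gathers and merges their results before responding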
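Combining the per-shard results can be sketched as a top-k merge: each shard returns its own best hits, and the coordinating node keeps the overall best `size`. The shard results and scores below are made up, and Elasticsearch actually splits this into a query phase (IDs and scores) and a fetch phase (full documents), which this sketch collapses into one step.

```python
import heapq

# Hypothetical per-shard results: (score, doc_id), each already sorted best-first.
shard_results = {
    0: [(3.2, "d7"), (1.1, "d2")],
    1: [(2.9, "d4"), (2.5, "d9"), (0.4, "d1")],
    2: [(4.0, "d5")],
}

def merge_top_hits(results_by_shard, size):
    """Combine every shard's hits and keep the `size` highest-scoring ones."""
    all_hits = (hit for hits in results_by_shard.values() for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])

print(merge_top_hits(shard_results, size=3))  # [(4.0, 'd5'), (3.2, 'd7'), (2.9, 'd4')]
```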

Question 12

Q

What is an ingest node

Answer

A

A node that runs ingest pipelines on incoming documents
The pipeline's processors transform the data (e.g. parsing, renaming, or removing fields) before it is indexed
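An ingest pipeline is defined as an ordered list of processors. The dict below uses real processor names (`lowercase`, `rename`, `remove`), but `apply_pipeline` is a hypothetical local stand-in that mimics only those three, so the transformation can be seen without a cluster. In Elasticsearch the pipeline would be registered with `PUT /_ingest/pipeline/<name>` and executed by an ingest node.

```python
import copy

pipeline = {
    "description": "normalize book documents (illustrative)",
    "processors": [
        {"lowercase": {"field": "author"}},
        {"rename": {"field": "createdAt", "target_field": "created_at"}},
        {"remove": {"field": "internal_note"}},
    ],
}

def apply_pipeline(doc, pipeline):
    """Hypothetical local imitation of the three processors used above."""
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    for processor in pipeline["processors"]:
        name, params = next(iter(processor.items()))
        field = params["field"]
        if name == "lowercase":
            doc[field] = doc[field].lower()
        elif name == "rename":
            doc[params["target_field"]] = doc.pop(field)
        elif name == "remove":
            doc.pop(field)  # raises KeyError if the field is missing
        else:
            raise ValueError(f"processor not supported in this sketch: {name}")
    return doc

raw = {"id": "XYZ123", "author": "F. Scott Fitzgerald",
       "createdAt": "2024-01-01T00:00:00.000Z", "internal_note": "draft"}
print(apply_pipeline(raw, pipeline))
```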

Question 13

Q

What is a machine learning node

Answer

A

A node responsible for machine learning tasks

Question 14

Q

What is a Lucene index

Answer

A

Lucene indexes are made up of segments
Segments are immutable
CRUD
- Writes are buffered and flushed in batches as new segments
- When there are too many segments, a merge combines them into larger ones
- Deletions are handled with delete markers: marked entries are skipped when reading and physically removed when their segment is next merged
- Updates work like deletes: the old version is marked deleted and the new version is written to a new segment
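The toy store below mirrors those bullets: buffered writes are flushed as new immutable segments, deletes only record a marker, an update is a delete plus a fresh write, and a merge rewrites the live documents into one segment. The `SegmentedStore` class and its method names are hypothetical; real Lucene segments are on-disk files, and merges are scheduled by a merge policy.

```python
class SegmentedStore:
    """Toy model of Lucene-style immutable segments with delete markers."""

    def __init__(self):
        self.segments = []  # each: {"docs": tuple of (doc_id, doc), "deleted": set}
        self.buffer = []    # writes not yet flushed into a segment

    def write(self, doc_id, doc):
        self.buffer.append((doc_id, doc))

    def flush(self):
        """Batch buffered writes into a new immutable segment."""
        if self.buffer:
            self.segments.append({"docs": tuple(self.buffer), "deleted": set()})
            self.buffer = []

    def delete(self, doc_id):
        # Segment contents are never rewritten; only a delete marker is recorded.
        for seg in self.segments:
            if any(d_id == doc_id for d_id, _ in seg["docs"]):
                seg["deleted"].add(doc_id)
        # Unflushed writes are still mutable, so they can simply be dropped.
        self.buffer = [(d_id, d) for d_id, d in self.buffer if d_id != doc_id]

    def update(self, doc_id, doc):
        self.delete(doc_id)      # mark the old version deleted...
        self.write(doc_id, doc)  # ...and write the new version to a new segment

    def get(self, doc_id):
        """Only flushed segments are searchable; marked entries are skipped."""
        for seg in self.segments:
            if doc_id in seg["deleted"]:
                continue
            for d_id, doc in seg["docs"]:
                if d_id == doc_id:
                    return doc
        return None

    def merge(self):
        """Rewrite all live documents into one segment, dropping deleted ones."""
        live = [(d_id, doc) for seg in self.segments
                for d_id, doc in seg["docs"] if d_id not in seg["deleted"]]
        self.segments = [{"docs": tuple(live), "deleted": set()}] if live else []

store = SegmentedStore()
store.write("a", {"price": 10})
store.write("b", {"price": 5})
store.flush()                     # segment 0 holds a and b
store.update("a", {"price": 12})  # marks a deleted in segment 0, buffers the new a
store.flush()                     # segment 1 holds the new a
print(len(store.segments), store.get("a"))  # 2 {'price': 12}
store.merge()
print(len(store.segments), store.get("a"))  # 1 {'price': 12}
```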

Question 15

Q

What are inverted indexes

Answer

A

A type of index that maps content to its locations. In a document store, it maps each word (term) to the documents that contain it, so finding the documents for a term is a direct lookup rather than a scan over every document
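A minimal inverted index: each term maps to the set of document IDs containing it, so a query only touches the postings for its own terms. The Python dict here is a hash map, which gives this sketch its near-constant-time term lookup; Lucene instead stores a sorted term dictionary and compressed postings lists. Tokenization is a naive lowercase split, and the titles are illustrative.

```python
from collections import defaultdict

docs = {
    "d1": "The Great Gatsby",
    "d2": "The Great Escape",
    "d3": "The Old Man and the Sea",
}

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)

def search_all_terms(index, query):
    """Return IDs of documents that contain every query term (an AND query)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(sorted(search_all_terms(index, "great gatsby")))  # ['d1']
print(sorted(search_all_terms(index, "the")))           # ['d1', 'd2', 'd3']
```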

Question 16

Q

What are doc values

Answer

A

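A columnar, contiguous representation of a single field's values for all documents in a segment, keyed by document.

Where the inverted index maps terms to documents, doc values map documents to values. To sort by price, for example, the inverted index finds the matching documents, then doc values supply each match's price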
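The two structures can be contrasted in a small sketch over a made-up segment: the inverted index answers "which documents contain this term?", while a doc-values column (a list indexed by document number) answers "what is this document's price?". The variable names are illustrative; Lucene stores doc values as compressed on-disk columns.

```python
# Hypothetical segment: each document is identified by its position (doc number).
titles = ["the great gatsby", "great expectations", "the old man and the sea"]
prices = [10.99, 8.50, 7.25]  # doc values column: prices[doc] is that doc's price

# Inverted index for the title field: term -> set of doc numbers.
inverted = {}
for doc, title in enumerate(titles):
    for term in title.split():
        inverted.setdefault(term, set()).add(doc)

# 1) The inverted index finds the matching documents.
matches = inverted.get("great", set())
# 2) The column supplies each match's price for sorting, without reading the documents.
by_price = sorted(matches, key=lambda doc: prices[doc])
print([(titles[doc], prices[doc]) for doc in by_price])
# [('great expectations', 8.5), ('the great gatsby', 10.99)]
```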
