Basics Flashcards
Document
A JSON object stored in Elasticsearch; the basic unit of data that can be indexed
Index
Collection of Documents
Shard
An index is split into shards, and the shards are distributed across nodes
Node
An instance of Elasticsearch. Multiple nodes can run on one physical machine
Primary Shard & Replica Shard
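A primary shard holds the original copy of its slice of the index and processes writes first. A replica shard is a copy of a primary shard, kept on a different node for failover and to serve reads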
Cluster
A collection of one or more nodes that together hold the indices, usually set up for a specific purpose such as e-commerce search or APM. Cross-cluster searches are possible
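Routing Formula and Number of Primary Shards
shard_number = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID
The number of primary shards of an index cannot be changed after creation without reindexing (or using the split/shrink APIs), because it would change the result of the routing formula. The number of replicas is not part of the formula and can be changed at any time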
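The effect of the shard count on routing can be sketched in a few lines of Python. This is only an illustration of modulo-based routing: the function name route_to_shard and the document IDs are made up, and zlib.crc32 is a stand-in hash, not the hash function Elasticsearch uses internally.

```python
import zlib


def route_to_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick a shard for a routing value using hash modulo shard count.

    zlib.crc32 is a stand-in hash chosen for the sketch; it is NOT the
    hash Elasticsearch uses, so real shard numbers will differ.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# Hypothetical document IDs, routed by ID (the default routing value).
doc_ids = ["1567", "1568", "1569", "1570"]

# The same IDs can land on different shards once the shard count changes,
# which is why the number of primary shards is fixed at index creation.
for shard_count in (3, 4):
    placement = {doc_id: route_to_shard(doc_id, shard_count) for doc_id in doc_ids}
    print(shard_count, placement)
```

Note that the replica count never appears in the calculation, which is why it can be adjusted freely.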
Add documents
POST /index_name/_doc/
{
  "field1": "value1",
  "field2": "value2"
}
Delete documents
DELETE /index_name/_doc/document_id
Update documents
POST /index_name/_update/document_id
{
  "doc": {
    "field1": "new_value"
  }
}
Create Index
PUT /index_name
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}
Delete Index
DELETE /index_name
Optimistic Concurrency Control
To ensure an older version of a document doesn’t overwrite a newer version, every operation performed on a document is assigned a sequence number by the primary shard that coordinates that change. The sequence number increases with each operation, so newer operations are guaranteed to have a higher sequence number than older ones.
As an application developer, you pass the primary term and sequence number with the write to make sure that you are not overwriting a change made after you read the document
First do a GET and read the _seq_no and _primary_term fields from the response. Then send the write with those values as if_seq_no and if_primary_term. If the document has changed in the meantime, the request is rejected with a version conflict (HTTP 409)
PUT products/_doc/1567?if_seq_no=362&if_primary_term=2
{
  "product": "r2d2",
  "details": "A resourceful astromech droid",
  "tags": [ "droid" ]
}
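The compare-then-write check behind if_seq_no and if_primary_term can be sketched with a toy in-memory store. This is not Elasticsearch client code: ToyIndex and VersionConflict are hypothetical names invented for the sketch, and the numbering starts at 0 only for simplicity.

```python
class VersionConflict(Exception):
    """Raised when the caller's view of the document is stale."""


class ToyIndex:
    """Hypothetical in-memory model of sequence-number based concurrency control."""

    def __init__(self):
        self._docs = {}
        self._next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored metadata.
        return dict(self._docs[doc_id])

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self._docs.get(doc_id)
        if if_seq_no is not None:
            seen = (if_seq_no, if_primary_term)
            actual = None if current is None else (current["_seq_no"], current["_primary_term"])
            if seen != actual:
                raise VersionConflict(f"expected {seen}, found {actual}")
        # Every accepted write gets the next, strictly higher sequence number.
        self._docs[doc_id] = {
            "_source": source,
            "_seq_no": self._next_seq_no,
            "_primary_term": self.primary_term,
        }
        self._next_seq_no += 1
        return self.get(doc_id)


index = ToyIndex()
index.put("1567", {"product": "r2d2"})

# Read first, then write conditionally using the values that were read.
seen = index.get("1567")
index.put("1567", {"product": "r2d2", "tags": ["droid"]},
          if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])

# A second writer still holding the old values is rejected.
conflict_raised = False
try:
    index.put("1567", {"product": "stale"},
              if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])
except VersionConflict as exc:
    conflict_raised = True
    print("rejected:", exc)
```

On a conflict, the usual pattern is to GET the document again, reapply the change, and retry with the fresh values.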
Inverted Index
Mapping from each term to the list of documents that contain it
Example
Document 1 :
Space : The final frontier. These are the voyages..
Document 2 :
He’s bad, he’s the number one. He’s the space cowboy with the laser gun!
space: 1,2
the: 1,2
final: 1
frontier: 1
he: 2
bad: 2
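The listing above can be reproduced with a small Python sketch. The tokenizer here is an assumption made for illustration (lowercase, drop contraction suffixes such as 's, keep alphabetic runs); it is much simpler than a real Elasticsearch analyzer, and the function names are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Simplified tokenizer for the sketch, not an Elasticsearch analyzer."""
    # Lowercase, then drop contraction suffixes such as "'s" so "He's" becomes "he".
    text = re.sub(r"['’][a-z]+", "", text.lower())
    return re.findall(r"[a-z]+", text)


def build_inverted_index(docs):
    """Map each term to the sorted list of document numbers containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in postings.items()}


docs = {
    1: "Space: The final frontier. These are the voyages..",
    2: "He's bad, he's the number one. He's the space cowboy with the laser gun!",
}

inverted = build_inverted_index(docs)
for term in ("space", "the", "final", "frontier", "he", "bad"):
    print(f"{term}: {inverted[term]}")
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.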
TF-IDF
Term Frequency * Inverse Document Frequency
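A minimal sketch of the product, using one common textbook variant: term frequency as the share of tokens in the document, and inverse document frequency as log(N / df). The function name and the tiny corpus are made up for illustration, and the exact formula a search engine uses for scoring differs in detail.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0  # term appears nowhere in the corpus
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical pre-tokenized documents.
corpus = [
    ["space", "the", "final", "frontier"],
    ["he", "the", "space", "cowboy"],
]

# "space" occurs in every document, so its IDF (and score) is zero.
common_score = tf_idf("space", corpus[0], corpus)
# "final" occurs in only one document, so it scores higher there.
rare_score = tf_idf("final", corpus[0], corpus)
print(common_score, rare_score)
```

Terms that appear in every document contribute nothing, while terms that are rare across the corpus but present in a document raise that document's relevance.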