Elasticsearch Flashcards
Elasticsearch is an open-source _______ built on top of _________
search engine; Apache Lucene
What is Apache Lucene?
Apache Lucene is a full-text search engine library.
What does REST stand for?
Representational State Transfer
What is a RESTful service?
A RESTful service is one that implements the REST architectural style.
What is REST?
REST, or REpresentational State Transfer, is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other.
There are 4 basic HTTP verbs we use in requests to interact with resources in a REST system:
- GET — retrieve a specific resource (by id) or a collection of resources
- POST — create a new resource
- PUT — update a specific resource (by id)
- DELETE — remove a specific resource by id
- HEAD (additionally, for Elasticsearch): check whether a resource exists
What is Elasticsearch?
- Enables full-text search
- A distributed real-time document store where every field is indexed and searchable
- A distributed search engine with real-time analytics
- Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
What is a node?
A node is a running instance of Elasticsearch.
What is a cluster?
A cluster is a group of nodes with the same cluster.name that work together to share data and to provide failover and scale, although a single node can form a cluster all by itself.
All other languages can communicate with Elasticsearch over port _______ using a
_______, accessible with your favorite web client.
9200 ; RESTful API
A request to Elasticsearch consists of the same parts as any HTTP request:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
What is PROTOCOL?
Either http or https
What is HOST?
The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.
What is PORT?
The port running the Elasticsearch HTTP service, which defaults to 9200.
What is QUERY_STRING?
Any optional query-string parameters (for example, ?pretty will pretty-print the JSON response to make it easier to read).
What is BODY?
A JSON-encoded request body (if the request needs one).
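The parts described on the cards above can be joined into a request URL mechanically. The following is a minimal sketch using only the Python standard library; the helper name build_url is hypothetical, and it writes the flag as pretty=true, which is assumed to be equivalent to a bare ?pretty.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, query=None):
    """Assemble PROTOCOL://HOST:PORT/PATH?QUERY_STRING from its parts."""
    netloc = f"{host}:{port}"
    # urlencode turns {"pretty": "true"} into "pretty=true"; an empty
    # dict yields "", in which case urlunsplit omits the "?" entirely.
    query_string = urlencode(query or {})
    return urlunsplit((protocol, netloc, path, query_string, ""))


# 9200 is the default HTTP port named on the PORT card.
print(build_url("http", "localhost", 9200, "/_count", {"pretty": "true"}))
```

Nothing is sent over the network here; the function only shows how the five pieces fit together.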
For instance, to count the number of documents in the cluster, we could use:
curl -XGET 'http://localhost:9200/_count?pretty' -d '{ "query": { "match_all": {} } }'
shorthand format:
GET /_count { "query": { "match_all": {} } }
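The shorthand format is just the curl command with the protocol, host, and port left out. As a rough sketch of that correspondence, the hypothetical helper below expands a shorthand line back into a curl command string. It assumes a node at http://localhost:9200 and a single-line shorthand of the form VERB, path, optional JSON body.

```python
import json
import shlex


def shorthand_to_curl(shorthand, base="http://localhost:9200"):
    """Turn 'VERB /path {json}' into an equivalent curl command string."""
    verb, rest = shorthand.strip().split(" ", 1)
    path, _, body = rest.strip().partition(" ")
    parts = ["curl", "-X", verb, base + path]
    if body.strip():
        # Round-trip through json to validate the body and compact it.
        compact = json.dumps(json.loads(body), separators=(",", ":"))
        parts += ["-d", compact]
    # shlex.quote adds shell quoting only where an argument needs it.
    return " ".join(shlex.quote(p) for p in parts)


print(shorthand_to_curl('GET /_count { "query": { "match_all": {} } }'))
```

Because the body is parsed with json.loads, a malformed body (for example a missing comma) raises an error instead of producing a broken command.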
Elasticsearch is _______-oriented, meaning that it stores entire ___________
document; objects or documents
How does Elasticsearch make a document searchable?
It indexes the contents of each document to make them searchable.
In Elasticsearch, you index, search, sort, and filter ________; not __________
This is a fundamentally different way of thinking about
data and is one of the reasons Elasticsearch can perform complex _________
documents; rows of columnar data; full-text search.
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
What is an index?
An index is like a database: a place to store related documents.
What is indexing (verb)?
To index a document is to store it in an index (noun) so that it can be retrieved and queried.
Insert a document
PUT /meijer/households/1 { "primary_tender": "12312321", "primary_customer": "23123213" }
How to retrieve a document?
GET /meijer/households/1
Simplest search?
GET /meijer/households/_search (retrieves everything)
The response reports how many documents matched in hits.total.
By default it returns only the top 10 results.
Lightweight search to get the customer = 1234
GET /meijer/households/_search?q=primary_customer:1234
Elasticsearch query DSL search to get the customer = 1234
GET /meijer/households/_search { "query": { "match" : { "primary_customer":"1234" } } }
Elasticsearch query DSL search (filtered query) for customer = 1234, restricted to age greater than 30
GET /meijer/households/_search { "query": { "filtered": { "filter": { "range": { "age": { "gt": 30 } } }, "query": { "match": { "customer": "1234" } } } } }
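Since a query DSL body is plain JSON, it can be assembled as nested dictionaries, which makes the nesting easier to get right. This is a minimal sketch assuming the Elasticsearch 1.x filtered query layout, where the filter clause and the query clause are sibling keys. The helper name filtered_query is hypothetical.

```python
import json


def filtered_query(match_field, match_value, range_field, gt):
    """Build a 1.x-style filtered query: a match query plus a range filter."""
    return {
        "query": {
            "filtered": {
                # The filter narrows the candidate set without scoring.
                "filter": {"range": {range_field: {"gt": gt}}},
                # The query clause is scored for relevance.
                "query": {"match": {match_field: match_value}},
            }
        }
    }


body = filtered_query("customer", "1234", "age", 30)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps guarantees the commas and braces are balanced, which is the most common mistake when typing these bodies by hand.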
What is a relevance score?
How well the document matches the query
By default, Elasticsearch sorts matching results by their _________
relevance score
Elasticsearch vs RDBMS
The relevance score is the major difference. In an RDBMS, a record either matches or it does not.
“Complete phrase” search
GET /meijer/households/_search { "query": { "match_phrase": { "hobby":"rock climbing" } } }
Highlight matched search phrases, as Google does
GET /meijer/households/_search { "query":{ "match_phrase":{ "hobby":"rock climbing" } }, "highlight":{ "fields":{ "hobby":{} } } }
OP { ... "hits": { "total": 1, "max_score": 0.23013961, "hits": [ { ... "_score": 0.23013961, "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] }, "highlight": { "about": [ "I love to go <em>rock</em> <em>climbing</em>" ] } } ] } }
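The highlight section of the output wraps each matched word in em tags. The toy function below imitates that effect on a plain string to show what the response contains. It is an illustration only and not how Elasticsearch's highlighter works internally; the function name and the whole-word, case-insensitive matching rule are assumptions.

```python
import re


def highlight(text, phrase, tag="em"):
    """Wrap each word of `phrase` that occurs in `text` in <tag>...</tag>."""
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return text
    # Match any of the phrase's words as a whole word, ignoring case.
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


print(highlight("I love to go rock climbing", "rock climbing"))
# prints: I love to go <em>rock</em> <em>climbing</em>
```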
Elasticsearch has functionality called _______, which allow you to generate sophisticated analytics over your data.
aggregations
One node in the cluster is elected to be the ______ node, which is in charge of managing cluster-wide changes
master
The master node is responsible for
- Creating and deleting indices
- Adding and removing nodes from the cluster
The master node doesn’t need to be involved in __________
Document-level searches or changes.
Users can talk to ______ node
Any node, including master node.
________ node knows where the document lives
Every node
__________ node is responsible for gathering the response from node or nodes holding the data and returning the final response to the client.
The node the user is talking to, which can be any node, including the master node.
Get cluster health
GET /_cluster/health
Status colors
Green, Yellow, Red
Green status
All primary and replica shards are active.
Yellow status
All primary shards are active, but not all replica shards are active.
Red Status
Not all primary shards are active.
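The three status colors follow directly from the state of the primary and replica shards. A small sketch of that rule, assuming a hypothetical input format of (kind, active) pairs rather than the real cluster health response:

```python
def cluster_status(shards):
    """Derive a status color from (kind, active) pairs.

    kind is "primary" or "replica"; active is a bool.
    """
    shards = list(shards)
    primaries_ok = all(active for kind, active in shards if kind == "primary")
    replicas_ok = all(active for kind, active in shards if kind == "replica")
    if not primaries_ok:
        return "red"      # at least one primary shard is not active
    if not replicas_ok:
        return "yellow"   # primaries are fine, some replica is not active
    return "green"        # every primary and replica shard is active


print(cluster_status([("primary", True), ("replica", False)]))  # prints: yellow
```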
An index is a ___________ namespace that points to one or more ___________
Logical; Physical shards
What is a shard?
A shard is a low-level worker that holds just a slice of all the data stored in an index.
A shard is a single instance of ________
Lucene
Our documents are stored and indexed in _______
Shards.
Applications don’t talk to _______ but talk to _______
Shards; Index
How is an Elasticsearch cluster balanced?
As the data grows or shrinks, shards are moved between nodes to keep the cluster balanced.
A shard is either a __________ shard or _________ shard
Primary or replica
Each document in your index belongs to a ___________
Primary shard.
__________ determines the maximum amount of data your index can hold (subject to hardware constraints).
The number of primary shards.
What is a replica shard?
A replica shard is a copy of the primary shard.
When is the number of primary shards fixed?
The number of primary shards is fixed when the index is created. The number of replica shards can be changed at any time.
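One way to see why the primary shard count cannot change later is to look at how a document might be routed to a shard. The sketch below assumes the shard is chosen as a hash of the document ID modulo the number of primary shards. That rule is not stated on these cards, and zlib.crc32 is only a stand-in for whatever hash Elasticsearch actually uses.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Pick a shard number for a document ID (illustrative hash only)."""
    # crc32 is a stand-in hash, chosen because it is deterministic
    # across runs; it is not Elasticsearch's routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(1000)]
# Count how many documents would land on a different shard if the
# number of primary shards changed from 3 to 4.
moved = sum(shard_for(i, 3) != shard_for(i, 4) for i in ids)
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Under this assumption, changing the divisor sends many existing documents to a shard other than the one that holds them, so lookups by ID would miss.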
By default indices are assigned __________ shards
5
Request to create an index with three primary shards
PUT /shard_setting { "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }
Change the number of replicas for an index
PUT /shard_setting/_settings
{
"number_of_replicas": 2
}
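The two settings combine multiplicatively, because number_of_replicas counts replica copies per primary shard. A one-line sketch of the arithmetic (the function name is hypothetical):

```python
def total_shards(number_of_shards, number_of_replicas):
    """Total shard copies in an index: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


print(total_shards(3, 1))  # 3 primaries + 3 replicas -> prints: 6
print(total_shards(3, 2))  # 3 primaries + 6 replicas -> prints: 9
```

So raising number_of_replicas from 1 to 2 on the three-shard index above adds three more shard copies without touching the primaries.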
Which data in elasticsearch is indexed by default?
All data in a document is indexed by default
What metadata does a document consist of?
- _index
- _type
- _id
Index naming constraints
- Must be lowercase
- Cannot begin with an underscore
- Cannot contain commas
Type naming constraints
- May be lowercase or uppercase
- Cannot begin with an underscore
- Cannot contain commas
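The naming constraints on the two cards above translate into simple string checks. This sketch covers only the rules listed on these cards; a real cluster may enforce further restrictions, and the function names are hypothetical.

```python
def valid_index_name(name):
    """Check the index rules: lowercase, no leading underscore, no commas."""
    return (
        bool(name)
        and name == name.lower()
        and not name.startswith("_")
        and "," not in name
    )


def valid_type_name(name):
    """Check the type rules: any case, no leading underscore, no commas."""
    return bool(name) and not name.startswith("_") and "," not in name


print(valid_index_name("meijer"))   # prints: True
print(valid_index_name("Meijer"))   # prints: False (uppercase letter)
print(valid_type_name("Households"))  # prints: True
```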
___________ uniquely identifies a document.
ID when combined with index and type
Documents are indexed—stored and made searchable—by using the ________.
index API
Every time a change is made to a document (including deleting it), the ________ is incremented.
_version number
Pretty-print a retrieved document
GET /meijer/households/123?pretty
How to see the HTTP response code (404 Not Found or 200 OK)
By passing -i to curl:
curl -i -XGET 'http://localhost:9200/meijer/households/1234?pretty'
OP
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83
{ "_index" : "website", "_type" : "blog", "_id" : "124", "found" : false }
Retrieving Part of a Document
GET /meijer/households/1234?_source=title,text
OP
{ "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 1, "exists" : true, "_source" : { "title": "My first blog entry" , "text": "Just trying this out..." } }
Get only data without any metadata
GET /meijer/households/1234/_source
Checking Whether a Document Exists
curl -i -XHEAD http://localhost:9200/meijer/households/1234
OP
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Documents in Elasticsearch are ______ (mutable/immutable)
immutable; if we need to update an existing document, we reindex or replace it
Create only if it doesn’t exist
PUT /meijer/households/123?op_type=create
or
PUT /meijer/households/123/_create
Deleting a Document
DELETE /meijer/households/123
Elasticsearch uses _________ concurrency control
optimistic
What is optimistic concurrency control?
This approach assumes that conflicts are unlikely to happen and doesn’t block operations from being attempted. However, if the underlying data has been modified between reading and writing, the update will fail. It is then up to the application to decide how it should resolve the conflict. For instance, it could reattempt the update, using the fresh data, or it could report the situation to the user.
Elasticsearch is asynchronous and concurrent, meaning that replication requests are sent in parallel and may arrive at their destination out of sequence.
We can take advantage of the ______ to ensure that conflicting changes made by our application do not result in data loss.
_version number
We want this update to succeed only if the current _version of this document in our index is version 1.
PUT /meijer/households/1234?version=1
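The version check can be illustrated with a toy in-memory store. This is a sketch of the idea of optimistic concurrency control and not Elasticsearch's implementation; ToyStore and VersionConflict are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ToyStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version=None):
        """Store a document; if `version` is given, require it to match."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current_version:
            raise VersionConflict(
                f"current version is {current_version}, not {version}"
            )
        # Every successful write bumps the version by one.
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


store = ToyStore()
store.index("1234", {"dm_flag": "false"})            # creates version 1
store.index("1234", {"dm_flag": "true"}, version=1)  # succeeds, now version 2
try:
    store.index("1234", {"dm_flag": "false"}, version=1)  # stale version
except VersionConflict as err:
    print("conflict:", err)
```

The second writer that still believes the document is at version 1 is rejected, and the application then decides whether to retry with fresh data or report the conflict.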
Partial Updates to Documents
POST /meijer/households/1234/_update { "doc": { "dm_flag": "true", "mp_dlag": ["false"] } }
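A partial update sends only the fields that change, and they are merged into the existing document. The sketch below shows one plausible merge rule, assuming nested objects are merged key by key while scalars and arrays are overwritten. The function name is hypothetical and the existing document is made-up sample data.

```python
def merge_doc(existing, partial):
    """Return a new document with `partial` merged into `existing`."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged recursively.
            merged[key] = merge_doc(merged[key], value)
        else:
            # Scalars and arrays replace the old value; new keys are added.
            merged[key] = value
    return merged


existing = {"primary_customer": "1234", "dm_flag": "false"}
partial = {"dm_flag": "true", "mp_dlag": ["false"]}
print(merge_doc(existing, partial))
```

The function returns a new dictionary instead of modifying the old one, which mirrors the idea that documents are immutable and an update replaces the stored document.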
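Retrieving Multiple Documents
Multi-get API (mget)
mget example
GET /_mget { "docs": [ { "_index": "meijer", "_type": "households", "_id": "1234" }, { "_index": "rayleys", "_type": "offers", "_id": "1" } ] }
To fetch several IDs from one index and type, pass an ids array instead:
GET /rayleys/offers/_mget { "ids": ["1", "2", "3", "4"] }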
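An mget body is a docs array with one entry per document, so it can be generated from a list of references. A minimal sketch, assuming each reference is an (index, type, id) tuple and that every entry uses the underscore-prefixed metadata keys. The helper name mget_body is hypothetical.

```python
import json


def mget_body(doc_refs):
    """Build an mget request body from (index, type, id) tuples."""
    return {
        "docs": [
            {"_index": index, "_type": doc_type, "_id": doc_id}
            for index, doc_type, doc_id in doc_refs
        ]
    }


refs = [("meijer", "households", "1234"), ("rayleys", "offers", "1")]
print(json.dumps(mget_body(refs)))
```

Each entry carries exactly one _id, so requesting four offers means four entries in docs.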