Elasticsearch Flashcards

1
Q

Elasticsearch is an open-source _______ built on top of _________

A

search engine; Apache Lucene

2
Q

What is Apache Lucene?

A

Apache Lucene is a full-text search-engine library.

3
Q

What does REST stand for?

A

Representational State Transfer

4
Q

What is a RESTful service?

A

A RESTful service is one that implements the REST pattern.

5
Q

What is REST?

A

REST, or Representational State Transfer, is an architectural style that provides standards for communication between computer systems on the web, making it easier for those systems to talk to each other.

6
Q

There are 4 basic HTTP verbs we use in requests to interact with resources in a REST system:

A
  1. GET: retrieve a specific resource (by id) or a collection of resources
  2. POST: create a new resource
  3. PUT: update a specific resource (by id)
  4. DELETE: remove a specific resource (by id)
  5. In addition, Elasticsearch uses HEAD, for example to check whether a document exists
7
Q

What is Elasticsearch?

A
  1. Enables full-text search
  2. A distributed real-time document store where every field is indexed and searchable
  3. A distributed search engine with real-time analytics
  4. Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
8
Q

What is a node?

A

A node is a running instance of Elasticsearch.

9
Q

What is a cluster?

A

A cluster is a group of nodes with the same cluster.name that work together to share data and to provide failover and scale, although a single node can form a cluster all by itself.

10
Q

All other languages can communicate with Elasticsearch over port _______ using a
_______, accessible with your favorite web client.

A

9200; RESTful API

11
Q

A request to Elasticsearch consists of the same parts as any HTTP request:

A

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

12
Q

What is PROTOCOL?

A

Either http or https

13
Q

What is HOST?

A

The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.

14
Q

What is PORT?

A

The port running the Elasticsearch HTTP service, which defaults to 9200.

15
Q

What is QUERY_STRING?

A

Any optional query-string parameters. For example, ?pretty pretty-prints the JSON response to make it easier to read.

16
Q

What is BODY?

A

A JSON-encoded request body (if the request needs one).
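
The request parts described in the cards above (PROTOCOL, HOST, PORT, the path, and QUERY_STRING) can be assembled programmatically. The following is a minimal Python sketch, not an official client; the helper name build_url and its default values are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_url(path, protocol="http", host="localhost", port=9200, query=None):
    """Assemble protocol://host:port/path?query_string (hypothetical helper)."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        # urlencode turns {"pretty": "true"} into "pretty=true"
        url += "?" + urlencode(query)
    return url


print(build_url("_count", query={"pretty": "true"}))
```

The body, when a request needs one, would be sent separately as JSON; this sketch only covers the URL.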

17
Q

For instance, to count the number of documents in the cluster, we could use:

A
curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
  "query": {
    "match_all": {}
  }
}'
18
Q

The same count request in shorthand (console) format:

A
GET /_count
{
  "query": {
    "match_all": {}
  }
}
19
Q

Elasticsearch is _______-oriented, meaning that it stores entire ___________

A

document; objects or documents.

20
Q

How does Elasticsearch make a document searchable?

A

It indexes the contents of each document to make it searchable.

21
Q

In Elasticsearch, you index, search, sort, and filter ________, not __________. This is a fundamentally different way of thinking about data and is one of the reasons Elasticsearch can perform complex _________.

A

documents; rows of columnar data; full-text search.

22
Q

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns

A

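Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields (this analogy applies to older releases; mapping types were deprecated in 6.x and removed in later versions)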

23
Q

What is an Index?

A

An index is like a database: a place to store related documents.

24
Q

What does "to index" (verb) mean?

A

To store a document in an index (noun) so that it can be retrieved and queried.

25
Q

Insert a document

A
PUT /meijer/households/1
{
  "primary_tender": "12312321",
  "primary_customer": "23123213"
}
26
Q

How to retrieve a document?

A

GET /meijer/households/1

27
Q

Simplest search?

A

GET /meijer/households/_search (with no query specified, this matches every document)

28
Q

How many results does GET /meijer/households/_search return?

A

By default, it returns the top 10 results.

29
Q

Lightweight (query-string) search for primary_customer = 1234

A

GET /meijer/households/_search?q=primary_customer:1234

30
Q

Elasticsearch query DSL search for primary_customer = 1234

A
GET /meijer/households/_search
{
  "query": {
    "match": {
      "primary_customer": "1234"
    }
  }
}
31
Q

Elasticsearch query DSL search (filtered query, 1.x syntax) for customer = 1234, restricted to age greater than 30

A
GET /meijer/households/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "age": { "gt": 30 }
        }
      },
      "query": {
        "match": {
          "customer": "1234"
        }
      }
    }
  }
}
32
Q

What is a relevance score?

A

A measure of how well a document matches the query.

33
Q

By default, Elasticsearch sorts matching results by their _________

A

relevance score

34
Q

Elasticsearch vs. RDBMS

A

The relevance score is the major difference. In an RDBMS, a record either matches the query or it does not.

35
Q

“Complete phrase” search

A
GET /meijer/households/_search
{
  "query": {
    "match_phrase": {
      "hobby": "rock climbing"
    }
  }
}
36
Q

Highlight matched search phrases in the results, as Google does

A
GET /meijer/households/_search
{
  "query": {
    "match_phrase": {
      "hobby": "rock climbing"
    }
  },
  "highlight": {
    "fields": {
      "hobby": {}
    }
  }
}
Sample output (taken from an equivalent query on an "about" field):
{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}
37
Q

Elasticsearch has functionality called _______, which allow you to generate sophisticated analytics over your data.

A

aggregations
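
As a rough illustration of what a simple aggregation computes, the sketch below counts documents per field value in plain Python, similar in spirit to a terms aggregation. The sample documents, the interests field, and the function name terms_buckets are all made up for this example; a real aggregation runs inside Elasticsearch across shards.

```python
from collections import Counter

# Hypothetical sample documents with a list-valued "interests" field.
docs = [
    {"name": "a", "interests": ["sports", "music"]},
    {"name": "b", "interests": ["music"]},
    {"name": "c", "interests": ["forestry", "music"]},
]


def terms_buckets(documents, field):
    """Count how many documents contain each value of a list-valued field."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.get(field, [])))  # set(): count each doc once per value
    # most_common() orders buckets by descending document count
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


for bucket in terms_buckets(docs, "interests"):
    print(bucket)
```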

38
Q

One node in the cluster is elected to be the ______ node, which is in charge of managing cluster-wide changes

A

master

39
Q

The master node is responsible for:

A
  1. Creating and deleting indices
  2. Adding nodes to and removing nodes from the cluster

40
Q

The master node doesn’t need to be involved in __________

A

Document-level searches or changes.

41
Q

Users can talk to ______ node

A

Any node, including the master node.

42
Q

________ node knows where the document lives

A

Every node

43
Q

__________ node is responsible for gathering the response from node or nodes holding the data and returning the final response to the client.

A

The node that received the request, which can be any node, including the master node.

44
Q

Get cluster health

A

GET /_cluster/health

45
Q

Status colors

A

Green, Yellow, Red

46
Q

Green status

A

All primary and replica shards are active.

47
Q

Yellow status

A

All primary shards are active, but not all replica shards are active.

48
Q

Red Status

A

Not all primary shards are active.
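
The three status rules above can be written down as a small decision function. This is a minimal Python sketch of the rules as stated on these cards, not Elasticsearch code; the function name cluster_status and the (kind, active) tuple format are assumptions for illustration.

```python
def cluster_status(shards):
    """shards: list of (kind, active) tuples, where kind is 'primary' or 'replica'."""
    primaries_ok = all(active for kind, active in shards if kind == "primary")
    replicas_ok = all(active for kind, active in shards if kind == "replica")
    if not primaries_ok:
        return "red"      # at least one primary shard is not active
    if not replicas_ok:
        return "yellow"   # all primaries active, some replica is not
    return "green"        # every primary and replica shard is active


print(cluster_status([("primary", True), ("replica", False)]))
```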

49
Q

An index is a ___________ namespace that points to one or more ___________

A

Logical; Physical shards

50
Q

What is a shard?

A

A shard is a low-level worker unit that holds just a slice of all the data stored in an index.

51
Q

A shard is a single instance of ________

A

Lucene

52
Q

Our documents are stored and indexed in _______

A

Shards.

53
Q

Applications don’t talk to _______ but talk to _______

A

Shards directly; an index

54
Q

How is an Elasticsearch cluster balanced?

A

As the cluster grows or shrinks, Elasticsearch automatically moves shards between nodes so that the cluster stays balanced.

55
Q

A shard is either a __________ shard or _________ shard

A

Primary or replica

56
Q

Each document in your index belongs to a ___________

A

Primary shard.

57
Q

__________ determines the maximum amount of data your index can hold (subject to hardware constraints).

A

Number of primary shards.

58
Q

What is a replica shard?

A

A replica shard is a copy of a primary shard. Replicas provide redundancy in case of hardware failure and can also serve read requests.

59
Q

When is the number of primary shards fixed?

A

The number of primary shards is fixed when the index is created. The number of replica shards can be changed at any time.
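
One way to see why the primary shard count is fixed: older Elasticsearch documentation describes document routing as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The Python sketch below illustrates that idea only. It uses zlib.crc32 as a stand-in hash (Elasticsearch uses its own hash function), and the name pick_shard is hypothetical.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document ID to a shard number (illustration only)."""
    # Stand-in hash: Elasticsearch does not use CRC32 for routing.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [str(n) for n in range(1000)]
# Count how many documents would land on a different shard
# if the number of primary shards changed from 5 to 6.
moved = sum(pick_shard(i, 5) != pick_shard(i, 6) for i in ids)
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would map to a different shard after such a change, lookups by ID would no longer find them where they were stored.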

60
Q

By default, indices are assigned __________ primary shards

A

5 (in Elasticsearch versions before 7.0; later versions default to 1)

61
Q

Request to create an index with three primary shards and one replica

A
PUT /shard_setting
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
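
With number_of_shards set to 3 and number_of_replicas set to 1, each primary shard gets one copy, so the index has six shards in total. A small Python sketch of that arithmetic follows; the function name total_shards is made up for illustration.

```python
def total_shards(number_of_shards, number_of_replicas):
    """Total shard copies in an index: primaries plus their replicas."""
    # number_of_replicas is the number of copies of EACH primary shard.
    return number_of_shards * (1 + number_of_replicas)


print(total_shards(3, 1))  # 3 primaries + 3 replicas
print(total_shards(3, 2))  # 3 primaries + 6 replicas
```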
62
Q

Change the number of replicas for an index

A

PUT /shard_setting/_settings
{
  "number_of_replicas": 2
}

63
Q

Which data in Elasticsearch is indexed by default?

A

All data in every field of a document is indexed by default.

64
Q

What metadata does a document have?

A
  1. _index
  2. _type
  3. _id
65
Q

Index naming constraints

A
  1. Must be lowercase
  2. Cannot begin with an underscore
  3. Cannot contain commas
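
The three constraints listed on this card can be checked with a few lines of Python. This sketch covers only those three rules (Elasticsearch enforces additional restrictions not listed here), and the function name index_name_problems is hypothetical.

```python
def index_name_problems(name):
    """Return the listed naming rules that an index name violates."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if name.startswith("_"):
        problems.append("cannot begin with an underscore")
    if "," in name:
        problems.append("cannot contain commas")
    return problems


for candidate in ["meijer", "Website", "_a,b"]:
    print(candidate, index_name_problems(candidate))
```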
66
Q

Type naming constraints

A
  1. May be lowercase or uppercase
  2. Cannot begin with an underscore
  3. Cannot contain commas
67
Q

___________ uniquely identifies a document.

A

The _id, when combined with the _index and _type

68
Q

Documents are indexed—stored and made searchable—by using the ________.

A

index API

69
Q

Every time a change is made to a document (including deleting it), the ________ is incremented.

A

_version number

70
Q

Retrieve a document with a pretty-printed response

A

GET /meijer/households/123?pretty

71
Q

How do you see the HTTP response code (for example, 404 Not Found or 200 OK)?

A
Pass -i to the curl command:
curl -i -XGET 'http://localhost:9200/meijer/households/1234?pretty'
Sample output (from a request for a missing document in a different index):
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "124",
  "found" : false
}
72
Q

Retrieving Part of a Document

A
GET /meijer/households/1234?_source=title,text
Sample output (from an equivalent request against a different index):
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    "title": "My first blog entry",
    "text": "Just trying this out..."
  }
}
73
Q

Get only data without any metadata

A

GET /meijer/households/1234/_source

74
Q

Checking Whether a Document Exists

A

curl -i -XHEAD http://localhost:9200/meijer/households/1234

Sample output when the document exists:
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

75
Q

Documents in Elasticsearch are ______ (mutable/immutable)

A

Immutable; if we need to update an existing document, we reindex or replace it.

76
Q

Create only if it doesn’t exist

A

PUT /meijer/households/123?op_type=create
or
PUT /meijer/households/123/_create

77
Q

Deleting a Document

A

DELETE /meijer/households/123

78
Q

Elasticsearch uses _________ concurrency control

A

Optimistic concurrency control

79
Q

What is optimistic concurrency control?

A

Used by Elasticsearch, this approach assumes that conflicts are unlikely to happen and doesn’t block operations from being attempted. However, if the underlying data has been modified between reading and writing, the update will fail. It is then up to the application to decide how it should resolve the conflict. For instance, it could reattempt the update, using the fresh data, or it could report the situation to the user.
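
The behaviour described above can be imitated with a small in-memory store. This is a minimal Python sketch of optimistic concurrency control in general, not Elasticsearch code; the class TinyStore, the exception VersionConflict, and the expected_version parameter are all invented for illustration.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the expected one."""


class TinyStore:
    """In-memory stand-in for an index (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # No locking: the write is attempted, and rejected only on a mismatch.
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = TinyStore()
store.put("1234", {"dm_flag": False})                      # creates version 1
store.put("1234", {"dm_flag": True}, expected_version=1)   # succeeds, version 2
try:
    store.put("1234", {"dm_flag": False}, expected_version=1)  # stale version
except VersionConflict as err:
    print("conflict:", err)
```

When the conflict is raised, the caller decides what to do next, for example re-reading the document and retrying.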

80
Q

Elasticsearch is asynchronous and concurrent, meaning that _______ requests are sent in parallel and may arrive at their destination out of sequence.

A

replication

81
Q

We can take advantage of the ______ to ensure that conflicting changes made by our application do not result in data loss.

A

The _version number: we specify the version of the document that we expect to change, and the request fails if that version is no longer current.

82
Q

We want this update to succeed only if the current _version of this document in our index is version 1.

A

PUT /meijer/households/1234?version=1

83
Q

Partial Updates to Documents

A
POST /meijer/households/1234/_update
{
  "doc": {
    "dm_flag": "true",
    "mp_dlag": ["false"]
  }
}
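
Since documents are immutable, a partial update is conceptually a retrieve, change, and reindex cycle: the partial doc is merged into the existing source and the result replaces the old document. The Python sketch below shows one plausible merge rule (nested objects merged, other values overwritten, new fields added). The function name merge_partial and the sample values are assumptions, and this is not the server's actual implementation.

```python
import copy


def merge_partial(source, partial):
    """Return a new document: partial merged into a copy of source."""
    result = copy.deepcopy(source)  # the existing document is never modified
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_partial(result[key], value)  # merge objects
        else:
            result[key] = copy.deepcopy(value)  # overwrite or add the field
    return result


old = {"primary_customer": "23123213", "dm_flag": "false"}
new = merge_partial(old, {"dm_flag": "true", "mp_dlag": ["false"]})
print(new)
```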
84
Q

Retrieving Multiple Documents

A

Use the multi-get (mget) API.

85
Q

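mget example

A
GET /_mget
{
  "docs": [
    { "_index": "meijer", "_type": "households", "_id": "1234" },
    { "_index": "rayleys", "_type": "offers", "_id": "1" },
    { "_index": "rayleys", "_type": "offers", "_id": "2" },
    { "_index": "rayleys", "_type": "offers", "_id": "3" },
    { "_index": "rayleys", "_type": "offers", "_id": "4" }
  ]
}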
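
When many documents are requested at once, the docs array is easier to generate than to write by hand. The following Python sketch builds such a request body from (index, type, id) tuples; the helper name mget_body is hypothetical, and the index and type names are simply the ones used on these cards.

```python
import json


def mget_body(refs):
    """Build an mget request body from (index, type, id) tuples."""
    return {
        "docs": [
            {"_index": index, "_type": doc_type, "_id": str(doc_id)}
            for index, doc_type, doc_id in refs
        ]
    }


refs = [("meijer", "households", "1234")]
refs += [("rayleys", "offers", n) for n in range(1, 5)]
print(json.dumps(mget_body(refs), indent=2))
```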