Getting Started with the ElasticSearch Cluster Flashcards

1
Q

What is an index?

A

An index is the place where ElasticSearch stores data. If you come from the relational database world, you can think of an index like a table. But in contrast to a relational database, the values stored in an index are prepared for fast and efficient full-text searching and, in particular, the original values do not have to be stored.

2
Q

What is a document?

A

The main entity stored in ElasticSearch is a document. In an analogy to relational databases, a document is a row of data in a database table.

3
Q

What is a field? How is it used in documents?

A

Documents consist of fields (the equivalent of row columns), but each field may occur several times; such a field is called multivalued. Each field has a type (text, number, date, and so on). Field types can also be complex: a field can contain other subdocuments or arrays.

4
Q

Describe the structure of a document.

A

Unlike relational databases, documents don’t need to have a fixed structure; every document may have a different set of fields, and in addition to that, the fields don’t have to be known during application development. Of course, one can force a document structure with the use of a schema (mapping).

5
Q

What is a document type?

A

In ElasticSearch, one index can store many objects with different purposes. For example, a blog application can store articles and comments. The document type lets us easily differentiate these objects. It is worth noting that practically every document can have a different structure, but in real operations, dividing documents into types significantly helps in data manipulation. Of course, one needs to keep the limitations in mind. One such limitation is that different document types can’t set different types for the same property.

6
Q

Describe nodes and clusters.

A

ElasticSearch can work as a standalone, single-search server. Nevertheless, to be able to process large sets of data and to achieve fault tolerance, ElasticSearch can be run on many cooperating servers. Collectively, these servers are called a cluster and each of them is called a node. Large amounts of data can be split across many nodes via index sharding (splitting an index into smaller individual parts). Better availability and performance are achieved through replicas (copies of index parts).

7
Q

What is a Shard?

A

When we have a large number of documents, we can come to a point where a single node is not enough because of RAM limitations, hard disk capacity, and so on. The other problem is that the desired functionality is so complicated that the server’s computing power is not sufficient. In such cases, the data can be divided into smaller parts called shards, where each shard is a separate Apache Lucene index. Each shard can be placed on a different server and thus your data can be spread across the cluster. When you query an index that is built from multiple shards, ElasticSearch sends the query to each relevant shard and merges the results in a transparent way, so that your application doesn’t need to know about shards.
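For example, the number of shards (and replicas) can be set when an index is created. A minimal sketch, assuming a local node on localhost:9200 and a hypothetical index named blog:

curl -XPUT 'http://localhost:9200/blog/' -d '{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 1
    }
  }
}'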

8
Q

What is a replica?

A

In order to increase query throughput or achieve high availability, shard replicas can be used. The primary shard is used as the place where operations that change the index are directed. A replica is just an exact copy of the primary shard and each shard can have zero or more replicas. When the primary shard is lost (for example, the server holding the shard data is unavailable), a cluster can promote a replica to be the new primary shard.
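The number of replicas can also be changed on a live index. A minimal sketch, assuming the hypothetical blog index from the previous card:

curl -XPUT 'http://localhost:9200/blog/_settings' -d '{
  "index": {
    "number_of_replicas": 2
  }
}'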

9
Q

ES: Where is all the data used in ES stored?

A

In the data directory.

10
Q

ES: What are the main configuration files?

A

The whole configuration is located in the config directory. We can see two files there: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and be kept as a part of the cluster state, so the values in this file may not be accurate.

11
Q

ES: How can you configure a node to belong to a cluster?

A

In elasticsearch.yml, the cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.
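For example, a minimal elasticsearch.yml sketch (the cluster and node names are just placeholders):

cluster.name: mycluster
node.name: "First Node"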

12
Q

ES: what is the default port?

A

9200
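For example, assuming a node running locally, a simple request against that port confirms the HTTP endpoint is up:

curl -XGET 'http://localhost:9200/'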

13
Q

ES: how do you make responses human readable?

A

Add the ?pretty parameter to the request URL.
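For example, assuming a local node:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'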

14
Q

ES: How can you shut down ES?

A

There are three ways in which we can shut down ElasticSearch:

If your node is attached to the console (run with the -f option), just press Ctrl + C
The second option is to kill the server process by sending the TERM signal (see the kill command on Linux boxes and the program manager on Windows)
The third method is to use the REST API, for example: curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown

15
Q

ES: how can you manipulate indexes?

A

By using common REST calls; for example, POST to create a new resource when you don’t know the id.
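A few illustrative calls (a sketch only, assuming a local node and a hypothetical blog index with an article type):

# create the blog index
curl -XPUT 'http://localhost:9200/blog/'
# index a document with an auto-generated id
curl -XPOST 'http://localhost:9200/blog/article/' -d '{"title": "Hello"}'
# index (or replace) a document under a known id
curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "Hello"}'
# fetch and delete that document
curl -XGET 'http://localhost:9200/blog/article/1'
curl -XDELETE 'http://localhost:9200/blog/article/1'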

16
Q

ES: How can you update documents?

A

Internally, ElasticSearch must fetch the document, take its data from the _source field, remove the old document, apply changes, and index it as a new document. ElasticSearch implements this through a script given as a parameter.
e.g. curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
  "script": "ctx._source.content = \"new content\""
}'
Notice that we didn’t have to send the whole document, only the changed parts.

17
Q

ES: How can you create a mapping for an index?

A

Create a JSON file with the mapping info and POST it to the desired index, for example:

curl -XPOST 'http://localhost:9200/posts' -d @posts.json

Content of posts.json:
{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long", "store": "yes", "precision_step": "0" },
        "name": { "type": "string", "store": "yes", "index": "analyzed" },
        "published": { "type": "date", "store": "yes", "precision_step": "0" },
        "contents": { "type": "string", "store": "no", "index": "analyzed" }
      }
    }
  }
}

18
Q

ES: What is a mapping of an index?

A

A loose schema: it describes the fields of a document, their types, and how they should be indexed and analyzed.
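The mapping of an existing index can be inspected, for example (assuming the posts index created in the previous card):

curl -XGET 'http://localhost:9200/posts/_mapping?pretty'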

19
Q

ES: What are the core field types?

A

String
Number
Date
Boolean
Binary

20
Q

ES: What are the common attributes that you can use to describe all the types?

A

index_name; index; store; boost; null_value; include_in_all
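A field definition sketch combining several of these attributes (the field name and values are just illustrative):

"title": {
  "type": "string",
  "index": "analyzed",
  "store": "yes",
  "boost": 2.0,
  "null_value": "n/a",
  "include_in_all": true
}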

21
Q

ES: Explain the index_name common attribute of a type.

A

This is the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with. You’ll usually omit this property.

22
Q

ES: Explain the ‘index’ common attribute of a type.

A

This can take the values analyzed and no; for the string-based fields, it can also be set to not_analyzed. If set to analyzed, the field will be indexed and thus searchable. If set to no, you won’t be able to search on such a field. The default value is analyzed. The additional not_analyzed option for string-based fields means that the field should be indexed but not processed by the analyzer, so it is written into the index exactly as it was sent to ElasticSearch and only an exact match will be counted during a search.

23
Q

ES: Explain the ‘store’ common attribute of a type.

A

This can take the values yes and no, and it specifies if the original value of the field should be written into the index. The default value is no, which means that you can’t return that field in the results (although if you use the _source field, you can return the value even if it is not stored), but if you have it indexed you still can search on it.

24
Q

ES: Explain the ‘boost’ common attribute of a type.

A

The default value of this attribute is 1. Basically, it defines how important the field is inside the document; the higher the boost, the more important are the values in the field.

25
Q

ES: Explain the ‘null_value’ common attribute of a type.

A

This attribute specifies a value that should be written into the index if that field is not a part of an indexed document. The default behavior will just omit that field.
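For example, a hypothetical tags field where missing values should be indexed as "empty":

"tags": { "type": "string", "null_value": "empty" }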

26
Q

ES: Explain the ‘include_in_all’ common attribute of a type.

A

This attribute specifies if the field should be included in the _all field. By default, if the _all field is used, all the fields will be included in it.

27
Q

ES: Explain the “term_vector” attribute of the string type.

A

This can take the values no (the default one), yes, with_offsets, with_positions, or with_positions_offsets. It defines whether the Lucene term vectors should be calculated for that field or not. If you are using highlighting, you will need to calculate term vectors.
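For example, a field meant to support highlighting might be mapped like this (an illustrative sketch):

"content": { "type": "string", "term_vector": "with_positions_offsets" }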

28
Q

ES: Explain the “omit_norms” attribute of the string type.

A

This can take the value true or false. The default value is false. When this attribute is set to true, it disables the Lucene norms calculation for that field (and thus you can’t use index-time boosting).

29
Q

ES: Explain the “index_options” attribute of the string type.

A

This allows us to set indexing options. The possible values are docs, which results in indexing only the document numbers for terms; freqs, which results in indexing the document numbers and the term frequencies; and positions, which results in the previous two plus the term positions. The default value is freqs.

30
Q

ES: Explain the “analyzer” attribute of the string type.

A

This is the name of the analyzer used for indexing and searching. It defaults to the globally defined analyzer name.

31
Q

ES: Explain the “index_analyzer” attribute of the string type.

A

This is the name of the analyzer used for indexing.

32
Q

ES: Explain the “search_analyzer” attribute of the string type.

A

This is the name of the analyzer used for processing the part of the query string that is sent to that field.

33
Q

ES: Explain the “ignore_above” attribute of the string type.

A

This is the maximum size of the field. The part of the value beyond the specified number of characters will be ignored. This attribute is useful if we are only interested in the first N characters of the field.
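For example (an illustrative sketch, assuming a not_analyzed tag field):

"tag": { "type": "string", "index": "not_analyzed", "ignore_above": 100 }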

34
Q

ES: What are the available number types?

A

byte: A byte value; for example, 1
short: A short value; for example, 12
integer: An integer value; for example, 134
long: A long value; for example, 12345
float: A float value; for example, 12.23
double: A double value; for example, 12.23

35
Q

ES: Explain the “precision_step” attribute of the number and date type.

A

This is the number of terms generated for each value in a field. The lower the value, the higher the number of terms generated, resulting in faster range queries (but a higher index size). The default value is 4.

36
Q

ES: Explain the “ignore_malformed” attribute of the number and date type.

A

This can take the value true or false. The default value is false. It should be set to true in order to omit badly formatted values.

37
Q

ES: Explain the “format” attribute of the date type.

A

This specifies the format of the date. The default value is dateOptionalTime. For a full list of formats, please visit http://www.elasticsearch.org/guide/reference/mapping/date-format.html.
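For example, an illustrative sketch using a Joda-time style pattern:

"published": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }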

38
Q

How is the Date core type used?

A

This core type is designed to be used for date indexing. It follows a specific format that can be changed and is stored in UTC by default.

39
Q

How is the Binary core type used?

A

The binary field is a BASE64 representation of the binary data stored in the index. You can use it to store data that is normally written in binary form, like images. Fields based on this type are, by default, stored and not indexed. The binary type only supports the index_name property.

40
Q

How is the multi_field core type used?

A

Sometimes you would like to have the same field values in two fields—for example, one for searching and one for faceting. There is a special type in ElasticSearch—multi_field—that allows us to map several core types into a single field and have them analyzed differently. For example, if we would like to calculate faceting and search on our name field, we could define the following multi_field:

"name": {
  "type": "multi_field",
  "fields": {
    "name": { "type": "string", "index": "analyzed" },
    "facet": { "type": "string", "index": "not_analyzed" }
  }
}

41
Q

ES: What is an analyzer?

A

It is the functionality used to process data and queries so that they are indexed and searched the way we want.
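The _analyze API lets you see what a given analyzer does to a piece of text, for example (a sketch, assuming a local node):

curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'ElasticSearch Server'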

42
Q

ES: What analyzers are available out of the box?

A

standard; simple; whitespace; stop; keyword; pattern; language; snowball;

43
Q

ES: explain the ‘standard’ analyzer.

A

A standard analyzer that is convenient for most European languages (please refer to http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html for the full list of parameters).

44
Q

ES: explain the ‘simple’ analyzer.

A

An analyzer that splits the provided value on non-letter characters and converts letters to lowercase.

45
Q

ES: explain the ‘whitespace’ analyzer.

A

An analyzer that splits the provided value on the basis of whitespace characters.

46
Q

ES: explain the ‘stop’ analyzer.

A

This is similar to a simple analyzer, but in addition to the simple analyzer functionality, it filters the data on the provided stop words set.

47
Q

ES: explain the ‘keyword’ analyzer.

A

This is a very simple analyzer that just passes the provided value. You’ll achieve the same by specifying that field as not_analyzed.

48
Q

ES: explain the ‘pattern’ analyzer.

A

This is an analyzer that allows flexible text separation by the use of regular expressions.

49
Q

ES: explain the ‘language’ analyzer.

A

This is an analyzer that is designed to work with a specific language.

50
Q

ES: explain the ‘snowball’ analyzer.

A

This is an analyzer similar to the standard one, but in addition, it provides a stemming algorithm.

51
Q

ES: How can you create your own analyzer?

A
Each analyzer is built from a single tokenizer and multiple filters. 
When providing the mapping for an index, we have to add a settings section. In the following example, we have also added a custom filter (ourEnglishFilter).
e.g. 
"settings" : {
  "index" : {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "ourEnglishFilter"
          ]
        }
      },
      "filter": {
        "ourEnglishFilter": {
          "type": "kstem"
        }
      }
    }
  } 
}
52
Q

ES: explain the _analyzer field in a mapping.

A

An analyzer field (_analyzer) allows us to specify a field value that will be used as the analyzer name for the document to which the field belongs. So if you have a language field, you can use that value to select the correct analyzer (given you have named your custom analyzers to match the language values, e.g. nl).
{
  "mappings": {
    "post": {
      "_analyzer": {
        "path": "language"
      },
      "properties": {
        "id": { "type": "long", "store": "yes", "precision_step": "0" },
        "name": { "type": "string", "store": "yes", "index": "analyzed" },
        "language": { "type": "string", "store": "yes", "index": "not_analyzed" }
      }
    }
  }
}

53
Q

ES: How can you specify a default analyzer in an index mapping file?

A
This is done in the same way as configuring a custom analyzer in the settings section of the mappings file, but instead of specifying a custom name for the analyzer, the "default" keyword should be used.
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": [
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  }
}

54
Q

ES: Explain the _source field in an index mapping file.

A

By default, ES stores the source JSON of a document in the _source field. You can disable this:
"_source" : {
  "enabled" : false
}

55
Q

ES: Explain the _all field in an index mapping file.

A

Sometimes, it’s handy to have some of the fields copied into one; instead of searching multiple fields, a general purpose field will be used for searching—for example, when you don’t know which fields to search on. By default, ElasticSearch will include the values from all the text fields into the _all field.
You can disable this:
"_all" : {
  "enabled" : false
}
However, please remember that the _all field will increase the size of the index, so it should be disabled if not needed.