Getting Started with the ElasticSearch Cluster Flashcards

1
Q

What is an index?

A

An index is the place where ElasticSearch stores data. If you come from the relational database world, you can think of an index like a table. But in contrast to a relational database, the values stored in an index are prepared for fast and efficient full-text searching and, in particular, the original values do not have to be stored.

2
Q

What is a document?

A

The main entity stored in ElasticSearch is a document. In an analogy to relational databases, a document is a row of data in a database table.

3
Q

What is a field? How is it used in documents?

A

Documents consist of fields (the equivalent of row columns), but each field may occur several times; such a field is called multivalued. Each field has a type (text, number, date, and so on). Field types can also be complex: a field can contain other subdocuments or arrays.

4
Q

Describe the structure of a document.

A

Unlike relational databases, documents don’t need to have a fixed structure; every document may have a different set of fields, and in addition to that, the fields don’t have to be known during application development. Of course, one can force a document structure with the use of a schema (mapping).

5
Q

What is a document type?

A

In ElasticSearch, one index can store many objects with different purposes. For example, a blog application can store articles and comments. The document type lets us easily differentiate these objects. It is worth noting that practically every document can have a different structure, but in real operations, dividing documents into types significantly helps in data manipulation. Of course, one needs to keep the limitations in mind. One such limitation is that different document types can’t set different types for the same property.

6
Q

Describe nodes and clusters.

A

ElasticSearch can work as a standalone, single-search server. Nevertheless, to be able to process large sets of data and to achieve fault tolerance, ElasticSearch can be run on many cooperating servers. Collectively, these servers are called a cluster and each of them is called a node. Large amounts of data can be split across many nodes via index sharding (splitting an index into smaller individual parts). Better availability and performance are achieved through replicas (copies of index parts).

7
Q

What is a Shard?

A

When we have a large number of documents, we can come to a point where a single node is not enough because of RAM limitations, hard disk capacity, and so on. The other problem is that the desired functionality is so complicated that the server’s computing power is not sufficient. In such cases, the data can be divided into smaller parts called shards, where each shard is a separate Apache Lucene index. Each shard can be placed on a different server and thus your data can be spread across the cluster. When you query an index that is built from multiple shards, ElasticSearch sends the query to each relevant shard and merges the results in a transparent way, so that your application doesn’t need to know about shards.
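For example, the number of shards (and replicas) can be set when an index is created. A minimal sketch, assuming a local node on localhost:9200 and a hypothetical index named blog:

curl -XPUT 'http://localhost:9200/blog/' -d '{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 1
    }
  }
}'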

8
Q

What is a replica?

A

In order to increase query throughput or achieve high availability, shard replicas can be used. The primary shard is used as the place where operations that change the index are directed. A replica is just an exact copy of the primary shard and each shard can have zero or more replicas. When the primary shard is lost (for example, the server holding the shard data is unavailable), a cluster can promote a replica to be the new primary shard.
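The number of replicas can also be changed on a live index. A minimal sketch, assuming the hypothetical blog index from the previous card:

curl -XPUT 'http://localhost:9200/blog/_settings' -d '{
  "index": {
    "number_of_replicas": 2
  }
}'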

9
Q

ES: Where is all the data used in ES stored?

A

In the data directory.

10
Q

ES: What are the main configuration files?

A

The whole configuration is located in the config directory. We can see two files there: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and be kept as a part of the cluster state, so the values in this file may not be accurate.

11
Q

ES: How can you configure a node to belong to a cluster?

A

In elasticsearch.yml, the cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.
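For example, a minimal elasticsearch.yml sketch (the cluster and node names are just placeholders):

cluster.name: mycluster
node.name: "First Node"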

12
Q

ES: what is the default port?

A

9200
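For example, assuming a node running locally, a simple request against that port confirms the HTTP endpoint is up:

curl -XGET 'http://localhost:9200/'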

13
Q

ES: how do you make responses human readable?

A

Add the ?pretty parameter to the request URL.
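For example, assuming a local node:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'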

14
Q

ES: How can you shut down ES?

A

There are three ways in which we can shut down ElasticSearch:

If your node is attached to the console (run with the -f option), just press Ctrl + C
The second option is to kill the server process by sending the TERM signal (see the kill command on Linux boxes and the program manager on Windows)
The third method is to use the REST API, for example: curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown

15
Q

ES: how can you manipulate indexes?

A

By using common REST calls; for example, POST to create a new resource when you don’t know the id.
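A few illustrative calls (a sketch only, assuming a local node and a hypothetical blog index with an article type):

# create the blog index
curl -XPUT 'http://localhost:9200/blog/'
# index a document with an auto-generated id
curl -XPOST 'http://localhost:9200/blog/article/' -d '{"title": "Hello"}'
# index (or replace) a document under a known id
curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "Hello"}'
# fetch and delete that document
curl -XGET 'http://localhost:9200/blog/article/1'
curl -XDELETE 'http://localhost:9200/blog/article/1'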

16
Q

ES: How can you update documents?

A

Internally, ElasticSearch must fetch the document, take its data from the _source field, remove the old document, apply changes, and index it as a new document. ElasticSearch implements this through a script given as a parameter.
e.g. curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
  "script": "ctx._source.content = \"new content\""
}'
Notice that we didn’t have to send the whole document, only the changed parts.

17
Q

ES: How can you create a mapping for an index?

A

Create a JSON file with the mapping info and POST it to the desired index, for example:

curl -XPOST 'http://localhost:9200/posts' -d @posts.json

Content of posts.json:
{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long", "store": "yes", "precision_step": "0" },
        "name": { "type": "string", "store": "yes", "index": "analyzed" },
        "published": { "type": "date", "store": "yes", "precision_step": "0" },
        "contents": { "type": "string", "store": "no", "index": "analyzed" }
      }
    }
  }
}

18
Q

ES: What is a mapping of an index?

A

A loose schema: it describes the fields of a document, their types, and how they should be indexed and analyzed.
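The mapping of an existing index can be inspected, for example (assuming the posts index created in the previous card):

curl -XGET 'http://localhost:9200/posts/_mapping?pretty'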

19
Q

ES: What are the core field types?

A

String
Number
Date
Boolean
Binary

20
Q

ES: What are the common attributes that you can use to describe all the types?

A

index_name; index; store; boost; null_value; include_in_all
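A field definition sketch combining several of these attributes (the field name and values are just illustrative):

"title": {
  "type": "string",
  "index": "analyzed",
  "store": "yes",
  "boost": 2.0,
  "null_value": "n/a",
  "include_in_all": true
}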

21
Q

ES: Explain the index_name common attribute of a type.

A

This is the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with. You’ll usually omit this property.

22
Q

ES: Explain the ‘index’ common attribute of a type.

A

This can take the values analyzed and no; for the string-based fields, it can also be set to not_analyzed. If set to analyzed, the field will be indexed and thus searchable. If set to no, you won’t be able to search on such a field. The default value is analyzed. The additional not_analyzed option for string-based fields means that the field should be indexed but not processed by the analyzer, so it is written into the index exactly as it was sent to ElasticSearch and only an exact match will be counted during a search.

23
Q

ES: Explain the ‘store’ common attribute of a type.

A

This can take the values yes and no, and it specifies if the original value of the field should be written into the index. The default value is no, which means that you can’t return that field in the results (although if you use the _source field, you can return the value even if it is not stored), but if you have it indexed you still can search on it.

24
Q

ES: Explain the ‘boost’ common attribute of a type.

A

The default value of this attribute is 1. Basically, it defines how important the field is inside the document; the higher the boost, the more important are the values in the field.

25
Q

ES: Explain the ‘null_value’ common attribute of a type.

A

This attribute specifies a value that should be written into the index if that field is not a part of an indexed document. The default behavior will just omit that field.
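For example, a hypothetical tags field where missing values should be indexed as "empty":

"tags": { "type": "string", "null_value": "empty" }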

26
Q

ES: Explain the ‘include_in_all’ common attribute of a type.

A

This attribute specifies if the field should be included in the _all field. By default, if the _all field is used, all the fields will be included in it.

27
Q

ES: Explain the “term_vector” attribute of the string type.

A

This can take the values no (the default one), yes, with_offsets, with_positions, or with_positions_offsets. It defines whether the Lucene term vectors should be calculated for that field or not. If you are using highlighting, you will need to calculate term vectors.
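For example, a field meant to support highlighting might be mapped like this (an illustrative sketch):

"content": { "type": "string", "term_vector": "with_positions_offsets" }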

28
Q

ES: Explain the “omit_norms” attribute of the string type.

A

This can take the value true or false. The default value is false. When this attribute is set to true, it disables the Lucene norms calculation for that field (and thus you can’t use index-time boosting).

29
Q

ES: Explain the “index_options” attribute of the string type.

A

This allows us to set indexing options. The possible values are docs, which results in indexing only the document numbers for terms; freqs, which results in indexing the document numbers and the term frequencies; and positions, which results in the previous two plus the term positions. The default value is freqs.

30
Q

ES: Explain the “analyzer” attribute of the string type.

A

This is the name of the analyzer used for indexing and searching. It defaults to the globally defined analyzer name.

31
Q

ES: Explain the “index_analyzer” attribute of the string type.

A

This is the name of the analyzer used for indexing.

32
Q

ES: Explain the “search_analyzer” attribute of the string type.

A

This is the name of the analyzer used for processing the part of the query string that is sent to that field.

33
Q

ES: Explain the “ignore_above” attribute of the string type.

A

This is the maximum size of the field. The part of the value beyond the specified number of characters will be ignored. This attribute is useful if we are only interested in the first N characters of the field.
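For example (an illustrative sketch, assuming a not_analyzed tag field):

"tag": { "type": "string", "index": "not_analyzed", "ignore_above": 100 }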

34
Q

ES: What are the available number types?

A

byte: A byte value; for example, 1
short: A short value; for example, 12
integer: An integer value; for example, 134
long: A long value; for example, 12345
float: A float value; for example, 12.23
double: A double value; for example, 12.23

35
Q

ES: Explain the “precision_step” attribute of the number and date type.

A

This is the number of terms generated for each value in a field. The lower the value, the higher the number of terms generated, resulting in faster range queries (but a higher index size). The default value is 4.

36
Q

ES: Explain the “ignore_malformed” attribute of the number and date type.

A

This can take the value true or false. The default value is false. It should be set to true in order to omit badly formatted values.

37
Q

ES: Explain the “format” attribute of the date type.

A

This specifies the format of the date. The default value is dateOptionalTime. For a full list of formats, please visit http://www.elasticsearch.org/guide/reference/mapping/date-format.html.
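For example, an illustrative sketch using a Joda-time style pattern:

"published": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }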

38
Q

How is the Date core type used?

A

This core type is designed to be used for date indexing. It follows a specific format that can be changed and is stored in UTC by default.

39
Q

How is the Binary core type used?

A

The binary field is a BASE64 representation of the binary data stored in the index. You can use it to store data that is normally written in binary form, like images. Fields based on this type are, by default, stored and not indexed. The binary type only supports the index_name property.

40
Q

How is the multi_field core type used?

A

Sometimes you would like to have the same field values in two fields—for example, one for searching and one for faceting. There is a special type in ElasticSearch—multi_field—that allows us to map several core types into a single field and have them analyzed differently. For example, if we would like to calculate faceting and search on our name field, we could define the following multi_field:

"name": {
  "type": "multi_field",
  "fields": {
    "name": { "type": "string", "index": "analyzed" },
    "facet": { "type": "string", "index": "not_analyzed" }
  }
}

41
Q

ES: What is an analyzer?

A

It is the functionality used to process data and queries so that they are indexed and searched the way we want.
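The _analyze API lets you see what a given analyzer does to a piece of text, for example (a sketch, assuming a local node):

curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'ElasticSearch Server'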

42
Q

ES: What analyzers are available out of the box?

A

standard; simple; whitespace; stop; keyword; pattern; language; snowball;

43
Q

ES: explain the ‘standard’ analyzer.

A

A standard analyzer that is convenient for most European languages (please refer to http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html for the full list of parameters).

44
Q

ES: explain the ‘simple’ analyzer.

A

An analyzer that splits the provided value on non-letter characters and converts letters to lowercase.

45
Q

ES: explain the ‘whitespace’ analyzer.

A

An analyzer that splits the provided value on the basis of whitespace characters.

46
Q

ES: explain the ‘stop’ analyzer.

A

This is similar to a simple analyzer, but in addition to the simple analyzer functionality, it filters the data on the provided stop words set.

47
Q

ES: explain the ‘keyword’ analyzer.

A

This is a very simple analyzer that just passes the provided value. You’ll achieve the same by specifying that field as not_analyzed.

48
Q

ES: explain the ‘pattern’ analyzer.

A

This is an analyzer that allows flexible text separation by the use of regular expressions.

49
Q

ES: explain the ‘language’ analyzer.

A

This is an analyzer that is designed to work with a specific language.

50
Q

ES: explain the ‘snowball’ analyzer.

A

This is an analyzer similar to the standard one, but in addition, it provides a stemming algorithm.

51
Q

ES: How can you create your own analyzer?

A
Each analyzer is built from a single tokenizer and multiple filters. 
When providing the mapping for an index, we have to add a settings section. In the following example, we have also added a custom filter (ourEnglishFilter).
e.g. 
"settings" : {
  "index" : {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "ourEnglishFilter"
          ]
        }
      },
      "filter": {
        "ourEnglishFilter": {
          "type": "kstem"
        }
      }
    }
  } 
}
52
Q

ES: explain the _analyzer field in a mapping.

A

An analyzer field (_analyzer) allows us to specify a field value that will be used as the analyzer name for the document to which the field belongs. So if you have a language field, you can use that value to select the correct analyzer (given you have named your custom analyzers to match the language values, e.g. nl).
{
  "mappings": {
    "post": {
      "_analyzer": {
        "path": "language"
      },
      "properties": {
        "id": { "type": "long", "store": "yes", "precision_step": "0" },
        "name": { "type": "string", "store": "yes", "index": "analyzed" },
        "language": { "type": "string", "store": "yes", "index": "not_analyzed" }
      }
    }
  }
}

53
Q

ES: How can you specify a default analyzer in an index mapping file?

A
This is done in the same way as configuring a custom analyzer in the settings section of the mappings file, but instead of specifying a custom name for the analyzer, the "default" keyword should be used.
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": [
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  }
}

54
Q

ES: Explain the _source field in an index mapping file.

A

By default, ES stores the source JSON of a document in the _source field. You can disable this:
"_source" : {
  "enabled" : false
}

55
Q

ES: Explain the _all field in an index mapping file.

A

Sometimes, it’s handy to have some of the fields copied into one; instead of searching multiple fields, a general purpose field will be used for searching—for example, when you don’t know which fields to search on. By default, ElasticSearch will include the values from all the text fields into the _all field.
You can disable this:
"_all" : {
  "enabled" : false
}
However, please remember that the _all field will increase the size of the index, so it should be disabled if not needed.