Getting Started with ElasticSearch Cluster Flashcards
What is an index?
An index is the place where ElasticSearch stores data. If you come from the relational database world, you can think of an index as a table. In contrast to a relational database, however, the data stored in an index is prepared for fast and efficient full-text searching, and in particular the index does not have to store the original values.
What is a document?
The main entity stored in ElasticSearch is a document. By analogy with relational databases, a document is a row of data in a database table.
What is a field? How is it used in documents?
Documents consist of fields (the equivalent of row columns). A field may occur several times in a document; such a field is called multivalued. Each field has a type (text, number, date, and so on). Field types can also be complex: a field can contain other subdocuments or arrays.
Describe the structure of a document.
Unlike in relational databases, documents don't need to have a fixed structure: every document may have a different set of fields, and the fields don't have to be known during application development. If needed, a document structure can still be enforced with a schema.
What is a document type?
In ElasticSearch, one index can store many objects with different purposes. For example, a blog application can store articles and comments. Document types let us easily differentiate these objects. Practically every document can have a different structure, but in real operations, dividing documents into types significantly helps with data manipulation. The limitations must be kept in mind, though: for example, different document types can't set different types for the same property.
Describe nodes and clusters.
ElasticSearch can work as a standalone, single search server. Nevertheless, to process large sets of data and to achieve fault tolerance, ElasticSearch can run on many cooperating servers. Collectively, these servers are called a cluster, and each of them is called a node. Large amounts of data can be split across many nodes via index sharding (splitting an index into smaller individual parts). Better availability and performance are achieved through replicas (copies of index parts).
What is a Shard?
When we have a large number of documents, we can reach a point where a single node is not enough because of RAM limitations, hard disk capacity, and so on. Another problem is that the required processing can be so demanding that the computing power of a single server is not sufficient. In such cases, the data can be divided into smaller parts called shards, where each shard is a separate Apache Lucene index. Each shard can be placed on a different server, so your data can be spread among the nodes of the cluster. When you query an index that is built from multiple shards, ElasticSearch sends the query to each relevant shard and merges the results transparently, so your application doesn't need to know about shards.
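The routing and merging described above can be pictured with a small Python model. This is only a sketch under stated assumptions and not ElasticSearch's implementation. It assumes the default routing rule shard = hash(routing) % number_of_primary_shards, with the document ID as the routing value. It uses zlib.crc32 in place of the Murmur3-based hash that ElasticSearch actually uses. NUM_PRIMARY_SHARDS, shard_for, index_documents and search are hypothetical names.

```python
import zlib
from collections import defaultdict

# Simplified model, not ElasticSearch's actual code. Assumption: a document
# is routed with shard = hash(routing) % number_of_primary_shards, where the
# routing value defaults to the document ID. zlib.crc32 stands in for the
# Murmur3-based hash that ElasticSearch really uses.
NUM_PRIMARY_SHARDS = 3  # hypothetical index setting


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Return the shard that owns a document."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_documents(docs: dict) -> dict:
    """Place every document on exactly one shard."""
    shards = defaultdict(dict)
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text
    return shards


def search(shards: dict, term: str) -> list:
    """Send the query to every shard and merge the partial results."""
    merged = []
    for shard_docs in shards.values():
        # Each shard only answers for the documents it holds.
        merged.extend(doc_id for doc_id, text in shard_docs.items()
                      if term in text.split())
    # The caller gets one merged list and never sees the shards.
    return sorted(merged)


docs = {"1": "elasticsearch cluster", "2": "lucene index",
        "3": "cluster node", "4": "replica shard"}
print(search(index_documents(docs), "cluster"))  # ['1', '3']
```

The model derives each document's shard from the number of primary shards. Changing that number after indexing would therefore send most documents to a different shard, which is why it is fixed when the index is created.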
What is a replica?
In order to increase query throughput or achieve high availability, shard replicas can be used. Operations that change the index are always directed to the primary shard first. A replica is an exact copy of a primary shard, and each shard can have zero or more replicas. When the primary shard is lost (for example, because the server holding its data becomes unavailable), the cluster can promote a replica to be the new primary shard.
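The following minimal Python model of a primary shard with replicas makes the write path and the promotion step concrete. ShardCopy, ShardGroup and node_failed are hypothetical names. Real ElasticSearch also tracks which copies are in sync and allocates new replicas on other nodes, and this sketch models neither.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    node: str                                  # node holding this copy
    docs: dict = field(default_factory=dict)   # doc ID -> document


@dataclass
class ShardGroup:
    """A primary shard plus zero or more replicas (simplified model)."""
    primary: ShardCopy
    replicas: list = field(default_factory=list)

    def index(self, doc_id: str, doc: dict) -> None:
        # Changes go to the primary first, then are copied to each replica.
        self.primary.docs[doc_id] = doc
        for replica in self.replicas:
            replica.docs[doc_id] = dict(doc)

    def node_failed(self, node: str) -> None:
        # Forget any replica that lived on the failed node.
        self.replicas = [r for r in self.replicas if r.node != node]
        if self.primary.node == node:
            if not self.replicas:
                raise RuntimeError("primary lost and no replica to promote")
            # Promote a surviving replica to be the new primary.
            self.primary = self.replicas.pop(0)


group = ShardGroup(ShardCopy("node-1"), [ShardCopy("node-2")])
group.index("1", {"title": "hello"})
group.node_failed("node-1")
print(group.primary.node, group.primary.docs)  # node-2 {'1': {'title': 'hello'}}
```

With zero replicas, losing the primary's node loses the shard. This is why replicas matter for availability and not only for query throughput.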
ES: Where is all the data used in ES stored?
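In the data directory (by default, data under the ElasticSearch home directory; the location can be changed with the path.data setting).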
ES: What are the main configuration files?
The whole configuration is located in the config directory, which contains two files: elasticsearch.yml (or elasticsearch.json, which is used if present) and logging.yml. The first sets the default configuration values for the server; the second configures logging. Keep in mind that some of the values from elasticsearch.yml can be changed at runtime and kept as part of the cluster state, so the file may not reflect the settings the cluster is actually using.
ES: How can you configure a node to belong to a cluster?
In elasticsearch.yml, the cluster.name property holds the name of the cluster. The cluster name separates different clusters from each other: nodes configured with the same name will try to form a cluster.
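In elasticsearch.yml this is a single line such as cluster.name: blog-search, where the name is a hypothetical example. The Python sketch below only models the grouping rule. It assumes every node can reach every other node, whereas real cluster formation also depends on network discovery settings. form_clusters is a hypothetical helper. When cluster.name is not set, ElasticSearch falls back to the default name elasticsearch.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"  # used when cluster.name is not set


def form_clusters(nodes: dict) -> dict:
    """Group nodes into clusters by their cluster.name setting.

    nodes maps a node name to its settings, e.g. the values read from that
    node's elasticsearch.yml. Network discovery is ignored: every node is
    assumed to be able to reach every other node.
    """
    clusters = defaultdict(list)
    for node_name, settings in nodes.items():
        name = settings.get("cluster.name", DEFAULT_CLUSTER_NAME)
        clusters[name].append(node_name)
    return {name: sorted(members) for name, members in clusters.items()}


nodes = {
    "node-1": {"cluster.name": "blog-search"},  # hypothetical cluster name
    "node-2": {"cluster.name": "blog-search"},
    "node-3": {},                               # falls back to the default
}
print(form_clusters(nodes))
# {'blog-search': ['node-1', 'node-2'], 'elasticsearch': ['node-3']}
```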
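ES: What is the default port?
9200 for the HTTP (REST) API; nodes communicate with each other over the transport protocol on port 9300.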
ES: How do you make responses human readable?
Add the pretty parameter to the request URL, e.g. curl -XGET 'http://localhost:9200/?pretty'
ES: How can you shut down ES?
There are three ways in which we can shut down ElasticSearch:
1. If your node is attached to the console (run with the -f option), just press Ctrl + C.
2. Kill the server process by sending it the TERM signal (see the kill command on Linux, or Task Manager on Windows).
3. Use the REST API, e.g. curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown' (this shutdown API exists only in early versions; it was removed in ElasticSearch 2.0).
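ES: How can you manipulate indexes?
By using standard REST calls: PUT to create or replace a document whose ID you know, POST to create a new document when you don't know the ID (ElasticSearch then generates one), GET to retrieve a document, and DELETE to remove it.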
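The Python snippet below is a sketch of how these verbs map to calls. It builds the requests with the standard library's urllib.request.Request but does not send them. The blog index, the article type and document ID 1 come from the update example in this deck. build_call is a hypothetical helper, and the /index/type/id URL layout is the one this deck uses. Against a running node, each request could be sent with urllib.request.urlopen(call).

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default HTTP port


def build_call(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) one REST call against ElasticSearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(BASE_URL + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


calls = [
    build_call("PUT", "/blog/article/1", {"title": "Known ID"}),  # create or replace doc 1
    build_call("POST", "/blog/article", {"title": "New post"}),   # ID generated by ES
    build_call("GET", "/blog/article/1"),                         # fetch doc 1
    build_call("DELETE", "/blog/article/1"),                      # delete doc 1
]
for call in calls:
    print(call.get_method(), call.full_url)
```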
ES: How can you update documents?
Internally, ElasticSearch must fetch the document, take its data from the _source field, remove the old document, apply the changes, and index the result as a new document. This is why updates need the _source field to be available. The update API lets you describe the change as a script passed in the request body:
curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "script": "ctx._source.content = \"new content\""
}'
Notice that we didn’t have to send the whole document, only the changed parts.
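The internal steps can be modeled in a few lines of Python. This is a simplified sketch, not ElasticSearch's code. The in-memory store dict and update_document are hypothetical. The script is represented by a Python function, set_content, that mirrors the ctx._source.content assignment in the curl example. The sketch also shows the document's _version number growing by one on each update.

```python
import copy


def update_document(store: dict, doc_id: str, script) -> dict:
    """Simplified model of the steps an update performs internally."""
    current = store[doc_id]
    # 1. Fetch the stored document and copy its data out of _source.
    source = copy.deepcopy(current["_source"])
    # 2. Apply the change; this is the role of the update script.
    script(source)
    # 3. Remove the old document and index the result as a new one.
    del store[doc_id]
    store[doc_id] = {"_source": source, "_version": current["_version"] + 1}
    return store[doc_id]


def set_content(source: dict) -> None:
    # Python equivalent of: ctx._source.content = "new content"
    source["content"] = "new content"


store = {"1": {"_source": {"title": "Hello", "content": "old content"},
               "_version": 1}}
print(update_document(store, "1", set_content))
# {'_source': {'title': 'Hello', 'content': 'new content'}, '_version': 2}
```

Only the changed field is touched by the script, and the untouched title comes back from _source. This matches the point that the request does not need to carry the whole document.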
ES: How can you create a mapping for an index?
Create a JSON file with the mapping definition and send it to the desired index when creating the index, e.g. curl -XPOST 'http://localhost:9200/posts' -d @posts.json
Content of posts.json:
{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long", "store": "yes", "precision_step": "0"},
        "name": {"type": "string", "store": "yes", "index": "analyzed"},
        "published": {"type": "date", "store": "yes", "precision_step": "0"},
        "contents": {"type": "string", "store": "no", "index": "analyzed"}
      }
    }
  }
}
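To see what the attributes in posts.json mean for each field, the following hypothetical helper reads a mapping in this old-style layout. It reports each field's type and whether the field is searchable and stored. It assumes that store is off unless it is set to yes, and that only index set to no disables searching.

```python
def summarize_mapping(mapping: dict, doc_type: str) -> dict:
    """Report type, searchability and storage for each field of one type.

    Assumes the old-style mapping layout used in posts.json:
    {"mappings": {<type>: {"properties": {<field>: {...}}}}}.
    """
    summary = {}
    properties = mapping["mappings"][doc_type]["properties"]
    for name, spec in properties.items():
        summary[name] = {
            "type": spec["type"],
            # Only "index": "no" turns searching off for a field.
            "searchable": spec.get("index") != "no",
            # Fields are not stored separately unless "store" is switched on.
            "stored": spec.get("store", "no") in ("yes", True),
        }
    return summary


example = {"mappings": {"post": {"properties": {
    "id": {"type": "long", "store": "yes", "precision_step": "0"},
    "contents": {"type": "string", "store": "no", "index": "analyzed"},
}}}}
print(summarize_mapping(example, "post"))
```

To inspect the real file, load it first with the json module, for example summarize_mapping(json.load(open("posts.json")), "post").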
ES: What is a mapping of an index?
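A loose schema: it defines which fields the documents in an index have, their types, and how each field is indexed and stored.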
ES: What are the core field types?
String, number, date, boolean, and binary.
ES: What are the common attributes that you can use to describe all the types?
index_name, index, store, boost, null_value, include_in_all
ES: Explain the index_name common attribute of a type.
This is the name under which the field is stored in the index. If it is not defined, the name defaults to the name under which the field is defined in the mapping. You'll usually omit this property.
ES: Explain the ‘index’ common attribute of a type.
This can take the values analyzed and no; for string-based fields it can also be set to not_analyzed. The default value is analyzed, which means the field is processed by the analyzer, indexed, and thus searchable. If set to no, the field is not indexed and you won't be able to search it. The not_analyzed option, available only for string-based fields, means the field is indexed but not processed by the analyzer: it is written to the index exactly as it was sent to ElasticSearch, so only an exact match counts during a search.
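The difference between the three settings can be illustrated with a toy inverted index in Python. toy_analyzer is a hypothetical stand-in for a real analyzer such as ElasticSearch's standard analyzer: it lowercases the text and splits it on whitespace. Multi-term queries are combined with OR, roughly like a match query's default operator. All function names are illustrative only.

```python
from collections import defaultdict


def toy_analyzer(text: str) -> list:
    # Hypothetical stand-in for a real analyzer: lowercase, split on spaces.
    return text.lower().split()


def to_terms(value: str, mode: str) -> list:
    if mode == "analyzed":
        return toy_analyzer(value)
    if mode == "not_analyzed":
        return [value]  # written to the index exactly as sent
    return []           # "no": nothing is indexed


def build_index(docs: dict, mode: str) -> dict:
    """Build an inverted index (term -> set of doc IDs) for one field."""
    inverted = defaultdict(set)
    for doc_id, value in docs.items():
        for term in to_terms(value, mode):
            inverted[term].add(doc_id)
    return inverted


def search(inverted: dict, query: str, mode: str) -> set:
    # The query text is turned into terms the same way as the field.
    hits = set()
    for term in to_terms(query, mode):
        hits |= inverted.get(term, set())
    return hits


docs = {"1": "New York", "2": "York Minster"}
for mode in ("analyzed", "not_analyzed", "no"):
    index = build_index(docs, mode)
    print(mode, sorted(search(index, "york", mode)),
          sorted(search(index, "New York", mode)))
# analyzed ['1', '2'] ['1', '2']
# not_analyzed [] ['1']
# no [] []
```

With analyzed, the query york finds both documents. With not_analyzed, only the exact value New York matches. With no, nothing is searchable.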