Indexer Clustering (Architecture and theory) Flashcards

Question

What is distributed search?

Answer 1

In distributed search, one or more search heads distribute search requests across multiple indexers, also called search peers. The indexers still perform the actual searching of their own indexes, but the search heads manage the overall search process across all the indexers and present the consolidated search results to the user. Put another way, a Splunk Enterprise instance called a search head sends search requests to a group of search peers, which search their own indexes; the search head then merges the results and returns them to the user.
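
The fan-out and merge described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements it: the peer names, the in-memory event lists, and the functions `search_peer` and `distributed_search` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "indexers": each peer holds only its own events.
PEERS = {
    "indexer1": [{"_time": 3, "msg": "error disk"}, {"_time": 1, "msg": "ok"}],
    "indexer2": [{"_time": 2, "msg": "error net"}],
}

def search_peer(peer, term):
    """A search peer searches only its own index and returns the matches."""
    return [event for event in PEERS[peer] if term in event["msg"]]

def distributed_search(term):
    """Search head: send the request to every peer, then merge newest-first."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda peer: search_peer(peer, term), PEERS))
    merged = [event for part in partials for event in part]
    return sorted(merged, key=lambda event: event["_time"], reverse=True)

results = distributed_search("error")
```

The search head never reads the indexes itself; it only combines what the peers return.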

Answer 2

$SPLUNK_HOME/var/lib/splunk

Answer 3

There are several key reasons for having multiple indexes: To control user access. To accommodate varying retention policies. To speed searches in certain situations. The main reason you'd set up multiple indexes is to control user access to the data that's in them. When you assign users to roles, you can limit user searches to specific indexes based on the role they're in. In addition, if you have different policies for retention for different sets of data, you might want to send the data to different indexes and then set a different archive or retention policy for each index. Another reason to set up multiple indexes has to do with the way search works. If you have both a high-volume/high-noise data source and a low-volume data source feeding into the same index, and you search mostly for events from the low-volume data source, the search speed will be slower than necessary, because the indexer also has to search through all the data from the high-volume source. To mitigate this, you can create dedicated indexes for each data source and send data from each source to its dedicated index. Then, you can specify which index to search on. You'll probably notice an increase in search speed.

Answer 4

The following example inputs.conf stanza sends all data from /var/log to an index named fflanda:
[monitor:///var/log]
disabled = false
index = fflanda

Answer 5

Index parallelization is a feature that allows an indexer to maintain multiple pipeline sets. A pipeline set handles the processing of data from ingestion of raw data, through event processing, to writing the events to disk. A pipeline set is one instance of the processing pipeline described in How indexing works. It is called a "pipeline set" because it comprises the individual pipelines, such as the parsing pipeline and the indexing pipeline, that together constitute the overall processing pipeline. By default, an indexer runs just a single pipeline set. However, if the underlying machine is under-utilized in terms of both available cores and I/O, you can configure the indexer to run two pipeline sets. By running two pipeline sets, you potentially double the indexer's indexing throughput capacity. Note: The actual amount of increased throughput on your indexer depends on the nature of your data inputs and other factors. In addition, if the indexer is having difficulty handling bursts of data, index parallelization can help it to accommodate the bursts, assuming again that the machine has the available capacity. To summarize, these are some typical use cases for index parallelization, dependent on available machine resources: Scale indexer throughput. Handle bursts of data. For a better understanding of the use cases and to determine whether your deployment can benefit from multiple pipeline sets, see Parallelization settings in the Capacity Planning Manual. Note: You cannot use index parallelization with multiple pipeline sets for metrics data that is received from a UDP data input. If your system uses multiple pipeline sets, use a TCP or HTTP Event Collector data input for metrics data. For more about metrics, see the Metrics manual.

Answer 6

You can configure forwarders to run multiple pipeline sets. Multiple pipeline sets increase forwarder throughput and allow the forwarder to process multiple inputs simultaneously. This can be of particular value, for example, when a forwarder needs to process a large file that would occupy the pipeline for a long period of time. With just a single pipeline, no other files can be processed until the forwarder finishes the large file. With two pipeline sets, the second pipeline can ingest and forward smaller files quickly, while the first pipeline continues to process the large file. Assuming that the forwarder has sufficient resources and depending on the nature of the incoming data, a forwarder with two pipelines can potentially forward twice as much data as a forwarder with one pipeline.
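
To see why a second pipeline set helps with a large file, here is a small Python simulation. It is a simplified sketch: it assumes each file goes to whichever pipeline frees up first and that processing cost is a single number, which is not necessarily how a real forwarder assigns inputs. The function `completion_times` and the cost values are made up for illustration.

```python
import heapq

def completion_times(file_costs, pipelines):
    """Return the finish time of each file when files are taken in order
    by whichever pipeline becomes free first (simplifying assumption)."""
    free_at = [0] * pipelines      # min-heap: time at which each pipeline is free
    heapq.heapify(free_at)
    finished = []
    for cost in file_costs:
        start = heapq.heappop(free_at)
        finish = start + cost
        heapq.heappush(free_at, finish)
        finished.append(finish)
    return finished

files = [100, 1, 1, 1]             # one large file followed by three small ones
one_pipeline = completion_times(files, 1)
two_pipelines = completion_times(files, 2)
```

With one pipeline the small files finish at 101, 102, and 103 because they wait behind the large file. With two pipelines they finish at 1, 2, and 3 while the first pipeline is still busy.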

Answer 7

License warnings occur when you exceed the maximum daily indexing volume allowed for your license. Here are the conditions: Your daily indexing volume is measured from midnight to midnight using the clock on the license master. If you exceed your licensed daily volume on any one calendar day, you generate a license warning. If you generate a license warning, you have until midnight on the license master to resolve the warning before it counts against the total number of warnings allowed by your license. For guidance on what to do when a warning appears, see Correct license warnings.
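
The midnight-to-midnight rule can be sketched as follows. This is a rough model, assuming timestamps are already expressed in the license master's clock; the function name `days_with_warnings`, the sample data, and the quota are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def days_with_warnings(usage, quota_bytes):
    """usage: iterable of (timestamp on the license master clock, bytes indexed).
    Returns the calendar days whose total volume exceeds the daily quota."""
    per_day = defaultdict(int)
    for timestamp, nbytes in usage:
        per_day[timestamp.date()] += nbytes   # bucket by calendar day
    return sorted(day for day, total in per_day.items() if total > quota_bytes)

sample = [
    (datetime(2024, 1, 1, 9, 0), 60),
    (datetime(2024, 1, 1, 23, 59), 50),   # still counts toward January 1
    (datetime(2024, 1, 2, 0, 0), 90),     # midnight starts a new day
]
warning_days = days_with_warnings(sample, quota_bytes=100)
```

Only January 1 produces a warning here: its total of 110 exceeds the quota of 100, while January 2 stays under it.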

Answer 8

The path that contains the hot and warm buckets. (Required.) This location must be writable.

Answer 9

The path that contains the cold buckets. (Required.) This location must be writable.

Answer 10

The path that contains any thawed buckets. (Required.) This location must be writable.

Answer 11

Determines whether the index gets replicated to other cluster peers. (Required for indexes on cluster peer nodes.)

Answer 12

The maximum number of concurrent hot buckets. This value should be at least 2, to deal with any archival data. The main default index, for example, has this value set to 10.

Answer 13

Determines rolling behavior, hot to warm. The maximum size for a hot bucket. When a hot bucket reaches this size, it rolls to warm. This attribute also determines the approximate size for all buckets.

Answer 14

Determines rolling behavior, warm to cold. The maximum number of warm buckets. When the maximum is reached, warm buckets begin rolling to cold.
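
The hot-to-warm and warm-to-cold rules can be pictured together with a toy model. It assumes a single hot bucket, sizes given as plain megabytes, and that the oldest warm bucket is the one that rolls to cold; the function `write_events` is hypothetical and ignores everything else an indexer does.

```python
def write_events(event_sizes_mb, max_data_size_mb, max_warm_db_count):
    """Toy model: fill one hot bucket, roll it to warm when it reaches
    max_data_size_mb, and roll the oldest warm bucket to cold once the
    number of warm buckets exceeds max_warm_db_count."""
    hot_mb = 0
    warm, cold = [], []
    next_id = 0
    for size in event_sizes_mb:
        hot_mb += size
        if hot_mb >= max_data_size_mb:
            warm.append(next_id)             # hot bucket rolls to warm
            next_id += 1
            hot_mb = 0                       # a fresh hot bucket starts empty
            if len(warm) > max_warm_db_count:
                cold.append(warm.pop(0))     # oldest warm bucket rolls to cold
    return hot_mb, warm, cold

state = write_events([5] * 7, max_data_size_mb=10, max_warm_db_count=2)
```

After seven 5 MB writes, three buckets have rolled out of hot; the first of them has already moved on to cold because only two warm buckets are allowed.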

Answer 15

Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen.

Answer 16

Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen.
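
The two cold-to-frozen triggers, index size and bucket age, can be combined in one small sketch. It assumes buckets are frozen oldest first and that a bucket's age is measured from its latest event; the function `select_frozen` and the bucket dictionaries are hypothetical.

```python
def select_frozen(buckets, now, frozen_time_period_in_secs, max_total_data_size_mb):
    """Return the names of buckets that should roll to frozen, oldest first."""
    ordered = sorted(buckets, key=lambda b: b["latest_time"])
    # Rule 1: freeze every bucket older than the maximum age.
    frozen = [b for b in ordered if now - b["latest_time"] > frozen_time_period_in_secs]
    remaining = [b for b in ordered if b not in frozen]
    # Rule 2: if the index is still too large, freeze the oldest buckets.
    total = sum(b["size_mb"] for b in remaining)
    while remaining and total > max_total_data_size_mb:
        oldest = remaining.pop(0)
        total -= oldest["size_mb"]
        frozen.append(oldest)
    return [b["name"] for b in frozen]

buckets = [
    {"name": "a", "latest_time": 0, "size_mb": 100},
    {"name": "b", "latest_time": 50, "size_mb": 100},
    {"name": "c", "latest_time": 90, "size_mb": 100},
]
to_freeze = select_frozen(buckets, now=100, frozen_time_period_in_secs=60,
                          max_total_data_size_mb=150)
```

Bucket a is frozen because of its age, and bucket b is frozen because the two remaining buckets together still exceed the size limit.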

Answer 17

Location for archived data. Determines behavior when a bucket rolls from cold to frozen. If set, the indexer will archive frozen buckets into this directory just before deleting them from the index.

Answer 18

Script to run just before a cold bucket rolls to frozen. If you set both this attribute and coldToFrozenDir, the indexer will use coldToFrozenDir and ignore this attribute.
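
The precedence between the two archiving attributes can be written as a short decision function. This is a sketch: the function `frozen_action`, the returned labels, and the example paths are hypothetical, and the final branch assumes that a bucket is simply deleted when neither attribute is set.

```python
def frozen_action(settings):
    """Decide what happens when a bucket rolls from cold to frozen."""
    if settings.get("coldToFrozenDir"):
        # coldToFrozenDir wins even if coldToFrozenScript is also set.
        return ("archive_to_dir", settings["coldToFrozenDir"])
    if settings.get("coldToFrozenScript"):
        return ("run_script", settings["coldToFrozenScript"])
    return ("delete", None)   # assumption: no archiving configured

both_set = frozen_action({
    "coldToFrozenDir": "/archive/idx1",          # hypothetical path
    "coldToFrozenScript": "/opt/scripts/save.py" # hypothetical path
})
```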

Answer 19

Maximum size for homePath (hot/warm bucket storage) or coldPath (cold bucket storage). If either attribute is missing or set to 0, its path is not individually constrained in size.

Answer 20

Maximum size for a volume. If the attribute is missing, the individual volume is not constrained in size.

Answer 21

What does homePath in indexes.conf do?

Answer 22

What does coldPath in indexes.conf do?

Answer 23

What does thawedPath in indexes.conf do?

Answer 24

What does repFactor in indexes.conf do?

Answer 25

What does maxHotBuckets in indexes.conf do?

Answer 26

What does maxDataSize in indexes.conf do?

Answer 27

What does maxWarmDBCount in indexes.conf do?

Answer 28

What does maxTotalDataSizeMB in indexes.conf do?

Answer 29

What does frozenTimePeriodInSecs in indexes.conf do?

Answer 30

What does coldToFrozenDir in indexes.conf do?

Answer 31

What does coldToFrozenScript in indexes.conf do?

Answer 32

What do homePath.maxDataSizeMB and coldPath.maxDataSizeMB in indexes.conf do?

Answer 33

What does maxVolumeDataSizeMB in indexes.conf do?

Answer 34

Bucket names depend on: a) The state of the bucket: hot or warm/cold/thawed. b) The type of bucket directory: non-clustered, clustered originating, or clustered replicated.
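
As a sketch of how the two factors combine, the function below builds a directory name from a state and a bucket type. The naming patterns (hot_v1_, db_, rb_, and the trailing GUID for clustered buckets) are my reading of the Splunk bucket-naming documentation and are not stated on this card, so verify them before relying on them. The function `bucket_name` and the sample values are hypothetical.

```python
def bucket_name(state, kind, localid, newest=None, oldest=None, guid=None):
    """state: "hot" or "warm" (warm stands in for warm/cold/thawed).
    kind: "non-clustered", "clustered-originating", or "clustered-replicated"."""
    if state == "hot":
        if kind == "clustered-replicated":
            return f"{localid}_{guid}"
        return f"hot_v1_{localid}"
    # Warm, cold, and thawed buckets carry their time range in the name.
    prefix = "rb" if kind == "clustered-replicated" else "db"
    name = f"{prefix}_{newest}_{oldest}_{localid}"
    if kind != "non-clustered":
        name += f"_{guid}"    # clustered buckets append the originating peer's GUID
    return name

example = bucket_name("warm", "clustered-replicated", 7,
                      newest=1700000500, oldest=1700000000, guid="ABCD-1234")
```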

Answer 35

It represents a directory on the file system where indexed data resides. Volumes can store data from multiple indexes. You would typically use separate volumes for hot/warm and cold buckets. For instance, you can set up one volume to contain the hot/warm buckets for all your indexes, and another volume to contain the cold buckets.

Answer 36

In indexes.conf:
[volume:]
path =
maxVolumeDataSizeMB = ... (optional)

Answer 37

Once you have configured volumes, you can use them to define an index's homePath and coldPath. For example, in indexes.conf:
[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold1/idx1
[idx2]
homePath = volume:hot1/idx2
coldPath = volume:cold1/idx2
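
To make the volume: prefix concrete, here is a minimal Python sketch of how such a reference could be expanded into a file system path. It is an illustration only: the volume paths in `VOLUMES` and the function `resolve` are hypothetical, and Splunk performs this expansion internally.

```python
import posixpath

# Hypothetical volume definitions: volume name -> path on the file system.
VOLUMES = {"hot1": "/mnt/fast_disk", "cold1": "/mnt/big_disk"}

def resolve(path_setting, volumes):
    """Expand "volume:<name>/<subdir>" into a full path; pass other paths through."""
    prefix = "volume:"
    if path_setting.startswith(prefix):
        name, _, rest = path_setting[len(prefix):].partition("/")
        return posixpath.join(volumes[name], rest)
    return path_setting

idx1_home = resolve("volume:hot1/idx1", VOLUMES)
idx1_cold = resolve("volume:cold1/idx1", VOLUMES)
```

Because both indexes reference the same volume names, moving all hot/warm data to a different disk only requires changing the one volume definition.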

(61 cards)