Indexing Flashcards

Question 1

Q

What is the recommended system specification for the reference indexer?

Answer

A

12 cores, 2+ GHz, 12 GB RAM, 800 IOPS

Question 2

Q

What is the recommended system specification for the high-end indexer?

Answer

A

48 cores, 2+ GHz, 128 GB RAM, 1200 IOPS

Question 3

Q

Give an example of how 800 IOPS can be reached.

Answer

A

By using eight x-GB, 15,000 RPM serial-attached SCSI (SAS) hard drives in a Redundant Array of Independent Disks (RAID) 1+0 fault-tolerance scheme as the disk subsystem.

Each hard drive is capable of about 200 average IOPS. The combined array produces a little over 800 average IOPS.
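The arithmetic can be sketched in Python. It relies on a common rule of thumb that the answer above does not state: in RAID 1+0, reads can be served by either drive of a mirrored pair, while every write goes to both drives, so write IOPS are about half of the summed drive IOPS. Under that assumption, eight drives at about 200 IOPS each give about 1,600 read IOPS and about 800 write IOPS, which is one way to arrive at the ~800 figure. `raid10_iops` is a hypothetical helper.

```python
def raid10_iops(drives: int, iops_per_drive: float) -> dict:
    """Rule-of-thumb IOPS estimate for a RAID 1+0 array (write penalty of 2)."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 1+0 needs an even number of drives, at least 4")
    raw = drives * iops_per_drive
    # Reads can be served by either drive of a mirrored pair, so they scale with all drives.
    # Each write lands on both drives of a mirrored pair, so effective write IOPS are halved.
    return {"read": raw, "write": raw / 2}


print(raid10_iops(8, 200))  # {'read': 1600, 'write': 800.0}
```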

Question 4

Q

What is a reliable method to measure IOPS on a disk subsystem?

Answer

A

bonnie++ or fio (Flexible I/O Tester)
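A sketch of driving fio from Python, assuming fio is installed on a Linux host with the libaio engine available. The job parameters (4 KB random I/O, 50/50 read/write mix, queue depth 32, 60-second runtime) and the test-file path are illustrative assumptions, not Splunk-mandated values, and `build_fio_command` and `total_iops` are hypothetical helpers. Point the test file at the volume you want to measure.

```python
import json
import shlex


def build_fio_command(target_file: str, runtime_s: int = 60) -> list[str]:
    """Build a mixed random read/write fio job against a file on the volume under test."""
    return [
        "fio",
        "--name=splunk-iops-test",
        f"--filename={target_file}",
        "--rw=randrw",        # mixed random reads and writes
        "--rwmixread=50",     # 50% reads, 50% writes (assumption)
        "--bs=4k",
        "--size=4G",
        "--direct=1",         # bypass the page cache so the disks themselves are measured
        "--ioengine=libaio",
        "--iodepth=32",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
        "--output-format=json",
    ]


def total_iops(fio_json: str) -> float:
    """Sum read and write IOPS over all jobs in fio's JSON report."""
    report = json.loads(fio_json)
    return sum(job["read"]["iops"] + job["write"]["iops"] for job in report["jobs"])


# Hypothetical path on the indexer's data volume.
cmd = build_fio_command("/opt/splunk/var/lib/splunk/fio.test")
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, capture_output=True, text=True, check=True),
# then pass the captured stdout to total_iops().
```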

Question 5

Q

If you need to measure IOPS at a customer site with shared storage, what do you need to consider?

Answer

A

Run the test on all indexers at the same time to get reliable results, because in production all indexers compete for the same shared storage.

Question 6

Q

What are the index artifacts, and where are they located?

Answer

A

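The index artifacts are stored under $SPLUNK_DB, which defaults to $SPLUNK_HOME/var/lib/splunk.

Data is stored in buckets. One index can contain several buckets. Each bucket holds the compressed raw data (journal.gz) and the index (TSIDX) files.

There are different types of buckets (hot, warm, cold).

By default, frozen buckets are deleted. You can configure Splunk to archive them instead (coldToFrozenDir or coldToFrozenScript).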

Question 7

Q

Can hot and warm buckets be separated on disk?

Answer

A

No, they live under the same directory.

The path for hot and warm buckets is configured with homePath; their combined maximum size can be limited with homePath.maxDataSizeMB.

Question 8

Q

Can hot/warm buckets be separated from cold buckets? If so, what is a common use case?

Answer

A

Yes, they can be separated.

A common use case is different underlying storage systems, e.g. hot/warm buckets on high-performance storage and cold buckets on slower storage.
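As a sketch, assuming hypothetical mount points /fast_storage and /slow_storage and a hypothetical index named firewall, the separation is expressed in indexes.conf by pointing homePath (hot and warm) and coldPath (cold) at different file systems. Python's configparser is used here only to hold and check the stanza; Splunk reads the .conf file itself, and configparser is only an approximation of Splunk's .conf syntax.

```python
import configparser

# Hypothetical indexes.conf stanza: hot/warm on fast disks, cold on slower disks.
INDEXES_CONF = """
[firewall]
homePath   = /fast_storage/splunk/firewall/db
coldPath   = /slow_storage/splunk/firewall/colddb
thawedPath = /slow_storage/splunk/firewall/thaweddb
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep Splunk's camelCase setting names
parser.read_string(INDEXES_CONF)

stanza = parser["firewall"]
for setting in ("homePath", "coldPath", "thawedPath"):
    print(f"{setting:<10} -> {stanza[setting]}")
```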

Question 9

Q

What is the default time until data in an index gets frozen?

Question 10

Q

What is the rolling behavior for maxDataSize?

Answer

A

Hot to warm

Question 11

Q

What is the rolling behavior of maxWarmDBCount?

Answer

A

Warm to cold

Question 12

Q

How do you configure the maximum size for cold storage?

Answer

A

coldPath.maxDataSizeMB

Question 13

Q

What is the default setting for maxTotalDataSizeMB?

Answer

A

500000 MB [~500GB]

Question 14

Q

What is the rolling behavior of maxTotalDataSizeMB?

Answer

A

Cold to frozen [based on size]

Question 15

Q

How do you configure the maximum size of an index?

Answer

A

maxTotalDataSizeMB

Question 16

Q

What is the rolling behavior for frozenTimePeriodInSecs?

Answer

A

Cold to frozen [based on time]
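The rolling rules from these cards can be summarized in a small Python model. It is a simplified sketch, not Splunk's bucket manager: it ignores other roll triggers, and the class and function names are hypothetical. The defaults are maxDataSize auto (750 MB) and maxTotalDataSizeMB 500000 from the cards, plus the indexes.conf defaults maxWarmDBCount = 300 and frozenTimePeriodInSecs = 188697600 (about six years). Time-based freezing is modeled as applying once even the newest event in a bucket is older than the limit.

```python
from dataclasses import dataclass


@dataclass
class RollingSettings:
    """Hypothetical container for the indexes.conf settings named in the cards."""
    max_data_size_mb: float = 750                # maxDataSize = auto
    max_warm_db_count: int = 300                 # maxWarmDBCount default
    max_total_data_size_mb: float = 500_000      # maxTotalDataSizeMB default
    frozen_time_period_secs: int = 188_697_600   # frozenTimePeriodInSecs default (~6 years)


def rolls_due(s: RollingSettings, hot_bucket_mb: float, warm_buckets: int,
              index_size_mb: float, newest_event_age_secs: int) -> list[str]:
    """Return the rolls these simplified rules would trigger for one index."""
    due = []
    if hot_bucket_mb >= s.max_data_size_mb:            # size-based: hot -> warm
        due.append("hot -> warm (maxDataSize)")
    if warm_buckets > s.max_warm_db_count:             # count-based: warm -> cold
        due.append("warm -> cold (maxWarmDBCount)")
    if index_size_mb > s.max_total_data_size_mb:       # size-based: cold -> frozen
        due.append("cold -> frozen (maxTotalDataSizeMB)")
    if newest_event_age_secs > s.frozen_time_period_secs:  # time-based: cold -> frozen
        due.append("cold -> frozen (frozenTimePeriodInSecs)")
    return due


print(rolls_due(RollingSettings(), hot_bucket_mb=760, warm_buckets=12,
                index_size_mb=120_000, newest_event_age_secs=3_600))
```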

Question 17

Q

What is the default setting for maxHotBuckets?

Question 18

Q

What 3 bucket controls are settable in the GUI?

Answer

A

1) maxTotalDataSizeMB
2) maxDataSize
3) timePeriodInSecsBeforeTsidxReduction

Question 19

Q

What is the default setting for maxDataSize?

Answer

A

auto (sets the size of hot buckets to 750MB)

Use “auto_high_volume” (10 GB hot buckets on 64-bit systems) for high-volume indexes such as a firewall index; otherwise, use “auto”. A high-volume index is typically one that receives more than 10 GB of data per day.

Question 20

Q

How do you configure the maximum number of warm buckets?

Answer

A

maxWarmDBCount

Question 21

Q

What is maxTotalDataSizeMB applied to?

Answer

A

To homePath and coldPath combined (hot, warm and cold buckets).

Question 22

Q

Why can using volumes be a good approach?

Answer

A

Using volumes helps prevent mistakes in per-index size calculations.

A volume lets you assign several indexes to one volume and set a single maximum size for the volume as a whole (maxVolumeDataSizeMB). This prevents the indexes from growing until the disk is full and keeps the total index size under control.

Hot/warm and cold buckets can be on different volumes.
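A sketch of a volume setup in indexes.conf, assuming two hypothetical volumes (volume:hotwarm on fast storage, volume:cold on slower storage) and two hypothetical indexes; the paths and sizes are illustrative only. Indexes reference a volume with the volume:<name>/ prefix in homePath and coldPath, and maxVolumeDataSizeMB caps each volume. thawedPath stays on $SPLUNK_DB because it cannot reference a volume. The Python part only parses the fragment with configparser (an approximation of Splunk's .conf syntax) to list which indexes share each volume; `indexes_per_volume` is a hypothetical helper.

```python
import configparser

# Hypothetical indexes.conf fragment; sizes are illustrative only.
INDEXES_CONF = """
[volume:hotwarm]
path = /fast_storage/splunk
maxVolumeDataSizeMB = 500000

[volume:cold]
path = /slow_storage/splunk
maxVolumeDataSizeMB = 2000000

[firewall]
homePath = volume:hotwarm/firewall/db
coldPath = volume:cold/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb

[web]
homePath = volume:hotwarm/web/db
coldPath = volume:cold/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
"""

parser = configparser.ConfigParser(interpolation=None)  # leave $SPLUNK_DB untouched
parser.optionxform = str                                # keep camelCase setting names
parser.read_string(INDEXES_CONF)


def indexes_per_volume(cfg: configparser.ConfigParser) -> dict[str, list[str]]:
    """Map each referenced volume to the indexes whose homePath/coldPath use it."""
    usage: dict[str, list[str]] = {}
    for name in cfg.sections():
        if name.startswith("volume:"):
            continue  # skip the volume definitions themselves
        for key in ("homePath", "coldPath"):
            value = cfg[name].get(key, "")
            if value.startswith("volume:"):
                volume = value.split("/", 1)[0]  # e.g. "volume:hotwarm"
                usage.setdefault(volume, []).append(name)
    return usage


for volume, indexes in indexes_per_volume(parser).items():
    cap = parser[volume]["maxVolumeDataSizeMB"]
    print(f"{volume}: {', '.join(indexes)} share a {cap} MB limit")
```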

Question 23

Q

Which path setting must be defined in terms of a volume to work?

Answer

A

tstatsHomePath (Accelerated Data Models)

Question 24

Q

What is the difference between a pipeline and a processor?

Answer

A

Pipeline: a thread. Splunk creates a thread for each pipeline, and multiple pipelines run in parallel.
Processor: an individual processing step that runs inside a pipeline (for example, linebreaker or annotator).

Question 25

Q

How does the word ‘queue’ fit into the picture of pipelines and processors?

Answer

A

Each pipeline has its own queue, where data ‘waits’ to be processed, similar to a mail or printer queue. A queue is a memory buffer between pipelines that holds data until the next pipeline picks it up.
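A toy Python model of the pipeline/queue/processor relationship: each pipeline is a thread, the queues between pipelines are bounded memory buffers, and processors are functions applied in order. Only the pipeline names come from these cards; the processor bodies are placeholders invented for illustration and say nothing about how Splunk's processors work internally.

```python
import queue
import threading

STOP = object()  # sentinel that tells a pipeline thread to shut down


def run_pipeline(processors, in_q, out_q):
    """One pipeline = one thread: take an item from its queue, run every processor, hand it on."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)  # propagate shutdown to the next pipeline
            return
        for processor in processors:
            item = processor(item)
        out_q.put(item)


def process(raw_chunks):
    # Bounded queues between pipelines: a full queue blocks the upstream pipeline.
    parsing_q, merging_q, typing_q, index_q = (queue.Queue(maxsize=1000) for _ in range(4))
    done_q = queue.Queue()  # unbounded sink so the demo cannot deadlock
    pipelines = [
        ([str.strip], parsing_q, merging_q),                             # "parsing" placeholder
        ([lambda e: e], merging_q, typing_q),                            # "merging" placeholder
        ([lambda e: e.replace("secret", "******")], typing_q, index_q),  # "typing" placeholder
        ([lambda e: "indexed: " + e], index_q, done_q),                  # "indexing" placeholder
    ]
    threads = [threading.Thread(target=run_pipeline, args=p) for p in pipelines]
    for t in threads:
        t.start()
    for chunk in raw_chunks:
        parsing_q.put(chunk)
    parsing_q.put(STOP)
    results = []
    while (item := done_q.get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results


print(process(["  login user=bob password=secret  ", "logout user=bob"]))
```

Because each pipeline runs in its own thread, a slow stage only fills its own input queue; the upstream stage keeps working until that queue is full, which mirrors the "blocked queue" symptom seen on busy indexers.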

Question 26

Q

In which pipeline is the UTF-8 processor located?

Answer

A

Parsing Pipeline

Question 27

Q

In which pipeline is the annotator processor located?

Answer

A

Typing Pipeline

Question 28

Q

In which pipeline is the indexandforward processor located?

Answer

A

Indexing Pipeline

Question 29

Q

What happens when data enters the linebreaker processor?

Answer

A

The data stream is split into events based on the LINE_BREAKER setting in props.conf.
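The LINE_BREAKER rule in props.conf treats the text matched by the regex's first capturing group as the event boundary and discards it; the default is ([\r\n]+). The Python sketch below imitates that rule with the re module. It is an approximation: Splunk uses PCRE, and line merging (SHOULD_LINEMERGE) is not modeled. `split_events` and the sample log lines are hypothetical.

```python
import re

DEFAULT_LINE_BREAKER = r"([\r\n]+)"


def split_events(stream: str, line_breaker: str = DEFAULT_LINE_BREAKER) -> list[str]:
    """Split a stream the way LINE_BREAKER describes: the text matched by the
    first capturing group is the boundary and is discarded; text before it ends
    the previous event and text after it starts the next one."""
    pattern = re.compile(line_breaker)
    events, start = [], 0
    for match in pattern.finditer(stream):
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    events.append(stream[start:])
    return [event for event in events if event]  # drop empty fragments


sample = "2024-01-01 10:00:00 start\n  detail line\n2024-01-01 10:00:05 end"
# Break only on newlines followed by a date, so the indented detail line
# stays inside the first event.
print(split_events(sample, r"([\r\n]+)\d{4}-\d{2}-\d{2}"))
```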

Question 30

Q

Which processor anonymizes sensitive data?

Answer

A

regexreplacement processor in the typing pipeline
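Anonymization is regex-based: a pattern captures the part to keep, and the rest of the match is replaced. The sketch below applies that idea with Python's re.sub. The card= field and the 16-digit format are hypothetical, and in Splunk the equivalent rule would be configured in props.conf/transforms.conf rather than written in Python.

```python
import re

# Hypothetical rule: keep only the last four digits of a 16-digit card number.
CARD_NUMBER = re.compile(r"card=\d{12}(\d{4})")


def mask_card_numbers(event: str) -> str:
    """Replace the first 12 digits with X and keep the captured last four."""
    return CARD_NUMBER.sub("card=" + "X" * 12 + r"\1", event)


print(mask_card_numbers("user=bob card=4111111111111111 amount=20"))
```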

Question 31

Q

Which pipelines use props.conf?

Answer

A

All of them use props.conf, except the indexing pipeline.

Question 32

Q

Which pipeline uses outputs.conf?

Answer

A

Indexing pipeline

Question 33

Q

Which pipeline and which processor does the timestamp extraction?

Answer

A

Merging (aggregation) pipeline, aggregator processor

Question 34

Q

If the ‘great eight’ settings have been configured, which pipelines will have a significantly lower load?

Answer

A

Parsing pipeline and merging pipeline

Question 35

Q

What is the average size of the compressed raw data (journal.gz) in buckets, as a percentage of the original data?

Answer

A

About 15%

Question 36

Q

What is the average size of the lexicon (TSIDX) files in buckets, as a percentage of the original data?

Answer

A

About 35%

Question 37

Q

What is the average compression for all indexed data?

Answer

A

About 50% of the original data size: roughly 15% for the compressed raw data (journal.gz) and roughly 35% for the TSIDX files.
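Using these average ratios, a rough storage estimate for a single copy of an index's data can be sketched as below. The 15%/35% defaults come from this card; the daily volume and retention in the usage line are hypothetical inputs, real ratios vary with the data type, and replicated copies in an indexer cluster are not included.

```python
def estimate_index_storage_gb(daily_raw_gb: float, retention_days: int,
                              journal_ratio: float = 0.15,
                              tsidx_ratio: float = 0.35) -> dict:
    """Rough on-disk size of one copy of an index, from average compression ratios."""
    raw_gb = daily_raw_gb * retention_days
    journal_gb = raw_gb * journal_ratio  # compressed raw data (journal.gz)
    tsidx_gb = raw_gb * tsidx_ratio      # index (TSIDX) files
    return {
        "journal_gb": round(journal_gb, 1),
        "tsidx_gb": round(tsidx_gb, 1),
        "total_gb": round(journal_gb + tsidx_gb, 1),
    }


# Hypothetical example: 100 GB/day of raw data kept for 90 days.
print(estimate_index_storage_gb(100, 90))
```

For 100 GB/day over 90 days (9,000 GB of raw data), this gives about 1,350 GB of journal data and about 3,150 GB of TSIDX files, roughly 4,500 GB in total.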

Question 38

Q

What kind of data lives under the tstatsHomePath?

Answer

A

Accelerated Data Models

Question 39

Q

What kind of data lives under the summaryHomePath?

Answer

A

Accelerated Reports

Question 40

Q

Which pipeline forwards data to the nullQueue?

Answer

A

Typing pipeline

Question 41

Q

What are thawed buckets?

Answer

A

Data restored from an archive. If you archive frozen data, you can later return it to the index by thawing it.