Indexing Flashcards
What is the system recommendation for the reference indexer?
12 cores, 2+ GHz, 12 GB RAM, 800 IOPS
What is the system recommendation for the high-end indexer?
48 cores, 2+ GHz, 128 GB RAM, 1200 IOPS
Give an example of how 800 IOPS can be reached.
By using eight x-GB, 15,000 RPM, serial-attached SCSI (SAS) HDs in a Redundant Array of Independent Disks (RAID) 1+0 fault tolerance scheme as the disk subsystem.
Each hard drive is capable of about 200 average IOPS. The combined array produces a little over 800 average IOPS.
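The arithmetic behind that figure can be sketched as follows. This is a rough rule-of-thumb model, not a Splunk formula: it assumes a RAID 1+0 write penalty of 2 (each write lands on both mirrors), and the function name `raid10_iops` is made up for illustration.

```python
def raid10_iops(drives: int, iops_per_drive: float, write_fraction: float = 1.0) -> float:
    """Estimate usable IOPS of a RAID 1+0 array (rule of thumb, assumed model)."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    write_penalty = 2  # assumption: every write is mirrored to a second drive
    raw_iops = drives * iops_per_drive
    # A read costs one back-end I/O, a write costs `write_penalty` back-end I/Os.
    cost_per_io = (1.0 - write_fraction) + write_fraction * write_penalty
    return raw_iops / cost_per_io


if __name__ == "__main__":
    print(raid10_iops(8, 200))       # write-only workload
    print(raid10_iops(8, 200, 0.0))  # read-only workload
```

With eight drives at 200 IOPS each, a write-only workload comes out at 800 IOPS under this model; real arrays vary with controller cache and the read/write mix.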
What is a reliable method to measure IOPS on a disk subsystem?
bonnie++ or FIO
If you need to measure IOPS at a customer site with shared storage, what do you need to consider?
Perform the test on all indexers at the same time to get reliable results.
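One possible way to start the measurement on all indexers at once is sketched below. The host names, test file path, and fio option values are assumptions chosen for illustration, and the sketch presumes key-based ssh access and fio installed on every indexer.

```python
import shlex
import subprocess


def build_fio_command(test_file: str, runtime_secs: int = 60) -> list[str]:
    """Build a random read/write fio job; the option values are illustrative."""
    return [
        "fio",
        "--name=indexer_iops",
        f"--filename={test_file}",
        "--rw=randrw",
        "--bs=4k",
        "--size=1G",
        "--direct=1",
        f"--runtime={runtime_secs}",
        "--time_based",
        "--output-format=json",
    ]


def start_on_all_indexers(hosts: list[str], test_file: str) -> list[subprocess.Popen]:
    """Start fio on every host over ssh without waiting for any of them."""
    remote_cmd = shlex.join(build_fio_command(test_file))
    # Popen returns immediately, so the tests overlap on the shared storage.
    return [
        subprocess.Popen(["ssh", host, remote_cmd], stdout=subprocess.PIPE, text=True)
        for host in hosts
    ]
```

A caller would pass its own host list, for example `start_on_all_indexers(["idx1", "idx2"], "/data/fio.test")` with hypothetical names, and then read each process's output once it finishes.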
List the index artifacts and where they are located.
The indexing artifacts are stored under $SPLUNK_DB (by default, $SPLUNK_HOME/var/lib/splunk).
Data is stored in buckets. One index can contain several buckets.
There are different types of buckets (hot, warm, cold).
By default, frozen buckets are deleted. You can configure the index to archive them instead.
Can hot and warm buckets be separated on disk?
No, they live under the same directory.
The path for hot and warm buckets is configured with homePath; its size can be capped with homePath.maxDataSizeMB.
Can warm/hot buckets be separated from cold? If so, what would be a common use case?
Yes, they can be separated.
A common use case is different underlying storage systems, e.g. hot/warm buckets on high-performance storage and cold buckets on slower storage.
What is the default time until data in an index gets frozen?
~6 years
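The "~6 years" figure follows from the documented default of frozenTimePeriodInSecs, 188697600 seconds. A small conversion sketch (the helper name is made up, and a year is approximated as 365 days):

```python
DEFAULT_FROZEN_SECS = 188_697_600  # default value of frozenTimePeriodInSecs


def secs_to_days_and_years(seconds: int) -> tuple[float, float]:
    """Convert a retention period in seconds to days and (365-day) years."""
    days = seconds / 86_400
    years = days / 365
    return days, years


if __name__ == "__main__":
    days, years = secs_to_days_and_years(DEFAULT_FROZEN_SECS)
    print(f"{days:.0f} days, about {years:.2f} years")
```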
What is the rolling behavior for maxDataSize?
Hot to warm
What is the rolling behavior of maxWarmDBCount?
Warm to cold
How do you configure maximum size for cold storage?
coldPath.maxDataSizeMB
What is the default setting for maxTotalDataSizeMB?
500000 MB [~500GB]
What is the rolling behavior of maxTotalDataSizeMB?
Cold to frozen [based on size]
How do you configure maximum size of an index?
maxTotalDataSizeMB
What is the rolling behavior for frozenTimePeriodInSecs?
Cold to frozen [based on time]
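The two cold-to-frozen triggers can be illustrated with a deliberately simplified model. It is not Splunk's implementation: the `Bucket` class and `buckets_to_freeze` function are hypothetical, and the sketch assumes that the oldest bucket is frozen first when the size limit is exceeded.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    size_mb: int
    newest_event_age_secs: int  # age of the most recent event in the bucket


def buckets_to_freeze(buckets, max_total_mb=500_000, frozen_secs=188_697_600):
    """Return the names of buckets that would roll to frozen in this toy model."""
    # Time-based: every event in the bucket is older than the retention period.
    frozen = [b for b in buckets if b.newest_event_age_secs > frozen_secs]
    remaining = [b for b in buckets if b.newest_event_age_secs <= frozen_secs]

    # Size-based: freeze the oldest remaining buckets until the index fits.
    remaining.sort(key=lambda b: b.newest_event_age_secs, reverse=True)
    total_mb = sum(b.size_mb for b in remaining)
    while remaining and total_mb > max_total_mb:
        oldest = remaining.pop(0)
        total_mb -= oldest.size_mb
        frozen.append(oldest)
    return [b.name for b in frozen]
```

For example, with three 300000 MB buckets of which one is older than the retention period, that bucket is frozen by time and the older of the other two is frozen by size.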
What is the default setting for maxHotBuckets?
3
Which three bucket controls can be set in the GUI?
1) maxTotalDataSizeMB
2) maxDataSize
3) timePeriodInSecsBeforeTsidxReduction
What is the default setting for maxDataSize?
auto (sets the size of hot buckets to 750MB)
You should use “auto_high_volume” for high-volume indexes (such as a firewall index); otherwise, use “auto”. A “high-volume index” is typically one that receives more than 10 GB of data per day.
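That rule of thumb can be written down as a tiny helper. The function name and the default threshold parameter are illustrative assumptions; only the two setting values and the 10 GB/day guideline come from the card above.

```python
def recommended_max_data_size(daily_ingest_gb: float, threshold_gb: float = 10.0) -> str:
    """Pick a maxDataSize value from the daily ingest volume of an index."""
    # Above roughly 10 GB/day the index counts as high volume.
    return "auto_high_volume" if daily_ingest_gb > threshold_gb else "auto"


if __name__ == "__main__":
    for volume in (2, 10, 50):
        print(volume, "GB/day ->", recommended_max_data_size(volume))
```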
How do you configure the maximum number of warm buckets?
maxWarmDBCount
What is maxTotalDataSizeMB applied to?
To both homePath and coldPath.
Why can using volumes be a good approach?
Using volumes helps to prevent errors in index size calculation.
A volume makes it possible to assign several indexes to one volume. The volume has a maximum size limit, which prevents indexes from growing until the disk is full and keeps the total index size under control.
Hot/warm and cold buckets can be on different volumes.
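A sketch of what such a layout might look like, together with a small check of which index paths share which volume. The volume names, mount points, index names, and size limits are invented for illustration; the sample is parsed with Python's configparser, which only approximates how Splunk reads .conf files.

```python
import configparser

# Hypothetical indexes.conf fragment: two volumes shared by two indexes.
SAMPLE_INDEXES_CONF = """
[volume:fast]
path = /mnt/fast_storage
maxVolumeDataSizeMB = 400000

[volume:slow]
path = /mnt/slow_storage
maxVolumeDataSizeMB = 2000000

[web]
homePath = volume:fast/web/db
coldPath = volume:slow/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb

[firewall]
homePath = volume:fast/firewall/db
coldPath = volume:slow/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
"""


def indexes_per_volume(conf_text: str) -> dict[str, list[str]]:
    """Map each volume name to the index paths that are placed on it."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)

    usage = {s.split(":", 1)[1]: [] for s in parser.sections() if s.startswith("volume:")}
    for section in parser.sections():
        if section.startswith("volume:"):
            continue
        for key in ("homePath", "coldPath"):
            value = parser.get(section, key, fallback="")
            if value.startswith("volume:"):
                volume = value[len("volume:"):].split("/", 1)[0]
                # A KeyError here would mean the index references an undefined volume.
                usage[volume].append(f"{section}.{key}")
    return usage


if __name__ == "__main__":
    print(indexes_per_volume(SAMPLE_INDEXES_CONF))
```

In this sample, hot/warm data of both indexes shares the `fast` volume and cold data shares the `slow` volume, so each volume's limit caps the combined size of everything placed on it.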
Which path setting needs a volume definition to work?
tstatsHomePath (Accelerated Data Models)
What is the difference between a pipeline and a processor?
- Pipeline: A thread. Splunk creates a thread for each pipeline. Multiple pipelines run in parallel.
- Processor: A processing step that runs inside a pipeline.
How does the word ‘queue’ fit into the picture of pipelines and processors?
Each pipeline has a separate queue where the data ‘waits’ to be processed, similar to a mail or printer queue. It is a memory space between pipelines that stores data.
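The relationship between pipelines, processors, and queues can be pictured with a small threading analogy. This is a conceptual sketch only, not how Splunk is implemented; the processors used here (`str.strip`, `str.upper`, and a tagging lambda) are stand-ins.

```python
import queue
import threading


def run_pipeline(processors, in_queue, out_queue):
    """One pipeline = one thread that applies its processors in order."""
    while True:
        item = in_queue.get()
        if item is None:  # sentinel: no more data, pass it downstream
            out_queue.put(None)
            return
        for processor in processors:
            item = processor(item)
        out_queue.put(item)


def demo(lines):
    """Push lines through two chained pipelines connected by queues."""
    first_q, second_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()
    stage_one = [str.strip]                                # hypothetical processor
    stage_two = [str.upper, lambda s: s + " [indexed]"]    # hypothetical processors
    threads = [
        threading.Thread(target=run_pipeline, args=(stage_one, first_q, second_q)),
        threading.Thread(target=run_pipeline, args=(stage_two, second_q, done_q)),
    ]
    for thread in threads:
        thread.start()
    for line in lines:
        first_q.put(line)
    first_q.put(None)
    for thread in threads:
        thread.join()

    results = []
    while (item := done_q.get()) is not None:
        results.append(item)
    return results


if __name__ == "__main__":
    print(demo(["  first event ", "second event  "]))
```

Each queue sits between two threads, so a slow downstream pipeline shows up as a queue that keeps filling.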
In which pipeline is the UTF-8 processor located?
Parsing Pipeline
In which pipeline is the annotator processor located?
Typing Pipeline
In which pipeline is the indexandforward processor located?
Indexing Pipeline
What happens when an event enters the linebreaker processor?
It splits the data stream into events based on the line breaker configuration.
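The idea can be sketched in a few lines. The default LINE_BREAKER pattern is `([\r\n]+)`, and the text matched by the first capturing group is discarded as the event boundary; the function below imitates only that behavior and is not Splunk's actual code.

```python
import re


def break_lines(stream: str, line_breaker: str = r"([\r\n]+)") -> list[str]:
    """Split a raw data stream into events at each line breaker match."""
    pattern = re.compile(line_breaker)
    events, start = [], 0
    for match in pattern.finditer(stream):
        # Text before the first capture group ends the current event;
        # the captured text itself is dropped.
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    if start < len(stream):
        events.append(stream[start:])
    return events


if __name__ == "__main__":
    print(break_lines("event one\nevent two\r\n\r\nevent three"))
```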
Which processor anonymizes sensitive data?
regexreplacement processor in the typing pipeline
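What such a regex replacement does to an event can be imitated with a plain substitution. The pattern and masking format below are examples chosen for illustration (a 16-digit number with only the last four digits kept); in Splunk the equivalent rule would be configured rather than coded.

```python
import re

# Example pattern: a 16-digit number; group 1 keeps the last four digits.
CARD_PATTERN = re.compile(r"\b\d{12}(\d{4})\b")


def anonymize(event: str) -> str:
    """Mask all but the last four digits of any 16-digit number in the event."""
    return CARD_PATTERN.sub(r"XXXXXXXXXXXX\1", event)


if __name__ == "__main__":
    print(anonymize("user=alice card=1234567890123456 status=ok"))
```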
Which pipelines use props.conf?
All of them use props.conf, except the indexing pipeline.
Which pipeline uses outputs.conf?
Indexing pipeline
Which pipeline and which processor perform the timestamp extraction?
Merging pipeline, aggregator processor
If the ‘great eight’ have been configured, which pipelines will have a significantly lower load?
Parsing pipeline and merging pipeline
What is the average percentage of compressed data itself (journal.gz) residing in buckets?
15%
What is the average size in percent of the lexicon (TSIDX) files which reside in a bucket?
35%
What is the average compression for all indexed data?
50%, which consists of 15% journal.gz and 35% TSIDX
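These averages lend themselves to a quick storage estimate. The sketch below simply applies the 15% and 35% figures to a daily raw volume and retention period; the function name and the example inputs are assumptions, and real ratios depend heavily on the data.

```python
def estimate_disk_usage_gb(raw_gb_per_day, retention_days, journal_pct=15, tsidx_pct=35):
    """Estimate on-disk index size from raw ingest using average ratios."""
    raw_total = raw_gb_per_day * retention_days
    journal = raw_total * journal_pct / 100  # compressed raw data (journal.gz)
    tsidx = raw_total * tsidx_pct / 100      # lexicon / index files (TSIDX)
    return {"journal_gb": journal, "tsidx_gb": tsidx, "total_gb": journal + tsidx}


if __name__ == "__main__":
    print(estimate_disk_usage_gb(100, 90))
```

For 100 GB/day of raw data kept for 90 days, this gives 1350 GB of journal data plus 3150 GB of TSIDX files, 4500 GB in total, which is half of the 9000 GB raw volume.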
What kind of data lives under the tstatsHomePath?
Accelerated Data Models
What kind of data lives under the summaryHomePath?
Accelerated Reports
Which pipeline forwards data to the nullQueue?
Typing pipeline
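The routing decision can be pictured as a regex test per event. In Splunk this is configured through props.conf and transforms.conf (a transform with REGEX, DEST_KEY = queue, and FORMAT = nullQueue); the Python function below only mimics the outcome, and the DEBUG pattern is an arbitrary example.

```python
import re


def route_event(event: str, drop_pattern: str = r"\bDEBUG\b") -> str:
    """Return the queue an event would be sent to in this toy model."""
    # Matching events are discarded (nullQueue); everything else gets indexed.
    return "nullQueue" if re.search(drop_pattern, event) else "indexQueue"


if __name__ == "__main__":
    for line in ("2024-01-01 DEBUG cache miss", "2024-01-01 ERROR disk full"):
        print(route_event(line), "<-", line)
```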
What are thawed buckets?
Data restored from an archive. If you archive frozen data, you can later return it to the index by thawing it.