Indexing Flashcards
What is the most important field in an event when indexing?
Timestamp, followed by sourcetype
What are indexes?
Collections of buckets that age through the stages Hot -> Warm -> Cold -> Frozen
- Each bucket holds raw data in compressed form (journal.gz), TSIDX files, and other metadata
What is the Data Pipeline?
Parsing Queue -> Aggregator Queue -> Typing Queue -> Index Queue
- A heavy forwarder (HF) performs all of the other functions but skips the index queue (it does not index locally by default) and sends the data to the indexing tier
What are homePath, coldPath, thawedPath?
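indexes.conf settings that locate indexed data (a mix of files) on disk: homePath holds hot and warm buckets, coldPath holds cold buckets, and thawedPath holds buckets restored from the frozen archive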
What is the summaryHomePath?
The location of Report Acceleration summaries, which are actually CSV files (results)
What is the tstatsHomePath?
The location of Data Model Acceleration summaries, which are TSIDX files
What are Events?
Records of activity found in inputs and machine data.
Describe the Parsing Queue Line Breaker
Splunk splits the data stream into lines using the LINE_BREAKER regex, which defaults to ([\r\n]+) (any run of newlines)
- For outliers, create alternate line delimiters with a custom regular expression (regex) in props.conf
- A well-chosen LINE_BREAKER may increase indexing speed
UTF-8 encoding is the default (can be changed via CHARSET in props.conf)
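The default line breaking described above can be approximated outside Splunk. The following is only a minimal Python sketch of the idea, not Splunk's implementation; the function name `break_lines` and the sample data are assumptions made for illustration.

```python
import re

# Same pattern as the default line breaker quoted above.
LINE_BREAKER = re.compile(r"([\r\n]+)")

def break_lines(stream):
    """Split a raw stream wherever the capture group matches, discarding the matched text."""
    pieces = LINE_BREAKER.split(stream)
    # re.split keeps captured delimiters at the odd indexes; keep only the even ones.
    return [p for p in pieces[::2] if p]

# Hypothetical sample stream with mixed newline styles.
raw = "event one\r\nevent two\n\n\nevent three\n"
events = break_lines(raw)
print(events)
```

Swapping in a different pattern (for example, one that only breaks before a date) is the analogue of supplying a custom regex for outlier sources.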
Describe the Parsing Queue Header
This setting applies at input time, when data is first read by Splunk.
The HEADER_MODE setting (props.conf) is used to handle log files with consistent header information
- empty (Default) | always | firstline | none
- If “always”, any line with ***SPLUNK*** can be used to rewrite index-time fields.
- If “firstline”, only the first line can be used to rewrite index-time fields.
- If “none”, the string ***SPLUNK*** is treated as normal data.
- If empty, scripted inputs take the value “always” and file inputs take the value “none”.
Describe the Merging Pipeline Aggregator
Splunk merges the lines produced by the line breaker into events
Best practice for efficiency: use SHOULD_LINEMERGE = false combined with an appropriate LINE_BREAKER setting
Describe the Typing Pipeline Regex Replacement
- Anonymizes sensitive data with SEDCMD (props.conf)
- New indexed fields must be defined in fields.conf on the SH and indexers
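SEDCMD takes a sed-style substitution of the form s/regex/replacement/flags. As a rough illustration of that kind of masking, here is a Python sketch using `re.sub`; the pattern, the replacement, and the sample event are hypothetical, and this is not how Splunk itself executes the rule.

```python
import re

# Hypothetical rule: mask digits shaped like NNN-NN-NNNN, keeping the last four.
# Roughly the sed expression s/\d{3}-\d{2}-(\d{4})/XXX-XX-\1/g
SENSITIVE = re.compile(r"\d{3}-\d{2}-(\d{4})")

def anonymize(event):
    """Return the event with every sensitive match replaced by a masked form."""
    return SENSITIVE.sub(r"XXX-XX-\1", event)

masked = anonymize("user=alice ssn=123-45-6789 action=login")
print(masked)
```

Because the replacement happens before the event is written, the original value is not recoverable from the index.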
Describe the Typing Pipeline Annotator
Splunk can identify punctuation patterns to find similar events
- ANNOTATE_PUNCT = true | false determines whether to index a special token starting with “punct::”
Describe the Indexing Pipeline TCP Out
Splunk can send data to other Splunk instances or to external (third-party) systems.
Dedicated processor: forward data over TCP (raw or s2s)
Maintains queue for each tcpout group (group of indexers)
Describe the Indexing Pipeline Syslog Out
The syslog output processor sends RFC 3164-compliant events to a TCP/UDP-based server and port, rewriting the payload of any non-compliant data so that it is RFC 3164-compliant.
Describe the Indexing Pipeline Indexer
Splunk transforms data into events and stores them in indexes
- Internal data is written to several preconfigured indexes
- External data is written to a single preconfigured index (main) by default
- Additional indexes can be created to meet specific data requirements
Describe the Indexing Pipeline Segmentation
A segment is a searchable part of an event
Splunk breaks events into segments at index/search time
- Major Breakers - From segmenters.conf - [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + and URL-encoded forms such as %20 and %26
- Minor Breakers - / : = @ . - $ # % \ _ - Configurable; break major segments into smaller chunks
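The two-level split can be sketched in Python. This is a simplified illustration only: the breaker sets below are small subsets assumed for the example, the function name `segment` is hypothetical, and real segmentation produces additional combinations that this sketch ignores.

```python
import re

# Illustrative subsets only (assumptions for this sketch, not the full defaults).
MAJOR = r"[\s\[\]<>(){}|!;,\"'*]+"
MINOR = r"[/:=@.\-$#%\\_]+"

def segment(event):
    """Return (major_segments, minor_segments) for one event."""
    majors = [s for s in re.split(MAJOR, event) if s]
    minors = []
    for seg in majors:
        parts = [p for p in re.split(MINOR, seg) if p]
        if len(parts) > 1:  # only segments that actually contain a minor breaker
            minors.extend(parts)
    return majors, minors

majors, minors = segment("src=10.0.0.5 user=bob")
print(majors)
print(minors)
```

In this example the whitespace separates two major segments, and the `=` and `.` characters split each of them into smaller searchable pieces.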
Describe a Time Series Index
TSIDX files, optimized to execute arbitrary Boolean keyword searches and return millions of events in reverse time order
Inverted Index
- Allows for fast full text searches
- Maps Keywords to locations in raw data
Two basic Components
- Lexicon
- Value Arrays containing information about events.
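The lexicon-plus-locations idea can be shown with a toy inverted index. This Python sketch is only a conceptual model under simplifying assumptions (whitespace tokenization, list positions standing in for event locations, later positions treated as newer); it does not reflect the TSIDX file format, and all names are hypothetical.

```python
from collections import defaultdict

def build_index(events):
    """Map each lowercase keyword to the sorted list of event positions containing it."""
    lexicon = defaultdict(set)
    for pos, raw in enumerate(events):
        for term in raw.lower().split():
            lexicon[term].add(pos)
    return {term: sorted(positions) for term, positions in lexicon.items()}

def search_and(index, first, *rest):
    """Boolean AND search; results are newest first (highest position = newest)."""
    hits = set(index.get(first.lower(), []))
    for term in rest:
        hits &= set(index.get(term.lower(), []))
    return sorted(hits, reverse=True)

# Hypothetical events, oldest first.
events = ["error disk full", "login ok", "error login failed", "disk ok"]
index = build_index(events)
result = search_and(index, "error", "login")
print(result)
```

The search never scans the raw events: it intersects the position lists looked up in the lexicon, which is why keyword searches stay fast as the data grows.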
What is Retention?
How long to keep the data and how long it should be searchable.
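Allows Splunk to expire old data to make room for new data
The most restrictive rule wins: a bucket is frozen as soon as either the age limit (frozenTimePeriodInSecs) or the size limit (maxTotalDataSizeMB) is reached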
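One way to picture "most restrictive rule wins" is to apply an age limit and a size limit together and freeze whatever either rule selects. The Python sketch below is a simplified model, not Splunk's actual algorithm; the function name, the bucket fields (`id`, `latest`, `size_mb`), and the sample numbers are all assumptions for illustration.

```python
def buckets_to_freeze(buckets, now, max_age_secs, max_total_mb):
    """Return ids of buckets to freeze.

    buckets: list of dicts with 'id', 'latest' (epoch seconds of the newest
    event in the bucket), and 'size_mb'.
    """
    frozen, kept = [], []
    # Rule 1: age. A bucket expires once its newest event exceeds the age limit.
    for b in sorted(buckets, key=lambda b: b["latest"]):
        if now - b["latest"] > max_age_secs:
            frozen.append(b["id"])
        else:
            kept.append(b)
    # Rule 2: size. Freeze the oldest remaining buckets until under the cap.
    total = sum(b["size_mb"] for b in kept)
    while kept and total > max_total_mb:
        oldest = kept.pop(0)
        total -= oldest["size_mb"]
        frozen.append(oldest["id"])
    return frozen

# Hypothetical index with three buckets.
sample = [
    {"id": "a", "latest": 100, "size_mb": 50},
    {"id": "b", "latest": 900, "size_mb": 60},
    {"id": "c", "latest": 950, "size_mb": 60},
]
to_freeze = buckets_to_freeze(sample, now=1000, max_age_secs=500, max_total_mb=100)
print(to_freeze)
```

Here bucket "a" is frozen for age, and bucket "b" is frozen for size even though it is well within the age limit, so the size rule is the more restrictive one for it.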
What is retention “change in state”?
Buckets can move from Warm -> Cold, Cold -> Frozen, or Warm -> Frozen
Buckets cannot move from Hot -> Frozen
If a bucket is moved from Hot -> Warm on the same storage, search performance is unchanged
If a bucket is moved to a directory on slower disks, search performance is impacted
Thawing a bucket requires user action
What do Frozen buckets typically contain?
Frozen buckets typically contain only the journal (the compressed raw data)