Indexing Flashcards
What is the most important field in an event when indexing?
The timestamp, followed by the sourcetype
What are indexes?
Collections of buckets that age through stages: Hot -> Warm -> Cold -> Frozen
- Each bucket holds raw data in compressed form (journal.gz), TSIDX files, and other metadata
What is the Data Pipeline?
Parsing Queue -> Aggregator Queue -> Typing Queue -> Index Queue
- A heavy forwarder (HF) skips the index queue and sends data to the Indexing Tier, but performs all other pipeline functions
What are homePath, coldPath, thawedPath?
Filesystem locations for indexed data (a mix of files): homePath holds hot/warm buckets, coldPath holds cold buckets, and thawedPath holds buckets restored (thawed) from frozen
What is the summaryHomePath?
Report Acceleration summaries = actually CSV files (results)
What is the tstatsHomePath ?
Data Model Acceleration summaries = TSIDX files
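All of the path settings above live in indexes.conf. A minimal sketch, assuming a hypothetical index named "web" (the index name and exact paths are illustrative; $SPLUNK_DB is the standard default base):

```ini
# indexes.conf -- hypothetical index "web"
[web]
homePath        = $SPLUNK_DB/web/db        # hot and warm buckets
coldPath        = $SPLUNK_DB/web/colddb    # cold buckets
thawedPath      = $SPLUNK_DB/web/thaweddb  # buckets restored from frozen
summaryHomePath = $SPLUNK_DB/web/summary   # report acceleration summaries (CSV results)
tstatsHomePath  = volume:_splunk_summaries/web/datamodel_summary  # data model acceleration TSIDX files
```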
What are Events?
Records of activity found in inputs and machine data.
Describe the Parsing Queue Line Breaker
Splunk splits the data stream into events using the default regex ([\r\n]+)
- Create alternate line delimiters using custom regular expressions (regex)
- Tuning the line breaker may increase indexing speed
UTF-8 Encoding is the default (Can be changed via CHARSET in props)
Line-Breaker ([\r\n]+) Newline setting is the default but can be configured via Props for outliers
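Both settings are configured per sourcetype in props.conf. A minimal sketch, assuming a hypothetical sourcetype name:

```ini
# props.conf -- hypothetical sourcetype
[my:custom:log]
CHARSET = UTF-8            # default; change for non-UTF-8 sources
LINE_BREAKER = ([\r\n]+)   # default; the first capture group is consumed as the delimiter
```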
Describe the Parsing Queue Header
This setting applies at input time, when data is first read by Splunk.
HEADER_MODE (Props) Setting is used to handle log files with consistent header information
- empty (Default) | always | firstline | none
- If “always”, any line with SPLUNK can be used to rewrite index-time fields.
- If “firstline”, only the first line can be used to rewrite index-time fields.
- If “none”, the string SPLUNK is treated as normal data.
- If empty (the default), scripted inputs take the value “always” and file inputs take the value “none”.
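HEADER_MODE is also set per sourcetype in props.conf. A minimal sketch, assuming a hypothetical scripted-input sourcetype:

```ini
# props.conf -- hypothetical scripted-input sourcetype
[my:scripted:input]
HEADER_MODE = always   # any SPLUNK header line can rewrite index-time fields
```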
Describe the Merging Pipeline Aggregator
Splunk will merge lines separated by line breaker into events
Best practice for efficiency: Use SHOULD_LINEMERGE=False combined with appropriate Line Breaker settings
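The best practice above can be sketched in props.conf for a hypothetical multi-line sourcetype (e.g., logs whose events start with an ISO date, such as stack traces):

```ini
# props.conf -- hypothetical multi-line sourcetype
[my:java:log]
SHOULD_LINEMERGE = false
# Break only before lines starting with a date; the capture group
# (the newlines) is consumed, and the date begins the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```

With SHOULD_LINEMERGE = false, the line merger is bypassed entirely and event boundaries come solely from LINE_BREAKER, which is more efficient than merging lines back together after breaking.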
Describe the Typing Pipeline Regex Replacement
- Anonymizing Sensitive Data with SEDCMD
- New index-time fields must be defined in fields.conf on search heads and indexers
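SEDCMD uses sed-style substitution syntax in props.conf. A minimal anonymization sketch, assuming a hypothetical sourcetype and an `ssn=` field in the raw data:

```ini
# props.conf -- hypothetical anonymization: mask all but the last 4 SSN digits
[my:hr:log]
SEDCMD-mask_ssn = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
```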
Describe the Typing Pipeline Annotator
Splunk can identify punctuation patterns to find similar events
- ANNOTATE_PUNCT = true|false determines whether to index a special token starting with “punct::”
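A minimal props.conf sketch disabling punctuation annotation for a hypothetical high-volume sourcetype:

```ini
# props.conf -- hypothetical sourcetype where punct indexing is not needed
[my:metrics:log]
ANNOTATE_PUNCT = false   # skip indexing the punct:: token
```

When left enabled (the default), the indexed punct field lets searches group or compare events by punctuation pattern.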
Describe the Indexing Pipeline TCP Out
Splunk can send data to other Splunk instances or to third-party destinations.
Dedicated processor: forwards data over TCP (raw or S2S)
Maintains a queue for each tcpout group (group of indexers)
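tcpout groups are defined in outputs.conf. A minimal sketch, assuming hypothetical indexer hostnames and the conventional receiving port 9997:

```ini
# outputs.conf -- hypothetical tcpout group of two indexers
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
```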
Describe the Indexing Pipeline Syslog Out
The syslog output processor sends RFC 3164-Compliant events to a TCP/UDP-based server and port, making the payload of any non-compliant data RFC 3164-compliant.
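Syslog output is also configured in outputs.conf. A minimal sketch, assuming a hypothetical SIEM destination:

```ini
# outputs.conf -- hypothetical syslog output target
[syslog:remote_siem]
server = siem.example.com:514
type = udp   # udp (default) or tcp
```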
Describe the Indexing Pipeline Indexer
Splunk transforms data into events and stores them in indexes
- Internal data is written to several preconfigured indexes (e.g., _internal, _audit)
- External data is written to a single preconfigured index (main) by default
- Additional indexes can be created to meet specific data requirements
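Additional indexes are created in indexes.conf. A minimal sketch, assuming a hypothetical "security" index with a retention requirement (the 90-day value is illustrative):

```ini
# indexes.conf -- hypothetical additional index with a retention limit
[security]
homePath   = $SPLUNK_DB/security/db
coldPath   = $SPLUNK_DB/security/colddb
thawedPath = $SPLUNK_DB/security/thaweddb
frozenTimePeriodInSecs = 7776000   # roll buckets to frozen after ~90 days
```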