Indexing Flashcards

Question 1

Q

What is the most important field in an event when indexing?

Answer

A

The timestamp, followed by the sourcetype.

Question 2

Q

What are indexes?

Answer

A

Indexes are collections of buckets that age through Hot -> Warm -> Cold -> Frozen.

- Each bucket holds raw data in compressed form (journal.gz), TSIDX files, and other metadata

Question 3

Q

What is the Data Pipeline?

Answer

A

Parsing Queue -> Aggregator Queue -> Typing Queue -> Index Queue

A heavy forwarder (HF) performs the parsing, aggregation, and typing stages, but instead of writing events to local indexes it forwards them to the indexing tier.
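A toy Python model of the four stages may help fix the order in mind. It is a hypothetical sketch, not Splunk's implementation: each stage is a plain function standing in for a queue, the digit-masking in the typing stage is just an example replacement, and the `role` parameter is an invented way to show that an indexer writes events locally while a heavy forwarder sends them on.

```python
import re

def parsing(chunks: list[str]) -> list[str]:
    # Parsing: break raw chunks into lines on runs of newlines
    return [line for chunk in chunks for line in re.split(r"[\r\n]+", chunk) if line]

def merging(lines: list[str]) -> list[str]:
    # Aggregation: one line per event (as with SHOULD_LINEMERGE = false)
    return lines

def typing_stage(events: list[str]) -> list[str]:
    # Typing: regex replacement, here masking every digit (example only)
    return [re.sub(r"\d", "#", event) for event in events]

def run_pipeline(chunks: list[str], role: str) -> tuple[str, list[str]]:
    events = typing_stage(merging(parsing(chunks)))
    # Index stage: an indexer writes locally; a heavy forwarder forwards instead
    destination = "local index" if role == "indexer" else "indexing tier"
    return destination, events
```

For example, `run_pipeline(["user=a id=42\nuser=b id=7"], "heavy_forwarder")` returns `("indexing tier", ["user=a id=##", "user=b id=#"])`: the same processing happens, only the destination differs.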

Question 4

Q

What are homePath, coldPath, thawedPath?

Answer

A

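They are indexes.conf settings that set where an index's buckets are stored: homePath holds hot and warm buckets, coldPath holds cold buckets, and thawedPath holds thawed buckets. The indexed data in each location is a mix of files (compressed raw data, TSIDX files, and metadata).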

Question 5

Q

What is the summaryHomePath?

Answer

A

The location where report acceleration summaries are stored. These summaries are actually CSV files (search results).

Question 6

Q

What is the tstatsHomePath?

Answer

A

The location where data model acceleration summaries are stored. These summaries are TSIDX files.

Question 7

Q

What are Events?

Answer

A

Records of activity found in inputs and machine data.

Question 8

Q

Describe the Parsing Queue Line Breaker

Answer

A

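Splunk splits the incoming data stream into events using the LINE_BREAKER regex, which defaults to ([\r\n]+) (one or more newline characters). The text matched by the first capture group is treated as the delimiter and discarded.
- Alternate delimiters can be defined with custom regular expressions (regex) in props.conf for data that does not fit the default

Using a custom LINE_BREAKER together with SHOULD_LINEMERGE = false may increase indexing speed, because the line-merging step can then be skipped.

UTF-8 is the default character encoding (it can be changed with CHARSET in props.conf).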
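The line-breaking rule can be sketched in Python with `re`. This is an illustration under assumptions, not Splunk code: `break_events` is a hypothetical helper, and Python's regex dialect stands in for the PCRE syntax props.conf uses. It follows the rule above that the first capture group is the delimiter and is dropped.

```python
import re

# Default LINE_BREAKER value described on this card
DEFAULT_LINE_BREAKER = r"([\r\n]+)"

def break_events(stream: str, line_breaker: str = DEFAULT_LINE_BREAKER) -> list[str]:
    """Split a raw stream into events; the pattern must contain a capture group."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, stream):
        # Text before the first capture group closes the current event
        events.append(stream[start:match.start(1)])
        # The captured delimiter is discarded; the next event starts after it
        start = match.end(1)
    events.append(stream[start:])  # whatever follows the last delimiter
    return [event for event in events if event]
```

With the default pattern every newline ends an event. A custom pattern with a lookahead, such as `([\r\n]+)(?=\d{4}-\d{2}-\d{2})`, only breaks before a line that starts with a date, so indented stack-trace lines stay with the event above them.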

Question 9

Q

Describe the Parsing Queue Header

Answer

A

This setting applies at input time, when data is first read by Splunk.

The HEADER_MODE setting (props.conf) determines whether the inline ***SPLUNK*** header directive can be used to rewrite index-time fields, which is useful for log files that carry consistent header information.

empty (default) | always | firstline | none
If “always”, any line containing ***SPLUNK*** can be used to rewrite index-time fields.
If “firstline”, only the first line can be used to rewrite index-time fields.
If “none”, the string ***SPLUNK*** is treated as normal data.
If empty, scripted inputs take the value “always” and file inputs take the value “none”.
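The mode rules can be sketched in Python. This is a simplified, hypothetical model: the `***SPLUNK*** key=value ...` directive format is an assumption, `apply_header_mode` is an invented helper, and it collapses all overrides into one dictionary, whereas Splunk applies a directive to the data that follows it.

```python
MARKER = "***SPLUNK***"

def apply_header_mode(lines: list[str], mode: str, scripted: bool) -> tuple[dict[str, str], list[str]]:
    """Return (field overrides, data lines) for one input under a HEADER_MODE value."""
    if mode == "":
        # Empty default: scripted inputs behave as "always", file inputs as "none"
        mode = "always" if scripted else "none"
    overrides: dict[str, str] = {}
    data: list[str] = []
    for index, line in enumerate(lines):
        is_directive = line.startswith(MARKER)
        allowed = mode == "always" or (mode == "firstline" and index == 0)
        if is_directive and allowed:
            # Assumed directive format: ***SPLUNK*** key=value key=value ...
            for token in line[len(MARKER):].split():
                key, sep, value = token.partition("=")
                if sep:
                    overrides[key] = value
        else:
            # With "none", or on a line the mode does not allow, it is normal data
            data.append(line)
    return overrides, data
```

Under "firstline", a second directive further down the file is kept as ordinary event text rather than changing any fields.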

Question 10

Q

Describe the Merging Pipeline Aggregator

Answer

A

Splunk merges the lines produced by the line breaker into multi-line events when SHOULD_LINEMERGE = true (the default).

Best practice for efficiency: set SHOULD_LINEMERGE = false and rely on an appropriate LINE_BREAKER setting instead.
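Line merging can be pictured with a short Python sketch. It assumes a rule similar to props.conf's BREAK_ONLY_BEFORE (start a new event only when a line matches a regex); the function name and the default date pattern are hypothetical, and Splunk's real merging has more settings than this. The per-line regex check is also why merging costs more than a well-chosen LINE_BREAKER.

```python
import re

def merge_lines(lines: list[str], break_only_before: str = r"^\d{4}-\d{2}-\d{2}") -> list[str]:
    """Group lines into events, starting a new event at each line matching the pattern."""
    pattern = re.compile(break_only_before)
    events: list[str] = []
    for line in lines:
        if not events or pattern.search(line):
            # A matching line (or the very first line) opens a new event
            events.append(line)
        else:
            # Any other line is appended to the current event
            events[-1] += "\n" + line
    return events
```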

Question 11

Q

Describe the Typing Pipeline Regex Replacement

Answer

A

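Anonymize sensitive data with SEDCMD (sed-style regex replacement set in props.conf).

- New indexed fields (created by index-time extractions) must be defined in fields.conf on the search heads (SH) and indexers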
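A SEDCMD entry takes the form `SEDCMD-<class> = s/<regex>/<replacement>/g`. The Python sketch below shows the same substitution with `re.sub`; it is an illustration, not Splunk's parser. The helper name is hypothetical, it assumes `/` is the delimiter and never appears inside the regex or replacement, and it only understands the `g` flag.

```python
import re

def apply_sedcmd(event: str, sedcmd: str) -> str:
    """Apply a simplified sed-style 's/regex/replacement/flags' to one event."""
    if not sedcmd.startswith("s/"):
        raise ValueError("this sketch only supports s/// substitutions")
    # Assumption: exactly four '/'-separated parts, no escaped slashes
    _, regex, replacement, flags = sedcmd.split("/", 3)
    # 'g' replaces every match; otherwise only the first match is replaced
    count = 0 if "g" in flags else 1
    return re.sub(regex, replacement, event, count=count)
```

For example, `s/\d{3}-\d{2}-(\d{4})/XXX-XX-\1/g` masks every SSN-like value but keeps the last four digits.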

Question 12

Q

Describe the Typing Pipeline Annotator

Answer

A

Splunk can identify punctuation patterns to find similar events.
- ANNOTATE_PUNCT = true | false determines whether to index a special token starting with “punct::”
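The idea behind punctuation patterns can be approximated in Python. This is an assumption-laden sketch, not Splunk's algorithm: `punct_pattern` is a hypothetical helper that drops letters and digits and maps spaces to `_`, and it ignores details such as length limits that the real punct field may apply.

```python
def punct_pattern(event: str) -> str:
    """Approximate a punct-style pattern: keep punctuation, map spaces to '_'."""
    out = []
    for ch in event:
        if ch.isalnum():
            continue  # letters and digits carry the values, not the shape
        out.append("_" if ch == " " else ch)
    return "".join(out)
```

Two events with different values but the same layout produce the same pattern, which is what makes the token useful for finding similar events.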

Question 13

Q

Describe the Indexing Pipeline TCP Out

Answer

A

Splunk can send data to other Splunk instances or to external (third-party) systems.

A dedicated processor forwards data over TCP, either raw or in Splunk-to-Splunk (S2S) format.

It maintains a queue for each tcpout group (a group of indexers).
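The "raw" option can be illustrated with a plain Python socket write. This is a hypothetical sketch, not the tcpout processor: `send_raw` is an invented helper, it assumes the receiving system expects one newline-terminated event per line, and it does not model S2S framing, per-group queues, or load balancing.

```python
import socket

def send_raw(sock: socket.socket, events: list[str]) -> int:
    """Write events to a connected stream socket as newline-terminated UTF-8 text."""
    # Assumption: one event per line; strip any existing trailing newline first
    payload = "".join(event.rstrip("\n") + "\n" for event in events).encode("utf-8")
    sock.sendall(payload)
    return len(payload)
```

In practice `sock` would come from `socket.create_connection((host, port))`; for a local experiment, `socket.socketpair()` gives two connected stream sockets without any network setup.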

Question 14

Q

Describe the Indexing Pipeline Syslog Out

Answer

A

The syslog output processor sends RFC 3164-compliant events to a TCP- or UDP-based server and port, reformatting the payload of any non-compliant data so that it is RFC 3164-compliant.
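The RFC 3164 message shape is `<PRI>TIMESTAMP HOSTNAME TAG: MSG`, where PRI = facility × 8 + severity. The Python sketch below builds such a line; it illustrates the format only and is not the exact output of Splunk's syslog processor. The function name and parameters are hypothetical.

```python
from datetime import datetime

# Month abbreviations spelled out so the result does not depend on the locale
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def rfc3164_message(facility: int, severity: int, ts: datetime,
                    hostname: str, tag: str, msg: str) -> str:
    """Build an RFC 3164-style line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # PRI combines facility and severity
    # Day of month is padded with a space, not a zero ("Jan  5")
    timestamp = f"{MONTHS[ts.month - 1]} {ts.day:>2} {ts:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {msg}"
```

For example, facility 1 (user) with severity 5 (notice) gives PRI 13: `<13>Jan  5 09:03:07 web01 app: hello`.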

Question 15

Q

Describe the Indexing Pipeline Indexer

Answer

A

Splunk transforms data into events and stores them in indexes.

Internal data is written to several preconfigured indexes (for example, _internal and _audit)
External data is written by default to a single preconfigured index (main)
Additional indexes can be created to meet specific data requirements

Question 16

Q

Describe the Indexing Pipeline Segmentation

Answer


A

A segment is a searchable part of an event.

Splunk breaks events into segments at index time and at search time.

Major breakers (from segmenters.conf) split events into large segments: [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? +
Minor breakers (also configurable) break major segments into smaller chunks: / : = @ . - $ # % \ _
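Two-level segmentation can be sketched in Python. The breaker sets below are a simplified subset of the characters listed on this card (the URL-encoded and multi-character breakers in segmenters.conf are omitted), and `segment` is a hypothetical helper, not Splunk's segmenter.

```python
import re

# Simplified breaker sets taken from the characters listed on this card
MAJOR = r"[\[\]<>(){}|!;,'\"*\s&?+]+"
MINOR = r"[/:=@.\-$#%\\_]+"

def segment(event: str) -> tuple[list[str], list[str]]:
    """Return (major segments, minor segments) for one event."""
    majors = [s for s in re.split(MAJOR, event) if s]
    # Each major segment is split again on minor breakers
    minors = [m for s in majors for m in re.split(MINOR, s) if m]
    return majors, minors
```

For `src=192.168.1.1 user=[bob]`, the major segments are `src=192.168.1.1`, `user=`, and `bob`; minor breaking then exposes `192`, `168`, and `1` as separately searchable pieces.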

Question 17

Q

Describe a Time Series Index

Answer


A

These are TSIDX files, optimized to execute arbitrary Boolean keyword searches and return millions of events in reverse time order.

Inverted index

Allows for fast full-text searches
Maps keywords to locations in the raw data

Two basic components

Lexicon: the list of unique keywords
Value arrays containing information about events
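A miniature inverted index shows why keyword search is fast. In this hypothetical Python sketch the lexicon maps each keyword to the set of event ids that contain it, a Boolean AND is a set intersection, and results are sorted newest first. Whitespace tokenization and in-memory sets are simplifying assumptions; real TSIDX files are on-disk structures.

```python
from collections import defaultdict

def build_index(events: list[tuple[int, str]]) -> dict[str, set[int]]:
    """Lexicon: map each lowercase keyword to the ids of events containing it."""
    lexicon: dict[str, set[int]] = defaultdict(set)
    for event_id, (_timestamp, raw) in enumerate(events):
        for term in raw.lower().split():
            lexicon[term].add(event_id)
    return lexicon

def search_all(lexicon: dict[str, set[int]], events: list[tuple[int, str]], *terms: str) -> list[int]:
    """Boolean AND search; matching event ids are returned newest first."""
    if not terms:
        return []
    postings = [lexicon.get(term.lower(), set()) for term in terms]
    matched = set.intersection(*postings)
    # Reverse time order: sort by each event's timestamp, largest first
    return sorted(matched, key=lambda event_id: events[event_id][0], reverse=True)
```

Only the small sets for the searched keywords are touched; the raw events are read afterwards, just for the ids that matched.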

Question 18

Q

What is Retention?

Answer


A

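How long to keep data and how long it should remain searchable.

Allows Splunk to expire old data to make room for new data.

The most restrictive rule wins: whichever limit is reached first, the age limit (frozenTimePeriodInSecs) or the size limit (maxTotalDataSizeMB), triggers freezing.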
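"Most restrictive rule wins" can be checked with a small Python sketch. It is a hypothetical model: `max_age_secs` and `max_total_mb` stand in for frozenTimePeriodInSecs and maxTotalDataSizeMB, and it assumes a bucket ages out when its newest event is older than the limit and that the oldest buckets are frozen first when the index is too large.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    name: str
    latest_time: int  # epoch seconds of the newest event in the bucket
    size_mb: int

def buckets_to_freeze(buckets: list[Bucket], now: int,
                      max_age_secs: int, max_total_mb: int) -> list[str]:
    """Apply an age limit and a size limit; the stricter one freezes more."""
    oldest_first = sorted(buckets, key=lambda b: b.latest_time)
    frozen: list[str] = []
    kept: list[Bucket] = []
    for bucket in oldest_first:
        # Age rule: freeze once even the newest event is older than the limit
        if now - bucket.latest_time > max_age_secs:
            frozen.append(bucket.name)
        else:
            kept.append(bucket)
    # Size rule: freeze the oldest remaining buckets until the total fits
    total = sum(b.size_mb for b in kept)
    for bucket in kept:
        if total <= max_total_mb:
            break
        frozen.append(bucket.name)
        total -= bucket.size_mb
    return frozen
```

With three 10 MB buckets, a generous age limit but a 15 MB size limit still freezes the two oldest buckets; the size rule is the more restrictive one there.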

Question 19

Q

What is retention “change in state”?

Answer


A

Buckets can move from Warm -> Cold, Cold -> Frozen, or Warm -> Frozen.
Buckets cannot move directly from Hot -> Frozen.

If a bucket moves from Hot -> Warm on the same storage, search performance is unchanged.

If a bucket moves to a directory on slower disks, search performance will be impacted.

Thawing a bucket requires user action.
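The transition rules on this card fit in a small lookup table. In this hypothetical Python sketch the state names and the `user_action` flag are illustrative assumptions; Hot -> Warm is included because the card mentions it, and Frozen -> Thawed is allowed only when a user acts.

```python
# Transitions listed on this card (plus Hot -> Warm, which it also mentions)
ALLOWED = {
    ("hot", "warm"),
    ("warm", "cold"),
    ("warm", "frozen"),
    ("cold", "frozen"),
}
# Thawing is only possible through user action
MANUAL = {("frozen", "thawed")}

def can_move(src: str, dst: str, user_action: bool = False) -> bool:
    """Return True if a bucket may move from src to dst."""
    if (src, dst) in ALLOWED:
        return True
    return user_action and (src, dst) in MANUAL
```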

Question 20

Q

What do Frozen buckets typically contain?

Answer


A

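Frozen buckets typically contain only journal data (the compressed raw data, journal.gz). When a frozen bucket is archived rather than deleted, its TSIDX files are not kept, so they must be rebuilt when the bucket is thawed.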
