Indexing Flashcards

1
Q

What is the most important field in an event when indexing?

A

Timestamp, followed by sourcetype

2
Q

What are indexes?

A

Indexes are stored as buckets that age through Hot -> Warm -> Cold -> Frozen

- Each bucket holds raw data in compressed form (journal.gz), TSIDX files, and other metadata

3
Q

What is the Data Pipeline?

A

Parsing Queue -> Aggregator Queue -> Typing Queue -> Index Queue

  • A heavy forwarder (HF) performs all the other functions but skips the index queue, sending the data on to the indexing tier instead of indexing it locally
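A minimal Python sketch of the queue order on this card, assuming a simplified model in which an indexer runs all four stages and a heavy forwarder replaces the final index stage with forwarding. The `stages_for` helper and the stage names are hypothetical and do not reflect Splunk internals.

```python
# Hypothetical stage names that mirror the card's queue order (not Splunk internals).
PIPELINE = ["parsing", "aggregator", "typing", "index"]


def stages_for(role: str) -> list[str]:
    """Return the stages an instance runs, under a simplified model."""
    if role == "indexer":
        # An indexer runs every stage, ending with writing to local indexes.
        return list(PIPELINE)
    if role == "heavy_forwarder":
        # A heavy forwarder parses, merges, and types data locally, then
        # forwards it to the indexing tier instead of indexing it.
        return PIPELINE[:-1] + ["forward_to_indexing_tier"]
    raise ValueError(f"unknown role: {role!r}")
```

For example, `stages_for("heavy_forwarder")` lists the first three stages followed by the forwarding step.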
4
Q

What are homePath, coldPath, thawedPath?

A

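Storage locations for an index, set in indexes.conf. Each holds indexed data, which is a mix of files (compressed raw data, TSIDX files, and metadata):

  • homePath: hot and warm buckets
  • coldPath: cold buckets
  • thawedPath: thawed (restored) buckets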

5
Q

What is the summaryHomePath?

A

Report acceleration summaries = actually CSV files (results)

6
Q

What is the tstatsHomePath?

A

Data Model Acceleration summaries = These are TSIDX files

7
Q

What are Events?

A

Records of activity found in inputs and machine data.

8
Q

Describe the Parsing Queue Line Breaker

A

Splunk splits the incoming data stream into lines (candidate events) using the LINE_BREAKER regex; the default, ([\r\n]+), breaks on newlines
- Alternate line delimiters can be defined with custom regular expressions (regex) in props.conf, for example for outlier data sources

Using a custom line breaker may increase indexing speed

UTF-8 encoding is the default (can be changed via CHARSET in props.conf)
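The line-breaking step above can be sketched in Python, assuming LINE_BREAKER-style semantics: the regex must contain a capturing group, the text matched by the first group is discarded, the text before it ends the current event, and the text after it starts the next one. `split_events` is a hypothetical helper, and Python's regex dialect only approximates the one Splunk uses.

```python
import re


def split_events(stream: str, line_breaker: str = r"([\r\n]+)") -> list[str]:
    """Split a raw stream using LINE_BREAKER-style semantics.

    The pattern must contain a capturing group. The text matched by the first
    group is discarded; text before it ends the current event and text after
    it starts the next one.
    """
    pattern = re.compile(line_breaker)
    events: list[str] = []
    start = 0
    for match in pattern.finditer(stream):
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    events.append(stream[start:])
    # Drop empty fragments, e.g. from a trailing newline at the end of the stream.
    return [event for event in events if event]
```

A custom breaker such as `r"([\r\n]+)(?=\d{4}-\d{2}-\d{2})"` only breaks before lines that start with a date, so continuation lines stay inside the previous event without any later merging step.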

9
Q

Describe the Parsing Queue Header

A

HEADER_MODE (props.conf) determines whether an inline ***SPLUNK*** header line can be used to rewrite index-time fields; it is used to handle log files with consistent header information

This setting applies at input time, when data is first read by Splunk.

  • Values: empty (default) | always | firstline | none
  • If “always”, any line containing ***SPLUNK*** can be used to rewrite index-time fields.
  • If “firstline”, only the first line can be used to rewrite index-time fields.
  • If “none”, the string ***SPLUNK*** is treated as normal data.
  • If empty (the default), scripted inputs take the value “always” and file inputs take the value “none”.
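The HEADER_MODE rules above can be expressed as a small decision function. This is a sketch, not Splunk code: `can_rewrite_fields` is hypothetical, input types other than "scripted" are treated like file inputs (an assumption), and the sample directive lines in the usage are illustrative only.

```python
def can_rewrite_fields(header_mode: str, input_type: str, line_no: int, line: str) -> bool:
    """Decide whether a line may rewrite index-time fields under HEADER_MODE.

    header_mode: "" (empty, the default), "always", "firstline", or "none".
    input_type: "scripted" or "file"; only consulted when header_mode is empty.
    line_no: 0-based position of the line in the input.
    """
    if "***SPLUNK***" not in line:
        return False
    # Empty mode resolves per input type; non-scripted inputs are treated like files here.
    mode = header_mode or ("always" if input_type == "scripted" else "none")
    if mode == "always":
        return True
    if mode == "firstline":
        return line_no == 0
    # "none": the directive is treated as normal data.
    return False
```

For example, with an empty mode a scripted input honors the directive on any line, while a file input ignores it.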
10
Q

Describe the Merging Pipeline Aggregator

A

Splunk merges lines separated by the line breaker into events

Best practice for efficiency: set SHOULD_LINEMERGE = false and define an appropriate LINE_BREAKER, so each chunk produced by the line breaker is already a complete event
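To show why disabling line merging saves work, here is a simplified merge step in Python. It assumes, purely for illustration, that a new event starts at any line beginning with an ISO-style date; Splunk's real merging rules are configurable and more involved, and `merge_lines` is a hypothetical helper.

```python
import re

# Assumption: lines that begin with an ISO-style date start a new event.
# This loosely mimics date-based line merging; it is not Splunk's algorithm.
EVENT_START = re.compile(r"\d{4}-\d{2}-\d{2}")


def merge_lines(lines: list[str], should_linemerge: bool = True) -> list[str]:
    """Group line-broken chunks into events."""
    if not should_linemerge:
        # With merging off, every chunk from the line breaker is already an event,
        # so no per-line merge decision is needed.
        return list(lines)
    events: list[str] = []
    for line in lines:
        if events and not EVENT_START.match(line):
            events[-1] += "\n" + line  # continuation line: append to the previous event
        else:
            events.append(line)
    return events
```

With merging enabled, every line needs a start-of-event check; with it disabled, correctness depends entirely on the line breaker producing whole events.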

11
Q

Describe the Typing Pipeline Regex Replacement

A

  • Anonymizes sensitive data with SEDCMD (props.conf)
  • Index-time transforms can create new indexed fields; these must also be defined in fields.conf on the search heads (SH) and indexers
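The effect of a SEDCMD substitution can be approximated with Python's `re.sub`. The sketch assumes a props.conf entry of the form `SEDCMD-mask_ssn = s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/g`, where the class name `mask_ssn` and the SSN pattern are illustrative. `sed_substitute` is hypothetical, and Python's regex dialect is not identical to Splunk's.

```python
import re


def sed_substitute(event: str, pattern: str, replacement: str, global_replace: bool = True) -> str:
    """Approximate an s/pattern/replacement/[g] SEDCMD on one event's raw text."""
    # count=0 replaces every match (like the "g" flag); count=1 replaces only the first.
    return re.sub(pattern, replacement, event, count=0 if global_replace else 1)


SSN = r"\d{3}-\d{2}-\d{4}"  # illustrative pattern for US Social Security numbers
masked = sed_substitute("ssn=123-45-6789 alt=987-65-4321", SSN, "XXX-XX-XXXX")
```

Because the replacement happens before indexing, the original values never reach the index.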

12
Q

Describe the Typing Pipeline Annotator

A

Splunk can identify punctuation patterns to find similar events
- ANNOTATE_PUNCT = true|false (props.conf) determines whether to index a special token starting with “punct::”
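A rough Python approximation of a punctuation pattern, assuming letters and digits are dropped, spaces become `_`, and all other characters are kept. Splunk's exact punct rules (for example, handling of tabs or length limits) may differ, and `punct_pattern` is a hypothetical helper. Events with the same pattern likely share the same structure.

```python
def punct_pattern(event: str) -> str:
    """Approximate a punct:: token: drop letters and digits, map spaces to '_',
    and keep every other character."""
    return "".join("_" if ch == " " else ch for ch in event if not ch.isalnum())
```

For instance, `a=1, b=2` and `x=9, y=7` produce the same pattern, `=,_=`, so they would group together as similar events.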

13
Q

Describe the Indexing Pipeline TCP Out

A

Splunk can send data to other Splunk instances or to external (third-party) systems.

A dedicated processor forwards data over TCP, either raw or using the Splunk-to-Splunk (S2S) protocol

It maintains a queue for each tcpout group (a group of indexers)
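A toy Python model of one queue per tcpout group. The group names, server addresses, and the per-event round-robin choice of indexer are assumptions made for illustration; Splunk's real load balancing works differently, and `TcpOutRouter` is a hypothetical class, not a Splunk API.

```python
from collections import deque
from itertools import cycle


class TcpOutRouter:
    """Toy model: one output queue per tcpout group, with each event paired
    with the group's next indexer in round-robin order (an assumption)."""

    def __init__(self, groups: dict[str, list[str]]):
        self.queues = {name: deque() for name in groups}
        self._targets = {name: cycle(servers) for name, servers in groups.items()}

    def enqueue(self, group: str, event: str) -> None:
        self.queues[group].append(event)

    def flush(self, group: str) -> list[tuple[str, str]]:
        """Drain one group's queue, pairing each event with a target indexer."""
        sent = []
        queue = self.queues[group]
        while queue:
            sent.append((next(self._targets[group]), queue.popleft()))
        return sent
```

Keeping a separate queue per group means a slow or unreachable group backs up only its own queue in this model.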

14
Q

Describe the Indexing Pipeline Syslog Out

A

The syslog output processor sends RFC 3164-compliant events to a TCP/UDP-based server and port, making the payload of any non-compliant data RFC 3164-compliant.
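A Python sketch of the RFC 3164 message layout: `<PRI>`, then a `Mmm dd hh:mm:ss` timestamp with a space-padded day, the hostname, a tag, and the message. PRI is facility × 8 + severity. Facility 1 (user) and severity 5 (notice) are assumed defaults here, and `to_rfc3164` is a hypothetical helper, not the output processor's code.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_rfc3164(msg: str, host: str, tag: str, ts: datetime,
               facility: int = 1, severity: int = 5) -> str:
    """Format one message as <PRI>TIMESTAMP HOSTNAME TAG: MSG (RFC 3164)."""
    pri = facility * 8 + severity  # PRI encodes facility and severity together
    # RFC 3164 timestamps are "Mmm dd hh:mm:ss" with a space-padded day and no year.
    stamp = f"{MONTHS[ts.month - 1]} {ts.day:>2} {ts:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {msg}"
```

English month names are hard-coded rather than using `%b`, so the output does not depend on the system locale.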

15
Q

Describe the Indexing Pipeline Indexer

A

Splunk transforms data into events and stores them in indexes

  • Internal data is written to several preconfigured indexes
  • By default, external data is written to a single preconfigured index (main)
  • Additional indexes can be created to meet specific data requirements
16
Q

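Describe the Indexing Pipeline Segmentation

A

A segment is a searchable part of an event

Splunk breaks events into segments at index time and at search time

  • Major breakers: split events into large segments. Defaults from segmenters.conf: [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + (plus -- and several URL-encoded forms)
  • Minor breakers: split major segments into smaller segments. Defaults from segmenters.conf: / : = @ . - $ # % \ _
  • Both sets are configurable in segmenters.conf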
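A simplified Python segmenter based on the default MAJOR and MINOR breaker sets in segmenters.conf, with URL-encoded forms omitted. It emits each major segment followed by its minor segments. This is a sketch of the idea only; Splunk's real segmentation has more rules, and `segments` is a hypothetical helper.

```python
# Simplified default breaker sets from segmenters.conf (URL-encoded forms omitted).
MAJOR = set("[]<>(){}|!;,'\"*\n\r\t &?+")
MINOR = set("/:=@.-$#%\\_")


def _split(text: str, breakers: set[str]) -> list[str]:
    """Split text on any breaker character, dropping empty pieces."""
    pieces: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch in breakers:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces


def segments(event: str) -> list[str]:
    """Return major segments, each followed by its minor segments (if any)."""
    result: list[str] = []
    for major in _split(event, MAJOR):
        result.append(major)
        minors = _split(major, MINOR)
        if len(minors) > 1:
            result.extend(minors)
    return result
```

Keeping both levels is what allows a search to match either `src=10.0.0.1` as a whole or a piece such as `10`.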
17
Q

Describe a Time Series Index

A

These are TSIDX files, optimized to execute arbitrary Boolean keyword searches and return millions of events in reverse time order

Inverted Index

  • Allows for fast full-text searches
  • Maps keywords to their locations in the raw data

Two basic components

  • Lexicon: the unique keywords (terms) found in the raw data
  • Value arrays containing information about events
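A toy inverted index in Python that mirrors the two components above: a lexicon mapping each term to the events containing it, plus per-event timestamps so Boolean AND results come back in reverse time order. Real TSIDX files are binary and far more elaborate; `TinyTsidx` and its whitespace tokenizer are hypothetical simplifications.

```python
from collections import defaultdict


class TinyTsidx:
    """Toy inverted index: lexicon of terms -> event ids, plus event timestamps."""

    def __init__(self) -> None:
        self.lexicon: dict[str, set[int]] = defaultdict(set)  # term -> event ids
        self.times: dict[int, int] = {}                        # event id -> epoch seconds

    def add(self, event_id: int, epoch: int, text: str) -> None:
        self.times[event_id] = epoch
        # Simplified tokenizer: lowercase, split on whitespace.
        for term in text.lower().split():
            self.lexicon[term].add(event_id)

    def search_and(self, *terms: str) -> list[int]:
        """Boolean AND over terms; matching event ids, newest first."""
        if not terms:
            return []
        # .get avoids creating empty lexicon entries for unknown terms.
        postings = [self.lexicon.get(t.lower(), set()) for t in terms]
        ids = set.intersection(*postings)
        return sorted(ids, key=lambda i: self.times[i], reverse=True)
```

The search touches only the lexicon entries for the requested terms, not every event, which is what makes keyword searches fast.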
18
Q

What is Retention?

A

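How long to keep the data and how long it should be searchable.

Allows Splunk to expire old data to make room for new data

Most restrictive rule wins: when both a size limit and an age limit apply, whichever is reached first causes the bucket to roll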
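A Python sketch of "most restrictive rule wins", assuming two limits that roughly mirror the indexes.conf settings frozenTimePeriodInSecs (age) and maxTotalDataSizeMB (size). Buckets past the age limit freeze, then the oldest remaining buckets freeze until the size limit is met. The tuple layout and the `buckets_to_freeze` helper are hypothetical.

```python
def buckets_to_freeze(buckets, now, max_age_secs, max_total_mb):
    """Return ids of buckets to freeze.

    buckets: iterable of (bucket_id, newest_event_epoch, size_mb) tuples.
    A bucket freezes once its newest event is older than max_age_secs; then the
    oldest remaining buckets freeze until the total size fits max_total_mb.
    """
    buckets = list(buckets)
    frozen = [b for b in buckets if now - b[1] > max_age_secs]
    # Remaining buckets, oldest first, so the size limit evicts the oldest data.
    kept = sorted((b for b in buckets if now - b[1] <= max_age_secs), key=lambda b: b[1])
    total = sum(b[2] for b in kept)
    while kept and total > max_total_mb:
        oldest = kept.pop(0)
        frozen.append(oldest)
        total -= oldest[2]
    return sorted(b[0] for b in frozen)
```

Here `b1` freezes because of its age and `b2` because of the size limit, so each limit can be the one that decides.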

19
Q

What is retention “change in state”?

A

Buckets can move from Warm -> Cold, Cold -> Frozen, or Warm -> Frozen
Buckets cannot move directly from Hot -> Frozen

If a bucket rolls from Hot -> Warm on the same storage, search performance is unchanged

If a bucket is moved to a directory on slower disks, search performance will be impacted

Thawing a bucket requires user action
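The state changes on this card can be encoded as a small transition table in Python. The table contains only the moves listed here plus Hot -> Warm; Frozen -> Thawed is allowed only with an explicit user action. `can_move` is a hypothetical helper for study purposes.

```python
# Automatic transitions as listed on this card (Hot -> Frozen is deliberately absent).
AUTOMATIC = {
    ("hot", "warm"),
    ("warm", "cold"),
    ("cold", "frozen"),
    ("warm", "frozen"),
}
# Transitions that happen only when a user acts (thawing archived data).
MANUAL = {("frozen", "thawed")}


def can_move(src: str, dst: str, user_action: bool = False) -> bool:
    """Return True if a bucket may move from state src to state dst."""
    if (src, dst) in AUTOMATIC:
        return True
    return user_action and (src, dst) in MANUAL
```

Anything not in either table, such as Hot -> Frozen, is rejected.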

20
Q

What do Frozen buckets typically contain?

A

Frozen buckets typically contain only journal data: the compressed raw data (journal.gz), with the TSIDX files removed