Search Flashcards

Question 1

Q

What is the system recommendation for the reference search head?

Answer

A

16 cores, 12 GB RAM, RAID1

Question 2

Q

What is the system recommodations for the high end searchhead?

Answer

A

There is no high-end reference available. As a general rule, one search consumes up to one core, so the more users who search concurrently, the more CPU cores the search head should have.
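
As a rough illustration of the one-search-per-core rule, the sketch below estimates a core count from the number of concurrent searches. The function name and the `headroom` parameter are hypothetical helpers for this card, not Splunk settings.

```python
import math


def estimate_cores(concurrent_searches: int, headroom: float = 0.0) -> int:
    """Rough core estimate: each running search may occupy up to one core.

    `headroom` is a hypothetical extra fraction reserved for the OS and
    other processes; it is an assumption of this sketch.
    """
    if concurrent_searches < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(concurrent_searches * (1.0 + headroom))


if __name__ == "__main__":
    # 20 concurrent searches plus 25% headroom
    print(estimate_cores(20, headroom=0.25))
```

Treat the result as a starting point only; real sizing also depends on search types, scheduling and data volume.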

Question 3

Q

When a user runs a search, where do the search artifacts live, and for how long?

Answer

A

The search artifacts live under /opt/splunk/var/run/splunk/dispatch.

The TTL for ad hoc and manual searches, including remote searches, is 10 minutes.

Artifacts of scheduled searches live for twice the schedule period.
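
The TTL rules above can be written as a small calculation. This is a minimal sketch using the defaults stated on this card; `artifact_ttl` is a hypothetical helper, not a Splunk API.

```python
from datetime import timedelta
from typing import Optional

# Default TTL for ad hoc and manual searches, as stated on this card.
ADHOC_TTL = timedelta(minutes=10)


def artifact_ttl(schedule_period: Optional[timedelta] = None) -> timedelta:
    """Return how long a search artifact is kept.

    Pass no argument for an ad hoc search, or the schedule period
    for a scheduled search (which is kept for twice that period).
    """
    if schedule_period is None:
        return ADHOC_TTL
    return 2 * schedule_period


if __name__ == "__main__":
    print(artifact_ttl())                       # ad hoc search
    print(artifact_ttl(timedelta(minutes=15)))  # search scheduled every 15 minutes
```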

Question 4

Q

What is SRS in terms of search artifacts?

Answer

A

Search artifacts are stored in the dispatch folder. The dispatch folder contains several directories, each related to the SID of a search.

Each SID directory has a results.srs.gz file, which contains the Splunk search results (SRS) in a binary serialization format. By default it is not human readable.

To convert the SRS file to a readable CSV format, use the splunkd toCsv [output path] tool.
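
To see which SID directories currently hold a result file, the dispatch folder can be walked with plain file-system calls. The sketch below assumes only the layout described above (one directory per SID, each optionally holding results.srs.gz); `find_result_files` is a hypothetical helper and does not decode the SRS format.

```python
from pathlib import Path
from typing import Dict


def find_result_files(dispatch_dir: str) -> Dict[str, Path]:
    """Map each SID directory name to its results.srs.gz, if present."""
    found: Dict[str, Path] = {}
    for sid_dir in sorted(Path(dispatch_dir).iterdir()):
        candidate = sid_dir / "results.srs.gz"
        # Skip stray files and SID directories without a result file.
        if sid_dir.is_dir() and candidate.is_file():
            found[sid_dir.name] = candidate
    return found


if __name__ == "__main__":
    # Path taken from the card on search artifacts; adjust to your installation.
    for sid, path in find_result_files("/opt/splunk/var/run/splunk/dispatch").items():
        print(sid, path)
```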

Question 5

Q

What does the job inspector do?

Answer

A

The job inspector is a report that is generated for each search.

It contains valuable information about the cost of the search.

It also shows how Splunk breaks the search down into its central and remote parts.

Question 6

Q

List some ways to minimize search costs.

Answer

A

Define the index and sourcetype
Set the time range
Use a search mode that fits your needs
Try to avoid NOT; use AND instead
Try to avoid transaction; use stats first() instead
Try to avoid join; use stats first() instead
Use streaming commands before non-streaming commands
Be specific instead of using wildcards
Limit the output of your search
Use the TERM() directive, e.g. for IP addresses
Use the fields command to keep only the fields you need
Use filtering commands before calculating commands
Use Data Model Acceleration or Report Acceleration
Use the job inspector to find the slowest part of your search and tune it

Question 7

Q

How does a search work internally in Splunk?

Answer

A

Example of how a search works:

1) Search for index=main name=peter within the last 24 hours
2) The search is streamed to the indexer tier (after checking whether enough disk space is available and checking the detention mode)
3) The indexer checks whether the queried index exists
4) Splunk hashes the search terms (name=peter) and compares them to the hashes in the Bloom filters, which reside in the related index
5) A Bloom filter indicates whether a search term does NOT exist in a bucket
6) If the hashes match, Splunk checks the TSIDX files of the matching buckets to find out exactly where the raw data is located
7) The TSIDX files provide a seek address; Splunk then finds the data in the journal.gz files and uncompresses it
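
Steps 4 to 6 can be illustrated with a generic Bloom filter. This is a minimal sketch of the general technique, not Splunk's implementation: the hash function (salted SHA-256), the filter size and the bucket names are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, term: str) -> Iterator[int]:
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(term))


def candidate_buckets(buckets: Dict[str, BloomFilter], term: str) -> List[str]:
    """Return the buckets whose filter cannot rule out the term."""
    return [name for name, bf in buckets.items() if bf.might_contain(term)]


if __name__ == "__main__":
    # Hypothetical bucket names; only bucket_a has indexed the term "peter".
    buckets = {"bucket_a": BloomFilter(), "bucket_b": BloomFilter()}
    buckets["bucket_a"].add("peter")
    print(candidate_buckets(buckets, "peter"))
```

Only the buckets returned here would need their TSIDX files opened; every other bucket is skipped without touching its index or raw data.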

Question 8

Q

What is the size of an uncompressed slice in the journal.gz?

Answer

A

~128KB of uncompressed data make up a slice

Question 9

Q

Why should you avoid wildcards in your search?

Answer

A

Because wildcards are not compatible with Bloom filters, so searches take longer.

Question 10

Q

Why should you avoid the NOT operator in your search?

Answer

A

Bloom filters are designed to quickly locate data. Searching for terms that do not exist takes longer (use the AND or OR operator instead).

Question 11

Q

What is the order of extractions processed on the raw data (props.conf)?

Answer

A

Inline field extraction (EXTRACT-)
Field extraction using a field transform (REPORT-)
Automatic key-value field extraction (KV_MODE)
Field aliases (FIELDALIAS-)
Calculated fields (EVAL-)
Lookups (LOOKUP-)
Event types (eventtypes.conf)
Tags (tags.conf)

Question 12

Q

What is the difference between search type and search mode?

Answer

A

Search mode is the method which Splunk uses to process the data on a search-time level. There are 3 different modes available (fast, smart, verbose).

Search type describes the way the SPL is used. On a high level, there are two types of searches:

raw searches (typically searching for e.g. HTTP status codes)
transforming searches (e.g. performing statistical calculations)

Question 13

Q

Before a search is sent out to an indexer, which two parameters are checked?

Answer

A

Available disk space
Detention (active|inactive)

Question 14

Q

At a low level, which 4 types of search commands exist?

Answer

A

Streaming commands
Non-streaming commands
Transforming commands
Generating commands

Question 15

Q

If a search begins with a | (pipe), what kind of search is that?

Answer

A

A generating command

Question 16

Q

What is an example of a streaming command?

Answer

A

Streaming commands are streamed to the indexers. Each indexer takes over this part of the search and streams back the results.

Typical examples are the eval and rex commands.

Question 17

Q

What is the difference between a non-streaming command and a centralized command?

Answer

A

They are the same. Non-streaming commands are not streamed to the indexers, so they run centrally.

Question 18

Q

List 4 examples of non-streaming commands

Answer

A

top
dedup
stats
sort
many transforming commands

Question 19

Q

List 4 examples of transforming commands

Answer

A

timechart
chart
stats
top
rare

Question 20

Q

What are the characteristics of a non-streaming command?

Answer

A

A non-streaming command requires the events from all of the indexers before the command can operate on the entire set of events
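
The split between the two command classes can be simulated in a few lines. In this sketch each inner list stands for the events held by one indexer; the per-event step plays the role of a streaming command such as eval, and the aggregation plays the role of a non-streaming command such as stats. All function and field names are hypothetical, and the sketch deliberately ignores any partial aggregation that might happen on the indexers.

```python
from collections import Counter
from typing import Dict, List


def streaming_eval(event: Dict) -> Dict:
    """Per-event transformation; needs no knowledge of other events."""
    out = dict(event)
    out["is_error"] = event["status"] >= 500
    return out


def run_on_indexer(events: List[Dict]) -> List[Dict]:
    # Each simulated indexer applies the streaming step to its own events.
    return [streaming_eval(e) for e in events]


def count_errors_by_host(all_events: List[Dict]) -> Counter:
    # Non-streaming step: operates on the merged events from every indexer.
    return Counter(e["host"] for e in all_events if e["is_error"])


def distributed_search(indexers: List[List[Dict]]) -> Counter:
    merged: List[Dict] = []
    for events in indexers:
        merged.extend(run_on_indexer(events))
    # Only now, with all events collected, can the central step run.
    return count_errors_by_host(merged)


if __name__ == "__main__":
    indexers = [
        [{"host": "a", "status": 500}, {"host": "b", "status": 200}],
        [{"host": "a", "status": 503}],
    ]
    print(distributed_search(indexers))
```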

Question 21

Q

What is the reason to use a subsearch?

Answer

A

A typical scenario for a subsearch is a moving target (e.g. the most active host today). Once the target has been identified, the inner search hands its results over to the outer search (main search).

Question 22

Q

If the outer-search of a subsearch has a time defined (earliest=-30m), does it apply to the subsearch too?

Answer

A

No, the defined time in the outer search does not apply to the subsearch. Only the time which is defined in the global time range picker applies to the subsearch.

If the time range in both searches needs to be different, set the time directly in the subsearch too.

Question 23

Q

What are the limitations of a subsearch?

Answer

A

A subsearch can only display/process up to 10k events (this can be changed in limits.conf). It also runs for at most 60s before it stops. The user experience and the results can be sluggish.

A subsearch is only recommended for small data sets.

Question 24

Q

A subsearch has a runtime limitation of 60s per default. Where can it be changed?

Answer

A

limits.conf

Question 25

Q

What is an alternative to a subsearch?

Answer

A

The stats command (does not work in all cases)

Question 26

Q

What is the difference between a remoteSearch and a reportSearch in the Job Inspector?

Answer

A

A remoteSearch is performed on the indexers (e.g. streaming commands), while a reportSearch runs locally on the search head (e.g. non-streaming commands).

Question 27

Q

What is the meaning of the field ‘dispatch.check_disk_usage’ in the Job Inspector?

Answer

A

The time spent checking the disk usage of this job

Question 28

Q

What is the meaning of the field ‘eai:acl’ in the Job Inspector?

Answer

A

Describes the app and user-level permissions. For example, is the app shared globally, and what users can run or view the search?

Question 29

Q

If you want to measure how a search performed, which field do you check in the Job Inspector?

Answer

A

scanCount/second

Rate should hover between 10k and 20k events per second for performance to be deemed good
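
The rate is simply the scan count divided by the run time of the search in seconds. The sketch below works through that arithmetic and checks it against the 10k to 20k band from this card; the function and parameter names are hypothetical.

```python
def scan_rate(scan_count: int, run_duration_s: float) -> float:
    """Events scanned per second for one search."""
    if run_duration_s <= 0:
        raise ValueError("run duration must be positive")
    return scan_count / run_duration_s


def in_expected_band(rate: float, low: float = 10_000, high: float = 20_000) -> bool:
    """True if the rate falls inside the band quoted on this card."""
    return low <= rate <= high


if __name__ == "__main__":
    # Example values: 300,000 events scanned in 20 seconds.
    rate = scan_rate(300_000, 20)
    print(rate, in_expected_band(rate))
```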

Question 30

Q

What are the characteristics of a search with the mode ‘fast’?

Answer

A

Splunk only returns information on default fields and fields that are required to fulfill your search. If you are searching on specific fields, those fields are extracted.

Under the Fast mode you will see only event lists and event timelines for searches that do not include transforming commands

Question 31

Q

What is the meaning of the field ‘command.search.kv’ in the Job Inspector?

Answer

A

Tells how long it took to apply field extractions to the events.