Search Flashcards
Describe the anatomy of a Search:
- Request is received
- Disk space on indexer is checked
- Create the dispatch directory under $SPLUNK_HOME/var/run/splunk/dispatch
- Initialize the config subsystem (props.conf, transforms.conf) using the knowledge bundle identified by the SH, located under $SPLUNK_HOME/var/run/searchpeers/
- Find buckets that match the time of the search
- Consult the bloom filters
- Find events matching any keywords within the lexicon (.tsidx files)
- Use the results returned from the lexicon's value array to find the event offsets within the raw data
- Uncompress the appropriate slice in the rawdata/journal.gz to get the _raw for the event(s)
- Process the raw data with the automatic extractions in this order:
- sourcetype RENAME, EXTRACT-xxx, REPORT-xxx, KV_MODE, FIELDALIAS-xxx, EVAL-xxx, LOOKUP-xxx
- Send the results to the Search Head
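As a rough illustration of these steps, consider a simple search (the index, sourcetype, and field are hypothetical):
  index=web sourcetype=access_combined error
  | stats count by host
The literal term "error" is checked against the bloom filters and lexicons, only the matching slices of rawdata/journal.gz are decompressed, search-time extractions run on the resulting _raw, and each peer sends its partial stats results to the search head.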
If an indexer is in manual detention, can it still be searched?
Yes.
what does the dispatch directory contain?
Contains search status, results, log, and extracted fields in CSV format.
Kept for 10 minutes by default.
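A finished ad-hoc job's dispatch directory looks roughly like the sketch below (the SID is made up and exact file names vary by version):
  $SPLUNK_HOME/var/run/splunk/dispatch/1690000000.123/
    args.txt
    info.csv
    status.csv
    search.log
    results.csv.gz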
What configs are initialized from the Search Head?
The knowledge bundle is sent from the SH and includes knowledge objects (KOs): saved searches, lookups, event types.
The process of distributing KOs means that, by default, peers receive nearly the entire contents of the SH's apps.
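The bundle contents can be trimmed in distsearch.conf on the SH; a minimal sketch, assuming the replicationBlacklist stanza (newer releases call it replicationDenylist) and a hypothetical app and lookup file:
  [replicationBlacklist]
  big_lookup = apps[/\\]myapp[/\\]lookups[/\\]huge_lookup\.csv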
During search, does the indexer check to see if it has enough disk space to run the search?
Yes.
The diskUsage and detention settings in server.conf on the indexer are checked.
If indexer is in Manual detention, it can still be searched.
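A minimal server.conf sketch of the disk check on the indexer (the 5000 MB default is from memory; verify against your version's spec file): if free space on the partitions Splunk uses drops below minFreeSpace, searches will not run.
  [diskUsage]
  minFreeSpace = 5000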
Are hot buckets included in every search?
No. Hot buckets are not touched if the time range of the search does not require them.
What are bloom filters?
- A hash-based structure that can rule out buckets that cannot contain matching terms, so the indexer only needs to search buckets that are not eliminated by the Bloom filter.
- The execution cost of retrieving events from disk grows with the size and number of tsidx files.
- Bloom filters decrease the number of tsidx files that the indexer needs to search, decreasing the time it takes to search each bucket.
- If a (warm or cold) filter-less bucket is older than the configured maxBloomBackfillBucketAge in indexes.conf, Splunk will not create a bloom filter for that bucket.
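A minimal indexes.conf sketch, assuming these attribute names are unchanged in your version (the index name is hypothetical and the values are the defaults as I recall them):
  [web_index]
  createBloomfilter = true
  maxBloomBackfillBucketAge = 30d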
What is the Splunk lexicon?
The sorted list of terms stored in each .tsidx file; it is used to find events that match the keywords in the search.
Each term carries location information pointing to where the matching events live in the raw data.
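On versions that ship the walklex generating command (8.0+, as I recall), the lexicon of the buckets in a search range can be inspected directly; a sketch:
  | walklex index=_internal type=term
  | head 20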
What is the order of extractions processed on the raw data (props.conf)?
- Inline field extraction (EXTRACT-)
- Field extraction using a field transform (REPORT-)
- Automatic key-value field extraction (KV_MODE)
- Field aliases (FIELDALIAS-)
- Calculated fields (EVAL-)
- Lookups (LOOKUP-)
- Event types (eventtypes.conf)
- Tags (tags.conf)
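A props.conf sketch tying the first six steps to their attribute classes (the sourcetype, field names, transform name, and lookup name are all hypothetical; REPORT-web and LOOKUP-status assume matching [web_extractions] and [http_status_lookup] stanzas in transforms.conf):
  [my_sourcetype]
  EXTRACT-user = user=(?<user>\S+)
  REPORT-web = web_extractions
  KV_MODE = auto
  FIELDALIAS-src = clientip AS src_ip
  EVAL-kb = bytes/1024
  LOOKUP-status = http_status_lookup status OUTPUT status_description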
Describe Job Inspection
- Allows for post-mortem inspection of search metrics:
- Time spent on commands
- Time spent searching
- Time spent fetching
- Workload undertaken by search peers
- Also available via REST /services/search/jobs
- The Execution costs section contains information about the search processing components that were used to process your search.
- With this information you can troubleshoot the efficiency of your search by narrowing down which processing components are impacting the search performance.
- The fields shown in the Search job properties section provide information about the search job like the total amount of disk space used (in bytes), and the number of possible events that were dropped (for real-time searches).
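Because the same data is exposed over REST, job properties can also be pulled from SPL; a sketch using the rest command (the property names are a few I recall from the Search job properties list):
  | rest /services/search/jobs
  | table sid, label, runDuration, diskUsage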
What are the different types of search commands?
- streaming
- non-streaming
- transforming
- generating
Describe streaming commands:
- Operate on each event individually
- Distributable streaming commands run on the indexers
- eval, fields, rename, regex
- Improves processing time, but all preceding commands must also be able to run on the indexer; otherwise that part of the search runs on the SH.
- Order of events does not matter.
- Centralized (stateful) streaming commands run on the search head
- head, streamstats
- Order of events matters
- Only run on the search head
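A sketch contrasting the two flavors (index and fields hypothetical): the first pipeline is entirely distributable streaming, so it runs on the indexers; the second uses streamstats, which is stateful and therefore runs on the search head.
  index=web error
  | eval kb = bytes/1024
  | rename clientip AS src_ip
  | fields host, src_ip, kb

  index=web error
  | streamstats count AS running_total BY host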
Describe non-streaming commands:
- Force the entire set of events to the search head.
- sort, dedup, top
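For example (hypothetical index and field), once dedup or sort appears, the full matching event set must be collected on the search head before the command can complete:
  index=web
  | dedup clientip
  | sort -_time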
Describe transforming commands:
- Non-streaming commands that operate on the entire dataset
- Generate a reporting data structure
- chart, timechart, stats
- Can be either streaming-reporting or reporting
- Streaming-reporting commands (stats, chart) generate output in batches
- Reporting commands (cluster, geostats) take all events at once
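Two sketches against a hypothetical web index; both consume the whole result set and emit a table rather than events:
  index=web
  | stats count BY status

  index=web
  | timechart span=1h count BY host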
Describe generating commands:
- Invoked at the beginning of a search with a leading |
- Do not expect or require input.
- dbinspect, datamodel, inputcsv
- Most generating commands are centralized
- Results are usually returned in a list or table
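For example, dbinspect generates results about an index's buckets without any input events, so the search starts with a leading pipe (_internal exists on most installations):
  | dbinspect index=_internal
  | head 10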