Search Flashcards
What is the system recommendation for the reference search head?
16 cores, 12 GB RAM, RAID1
What are the system recommendations for a high-end search head?
There is no high-end reference configuration. As a general rule, one search consumes up to one CPU core, so the more users search concurrently, the more CPU cores the search head needs.
When a user runs a search, where do the search artifacts live and for how long?
The search artifacts live under /opt/splunk/var/run/splunk/dispatch
The TTL for ad hoc and manual searches, including remote searches, is 10 minutes
Artifacts of scheduled searches live for twice the schedule period
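The two lifetime rules above can be written out as a small calculation. This is only a sketch of the card's rules: the function name artifact_expiry and its parameters are hypothetical, and a real deployment may override these lifetimes in its configuration.

```python
from datetime import datetime, timedelta

# Lifetime of ad hoc / manual search artifacts, as stated on the card.
AD_HOC_TTL = timedelta(minutes=10)


def artifact_expiry(finished_at, schedule_period=None):
    """Return when a search artifact would expire under the card's rules.

    finished_at: datetime at which the search finished.
    schedule_period: timedelta between scheduled runs, or None for an
    ad hoc search (hypothetical parameter names, for illustration only).
    """
    if schedule_period is None:
        ttl = AD_HOC_TTL
    else:
        # Scheduled searches: twice the schedule period.
        ttl = 2 * schedule_period
    return finished_at + ttl


finished = datetime(2024, 1, 1, 12, 0)
ad_hoc_expiry = artifact_expiry(finished)
hourly_expiry = artifact_expiry(finished, timedelta(hours=1))
```

With these inputs, the ad hoc artifact expires 10 minutes after the search finished, and the artifact of the hourly scheduled search expires 2 hours after it finished.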
What is SRS in terms of search artifacts?
Search artifacts are stored in the dispatch folder. The dispatch folder contains one directory per search, named after the SID of the search.
Each SID directory has a results.srs.gz file, which contains the Splunk search results (SRS) in a binary serialization format. By default it is not human readable.
To convert the SRS file into a readable CSV format, use the splunkd toCsv [output path] tool
What does the job inspector do?
The job inspector is a report that is generated for each search.
It contains valuable information about the cost of the search.
It also shows how Splunk breaks the search down into its central and remote parts.
List some ways to minimize search costs.
- Define index and sourcetype
- Set the time range
- Use a search mode that fits your needs
- Avoid NOT where possible; use AND instead
- Avoid transaction where possible; use stats first() instead
- Avoid join where possible; use stats first() instead
- Use streaming commands before non-streaming commands
- Instead of using wildcards, be more specific
- Limit the output of your search
- Use the TERM() directive where it applies, for example for IP addresses
- Use the fields command to keep only the fields you need
- Use filtering commands before calculating commands
- Use Data Model Acceleration or Report Acceleration
- Use the job inspector to recognize the slowest part of your search and tune it
How does a search work internally in Splunk?
Example of how a search works:
1) Search for index=main name=peter within the last 24 hours
2) The search is sent to the indexer tier (after checking that enough disk space is available and whether detention mode is active)
3) The indexer checks whether the queried index exists
4) Splunk hashes the search terms (name=peter) and compares them to the hashes in the Bloom filters of the buckets in the related index
5) The Bloom filter indicates whether a search term definitely does NOT exist in a bucket
6) If the hashes match, Splunk checks the TSIDX files of the matching buckets to find out exactly where the raw data is located
7) The TSIDX files provide a seek address, which Splunk uses to find the data in the journal.gz files and decompress it
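Steps 4 and 5 can be illustrated with a generic Bloom filter. This is a textbook sketch, not Splunk's implementation: the hash function, filter size, class name and bucket names are all assumptions chosen for illustration. It shows the key property the card relies on: a filter can say "definitely not in this bucket" or "possibly in this bucket", never "definitely in this bucket".

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter (illustrative; sizes and hashing are assumptions)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive num_hashes bit positions by salting the term with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def candidate_buckets(buckets, term):
    """Return the names of buckets whose filter cannot rule out the term."""
    return [name for name, bloom in buckets.items() if bloom.might_contain(term)]


# Hypothetical buckets: one holds events containing "peter", one does not.
buckets = {"bucket_a": BloomFilter(), "bucket_b": BloomFilter()}
buckets["bucket_a"].add("peter")
buckets["bucket_b"].add("mary")

to_open = candidate_buckets(buckets, "peter")
```

Only the buckets returned by candidate_buckets need their TSIDX files opened (step 6). A bucket that never contained the term can still show up as a false positive, but a bucket that does contain the term is never skipped.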
What is the size of an uncompressed slice in the journal.gz?
A slice is made up of ~128 KB of uncompressed data
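As a rough back-of-the-envelope aid, the slice size can be used to estimate how many slices a given amount of raw data occupies. The sketch assumes exactly 128 * 1024 bytes per slice, whereas the card only gives an approximate figure; the function name is hypothetical.

```python
import math

# Assumption: treat "~128 KB" as exactly 128 * 1024 bytes.
SLICE_BYTES = 128 * 1024


def estimated_slices(uncompressed_bytes):
    """Estimate how many journal slices hold the given uncompressed data."""
    return math.ceil(uncompressed_bytes / SLICE_BYTES)


slices_for_one_mib = estimated_slices(1024 * 1024)
```

Under this assumption, 1 MiB of uncompressed data corresponds to 8 slices.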
Why should you avoid wildcards in your search?
Wildcards are not compatible with Bloom filters, so searches take longer
Why should you avoid the NOT operator in your search?
Bloom filters are designed to quickly locate data. Searching for terms that do not exist takes longer (use the AND or OR operator instead).
In what order are the search-time extractions processed on the raw data (props.conf)?
- Inline field extraction (EXTRACT-)
- Field extraction using a field transform (REPORT-)
- Automatic key-value field extraction (KV_MODE)
- Field aliases (FIELDALIAS-)
- Calculated fields (EVAL-)
- Lookups (LOOKUP-)
- Event types (eventtypes.conf)
- Tags (tags.conf)
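The props.conf part of the sequence above can be encoded as an ordered list, which makes it easy to check which setting is applied before another. This is a sketch only: the setting names such as EXTRACT-status are hypothetical, and event types and tags are left out because they live in their own configuration files.

```python
# props.conf setting prefixes in the processing order listed on the card.
SEARCH_TIME_ORDER = [
    "EXTRACT-",
    "REPORT-",
    "KV_MODE",
    "FIELDALIAS-",
    "EVAL-",
    "LOOKUP-",
]


def stage_index(setting):
    """Return the position of a props.conf setting in the processing order."""
    for index, prefix in enumerate(SEARCH_TIME_ORDER):
        if setting.startswith(prefix):
            return index
    raise ValueError(f"not a search-time setting from the list: {setting}")


def in_processing_order(settings):
    """Sort setting names into the order in which they are processed."""
    return sorted(settings, key=stage_index)


# Hypothetical setting names, deliberately listed out of order.
ordered = in_processing_order(
    ["LOOKUP-users", "EVAL-duration", "EXTRACT-status", "KV_MODE"]
)
```

One practical consequence of the order: a calculated field (EVAL-) can build on a field produced by EXTRACT-, but not on a field that is only added later by LOOKUP-.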
What is the difference between search type and search mode?
Search mode is the method Splunk uses to process the data at search time. There are three modes available (fast, smart, verbose).
Search type describes the way the SPL is used. At a high level, there are two types of searches:
- raw searches (typically searching for, e.g., HTTP status codes)
- transforming searches (e.g., performing statistical calculations)
Before a search is sent to an indexer, which two parameters are checked?
- Available disk space
- Detention (active|inactive)
At a low level, which four types of search commands exist?
- Streaming commands
- Non-streaming commands
- Transforming commands
- Generating commands
If a search begins with a | (pipe), what kind of search is that?
A search that starts with a generating command
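The rule on this card can be turned into a one-line check. This is a minimal sketch that follows the card's rule literally by looking only at the first non-blank character of the search string; the function name is hypothetical, and the example searches are illustrative.

```python
def starts_with_generating_command(spl):
    """Return True if the search string begins with a pipe, per the card's rule."""
    return spl.lstrip().startswith("|")


generating = starts_with_generating_command("| makeresults count=1")
plain = starts_with_generating_command("index=main name=peter")
```

Here generating is True and plain is False.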