Search Flashcards
Describe the anatomy of a Search:
- Request is received
- Disk space on indexer is checked
- Create the dispatch directory under $SPLUNK_HOME/var/run/splunk/dispatch
- Initialize the config subsystem (props.conf, transforms.conf) using the knowledge bundle identified by the SH, located under $SPLUNK_HOME/var/run/searchpeers/
- Find buckets that match the time of the search
- Consult the bloom filters
- Find events matching any keywords within the lexicon (.tsidx files)
- Use the results returned from the lexicon's value array to find the event offsets within the raw data
- Uncompress the appropriate slice in the rawdata/journal.gz to get the _raw for the event(s)
- Process the raw data with the automatic extractions in this order:
- sourcetype RENAME, EXTRACT-xxx, REPORT-xxx, KV_MODE, FIELDALIAS-xxx, EVAL-xxx, LOOKUP-xxx
- Send the results to the Search Head
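As a rough illustration of these steps, consider a simple search (the index, sourcetype, and field are hypothetical):
  index=web sourcetype=access_combined error
  | stats count by host
The literal term "error" is checked against the bloom filters and lexicons, only the matching slices of rawdata/journal.gz are decompressed, search-time extractions run on the resulting _raw, and each peer sends its partial stats results to the search head.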
If an indexer is in manual detention, can it still be searched?
Yes.
what does the dispatch directory contain?
Contains search status, results, log, and extracted fields in CSV format.
Kept for 10 minutes by default.
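A finished ad-hoc job's dispatch directory looks roughly like the sketch below (the SID is made up and exact file names vary by version):
  $SPLUNK_HOME/var/run/splunk/dispatch/1690000000.123/
    args.txt
    info.csv
    status.csv
    search.log
    results.csv.gz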
What configs are initialized from the Search Head?
The knowledge bundle is sent from the SH and includes knowledge objects (KOs): saved searches, lookups, event types.
The process of distributing KOs means that, by default, peers receive nearly the entire contents of the SH's apps.
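The bundle contents can be trimmed in distsearch.conf on the SH; a minimal sketch, assuming the replicationBlacklist stanza (newer releases call it replicationDenylist) and a hypothetical app and lookup file:
  [replicationBlacklist]
  big_lookup = apps[/\\]myapp[/\\]lookups[/\\]huge_lookup\.csv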
During search, does the indexer check to see if it has enough disk space to run the search?
Yes.
The diskUsage and detention settings in server.conf on the indexer are checked.
If indexer is in Manual detention, it can still be searched.
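A minimal server.conf sketch of the disk check on the indexer (the 5000 MB default is from memory; verify against your version's spec file): if free space on the partitions Splunk uses drops below minFreeSpace, searches will not run.
  [diskUsage]
  minFreeSpace = 5000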
Are hot buckets included in every search?
No. Hot buckets are not touched if the time range of the search does not require them.
What are bloom filters?
- A hash-based structure that can rule out buckets that cannot contain matching terms, so the indexer only needs to search buckets that are not eliminated by the Bloom filter.
- The execution cost of retrieving events from disk grows with the size and number of tsidx files.
- Bloom filters decrease the number of tsidx files that the indexer needs to search, decreasing the time it takes to search each bucket.
- If a (warm or cold) filter-less bucket is older than the configured maxBloomBackfillBucketAge in indexes.conf, Splunk will not create a bloom filter for that bucket.
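A minimal indexes.conf sketch, assuming these attribute names are unchanged in your version (the index name is hypothetical and the values are the defaults as I recall them):
  [web_index]
  createBloomfilter = true
  maxBloomBackfillBucketAge = 30d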
What is the Splunk lexicon?
The sorted list of terms stored in each .tsidx file; it is used to find events that match the keywords in the search.
Each term carries location information pointing to where the matching events live in the raw data.
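On versions that ship the walklex generating command (8.0+, as I recall), the lexicon of the buckets in a search range can be inspected directly; a sketch:
  | walklex index=_internal type=term
  | head 20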
What is the order of extractions processed on the raw data (props.conf)?
- Inline field extraction (EXTRACT-)
- Field extraction using a field transform (REPORT-)
- Automatic key-value field extraction (KV_MODE)
- Field aliases (FIELDALIAS-)
- Calculated fields (EVAL-)
- Lookups (LOOKUP-)
- Event types (eventtypes.conf)
- Tags (tags.conf)
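A props.conf sketch tying the first six steps to their attribute classes (the sourcetype, field names, transform name, and lookup name are all hypothetical; REPORT-web and LOOKUP-status assume matching [web_extractions] and [http_status_lookup] stanzas in transforms.conf):
  [my_sourcetype]
  EXTRACT-user = user=(?<user>\S+)
  REPORT-web = web_extractions
  KV_MODE = auto
  FIELDALIAS-src = clientip AS src_ip
  EVAL-kb = bytes/1024
  LOOKUP-status = http_status_lookup status OUTPUT status_description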
Describe Job Inspection
- Allows for post-mortem inspection of search metrics:
- Time spent on commands
- Time spent searching
- Time spent fetching
- Workload undertaken by search peers
- Also available via REST /services/search/jobs
- The Execution costs section contains information about the search processing components that were used to process your search.
- With this information you can troubleshoot the efficiency of your search by narrowing down which processing components are impacting the search performance.
- The fields shown in the Search job properties section provide information about the search job like the total amount of disk space used (in bytes), and the number of possible events that were dropped (for real-time searches).
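Because the same data is exposed over REST, job properties can also be pulled from SPL; a sketch using the rest command (the property names are a few I recall from the Search job properties list):
  | rest /services/search/jobs
  | table sid, label, runDuration, diskUsage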
What are the different types of search commands?
- streaming
- non-streaming
- transforming
- generating
Describe streaming commands:
- Operate on each event individually
- Distributable streaming commands run on the indexers
- eval, fields, rename, regex
- Improves processing time, but all preceding commands must also be able to run on the indexer; otherwise that part of the search runs on the SH.
- Order of events does not matter.
- Centralized (stateful) streaming commands run on the search head
- head, streamstats
- Order of events matters
- Only run on the search head
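A sketch contrasting the two flavors (index and fields hypothetical): the first pipeline is entirely distributable streaming, so it runs on the indexers; the second uses streamstats, which is stateful and therefore runs on the search head.
  index=web error
  | eval kb = bytes/1024
  | rename clientip AS src_ip
  | fields host, src_ip, kb

  index=web error
  | streamstats count AS running_total BY host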
Describe non-streaming commands:
- Force the entire set of events to the search head.
- sort, dedup, top
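For example (hypothetical index and field), once dedup or sort appears, the full matching event set must be collected on the search head before the command can complete:
  index=web
  | dedup clientip
  | sort -_time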
Describe transforming commands:
- Non-streaming commands that operate on the entire dataset
- Generate a reporting data structure
- chart, timechart, stats
- Can be either streaming-reporting or reporting
- Streaming-reporting commands (stats, chart) generate output in batches
- Reporting commands (cluster, geostats) take all events at once
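Two sketches against a hypothetical web index; both consume the whole result set and emit a table rather than events:
  index=web
  | stats count BY status

  index=web
  | timechart span=1h count BY host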
Describe generating commands:
- Invoked at the beginning of a search with a leading |
- Do not expect or require input.
- dbinspect, datamodel, inputcsv
- Most generating commands are centralized
- Results are usually returned in a list or table
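For example, dbinspect generates results about an index's buckets without any input events, so the search starts with a leading pipe (_internal exists on most installations):
  | dbinspect index=_internal
  | head 10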