Advanced Searching and Reporting Flashcards
What kinds of searches are prime candidates for optimization?
Searches that run often or query large amounts of data
What is stored in the journal.gz file and the .tsidx files in the buckets on an indexer?
Compressed raw event data is stored in journal.gz
References to the raw events in journal.gz are stored in .tsidx
What are the components of the .tsidx file?
Lexicon with unique terms from event data
Posting list that provides references into the values array
Values array that holds each posting value and a seek address pointing into journal.gz
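The lookup chain above (lexicon term, then posting list, then values array, then seek address) can be pictured with a toy model. This is only a sketch: the dictionary layout, the sample terms, the byte offsets, and the name seek_addresses are illustrative assumptions, not the real on-disk .tsidx format.

```python
# Toy model of a .tsidx file. The layout is an assumption for illustration,
# not Splunk's actual binary format.

# Lexicon: unique terms from the event data, each mapped to a posting list
# (indexes into the values array).
lexicon = {
    "error": [0, 2],
    "login": [1],
    "user": [0, 1, 2],
}

# Values array: one (posting value, seek address) pair per event.
# The seek addresses are made-up byte offsets into journal.gz.
values_array = [(0, 0), (1, 118), (2, 240)]


def seek_addresses(term):
    """Return the journal.gz seek addresses of events containing the term."""
    postings = lexicon.get(term, [])
    return [values_array[p][1] for p in postings]


print(seek_addresses("error"))  # [0, 240]
```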
What is a bloom filter?
A bit array, built for each bucket and for each search string, used to predict whether a lexicon term is likely to be found in the bucket
Are false positives and false negatives possible with a bloom filter?
False positives are possible
False negatives are not possible
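A small generic bloom filter shows why false negatives cannot happen: adding a term sets all of its bit positions, so a later lookup for that term always finds them set. This is a sketch of the general technique, not Splunk's implementation; the class name, the bit array size, and the use of SHA-256 for hashing are assumptions.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter sketch (sizes and hash choice are assumptions)."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, term):
        # Derive several bit positions per term by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term):
        # True means "possibly present" (could be a false positive).
        # False means "definitely absent" (never a false negative).
        return all(self.bits[pos] for pos in self._positions(term))


bucket_filter = BloomFilter()
for term in ["error", "login", "user"]:
    bucket_filter.add(term)

print(bucket_filter.might_contain("error"))
```

A term that was never added can still come back as "possibly present" when other terms happen to have set the same bits, which is the false positive case.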
What is the series of events for retrieving event data with a bloom filter?
- Search string bloom filter is created
- Find buckets in the index within the time range
- Compare search bloom to bucket bloom
- If a match, find search terms in .tsidx
- Use .tsidx to get events from journal.gz
- Do search time extractions for final filter
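The six steps above can be traced in a simplified sketch. Everything here is hypothetical: the bucket dictionaries, the sample events, and the search function are invented for illustration, and a plain Python set stands in for the bloom filter (so this toy never produces false positives, unlike a real one).

```python
# Hypothetical buckets: time range, bloom stand-in, toy tsidx, toy journal.
buckets = [
    {
        "earliest": 0, "latest": 100,
        "tsidx": {"error": [0], "user": [0, 1], "alice": [0],
                  "login": [1], "bob": [1]},
        "journal": ["error user=alice", "login user=bob"],
    },
    {
        "earliest": 200, "latest": 300,
        "tsidx": {"logout": [0], "user": [0], "bob": [0]},
        "journal": ["logout user=bob"],
    },
]
for b in buckets:
    b["bloom"] = set(b["tsidx"])  # exact set used in place of a bit array


def search(terms, start, end, final_filter):
    results = []
    search_bloom = set(terms)                          # 1. search string bloom
    for b in buckets:
        if b["latest"] < start or b["earliest"] > end:
            continue                                   # 2. bucket time range
        if not search_bloom <= b["bloom"]:
            continue                                   # 3. compare blooms
        postings = [set(b["tsidx"].get(t, [])) for t in terms]
        hits = set.intersection(*postings) if postings else set()  # 4. tsidx
        for i in sorted(hits):
            event = b["journal"][i]                    # 5. read from journal
            # 6. search-time extraction of key=value pairs, then final filter
            fields = dict(p.split("=", 1) for p in event.split() if "=" in p)
            if final_filter(fields):
                results.append(event)
    return results


print(search(["error"], 0, 100, lambda f: f.get("user") == "alice"))
```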
What does the job inspector command.search.index inform you of?
The time to get location info in .tsidx
What does the job inspector command.search.rawdata inform you of?
Time to extract event data from journal.gz
What does the job inspector command.search.kv inform you of?
Time to perform search time field extractions
What do you use to calculate performance with the job inspector?
Divide scanCount by the search time to get events per second; this includes the time to read all events from disk
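As a worked example of that ratio, the arithmetic looks like this. The function name events_per_second and the sample numbers are made up for illustration; the real values come from the job inspector.

```python
def events_per_second(scan_count, run_seconds):
    """Events scanned per second of search run time."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return scan_count / run_seconds


# Hypothetical job: 250,000 events scanned in 12.5 seconds.
print(events_per_second(250_000, 12.5))  # 20000.0
```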
In a distributed environment, will a search execute faster if commands run on the search head (SH) or on the indexer?
It executes faster on the indexer
Where are transforming commands executed?
On the search head, where they operate on the entire result set
Does order of events matter when running a transforming command?
No
What are the two types of streaming commands?
Distributable - can run on the indexer
Centralized - always runs on the search head
Is the event order important for streaming commands?
Distributable - No
Centralized - Yes
When is a distributable command run on the search head vs the indexer?
Search head if any preceding commands are executed on search head
Indexer if all preceding commands execute on indexer
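The placement rule can be written as a short function: a command stays on the indexer only while every command before it was distributable. This is a simplified sketch of the rule as stated on these cards; the function name execution_sites, the sample pipeline, and the type labels attached to each command are assumptions for illustration.

```python
def execution_sites(pipeline):
    """Map each (command, type) pair to where it would run under the rule."""
    sites = []
    on_indexer = True
    for name, kind in pipeline:
        # Transforming and centralized commands run on the search head,
        # and everything after them stays there.
        if kind != "distributable":
            on_indexer = False
        sites.append((name, "indexer" if on_indexer else "search head"))
    return sites


# Hypothetical pipeline: search | eval | stats | eval
pipeline = [
    ("search", "distributable"),
    ("eval", "distributable"),
    ("stats", "transforming"),
    ("eval", "distributable"),
]
for name, site in execution_sites(pipeline):
    print(name, "->", site)
```

The second eval is distributable, but it lands on the search head because stats came before it.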
Do streaming commands need the entire event result set prior to executing?
Distributable - no
Centralized - yes
Do streaming commands operate on the entire results set of event data?
No, they operate on each event returned by a search
How does having more disk reads affect search execution?
More disk reads lead to longer search execution time
How does Splunk decide which events to read after determining which buckets match the bloom filter?
Tokens (or terms) from the search string are compared to tokens in the events; a match results in the event being read from disk
How are event tokens derived?
Derived by breaking up searches and event data using segmenters
What are segmenters?
Major or minor breakers that separate searches and events into smaller pieces
What are major breakers?
Characters used to divide words, phrases, and terms into large tokens: space, newline, carriage return, tab, [ ] ( ) { } ! ? ; , ' " &
What are minor breakers?
Used to divide large tokens into smaller tokens: / : = @ . - $ # % \ _
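Two-pass segmentation can be sketched as follows: split on major breakers first, then split each large token on minor breakers. This is a rough approximation, not Splunk's segmenter. The breaker sets are the ones listed on these cards, the names split_on and tokenize are hypothetical, and keeping both the large and the small tokens is an assumption.

```python
# Breaker sets taken from the cards above (not the full Splunk defaults).
MAJOR = " \n\r\t[](){}!?;,'\"&"
MINOR = "/:=@.-$#%\\_"


def split_on(text, breakers):
    """Split text on any breaker character, dropping empty pieces."""
    table = {ord(ch): " " for ch in breakers}
    return text.translate(table).split()


def tokenize(text):
    """Return the large tokens plus the smaller tokens inside them."""
    tokens = set()
    for large in split_on(text, MAJOR):
        tokens.add(large)
        tokens.update(split_on(large, MINOR))
    return tokens


print(sorted(tokenize("user=alice src=10.0.0.5")))
```

Here user=alice is kept as one large token and also yields user and alice, so a search for either form can match the event.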