Advanced Searching and Reporting Flashcards
What kinds of searches are prime candidates for optimization?
Searches that run often or query broad amounts of data
What is stored in the journal.gz and .tsidx files in the buckets on the indexers?
Compressed raw event data is stored in journal.gz
References to the raw events in journal.gz are stored in .tsidx
What are the components of the .tsidx file?
Lexicon: the unique terms from the event data
Posting list: provides references into the values array
Values array: holds the posting value and a seek address that points into journal.gz
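A toy Python model may help connect the pieces. It is a sketch under simplifying assumptions, not Splunk's on-disk format: tokenization is whitespace-only, the posting list and values array are collapsed into one list of seek addresses per term, and a seek address is a byte offset into an uncompressed in-memory journal. The names `lexicon`, `postings`, `read_event`, and `lookup` are hypothetical.

```python
# Toy model of the .tsidx lookup path, NOT Splunk's on-disk format.
# Assumptions: whitespace-only tokenization, posting list and values array
# collapsed into one list of seek addresses, seek address = byte offset.

raw_events = ["alice login failed", "bob login ok", "alice logout"]

# Journal: raw events concatenated one per line, remembering each byte offset.
journal = b""
offsets = []
for raw in raw_events:
    offsets.append(len(journal))
    journal += raw.encode() + b"\n"

lexicon = {}    # unique term -> index into postings
postings = []   # per term: list of seek addresses into the journal

for raw, address in zip(raw_events, offsets):
    for term in set(raw.split()):  # set(): record each event once per term
        if term not in lexicon:
            lexicon[term] = len(postings)
            postings.append([])
        postings[lexicon[term]].append(address)


def read_event(address):
    """Seek to `address` in the journal and read up to the end of the line."""
    end = journal.index(b"\n", address)
    return journal[address:end].decode()


def lookup(term):
    """Lexicon -> posting list -> seek addresses -> raw events."""
    if term not in lexicon:
        return []
    return [read_event(address) for address in postings[lexicon[term]]]


print(lookup("alice"))  # ['alice login failed', 'alice logout']
```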
What is a bloom filter?
A bit array, associated with each bucket and with each search string, used to predict whether a lexicon term is likely to be found in the bucket
Are false positives and false negatives possible with bloom filters?
False positives are possible
False negatives are not possible
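A minimal bloom filter sketch in Python shows why only false positives occur: a term that was added always has all of its bits set, while an absent term may find its bits already set by other terms. The bit-array size, hash count, and SHA-256-based hashing are illustrative choices, not Splunk's.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: a bit array plus k hash functions (sizes are illustrative)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored in a Python int

    def _positions(self, term):
        # Derive k bit positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False is always correct (no false negatives);
        # True can be wrong if other terms happened to set the same bits.
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


bucket_filter = BloomFilter()
for term in ["error", "web01", "timeout"]:
    bucket_filter.add(term)

print(bucket_filter.might_contain("error"))    # True: every added term is reported
print(bucket_filter.might_contain("success"))  # usually False, occasionally a false positive
```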
What is the series of events for retrieving event data with a bloom filter?
- A bloom filter is created for the search string
- Find the buckets in the index that fall within the time range
- Compare the search bloom filter to each bucket's bloom filter
- If they match, look up the search terms in the bucket's .tsidx
- Use the .tsidx to retrieve the events from journal.gz
- Perform search-time field extractions and apply the final filter
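The steps above can be sketched end to end in Python. Everything here is a hypothetical simplification (one hash function for the bloom filter, whitespace tokenization, list indexes as seek addresses, made-up timestamps); the numbered comments map to the list.

```python
import zlib
from dataclasses import dataclass, field

NUM_BITS = 128  # illustrative bloom filter size


def bloom_bits(terms):
    """Bit positions set by a group of terms (a single hash function, for brevity)."""
    return {zlib.crc32(t.encode()) % NUM_BITS for t in terms}


@dataclass
class Bucket:
    earliest: int  # time range covered by the bucket (hypothetical epoch seconds)
    latest: int
    journal: list  # raw events as (timestamp, text) pairs
    tsidx: dict = field(default_factory=dict)  # term -> seek addresses (list indexes)
    bloom: set = field(default_factory=set)

    def __post_init__(self):
        for addr, (_, text) in enumerate(self.journal):
            for term in text.split():
                self.tsidx.setdefault(term, []).append(addr)
        self.bloom = bloom_bits(self.tsidx)


def search(buckets, terms, earliest, latest, final_filter=lambda text: True):
    """Return matching event texts; `terms` must be non-empty."""
    search_bloom = bloom_bits(terms)                    # 1. bloom filter for the search string
    results = []
    for b in buckets:
        if b.latest < earliest or b.earliest > latest:  # 2. only buckets in the time range
            continue
        if not search_bloom <= b.bloom:                 # 3. search bloom vs bucket bloom
            continue
        hits = [set(b.tsidx.get(t, [])) for t in terms]  # 4. look up terms in .tsidx
        for addr in sorted(set.intersection(*hits)):
            ts, text = b.journal[addr]                  # 5. read the event from the journal
            if earliest <= ts <= latest and final_filter(text):  # 6. final filter
                results.append(text)
    return results


buckets = [
    Bucket(0, 99, [(10, "error disk full"), (20, "login ok")]),
    Bucket(100, 199, [(150, "error timeout"), (160, "error disk slow")]),
]
print(search(buckets, ["error", "disk"], 0, 199))  # ['error disk full', 'error disk slow']
```

A bloom false positive only costs an extra .tsidx lookup here, which then finds no addresses; it never produces a wrong result.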
What does the job inspector command.search.index inform you of?
The time spent getting location information from the .tsidx files
What does the job inspector command.search.rawdata inform you of?
Time to extract event data from journal.gz
What does the job inspector command.search.kv inform you of?
Time to perform search-time field extractions
What do you use to calculate performance with the job inspector?
scanCount divided by the run time gives events per second, including the time to read all events from disk
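The arithmetic is a single division. The helper below is hypothetical, the input values are made up, and the duration is assumed to be the job's total run time in seconds.

```python
def events_per_second(scan_count, run_duration_secs):
    """Events scanned per second; assumes run_duration_secs is the job's total run time."""
    if run_duration_secs <= 0:
        raise ValueError("run duration must be positive")
    return scan_count / run_duration_secs


print(events_per_second(1_200_000, 4.0))  # 300000.0
```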
In a distributed environment, will a search execute faster if its commands run on the search head (SH) or on the indexers?
Faster on the indexers
Where are transforming commands executed?
On the search head, where they operate on the entire result set
Does order of events matter when running a transforming command?
No
What are the two types of streaming commands?
Distributable - can run on the indexer
Centralized - always run on the search head
Is the event order important for streaming commands?
Distributable - No
Centralized - Yes
When is a distributable command run on the search head vs the indexer?
On the search head if any preceding command executes on the search head
On the indexer if all preceding commands execute on the indexer
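The placement rule from the cards above can be expressed as a small function. The command classifications in the example (search and eval as distributable streaming, streamstats as centralized streaming) follow Splunk's documented command types; the function itself is a hypothetical sketch of the flashcard model, not how Splunk actually plans a search.

```python
from enum import Enum


class Kind(Enum):
    DISTRIBUTABLE = "distributable streaming"
    CENTRALIZED = "centralized streaming"
    TRANSFORMING = "transforming"


def placement(pipeline):
    """Return (command, location) pairs for a list of (command, Kind) pairs."""
    on_search_head = False
    placed = []
    for name, kind in pipeline:
        if kind is not Kind.DISTRIBUTABLE:
            # Centralized streaming and transforming commands run on the search head,
            # and every command after them has to follow.
            on_search_head = True
        placed.append((name, "search head" if on_search_head else "indexer"))
    return placed


pipeline = [
    ("search", Kind.DISTRIBUTABLE),
    ("eval", Kind.DISTRIBUTABLE),
    ("streamstats", Kind.CENTRALIZED),
    ("eval", Kind.DISTRIBUTABLE),
]
for name, where in placement(pipeline):
    print(f"{name:12} -> {where}")
```

The second eval is distributable but still lands on the search head, because a preceding command already runs there.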
Do streaming commands need the entire event result set prior to executing?
Distributable - No
Centralized - Yes
Do streaming commands operate on the entire results set of event data?
No, they operate on each event returned by a search
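Python generators give a loose analogy: a streaming command handles one event at a time as it arrives, while a transforming command has to consume the whole result set before it can emit anything. The function names are hypothetical stand-ins for an eval and a stats count by host.

```python
from collections import Counter


def eval_upper_host(events):
    """Streaming: handles one event at a time and yields it immediately."""
    for event in events:
        yield {**event, "host": event["host"].upper()}


def stats_count_by_host(events):
    """Transforming: must consume the entire result set before producing output."""
    counts = Counter(event["host"] for event in events)
    return [{"host": h, "count": c} for h, c in sorted(counts.items())]


events = [{"host": "web01"}, {"host": "web02"}, {"host": "web01"}]
print(stats_count_by_host(eval_upper_host(events)))
# [{'host': 'WEB01', 'count': 2}, {'host': 'WEB02', 'count': 1}]
```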
How does having more disk reads affect search execution?
More disk reads leads to longer search execution time
How does Splunk decide which events to read after determining which buckets match the bloom filter?
Tokens (terms) from the search string are compared to tokens in the events; a match results in the event being read from disk
How are event tokens derived?
Derived by breaking up searches and event data using segmenters
What are segmenters?
Major or minor breakers that separate searches and events into smaller pieces
What are major breakers?
Characters used to divide words, phrases, and terms into large tokens: space, newline, carriage return, tab, [ ] ( ) { } ! ? ; , ' " &
What are minor breakers?
Characters used to divide large tokens into smaller tokens: / : = @ . - $ # % \ _
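Segmentation can be sketched as a two-pass split: first on major breakers, then each major token on minor breakers. The breaker sets are copied from the two cards above (Splunk's default lists in segmenters.conf are longer), and returning both levels is an assumption that lets a TERM()-style lookup find whole major-breaker-bounded tokens. The helper names are hypothetical.

```python
# Breaker sets from the two cards above (not Splunk's complete defaults).
MAJOR_BREAKERS = " \t\n\r[](){}!?;,'\"&"
MINOR_BREAKERS = "/:=@.-$#%\\_"


def split_on(text, breakers):
    """Split `text` on any character in `breakers`, dropping empty pieces."""
    tokens, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def segment(raw):
    """Return (major_tokens, minor_tokens) for a raw event or search string."""
    major = split_on(raw, MAJOR_BREAKERS)
    minor = [piece for tok in major for piece in split_on(tok, MINOR_BREAKERS)]
    return major, minor


major, minor = segment("src=10.1.2.3 user=alice [login]")
print(major)  # ['src=10.1.2.3', 'user=alice', 'login']
print(minor)  # ['src', '10', '1', '2', '3', 'user', 'alice', 'login']
```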
Where are tokens created from event data stored?
.tsidx files
Where can you see how the base search was tokenized?
Use the job inspector and look at the tokens after "base lispy"
In prefix notation where does the operator appear?
Before the operands. Example: the search index=web 21.12 produces the lispy [ AND 12 21 index::web ]
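Prefix notation is straightforward to evaluate recursively: read the operator, then evaluate its operands. This hypothetical evaluator represents a lispy as nested Python lists and an event as a set of tokens, with the index-time field included as an `index::web` token.

```python
def evaluate(lispy, tokens):
    """Evaluate a prefix-notation expression such as ["AND", "12", "21", "index::web"]."""
    if isinstance(lispy, str):
        return lispy in tokens          # an operand: is the term present?
    operator, *operands = lispy         # the operator comes first
    results = [evaluate(operand, tokens) for operand in operands]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    if operator == "NOT":
        return not results[0]
    raise ValueError(f"unknown operator {operator!r}")


expr = ["AND", "12", "21", "index::web"]
event_tokens = {"index::web", "21", "12", "GET"}
print(evaluate(expr, event_tokens))  # True
```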
What is a directive?
An instruction for how part of a search should be processed
What does the case-sensitive TERM() directive do?
Forces Splunk to match only a complete value: the search uses major breakers only and ignores minor breakers, so the term must be bounded by major breakers in the event
Can key value pairs be passed into the TERM() directive?
Yes, because key=value contains only a minor breaker (=)
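The difference between a plain term and TERM() can be sketched with the major/minor split. This is a simplified model with hypothetical helper names and breaker sets taken from the segmenter cards; it shows why TERM(10.0.0.1) fails against src=10.0.0.1, where the value is bounded by the minor breaker =, while TERM(src=10.0.0.1) succeeds.

```python
# Breaker sets from the segmenter cards (assumption: not Splunk's full defaults).
MAJOR = " \t\n\r[](){}!?;,'\"&"
MINOR = "/:=@.-$#%\\_"


def split_on(text, breakers):
    """Split `text` on any character in `breakers`, dropping empty pieces."""
    tokens, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def matches_plain(event, term):
    """Index-level check for a plain term: every minor piece must be an event token."""
    event_pieces = {p for tok in split_on(event, MAJOR) for p in split_on(tok, MINOR)}
    return all(piece in event_pieces for piece in split_on(term, MINOR))


def matches_term(event, term):
    """TERM(): the whole term must be one token bounded by major breakers."""
    return term in split_on(event, MAJOR)


event = "action=login src=10.0.0.1 user=alice"
print(matches_plain(event, "10.0.0.1"))     # True
print(matches_term(event, "10.0.0.1"))      # False: bounded by "=", a minor breaker
print(matches_term(event, "src=10.0.0.1"))  # True: key=value passed to TERM()
```

The plain check models only the index-level filter; events that pass it still go through the final search-time filtering step.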
Does negation in a search yield negation in a lispy?
It works for negating single terms but not terms that contain minor breakers, UNLESS you use TERM()
Example that works: NOT TERM(example.1)
Example that does not work: NOT example.1
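One way to see why NOT example.1 cannot simply be pushed into the lispy: at the token level it would become NOT (example AND 1), which also rejects events that merely contain both tokens separately. The sketch below illustrates that reasoning with breaker sets from the segmenter cards and hypothetical names; it is not Splunk's implementation.

```python
# Breaker sets from the segmenter cards (assumption: not Splunk's full defaults).
MAJOR = " \t\n\r[](){}!?;,'\"&"
MINOR = "/:=@.-$#%\\_"


def split_on(text, breakers):
    """Split `text` on any character in `breakers`, dropping empty pieces."""
    tokens, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def token_sets(event):
    """Major-breaker tokens and minor-breaker pieces of an event."""
    major = split_on(event, MAJOR)
    minor = {p for tok in major for p in split_on(tok, MINOR)}
    return set(major), minor


event = "example 1 was skipped"  # contains both tokens but not the string example.1
major, minor = token_sets(event)

# Naive push-down of `NOT example.1` as NOT (example AND 1) at the token level:
naive_keep = not ({"example", "1"} <= minor)
# `NOT TERM(example.1)` tests the whole major-bounded token instead:
term_keep = "example.1" not in major

print(naive_keep, term_keep)  # False True: the naive version wrongly drops the event
```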
Do wildcards in a search work in a lispy?
Only when they are in the middle or at the end of a string that contains no major or minor breakers
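A sorted lexicon illustrates why a wildcard at the end or in the middle is cheap: the part before the * is a prefix that binary search can locate, while a leading * leaves no prefix and forces a scan of every term. The lexicon contents and helper names are hypothetical, and only a single * is handled.

```python
import bisect

# Hypothetical lexicon, kept sorted so prefixes can be found by binary search.
lexicon = sorted(["error", "errors", "erratic", "event", "warn", "web01", "web02"])


def prefix_matches(prefix):
    """Terms starting with `prefix`, located with a binary search over the lexicon."""
    start = bisect.bisect_left(lexicon, prefix)
    out = []
    for term in lexicon[start:]:
        if not term.startswith(prefix):
            break
        out.append(term)
    return out


def wildcard_lookup(pattern):
    """Single '*' only. A leading '*' gives an empty prefix, i.e. a full lexicon scan."""
    prefix, _, suffix = pattern.partition("*")
    return [
        t for t in prefix_matches(prefix)
        if t.endswith(suffix) and len(t) >= len(prefix) + len(suffix)
    ]


print(wildcard_lookup("web*"))   # ['web01', 'web02']
print(wildcard_lookup("err*s"))  # ['errors']
```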
How do index time fields appear in lispy?
They appear as field::value instead of field=value as written in a search
As more fields are extracted at index time what happens to the size of .tsidx files and resource usage at indexer?
Both are increased
What types of data can use indexed field extraction for a specific source type?
Source types with certain kinds of structured data (JSON, CSV, W3C)
Do comparisons against fields extracted at search time result in filtering events returned from the disk?
No. All events from the source type are still read from disk (the equals operator is the exception)
What would the lispy be for the search index=web sourcetype=a_c status>400?
Lispy: [ AND index::web sourcetype::a_c ]
The status>400 comparison is not included, so it does not filter what is read from disk
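A rough lispy builder for simple space-separated searches reproduces the answer above. Assumptions, all labeled in the code: index, sourcetype, source, and host are treated as index-time fields, comparison operators are dropped, a search-time field=value keeps only its value as a term (consistent with the equals exception noted earlier), and bare terms are not split on minor breakers. Terms are sorted, as the 12/21 ordering in the prefix-notation example suggests.

```python
import re

# Assumption: these default fields are the index-time fields in this sketch.
INDEXED_FIELDS = {"index", "sourcetype", "source", "host"}


def base_lispy(search):
    """Hypothetical lispy builder for simple, space-separated searches."""
    terms = []
    for clause in search.split():
        match = re.fullmatch(r"(\w+)(>=|<=|!=|=|>|<)(.+)", clause)
        if not match:
            terms.append(clause)               # bare term (not split on minor breakers here)
            continue
        name, op, value = match.groups()
        if op != "=":
            continue                           # comparisons do not filter disk reads
        if name in INDEXED_FIELDS:
            terms.append(f"{name}::{value}")   # index-time field appears as field::value
        else:
            terms.append(value)                # assumption: search-time equality keeps the value
    return "[ AND " + " ".join(sorted(terms)) + " ]"


print(base_lispy("index=web sourcetype=a_c status>400"))
# [ AND index::web sourcetype::a_c ]
```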
Does the TERM() directive work with aliases?
No
When are lookups performed relative to the lispy?
When the lookup is part of the base search itself, it can be done before the lispy is created
Lookups used by transforming or other piped commands are done after the lispy is created
What does a subsearch do?
Takes the results of an inner search and combines them with the outer search using a Boolean AND; an OR is inserted between the individual inner-search results
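The expansion can be sketched as string building: each inner-search row becomes a parenthesized clause, the clauses are joined with OR, and the result is ANDed with the outer search. Only the requested fields are kept. The exact text Splunk generates may differ; the function names and example values are hypothetical.

```python
def format_subsearch(rows, fields):
    """Turn inner-search rows into an OR'd filter, keeping only `fields`."""
    clauses = []
    for row in rows:
        pairs = [f'{f}="{row[f]}"' for f in fields if f in row]
        if pairs:
            clauses.append("( " + " AND ".join(pairs) + " )")
    return "( " + " OR ".join(clauses) + " )"


def combine(outer, rows, fields):
    """AND the outer search with the expanded subsearch."""
    return f"{outer} AND {format_subsearch(rows, fields)}"


rows = [{"clientip": "10.0.0.5", "count": 12}, {"clientip": "10.0.0.9", "count": 7}]
print(combine("index=web", rows, ["clientip"]))
# index=web AND ( ( clientip="10.0.0.5" ) OR ( clientip="10.0.0.9" ) )
```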
What command do subsearches typically begin with?
search, enclosed in square brackets: [ search search_criteria… ]
Is an inner search or outer search completed first?
Inner subsearch completed first
How do you send only specific fields of a subsearch to the outer search?
Using the fields or return command