Advanced Searching and Reporting Flashcards
What kinds of searches are prime candidates for optimization?
Searches that run often or query large amounts of data
What is stored in a journal.gz file and a .tsidx file on the buckets within indexers?
Compressed raw event data is stored in journal.gz
References to the journal's raw events are stored in .tsidx
What are the components of the .tsidx file?
Lexicon with unique terms from event data
Posting list that provides references into the values array
Values array that holds each posting value and a seek address as a reference into journal.gz
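The relationship between lexicon, posting list, values array and journal can be pictured with a toy Python model. This is only a sketch under simplifying assumptions: real .tsidx files are binary, and every name here (journal, values_array, lexicon, lookup) is hypothetical.

```python
# Toy bucket: the journal holds raw events, the index structures point into it.
journal = [
    "user=alice action=login",
    "user=bob action=logout",
    "user=alice action=logout",
]

# Values array: one entry per event, holding a seek address into the journal.
values_array = [{"seek_address": i} for i in range(len(journal))]

# Lexicon: each unique term maps to a posting list of positions in the values array.
lexicon = {}
for position, event in enumerate(journal):
    for term in set(event.replace("=", " ").split()):
        lexicon.setdefault(term, []).append(position)


def lookup(term):
    """Follow term -> posting list -> values array -> journal."""
    postings = lexicon.get(term, [])
    return [journal[values_array[p]["seek_address"]] for p in postings]


print(lookup("alice"))
```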
What is a bloom filter?
A bit array associated with each bucket and each search string, used to predict whether a lexicon term is likely to be found in the bucket
Are false positives and negatives possible with a bloom filter?
False positives are possible
False negatives are not possible
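A minimal Python sketch of a bloom filter shows why this holds. It assumes a 64-bit array and three SHA-256 based hash positions; these sizes and the class itself are illustrative choices, not how Splunk builds its bucket bloom filters.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, term):
        # Derive several bit positions per term by salting the hash input.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(term))


bucket_bloom = BloomFilter()
bucket_terms = ["error", "login", "alice"]
for term in bucket_terms:
    bucket_bloom.add(term)

# Added terms are always reported as possibly present (no false negatives).
print(all(bucket_bloom.might_contain(t) for t in bucket_terms))
# A term never added usually gives False, but colliding bits can give True.
print(bucket_bloom.might_contain("timeout"))
```

Bits are only ever set, never cleared, so a term that was added can never be reported as absent.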
What is the series of events for retrieving event data with a bloom filter?
- Search string bloom filter is created
- Find buckets in index within timerange
- Compare search bloom to bucket bloom
- If a match, find search terms in .tsidx
- Use .tsidx to get events from journal.gz
- Do search time extractions for final filter
What does the job inspector command.search.index inform you of?
The time to get location info in .tsidx
What does the job inspector command.search.rawdata inform you of?
Time to extract event data from journal.gz
What does the job inspector command.search.kv inform you of?
Time to perform search time field extractions
What do you use to calculate performance with the job inspector?
scanCount divided by the search time gives events per second, including the time to read all events from disk
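As a worked example of that ratio in Python (the numbers are made up, and run_seconds is assumed to be the total run time reported for the job):

```python
def events_per_second(scan_count, run_seconds):
    """scanCount divided by elapsed search time."""
    return scan_count / run_seconds


# Hypothetical job: 1,200,000 events scanned in 48 seconds.
rate = events_per_second(1_200_000, 48.0)
print(rate)  # 25000.0
```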
In a distributed environment will the search execute faster if commands are on the SH or the indexer?
Execute faster on the indexer
Where are transforming commands executed?
On the search head, where they operate on the entire result set
Does order of events matter when running a transforming command?
No
What are the two types of streaming commands?
Distributable - can run on the indexers
Centralized - always run on the search head
Is the event order important for streaming commands?
Distributable - No
Centralized - Yes
When is a distributable command run on the search head vs the indexer?
Search head if any preceding commands are executed on search head
Indexer if all preceding commands execute on indexer
Do streaming commands need the entire event result set prior to executing?
Distributable - no
Centralized - yes
Do streaming commands operate on the entire results set of event data?
No they operate on each event returned by a search
How does having more disk reads affect search execution?
More disk reads leads to longer search execution time
How does Splunk decide which events to read after determining which buckets match bloom filters?
Tokens (or terms) from the search string are compared to tokens in events, and a match results in the event being read from disk
How are event tokens derived?
Derived by breaking up searches and event data using segmenters
What are segmenters?
Major or minor breakers that separate searches and events into smaller pieces
What are major breakers?
Character set used to divide words, phrases, and terms into large tokens: space, newline, carriage return, tab, [ ] ( ) { } ! ? ; , ' " &
What are minor breakers?
Used to divide large tokens into smaller tokens: / : = @ . - $ # % \ _
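A simplified Python sketch of two-pass segmentation: split on major breakers first, then split each large token on minor breakers. The breaker sets are taken from the cards above; the function names are hypothetical, and real Splunk segmentation is configured in segmenters.conf and may emit more token variants than this.

```python
import re

MAJOR = " \n\r\t[](){}!?;,'\"&"
MINOR = "/:=@.-$#%\\_"


def split_on(text, breakers):
    # Build a character class from the breaker set and drop empty pieces.
    pattern = "[" + re.escape(breakers) + "]+"
    return [piece for piece in re.split(pattern, text) if piece]


def tokenize(event):
    tokens = set()
    for major_token in split_on(event, MAJOR):
        tokens.add(major_token)                      # large token
        tokens.update(split_on(major_token, MINOR))  # smaller tokens
    return tokens


tokens = tokenize("src=10.0.0.1 user=alice")
print(sorted(tokens))
```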
Where are tokens created from event data stored?
.tsidx files
Where can you see how the base search was tokenized?
Use the job inspector and look for the token after ‘base lispy’
In prefix notation where does the operator appear?
Before the operands. Ex: the search index=web 21.12 produces the lispy [ AND 12 21 index::web ]
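To make prefix notation concrete, here is a small Python evaluator for lispy-style expressions written as nested lists. It is an illustration only: the matches function and the list encoding are assumptions, not Splunk's parser.

```python
def matches(expr, tokens):
    """Evaluate a prefix expression such as ["AND", "12", "21", "index::web"]."""
    if isinstance(expr, str):
        return expr in tokens          # a bare term is a token lookup
    operator, *operands = expr         # operator comes first, operands follow
    results = [matches(operand, tokens) for operand in operands]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    if operator == "NOT":
        return not results[0]
    raise ValueError(f"unknown operator: {operator}")


lispy = ["AND", "12", "21", "index::web"]
print(matches(lispy, {"12", "21", "index::web", "GET"}))
print(matches(lispy, {"12", "index::web"}))
```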
What is a directive?
An instruction for how part of a search should be processed
What does the case sensitive ‘TERM’ directive do?
Forces Splunk to match the complete value by segmenting only on major breakers and skipping minor breakers - the term must be bounded by major breakers
Can key value pairs be passed into the TERM() directive?
Yes, because key=value contains only a minor breaker
Does negation in a search yield negation in a lispy?
Works for negating single terms but not for terms that include minor breakers UNLESS you use TERM()
Ex that will work: NOT TERM(example.1)
Ex does not work: NOT example.1
Do wildcards in a search work in a lispy?
Only when they are in the middle or at the end of a string and the string has no major or minor breakers
How do index time fields appear in lispy?
Appear as field::value instead of field=value like they do in a search
As more fields are extracted at index time what happens to the size of .tsidx files and resource usage at indexer?
Both are increased
What type of data can have indexed field extractions applied to a specific sourcetype?
Sourcetypes with certain types of structured data (JSON, CSV, W3C)
Do comparisons against fields extracted at search time result in filtering events returned from the disk?
No, all events from a sourcetype will still be read from the disk (except for an equals operator).
What would the lispy be for a search: index=web sourcetype=a_c status>400
Lispy: [ AND index::web sourcetype::a_c ]
Comparisons are not included when filtering what is read from disk
Does the TERM() directive work with aliases?
No
When are lookups completed while using a lispy?
In the search itself, lookup can be done before the lispy is created.
Lookups done for transformations or other pipe commands can be done post lispy
What does a subsearch do?
Takes the results of an inner search and combines them with the outer search using a boolean AND - a boolean OR is inserted between each inner search result
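A rough Python sketch of how inner results can be pictured as a search string: fields within one result are ANDed, results are ORed, and the whole group is ANDed with the outer search. The helper name and sample data are hypothetical.

```python
def expand_subsearch(rows):
    clauses = []
    for row in rows:
        # Fields within one inner result are combined with AND.
        pairs = " AND ".join(f'{field}="{value}"' for field, value in row.items())
        clauses.append(f"( {pairs} )")
    # Separate inner results are combined with OR.
    return "( " + " OR ".join(clauses) + " )"


inner_results = [{"src_ip": "10.1.1.5"}, {"src_ip": "10.1.1.9"}]
# Terms placed side by side in a search are implicitly ANDed.
outer_search = "index=security " + expand_subsearch(inner_results)
print(outer_search)
```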
What command do subsearches typically begin with?
[ search search_criteria… ]
Is an inner search or outer search completed first?
Inner subsearch completed first
How do you send only specific fields of a subsearch to the outer search?
Using the fields or return command
What is returned by default when using the return command in a subsearch?
The first value of each specified field is returned as a field name and field value pair
How do you adjust the defaults for the |return command in a subsearch?
Specify a number as the count, and omit the field name from the output by putting $ before the field name:
| return 5 $ip_address => returns the first 5 values of ip_address
Can you return subsearch results as an alias?
Yes, use |return alias_name=field
What is the time and event count limit on a subsearch?
60 seconds and 10,000 events
Over what time range will a subsearch execute if the root search is run in real-time?
Runs over all time by default unless earliest and latest are set in the subsearch
Should you use stats and/or eval over using subsearch?
Yes, whenever possible especially if searches are executed often
What does the append command do?
Using only historical data, appends results of subsearch to the current results
Does the |append command overlay primary and subsearch results?
No, they will appear one after another in a graph (when used with both stats and timechart as main search)
How do you get appended results to overlay primary search results?
Use | stats first(*) as *
Or | timechart first(*) as *
What command will overlay search and subsearch results in one step?
|appendcols
When should you take caution using the appendcols command?
When used with stats because if there is a null value in the data, the empty values get pushed up and values may not be aligned to fields appropriately
For larger amounts of data, should you use append and appendcols?
No, they should be avoided as more efficient searches should be used
If a search can be done with stats or join/union which is usually more efficient?
stats command is usually more efficient
What does the join command do?
Combines results from two searches with a default of inner join - results only include events from first search that match the second search
What does the left, or outer join argument of the join command define?
Includes all results from the first search, plus the matching events from the second search
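The difference between inner and left (outer) joins can be sketched in Python. This is a simplified model: it keeps only the first match per key, lets fields from the second search overwrite same-named fields, and uses hypothetical sample data.

```python
def join(left_rows, right_rows, field, join_type="inner"):
    # Index the second search by the join field, keeping the first match only.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[field], row)

    joined = []
    for row in left_rows:
        match = lookup.get(row[field])
        if match is not None:
            joined.append({**row, **match})
        elif join_type in ("left", "outer"):
            joined.append(dict(row))  # keep unmatched rows from the first search
    return joined


users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
logins = [{"id": 1, "last_login": "09:15"}]

print(join(users, logins, "id"))               # inner: only id=1
print(join(users, logins, "id", "left"))       # left: id=1 enriched, id=2 kept
```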
How do you define which fields to use for the join command?
Define after the |join command:
| join fields_to_use
What search combines two or more result sets into a single search?
union
What can be used to specify a result search when using the union command?
Use a regular search, a subsearch, or a data model (defined with the datamodel command)
Does the union command execute on the search head or indexer?
As a distributable streaming command it executes in parallel on indexers if all searches are distributable otherwise on search head
What is the union syntax?
union datamodel:name1.dataset datamodel:name2.dataset …
Search1 |union [search2]
Search1 |union datamodel:search2 …
What command puts numerical values into discrete sets?
| bin binoptions field
where binoptions is optional
How are bin sizes set?
Through the span=size option
| bin fieldname span=size
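A small Python sketch of what a fixed span does to numeric values. It assumes non-negative values and labels each bin as a "start-end" range; the function name is hypothetical.

```python
def bin_value(value, span):
    """Map a number to the range of width `span` that contains it."""
    start = (value // span) * span
    return f"{start}-{start + span}"


response_times = [3, 17, 20, 49]
binned = [bin_value(v, 10) for v in response_times]
print(binned)  # ['0-10', '10-20', '20-30', '40-50']
```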
What happens if span creates more buckets than the max specified by bins?
bins is ignored
What command reformats chartable, tabular output as a stats-like output?
| untable xfield yfield datafield
xfield: x-axis, yfield: data labels, datafield: field with the charted data
What command reformats stats-like output as chartable, tabular output?
xyseries xfield yfield datafield
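The two reshaping directions can be sketched in Python on lists of dicts. The function names mirror the commands, but the implementations are simplified illustrations and the sample table is made up.

```python
def untable(rows, xfield, yfield, datafield):
    """Tabular rows -> one stats-like row per (x, column) pair."""
    out = []
    for row in rows:
        for column, value in row.items():
            if column != xfield:
                out.append({xfield: row[xfield], yfield: column, datafield: value})
    return out


def xyseries(rows, xfield, yfield, datafield):
    """Stats-like rows -> one tabular row per x value."""
    table = {}
    for row in rows:
        wide = table.setdefault(row[xfield], {xfield: row[xfield]})
        wide[row[yfield]] = row[datafield]
    return list(table.values())


chart = [
    {"host": "web1", "GET": 5, "POST": 2},
    {"host": "web2", "GET": 3, "POST": 1},
]
long_rows = untable(chart, "host", "method", "count")
print(long_rows[0])
print(xyseries(long_rows, "host", "method", "count") == chart)
```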
What does the foreach command do?
foreach replaces the <<FIELD>> token with each field name that matches: | foreach www* [eval <<FIELD>> = round(<<FIELD>>/2)]
What are multivalue eval functions used for?
Used to analyze and format multivalue data
What command do you use to convert a single value into a multivalue field?
|makemv command
What do JSON array contents become when auto extracted by Splunk?
Contents become multivalue fields
In JSON data, what do the {} and [] indicate?
{} is an object, a grouping of field value pairs
[] is an array, an ordered list of values (often objects)
How are fields nested within a JSON event represented when extracted into Splunk?
Event_name{}.field1
Event_name{}.field2 …
Which commands can you use with multivalue functions?
eval, where, and fieldformat commands
What does the mvsort() function do?
Takes a multivalue field and returns the values sorted lexicographically
How are numbers sorted in lexicographical order?
Numbers are sorted before letters and are sorted based on the first digit not the number as a whole: 100 200 70 9 is in order
Are uppercase or lowercase letters first in lexicographical order?
Uppercase is first
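Python's default string sort compares character by character in code point order, which makes it a convenient stand-in for illustrating these rules (an analogy, not Splunk's implementation):

```python
values = ["9", "100", "banana", "70", "Apple", "200", "apple"]

# Strings compare left to right, so "100" < "200" < "70" < "9",
# digits sort before letters, and uppercase sorts before lowercase.
ordered = sorted(values)
print(ordered)  # ['100', '200', '70', '9', 'Apple', 'apple', 'banana']
```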
Can functions and commands process fields that contain a {}?
No, so mv fields extracted from JSON will have to be renamed
What does the mvfilter function do?
Filters (refines) one mvfield based on a boolean expression
How do you remove null values returned from mvfilter function?
Use mvfilter(isnotnull(x))
What function concatenates individual values from a mvfield and uses a delimiter as a separator?
mvjoin(fieldname, "delimiter")
What function takes 2 mvfields and concatenates the first values of each, the second values of each etc. with a delimiter to separate?
mvzip(mvfield1, mvfield2, "delimiter")
What does the mvcount function do?
Returns count of values in the specified mvfield returning null if no field values or field does not exist: mvcount(fieldname)
What does the split function do?
Takes a single value field and a delimiter to split by and creates a new mvfield: split(fieldname, "delim")
What does the mvindex function do?
Takes an mvfield and an integer and returns the value at that index in the mv array - indexing starts at 0 NOT 1: mvindex(fieldname, indexnum)
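The multivalue functions above map closely onto Python list operations. The following analogues are a sketch for intuition only: multivalue fields are modeled as Python lists and null as None.

```python
def mvjoin(values, delim):
    return delim.join(values)


def mvzip(first, second, delim=","):
    # Pair up first with first, second with second, and so on.
    return [f"{a}{delim}{b}" for a, b in zip(first, second)]


def mvcount(values):
    # None stands in for null when the field is missing or empty.
    return len(values) if values else None


def split(value, delim):
    return value.split(delim)


def mvindex(values, index):
    # Zero-based; out-of-range indexes give None instead of raising.
    return values[index] if -len(values) <= index < len(values) else None


hosts = split("web1;web2;web3", ";")
ports = ["80", "443", "8080"]
print(mvjoin(hosts, " | "))
print(mvzip(hosts, ports, ":"))
print(mvcount(hosts), mvindex(hosts, 0))
```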
What command converts an existing single value field to a mvfield based on a delimiter or regex (referred to as a tokenizer)?
makemv (delim=string | tokenizer=regex) fieldname
What does the mvexpand command do?
Takes mvfield and creates separate event for each value in the mvfield:
|mvexpand fieldname
Does mvexpand command create new events on disk/in index?
No, only created in memory for purposes of search at hand
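In Python terms, mvexpand behaves roughly like the sketch below: each value of the multivalue field yields a copy of the event, and the input is left untouched. The function and sample event are hypothetical.

```python
def mvexpand(events, field):
    expanded = []
    for event in events:
        values = event.get(field)
        if isinstance(values, list):
            # One new in-memory event per value; other fields are copied.
            for value in values:
                expanded.append({**event, field: value})
        else:
            expanded.append(dict(event))
    return expanded


events = [{"user": "alice", "role": ["admin", "dev"]}]
result = mvexpand(events, "role")
print(result)
```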
What is referred to as any group of conceptually related events?
A transaction via the | transaction command
What does the |transaction command enable?
Enables you to specify criteria used to determine how to group events via ranges of time, number of events, or text contained in events
Is stats or transaction faster?
Stats, as transaction is resource intensive and should be used only when stats is insufficient
In what order do events need to be ingested for transactions to work?
Reverse chronological order
In order to use |transaction, how do you correct events that are not coming in reverse chronological order?
Use | sort -_time
Use right before |transaction
How do you find events that occur before or after a specific event?
Use |transaction fieldname (endswith=() | startswith=() )
What function is used to normalize field names by taking a number of arguments and returning first one that is not null and storing it as new field?
coalesce(field1,field2,…)
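A Python analogue of coalesce, with None standing in for null and hypothetical field names:

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    return next((value for value in values if value is not None), None)


event = {"clientip": None, "src_ip": "10.2.3.4", "ip": "10.9.9.9"}
normalized_ip = coalesce(event["clientip"], event["src_ip"], event["ip"])
print(normalized_ip)  # 10.2.3.4
```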
What is the keepevicted=1 argument used for when dealing with transactions?
It is a setting used to retain any transactions where one or both of beginning/ending criteria are not satisfied (transaction did not complete successfully)
What field is used to determine if a transaction is complete or incomplete?
closed_txn = 1 if transaction is a success or closed_txn = 0 if transaction is not a success
What has to be met for a transaction to be closed?
One or more of these criteria are met: maxevents, maxpause, maxspan, startswith, endswith
Where does |transaction execute?
Executes on the SH as it is a centralized streaming command
Does |transaction require access to all the _raw data?
Yes, because search is forced to send all _raw data back from indexers to search head as transactions require all event data
Can the timepicker be overridden during a search?
Yes through the earliest= & latest= time modifiers
When snapping to a time, does the time round up or down?
Always rounds down (backward to a previous time)
How do you define a search for the past 24 hours using earliest and latest modifiers?
Mainsearch earliest=-24h@h latest=@h
What are default time fields?
date_* fields: time/date stamps taken directly from raw events. They provide extra info for searching but do not reflect time zone conversions or time value changes
How would you exclude events from the current day?
latest=@d
How would you include data beginning at the start of the day, 2 days ago?
earliest=-2d@d
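Snapping can be mimicked with Python datetimes to see how the modifiers above resolve. The fixed "now" value is an arbitrary example, and the helpers cover only @h and @d.

```python
from datetime import datetime, timedelta


def snap_to_hour(moment):
    return moment.replace(minute=0, second=0, microsecond=0)


def snap_to_day(moment):
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2024, 5, 17, 14, 37, 12)  # example "current time"

earliest_24h = snap_to_hour(now - timedelta(hours=24))  # earliest=-24h@h
latest_hour = snap_to_hour(now)                         # latest=@h
latest_day = snap_to_day(now)                           # latest=@d
earliest_2d = snap_to_day(now - timedelta(days=2))      # earliest=-2d@d

print(earliest_24h, latest_hour, latest_day, earliest_2d, sep="\n")
```

Snapping always moves backward, so latest=@d stops at midnight and leaves out everything from the current day.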
What does date_hour>=2 AND date_hour<5 represent?
Find events between 2am and 5am
What function can be used to create a new field with adjusted time zones?
| eval new_time_field=strftime(_time, "%H")
This gets the hour from the event and converts it to your local time based on your time zone setting