Advanced Searching and Reporting Flashcards

1
Q

What kind of searches are prime candidates for optimization

A

Searches that run often or query broad amounts of data

2
Q

What is stored in the journal.gz file and the .tsidx files in the buckets on the indexers?

A

Compressed raw event data is stored in journal.gz

References to the raw events in journal.gz are stored in .tsidx

3
Q

What are the components of the .tsidx file?

A

Lexicon: the unique terms from the event data
Posting list: references into the values array
Values array: each entry holds a posting value and a seek address that points into journal.gz

4
Q

What is a bloom filter?

A

A bit array associated with each bucket and search string used to predict if a lexicon term is likely to be found in the bucket

5
Q

Are false positives and false negatives possible with bloom filters?

A

False positives are possible

False negatives are not possible
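The two bloom filter cards above can be illustrated with a toy filter. This is a minimal Python sketch, not Splunk's on-disk format: the bit-array size (`num_bits`), the number of hash functions (`num_hashes`), and the use of salted SHA-256 hashes are illustrative assumptions. Adding a term only ever sets bits, so a term that was added always tests positive (no false negatives), while a term that was never added can still land on bits set by other terms (a possible false positive).

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: a fixed-size bit array plus k hash functions."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, term: str):
        # Derive k bit positions by hashing the term with k different salts.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term: str) -> bool:
        # True means "possibly present" (may be a false positive);
        # False means "definitely absent" (no false negatives).
        return all(self.bits[pos] for pos in self._positions(term))


bucket_bloom = BloomFilter()
for term in ["error", "web", "404", "GET"]:
    bucket_bloom.add(term)

print(bucket_bloom.might_contain("error"))    # True: the term was added
print(bucket_bloom.might_contain("timeout"))  # usually False, but a bit collision can make it True
```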

6
Q

What is the series of events for retrieving event data with a bloom filter?

A
  1. Create a bloom filter from the search string
  2. Find the buckets in the index that fall within the time range
  3. Compare the search bloom filter to each bucket's bloom filter
  4. If they match, find the search terms in the bucket's .tsidx
  5. Use the .tsidx to retrieve the events from journal.gz
  6. Apply search-time field extractions as the final filter
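The six steps above can be sketched as a small Python pipeline. Everything here is a simplified stand-in, not Splunk internals: `Bucket` is a hypothetical class, the bucket's bloom filter is modeled as an exact set (so it has no false positives), the .tsidx is a dict from term to seek addresses, and journal.gz is a plain list of raw events indexed by seek address.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    earliest: int   # start of the bucket's time range (epoch seconds)
    latest: int     # end of the bucket's time range
    bloom: set      # stand-in for the bucket's bloom filter (an exact set here)
    tsidx: dict     # term -> list of seek addresses into the journal
    journal: list   # stand-in for journal.gz: seek address -> raw event


def search(buckets, terms, earliest, latest, final_filter=lambda raw: True):
    """Walk the steps; `terms` plays the role of the search-string bloom filter (step 1).

    Assumes at least one search term.
    """
    results = []
    for b in buckets:
        # Step 2: skip buckets outside the search time range.
        if b.latest < earliest or b.earliest > latest:
            continue
        # Step 3: skip the bucket unless every search term may be present.
        if not all(t in b.bloom for t in terms):
            continue
        # Step 4: look up each term in the .tsidx and keep addresses common to all terms.
        addresses = set(b.tsidx.get(terms[0], []))
        for t in terms[1:]:
            addresses &= set(b.tsidx.get(t, []))
        # Steps 5 and 6: read raw events from the journal, then apply the final filter.
        for addr in sorted(addresses):
            raw = b.journal[addr]
            if final_filter(raw):
                results.append(raw)
    return results


bucket = Bucket(
    earliest=100,
    latest=200,
    bloom={"error", "web", "get"},
    tsidx={"error": [0, 2], "web": [0, 1, 2], "get": [1]},
    journal=["error web a", "get web b", "error web c"],
)
print(search([bucket], ["error", "web"], 0, 300))  # ['error web a', 'error web c']
```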
7
Q

What does the job inspector command.search.index inform you of?

A

Time to get event location information from the .tsidx files

8
Q

What does the job inspector command.search.rawdata inform you of?

A

Time to extract event data from journal.gz

9
Q

What does the job inspector command.search.kv inform you of?

A

Time to perform search time field extractions

10
Q

What do you use to calculate performance with the job inspector?

A

Divide scanCount by the run time to get events per second; this figure includes the time to read all events from disk
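The calculation is a single division. In this minimal Python sketch the function name `events_per_second` is hypothetical and the sample numbers are made up; in practice you would read scanCount and the run time from the job inspector.

```python
def events_per_second(scan_count: int, run_time_seconds: float) -> float:
    """Events scanned per second, from the job inspector's scanCount and run time."""
    if run_time_seconds <= 0:
        raise ValueError("run time must be positive")
    return scan_count / run_time_seconds


print(events_per_second(250_000, 5.0))  # 50000.0
```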

11
Q

In a distributed environment, will a search execute faster if its commands run on the search head (SH) or on the indexers?

A

Faster on the indexers

12
Q

Where are transforming commands executed?

A

On the search head, where they operate on the entire result set

13
Q

Does order of events matter when running a transforming command?

A

no

14
Q

What are the two types of streaming commands?

A

Distributable - can run on the indexers

Centralized - always run on the search head

15
Q

Is the event order important for streaming commands?

A

Distributable - No

Centralized - Yes

16
Q

When is a distributable command run on the search head vs the indexer?

A

Search head if any preceding commands are executed on search head
Indexer if all preceding commands execute on indexer

17
Q

Do streaming commands need the entire event result set prior to executing?

A

Distributable - no

Centralized - yes

18
Q

Do streaming commands operate on the entire results set of event data?

A

No, they operate on each event returned by a search

19
Q

How does having more disk reads affect search execution?

A

More disk reads leads to longer search execution time

20
Q

How does Splunk decide which events to read after determining which buckets match the bloom filters?

A

Tokens (terms) from the search string are compared to the tokens in events; a match causes the event to be read from disk

21
Q

How are event tokens derived?

A

Derived by breaking up searches and event data using segmenters

22
Q

What are segmenters?

A

Major or minor breakers that separate searches and events into smaller pieces

23
Q

What are major breakers?

A

Characters used to divide words, phrases, and terms into large tokens: space, newline, carriage return, tab, [ ] ( ) { } ! ? ; , ' " &

24
Q

What are minor breakers?

A

Characters used to divide large tokens into smaller tokens: / : = @ . - $ # % \ _
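The two-stage segmentation described in the breaker cards can be sketched with regular expressions. This Python sketch uses only the breaker characters listed on the two cards above (Splunk's real defaults are configured elsewhere and include more), and the helper name `segment` is hypothetical.

```python
import re

# Breaker sets from the two cards above (not Splunk's full default lists).
MAJOR_BREAKERS = r"[ \n\r\t\[\](){}!?;,'\"&]+"
MINOR_BREAKERS = r"[/:=@.\-$#%\\_]+"


def segment(text: str):
    """Split text into major tokens, then split each major token into minor tokens."""
    major = [t for t in re.split(MAJOR_BREAKERS, text) if t]
    minor = [m for t in major for m in re.split(MINOR_BREAKERS, t) if m]
    return major, minor


major, minor = segment("GET /index.html status=200 (ok)")
print(major)  # ['GET', '/index.html', 'status=200', 'ok']
print(minor)  # ['GET', 'index', 'html', 'status', '200', 'ok']
```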

25
Where are tokens created from event data stored?
.tsidx files
26
Where can you see how the base search was tokenized?
Use the job inspector and look at the text after 'base lispy'
27
In prefix notation where does the operator appear?
Before the operands. Example: the search index=web 21.12 produces the lispy [ AND 12 21 index::web ]
28
What is a directive?
An instruction for how part of a search should be processed
29
What does the case-sensitive 'TERM' directive do?
Forces Splunk to match only the complete value: minor breakers inside the term are ignored, so the term must be bounded by major breakers in the event
30
Can key value pairs be passed into the TERM() directive?
Yes, because key=value contains only a minor breaker (=)
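The TERM() behavior in the two cards above can be approximated by checking whether the whole term appears as a single major-breaker-bounded token. This is a simplified Python sketch: the helper `term_matches` is hypothetical, the breaker set is the subset from the major breakers card, and matching is case-insensitive on the assumption that Splunk term matching is.

```python
import re

# Major breakers from the "major breakers" card (a subset of Splunk's defaults).
MAJOR_BREAKERS = r"[ \n\r\t\[\](){}!?;,'\"&]+"


def term_matches(raw_event: str, term: str) -> bool:
    """Approximate TERM(term): the whole term must be one major-breaker-bounded token."""
    tokens = {t.lower() for t in re.split(MAJOR_BREAKERS, raw_event) if t}
    return term.lower() in tokens


print(term_matches("client 10.0.0.1 connected", "10.0.0.1"))   # True
print(term_matches("src=10.0.0.1 connected", "10.0.0.1"))      # False: "=" is only a minor breaker
print(term_matches("src=10.0.0.1 connected", "src=10.0.0.1"))  # True
```

The second call shows the common pitfall: the major token is `src=10.0.0.1`, so TERM(10.0.0.1) alone does not match it.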
31
Does negation in a search yield negation in a lispy?
It works for single terms, but not for terms that contain minor breakers unless you use TERM(). Works: NOT TERM(example.1). Does not work: NOT example.1
32
Do wildcards in a search work in a lispy?
Only when they are in the middle or at the end of a string that contains no major or minor breakers
33
How do index time fields appear in lispy?
Appear as field::value instead of field=value like they do in a search
34
As more fields are extracted at index time, what happens to the size of the .tsidx files and to resource usage on the indexers?
Both increase
35
What types of data can use indexed field extractions for a specific source type?
Sourcetypes with certain types of structured data (JSON, CSV, W3C)
36
Do comparisons against fields extracted at search time result in filtering events returned from the disk?
No, all events from a sourcetype will still be read from the disk (except for an equals operator).
37
What would the lispy be for the search index=web sourcetype=a_c status>400?
[ AND index::web sourcetype::a_c ]. Comparisons such as status>400 are not included, so they do not filter what is read from disk
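The lispy cards above (prefix notation, field::value for index-time fields, comparisons left out) can be combined into a toy lispy builder. This is a heavily simplified Python sketch, not Splunk's parser: `to_lispy` is hypothetical, only AND is supported (no NOT, OR, quotes, or wildcards), the set of index-time fields is an assumption, and treating a search-time field=value as contributing only its value tokens is an assumed simplification.

```python
import re

# Assumption: only these default fields are treated as index-time fields.
INDEXED_FIELDS = {"index", "sourcetype", "source", "host"}
# Minor breakers from the "minor breakers" card.
MINOR_BREAKERS = r"[/:=@.\-$#%\\_]+"
COMPARISONS = ("!=", ">=", "<=", ">", "<")


def to_lispy(search: str) -> str:
    """Build a simplified AND-only lispy from a space-separated search string."""
    terms = set()
    for token in search.split():
        if any(op in token for op in COMPARISONS):
            continue  # comparisons are not used to filter what is read from disk
        key, sep, value = token.partition("=")
        if sep and key in INDEXED_FIELDS:
            terms.add(f"{key}::{value}")  # index-time field appears as field::value
        else:
            # Plain term (or search-time field=value): split on minor breakers.
            text = value if sep else token
            terms.update(t for t in re.split(MINOR_BREAKERS, text) if t)
    # Prefix notation: the operator comes before its operands (terms sorted).
    return "[ AND " + " ".join(sorted(terms)) + " ]"


print(to_lispy("index=web 21.12"))                      # [ AND 12 21 index::web ]
print(to_lispy("index=web sourcetype=a_c status>400"))  # [ AND index::web sourcetype::a_c ]
```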
38
Does the TERM() directive work with aliases?
No
39
When are lookups performed relative to the lispy?
Lookups used in the base search itself can be performed before the lispy is created. Lookups used by transforming or other piped commands are performed after the lispy
40
What does a subsearch do?
Takes the results of an inner search and combines them with the outer search using a Boolean AND; an OR is inserted between the individual inner search results
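The way inner results are combined with the outer search can be sketched as string building. In this Python sketch the helper `format_subsearch` is hypothetical; Splunk produces similar text for a subsearch, but the exact spacing and quoting shown here are assumptions. Each inner result becomes one parenthesized clause, the clauses are joined with OR, and the whole group is implicitly ANDed with the outer search terms.

```python
def format_subsearch(rows, fields):
    """Turn inner-search rows into an OR-ed filter group for the outer search."""
    clauses = []
    for row in rows:
        # Fields within one result are ANDed together.
        parts = [f'{f}="{row[f]}"' for f in fields if f in row]
        if parts:
            clauses.append("( " + " AND ".join(parts) + " )")
    # Results are ORed together.
    return "( " + " OR ".join(clauses) + " )"


inner = [{"clientip": "10.1.1.1"}, {"clientip": "10.2.2.2"}]
outer = "index=web status=500"
print(outer + " " + format_subsearch(inner, ["clientip"]))
# index=web status=500 ( ( clientip="10.1.1.1" ) OR ( clientip="10.2.2.2" ) )
```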
41
What command do subsearches typically begin with?
[ search search_criteria… ]
42
Is an inner search or outer search completed first?
Inner subsearch completed first
43
How do you send only specific fields of a subsearch to the outer search?
Using the fields or return command
44
What is returned by default when using the return command in a subsearch?
The first value of each specified field, returned as field_name=value
45
How do you adjust the defaults for the |return command in a subsearch?
Specify the count as a number, and put $ before the field name to return only the values without the field name: | return 5 $ip_address returns the first 5 values of ip_address
46
Can you return subsearch results as an alias?
Yes, use |return alias_name=field
47
What is the time and event count limit on a subsearch?
60 seconds and 10,000 events
48
Over what time range will a subsearch execute if the root search is run in real-time?
It runs over all time by default, unless you set earliest and latest in the subsearch
49
Should you use stats and/or eval instead of a subsearch?
Yes, whenever possible especially if searches are executed often
50
What does the append command do?
Appends the results of a subsearch to the current results; it works only with historical data
51
Does the |append command overlay primary and subsearch results?
No, the two result sets appear one after another in a chart (with either stats or timechart as the main search)
52
How do you get appended results to overlay primary search results?
Use | stats first(*) as * or | timechart first(*) as *
53
What command will overlay search and subsearch results in one step?
|appendcols
54
When should you take caution using the appendcols command?
With stats: if the data contains null values, the remaining values shift up and may no longer line up with the correct fields
55
For larger amounts of data, should you use append and appendcols?
No, avoid them in favor of more efficient searches
56
If a search can be done with either stats or join/union, which is usually more efficient?
stats command is usually more efficient
57
What does the join command do?
Combines the results of two searches; the default is an inner join, so the results include only events from the first search that match the second search
58
What does the left, or outer, join argument of the join command define?
Includes all results from the first search, plus fields from matching events in the second search
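The difference between the default inner join and a left (outer) join can be sketched over lists of result rows. This Python sketch is illustrative only: `join` is a hypothetical helper, and two behaviors are assumptions meant to mirror the join command's defaults, namely that only the first matching row from the second search is used and that its fields overwrite same-named fields from the first search.

```python
def join(first, second, key, how="inner"):
    """Match rows on `key`; 'inner' drops unmatched rows, 'left' keeps them."""
    lookup = {}
    for row in second:
        lookup.setdefault(row[key], row)  # assumption: keep only the first match
    results = []
    for row in first:
        match = lookup.get(row.get(key))
        if match is not None:
            results.append({**row, **match})  # assumption: second-search fields overwrite
        elif how == "left":
            results.append(dict(row))         # left join keeps the unmatched row
    return results


first = [{"id": 1, "user": "a"}, {"id": 2, "user": "b"}]
second = [{"id": 1, "dept": "ops"}]
print(join(first, second, "id"))               # [{'id': 1, 'user': 'a', 'dept': 'ops'}]
print(join(first, second, "id", how="left"))   # adds {'id': 2, 'user': 'b'}
```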
59
How do you define which fields to use for the join command?
List them after the join command: | join fields_to_use
60
What search combines two or more result sets into a single search?
| union
61
What can be used to specify a result search when using the union command?
Use a regular search, a subsearch, or a data model (defined with the datamodel command)
62
Does the union command execute on the search head or indexer?
As a distributable streaming command, it executes in parallel on the indexers if all searches are distributable; otherwise it runs on the search head
63
What is the union syntax?
| union datamodel:name1.dataset datamodel:name2.dataset …
Search1 | union [search2]
Search1 | union datamodel:search2 …
64
What command puts numerical values into discrete sets?
| bin binoptions field (binoptions is optional)
65
How are bin sizes set?
Through the span=size option: | bin fieldname span=size
66
What happens if span creates more buckets than the maximum specified by bins?
bins is ignored
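Binning with a fixed span amounts to rounding each value down to a multiple of the span. This minimal Python sketch covers only span (not bins); the helper `bin_value` is hypothetical, and the "low-high" label format is an assumption meant to resemble how bin labels numeric ranges.

```python
import math


def bin_value(value: float, span: float) -> str:
    """Place a number in a fixed-width bucket and label it 'low-high'."""
    low = math.floor(value / span) * span
    return f"{low:g}-{low + span:g}"


print([bin_value(v, 10) for v in [3, 17, 23, 29]])  # ['0-10', '10-20', '20-30', '20-30']
```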
67
What command reformats chartable, tabular output as a stats-like output?
| untable xfield yfield datafield, where xfield is the x-axis field, yfield holds the data labels, and datafield holds the charted values
68
What command reformats stats-like output as chartable, tabular output?
| xyseries xfield yfield datafield
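untable and xyseries are inverse reshapes between wide (chart-like) and long (stats-like) results. In this Python sketch, results are lists of dicts and the helpers `untable` and `xyseries` are hypothetical functions named after the commands; the argument order follows the cards (xfield, yfield, datafield).

```python
def untable(rows, xfield, yfield, datafield):
    """Wide (chart-like) rows -> long (stats-like) rows of x, series, value."""
    long_rows = []
    for row in rows:
        for series, value in row.items():
            if series != xfield:
                long_rows.append({xfield: row[xfield], yfield: series, datafield: value})
    return long_rows


def xyseries(long_rows, xfield, yfield, datafield):
    """Long (stats-like) rows -> wide (chart-like) rows, one row per x value."""
    wide = {}
    for row in long_rows:
        x = row[xfield]
        wide.setdefault(x, {xfield: x})[row[yfield]] = row[datafield]
    return list(wide.values())


chart = [{"host": "web1", "200": 50, "404": 3}, {"host": "web2", "200": 40, "404": 1}]
stats_like = untable(chart, "host", "status", "count")
print(stats_like[0])  # {'host': 'web1', 'status': '200', 'count': 50}
print(xyseries(stats_like, "host", "status", "count") == chart)  # True
```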
69
What does the foreach command do?
| foreach replaces the <<FIELD>> token with each matching field name: | foreach www* [eval <<FIELD>> = round(<<FIELD>>/2)]
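The foreach example above applies the same expression to every field whose name matches a wildcard. This Python sketch mimics that with `fnmatchcase`; the helper `foreach_eval` is hypothetical. Python's `round` rounds exact .5 ties to the nearest even number, which may differ from Splunk's round(), so the sample values avoid ties.

```python
from fnmatch import fnmatchcase


def foreach_eval(event, pattern, expr):
    """Apply expr to every field whose name matches the wildcard pattern."""
    return {name: expr(value) if fnmatchcase(name, pattern) else value
            for name, value in event.items()}


event = {"www1": 10, "www2": 24, "app1": 7}
print(foreach_eval(event, "www*", lambda v: round(v / 2)))
# {'www1': 5, 'www2': 12, 'app1': 7}
```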
70
What are multivalue eval functions used for?
Used to analyze and format multivalue data
71
What command do you use to convert a single value into a multivalue field?
|makemv command
72
What do JSON array contents become when auto extracted by Splunk?
Contents become multivalue fields
73
In JSON data, what do {} and [] indicate?
{} is an object, a grouping of field value pairs | [] is an array of objects
74
How are fields nested within a JSON event represented when extracted into Splunk?
Event_name{}.field1 | Event_name{}.field2 …
75
Which commands can you use with multivalue functions?
The eval, where, and fieldformat commands
76
What does the mvsort() function do?
Intakes a multivalue field and returns the values sorted lexicographically
77
How are numbers sorted in lexicographical order?
Numbers sort before letters and are compared character by character (starting with the first digit), not by numeric value, so 100 200 70 9 is in order
78
Are uppercase or lowercase letters first in lexicographical order?
Uppercase is first
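Python's `sorted` on strings compares character code points, which gives the same ordering described in the two cards above for plain ASCII values (digits, then uppercase, then lowercase). Treating this as equivalent to mvsort is an assumption for ASCII data only.

```python
values = ["9", "apple", "200", "Banana", "70", "100", "Zebra"]
ordered = sorted(values)
print(ordered)  # ['100', '200', '70', '9', 'Banana', 'Zebra', 'apple']
```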
79
Can functions and commands process fields whose names contain {}?
No, so multivalue fields extracted from JSON have to be renamed
80
What does the mvfilter function do?
Filters (refines) one mvfield based on a Boolean expression
81
How do you remove null values returned from mvfilter function?
Use mvfilter(isnotnull(x))
82
What function concatenates individual values from a mvfield and uses a delimiter as a separator?
mvjoin(fieldname, "delimiter")
83
What function takes 2 mvfields and concatenates the first values of each, the second values of each etc. with a delimiter to separate?
mvzip(mvfield1, mvfield2, "delimiter")
84
What does the mvcount function do?
Returns the number of values in the specified mvfield, or null if there are no values or the field does not exist: mvcount(fieldname)
85
What does the split function do?
Takes a single-value field and a delimiter and splits the field into a new mvfield: split(fieldname, "delim")
86
What does the mvindex function do?
Takes an mvfield and an integer and returns the value at that index in the multivalue array; indexing starts at 0, not 1: mvindex(fieldname, indexnum)
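The multivalue functions in the cards above can be modeled with Python lists standing in for multivalue fields and `None` standing in for null. The helpers below are hypothetical functions named after the eval functions, not Splunk code; the default comma delimiter for `mvzip`, negative-index support in `mvindex`, and stopping at the shorter list in `mvzip` are assumptions of this sketch.

```python
def split(value, delim):
    # split(fieldname, "delim"): single value -> list of values
    return value.split(delim)


def mvcount(values):
    # None stands in for Splunk's null when there are no values
    return len(values) if values else None


def mvindex(values, index):
    # Zero-based index; negative indexes count from the end; out of range -> None
    return values[index] if -len(values) <= index < len(values) else None


def mvfilter(values, predicate):
    # Keep only the values for which the Boolean expression is true
    return [v for v in values if predicate(v)]


def mvjoin(values, delim):
    return delim.join(values)


def mvzip(left, right, delim=","):
    # Pair first with first, second with second; this sketch stops at the shorter list
    return [f"{a}{delim}{b}" for a, b in zip(left, right)]


ports = split("22;80;443", ";")
print(mvcount(ports))                           # 3
print(mvindex(ports, 0))                        # 22
print(mvfilter(ports, lambda p: int(p) > 100))  # ['443']
print(mvjoin(ports, "|"))                       # 22|80|443
print(mvzip(["web1", "web2"], ["up", "down"]))  # ['web1,up', 'web2,down']
```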
87
What command converts an existing single-value field to an mvfield based on a delimiter or a regex (referred to as a tokenizer)?
| makemv (delim=string | tokenizer=regex) fieldname
88
What does the mvexpand command do?
Takes mvfield and creates separate event for each value in the mvfield: |mvexpand fieldname
89
Does mvexpand command create new events on disk/in index?
No, only created in memory for purposes of search at hand
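makemv and mvexpand can be sketched on in-memory dicts, which also reflects the card above: the expanded events exist only for the search at hand. In this Python sketch `makemv` and `mvexpand` are hypothetical helpers; the single-space default delimiter and the rule that each regex match's first capturing group becomes one value are assumptions about the commands' behavior.

```python
import re


def makemv(event, field, delim=None, tokenizer=None):
    """Return a copy of event with `field` turned into a list (multivalue)."""
    value = event[field]
    if tokenizer is not None:
        values = re.findall(tokenizer, value)  # pattern with one capturing group
    else:
        values = value.split(delim if delim is not None else " ")
    return {**event, field: values}


def mvexpand(events, field):
    """One new in-memory event per value of the multivalue field."""
    return [{**e, field: v} for e in events for v in e[field]]


e = {"host": "web1", "ports": "22,80,443"}
mv = makemv(e, "ports", delim=",")
for expanded in mvexpand([mv], "ports"):
    print(expanded)  # {'host': 'web1', 'ports': '22'}, then '80', then '443'
```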
90
What is referred to as any group of conceptually related events?
A transaction via the | transaction command
91
What does the |transaction command enable?
Enables you to specify criteria used to determine how to group events via ranges of time, # of events, text contained in events
92
Is stats or transaction faster ?
Stats, as transaction is resource intensive and should be used only when stats is insufficient
93
In what order do events need to be ingested for transactions to work?
Reverse chronological order
94
In order to use | transaction, how do you correct events that are not in reverse chronological order?
Use | sort -_time right before | transaction
95
How do you find events that occur before or after a specific event?
Use | transaction fieldname with endswith=() or startswith=()
96
What function is used to normalize field names by taking a number of arguments and returning first one that is not null and storing it as new field?
coalesce(field1,field2,...)
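coalesce returns the first argument that is not null. In this minimal Python sketch `None` stands in for null, the helper `coalesce` is hypothetical, and the field names `src_ip`, `clientip`, and `ip` are made up to show normalizing two differently named fields into one.

```python
def coalesce(*values):
    """Return the first argument that is not None (null), else None."""
    return next((v for v in values if v is not None), None)


events = [{"src_ip": "10.0.0.1"}, {"clientip": "10.0.0.2"}, {}]
for ev in events:
    ev["ip"] = coalesce(ev.get("src_ip"), ev.get("clientip"))
print([ev["ip"] for ev in events])  # ['10.0.0.1', '10.0.0.2', None]
```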
97
What is the keepevicted=1 argument used for when dealing with transactions?
It is a setting used to retain any transactions where one or both of beginning/ending criteria are not satisfied (transaction did not complete successfully)
98
What field is used to determine if a transaction is complete or incomplete?
closed_txn = 1 if transaction is a success or closed_txn = 0 if transaction is not a success
99
What has to be met for a transaction to be closed?
One or more of these criteria are met: maxevents, maxpause, maxspan, startswith, endswith
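One of the closing criteria, maxpause, can be sketched as grouping events whenever the gap between consecutive events is small enough. This Python sketch is a simplification: `transaction` is a hypothetical helper, it models only maxpause (not maxspan, maxevents, startswith, endswith, or closed_txn), has no grouping field, and requires events in reverse chronological order, as the cards above describe.

```python
def transaction(events, maxpause):
    """Group events, starting a new transaction when the gap exceeds maxpause seconds."""
    times = [e["_time"] for e in events]
    if times != sorted(times, reverse=True):
        raise ValueError("events must be in reverse chronological order (| sort -_time)")
    groups = []
    for event in events:
        if groups and groups[-1][-1]["_time"] - event["_time"] <= maxpause:
            groups[-1].append(event)   # gap is small enough: same transaction
        else:
            groups.append([event])     # first event or gap too large: new transaction
    return [
        {"events": g, "eventcount": len(g), "duration": g[0]["_time"] - g[-1]["_time"]}
        for g in groups
    ]


events = [{"_time": t} for t in [100, 95, 90, 40, 35]]
for txn in transaction(events, maxpause=10):
    print(txn["eventcount"], txn["duration"])  # 3 10, then 2 5
```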
100
Where does |transaction execute?
Executes on the SH as it is a centralized streaming command
101
Does |transaction require access to all the _raw data?
Yes; the search is forced to send all _raw data from the indexers back to the search head, because transactions require all event data
102
Can the timepicker be overridden during a search?
Yes through the earliest= & latest= time modifiers
103
When snapping to a time, does the time round up or down?
Always rounds down (backward to a previous time)
104
How do you define a search for the past 24 hours using earliest and latest modifiers?
Mainsearch earliest=-24h@h latest=@h
105
What are default time fields?
The date_* fields: date and time values taken directly from the raw event's timestamp. They provide extra information for searching, but they do not reflect time zone conversions or changes to the time value
106
How would you exclude events from the current day?
latest=@d
107
How would you include data beginning at the start of the day, 2 days ago?
earliest=-2d@d
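The relative time modifiers in the cards above (offset first, then snap down with @) can be reproduced with `datetime`. This Python sketch uses naive datetimes, so it ignores time zones and daylight saving changes; the helper `snap` and the fixed "now" value are hypothetical.

```python
from datetime import datetime, timedelta


def snap(dt: datetime, unit: str) -> datetime:
    """Round down to the start of the hour ('h') or day ('d'), like @h or @d."""
    if unit == "h":
        return dt.replace(minute=0, second=0, microsecond=0)
    if unit == "d":
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


now = datetime(2024, 5, 10, 14, 37, 12)
print(snap(now - timedelta(hours=24), "h"))  # earliest=-24h@h -> 2024-05-09 14:00:00
print(snap(now, "h"))                        # latest=@h      -> 2024-05-10 14:00:00
print(snap(now, "d"))                        # latest=@d      -> 2024-05-10 00:00:00
print(snap(now - timedelta(days=2), "d"))    # earliest=-2d@d -> 2024-05-08 00:00:00
```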
108
What does date_hour>=2 AND date_hour<5 represent?
Events between 2am and 5am (hours 2, 3, and 4)
109
What function can be used to create a new field with adjusted time zones?
| eval new_time_field=strftime(_time, "%H") gets the hour of the event, converted to your local time based on your time zone setting