Fundamentals 3 Flashcards
What command enables you to calculate stats on data that matches your search criteria?
stats command
What does the | fieldsummary command do?
Calculates summary stats for all/subset of fields and displays in table form:
| fieldsummary [maxvals=num] [field-list]
- maxvals: max distinct vals to return for the values stat of each field
- field-list: fields to calc stats for
What does the is_exact boolean indicate in the |fieldsummary results?
is_exact represents whether the distinct_count is exact
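Example (a minimal sketch, not from the course material): the fields n and parity are made up for illustration, and makeresults plus streamstats are used only to generate five sample rows to summarize.

```spl
| makeresults count=5
| streamstats count as n
| eval parity=if(n%2==0, "even", "odd")
| fieldsummary maxvals=3 n parity
```

The result should contain one row per listed field, with columns such as count, distinct_count, is_exact and values.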
What does the |appendpipe command do?
- Takes existing results and pushes them into sub pipeline
- Appends sub pipeline results as new lines to the outer search
How do you name the appendpipe subtotals field after appending?
Use | eval column_name="subtotals name" inside the appendpipe sub pipeline
How do you create a grandtotal field when using the |appendpipe command?
Use another |appendpipe command to search for and total only the subtotals fields
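Example (a sketch on generated data; region, product and sales are hypothetical field names): the first appendpipe adds one subtotal row per region, and the second totals only the rows labeled SUBTOTAL to produce a grand total.

```spl
| makeresults count=6
| streamstats count as n
| eval region=if(n<=3, "east", "west"), product="item".n, sales=n*10
| stats sum(sales) as sales by region product
| appendpipe [ stats sum(sales) as sales by region | eval product="SUBTOTAL" ]
| appendpipe [ search product="SUBTOTAL" | stats sum(sales) as sales | eval product="GRAND TOTAL" ]
```

Filtering on the SUBTOTAL label in the second sub pipeline avoids double counting the detail rows.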
How do you use count and list functions to remove duplicates for info in tabular form?
- Use |stats count as normal
- Use |stats list(columnBfield), list(columnCfield) … by columnAfield
- Column A is no longer duplicated
What does the |eventstats command do?
Generates summary stats of all existing fields in search results and saves as new fields
- Works on entire results
What does |streamstats do?
Generates cumulative (running) stats on fields, adding each result to the stats of the results before it
- Works on entire results but calculates stats for each result row at the time the command encounters it
- Event order matters
What are two arguments that can be used with |streamstats?
- current=t or f: include or exclude the current event in the summary calc
- window=#: calc over past # of events
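Example (a sketch on generated data; bytes, avg_prev2 and avg_all are made-up field names) contrasting streamstats with eventstats. With current=f and window=2, each row should get the average of up to two preceding rows, so the first row has no value, while eventstats writes the same overall average onto every row.

```spl
| makeresults count=5
| streamstats count as n
| eval bytes=n*100
| streamstats current=f window=2 avg(bytes) as avg_prev2
| eventstats avg(bytes) as avg_all
```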
What does the |eval command do?
Calculates an expression and writes the result to a new field or overwrites an existing one
|eval fieldname1=expression1, fieldname2=expression2
What are the |eval command conversion functions?
tostring
tonumber
printf
What are the options for and syntax of the tostring function?
tostring(field, "option")
Options: commas (also rounds to 2 decimals), duration (hh:mm:ss), hex
What are the options for and syntax of the tonumber function?
tonumber(numstr, base)
Where numstr can be a field name or a literal value and base is optional (defaults to 10)
What are the options for and syntax of the printf function?
printf("format", arguments)
Where format contains conversion specifiers (%d, %f, %s…) and arguments are optional
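Example (a sketch; all values and field names are invented for illustration) exercising the three conversion functions on a single generated result.

```spl
| makeresults
| eval amount=1234567.891, seconds=3725
| eval amount_commas=tostring(amount, "commas")
| eval elapsed=tostring(seconds, "duration")
| eval as_hex=tostring(255, "hex")
| eval from_hex=tonumber("ff", 16)
| eval label=printf("%05d requests, %.1f avg", 42, 99.5)
```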
What does the eval now() function return?
Time the search was started
What does the eval time() function return?
Time the event was processed by the eval command
What does the eval strftime function do?
Converts an epoch timestamp to a readable string using strftime(X,Y), where X is UNIX time in seconds and Y is the output format
EX: Y="%B-%d-%Y" yields format example February-19-2018
What does the eval strptime function do?
Parses a time string into an epoch timestamp using strptime(X,Y), where X is a time in string format and Y is the timestamp format defined by variables
What does the eval relative_time function return?
Returns an epoch timestamp relative to a supplied time, using relative_time(X,Y) where X is an epoch time and Y is a relative time specifier (e.g. "-1d@d" for the start of the day before X)
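Example (a sketch; the timestamp string and field names are invented) chaining the time functions: now() is formatted with strftime, a string is parsed with strptime, and relative_time steps back to the start of the previous day.

```spl
| makeresults
| eval search_start=now()
| eval readable=strftime(search_start, "%B-%d-%Y")
| eval parsed=strptime("2018-02-19 13:45:00", "%Y-%m-%d %H:%M:%S")
| eval prev_day_start=relative_time(parsed, "-1d@d")
| eval prev_day_readable=strftime(prev_day_start, "%Y-%m-%d %H:%M:%S")
```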
What do the lower() and upper() functions of the eval command return?
Conversion of string to lower or upper case
What does the eval substr(X,Y,Z) function return?
Returns substring of X, starting at index Y (1-based) with a length of Z
What does the eval replace(X,Y,Z) function do?
Where X,Y,Z are all strings and Y is a regex, returns a string where Z replaces each occurrence of Y in X
Note: eval commands do not alter the indexed data or write new data to index
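Example (a sketch; the host value and output field names are invented) of the text functions. Only the search results change; the indexed data is untouched.

```spl
| makeresults
| eval host="WEB-Server-01"
| eval host_lower=lower(host), host_upper=upper(host)
| eval prefix=substr(host, 1, 3)
| eval masked=replace(host, "\d+$", "NN")
```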
Do non-numeric values need to be in quotations when using the if() function?
yes
What does the eval cidrmatch(X,Y) function return?
Returns t/f based on whether provided IP address Y matches subnet specified in X
What does the eval match(subject,regex) function return?
Returns t/f depending on whether subject matches defined regex
What does the eval coalesce(X1,X2…) function do?
Returns the first non-null value from the listed fields in the current event - used to normalize field names from result sets where two or more field names represent the same data field
Ex: combining fields with different names, but representing same data field, into one normalized field
What does the eval isnull() function return?
Returns true if the field is null, otherwise false
What does the eval typeof() function return?
Returns a string that represents the data type of the argument (number, string, boolean etc)
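Example (a sketch; src_ip and clientip are hypothetical field names, and clientip is deliberately left unset) combining the comparison and informational functions. Because clientip is null here, coalesce should fall back to src_ip.

```spl
| makeresults
| eval src_ip="10.1.2.3"
| eval is_internal=if(cidrmatch("10.0.0.0/8", src_ip), "yes", "no")
| eval is_ipv4=if(match(src_ip, "^\d{1,3}(\.\d{1,3}){3}$"), "yes", "no")
| eval ip=coalesce(clientip, src_ip)
| eval clientip_missing=if(isnull(clientip), "yes", "no")
| eval ip_type=typeof(ip)
```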
Are strings or numbers considered greater than when dealing with min() and max() functions?
Strings are greater
What do the eval ceiling() and floor() functions return?
The value rounded up or down to the nearest whole integer
What are the eval cryptographic functions used for?
Used to compute and return hash values of a string: md5, sha1, sha256, sha512
What does the | makeresults command return?
By itself, generates one result with only _time field
Must be first command in search
Can be used with one or more eval commands
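Example (a sketch; the user value is invented) using makeresults as the first command, followed by eval commands that build a test field and hash it.

```spl
| makeresults
| eval user="alice"
| eval user_md5=md5(user), user_sha256=sha256(user)
```

This pattern is handy for trying out eval functions without searching any index.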
What is the default case sensitivity for Lookups?
Default is case sensitive but this can be changed in advanced options when creating a lookup
What kind of lookup should be used for large tables or ones that are updated often?
KV(Key Value) Store
Where do KV Stores and CSV files live?
KV Store collections live on the search head (SH).
CSV files are replicated to the indexers.
Which type of lookup provides REST API access, multiuser access locking, and per-record insert and updates?
KV Store
Why would you use a CSV lookup over KV store?
Small csv table performs well, need case insensitive lookups, or integrating with other apps
Where is a KV Store collection defined?
Admin defines in configuration stanza in the collections.conf
Can you add results to a KV Store collection from SPL?
Yes, use the outputlookup command to write search results to the collection, provided data is shared and field names do not contain . or $
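Example (a sketch under assumptions): hypothetical_ip_counts_lookup is an invented lookup definition that would have to point at a KV Store collection already defined in collections.conf, and the index, sourcetype and clientip names are placeholders.

```spl
index=web sourcetype=access_combined
| stats count by clientip
| outputlookup hypothetical_ip_counts_lookup
```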
What are scripted(external) lookups?
Lookup facilitated through use of a script used to populate events with field values from an external source
What language must the scripted lookup be written in?
Python script or binary executable
What are the arguments passed to the script when creating a new external lookup?
Arguments are the field headers from the input/output CSV files
What are geospatial lookups used for?
To create choropleth map visualizations by matching coordinates from events to geographic feature collections in a KMZ or KML file
What command is used to access a geospatial lookup?
|geom featurecollectionname
What does the DB Connect (DBX) app do?
Allows you to use lookups to reference fields in an external SQL db; import data, export machine data to external db, or use SQL to build dashboard mixing splunk and db data
How are database lookups completed?
Through the DBX app via Data Lab and New lookup options
What command is used to access DBX lookups?
|dbxlookup lookup="lookup name"
When using |dbxlookup do you need to reference input fields in your search?
Yes, to get results you must explicitly refer to input fields in search via |fields command or running search in Verbose mode
Can alerts contain lookups?
Yes - run a search that contains a lookup command and save as alert
How can alert results be output to a lookup?
"Output results to lookup" is one of the action options when creating an alert, OR you can use | outputlookup with a filename or table name
What do search metadata tokens provide?
Metadata about the alert and associated search:
$name$ $description$ $app$ $owner$ $trigger_date$ etc
What do results tokens provide?
Field values from first row returned by the search associated with the alert - taking the form: $result.fieldname$
What do server tokens provide?
Details about your splunk deployment:
$splunk.version$ $splunk.build$ etc
What do job info tokens provide?
Data specific to a job search:
$job.eventSearch$ $job.messages$ $job.resultCount$ etc
What is a webhook alert action?
Action that defines a custom callback on a web resource: JSON formatted info about the alert is generated and sent in an HTTP POST request to the specified URL
What are ways of extracting fields?
Field Extractor using GUI (persistent and easy to use)
Manually coding a REGEX (precise and persistent)
Using erex SPL command (temporary and easy to use)
Using rex SPL command (temporary and precise)
What is a regular expression (regex)?
Case sensitive sequence of characters, either regular with literal meaning or a metacharacter with special meaning, to define a pattern
What regex type does splunk use?
Perl compatible
What do \d \w \s match in regex?
Any digit, word character, or whitespace character
What do \D \W \S match in regex?
Any NON-digit, non-word, or non-whitespace character
What do ? * + match in regex?
0 or 1 ; 0 or more ; 1 or more occurrences of the previous character
What does . match in a regex?
. is a wildcard matching any one character, so
.* would match anything
How do you specify exactly n occurrences in regex?
{n} after the character
How do you turn a match into a nongreedy one, matching as few characters as possible?
Add ? after the quantifier (for example .*? or .+?)
How do you create a capture group in regex?
Parentheses create a capture group, which can be named using (?<name>pattern) and referenced using $1, $2 …
If you want to group something, but not capture it, how do you write that in regex?
(?:pattern)
How do you write an OR statement in regex?
Pipe | character represents OR:
(?:invalid|wrong)
What are the two search time extraction commands?
|erex just requires an example
|rex requires a regex
What is the |erex syntax?
|erex temp_fieldname examples="ex1,ex2…"
What is the |rex syntax?
|rex field=fieldname "regex"
Where field is optional and names the field the regex is applied to; the default is _raw
How do you name a field while searching to match a regex with the |rex command?
(?<field_name>regex)
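Example (a sketch; the raw event text and the extracted field names are invented) showing named capture groups, alternation, and a non-capturing group with rex on a generated event.

```spl
| makeresults
| eval _raw="user=alice action=login status=failed reason=wrong password"
| rex "user=(?<user>\w+)\s+action=(?<action>\w+)"
| rex field=_raw "status=(?<status>failed|success)"
| rex "reason=(?:invalid|wrong)\s+(?<reason_detail>\w+)"
```

The first and third rex calls omit field=, so they run against _raw by default.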
Which regex command should be used in saved reports?
|rex command
What can be done to avoid backtracking and making multiple passes through the data when using regex
Limit use of quantifiers such as * and alternation constructs such as |
What are regex best practices?
Avoid multiple .* matches; use + instead of *; use simple ungreedy expressions; use parentheses to multiple extractions
What is self describing data?
Schema or structure is embedded in the data and comprised of metadata (element names, data types…). Examples include JSON, XML and tabular files
Can splunk automatically interpret self describing data?
Splunk recognizes JSON, so the data will be accessible as fields. Additional steps are needed to interpret XML
If fields show up with a format of name{}.fieldname, what type of data has been ingested?
JSON data
What command interprets XML format to have access to data as splunk fields?
spath command
What is the syntax for the |spath command?
|spath input=field_extract_from output=field_extract_to path=datapath_value_to_extract
Where all arguments are optional
How is the |spath path argument defined?
Contains one or more location steps separated by periods and position of data in array is specified by digit in {}
Ex: entities.hashtags{3}.text
Does numbering in path steps {} begin with 0 or 1?
Begins with 0 for JSON and 1 for XML
Can spath be used with |eval?
Yes, spath becomes the function spath(X,Y) where X is the input and Y is the path
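Example (a sketch; the JSON payload and output field names are invented) showing the spath command and the spath() eval function against the same generated event.

```spl
| makeresults
| eval _raw="{\"entities\": {\"hashtags\": [{\"text\": \"splunk\"}, {\"text\": \"spl\"}]}}"
| spath path=entities.hashtags{0}.text output=first_hashtag
| eval second_hashtag=spath(_raw, "entities.hashtags{1}.text")
```

Because the input is JSON, the array index in {} is expected to be zero-based here.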
How can you automatically extract data from XML at search time?
Set KV_MODE = xml in props.conf
What does the |multikv command do?
Creates an event for each row of tabular data (headers at top row, values as the rest)
When creating nested macros, should the outer macro be created before or after the inner?
Create inner macro first
What command allows you to check contents of search macros before executing?
Keyboard shortcut Ctrl+Shift+E (Command+Shift+E on Mac) in the search bar
What are the three types of data summary creation methods?
Report accel, summary indexing, data model accel
What is the easiest and most efficient acceleration option and should be first choice?
Data Model Accel
What is acceleration?
Using automatically created acceleration summaries to reduce search completion time
What is report acceleration?
Saving a qualifying report as accelerated then creates an acceleration summary that can be used to efficiently run future searches/reports on large volumes of data
What search mode and user privileges are needed to accelerate a report?
Search in smart or fast mode with the schedule_search privilege (power has by default)
What happens to an acceleration summary if all reports that use it are deleted?
The summary is auto deleted
What are the requirements for a report to accelerate?
Search must have a transforming command; commands before must be streaming, commands after non-streaming
What is a streaming command?
Operate on each event as the event is returned by the search: eval, search, fields, rex, rename, replace etc
What is a transforming command?
Commands that order raw data into a table, transforming cell values for each event into numerical values for statistical purposes: stats, chart, timechart, top, rare
What is a non-streaming command?
Commands wait until all events are gathered from indexers before command gets executed: eval and rename become non streaming after a transforming command
When do searches run faster without an acceleration summary?
<100K events in hot buckets or summary size projected to be too big
What is the automatic backfill feature?
Report acceleration feature allowing automatic update/rebuild of summaries as needed during a data interruption
When should you consider deleting an acceleration summary?
When the summarization load (effort to update summary) is high and the access count is low
What is summary indexing?
An alternative for reports that do not qualify for report acceleration: you schedule frequently running reports to extract only the needed info into a summary index and run subsequent searches against that summary
What type of transforming commands must be used in the report to create a summary index?
si commands: sichart, sitimechart, sistats, sitop, sirare
Does a scheduled report automatically create a summary index?
No, you have to save a search as a scheduled report -> edit via edit summary indexing -> check enable summary indexing
When do gaps occur in a summary index?
Populating reports run too long past next scheduled runtime
Forced real time scheduling
Splunk is down
How do you backfill gaps in summary indexes?
Run the fill_summary_index.py script
How do overlaps in summary indexes occur?
Setting report time range to be longer than frequency of report schedule
What are data models?
Hierarchical structured datasets generating searches and driving pivots
What does the |datamodel command do?
Returns description of all(or specified) data model and objects Ex: |datamodel [datamodel_name] [object_name] [search]
-Also used to search against data model
Is |datamodel a generating command?
Yes, so it must be first command in pipe
What is an acceleration summary built on the search head after user selects dataset and enters pivot editor?
Ad Hoc data model acceleration
When are ad hoc acceleration summaries available to use?
Only while working in the pivot editor - not on reports or dashboards based on pivot
What is a persistent data model acceleration?
Acceleration summary composed of multiple time-series index files optimized for speed to be used with pivot editor or tstats command
Can ad hoc data model accelerations run for particular time ranges?
No they run over all time, only persistent acceleration can be scoped to time ranges
What user privileges are needed to accelerate a data model?
Admin permissions or accelerate_datamodel privilege
What type of events and datasets can be accelerated through Persistent acceleration?
Only root events can be accelerated - if multiple root events only the first is accelerated
How often are the underlying data model acceleration tsidx files updated and removed?
Updated every 5 minutes and outdated items removed every 30min
What does a tsidx file consist of?
A lexicon: alpha-numeric term list pointing to posting list
Posting list: array containing seek address, _time, etc mapping each term to events in the rawdata files containing term
What files make up an index?
rawdata files and corresponding tsidx files
How do you perform stats on indexed fields in the tsidx file?
By using the |tstats stat_function command
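Example (a sketch; it assumes the _internal index is searchable by your role): tstats counting events by the indexed field sourcetype without retrieving raw events.

```spl
| tstats count where index=_internal by sourcetype
```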
Can you use |tstats with data models and summary indexes?
Yes, use | tstats … from datamodel=name; add summariesonly=t to search only the accelerated summaries
Does stats or tstats work best with massive amounts of data using indexed fields?
tstats
How does tstats search an accelerated data model object?
Run | datamodel to return the model and its objects -> find the field and its owner -> use dot notation in | tstats sum(owner.field) from datamodel=name
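Example (a sketch under assumptions): it presumes an accelerated data model named Web with a root dataset Web containing status and src fields, as in the Common Information Model. Substitute the owner and field names reported by | datamodel in your environment.

```spl
| tstats summariesonly=t count from datamodel=Web where Web.status=404 by Web.src
```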