T-Forms (post E3 #1) Flashcards
what is transforms.conf
where we specify transformations and lookups that can then be applied to any event.
These transforms and lookups are referenced by name in props.
main difference between props and transforms
-props.conf is responsible for how data breaks and parses
-transforms.conf is responsible for how data looks (rewriting, rerouting, masking)
Props and Transforms work together
Transforms will not work without props
-props says “do this, but go see transforms for instruction”
-transforms says, “this is how you do it”
Name more functions of Transforms.conf
-Manipulates data before it gets indexed
-Transforms and alters raw data
-Has the ability to change metadata
List the common uses of Transforms
-Sending events to Null queue
-Separating a single sourcetype into multiple sourcetypes
-Host and source overrides based on Regex
-Delimiter-based field extractions
-Anonymizing data
what is a splunk pipeline
In Splunk, a pipeline refers to the sequence of data processing stages that data goes through as it is ingested, indexed, and made available for searching and analysis.
what is a splunk queue
queues are designed to help manage data flow, ensure data integrity, and handle system load efficiently.
What is Splunkd considered and what are its subprocesses?
*Considered a main process
*the core Splunk process that runs on all Splunk components
1-Parsing queue/parsing pipeline
(linebreaking occurs)
2-Aggregation queue/merging pipeline
(Line merging & Time extraction)
3-Typing queue/typing pipeline
(More Regex occurs here than in parsing pipeline)
4-Indexing queue/indexing pipeline
(syslog out, tcp out, indexer) Final stop after this is disk
What is the solution for when you do not want to index unwanted data?
Send events to null queue
Why do we want to prevent unwanted data from indexing?
We do not want the extra work; it takes more processing time and wastes storage
List all 4 Splunk pipelines in order
- Parsing pipeline
- Merging pipeline
- Typing pipeline
- Index pipeline
Explain process of Splunk pipeline and queues
see class notes
What should you do if backlog is caused because process is frozen?
Restart Instance
Facts about Regex in Splunk pipeline
-More Regex happens in the typing pipeline than in the parsing pipeline
-More Regex in transforms.conf (mostly in the typing pipeline)
What configurations do we control in Splunk pipeline?
props.conf and transforms.conf
what is a debug event in splunk
a log entry or event that contains information about the internal workings, troubleshooting details, or debugging information related to the Splunk platform itself.
how do you generate a debug event in splunk
Use the command-line tool called logger (located in the bin directory) to log custom messages or debug information.
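A minimal sketch, assuming the standard Unix logger utility and a Splunk input that monitors syslog; the message text is a placeholder:

logger -p user.debug "DEBUG: test event for pipeline filtering"

The -p flag sets the syslog facility/priority; a monitored event containing DEBUG can then be matched by a transforms stanza like the one below.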
Define transforms.conf stanza
-DEST_KEY = the destination key telling Splunk where you want to send the event (e.g., which queue)
-FORMAT = specifies which queue (e.g., nullQueue)
-REGEX = the pattern to match; REGEX = DEBUG matches any event containing DEBUG (sketch below)
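A minimal sketch of the null-queue pattern; the stanza name discard_debug and the sourcetype my_sourcetype are placeholders:

# transforms.conf
[discard_debug]
REGEX = DEBUG
DEST_KEY = queue
FORMAT = nullQueue

# props.conf
[my_sourcetype]
TRANSFORMS-null = discard_debug

Any event of my_sourcetype containing DEBUG is routed to the null queue and never indexed.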
Filepath for transforms.conf and props.conf
Forwarder Level:
$SPLUNK_HOME/etc/deployment-apps/app-name/local/props.conf
Indexer Level:
$SPLUNK_HOME/etc/master-apps/app-name/local/props.conf
(transforms.conf goes in the same local directory)
Why would we want to split a single sourcetype into multiples?
-a single log may contain events that are significantly different from each other
-some logs are simply written in an inconsistent, awkward way
Explain the process of splitting single sourcetype into multiples
- First make a reference in props.conf
- Then, define in transforms.conf
(See class notes for pic)
**The same exact steps apply for host and source overrides (see the sketch below)
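A minimal sketch of the pattern, with placeholder names (original_st, set_error_st) and an assumed rule that events containing ERROR get a new sourcetype:

# props.conf
[original_st]
TRANSFORMS-split = set_error_st

# transforms.conf
[set_error_st]
REGEX = ERROR
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::original_st_errors

For host or source overrides, swap in DEST_KEY = MetaData:Host with FORMAT = host::<new-host>, or DEST_KEY = MetaData:Source with FORMAT = source::<new-source>.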
What is a delimiter, and what is a field extraction?
Delimiter = a character that separates fields in data (comma, space, tab, pipe)
Field Extraction = pulling key-value pairs out of raw events
Delimiter-based field extractions in Splunk are used to extract fields from data that is separated by delimiters. Delimiters are characters that are used to separate fields in data, such as commas, spaces, or tabs.
Where does transforms.conf go on the search head level?
$SPLUNK_HOME/etc/apps/app-name/local/transforms.conf
OR
$SPLUNK_HOME/etc/shcluster/apps/<app>/local/transforms.conf
What must you use when setting delimiter-based field extractions?
REPORT- (instead of TRANSFORMS-) in props.conf, because delimiter-based field extractions are applied at search time
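A minimal sketch, assuming comma-separated events and placeholder names (csv_fields, my_csv_sourcetype); DELIMS and FIELDS are the transforms.conf settings for delimiter-based extractions:

# transforms.conf
[csv_fields]
DELIMS = ","
FIELDS = user, action, status

# props.conf
[my_csv_sourcetype]
REPORT-csv = csv_fields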
Types of data that should be anonymized
Title 13 Data = Class of protected data containing private business or personal information
PII = Personally Identifiable Information
PHI = Protected Health Information
Title 26 Data = Class protected data containing Federal Tax Information
Review how to set up anonymization
Look at class notes
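A minimal sketch following the pattern in Splunk's anonymization docs; the sourcetype and stanza names are placeholders, and the regex assumes events contain SSN=<9 digits>:

# props.conf
[my_sourcetype]
TRANSFORMS-anonymize = ssn_anonymizer

# transforms.conf
[ssn_anonymizer]
REGEX = (.*)SSN=\d{5}(\d{4}.*)
FORMAT = $1SSN=xxxxx$2
DEST_KEY = _raw

Writing to DEST_KEY = _raw rewrites the raw event itself, so the first five SSN digits are masked before indexing.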
At what point or stage is anonymizing done?
It is done before the indexing pipeline (where data is written to disk), i.e., private information is never stored in Splunk.
Anonymizing happens in the typing pipeline.
$SPLUNK_HOME/etc/master-apps/<same_app_name>/local/props.conf
$SPLUNK_HOME/etc/master-apps/<same_app_name>/local/transforms.conf
Where else can anonymizing occur?
Heavy Forwarder
Clients of the DS:
$SPLUNK_HOME/etc/deployment-apps/<same_app_name>/local/props.conf
$SPLUNK_HOME/etc/deployment-apps/<same_app_name>/local/transforms.conf
Standalone HFs:
$SPLUNK_HOME/etc/apps/<same_app_name>/local/props.conf
$SPLUNK_HOME/etc/apps/<same_app_name>/local/transforms.conf
What do you use to hash data?
props.conf only + SEDCMD
Explain how to hash data with props.conf only + SEDCMD
See class notes
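A minimal sketch, assuming a placeholder sourcetype and that account numbers should be masked; note that SEDCMD does sed-style substitution (masking) rather than cryptographic hashing:

# props.conf
[my_sourcetype]
SEDCMD-mask_acct = s/acct=\d+/acct=#####/g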
What should you do when your Splunk pipeline has a fill ratio of 100%?
When a queue has a fill ratio of 100% and a backlog, you can open up a second pipeline and direct the overflow of events to it.
Fill ratio = how full a queue is relative to the number of events it can hold.
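One documented way to open a second pipeline is the parallelIngestionPipelines setting in server.conf (assuming the instance has CPU headroom); a minimal sketch:

# server.conf
[general]
parallelIngestionPipelines = 2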