Data Collection Flashcards
What are the forwarder types?
Universal Forwarder:
- A streamlined binary package that contains only the components needed to forward data
- Data is forwarded unparsed
Heavy Forwarder:
- Uses the Splunk Enterprise binary with all capabilities
- Parses data before forwarding
- Can consume more resources than parsing on the indexer
Lightweight Forwarder:
- A full install of Splunk, but it does not parse inputs
- Data is sent uncooked/raw
- Deprecated as of Splunk 6.0
What is cooked data?
Both parsed and unparsed data are considered “cooked” data. Anything else is sent in raw form, like syslog.
What does each event’s header contain?
host, source, sourcetype, and target index
When unparsed data is forwarded, in what block size (in KB) is it delivered?
64 KB blocks
Describe Parsed Data
Parsed data is load balanced across all indexers using automatic load balancing.
Automatic switchover happens on a 30-second timer by default.
Switchover can be configured by time or by volume in outputs.conf:
By time: autoLBFrequency. Default is 30 seconds.
By volume: autoLBVolume. Default is 0 bytes. If set to anything other than 0, the forwarder changes indexers based on the amount of data sent.
If both time and volume are set, whichever threshold is reached first triggers the switch.
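A minimal outputs.conf sketch of the two switchover settings described above. The group name and indexer host names are hypothetical, and the 1 MB volume threshold is only an illustrative value.

```ini
# outputs.conf on the forwarder (group name and hosts are assumptions)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# Switch indexers every 30 seconds (the default) ...
autoLBFrequency = 30
# ... or after about 1 MB has been sent, whichever is reached first
autoLBVolume = 1048576
```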
Describe Unparsed data
The unparsed data stream is sent to the indexer tagged with minimal metadata to identify host, source, sourcetype, and target index
The stream is divided into 64 KB blocks
The stream is stamped with the time zone of the originating forwarder
How is unparsed data loadbalanced?
It adheres to the same load balancing parameters as parsed data.
Load balancing unparsed data on a timer or volume setting can cause problems: events can be truncated or discarded when the forwarder switches indexers.
To avoid truncated or discarded events, set EVENT_BREAKER = <regex> and EVENT_BREAKER_ENABLE = true in props.conf on the forwarder.
Only the forwarder must be at version 6.5 or later. The indexers need not be upgraded.
What configuration in props.conf can be copied to EVENT_BREAKER for it to work?
LINE_BREAKER
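A minimal props.conf sketch of enabling the event breaker on a forwarder. The sourcetype name is hypothetical, and the regex assumes the sourcetype's LINE_BREAKER is the simple newline pattern shown.

```ini
# props.conf on the universal forwarder (sourcetype name is an assumption)
[my_app_log]
EVENT_BREAKER_ENABLE = true
# Copied from the LINE_BREAKER used for this sourcetype at parse time
EVENT_BREAKER = ([\r\n]+)
```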
Describe the Monitor input.
Most common input
Continuously watches a file or directory, ingesting new events as they arrive.
Files should be local to the host doing the monitoring
- use a forwarder on the host where the logs reside
Configure using the [monitor://<path>] stanza in inputs.conf
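A minimal inputs.conf sketch of a monitor input. The directory, index, and sourcetype are hypothetical values chosen for illustration.

```ini
# inputs.conf on the forwarder where the logs reside (path and names are assumptions)
[monitor:///var/log/my_app]
index = main
sourcetype = my_app_log
disabled = false
```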
Describe the batch input.
The batch input reads a file, indexes the data, and deletes the file
Best used with large archives of historic data, not files that continue to be written to.
Use a forwarder on the host where the files reside.
[batch://] stanza has several unique settings, but also uses the same settings as the monitor input.
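A minimal inputs.conf sketch of a batch input. The archive path, index, and sourcetype are hypothetical. The move_policy = sinkhole setting is what tells Splunk to delete each file after indexing it.

```ini
# inputs.conf (path and names are assumptions)
[batch:///opt/archive/my_app]
# Required for batch inputs: files are deleted once they are indexed
move_policy = sinkhole
index = main
sourcetype = my_app_log
```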
Describe the Script input
Splunk runs the script in the [script://] stanza and ingests the output (STDOUT/STDERR)
Splunk supports a number of script types including PowerShell, Python, Windows batch files, or any other utility that can format and stream the data that you want to index
Place scripts in bin/ directory of your app:
$SPLUNK_HOME/etc/apps/<app_name>/bin/
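A minimal inputs.conf sketch of a scripted input. The app name, script name, interval, and sourcetype are all hypothetical.

```ini
# inputs.conf in the app that ships the script (names are assumptions)
[script://$SPLUNK_HOME/etc/apps/my_app/bin/collect_stats.sh]
# Run the script every 60 seconds and index whatever it prints
interval = 60
index = main
sourcetype = my_app_stats
```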
Describe the FIFO input
A First In, First Out (FIFO) input reads data from the FIFO queue referenced in the path specified in the stanza
- not currently supported by Splunk web
- If using Splunk Cloud, use an HF for FIFO inputs
It is important to note that data sent over FIFO queues does not remain in memory and can be an unreliable method for data sources.
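A minimal inputs.conf sketch of a FIFO input. The queue path and sourcetype are hypothetical.

```ini
# inputs.conf (FIFO path and names are assumptions)
[fifo:///var/run/my_app/events.fifo]
index = main
sourcetype = my_app_log
```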
Describe an fschange input
File system change monitoring (fschange) watches the directory and subdirectories referenced in the path for updates, additions, and deletions
- Note that the path in the stanza is not prefixed with “//”
Events arrive at Splunk to indicate a change from the prior state
The stanza uses different settings than other inputs
A directory cannot be simultaneously monitored by [fschange:] and [monitor://…].
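A minimal inputs.conf sketch of an fschange input, showing the stanza form without “//”. The directory and the values chosen are hypothetical.

```ini
# inputs.conf (directory and values are assumptions)
[fschange:/etc/my_app]
# Check for changes once an hour
pollPeriod = 3600
# Include subdirectories
recurse = true
# Send only a change notice rather than the full file contents
fullEvent = false
```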
Describe Perfmon, WinEventLog, WMI, and admon Inputs
Windows inputs that monitor performance counters (perfmon), Windows event logs (WinEventLog), and Active Directory (admon).
WMI can be used for remote servers, but is strongly discouraged. Instead, place a UF on the remote server and use WinEventLog.
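A minimal inputs.conf sketch for a UF on a Windows host, covering only the event log and perfmon inputs. The perfmon stanza name, the counter selection, and the interval are hypothetical choices.

```ini
# inputs.conf on a Windows universal forwarder (values are assumptions)
[WinEventLog://Security]
disabled = 0

[perfmon://CPU]
object = Processor
counters = % Processor Time
instances = _Total
# Sample the counter every 60 seconds
interval = 60
disabled = 0
```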
Describe the http input
This configures the HTTP Event Collector (HEC), a token-based HTTP input that is secure and scalable
Note that the inputs.conf must live within the splunk_httpinput app, at $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf, to be active.
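A minimal inputs.conf sketch for HEC. The token name, index, and sourcetype are hypothetical, and the token value is a placeholder rather than a real GUID.

```ini
# inputs.conf for HEC (token name and values are assumptions)
[http]
disabled = 0
port = 8088

[http://my_app_token]
disabled = 0
# Placeholder only; use the GUID generated for your token
token = 00000000-0000-0000-0000-000000000000
index = main
sourcetype = my_app_json
```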
Describe the network inputs
The network inputs listen directly on network ports over UDP, TCP, or TCP with SSL
Protocol is dictated by the inputs.conf stanza
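A minimal inputs.conf sketch showing how the stanza prefix selects the protocol. The port numbers and sourcetype are hypothetical. The certificate settings that a tcp-ssl input also needs go in a separate [SSL] stanza, not shown here.

```ini
# inputs.conf (ports and sourcetype are assumptions)
[udp://514]
connection_host = ip
sourcetype = syslog

[tcp://5140]
connection_host = dns
sourcetype = syslog

# Also requires an [SSL] stanza with certificate settings
[tcp-ssl://6514]
sourcetype = syslog
```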
Describe splunktcp
splunktcp is an input method for Splunk-to-Splunk communication
Most commonly found on indexers, but can also be used on intermediate forwarders, such as in on-prem-to-cloud configurations
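A minimal inputs.conf sketch of a splunktcp listener on an indexer or intermediate forwarder. Port 9997 is the conventional receiving port, used here as an assumption.

```ini
# inputs.conf on the receiving instance (port is an assumption)
[splunktcp://9997]
disabled = 0
```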
What are the great eight properties in props.conf?
- TIME_PREFIX - regex
- TIME_FORMAT - strftime
- MAX_TIMESTAMP_LOOKAHEAD - integer
- SHOULD_LINEMERGE - true|false (usually false)
- LINE_BREAKER - regex
- TRUNCATE - integer
- EVENT_BREAKER - regex
- EVENT_BREAKER_ENABLE - true|false
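A minimal props.conf sketch that sets all eight properties together. It assumes a hypothetical sourcetype whose events look like `[2024-01-15 10:23:45.123 +0000] message`. The values are illustrative, not recommendations.

```ini
# props.conf (sourcetype name and log format are assumptions)
[my_app_log]
# The timestamp starts right after the opening bracket
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N %z
# Look no further than 30 characters past TIME_PREFIX for the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
# Break on newlines that are followed by a bracketed date
LINE_BREAKER = ([\r\n]+)\[\d{4}-\d{2}-\d{2}
TRUNCATE = 10000
# Forwarder-side settings, mirroring LINE_BREAKER
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\[\d{4}-\d{2}-\d{2}
```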
What Splunk CLI command can you use to troubleshoot inputs via the TailingProcessor?
splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus
What is a fishbucket?
A data structure that keeps track of files that have been ingested by “monitor” statements
What can you use to troubleshoot the fishbucket?
btprobe - check what Splunk has indexed
- --reset resets an individual checkpoint
Clean the entire index fishbucket:
- splunk clean eventdata -index _thefishbucket
Manually delete the fishbucket on a forwarder:
- rm -r ~/splunkforwarder/var/lib/splunk/fishbucket
How do you oneshot a file into Splunk?
Use splunk add oneshot <file> for a one-time upload to Splunk, for example to validate an input configuration
- Local files only
What is initCrcLength?
Sets the amount of data that is read to check whether the file has been read before (default is 256 bytes)
- Can be edited to read further for long headers that are reused, or rolling log files
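A minimal inputs.conf sketch of raising initCrcLength for files that share a long header. The file path and the 1024-byte value are hypothetical.

```ini
# inputs.conf (path and value are assumptions)
[monitor:///var/log/my_app/report.csv]
# Read past the shared header before computing the CRC
initCrcLength = 1024
```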
What is crcSalt?
Forces the input to ingest files that have matching CRCs by making each CRC unique
- crcSalt = <string> - the string is added to the CRC
- crcSalt = <SOURCE> - the full path of the source file is added to the CRC (do not use on rolling log files)
Useful if you need to:
- Forward to a new environment
- Transition from test to production
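A minimal inputs.conf sketch of crcSalt on a monitor input. The monitored path is hypothetical. `<SOURCE>` is written literally, angle brackets included.

```ini
# inputs.conf (path is an assumption)
[monitor:///var/log/my_app/*.log]
# Add each file's full path to its CRC so identical-looking files are still ingested
crcSalt = <SOURCE>
```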
What are Pretrained Sourcetypes?
“Known” source types that Splunk formats automatically
Defined in /opt/splunk/etc/system/default/props.conf