Data Collection Flashcards

1
Q

What are the forwarder types?

A

Universal Forwarder:

  • A streamlined binary package that contains only the components needed to forward data
  • Data is forwarded unparsed

Heavy Forwarder:

  • Uses the Splunk Enterprise binary with all capabilities
  • Parses data before forwarding
  • Can consume more resources than parsing on the indexer

Lightweight Forwarder:

  • A full install of Splunk, but it does not parse inputs
  • Data is sent uncooked/raw
  • Deprecated as of Splunk 6.0
2
Q

What is cooked data?

A

Both parsed and unparsed data are considered “cooked” data; otherwise, data is sent in raw form, as with syslog

3
Q

What does each event’s header contain?

A

host, source, sourcetype, and target index

4
Q

When unparsed data is forwarded, in what size blocks is it delivered?

A

64kb blocks

5
Q

Describe Parsed Data

A

Parsed data is load balanced across all indexers using automatic load balancing.

Automatic switchover is done on a 30-second timer (default).

This can be configured by time or by volume in outputs.conf:

By time: the autoLBFrequency setting; 30-second default.

By volume: the autoLBVolume setting; default is 0 bytes. If set to anything other than 0, the forwarder will switch indexers based on the amount of data sent.

If both time and volume are set, then the first threshold reached wins.
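A minimal sketch of how these settings might look in outputs.conf, assuming a hypothetical tcpout group named `primary_indexers` with placeholder server names:

```ini
[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# Switch indexers every 30 seconds (this is the default)
autoLBFrequency = 30
# Also switch after roughly 1 MB has been sent; whichever
# threshold is reached first triggers the switchover
autoLBVolume = 1048576
```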

6
Q

Describe Unparsed data

A

An unparsed data stream is sent to the indexer tagged with minimal metadata to identify the host, source, sourcetype, and target index.

The stream is divided into 64KB blocks.

The stream is stamped with the time zone of the originating forwarder.

7
Q

How is unparsed data load balanced?

A

It adheres to the same load balancing parameters as parsed data.

There are some issues with load balancing unparsed data on a timer or volume setting: events can be truncated or trashed at switchover.

To avoid trashed or truncated events, set EVENT_BREAKER (a regex) and EVENT_BREAKER_ENABLE = true in props.conf on the forwarder.

Only the forwarder needs to be at version 6.5 or later; the indexers need not be upgraded.
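The EVENT_BREAKER mechanism splits the stream on a regex whose first capture group matches the text between events, so whole events are handed off at switchover. A small sketch of that splitting behavior, using a made-up three-line log stream and a newline-based breaker regex:

```python
import re

# Hypothetical raw stream of three syslog-style events separated by newlines.
stream = (
    "Jan 01 00:00:01 host app: started\n"
    "Jan 01 00:00:02 host app: working\n"
    "Jan 01 00:00:03 host app: stopped\n"
)

# EVENT_BREAKER-style regex: the first capture group matches the text
# *between* events (here, one or more newline characters).
EVENT_BREAKER = r"([\r\n]+)"

# Splitting on the capture group keeps the separators in the result;
# dropping them yields the individual events that would be forwarded
# as whole units rather than being cut mid-event.
parts = re.split(EVENT_BREAKER, stream)
events = [p for p in parts if p and not re.fullmatch(EVENT_BREAKER, p)]
print(events)
```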

8
Q

What configuration in props.conf can be copied to EVENT_BREAKER for it to work?

A

LINE_BREAKER

9
Q

Describe the Monitor input.

A

Most common input

Continuously watches a file or directory, ingesting new events as they arrive.

Files should be local to the host doing the monitoring
- use a forwarder on the host where the logs reside

Configure using the [monitor://<path>] stanza in inputs.conf
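A sketch of monitor stanzas in inputs.conf; the paths, index, and sourcetype values below are assumptions for illustration:

```ini
# Monitor a single log file
[monitor:///var/log/messages]
index = os_logs
sourcetype = syslog

# Monitor a directory recursively, keeping only .log files
[monitor:///var/log/myapp]
whitelist = \.log$
index = app_logs
```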

10
Q

Describe the batch input.

A

The batch input reads a file, indexes the data, and deletes the file

Best used with large archives of historic data, not files that continue to be written to.

Use a forwarder on the host where the files reside.

[batch://] stanza has several unique settings, but also uses the same settings as the monitor input.
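A minimal batch stanza sketch, assuming a hypothetical archive directory; `move_policy = sinkhole` is the setting that tells Splunk to delete each file after indexing it:

```ini
# Ingest archived files once, then delete them
[batch:///opt/archive/old_logs]
move_policy = sinkhole
index = archive
```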

11
Q

Describe the Script input

A

Splunk runs the script in the [script://] stanza and ingests the output (STDOUT/STDERR)

Splunk supports a number of script types including PowerShell, Python, Windows batch files, or any other utility that can format and stream the data that you want to index

Place scripts in bin/ directory of your app:
$SPLUNK_HOME/etc/apps/<app_name>/bin/
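A sketch of a script stanza, assuming a hypothetical app `my_app` and script `collect_stats.py`:

```ini
# Run the script every 300 seconds and index whatever it writes to stdout
[script://$SPLUNK_HOME/etc/apps/my_app/bin/collect_stats.py]
interval = 300
sourcetype = stats
index = main
```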

12
Q

Describe the FIFO input

A

A First In, First Out (FIFO) input reads data from the FIFO queue referenced by the path specified in the stanza

  • not currently supported by Splunk web
  • If using Splunk Cloud, use an HF for FIFO inputs

It is important to note that data sent over FIFO queues does not remain in memory and can be an unreliable method for data sources.

13
Q

Describe an fschange input

A

File system change monitoring ([fschange]) monitors the directory and subdirectories referenced in the path for updates, additions, and deletions
- note the stanza is not prefaced with “//”

Events arrive at Splunk to indicate a change from the prior state

The stanza uses different settings than other inputs

A directory cannot be simultaneously monitored by [fschange:<path>] and [monitor://<path>].

14
Q

Describe the Perfmon, WinEventLog, WMI, and admon inputs

A

Windows inputs that monitor Perfmon counters, Windows event logs, and Active Directory (admon).

WMI can be used for remote servers, but is highly discouraged. Instead, place a UF on the remote server and use WinEventLog.

15
Q

Describe the http input

A

This configures the HTTP Event Collector (HEC), a token-based HTTP input that is secure and scalable

Note that the inputs.conf must live within the application scope of $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf to be active.
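A sketch of HEC configuration in that inputs.conf; the token name and value below are placeholders, not real credentials:

```ini
# Enable the HEC listener
[http]
disabled = 0
port = 8088

# One stanza per token; clients authenticate with the token value
[http://my_app_token]
token = 00000000-0000-0000-0000-000000000000
index = main
sourcetype = _json
```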

16
Q

Describe the network inputs

A

The network inputs listen directly on network ports over UDP, TCP, or TCP with SSL

The protocol is dictated by the inputs.conf stanza
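A sketch of network input stanzas in inputs.conf; the ports and index/sourcetype values are assumptions:

```ini
# Plain TCP listener on port 9500
[tcp://9500]
index = network
sourcetype = tcp_raw

# Syslog over UDP on the standard port
[udp://514]
sourcetype = syslog
```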

17
Q

Describe splunktcp

A

splunktcp is an input method for Splunk-to-Splunk communication

Most commonly found on indexers, but can also be used on intermediate forwarders, as in on-prem-to-cloud configurations

18
Q

What are the great eight properties in props.conf?

A
TIME_PREFIX - regex
TIME_FORMAT - strftime
MAX_TIMESTAMP_LOOKAHEAD - integer
SHOULD_LINEMERGE - t|f (usually f)
LINE_BREAKER - regex
TRUNCATE - integer
EVENT_BREAKER - regex
EVENT_BREAKER_ENABLE - t|f
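A props.conf sketch showing the great eight together for a hypothetical sourcetype; all values are illustrative assumptions for a log whose events start with a bracketed timestamp:

```ini
[my_app:log]
# Timestamp sits right after an opening bracket at line start
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# Timestamp is 19 characters long, so stop looking after that
MAX_TIMESTAMP_LOOKAHEAD = 19
# One event per line: break on newlines, no line merging
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
# Let the forwarder hand off whole events during load balancing
EVENT_BREAKER = ([\r\n]+)
EVENT_BREAKER_ENABLE = true
```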
19
Q

What Splunk CLI command can you use to troubleshoot inputs via the TailingProcessor?

A

splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus

20
Q

What is a fishbucket?

A

A data structure that keeps track of files that have been ingested by “monitor” statements

21
Q

What can you use to troubleshoot the fishbucket?

A

btprobe - inspect what Splunk has indexed for a monitored file
--reset - reset an individual checkpoint

Clean the entire fishbucket index:
splunk clean eventdata _thefishbucket

Manually delete the fishbucket on a forwarder:
rm -r ~/splunkforwarder/var/lib/splunk/fishbucket

22
Q

How do you oneshot a file into Splunk?

A

A one-time upload of a file to Splunk, used to validate an input configuration: splunk add oneshot <path_to_file>

- Local files only

23
Q

What is initCrcLength?

A

Sets the amount of data that is read to compute the CRC used to decide whether the file has been read before (default is 256 characters)
- Can be increased to read further into the file, for long headers that are reused or for rolling log files
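A sketch of raising initCrcLength in inputs.conf for a hypothetical log whose first 256 characters are a shared boilerplate header:

```ini
# Hash the first 1024 characters instead of the default 256, so files
# that share a long common header still get distinct CRCs
[monitor:///var/log/myapp/app.log]
initCrcLength = 1024
```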

24
Q

What is crcSalt?

A

Forces the input to ingest files that have matching CRCs by creating a unique CRC

  • crcSalt = <string> - the string is added to the CRC
  • crcSalt = <SOURCE> - the full directory path is added to the CRC (do not use on rolling log files)

Useful if you need to:

  • Forward to a new environment
  • Transition from test to production
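A sketch of the `<SOURCE>` form in inputs.conf, assuming a hypothetical export directory:

```ini
# Add each file's full source path to its CRC so identical headers in
# different files are still treated as distinct (avoid with rolling logs)
[monitor:///opt/data/exports]
crcSalt = <SOURCE>
```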
25
Q

What are Pretrained Sourcetypes?

A

/opt/splunk/etc/system/default/props.conf

“Known” sources - Splunk will format automatically