Splunk 102 Flashcards
What is a bucket in Splunk?
A file system directory containing a portion of an index.
A Splunk Enterprise index typically consists of many buckets, organized by age.
What are the types of Splunk buckets?
Hot, warm, cold, frozen, thawed
What is the bucket “aging” process?
As buckets age, they “roll” from one state to the next. When data is first indexed, it gets written to a hot bucket. Hot buckets are buckets that are actively being written to. An index can have several hot buckets open at a time. Hot buckets are also searchable.
When certain conditions are met (for example, the hot bucket reaches a certain size or the indexer gets restarted), the hot bucket becomes a warm bucket (“rolls to warm”), and a new hot bucket is created in its place. The warm bucket is renamed but it remains in the same location as when it was a hot bucket. Warm buckets are searchable, but they are not actively written to. There can be a large number of warm buckets.
Once further conditions are met (for example, the index reaches some maximum number of warm buckets), the indexer begins to roll the warm buckets to cold, based on their age. It always selects the oldest warm bucket to roll to cold. Buckets continue to roll to cold as they age in this manner. Cold buckets reside in a different location from hot and warm buckets. You can configure the location so that cold buckets reside on cheaper storage.
Finally, after certain other time-based or size-based conditions are met, cold buckets roll to the frozen state, at which point they are deleted from the index, optionally after being archived.
If the frozen data has been archived, it can later be thawed. Data in thawed buckets is available for searches.
Settings in indexes.conf determine when a bucket moves from one state to the next.
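As a sketch, the relevant indexes.conf settings might look like this (the index name, paths, and values are illustrative assumptions, not recommendations):

[my_index]
# hot and warm buckets live under homePath
homePath = $SPLUNK_DB/my_index/db
# cold buckets can be placed on cheaper storage
coldPath = /cheap_storage/my_index/colddb
# restored (thawed) buckets
thawedPath = $SPLUNK_DB/my_index/thaweddb
# a hot bucket rolls to warm when it reaches this size
maxDataSize = auto
# beyond this count, the oldest warm bucket rolls to cold
maxWarmDBCount = 300
# buckets roll to frozen after roughly 180 days
frozenTimePeriodInSecs = 15552000
# optional: archive frozen buckets here instead of deleting them
coldToFrozenDir = /archive/my_index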
What is single-instance deployment? How can we scale our deployment?
In single-instance deployments, one instance of Splunk Enterprise handles all aspects of processing data, from input through indexing to search. A single-instance deployment can be useful for testing and evaluation purposes and might serve the needs of department-sized environments.
To support larger environments, however, where data originates on many machines and where many users need to search the data, you can scale your deployment by distributing Splunk Enterprise instances across multiple machines. When you do this, you configure the instances so that each instance performs a specialized task. For example, one or more instances might index the data, while another instance manages searches across the data.
What is an instance?
A single running installation of Splunk Enterprise.
What is a Splunk component?
One of several types of Splunk Enterprise instances.
What categories of Splunk components are there? Give some examples of components for each category.
These are the available processing component types:
Indexer
Forwarder
Search head
Management components include:
License master
Monitoring console
Deployment server
Indexer cluster master node
Search head cluster deployer
What is a License Master?
It is the component responsible for tracking data ingestion across the deployment against the purchased license quota.
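As a minimal sketch, other Splunk instances are pointed at the License Master in server.conf (the hostname below is a hypothetical example):

[license]
master_uri = https://license-master.example.com:8089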
What is a License Quota?
It is the maximum daily volume of data that can be ingested into Splunk under a given purchased license.
What are some sizes of environment?
<3 TB = small environment
10-30 TB = large environment
>50 TB = massive environment
Tell me your environment!
Paweł: 26 TB ingestion, 28 TB quota, around 20k forwarders, around 145 indexers, around 250 clients, 14 search heads; the company plans to add 20% more devices to the network.
How to create a fake environment:
a) quota = ingestion + 2 TB
b) 1 TB ≈ 1,000 forwarders (go a little under)
c) 1 TB ≈ 6 indexers (go 6-8% under)
d) 200 users per 17 TB
Mariusz:
? ? ? ?
Which Splunk components typically share an instance?
Deployment Server and License Master
How does data enter the indexer?
Through a port and an IP address.
Which port do we have to open to enable indexers to receive data?
9997 (sometimes 9998)
Some people call it “the Indexer port”
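A minimal sketch of both sides of that connection (hostnames are hypothetical): the indexer opens the receiving port in inputs.conf, and each forwarder points at it in outputs.conf.

inputs.conf on the indexer:

[splunktcp://9997]
disabled = 0

outputs.conf on the forwarder:

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997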
What two types of files do indexes store?
Raw data (full log files) and indexed files (tsidx)
What is tsidx?
Time-series index files (the “indexed files”): they map the terms indexed from the raw data to where those terms occur, which is what makes searches fast.
What is metadata?
Metadata is “data that provides information about other data”. In Splunk the metadata attached to the events includes:
Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”
Source - the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
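For illustration, this metadata can be set explicitly where the data is collected, for example in an inputs.conf monitor stanza (the path, sourcetype, host, and index below are hypothetical; source defaults to the monitored path):

[monitor:///var/log/nginx/access.log]
sourcetype = nginx:access
host = web01
index = web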
How does Splunk process data/logs?
The process consists of two stages:
Parsing stage:
- From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
- A set of metadata is then attached to each event; the metadata includes host, source, and sourcetype, and it serves, along with the timestamp, as an identifier of that particular event
Indexing Stage:
- Places events into storage segments called “buckets” that can then be searched. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
- Writes the raw data and index files to disk, where post-indexing compression occurs
What is an event?
A single piece of data in Splunk software, similar to a record in a log file or other data input. When data is indexed, it is divided into individual events. Each event is given a timestamp, host, source, and source type. Often, a single event corresponds to a single line in your inputs, but some inputs (for example, XML logs) have multiline events, and some inputs have multiple events on a single line. When you run a successful search, you get back events.
What happens in the parsing stage of the data processing process?
Parsing stage:
- From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
- A set of metadata is then attached to each event; the metadata includes host, source, and sourcetype, and it serves, along with the timestamp, as an identifier of that particular event
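A minimal props.conf sketch for this stage (the sourcetype name and timestamp format are assumptions), showing how the stream is broken into events and how timestamps are extracted:

[my_custom_log]
# treat every line as its own event
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# the timestamp sits at the start of each event, e.g. 2019-02-08 14:30:05
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19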
What happens in the indexing stage of the data processing process?
Indexing Stage:
- Places events into storage segments called “buckets” that can then be searched. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
- Writes the raw data and index files to disk, where post-indexing compression occurs
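To make this concrete, a warm bucket on disk is a directory like the following (the timestamps and ID in the name are illustrative):

db_1549643444_1549643301_5/
    rawdata/journal.gz   <- the compressed raw data
    *.tsidx              <- the index (tsidx) files
    Hosts.data, Sources.data, SourceTypes.data   <- per-bucket metadata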
What is a host, source, and sourcetype?
Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”
Source - the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
What is included in metadata that is attached to each event in Parsing Stage?
Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”
Source - the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
What is a preconfigured Index?
Those are the indexes that come OOTB (out of the box) with Splunk, for example main, _internal, and _audit.