Splunk 102 Flashcards
What is a bucket in Splunk?
A file system directory containing a portion of an index
A Splunk Enterprise index typically consists of many buckets, organized by age.
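For illustration, a warm bucket is just a directory on disk. A minimal sketch, assuming a default install and the main index (whose on-disk directory is defaultdb); the timestamps and ID are made up:

    $SPLUNK_HOME/var/lib/splunk/defaultdb/db/db_1706140800_1706054400_42

Warm buckets are named db_<newest_event_time>_<oldest_event_time>_<bucket_id>, so the directory name itself tells you the epoch time range of the events inside.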
What are the types of Splunk buckets?
Hot, warm, cold, frozen, thawed
What is the bucket “aging” process?
As buckets age, they “roll” from one state to the next. When data is first indexed, it gets written to a hot bucket. Hot buckets are buckets that are actively being written to. An index can have several hot buckets open at a time. Hot buckets are also searchable.
When certain conditions are met (for example, the hot bucket reaches a certain size or the indexer gets restarted), the hot bucket becomes a warm bucket (“rolls to warm”), and a new hot bucket is created in its place. The warm bucket is renamed but it remains in the same location as when it was a hot bucket. Warm buckets are searchable, but they are not actively written to. There can be a large number of warm buckets.
Once further conditions are met (for example, the index reaches some maximum number of warm buckets), the indexer begins to roll the warm buckets to cold, based on their age. It always selects the oldest warm bucket to roll to cold. Buckets continue to roll to cold as they age in this manner. Cold buckets reside in a different location from hot and warm buckets. You can configure the location so that cold buckets reside on cheaper storage.
Finally, after certain other time-based or size-based conditions are met, cold buckets roll to the frozen state, at which point they are deleted from the index, after being optionally archived.
If the frozen data has been archived, it can later be thawed. Data in thawed buckets is available for searches.
Settings in indexes.conf determine when a bucket moves from one state to the next.
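For example, a minimal indexes.conf sketch of those settings (the stanza name, paths, and thresholds here are illustrative, not recommendations):

    [my_index]
    # hot and warm buckets live under homePath
    homePath = $SPLUNK_DB/my_index/db
    # cold buckets can sit on cheaper storage
    coldPath = /cheap_storage/my_index/colddb
    thawedPath = $SPLUNK_DB/my_index/thaweddb
    # a hot bucket rolls to warm when it reaches its maximum size
    maxDataSize = auto
    # the oldest warm bucket rolls to cold past this count
    maxWarmDBCount = 300
    # buckets roll to frozen once older than ~180 days
    frozenTimePeriodInSecs = 15552000
    # optionally archive frozen buckets instead of deleting them
    coldToFrozenDir = /archive/my_index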
What is single-instance deployment? How can we scale our deployment?
In single-instance deployments, one instance of Splunk Enterprise handles all aspects of processing data, from input through indexing to search. A single-instance deployment can be useful for testing and evaluation purposes and might serve the needs of department-sized environments.
To support larger environments, however, where data originates on many machines and where many users need to search the data, you can scale your deployment by distributing Splunk Enterprise instances across multiple machines. When you do this, you configure the instances so that each instance performs a specialized task. For example, one or more instances might index the data, while another instance manages searches across the data.
What is an instance?
A single running installation of Splunk Enterprise.
What is a Splunk component?
One of several types of Splunk Enterprise instances.
What categories of Splunk components are there? Give some examples of components for each category.
These are the available processing component types:
Indexer
Forwarder
Search head
Management components include:
license master
monitoring console
deployment server
indexer cluster master node
search head cluster deployer
What is a License Master?
It is a component that is responsible for keeping track of the data ingestion quota.
What is a License Quota?
It is the maximum daily volume of data ingested into Splunk under a given purchased license.
What are some sizes of environment?
<3 TB: small environment
10-30 TB: large environment
>50 TB: massive environment
Tell me your environment!
Paweł:
26 TB ingestion, 28 TB quota
around 20k forwarders
around 145 indexers
around 250 clients
14 search heads
(the company plans to add 20% more devices to the network)
How to create a fake environment (worked example below):
a) quota = ingestion + 2 TB
b) 1 TB ≈ 1,000 forwarders (go under a little bit)
c) 1 TB ≈ 6 indexers (go under by 6-8%)
d) 200 users per 17 TB
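A worked example of the recipe, for a hypothetical 10 TB/day ingestion:

    a) quota = 10 + 2 = 12 TB
    b) forwarders ≈ 10 × 1,000 = 10,000; go under a little, so around 9,000
    c) indexers ≈ 10 × 6 = 60; go under by 6-8%, so around 56
    d) users ≈ 200 × (10 / 17) ≈ 120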
Mariusz:
? ? ? ?
Which Splunk components typically share an instance?
Deployment Server and License Master
How does data enter the indexer?
Through a port and an IP address.
Which port do we have to open to enable indexers to receive data?
9997 (sometimes 9998)
Some people call it “the Indexer port”
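As a sketch, receiving is enabled on the indexer in inputs.conf, and the forwarder points at it in outputs.conf (the host and group names below are made up):

    # indexer: inputs.conf
    [splunktcp://9997]
    disabled = 0

    # forwarder: outputs.conf
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx01.example.com:9997

The same receiving port can also be opened from the indexer’s CLI with “splunk enable listen 9997”.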
What two types of files do indexes store?
Raw data (the full log files, as received before parsing) and index files (tsidx)
What is tsidx?
Time-series index files. Rather than a literal copy of the raw data, a tsidx file maps the extracted terms and metadata to the locations of the matching events in the raw data, which is what makes the index searchable.
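On disk, the two file types sit together inside each bucket directory. A rough, illustrative listing (names are made up):

    db_1706140800_1706054400_42/
        rawdata/
            journal.gz      (the compressed raw data)
        *.tsidx             (the time-series index files)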
What is metadata?
Metadata is “data that provides information about other data”. In Splunk, the metadata attached to the events includes:
Host (typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”)
Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
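These three fields, together with the timestamp, are what searches typically filter on first. A hypothetical SPL example (the host and source path are made up):

    index=main host=web-01 source="/var/log/nginx/access.log" sourcetype=access_combined earliest=-24h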
How does Splunk process data/logs?
The process consists of two stages:
Parsing stage:
- From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
- A set of metadata is then attached to each event; the metadata includes host, source, and sourcetype, which are used, along with the timestamp, as an identifier of that particular event
Indexing Stage:
- Places events into storage segments called “buckets” that can then be searched. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
- Writes the raw data and index files to disk, where post-indexing compression occurs
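Much of the parsing stage (event breaking and timestamp extraction) is driven by props.conf. A minimal sketch for a hypothetical sourcetype (the stanza name and timestamp format are assumptions, not defaults):

    [my_custom_log]
    # treat each line as its own event
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    # the timestamp follows a literal "ts=" in each event
    TIME_PREFIX = ts=
    TIME_FORMAT = %Y-%m-%dT%H:%M:%S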
What is an event?
A single piece of data in Splunk software, similar to a record in a log file or other data input. When data is indexed, it is divided into individual events. Each event is given a timestamp, host, source, and source type. Often, a single event corresponds to a single line in your inputs, but some inputs (for example, XML logs) have multiline events, and some inputs have multiple events on a single line. When you run a successful search, you get back events.
What happens in the Parsing stage of the data processing process?
Parsing stage:
- From a continuous log (a stream of data), the data is split into events, and these events (individual occurrences of recorded activity) are stored in the indexes.
- A set of metadata is then attached to each event; the metadata includes host, source, and sourcetype, which are used, along with the timestamp, as an identifier of that particular event
What happens in the Indexing stage of the data processing process?
Indexing Stage:
- Places events into storage segments called “buckets” that can then be searched. Determines the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of data compression
- Writes the raw data and index files to disk, where post-indexing compression occurs
What is a host, source, and sourcetype?
Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”
Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
What is included in metadata that is attached to each event in Parsing Stage?
Host - typically the hostname, IP address, or fully qualified domain name (FQDN) of the network host from which the event originated. Think “server”
Source - the event source is the name of the file, stream, or other input from which the event originates. Think “path”
Sourcetype - the format of the data input that identifies the structure of the data, usually named by the admin or user
What is a preconfigured Index?
Those are the indexes that come OOTB with Splunk.
What does OOTB mean?
Out Of The Box means that a software feature comes with the “base” of the software and doesn’t need to be installed separately to be accessed and used.
Tell us about 5 Splunk preconfigured indexes.
main: This is the default index. All processed data will be stored here unless otherwise specified
_internal: Stores all Splunk components’ internal logs and processing metrics. It is often used for troubleshooting. Search for logs that say ERROR or WARN.
_audit: Stores events related to the activities conducted in the component, including file system changes, and user auditing such as search history and user-activity error logs.
_summary: Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize the data, then “import” that data into the summary index from another, larger index over time
_fishbucket: This index tracks how far into a file indexing has occurred to prevent duplicate data from being stored. This is especially useful in the event of a server shutdown or connection error.
How to activate preconfigured indexes?
By configuring indexes.conf properly.
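Once indexes.conf is in place, a common way to confirm which indexes (including the internal ones) an instance can actually see is this SPL idiom, run from a search head:

    | eventcount summarize=false index=* index=_* | dedup index | fields index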
What does the main index do?
This is the default index. All processed data will be stored here unless otherwise specified
What does the _internal index do?
Stores all Splunk components’ internal logs and processing metrics. It is often used for troubleshooting. Search for logs that say ERROR or WARN.
Notably, it houses information from splunkd.log, a very important log file that tells you about the health of the Splunk component that you are on.
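A typical troubleshooting search against it (splunkd events carry a log_level field):

    index=_internal sourcetype=splunkd (log_level=ERROR OR log_level=WARN)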
What does the _audit index do?
Stores events related to the activities conducted in the component, including file system changes, and user auditing such as search history and user-activity error logs.
What does the _summary index do?
Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize the data, then “import” that data into the summary index from another, larger index over time
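As a sketch, a scheduled search can feed a summary index with the collect command (the source index, sourcetype, and target summary index named here are hypothetical, and the target index must already exist):

    index=web sourcetype=access_combined earliest=-1h@h latest=@h
    | stats count by status
    | collect index=my_summary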
What does the _fishbucket index do?
This index tracks how far into a file indexing has occurred to prevent duplicate data from being stored. This is especially useful in the event of a server shutdown or connection error.
What happens when you don’t specify an index (or specify a nonexistent index) for incoming data?
If no index is specified, the data goes to the main (default) index. If a nonexistent index is specified, the indexer drops the data by default; the lastChanceIndex setting in indexes.conf can be configured to catch such events instead.
What is splunkd.log?
It is one of the most important Splunk internal logs. It is stored in the _internal index.
How would you troubleshoot?
Check splunkd.log
How can you access splunkd.log?
Through the back end (the file system on the instance) or through a search head.
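Concretely (the file path assumes a default install location):

    Back end:    $SPLUNK_HOME/var/log/splunk/splunkd.log
    Search head: index=_internal source=*splunkd.log*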