Splunk 103 Flashcards
What is Indexer Clustering?
Clustering is where multiple indexers are connected in order to maintain multiple identical copies of data. Clusters feature “automatic failover”, which means that if one indexer fails, the others pick up the slack and maintain continuity of its activities.
This means:
- Data is protected from sudden loss
- More copies are available for users who are actively searching
- The above activities will continue even when an indexer goes down
What determines the number of copies kept within a cluster?
Replication Factor
What is the Replication Factor?
This determines how many copies are maintained within an indexer cluster.
Default RF is 3.
What is the Search Factor?
This determines how many copies in the cluster are immediately searchable.
Default SF is 2.
What is the minimum number of indexers that you must have in a cluster?
3 (at least as many peers as the replication factor, which defaults to 3)
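The relationship between peer count, replication factor, and search factor can be sketched as a quick sanity check. The function below is a hypothetical helper, not part of Splunk. It assumes the usual constraints that the search factor cannot exceed the replication factor and that the cluster needs at least as many peers as the replication factor.

```python
from typing import List


def validate_cluster_factors(peer_count: int,
                             replication_factor: int = 3,
                             search_factor: int = 2) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent.

    Hypothetical helper for illustration only. Defaults mirror the
    RF=3 / SF=2 defaults mentioned in the cards above.
    """
    problems = []
    if replication_factor < 1 or search_factor < 1:
        problems.append("replication and search factors must be at least 1")
    if search_factor > replication_factor:
        # A searchable copy is still a copy, so SF is bounded by RF.
        problems.append("search factor cannot exceed replication factor")
    if peer_count < replication_factor:
        # Each copy lives on a different peer.
        problems.append("need at least as many peers as the replication factor")
    return problems
```

For example, `validate_cluster_factors(3)` reports no problems, while `validate_cluster_factors(2)` flags the peer count.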
What are components of a cluster?
- Cluster Peer (Peer Node)
- Cluster Master (Master Node)
What are Cluster Peers?
Peers are the indexers that are in the cluster. They receive and index incoming data, and replicate it to other peers. They respond to incoming searches by supplying search results.
What is the Cluster Master?
It manages cluster activities (such as adding peers, distributing configurations, and determining the number of copies to maintain).
It keeps track of the peers, their buckets, and their configs, and tells search heads where to request data.
How does distributed search work in the cluster?
The search head “asks” the Master Node which indexers hold the data it is searching for, and then it accesses those indexers directly.
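The two-step flow above (ask the master, then query the peers) can be sketched with a toy model. The class and method names here are hypothetical and do not correspond to any Splunk API. The sketch also ignores replication and primary bucket copies, so each peer is assumed to hold different events.

```python
from collections import defaultdict


class MasterNode:
    """Toy stand-in for the master node: it only tracks which peers hold searchable data."""

    def __init__(self):
        self._searchable = defaultdict(set)  # index name -> set of peer names

    def register_searchable_copy(self, index, peer):
        self._searchable[index].add(peer)

    def peers_for(self, index):
        # Step 1: tell the search head which peers to contact.
        return sorted(self._searchable.get(index, set()))


class SearchHead:
    """Toy search head that asks the master first, then queries peers itself."""

    def __init__(self, master, peers):
        self.master = master
        self.peers = peers  # peer name -> {index name: [event strings]}

    def search(self, index, term):
        results = []
        # Step 2: go directly to each peer the master named and collect matches.
        for name in self.master.peers_for(index):
            events = self.peers[name].get(index, [])
            results.extend(event for event in events if term in event)
        return results
```

The point of the design is that the master only answers "where", and the search results themselves never pass through it.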
What are benefits of clustering?
Data availability and fast recovery
Easier overall administration:
- Coordinated indexer configuration management
- Automatic distributed search set up
- Elastic indexer discovery
- Indexer health dashboard on Cluster Master
Scalability of Indexing
No additional license cost for data replication
Data fidelity
Data Resiliency
Disaster Recovery
Search Affinity
What are cons of clustering?
Increased storage requirements
Increased processing load (depends on RF and SF)
Requires additional Splunk instances:
- Minimum: RF + CM + SH = # of instances required
- Recommended: More than RF, and multiple SHs
Indexers require the same OS and versions
Requires cluster specific deployment management
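The instance count and storage cost listed above can be estimated with simple arithmetic. This is a rough sketch with hypothetical function names. The storage estimate assumes that every copy stores compressed raw data while only searchable copies also store index files, and the two ratio defaults are illustrative assumptions, not values taken from these cards.

```python
def minimum_instances(replication_factor: int = 3,
                      masters: int = 1,
                      search_heads: int = 1) -> int:
    """Minimum instance count: RF peers + CM + SH, as stated in the card above."""
    return replication_factor + masters + search_heads


def estimated_storage_gb(ingested_gb: float,
                         replication_factor: int = 3,
                         search_factor: int = 2,
                         rawdata_ratio: float = 0.15,     # assumed, adjust for your data
                         index_files_ratio: float = 0.35  # assumed, adjust for your data
                         ) -> float:
    """Rough cluster-wide disk usage for a given volume of ingested data."""
    rawdata = ingested_gb * rawdata_ratio * replication_factor
    index_files = ingested_gb * index_files_ratio * search_factor
    return rawdata + index_files
```

With the defaults, `minimum_instances()` gives 5 instances: three peers, one cluster master, and one search head.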
What are configuration bundles?
A set of configuration files and apps common to all peers.
Where do configuration bundles reside on the cluster master and on a cluster peer?
Cluster Master:
$SPLUNK_HOME/etc/master-apps
Cluster Peer:
$SPLUNK_HOME/etc/slave-apps
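As a small illustration, the two bundle locations can be resolved from the `SPLUNK_HOME` environment variable. The helper name is hypothetical, and the `/opt/splunk` fallback is an assumption about a typical Linux install rather than something stated in these cards.

```python
import os
from pathlib import Path
from typing import Optional

# Bundle subdirectory for each cluster role, as listed in the card above.
BUNDLE_SUBDIRS = {"master": "master-apps", "peer": "slave-apps"}


def bundle_dir(role: str, splunk_home: Optional[str] = None) -> Path:
    """Return the configuration bundle directory for 'master' or 'peer'."""
    if role not in BUNDLE_SUBDIRS:
        raise ValueError(f"unknown role: {role!r}")
    # Fallback path is an assumption for illustration only.
    home = Path(splunk_home or os.environ.get("SPLUNK_HOME", "/opt/splunk"))
    return home / "etc" / BUNDLE_SUBDIRS[role]
```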
What are some of the configuration changes that require a restart?
Changes to indexes.conf, inputs.conf
Changes to a home path in indexes.conf
Deleting an existing app
What are some of the configuration changes that do not need a restart?
Adding a new index or a new app with reloadable configs
Changes or additions to transforms.conf or props.conf
Through which port do peers communicate with each other?
Through the replication port: 8080 (configurable per peer)
Through which port do the master node, search heads, and peers communicate?
Through the management port: 8089
Through which port do forwarders push data to indexers?
Through port 9997 (the receiving port)
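A quick way to confirm that these ports are reachable from another machine is a plain TCP connection test. This is a minimal sketch with a hypothetical helper name. It only checks that something is listening on the port, not that it is Splunk, and the port numbers are the ones used in the cards above.

```python
import socket

# Port numbers as used in this deck; all of them can be changed in a real deployment.
CLUSTER_PORTS = {
    "replication": 8080,  # peer-to-peer replication
    "management": 8089,   # master node, search heads, and peers
    "receiving": 9997,    # forwarders sending data to indexers
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A call such as `port_is_open("idx1.example.com", CLUSTER_PORTS["management"])` would check the management port, where the host name is a placeholder.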
To participate in the indexer cluster, all nodes, including the search head, must use the same…
pass4SymmKey
What is the best practice for setting up cluster master/deployment server architectures?
DS ---> CM ---> INDEXING CLUSTER
What is search affinity?
The ability to configure a multisite indexer cluster so that each search head gets its search results from peer nodes on its local site only, as long as the site is valid. Search affinity has the benefit of reducing network traffic while still providing access to the full set of data.
What is a multisite indexer cluster?
An indexer cluster that spans multiple physical sites, such as data centers. Each site has its own set of peer nodes and search heads. Each site also obeys site-specific replication and search factor rules.
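Search affinity in a multisite cluster can be sketched as a peer-selection rule. The function and its arguments are hypothetical. It also simplifies the fallback: when the local site is not valid, this sketch searches every peer, whereas a real cluster may still take as much as it can from local peers.

```python
def peers_to_search(search_head_site, peers_by_site, valid_sites):
    """Pick the peers a search head should query under search affinity.

    search_head_site: site name of the search head, e.g. "site1"
    peers_by_site:    dict mapping site name -> list of peer names
    valid_sites:      set of sites currently able to serve the full data set
    """
    if search_head_site in valid_sites:
        # Local site is valid: stay local to keep traffic off the inter-site link.
        return list(peers_by_site[search_head_site])
    # Simplified fallback: search every peer on every site.
    return [peer for site in sorted(peers_by_site) for peer in peers_by_site[site]]
```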
Name the 5 pros of indexer clustering
- Data Availability
- Data Fidelity
- Data Resiliency
- Disaster Recovery
- Search Affinity
What is Data Availability in Splunk?
..
What is Data Fidelity?
This term describes data that retains its actual meaning and granularity when it is transmitted from one node to another.
What is Data Resiliency?
The term “data resiliency” refers to data’s ability to “spring back” in situations where it is compromised. In the cloud, data is resilient because it can be stored in a number of different locations. No one location is better than another; availability simply improves as data is stored in more places, specifically in the event that a location goes down or the data becomes corrupted. Users have access to data so long as the location where they store it is accessible and the data isn’t compromised. If one location goes down, users are directed to a second location. If all locations go down, then the organization no longer has access to its data.
This is comparable to having keys to one’s house: the more keys you have, the less likely you are to get locked out. Hiding a key outside and keeping one hooked on your key chain ensures higher resiliency. If you lose your keys or a key breaks, you can use the key hidden outside. If you lose all the keys to your house, then you aren’t able to get in.
Resiliency is the ability of a server, network, storage system, or an entire data center, to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption.