1.0 Deploying Splunk Flashcards
What is an SVA?
Splunk Validated Architectures: proven reference architectures for stable, efficient, and repeatable deployments. Guidelines and certified architectures that ensure an initial deployment is built on a solid foundation.
Why and how does Splunk grow from standalone to distributed?
- Ingests more data
- Distributed search across indexers
- Adding high availability
- Dedicating LM and CM
- Adding ES
- SH Cluster for searching
- Disaster Recovery
What does and doesn’t SVA provide?
Does:
- Proven, vetted reference topologies and deployment guidelines.
Doesn’t:
- Implementation choices (OS, bare metal vs. virtual vs. cloud, etc.).
- Deployment sizing.
- A prescriptive approval of your architecture.
- A topology suggestion for every possible deployment scenario.
What is HA? How can Splunk accomplish it?
A continuously operational system, bounded by a set of tolerances.
ex. IDX cluster: if one node goes down, forwarders can still send data to the others.
SHC: multiple SHs can search the same data.
What is DR? How can Splunk accomplish it?
The process of backing up and restoring service in case of disaster.
- Standby nodes - backed up copies of node managers
- Multisite
- SF and RF
What instances are suitable to become the MC?
- A dedicated SH that has connectivity to the entire environment.
NEVER INSTALL ON:
- Prod (distributed) SH
- Member of SHC
- An IDX
- A DS or LM with > 50 clients
- Deployer sharing with CM
https://docs.splunk.com/Documentation/Splunk/8.0.3/DMC/WheretohostDMC
How to configure MC for single or distributed environment?
Single:
1) In Splunk Web, navigate to Monitoring Console > Settings > General Setup.
2) Check that search head, license master, and indexer are listed under Server Roles, and nothing else. If not, click Edit to correct.
3) Click Apply Changes.
Distributed:
1) Log into the instance on which you want to configure the monitoring console. The instance by default is in standalone mode, unconfigured.
2) In Splunk Web, select Monitoring Console > Settings > General Setup.
3) Click Distributed mode.
4) Confirm the following:
The columns labeled instance and machine are populated correctly and show unique values within each column.
- The server roles are correct. For example, a search head that is also a license master must have both server roles listed. If not, click Edit > Edit Server Roles and select the correct server roles for the instance.
- If you are using indexer clustering, make sure the cluster master instance is set to the cluster master server role. If not, click Edit > Edit Server Roles and select the correct server role.
- If you are hosting the monitoring console on an instance other than the cluster master, you must add the cluster master instance as a search peer and configure the monitoring console instance as a search head in that cluster.
- Make sure anything marked as an indexer is actually an indexer.
5) (Optional) Set custom groups. Custom groups are tags that map directly to distributed search groups. You might find groups useful, for example, if you have multisite indexer clustering in which each group can consist of the indexers in one location, or if you have an indexer cluster plus standalone peers. Custom groups are allowed to overlap. For example, one indexer can belong to multiple groups. See Create distributed search groups in the Distributed Search manual.
6) Click Apply Changes.
If you add another node to your deployment later, click Settings > General Setup and check that these items are accurate.
Why do server roles matter to the MC?
Server roles are used to create searches, reports, and alerts based on the server roles specified for each instance.
Why do groups matter to the MC?
Groups are used to correlate among similar instances (e.g., the members of a single cluster).
How are health checks performed on the MC?
Each health check item runs a separate search. The searches run sequentially. When one search finishes, the next one starts. After all searches have completed, the results are sorted by severity: Error, Warning, Info, Success, or N/A.
You can disable and enable individual health check items as needed, as well as change their thresholds.
The Health Check page lets you download new health check items provided by the Splunk Health Assistant Add-on on Splunkbase.
You can also create your own health check items.
What authentication methods are supported by Splunk?
LDAP - can’t use if SAML is enabled
SAML and SSO
Native Splunk accounts (created locally/internally)
Scripted authentication
Describe LDAP concepts.
A standard protocol for accessing directory credentials and services (e.g., AD).
LDAP directories are arranged in a tree-like structure. The information model is based on entries:
- The distinguished name (DN) is built from attributes:
cn=admin1,ou=people,dc=splunk,dc=com
Tree structure with cn at bottom and dc at top.
Describe LDAP configs.
https://docs.splunk.com/Documentation/Splunk/8.0.3/Security/ConfigureLDAPwithSplunkWeb
authentication.conf (example below)
- host =
- port =
- groupBaseDN =
- groupMemberAttribute =
- groupNameAttribute =
- realNameAttribute =
- userBaseDN =
- userNameAttribute =
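A minimal sketch of an LDAP strategy in authentication.conf; the hostnames, DNs, and group names are made-up examples:

authentication.conf
[authentication]
authType = LDAP
authSettings = corp_ldap

[corp_ldap]
host = ldap.example.com
port = 389
bindDN = cn=bind,dc=example,dc=com
bindDNpassword = changeme
userBaseDN = ou=people,dc=example,dc=com
userNameAttribute = uid
realNameAttribute = cn
groupBaseDN = ou=groups,dc=example,dc=com
groupMemberAttribute = member
groupNameAttribute = cn

[roleMap_corp_ldap]
admin = splunk_admins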
List SAML and SSO options
Review Slide Deck
Configure:
1) download the Splunk Service Provider Metadata file
2) Import the IdP metadata into Splunk
- SSO
- SLO (optional)
- IdP cert path
- IdP cert chains
- Replicate certs
- Issuer ID
- Entity ID
- Sign AuthnRequest
- Verify SAML Response
Roles in Splunk?
admin – this role has the most capabilities assigned to it.
power – this role can edit all shared objects (saved searches, etc) and alerts, tag events, and other similar tasks.
user – this role can create and edit its own saved searches, run searches, edit its own preferences, create and edit event types, and other similar tasks.
can_delete – This role allows the user to delete by keyword. This capability is necessary when using the delete search operator.
How can roles secure data?
Restrict index access per role (srchIndexesAllowed) - example below.
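A minimal authorize.conf sketch; the role and index names are made up:

authorize.conf
[role_web_analyst]
srchIndexesAllowed = web
srchIndexesDefault = web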
How can data be ingested by an indexer?
- Monitored
- Batch
- Script - opt/spl/etc/apps/bin/
- Modular inputs
- Syslog
- Network inputs
- HTTP
- Splunk TCP
- REST
How does Splunk communicate with Splunk?
Ports:
- 8000 - web
- 8089 - mgmt
- 8088 - HEC
- 9997 - tcp listening
- 9887 - replication (indexers; SHC replication)
- 8191 - KV store
- 514 - network input
Troubleshoot data inputs - monitor:
TailingProcessor for monitor inputs:
splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus
Will show:
- what files it found
- whether they matched the wild card
- how far into the file it read
./splunk list monitor
list of currently monitored inputs.
Troubleshoot data inputs - conf files:
splunk btool <conf-name> list --debug
gives the merged on-disk configs
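For example, to check the merged monitor stanzas (hypothetical invocation):
./splunk btool inputs list --debug | grep -i monitor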
What are examples of indexing artifacts
rawdata - compressed form (journal.gz)
Time Series Index (tsidx) - indexes that point to raw data
Buckets - directories of index files organized by age
- splunk_home/var/lib/splunk/myindex/db
- bucket locations defined in indexes.conf
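A minimal indexes.conf sketch for a custom index; the index name is made up:

indexes.conf
[myindex]
homePath = $SPLUNK_DB/myindex/db
coldPath = $SPLUNK_DB/myindex/colddb
thawedPath = $SPLUNK_DB/myindex/thaweddb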
Describe event processing
Splunk processes incoming data and stores the resulting events in an index.
When Splunk indexes events:
- configures character set encoding
- configures line breaking for multi line events
- identifies timestamps
- extracts fields
- segments events
Name and describe the data pipelines
Parsing:
UTF-8 - Splunk will attempt to apply UTF-8 encoding to data
Line Breaker - Splunk will split data stream into events using default line breaker
Header - Splunk can multiplex different data streams into one “channel”
Merging/Aggregator:
- Splunk will merge lines separated by line breaker into events
Line breaking vs. line merging: LINE_BREAKER & SHOULD_LINEMERGE
Determining time: TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD, DATETIME_CONFIG (example stanza after this card)
Typing:
Regex replacement – performs any regular expression replacements called for in props.conf/transforms.conf
Annotator – extracts the punct field
Indexing:
TCP/Syslog out – sends data to a remote server
Indexer – writes the data to disk
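A hypothetical props.conf stanza exercising the attributes above; the sourcetype and timestamp format are made up:

props.conf
[my:sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19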
Describe the underlying text parsing process:
Splunk breaks events into segments at index and search time.
Index time - uses segments to build lexicons, which point to locations on disk
Search time - uses segments to split terms
Describe Times Series Index
Optimized to execute arbitrary boolean keyword searches and return millions of events in reverse time order
Inverted index
- allows fast full text searches
- maps keywords to locations in raw data
2 components:
lexicon
value arrays containing info about events
What is a lexicon TSIDX?
Each unique term from the raw event has its own row. Each row has a list of events containing the given term.
Looks like a table:
Term | Postings list (event #)
bacon | 0
beets | 2, 5, 7
crab | 0, 1, 9
What is a TSIDX Value Array?
Each raw event has its own row
Each row contains metadata including the seek address
Looks like a table:
# | seek address | _time | host | source | sourcetype
What is data retention?
Management of the storage of indexed data. Allows Splunk to expire old data and make room for new data.
*Most restrictive rule wins - prompts a change in state
What are retention bucket controls?
maxDataSize - max size for hot bucket
maxWarmDBCount - max num of warm buckets
maxTotalDataSizeMB - max size of an index -> cold to frozen
frozenTimePeriodInSecs - max age of bucket -> cold to frozen
homePath.maxDataSizeMB - max size for hot/warm storage
coldPath.maxDataSizeMB - max size for cold storage
maxVolumeDataSizeMB - max size for volume
maxHotBuckets - max number of hot buckets
timePeriodInSecBeforeTsidxReduction - how long indexers retain tsidx files
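A sketch combining several of these controls in indexes.conf; the values are arbitrary examples:

indexes.conf
[myindex]
maxDataSize = auto_high_volume
maxHotBuckets = 3
maxTotalDataSizeMB = 500000
frozenTimePeriodInSecs = 7776000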
Bucket Controls - Volumes
- Allows you to manage disk usage across multiple indexes
- Allows you to create max data size for them
- Typically separates hot/warm from cold storage
- Take precedence over other bucket controls
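A minimal volume sketch; the paths and sizes are made up:

indexes.conf
[volume:hotwarm]
path = /fast_disk/splunk
maxVolumeDataSizeMB = 300000

[volume:cold]
path = /slow_disk/splunk
maxVolumeDataSizeMB = 1000000

[myindex]
homePath = volume:hotwarm/myindex/db
coldPath = volume:cold/myindex/colddb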
What is the bucket control precedence?
“Most restrictive rule wins”
- Oldest bucket will be frozen first
- Age determined by most recent event
- Hot buckets are measured by size but are exempt from age controls
What are typical storage multiplication factors?
rawdata ≈ .15 of raw ingest (stored per RF copy); tsidx ≈ .35 of raw ingest (stored per SF copy)
Review formulas
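A rough sizing sketch using those factors (all numbers hypothetical): 100 GB/day ingest, 90-day retention, RF=3, SF=2:
disk ≈ 100 x 90 x (0.15 x 3 + 0.35 x 2) = 100 x 90 x 1.15 ≈ 10,350 GB across the cluster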
What is the search head dispatch search sequence? (same with stats)
User Query -> SH -> Check search quota -> Check disk -> dispatch directory -> indexers
Sequence of events for searching for events in Splunk?
SH sends search request:
1) request is received by indexer
2) Indexer checks disk
3) Creates dispatch directory in var/run/spl/dispatch
4) configures subsystem - initializes configs (props, transforms etc) using bundle identified by SH
5) Implement time range - finds buckets in range
6) uses bloom filters to minimize resource usage
7) checks the lexicon- find events matching keywords within the lexicon (tsidx files)
8) Use results returned to find the event offsets within raw data from the values array
9) uncompresses raw data - uncompresses appropriate raw data to get the _raw event
10) Process field extractions
11) Send results to the search head
What metrics does the job inspector provide?
Time spent in search
Time spent searching the index
Time spent fetching data
Workload undertaken by search peers
What is the REST endpoint to find job properties?
REST /services/search/jobs
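For example, listing jobs with curl (credentials are placeholders):
curl -k -u admin:pwd https://localhost:8089/services/search/jobs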
Search job inspector components?
Header
Execution costs
- categories listed as command.* and command.search.* reflect various phases of the search process
Search job properties
- Bundle
- Can summarize
- Create time
- Cursor time
- diskUsage
- dropCount
What are the types of search commands?
- Generating
- Streaming
- Transforming
- Centralized (stateful) streaming
- Non-streaming
Characteristic of a generating search command?
Invoked at beginning of a search.
Does not expect or require an input
| search is implied
Characteristic of a transforming search command?
Generates a report data structure
Operate on the entire event set
ex. chart, timechart, stats
Characteristic of a streaming search command?
Operates on each event individually
Distributable streaming - run on indexers
ex. eval, fields, rename, regex
Characteristic of a centralized (stateful) streaming search command?
Runs on SH
ex. head, streamstats
Characteristic of a non-streaming search command?
Force the entire set of events to the SH (sort, dedup, top)
Examples of a generating search command?
search, datamodel, inputcsv, metadata, rest, tstats
Examples of a streaming search command?
fields, lookup, rex, spath, where
Examples of a centralized streaming search command?
head, eventstats, streamstats, tail, transaction, lookup local=t
Examples of a transforming search command?
append, chart, join, stats, table, timechart, top
How does Splunk minimize searching?
Searches are parsed into a map-reduce model.
Job inspector -> Search job properties (2 parts):
- remoteSearch - the map phase, run on the indexers
- reportSearch - the reduce phase, run on the SH
Splunk tstats search sequence?
1) request received from SH on indexer
2) checks disk
3) creates remote dispatch folder
4) configs subsystem
5) checks time range
6) bloom filter
7) checks the lexicon (tsidx files)
8) results to SH
No raw data, no field extractions, doesn’t use results to offset in array
When to use sub-searches?
- Small result sets: max of 10,000 events, max runtime of 60 sec
- Certain commands require one (join, set)
- Used to produce search terms for the outer search
e.g. find a subset of hosts, determine a time range, craft the main search string dynamically
*Subsearches always run before the main search
When not to use sub-searches?
For subsearches that return many results - better to use stats or eval
- typically subsearches take longer than main
- GUI provides no feedback when subsearch runs
Best practice for maximizing search efficiency (example search below):
- filter early
- specify index
- utilize indexed extractions where avail
- use the TERM directive if applicable
- place streaming/remote commands before non-streaming
- avoid using table except at the very end; it causes data to be pushed to the SH
- Remove unnecessary data using | fields
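A hypothetical search applying several of these practices (index, sourcetype, and term are made up): filter early with index, sourcetype, and TERM, keep only needed fields, and put the transforming command last:

index=web sourcetype=access_combined TERM(10.1.1.5) status=404
| fields clientip, uri
| stats dc(clientip) AS unique_clients by uri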
Describe a deployment app:
An arbitrary unit of content deployed by the DS to a group of deployment clients
- Fully developed apps (Splunkbase)
- Simple groups of configs
- Usually focused on a specific type of data or business need
Where are deployment apps stored on the DS and client?
DS: etc/deployment-apps -> Client: etc/apps
What is the DS?
A centralized config manager that delivers updated content to deployment clients.
- units of content known as deployment apps
- operates on a “pull” model - clients phone home
- DS does not have SHC or IDXC clients
DS capacity
2,000 polls/minute (Windows); 10,000 polls/minute (Linux)
- Utilize the phoneHomeIntervalInSecs attribute in deploymentclient.conf
For more than 50 clients, the DS should be on its own server.
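A minimal deploymentclient.conf sketch on a client; the DS URI is a placeholder:

deploymentclient.conf
[deployment-client]
phoneHomeIntervalInSecs = 600

[target-broker:deploymentServer]
targetUri = ds.example.com:8089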
App deployment process from DS:
1) Client polls at certain interval: Client X, architecture
2) Determine apps for client using serverclasses
3) List of apps and checksums sent to client
4) Compares remote and local lists to determine updates
5) Client downloads new or updated apps from DS
6) Client restarts if necessary
What is a client state on the DS?
The client records its class membership, apps, checksums.
Client caches the bundle (tar archive) with the app content
App update procedure from DS?
App removed from DS, app removed from client
If app update is found on DS, client will delete and download a new copy
- Apps that store user settings locally will have those settings “erased”
- crossServerChecksum uses checksum rather than modtime
Describe deployment system configs
Use base configs to provide consistency
Benefit of base configs
Fast initial deployment time
Reusable, predictable, supportable
Faster troubleshooting
Common naming scheme
Base configs by deployment type:
Review
How is forwarder management maintained?
Serverclass:
- allows you to group Splunk instances by common characteristics and distribute content based on those characteristics
blacklist (takes precedence), whitelist, filter by instance type
Editing the serverclass.conf - what are the stanza levels?
[global] - global level
[serverClass:serverClassName] - individual serverclass; there can be multiple serverClass stanzas - one for each serverClass
[serverClass:serverClassName:app:appName] - app within the server class. Used to specify apps the serverclass applies to - one for each app in the serverClass (example below)
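A hypothetical serverclass.conf showing all three levels; the class, pattern, and app names are made up:

serverclass.conf
[global]

[serverClass:linux_hosts]
whitelist.0 = *.linux.example.com

[serverClass:linux_hosts:app:base_inputs]
restartSplunkd = true
stateOnClient = enabled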
Serverclass non-filtering attributes?
repositoryLocation
stateOnClient - the only one enabled by default
restartSplunkWeb
restartSplunkd
issueReload
Types of DS scaling
Horizontal
- all DS are peers
- all DS are on same level
- all DS respond to clients
Tiered
- primary DS that serves other DSs (parent/child)
- any peers can respond to client
Why would you need more than 1 DS?
Multiple regions
More than 2000/10000 clients
Network Segregation
HA is required
Load balancer if too many clients are phoning home
What do you need to set in tiered DS? child and parent
child:
deploymentclient.conf -
repositoryLocation = $SPL_HOME/etc/deployment-apps
serverRepositoryLocationPolicy = rejectAlways
child and parent:
serverclass.conf -
crossServerChecksum = true
What are the components of an indexer cluster?
Master node - a single node to manage the cluster
Peer nodes - to index, maintain, and search the data
Search heads - one or more to coordinate searches across peers
Purpose of the cluster master:
- Validates config settings before sending to the indexers
- Monitors indexer peers and coordinates remediation of node failures
- Acts as a single point of contact for the SHs/MC
Describe the indexer cluster communication sequence:
1) Indexers stream copies of their data to other indexers
2) Master node coordinates activities involving search peers and search head
3) Forwarders send load-balanced data to peer nodes
4) Indexers send search results to SH
Steps in deploying indexer cluster:
1 - identify requirements
2 - install Splunk Enterprise on instances
3 - enable clustering
4 - complete peer node configuration
5 - forward master node data to peers
What requirements are needed when determining your idxc?
- DR and failover needs
- single site vs. multisite
- RF
- SF
- quantity of data indexed and search load
Threshold on RF?
Do not set RF = # of indexers, because the cluster will not be able to handle failures
Enable the master node CLI and .conf for IDXC:
./splunk edit cluster-config -mode master -replication_factor <n> -search_factor <n> -secret your_key -cluster_label cluster1
server.conf
[clustering]
mode = master
replication_factor = <n>
search_factor = <n>
pass4SymmKey = pwd
cluster_label = label
Enable the peer node CLI and .conf for IDXC:
./splunk edit cluster-config -mode slave -master_uri https://<master>:8089 -replication_port 9887 -secret your_key
server.conf
[clustering]
mode = slave
master_uri = https://<master>:8089
pass4SymmKey = pwd
[replication_port://9887]
disabled = false
Enable the SH node CLI and .conf for IDXC:
./splunk edit cluster-config -mode searchhead -master_uri https://<master>:8089 -replication_port 9887
server.conf
[clustering]
mode = searchhead
master_uri = https://servername:8089
pass4SymmKey = pwd
Enable the SH node CLI and .conf for multi-site IDXC:
./splunk add cluster-master -master_uri https://<master>:8089 -secret your_key
server.conf
[clustering]
mode=searchhead
master_uri = clustermaster:one, clustermaster:two
[clustermaster:one]
master_uri = https://<cm-one>:8089
pass4SymmKey = pwd
[clustermaster:two]
master_uri = https://<cm-two>:8089
pass4SymmKey = pwd
How can you guard against data loss?
Indexer acknowledgement - the forwarder retains a copy of the data until the indexer acknowledges the events (example below)
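Enabled per output group in the forwarder's outputs.conf, e.g.:

outputs.conf
[tcpout:peer_nodes]
useACK = true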
What is indexer discovery?
forwarders query the master node to get a list of all indexers in the cluster (sketch below)
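A sketch of the paired configs; the names and URIs are placeholders. On the master, server.conf:

[indexer_discovery]
pass4SymmKey = discovery_key

On the forwarder, outputs.conf:

[indexer_discovery:cluster1]
pass4SymmKey = discovery_key
master_uri = https://cm.example.com:8089

[tcpout:peer_nodes]
indexerDiscovery = cluster1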
Where do apps to be pushed to indexers live on the master?
etc/master-apps -> etc/slave-apps
How to forward Splunk internal data to indexers?
outputs.conf
[tcpout]
defaultGroup = peer_nodes
forwardedindex.filter.disable = true
[tcpout:peer_nodes]
server=server1:9997, server2:9997, etc
Index cluster upgrade procedure:
1 Cluster master
2 SH tier
3 Indexers
Options:
Tier by tier
Site by site
Rolling peer-by-peer (7.1.x+)
Forwarders only need to be upgraded opportunistically
Jobs of the CM
- Listens for cluster peers - adds to cluster when peer registers
- Waits for RF number of peers to be satisfied before starting its functions
- Listens for peer heartbeats - if it doesn't hear one for a set amount of time, the peer is marked down
- Checks the manifest of buckets provided by all peers to determine if policy is met. If not, fix-up is triggered
Where does CM config bundle live?
SPL/var/run/splunk/cluster/remote-bundle
bundle contains the contents from master-apps
What are the 3 parts of a bucket ID
index name
local ID
orig indexer GUID
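A bucket ID typically takes the form index~localID~originGUID, e.g. main~5~<GUID>.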
Hot bucket lifecycle
1 idx notifies CM of new hot bucket
2 CM replies with list of streaming targets for replication
3 orig indexer begins replicating to new indexer
Warm bucket lifecycle
1 IDX notifies CM when bucket rolls to warm
2 Rep target notified that bucket is complete and rolls to warm
Frozen bucket lifecycle
CM notified when freezes
CM stops doing fix-up tasks for that bucket
What happens if min free space is hit on indexer?
stops processing events
CM is notified and idx enters detention
What happens if indexer is in detention?
Auto:
Stops indexing internal and external data.
Stops replication.
Doesn’t participate in searches.
Manual:
Stops indexing external data (can be switched)
Stops replication
What happens when CM goes down?
Cluster runs as normal as long as there are no other failures
If a peer creates a hot bucket, it will try to contact the master and fail; it continues replicating to its previous target peers
SH will continue to function but will eventually begin to access incomplete data
Forwarders continue to send to their list
When it comes back up:
Master starts fix up tasks
Peers continue to send heartbeats and reconnect
How to replace CM?
No failover
Must have standby
Copy over server.conf and master-apps
Ensure peer nodes can reach new master
What happens when indexer goes down?
Stops sending heartbeat.
Master detects after 60s and starts fix up tasks
Searches will continue but only provide partial results
What happens when indexer comes back up?
Starts sending heartbeat.
Master detects and adds back to cluster
Master rebalances cluster
IDX downloads latest conf bundle from master (if necessary)
Multi-site clustering configs
A multi-site cluster requires additional configuration in the [clustering] stanza
- The Cluster Master requires at least one host per site
- Origin site is the site originating the data or the site where data first entered the cluster
- origin: minimum number of copies held on the origin site
- site#: defines the minimum copies for that site
- total: defines the total copies across all sites
server.conf
[general]
site = site2
[clustering]
mode = master
multisite = true
available_sites = site2, site8, site44
site_replication_factor = origin:2, site8:4, total:8
site_search_factor = origin:1, site8:1, total:4
constrain_singlesite_buckets = false
Single-site to multi-site characteristics:
After migration from single to multi:
- cluster holds both single-site and multisite buckets
- buckets created with marker in journal.gz indicating site origin
- All SHs and CM are required to declare site membership
- Indexers only return data if their “primary” status matches the requested site
Indexer migration to new indexers
1) Install new Indexers
2a) Bootstrap indexers to join the cluster as peers
2b) Ensure new indexers receive common configuration
- Distribute files and apps with the configuration bundle
- Common files: indexes.conf, props.conf, transforms.conf
3a) Prepare to decommission old indexers
- Point forwarders to new indexers
- Put old indexers into detention
3b) Decommission old indexers (one at a time)
- Run command splunk offline --enforce-counts
- Wait for indexer status to show as GracefulShutdown in the CM UI
- Repeat for remaining indexers
- CM will fix / migrate buckets to new hardware
3c) Remove the old peer from the master's list
Upgrade CM
stop master
upgrade using normal Splunk Enterprise procedure
start the master
Upgrade search tier
stop all SHs
Upgrade using normal procedures
- if integrated with IDXC
- upgrade one member and make it the captain
- upgrade additional members one by one
- upgrade deployer
start SH
Upgrade IDX tier
- on CM enable maintenance mode to prevent unnecessary fix-ups
- stop indexers
- upgrade normally
- start indexers
- on CM disable maintenance-mode
Or searchable rolling restart on indexers
Manage SHC
Deployer - pushes out apps
SHs replicate knowledge objects
SHC Deployment
1) identify requirements
2) set up deployer
3) Install Splunk instance
4) initialize cluster members
5) bring up cluster captain
6) perform post-deployment set up
Deployer set up config:
server.conf
[shclustering]
pass4SymmKey = pwd
shcluster_label = label
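Apps staged under etc/shcluster/apps on the deployer are then pushed with apply shcluster-bundle; the target URI and credentials below are placeholders:

./splunk apply shcluster-bundle -target https://member1.example.com:8089 -auth admin:pwd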
How to initialize SHC members
splunk init shcluster-config -auth user:pwd -mgmt_uri https://<member>:8089 -replication_port <port> -replication_factor <n> -conf_deploy_fetch_url https://<deployer>:8089 -secret key -shcluster_label label
SHC benefits
- Horizontal scaling for increased capacity
- HA for scheduled search activity
- Centralized management of baseline configs
- Replication of user generated content for consistent user experience
What does SHC captain do?
Coordinates replication of artifacts, maintains registry.
- artifacts stored in /var/run/splunk/dispatch
Pushes knowledge bundles to peers
Replicates runtime config updates
Assigns jobs to members based on relative current loads
What causes new captain election SHC?
Uses a dynamic election - election occurs:
- current captain fails or restarts
- network errors cause 1+ members to disconnect
- current captain steps down after detecting that majority of members have stopped participating in the cluster
- New captain is elected with majority vote
Captain Election Implications
Cluster consists of at least 3 members
Captain election requires 51%
If deploying across 2 sites, the primary site must contain the majority of nodes, because a network disruption will then still allow an election
Why/how to control captaincy SHC?
server.conf
[shclustering]
preferred_captain = true
- to have one member always run as captain
- you don’t want captain performing ad hoc jobs
- repair the cluster
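Captaincy can also be moved on demand with the transfer command; the URI is a placeholder:

./splunk transfer shcluster-captain -mgmt_uri https://preferred-member.example.com:8089 -auth admin:pwd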