Section 5.1 Flashcards
What does the Indexing Layer do?
Allows you to clean up data.
Allows you to refine data.
Allows you to store data.
What is Index clustering?
When multiple indexers are connected in order to replicate copies of each indexer's buckets (data).
Where is data stored?
In buckets, which live inside indexes on the indexer.
What is automatic failover?
Basically backing up data. If one indexer fails, the others will pick up the slack and maintain continuity.
High availability means…
Data is highly available for searching.
Index Clustering in summary means
Data is protected from sudden loss
More copies are available for users who are actively searching
Indexer activities will continue in the event an indexer goes down
Replication Factor determines
How many copies are maintained within an indexer cluster.
Default RF is 3
Maximum RF is determined by the number of indexers (peer nodes) you have.
Search Factor determines
How many of these copies are immediately searchable.
Default SF is 2
In a clustering environment you need a minimum of ____ Indexers
3
Most important fact about a Search Factor (SF)
The Search Factor can never be more than the Replication Factor.
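A minimal Python sketch of the RF/SF rules from the cards above (SF can never exceed RF, and RF can never exceed the number of indexers). The function names are hypothetical and are not part of Splunk; the defaults mirror the default RF of 3 and SF of 2.

```python
def check_cluster_factors(peers, rf=3, sf=2):
    """Return a list of rule violations; an empty list means the factors are valid."""
    problems = []
    if sf > rf:
        # Only copies that exist can be searchable.
        problems.append("search factor exceeds replication factor")
    if rf > peers:
        # Each copy lives on a different indexer, so RF is capped by the peer count.
        problems.append("replication factor exceeds number of peers")
    return problems


def tolerated_peer_failures(rf):
    """With rf copies on rf different peers, rf - 1 peers can fail without data loss."""
    return rf - 1
```

With the defaults, check_cluster_factors(3) returns an empty list, which is why a default cluster needs at least 3 indexers.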
Explain RF & SF
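The Replication Factor (RF) tells us how many copies of the data we want to keep. The Search Factor (SF) tells us how many of those copies are searchable. With the defaults (RF 3, SF 2), two of the copies are highly available, so the second is ready just in case something happens to the first. If both copies go down, there is still a third copy, which is usually stored at an offsite location.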
When does the Cluster Master come in?
The Cluster Master comes into play when we start copying our data (when the environment becomes clustered).
Cluster Master Manages what layer?
It manages the indexing layer.
What is the Cluster Master?
A centralized configuration manager whose job is to manage the indexer cluster.
Once the environment becomes clustered, the Deployment Server….
Only manages the forwarders.
What does a Cluster Master do?
Manages cluster activities (adding peers, distributing configurations, determining the number of copies to maintain).
Maintains memory of peers, their buckets, and configs
Tells search head where to request data.
What are Peers (Cluster Peer)?
Peers are Indexers
What do Peer Nodes do?
Peers receive and index incoming data (typically from forwarders)
Replicate data to other peers
Respond to incoming searches by supplying search results
A clustered architecture is called ..
A distributed search
Clustering is Smart because it provides….
Data Availability
Data Fidelity
Data Resiliency
Disaster Recovery
Search Affinity
Multi-site clustering =
Storing copies of your data at a different site
Data fidelity =
The act of not losing data; reliability
Benefits of Clustering =
1. Data Availability & fast recovery
2. Easier overall administration
3. Scalability of indexing
4. No additional cost for data replication
Cons of clustering =
1. Increased storage requirements
2. Increased processing load
3. Requires additional Splunk instances
4. Indexers require the same OS and versions
When you enable a search head in cluster environment you must specify what?
Cluster settings (i.e. Master Node) and the port on which it receives data.
Transforms.conf=
specify transformations and lookups that can then be applied to any event
What is the filepath of the CM that sends apps to its peers ?
$SPLUNK_HOME/etc/master-apps
Where do bundles reside for cluster peer?
$SPLUNK_HOME/etc/slave-apps
$SPLUNK_HOME/etc/slave-apps =
where you will always find pushed configuration files (sent from CM to indexer)
Config changes that require restart?
A. Changes to indexes.conf or inputs.conf
B. homePath changes in indexes.conf
C. Deleting an existing app
Configuration changes that do not need a restart ?
Adding a new index or new app with reloadable configs
Changes or additions to transforms.conf or props.conf
Tell me about your environment
In my environment we have a current quota of about 50TB, and we are currently ingesting about 49TB per day with 600 users. We have about 290 indexers, with close to 32,000 forwarders and about 12 search heads.
Your environment has too many forwarders to manage one at a time. What Splunk instance would you install, and how would you configure it to manage all the forwarders?
Use a Deployment Server: put the forwarders in a serverclass and create deployment apps to configure all of them.
You are configuring inputs.conf in your deployment app to bring in new data. You then search from the search head and cannot find the data. What happened?
-Deployment app was not sent to the correct serverclass
-Mistake in the monitoring stanza
-Wrong index specified
-The forwarders in the serverclass have not phoned home
-Monitoring is turned off; turn it on (BEST ANSWER)
-Splunk does not have permission to read the source file
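A rough Python sketch of two of the checks above (monitor disabled, wrong index), run against the text of an inputs.conf. It assumes the file is simple enough for Python's configparser to read, which is not true of every Splunk .conf file. The sample path, index, and sourcetype are made up, and find_input_problems is a hypothetical helper, not a Splunk tool.

```python
import configparser

# Hypothetical sample; the path, index, and sourcetype are invented for illustration.
SAMPLE_INPUTS_CONF = r"""
[monitor:///var/log/app/app.log]
index = web
sourcetype = app_log
disabled = 1
"""


def find_input_problems(conf_text, known_indexes):
    """Return (stanza, problem) pairs for monitor stanzas that look misconfigured."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(conf_text)
    problems = []
    for stanza in parser.sections():
        if not stanza.startswith("monitor://"):
            continue  # only monitor inputs are checked in this sketch
        settings = parser[stanza]
        disabled = settings.get("disabled", fallback="0").strip().lower()
        if disabled in ("1", "true"):
            problems.append((stanza, "monitor is disabled"))
        index = settings.get("index")
        if index is None:
            problems.append((stanza, "no index set, data goes to main"))
        elif index not in known_indexes:
            problems.append((stanza, f"index '{index}' is not in the known list"))
    return problems
```

Calling find_input_problems(SAMPLE_INPUTS_CONF, {"main", "web"}) flags the stanza as disabled. The other causes on the card (serverclass, phone home, file permissions) need checks outside the file itself.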
what directory must you place your inputs.conf file in the deployment app
local directory
indexer uses what port
9997
fishbucket index importance
Allows you to see how far into a file indexing has occurred. This helps to avoid duplicates and comes in handy after a server shutdown or connection errors.
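The idea behind the fishbucket can be sketched in Python as a checkpoint of how far into each file reading has gotten. This is a simplification under stated assumptions: the real fishbucket also tracks a CRC of the file content, while this sketch keys only on the file path. read_new_lines and the checkpoints dictionary are hypothetical names.

```python
def read_new_lines(path, checkpoints):
    """Return only the lines added since the last call, then save the new offset."""
    offset = checkpoints.get(path, 0)  # 0 means the file has never been read
    with open(path, "rb") as handle:
        handle.seek(offset)            # skip everything already processed
        data = handle.read()
        checkpoints[path] = handle.tell()
    return data.decode("utf-8").splitlines()
```

Because the offset survives between calls, reading the same file again after a restart returns nothing new, which is how duplicates are avoided.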
advantages of indexer clustering
1. Data Availability & fast recovery
2. Easier overall administration
3. Scalability of indexing
4. No additional cost for data replication
A. Data Availability = how often your data is available to be utilized.
B. Data Fidelity = the act of not losing data.
C. Data Reliability = refers to the accuracy, consistency, and dependability of the data being ingested, indexed, and queried within the platform.
D. Data Resiliency = platform’s ability to maintain data availability, integrity, and accessibility even in the face of unexpected failures.
E. Disaster Recovery = set of processes and strategies put in place to ensure availability and continuity of Splunk services and data.
F. Search Affinity = search local sites; mechanism for intelligently routing and distributing search jobs across a distributed Splunk environment.
explain data availability
how often your data is available to be utilized
who manages all indexes in cluster environment? Explain
Cluster Master/Master Node
how would you configure hot bucket to roll over by time
maxHotSpanSecs (in indexes.conf)
default port used for replication
8080 is the replication port; 8089 is the management port (used between a configuration manager and its clients: Deployment Server to its clients, Cluster Master to its indexers, i.e. ANY client it is managing); and 9997 is the data (receiving) port.
what is metadata and what does it contain?
Metadata is like a bar code: it tells you where the data is coming from (IP address or host, log path or source, and format of the data or sourcetype).
What is source
name of the file, stream, or other input from which the event originates
give examples of sourcetypes you worked with
JSON, syslog, or CSV
what is the largest sourcetype you have worked with?
syslog is network data and large
high availability
-High availability=when we are replicating data within our indexers
-Multiple copies available for searching
-Data gets into our indexers in round robin fashion
distributed search?
key feature that allows you to search and analyze data across multiple Splunk instances or indexers in a distributed Splunk deployment. This is especially useful in large-scale environments where the volume of data to be searched and analyzed exceeds the capacity of a single Splunk instance.
how replicated buckets are stored in indexers
1. Data arrives at the indexers, distributed in round-robin fashion.
2. The data is written on the receiving indexer.
3. The replication process then moves from indexer to indexer, looking for a healthy one to store a copy of that data.
how does forwarder distribute data among indexers without replication (regular data)
round robin fashion
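A small Python sketch of plain round-robin distribution as described on the card. It is an illustration only: a real forwarder switches target indexers at intervals rather than per event, and the indexer names used with distribute_round_robin (a hypothetical helper) are made up.

```python
from itertools import cycle


def distribute_round_robin(events, indexers):
    """Assign each event to the next indexer in turn, wrapping around at the end."""
    assignment = {name: [] for name in indexers}
    targets = cycle(indexers)  # idx1, idx2, idx3, idx1, ...
    for event in events:
        assignment[next(targets)].append(event)
    return assignment
```

Five events across three indexers end up split 2, 2, 1, so no single indexer carries the whole load.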
reloading vs restarting DS
When you update the clients of the DS, reload the deployment server.
When you make updates to the DS itself, restart the DS.
when increasing ingestion in cluster environment
add more indexers to the cluster
some things to consider when going into a clustered environment
cost of more splunk instances
ingestion of data
storage requirements
processing requirements
You notice that your newly monitored data is not in the index that you have configured it to be in. Where is data possibly being stored and how would you troubleshoot it?
Go to inputs.conf and validate that the ‘index’ setting is correct.
If the index setting is missing or not picked up, the data will be in the default index, main.
Where does fresh, newly indexed data go in Splunk?
Hot bucket
Under what circumstances would the hot bucket stop being written to?
If the hot bucket is full or if there is a restart.
In order to have a Splunk search head, what would you need to download?
Splunk Enterprise
Maximum number of concurrent users per search head
12
What is maxHotBuckets?
Maximum number of hot buckets that can exist in an index
Which default port is for replication?
8080 port
What is the thawing process
Frozen data has to be thawed before it can be searched again
Move the archived bucket into the thawed directory and rename it to a name that Splunk recognizes
What must happen before indexer can be part of a cluster?
Indexer must become cluster member
Cluster Master/Master Node
You only need ONE
Internal Index?
Used for troubleshooting; stores all Splunk components’ internal logs and processing metrics.
Searches for logs that say ERROR or WARN
Monitoring stanza in Windows vs Linux
Windows = [monitor://C:\app\log\data\catalina.out]
Linux = [monitor:///another/random/path]
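A tiny Python sketch showing why the Linux stanza has three slashes: the header is the prefix monitor:// followed by the full path, and an absolute Linux path already starts with a slash. monitor_stanza is a hypothetical helper for illustration only.

```python
def monitor_stanza(path):
    """Build an inputs.conf monitor stanza header from a file or directory path."""
    return "[monitor://" + path + "]"


# The two examples from the card above.
print(monitor_stanza(r"C:\app\log\data\catalina.out"))
print(monitor_stanza("/another/random/path"))
```

The Windows path starts with a drive letter, so it shows only the two slashes from the prefix.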
Two types of files indexes consist of
raw data (full log files) and index files (tsidx)
To disable the monitor to stop sending logs
Go to the monitor stanza in inputs.conf and set disabled to true or 1
Explain summary index
Summary indexing allows you to run fast searches over a large data set by scheduling Splunk to summarize data then import data into the summary index over time
When increasing ingestion of data by 2 TB, what will you have to do? How would you accommodate it in a clustered environment?
Adding indexers to the cluster to accommodate growth.
What directory are apps deployed to in a clustered environment
$SPLUNK_HOME/etc/slave-apps
When will you use the management port (8089)?
When the CM is communicating with its clients or slaves