Homework Week 4 - Indexer Layer Flashcards
- How does data flow through these 4 components of Splunk: Deployment Server, Universal Forwarder, Indexer, Search Head?
Universal Forwarders collect data from source machines such as servers and routers and forward it to Splunk. The Deployment Server centrally manages and configures Universal Forwarders across the environment by distributing configurations, apps, and updates to them. Indexers index and store the data received from Universal Forwarders. The Search Head retrieves data from the Indexers based on user queries and presents the results to the user.
- What storage space would you allocate for the cold data retention per indexer given the following metrics: 4 TBs of daily ingestion with hot data being retained for 5 days, cold data for 60 days, and frozen data for 5 years?
6.9 TB per indexer; 117.2 TB across all indexers
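The answer above does not state its assumptions. As a rough check, the figures are consistent with the sketch below, which assumes that indexed data occupies about 50% of the raw volume on disk (a common sizing rule of thumb), that 4 TB is treated as 4,000 GB and converted back at 1,024 GB per TB, and that the load is spread evenly over 17 indexers. None of these values appear in the question; they are assumptions chosen for illustration, and the function name is hypothetical.

```python
def cold_storage_tb(daily_ingest_gb, cold_days, disk_ratio, indexer_count):
    """Estimate cold-tier storage in TB (total and per indexer).

    disk_ratio is the assumed on-disk size as a fraction of raw volume.
    """
    total_gb = daily_ingest_gb * cold_days * disk_ratio
    total_tb = total_gb / 1024  # assumed conversion: 1 TB = 1,024 GB
    per_indexer_tb = total_tb / indexer_count
    return total_tb, per_indexer_tb


# Assumed inputs: 4,000 GB/day, 60 days cold, 50% on-disk ratio, 17 indexers.
total_tb, per_indexer_tb = cold_storage_tb(4000, 60, 0.5, 17)
print(f"{total_tb:.1f} TB total, {per_indexer_tb:.1f} TB per indexer")
```

Changing the on-disk ratio or the indexer count changes the result directly, so these inputs should be confirmed against the actual environment before sizing storage.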
- Give 2 examples and explanation of each: a) Virtual machines, b) Network devices, c) Databases, d) Logs, e) Configurations, f) Metrics, g) Alerts
a) Virtual Machines:
Development Server VM: When a software development team needs a dedicated environment for testing new applications, a virtual machine that emulates the target production environment is created. This VM lets the team isolate their development work and test new code without affecting the production system.
Virtual Desktop Infrastructure (VDI): Virtual machines are often used to give corporate employees virtual desktops. These VMs are hosted on servers in the data center and allow employees to access their desktop environment remotely. This centralizes management and security and reduces the need for individual desktop hardware.
b) Network Devices:
Router: Routers are critical network devices that connect different networks together. For instance, a home router connects a local area network (LAN) to the internet, allowing multiple devices in the home to access online resources.
Switch: A network switch connects multiple devices (e.g., computers, printers) within a local network and efficiently manages data traffic within the LAN. Switches are commonly used in corporate and data center networks.
c) Databases:
Customer Database: A company maintains a customer database containing information such as names, contact details, and purchase history. This database is used by the marketing team for targeted advertising and the sales team for customer relationship management.
Inventory Database: A retail store uses a database to track inventory levels of products in real-time. When a product is sold, the database is updated to reflect the change in stock, ensuring accurate inventory management.
d) Logs:
Server Logs: Server logs record activities on a web server. For instance, access logs contain information about each web request, including the IP address of the client, the requested URL, and the response code. These logs are critical for diagnosing issues, monitoring traffic, and ensuring security.
Security Logs: In a cybersecurity context, logs from firewall devices, intrusion detection systems (IDS), and antivirus software provide a record of potential security threats and events. Analyzing these logs helps security teams detect and respond to cyberattacks.
e) Configurations:
Router Configuration: Network devices like routers are configured with settings such as IP addresses, access control lists (ACLs), and routing protocols. These configurations dictate how network traffic is managed and routed within an organization.
Application Configuration: Software applications often have configuration files that control their behavior. For example, an email client may have configuration settings for incoming and outgoing mail servers, email signatures, and notification preferences.
f) Metrics:
Website Performance Metrics: Web servers generate metrics such as response time, request rate, and error rate. Monitoring these metrics helps ensure that a website is performing well and provides a good user experience.
Server Resource Utilization Metrics: Servers often generate metrics related to CPU usage, memory utilization, and disk space. Monitoring these metrics helps IT teams proactively manage server resources and detect performance bottlenecks.
g) Alerts:
Network Intrusion Alerts: Intrusion detection systems (IDS) generate alerts when they detect potentially malicious activity on a network, such as unauthorized access attempts or suspicious traffic patterns. These alerts trigger security incident response procedures.
Application Error Alerts: Software applications can be configured to generate alerts when critical errors occur. These alerts notify administrators or developers when issues need immediate attention, helping to minimize downtime and disruptions.
- What is the relationship between the Deployment Server and Universal Forwarders?
Splunk administrators define configurations for data collection and forwarding on the Deployment Server and assign them to the appropriate Universal Forwarders. The Universal Forwarders, acting as deployment clients, periodically check in with the Deployment Server and download any new or updated configurations. This process ensures that all forwarders are set up correctly and consistently. With the configurations in place, Universal Forwarders collect data from the source machines and forward it to the designated Indexers.
- Splunk license usage is measured by
D. New data being indexed
- Describe a heavy forwarder and what it does.
A heavy forwarder is a full Splunk Enterprise instance that can collect data inputs, parse the data as an indexer would, and forward it to indexers. Heavy forwarders are worth considering when data requires index-time extractions or when an app such as DB Connect requires one.
- Explain the process of how you would bring data into Splunk.
First, the forwarders send the logs to the indexers. The indexers then parse the logs and organize them into indexes. Next, configurations determine how long the data stays in an index and when it is moved to slower storage or deleted. Finally, the search heads search the indexes to create visualizations and reports.
- Explain the licensing structure of Splunk – how does Splunk charge for the use of this software?
The Splunk licensing model is based on daily indexing volume: Splunk charges for the amount of data ingested (indexed) per day.
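As a small illustration of volume-based licensing, the sketch below computes what fraction of a daily quota a given day's indexed volume consumes. The function name and the sample figures (450 GB indexed against a 500 GB quota) are hypothetical and are not taken from any real deployment.

```python
def license_utilization(daily_indexed_gb, daily_quota_gb):
    """Return the fraction of the daily license quota consumed."""
    if daily_quota_gb <= 0:
        raise ValueError("daily_quota_gb must be positive")
    return daily_indexed_gb / daily_quota_gb


# Hypothetical day: 450 GB indexed against a 500 GB/day license.
usage = license_utilization(daily_indexed_gb=450, daily_quota_gb=500)
print(f"{usage:.0%} of the daily license used")
```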
- Explain in your own words the two stages of indexing and what each stage does to the data.
The two stages of indexing, parsing and indexing, prepare data for effective analysis in Splunk.
Parsing examines raw data and extracts meaningful information from it, while indexing creates an organized, searchable catalog of that parsed data.
Together, these two stages facilitate the efficient searching, analysis, and visualization of data, which is an essential element of Splunk.
- What is the default port of an indexer and what does it do?
The default (conventional) receiving port of an indexer is 9997. The indexer listens on this port to receive data from forwarders.
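One way to confirm that an indexer is listening on its receiving port is a plain TCP connection test. The sketch below is a generic reachability check, not a Splunk API; the host name in the usage comment is a placeholder.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder host name):
#   is_port_open("indexer.example.com", 9997)
```

A successful connection only shows that something is listening on the port; it does not prove that the listener is Splunk or that forwarded data is being indexed.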
- What are some attributes of a monitoring stanza?
Some attributes of a monitor stanza in inputs.conf are disabled, sourcetype, and index; the stanza itself begins with a [monitor://<path>] header.
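To show how these attributes fit together, the sketch below renders the text of a monitor stanza. The helper name, the monitored path, and the index name `os_logs` are hypothetical; only the `[monitor://<path>]` header form and the `disabled`, `sourcetype`, and `index` attributes come from the answer above.

```python
def render_monitor_stanza(path, sourcetype, index, disabled=False):
    """Build the text of an inputs.conf monitor stanza as a string."""
    lines = [
        f"[monitor://{path}]",
        f"disabled = {'true' if disabled else 'false'}",
        f"sourcetype = {sourcetype}",
        f"index = {index}",
    ]
    return "\n".join(lines)


# Hypothetical input: monitor a syslog file and send it to an "os_logs" index.
stanza = render_monitor_stanza("/var/log/messages", "syslog", "os_logs")
print(stanza)
```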
- Tell me about your environment?
In my environment, we have a current quota of about 50 TB and are ingesting about 49 TB per day with 600 users. We have about 290 indexers, close to 32,000 forwarders, and about 12 search heads.
- What is the file path to the location of Splunk’s buckets?
$SPLUNK_HOME/var/lib/splunk
- What is the difference between the following attributes of indexes.conf: maxTotalDataSizeMB and frozenTimePeriodInSecs?
maxTotalDataSizeMB is measured in megabytes and caps the total size of an index; when the index grows past this limit, the oldest buckets roll to frozen.
frozenTimePeriodInSecs is measured in seconds and sets the age at which data rolls from cold to frozen.
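The two limits can be compared with a simplified model: data is frozen when either the size cap or the age limit is exceeded. The sketch below is an illustration only, not Splunk's actual bucket-management logic; the helper names and the sample values (a 90-day retention and a 1 TB cap) are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_secs(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


def tb_to_mb(tb):
    """Convert terabytes to megabytes (1 TB = 1,024 * 1,024 MB)."""
    return tb * 1024 * 1024


def freeze_reason(oldest_bucket_age_secs, index_size_mb,
                  frozen_time_period_in_secs, max_total_data_size_mb):
    """Simplified model: report which limit, if any, would freeze the oldest bucket."""
    if index_size_mb > max_total_data_size_mb:
        return "size"
    if oldest_bucket_age_secs > frozen_time_period_in_secs:
        return "time"
    return None


# Assumed settings: 90-day retention and a 1 TB size cap.
retention_secs = days_to_secs(90)
size_cap_mb = tb_to_mb(1)
print(retention_secs, size_cap_mb)
print(freeze_reason(days_to_secs(100), 200_000, retention_secs, size_cap_mb))
```

In this model the size check runs first, so a bucket that exceeds both limits is reported as frozen by size.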
- How would we configure hot bucket retention by time in indexes.conf?
Use the maxHotSpanSecs setting, which sets the maximum time span, in seconds, that a hot bucket can cover.