Module 2: Index Management Flashcards
Splunk stores data in ___, which are organised in directories and files on disk.
indexes
What are the features of a Splunk index?
- Indexes are stored in $SPLUNK_HOME/var/lib/splunk by default
- the directory location is customisable for each index
- Indexes contain raw data and index files
- Indexes can be created by an admin
- There are several prebuilt indexes, such as _internal, _audit and main.
What is inside a Splunk index?
- Raw data stored in a compressed format
- tsidx files - time series index files that point to the raw data
- Metadata - files such as Sources.data, SourceTypes.data and Hosts.data
Indexes store data in ___. ___ are a set of directories organised by age.
Buckets
What are the bucket types?
Bucket types are hot, warm, cold, frozen and thawed
Hot buckets
- newest data
- open for read and write
- can be more than one hot bucket in an index
- searchable
- roll to warm when they reach a certain size, or when Splunk restarts
Warm buckets
- created when hot buckets roll
- not open for writing, but searchable
- reside in the same directory as hot buckets but renamed
- roll to cold when the number of warm buckets exceeds the configured maximum (maxWarmDBCount in indexes.conf).
Cold buckets
- Starting from the oldest bucket (based on time), warm buckets roll to cold
- Reside in a different directory from hot and warm
- Searchable
- The directory location can be configured
- Possible to save cost by using cheaper storage.
Frozen buckets
- After cold buckets age out based on policy, they roll to frozen
- Default action is to delete, but can be configured to archive.
- coldToFrozenDir or coldToFrozenScript in indexes.conf configures archiving
- Archived frozen buckets are not searchable
Thawed buckets
- Frozen buckets can be thawed
- Thawed buckets are rebuilt into index and searchable
- Location can be configured in indexes.conf
- No age restriction
- Use the splunk rebuild command to rebuild the data into the index
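The bucket cards above describe a fixed order of stages. As a memory aid, that order can be sketched as a small lookup table. This is an illustrative model only, not Splunk code; the names NEXT_STAGE, SEARCHABLE and lifecycle are made up for this sketch.

```python
# Illustrative model of the bucket lifecycle from the cards above.
NEXT_STAGE = {
    "hot": "warm",
    "warm": "cold",
    "cold": "frozen",
    "frozen": "thawed",  # only possible if the frozen bucket was archived, not deleted
}

# Stages the cards describe as searchable (archived frozen buckets are not).
SEARCHABLE = {"hot", "warm", "cold", "thawed"}


def lifecycle(start="hot"):
    """Return the ordered list of stages from `start` onward."""
    stages = [start]
    while stages[-1] in NEXT_STAGE:
        stages.append(NEXT_STAGE[stages[-1]])
    return stages


for stage in lifecycle():
    print(stage, "searchable" if stage in SEARCHABLE else "not searchable")
```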
Where is the bucket location of hot/warm buckets?
$SPLUNK_HOME/var/lib/splunk/indexname/db
Where is the bucket location of cold buckets?
$SPLUNK_HOME/var/lib/splunk/indexname/colddb
Where is the bucket location of frozen buckets?
• Frozen buckets are deleted by default, so there is no default location. They can optionally be archived to the directory set by coldToFrozenDir
Where is the bucket location of thawed buckets?
$SPLUNK_HOME/var/lib/splunk/indexname/thaweddb
How are hot buckets named?
hot_v1_localid
How are warm buckets named?
• Warm – db_newesttime_oldesttime_localid
o Local id is the ID of the bucket
o Newest and oldest time are in UTC epoch time in seconds
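To make the warm bucket naming scheme concrete, here is a minimal Python sketch that splits a name of the form db_newesttime_oldesttime_localid and converts the epoch values to UTC timestamps. The function name and the sample bucket name are made up for illustration, and the sketch handles only the four-part form shown on the card.

```python
from datetime import datetime, timezone


def parse_warm_bucket_name(name):
    """Split db_<newest>_<oldest>_<localid> into its parts."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "db":
        raise ValueError(f"not a warm bucket name: {name!r}")
    newest, oldest, local_id = (int(p) for p in parts[1:])
    return {
        "newest": datetime.fromtimestamp(newest, tz=timezone.utc),
        "oldest": datetime.fromtimestamp(oldest, tz=timezone.utc),
        "local_id": local_id,
    }


# Hypothetical bucket name, used only as an example.
info = parse_warm_bucket_name("db_1700000000_1699990000_12")
print(info["newest"].isoformat(), info["oldest"].isoformat(), info["local_id"])
```

Note that the newest time comes first in the name, so the second field is always the smaller epoch value.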
Why create multiple indexes?
- Security – RBAC options set by index
- Retention – policies are applied at index level
What is index data integrity?
- Enabled in the GUI, or by setting enableDataIntegrityControl=true in indexes.conf
- Lets you verify that indexed data has not been tampered with
- Creates hash files (SHA-256) as the data is indexed
- Check integrity: ./splunk check-integrity -index [index name]
- Regenerate hash: ./splunk generate-hash-files -index [index name]
- Hash files are stored in the rawdata directory within the index
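The idea behind hash-based integrity checking can be sketched in a few lines of Python. This shows only the general technique (store a SHA-256 digest, recompute it later, compare). It does not reproduce Splunk's actual hash file format, and the sample event is invented.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Invented sample event, standing in for a slice of raw data.
original = b"2024-01-01 12:00:00 user=alice action=login\n"
stored_hash = sha256_of(original)  # computed once, at index time

tampered = original.replace(b"alice", b"mallory")

print(sha256_of(original) == stored_hash)  # True: data unchanged
print(sha256_of(tampered) == stored_hash)  # False: data was modified
```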
___.conf is used to configure Splunk indexes and their properties
indexes
Where is the default indexes.conf saved?
The default indexes.conf is in $SPLUNK_HOME/etc/system/default, but do not edit it. Create a new indexes.conf in $SPLUNK_HOME/etc/system/local instead.
What is the structure of indexes.conf?
- Default stanza for defining global properties
- If a property is defined outside of any specific stanza, at the top of the file, it is considered a global property
- If a property is defined at both global level and in a specific stanza, value in the specific stanza takes precedence
- If there are multiple definitions of the same setting in a stanza, the last one wins
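The precedence rules above can be illustrated with a toy parser. This is a simplified sketch, not Splunk's configuration loader: it ignores layering across directories and apps, the functions parse_conf and effective are made up, and the web_logs stanza and its values are hypothetical.

```python
def parse_conf(text):
    """Parse stanzas; settings before any stanza header count as global."""
    stanzas = {"default": {}}
    current = "default"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas.setdefault(current, {})
        elif "=" in line:
            key, value = line.split("=", 1)
            # Later assignments overwrite earlier ones: last setting wins.
            stanzas[current][key.strip()] = value.strip()
    return stanzas


def effective(stanzas, index):
    """Global values first, then stanza values override them."""
    merged = dict(stanzas.get("default", {}))
    merged.update(stanzas.get(index, {}))
    return merged


# Hypothetical indexes.conf content for illustration.
SAMPLE = """
frozenTimePeriodInSecs = 7776000

[web_logs]
frozenTimePeriodInSecs = 2592000
maxTotalDataSizeMB = 1000
maxTotalDataSizeMB = 2000
"""

print(effective(parse_conf(SAMPLE), "web_logs"))
```

Here web_logs ends up with its own frozenTimePeriodInSecs and the second maxTotalDataSizeMB value, while any index without its own stanza would inherit the global frozenTimePeriodInSecs.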
What is the fishbucket?
- fishbucket is a special Splunk internal index that is automatically created
- keeps track of ingestion progress of monitored files and directories
- located in $SPLUNK_HOME/var/lib/splunk/fishbucket
- upon restart, using fishbucket, Splunk can start ingesting from where it left off.
How can you use the fishbucket to reindex files?
• Reindex a particular file (btprobe command)
o ./splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file [file name] --reset
• Reindex all monitored files (remove the entire fishbucket directory)
o rm -rf /opt/splunk/var/lib/splunk/fishbucket
o You must restart the Splunk forwarder
What is data retention?
- must define a retention policy for the data indexed
- retention policy applied at index level
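- set maxTotalDataSizeMB and/or frozenTimePeriodInSecs. Whichever limit is reached first triggers the roll to frozen, so maxTotalDataSizeMB can freeze data before frozenTimePeriodInSecs has elapsed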
- indexes.conf is used to configure retention policy
frozenTimePeriodInSecs is set to ___ by default.
6 years (188697600 seconds)
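The two retention settings from the cards above can be sketched as a simple check. This is a simplified model under stated assumptions: it looks at a single bucket that is assumed to be the oldest in the index, and the function name should_freeze and its parameters are made up for illustration.

```python
SECONDS_PER_DAY = 86400
DEFAULT_FROZEN_SECS = 188697600  # default frozenTimePeriodInSecs, roughly 6 years


def should_freeze(newest_event_epoch, now_epoch, index_size_mb,
                  max_total_mb, frozen_secs=DEFAULT_FROZEN_SECS):
    """Return True if either retention limit has been reached."""
    too_old = (now_epoch - newest_event_epoch) > frozen_secs
    too_big = index_size_mb > max_total_mb
    return too_old or too_big


print(DEFAULT_FROZEN_SECS // SECONDS_PER_DAY)  # 2184 days

# Young bucket, but the index is over its size limit: it still freezes.
print(should_freeze(newest_event_epoch=0, now_epoch=100,
                    index_size_mb=200, max_total_mb=100))  # True
```

The second call shows why the size limit matters in practice: data can be frozen long before its time limit if the index fills up.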
What happens to expired data?
• When the bucket rolls from cold to frozen, by default the data is deleted
• If coldToFrozenScript is configured, the script is executed
• If coldToFrozenDir is configured, Splunk moves the expired buckets to this directory. This is what you do if you want to archive frozen buckets
• To restore expired data, copy the archived buckets to thaweddb location and rebuild
o ./splunk rebuild $SPLUNK_HOME/var/lib/splunk/indexname/thaweddb/[bucket directory]
How do you back up Splunk?
- you must back up Splunk regularly
- back up warm and cold buckets, and entire etc directory
- hot buckets can’t be backed up without stopping Splunk, as Splunk is actively writing to them
- in a clustered environment, you may not need to back up data as it is replicated.
- in distributed environments, ensure you back up search heads and heavy forwarders
How do you delete events in a Splunk index?
- best way is to let the data expire instead of deleting
- you can only do a virtual delete: events are hidden from searches but not removed from disk
- even admins can’t delete data by default
- assign a user the can_delete role, which grants the delete_by_keyword capability
- run a search that returns the events to be deleted, then pipe the results to the delete command
How do you delete ALL events from a Splunk index?
- extremely dangerous command in production
- Destroys index data from disk
- Syntax: splunk clean eventdata -index [index name] (Splunk must be stopped first)
- If you do not specify an index name, all indexes are considered for deletion