Module 10: Data Protection (Data Deduplication + Data Archiving) Flashcards
What are the cons of duplicate data?
lengthens backup windows
increases network bandwidth consumption
difficult to protect data within budget
What is data deduplication?
process of detecting and identifying the unique data segments within a given set of data to eliminate redundancy
What is the deduplication ratio?
ratio of data before deduplication to the amount of data after deduplication
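A small worked example of the ratio, as a sketch. The function names and byte counts below are made up for illustration; the space-saved percentage is a related figure derived from the same two numbers.

```python
def dedup_ratio(bytes_before, bytes_after):
    """Deduplication ratio: data before deduplication / data after."""
    if bytes_after <= 0:
        raise ValueError("bytes_after must be positive")
    return bytes_before / bytes_after


def space_saved_percent(bytes_before, bytes_after):
    """Share of the original capacity that deduplication freed, in percent."""
    return (1 - bytes_after / bytes_before) * 100


# Illustrative numbers only: 10,000 bytes shrink to 2,000 bytes.
ratio = dedup_ratio(10_000, 2_000)          # 5.0, usually written as 5:1
saved = space_saved_percent(10_000, 2_000)  # about 80 percent
print(f"{ratio:.0f}:1 ratio, {saved:.0f}% saved")
```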
What are the key benefits of data deduplication?
reduces infrastructure costs
enables longer retention periods
reduces backup windows
reduces network bandwidth requirements
What is source based deduplication?
data is deduplicated at the source (backup client)
When is source based deduplication recommended?
remote office/branch office (ROBO) environments
also commonly used by cloud service providers
What are the advantages of source based deduplication?
reduces storage capacity and network bandwidth requirements
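To see why deduplicating at the source saves both capacity and bandwidth, here is a minimal sketch. It assumes fixed-size segments and SHA-256 fingerprints; `BackupTarget`, `source_side_backup`, and the tiny segment size are hypothetical names and values chosen for illustration, not any product's API.

```python
import hashlib


class BackupTarget:
    """Hypothetical backup target that stores each unique segment once."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the target does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def put(self, fingerprint, segment):
        self.store[fingerprint] = segment


def source_side_backup(data, target, segment_size=4):
    """Fingerprint segments on the client and send only the new ones."""
    segments = [data[i:i + segment_size]
                for i in range(0, len(data), segment_size)]
    fingerprints = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = target.missing(fingerprints)
    bytes_sent = 0
    for fp, seg in zip(fingerprints, segments):
        if fp in needed:
            target.put(fp, seg)
            needed.discard(fp)  # do not resend a repeat within this backup
            bytes_sent += len(seg)
    # The fingerprint list is the "recipe" needed to rebuild the data.
    return fingerprints, bytes_sent


target = BackupTarget()
recipe, sent = source_side_backup(b"AAAABBBBAAAACCCC", target)
print(sent)  # 12 of 16 bytes cross the network
```

The hashing work in `source_side_backup` runs on the backup client, which is the cost that goes with this approach.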
What is target based deduplication?
data is deduplicated at the target (backup device), either inline or post-process
What are the advantages and disadvantages of target based deduplication?
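advantage: offloads the deduplication process from the backup client
disadvantage: requires sufficient network bandwidth, because all backup data is sent to the target before it is deduplicated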
What is a disadvantage of source based deduplication?
puts more burden on the host, since it is responsible for generating the safe set and deduplicating the data
What does inline deduplication mean?
data is deduplicated in cache and then sent to disk, so only unique data is written
What is file based dedupe?
detects identical copies of a whole file and stores only one copy; if any part of the file changes, the entire file must be backed up again
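A minimal sketch of file-level deduplication, assuming whole files are compared by a SHA-256 hash of their content. The function name and the sample files are hypothetical.

```python
import hashlib


def file_level_dedup(files):
    """Store one copy per unique file content.

    files maps a file name to its content as bytes.
    """
    store = {}    # content hash -> file bytes, stored once
    catalog = {}  # file name -> content hash
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)
        catalog[name] = digest
    return store, catalog


files = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",    # exact duplicate
    "report_v2.txt": b"quarterly numbers, v2",  # changed file
}
store, catalog = file_level_dedup(files)
print(len(store))  # 2: the changed file is stored again in full
```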
What is sub-file based dedupe?
the first time a file is backed up, it is broken down into smaller segments (sub-file objects); on later backups, only the segments that changed are stored again
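A minimal sketch of sub-file deduplication across two days of backups. It assumes fixed-length segments, which is only one possible approach (variable-length segmentation also exists); `backup_segments` and the 8-byte segment size are hypothetical.

```python
import hashlib


def backup_segments(data, store, segment_size=8):
    """Split data into fixed-length segments and store only unseen ones."""
    recipe = []    # ordered fingerprints needed to rebuild the file
    new_bytes = 0  # bytes added to the store by this backup
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = segment
            new_bytes += len(segment)
        recipe.append(fingerprint)
    return recipe, new_bytes


store = {}
day1 = b"AAAAAAAABBBBBBBBCCCCCCCC"
day2 = b"AAAAAAAABBBBBBBXCCCCCCCC"  # one byte changed in the middle segment

_, first = backup_segments(day1, store)
recipe2, second = backup_segments(day2, store)
print(first, second)  # 24 bytes stored on day 1, only 8 on day 2

# The day 2 file can still be rebuilt in full from its recipe.
restored = b"".join(store[fp] for fp in recipe2)
```

With file-level deduplication the same one-byte change would have stored all 24 bytes again.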
What is data archiving?
moves fixed content that is no longer actively accessed to a separate low-cost archive storage system
What are the advantages of data archiving?
saves primary storage capacity
reduces backup window and backup storage costs
What is a data archive?
primary copy of data
available for data retrieval without recovery
typically long term retention
What is the difference between data archive and data backup?
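a backup is a secondary copy of data, while an archive is the primary copy
a backup is used for data recovery operations, while an archive is available for retrieval without recovery
a backup typically has short term retention, while an archive typically has long term retention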
What are the components of data archiving?
archiving agent
archiving server (policy engine)
archive storage
What does the archiving agent do?
scans primary storage to find files that meet the archiving policy
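A minimal sketch of the scanning step. The policy here, "not modified for at least N days", is an assumption picked for illustration; real policies may look at access time, file type, size, or other attributes. `find_archive_candidates` is a hypothetical name.

```python
import os
import time


def find_archive_candidates(root, min_idle_days, now=None):
    """Return files under root not modified for at least min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Calling `find_archive_candidates("/data/projects", 365)` would list files untouched for a year; the path is only an example.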
What does the archiving server do?
indexes the files
What is a small stub file?
contains the address of the archived file and stays on the primary storage; small in size/capacity
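A minimal sketch of how a stub could work. It assumes the stub is a small JSON file holding the archived file's path; that format and the names `archive_with_stub` and `read_through_stub` are hypothetical, and name collisions in the archive directory are not handled.

```python
import json
import os
import shutil


def archive_with_stub(path, archive_dir):
    """Move a file to archive storage and leave a small stub in its place."""
    os.makedirs(archive_dir, exist_ok=True)
    archived_path = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, archived_path)
    # The stub stays on primary storage and records only the address.
    with open(path, "w") as stub:
        json.dump({"archived_at": archived_path}, stub)
    return archived_path


def read_through_stub(path):
    """Follow the stub to the archive and return the original content."""
    with open(path) as stub:
        address = json.load(stub)["archived_at"]
    with open(address, "rb") as archived:
        return archived.read()
```

The stub takes a few hundred bytes at most, whatever the size of the file it points to, which is how archiving frees primary storage capacity.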