Chapter 6 - Data Deduplication Flashcards
Name three scenarios that would be ideal for data deduplication.
General-purpose file servers
VDI deployments
Backup targets
What data deduplication policy specifies which files should be considered for data deduplication?
Optimization policy
What two fields in Get-DedupStatus are relevant to the optimization rate?
OptimizedFilesSavingsRate and SavingsRate
What is a Chunk?
A section of a file that the Data Deduplication chunking algorithm has selected as likely to occur in other, similar files
What is a Chunk Store?
An organized series of container files in the System Volume Information folder that Data Deduplication uses to uniquely store chunks
What is Dedup?
An abbreviation for data deduplication that is commonly used in PowerShell, Windows Server APIs and components, and the Windows Server community
What is File Metadata?
Information that describes properties about the file that are not related to the main content of the file
What is File Stream?
The main content of the file
What is a File system?
The software and on-disk data structure that the operating system uses to store files on storage media
What is a File System Filter?
A plugin that modifies the default behavior of the file system
What is Optimization?
The process of chunking a file and storing its unique chunks in the chunk store
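The idea of storing each unique chunk only once can be pictured with a toy model. The sketch below is not how Data Deduplication is implemented: it swaps in fixed-size chunks and an in-memory hashtable, whereas the real feature uses a variable-size chunking algorithm and container files on disk. `Add-ToyFile`, `$chunkStore`, and the 4-character chunk size are hypothetical names and values chosen for illustration.

```powershell
# Toy model only: fixed-size chunks and an in-memory hashtable stand in for
# the real chunking algorithm and the on-disk chunk store.
$chunkStore = @{}   # chunk hash -> chunk content (each unique chunk stored once)

function Add-ToyFile {
    param(
        [string] $Content,
        [int]    $ChunkSize = 4   # hypothetical size, chosen for readability
    )
    $sha = [System.Security.Cryptography.SHA256]::Create()
    $chunkMap = @()
    for ($i = 0; $i -lt $Content.Length; $i += $ChunkSize) {
        $length = [Math]::Min($ChunkSize, $Content.Length - $i)
        $chunk  = $Content.Substring($i, $length)
        $bytes  = [System.Text.Encoding]::UTF8.GetBytes($chunk)
        $hash   = [System.BitConverter]::ToString($sha.ComputeHash($bytes))
        # Store the chunk only if an identical chunk is not already present.
        if (-not $chunkStore.ContainsKey($hash)) {
            $chunkStore[$hash] = $chunk
        }
        $chunkMap += $hash
    }
    $sha.Dispose()
    # The chunk map is what an optimized file keeps in place of its content.
    return $chunkMap
}

$mapA = Add-ToyFile -Content 'ABCDABCDWXYZ'   # chunks: ABCD, ABCD, WXYZ
$mapB = Add-ToyFile -Content 'ABCDWXYZ'       # chunks: ABCD, WXYZ
"Chunks referenced: $($mapA.Count + $mapB.Count); unique chunks stored: $($chunkStore.Count)"
```

The two toy files reference five chunks in total, but only two distinct chunks end up in the store.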
What is Optimization Policy?
A policy which specifies the files that should be considered for data deduplication
What is a Reparse Point?
A special tag that notifies the file system to pass off I/O to a specified file system filter; in data deduplication, it is the way optimized files are stored (pointers to a chunk map)
What is a Volume?
A Windows construct for a logical storage drive that may span multiple physical storage devices across one or more servers
What is a Workload?
An application that runs on Windows Server
What are some usage scenarios for data deduplication, and what space savings are typical for each?
User documents: 30% to 50%
Deployment shares: 70% to 80%
Virtualization libraries: 80% to 95%
General file shares: 50% to 60%
How does data deduplication help in general file servers?
These environments often consist of team shares, user home folders, work folders, and software development shares, which gives data deduplication plenty of opportunity to save space.
How does data deduplication help in Virtualized Desktop Infrastructure (VDI) deployments?
Many virtual hard disks are practically identical.
How does data deduplication help with backup targets?
Much of the data stored as backups is identical to data that has already been backed up.
What is DDPEval?
The Data Deduplication Savings Evaluation tool, which can evaluate the potential for optimization against directly connected volumes and mapped or unmapped network shares.
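As a rough sketch of how the tool is invoked: DDPEval.exe is a command-line program that takes the path to evaluate as its argument, and it is typically found in \Windows\System32 once the Data Deduplication feature is installed. The volume path and share name below are placeholders, not values from this chapter.

```powershell
# Evaluate a directly connected volume (E:\ is a placeholder path).
DDPEval.exe E:\

# Evaluate a network share by UNC path (\\fileserver\share is a hypothetical name).
DDPEval.exe \\fileserver\share
```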
Can data deduplication affect performance negatively?
Yes. Data deduplication runs as a periodic task that can interfere with the performance requirements of your workload. This is of most concern for workloads stored on traditional HDDs rather than SSDs.
Why do the resource requirements of the workload matter for data deduplication?
Storage that has “downtime,” such as weekends, is often an excellent candidate for data deduplication since this processing can occur during those times.
What are four ways in which Windows Server 2016 enhances data deduplication?
Support for larger volumes
Support for larger files
Support for Nano Server
Simplified backup support
How does Windows Server 2016 enhance support for data deduplication in larger volumes?
Windows Server 2016 supports volume sizes up to 64 TB.
How does Windows Server 2016 enhance support for data deduplication in larger files?
Files up to 1 TB are fully supported
How does Windows Server 2016 enhance data deduplication in backup support?
A new predefined usage type, Backup, supports seamless deployment of data deduplication for virtualized backup applications.
What are the three usage types of data deduplication?
Default
Hyper-V
Backup
What is the default type of data deduplication used for?
This is the option to choose for general-purpose file servers. It uses Background optimization.
What is the optimization policy for the default type of data deduplication?
Minimum file age = 3 days
Optimize in-use files = No
Optimize partial files = No
What is the Hyper-V type of data deduplication used for?
This is deduplication tuned specifically for VDI servers. It uses background optimization and has “under-the-hood” tweaks for Hyper-V interoperability.
What is the optimization policy for the Hyper-V type of data deduplication?
Minimum file age = 3 days
Optimize in-use files = Yes
Optimize partial files = Yes
What is the Backup type of data deduplication?
This is deduplication tuned for virtualized backup applications. It uses priority optimization and has “under-the-hood” tweaks for interoperability with DPM and DPM-like solutions.
What is the optimization policy for the backup type of data deduplication?
Minimum file age = 0 days
Optimize in-use files = Yes
Optimize partial files = No
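The three usage types and their policies can be tried out as follows. This is a minimal sketch, assuming a server with the Data Deduplication feature installed and a data volume at E:, which is a placeholder drive letter.

```powershell
# E: is a placeholder volume; -UsageType accepts Default, HyperV, or Backup.
Enable-DedupVolume -Volume "E:" -UsageType HyperV

# Review the policy settings that the chosen usage type applied.
Get-DedupVolume -Volume "E:" |
    Format-List Volume, UsageType, MinimumFileAgeDays, OptimizeInUseFiles, OptimizePartialFiles
```

Comparing the output against the policy values on the cards above is a quick way to confirm which usage type a volume is running.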
What four jobs make data deduplication possible?
Optimization
Garbage Collection
Integrity Scrubbing
Unoptimization
What is Garbage Collection?
Reclaims disk space by removing unnecessary chunks that are no longer being referenced by files that have been recently modified or deleted
What is Integrity Scrubbing?
Identifies corruption in the chunk store due to disk failures or bad sectors
What is Unoptimization?
Undoes the optimization done by deduplication and disables data deduplication for that volume
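These jobs normally run on a schedule, but each can also be started by hand with the `-Type` parameter of `Start-DedupJob`. The sketch below assumes deduplication is already enabled on E:, a placeholder volume.

```powershell
# Chunk eligible files and store their unique chunks in the chunk store.
Start-DedupJob -Volume "E:" -Type Optimization

# Reclaim space from chunks that no file references any longer.
Start-DedupJob -Volume "E:" -Type GarbageCollection

# Check the chunk store for corruption.
Start-DedupJob -Volume "E:" -Type Scrubbing

# Undo optimization on the volume; run this only when you intend to
# remove deduplication from the volume.
Start-DedupJob -Volume "E:" -Type Unoptimization
```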
How do you add the role of data deduplication in PowerShell?
Install-WindowsFeature -Name FS-Data-Deduplication
How do you add the role of data deduplication in Nano Server?
Install-WindowsFeature -ComputerName <NanoServerName> -Name FS-Data-Deduplication
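After installation, a quick check along these lines can confirm that the feature and its cmdlets are available. It is a sketch that assumes it is run on the server where the feature was installed.

```powershell
# Confirm that the feature is installed (Installed should be True).
Get-WindowsFeature -Name FS-Data-Deduplication |
    Select-Object Name, InstallState, Installed

# List the cmdlets provided by the Deduplication module.
Get-Command -Module Deduplication
```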
What are five cmdlets used to implement data deduplication in PowerShell?
Enable-DedupVolume, Start-DedupJob, Stop-DedupJob, Get-DedupJob, Disable-DedupVolume
What does the Enable-DedupVolume cmdlet do?
Enables data deduplication on one or more volumes.
What does the Start-DedupJob cmdlet do?
Starts a new data deduplication job
What does the Stop-DedupJob cmdlet do?
Stops a data deduplication job that’s already in progress (or removes it from the queue)
What does the Get-DedupJob cmdlet do?
Shows all the active and queued data deduplication jobs
What does the Disable-DedupVolume cmdlet do?
Disables data deduplication on one or more volumes
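Taken together, the five cmdlets cover the life cycle of deduplication on a volume. The sequence below is a minimal sketch rather than a recommended procedure; E: is a placeholder volume.

```powershell
# Turn on deduplication for the volume (the Default usage type is used
# when -UsageType is not specified).
Enable-DedupVolume -Volume "E:"

# Queue an optimization job instead of waiting for the schedule.
Start-DedupJob -Volume "E:" -Type Optimization

# Show the active and queued jobs.
Get-DedupJob

# Cancel the jobs that are running or queued for the volume.
Stop-DedupJob -Volume "E:"

# Disable further deduplication activity on the volume.
Disable-DedupVolume -Volume "E:"
```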
What cmdlet is useful for monitoring data deduplication in PowerShell?
Get-DedupStatus
Which fields are important in the Get-DedupStatus cmdlet for data deduplication monitoring?
LastOptimizationResult, LastGarbageCollectionResult, LastScrubbingResult, OptimizedFilesSavingsRate, SavingsRate
How do you interpret the LastOptimizationResult field in data deduplication monitoring?
Check LastOptimizationResult (0 = success), LastOptimizationResultMessage, and LastOptimizationTime (should be recent)
How do you interpret the LastGarbageCollectionResult field in data deduplication monitoring?
Check LastGarbageCollectionResult (0 = success), LastGarbageCollectionResultMessage, and LastGarbageCollectionTime (should be recent)
How do you interpret the LastScrubbingResult field in data deduplication monitoring?
Check LastScrubbingResult (0 = success), LastScrubbingResultMessage, and LastScrubbingTime (should be recent)
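The "0 = success" and "should be recent" checks can be scripted. The helper below is a hypothetical sketch, not part of the Deduplication module: `Test-DedupJobResult` and its 7-day default for "recent" are assumptions made for illustration, and only the field names come from `Get-DedupStatus`.

```powershell
function Test-DedupJobResult {
    param(
        [Parameter(Mandatory)] $Status,   # an object shaped like Get-DedupStatus output
        [int] $MaxAgeDays = 7             # assumed meaning of "recent"; adjust as needed
    )
    foreach ($job in 'Optimization', 'GarbageCollection', 'Scrubbing') {
        $result = $Status."Last${job}Result"
        $time   = $Status."Last${job}Time"
        [pscustomobject]@{
            Job     = $job
            # A result code of 0 means the last run succeeded.
            Success = ($result -eq 0)
            # The last run counts as recent if it finished within MaxAgeDays.
            Recent  = (($null -ne $time) -and ($time -gt (Get-Date).AddDays(-$MaxAgeDays)))
        }
    }
}
```

On a server with deduplication enabled, it could be used as `Test-DedupJobResult -Status (Get-DedupStatus -Volume "E:")`, where E: is a placeholder volume. Any row with Success or Recent set to False points to the matching message field for details.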
How do you interpret OptimizedFilesSavingsRate in data deduplication monitoring?
It applies only to the files that are “in-policy” for optimization (space used by optimized files after optimization / logical size of optimized files)
How do you interpret SavingsRate in data deduplication monitoring?
It applies to the entire volume (space used by optimized files after optimization / total logical size of the optimization)