TSM V7.1: Understanding TSM Deduplication Flashcards
Is there a TSM license fee for deduplication?
No, it is an optional feature
Which functionality is incompatible with dedup? (name 6)
- client-side encryption (SSL encryption of data in flight is OK, though)
- LAN-free backup with client-side dedup (server-side dedup is OK, though)
- simultaneous write
- subfile backup
- UNIX HSM (server side dedup ok though)
- advice: no Client-side compression with server side dedup! (client side dedup with client side compression ok)
What type of dedup should you use when the network bandwidth between client and server is constrained?
client side dedup
What is the advantage of client side dedup in terms of workload?
The workload can be distributed over several clients
When should server-side dedup be considered instead of client-side dedup? (3X)
- when you need the fastest possible backup times
- when you need the shortest possible window for copying dedup storage pools to offsite copy pools
- when CPU resources on the client side are inadequate
When is a dedup appliance preferable to TSM dedup? (3X)
- if you back up mostly very large files (>2 TB)
- if you choose not to add multiple TSM server instances
- if your daily backup exceeds 20 TB (30 TB with client-side dedup)
How can you determine how much storage you saved using dedup? (2X)
- tsm>query stgpool f=d
- special script via …/support/docview….
What is the correct order to ensure the fastest restore (in general)?
1) deduplicated stgpool
2) non-deduplicated stgpool
3) tape storage pool
2, 1, 3 (usually, but it can differ)
What reduction ratios can typically be achieved with dedup?
2:1 (50%) to 15:1 (93%) depending on type of data (unique or highly repetitive)
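The percentages in this card follow directly from the ratios. A minimal sketch of the conversion (the function name `savings_percent` is made up for illustration):

```python
def savings_percent(ratio: float) -> float:
    """Convert an N:1 reduction ratio into the percentage of storage saved."""
    if ratio < 1:
        raise ValueError("a reduction ratio must be at least 1")
    return (1 - 1 / ratio) * 100


# Ratios mentioned in these cards: 2:1 and 15:1 (typical range), 3:1 and 4:1 (planning)
for ratio in (2, 3, 4, 15):
    print(f"{ratio}:1 -> {savings_percent(ratio):.0f}% saved")
```

A 3:1 planning ratio therefore corresponds to roughly two thirds of the space saved.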
What reduction ratio for dedup storage pools should you use for planning purposes?
3:1 (maybe 4:1)
What additional TSM database size is required for dedup when the total amount of backup data (that will go to a dedup pool) is 50 TB?
0.5 TB (1%) additional size
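The 1% rule of thumb from this card can be written as a one-line calculation. This is only a sketch of the arithmetic; the function name and the default `fraction` parameter are illustrative, not part of any TSM tool:

```python
def dedup_db_overhead_tb(backup_data_tb: float, fraction: float = 0.01) -> float:
    """Extra TSM database space (TB) for dedup, using the 1% rule of thumb."""
    return backup_data_tb * fraction


# 50 TB of backup data going to a dedup pool
print(dedup_db_overhead_tb(50))
```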
What types of data stored by clients are suitable for deduplication? (4X)
backup, archive, HSM data (server side only), API
The duplicates of the original pattern are replaced by a hash value of how many bytes?
20 bytes
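To get a feel for the size difference, the sketch below replaces a chunk of data with a 20-byte digest. SHA-1 is used here only because it produces a 20-byte digest; the card itself states the size, not the algorithm, so treat the choice of hash as an assumption.

```python
import hashlib

# An arbitrary chunk of backup data (about 18 KB), purely illustrative
chunk = b"example chunk data" * 1000

# A duplicate of this chunk would be stored as this short reference instead
reference = hashlib.sha1(chunk).digest()

print(len(chunk), "bytes of data ->", len(reference), "byte reference")
```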
What is an alternative to TSM client-side dedup for LAN-free backups (backups over SAN)?
Dedup storage pools use which device class?
FILE device class only (sequential-access disk)
What types of storage pool in the storage pool hierarchy can be a deduplication pool?
primary, copy and active-data stgpools
Is HSM supported for client-side deduplication?
No, HSM data can be deduplicated on the server side only.
Server-side dedup identification takes place on the server, after the data arrives in the dedup backup storage pool. When does the actual reduction of data take place? Explain!
The data is backed up into the dedup storage pool. Then the identify duplicates processes, which run regularly on the server, identify the duplicates. The data is actually reduced only after a reclamation or move data process: the reduction occurs when the data moves to another volume within the storage pool. [remember this is a sequential storage pool]
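The three phases described in this card (backup, identification, reclamation) can be illustrated with a toy model. This is a sketch under loud assumptions: the `DedupPool` class, its method names, and the use of SHA-1 are all invented for illustration and do not reflect how the TSM server is implemented.

```python
import hashlib


class DedupPool:
    """Toy model of a sequential dedup storage pool (not TSM internals)."""

    def __init__(self):
        self.volume = []         # chunks physically stored, in arrival order
        self.duplicates = set()  # positions flagged by duplicate identification

    def backup(self, chunks):
        # Phase 1: data lands in the pool unreduced
        self.volume.extend(chunks)

    def identify_duplicates(self):
        # Phase 2: duplicates are only flagged; nothing is freed yet
        seen = set()
        for position, chunk in enumerate(self.volume):
            digest = hashlib.sha1(chunk).digest()
            if digest in seen:
                self.duplicates.add(position)
            else:
                seen.add(digest)

    def reclaim(self):
        # Phase 3: data moves to a new volume and flagged chunks are dropped
        self.volume = [c for i, c in enumerate(self.volume) if i not in self.duplicates]
        self.duplicates.clear()

    def used_bytes(self):
        return sum(len(chunk) for chunk in self.volume)


pool = DedupPool()
pool.backup([b"x" * 100, b"y" * 100, b"x" * 100])
after_backup = pool.used_bytes()
pool.identify_duplicates()
after_identify = pool.used_bytes()
pool.reclaim()
after_reclaim = pool.used_bytes()
print(after_backup, after_identify, after_reclaim)  # 300 300 200
```

The space used stays the same after identification and drops only once reclamation has run.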
How is the algorithm to identify duplicates different from server to client?
It is not; it is the same algorithm.
When is the data deduplicated when doing client side dedup?
in-line during backup
Where is the data deduplicated when doing client side dedup?
on the client itself.
Which three requirements must exist for client deduplication to be more scalable than server side dedup?
- sufficient CPU resources on the client;
- TSM database on fast disk and high network bandwidth with low latency;
- the server has the ability to run more parallel client sessions than “identify duplicate” sessions
Which 2 types of TSM clients benefit from a deduplication cache? [as to client side dedup]
backup-archive clients and VMware clients (note: with concurrent sessions, a cache per session is preferred).
For which type of client is a deduplication cache not recommended, and why? [as to client side dedup]
TSM API clients (cache may get out of sync)
What is the function of a deduplication cache? [client side dedup]
To maintain a local list of already identified duplicates. This way the client does not have to go to the server to find out. [although this entails very little traffic].
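The idea of the cache can be sketched as a local set of known digests that is consulted before the server is asked. Everything here is hypothetical: the `DedupCache` class, the `server_lookup` callable, and the query counter are invented for illustration and are not the client's actual implementation.

```python
class DedupCache:
    """Toy client-side cache of chunk digests already known to the server."""

    def __init__(self, server_lookup):
        self.known = set()                  # local list of identified duplicates
        self.server_lookup = server_lookup  # callable: digest -> bool (hypothetical)
        self.server_queries = 0

    def is_duplicate(self, digest: bytes) -> bool:
        if digest in self.known:
            return True                     # answered locally, no server round trip
        self.server_queries += 1
        if self.server_lookup(digest):
            self.known.add(digest)
            return True
        return False


server_index = {b"digest-1"}                # stand-in for the server's chunk index
cache = DedupCache(lambda digest: digest in server_index)

first = cache.is_duplicate(b"digest-1")     # asks the server
second = cache.is_duplicate(b"digest-1")    # served from the cache
third = cache.is_duplicate(b"digest-2")     # unknown chunk, asks the server
print(first, second, third, cache.server_queries)
```

The sketch also hints at why a cache can get out of sync: the local set is only correct as long as the server's index has not changed underneath it.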
Does the deduplication cache always result in faster backup times?
No. If the database is on fast disk and the network bandwidth is high with low latency, disabling the cache may be wise.
Mention 5 TSM features that are incompatible with client side dedup.
- encryption
- UNIX HSM client
- subfile backup
- simultaneous storage pool write
- LAN-free / storage agent
Which 2 parameters for storage pool definition enable deduplication?
- device class FILE, with the deduplication setting enabled:
tsm>define stgpool MYPOOL FILE … deduplicate=yes
3 extra settings/requirements to enable client side dedup (in addition to dedup stgpool settings on server side) …
1) tsm>register node/update node DEDUPNODE…dedup=clientorserver ….
2) client options file (dsm.opt): DEDUPLICATION YES
3) files must be bound to mgmt class that has dedup stg pool as its destination.
How can I exclude files from deduplication (client side)?
- client options file: exclude.dedup option
From which TSM client + server version onwards is both client and server side dedup supported?
both client and server V6.2.0 or later!
How is it that the usage of deduplication can reduce TSM license software costs?
For capacity based pricing, capacity is calculated after the deduplication has occurred, so that reduces the cost.
What are the recommended limitations for dedup ( per server instance) as to maximum amount of backup data and daily ingest of data?
max amount of backup data is 400TB, daily ingest less than 30 TB.
What are things to consider when using tape storage pools in a dedup environment?
A configuration with dedup should preferably be a dedup pool only, or a dedup disk pool to dedup disk pool configuration. However, if tape is needed for disaster recovery purposes, make sure that you copy the data out of the primary pool before the dedup takes place (e.g. before reclamation); that way the data does not need to be reconstructed before it is copied to tape.
Dedup requires the TSM database & active log to be configured on what type of disk?
Low latency, High performance disk!
typically SAS/FC or SSD/Flash
Minimum number of CPU cores for dedup?
12 CPU cores (may go up to 32 CPU cores with maximum load)
Minimum amount of RAM for dedup?
64 GB of RAM (may go up to 192 GB with maximum load)
How is dedup useful in combination with node replication?
for disaster recovery purposes, across various geographically dispersed sites
Explain the “identify duplicates” process on the server:
- where is it configured?
- when is it started?
- how can I influence the number of processes that run?
1) Configuration is at dedup storage pool definition:
tsm>define stgpool DEDUPPOOL ….IDENTIFYPROCESS=[0-50] [0: turn off the auto-start of identification processes].
2) auto started at server start if so specified in stgpool definition.
3) influence by using the manual command: IDENTIFY DUPLICATES
If I were to copy data from a server side dedup storage pool to a non-dedup copy pool, when would I do that and why?
Before the physical dedup takes place (before reclamation). Otherwise the data would have to be reconstructed in order to copy it, and that would take a long time.
How can I enforce server side dedup to take place after the backup to a copy stg pool?
Server option DEDUPREQUIRESBACKUP (default setting: yes). It can be changed dynamically by using SETOPT, or set in dsmserv.opt: deduprequiresbackup yes
3 options for disaster recovery in combination with deduplication:
1) disk to tape (dedup to non-dedup: careful planning)
2) disk to disk (dedup to dedup; data remains in primary pool until it expires)
3) node replication to 2nd server (incremental, only unique (dedupped) data)
Why is it not needed to empty the primary diskpool now that dedup is being used?
The amount of disk space used in the primary disk pool is significantly reduced by dedup, so the data can remain there until it expires (in theory).
What kind of data does not dedup well (2)?
unique binary files and encrypted data (exclude them from the mgmt class)
When not to consider dedup? (2)
- when VTL or tape is your storage location
- when the backup window is already constrained
What is the general purpose of deduplication?
to limit the amount of (primary) disk storage needed for backup.
What is the active log and how much space should you plan for it when dedup is used?
The active log maintains info about transactions that are in progress. Plan for it to be its maximum size: 128 GB
What is the archive log and how should you size it for dedup?
The archive log maintains info about old transactions. It is emptied after a full database backup. It can grow very large.
Plan for 500 GB of filesystem space; for daily ingests of 4 TB, plan for a 1 TB filesystem.
What is the calculation that needs to be made for the size of the primary dedup storage pool when server-side dedup is in effect?
It must keep enough free capacity to receive the complete daily ingest of data, plus hold the dedupped data for the amount of time set in the retention policies, plus some uplift (30% after dedup).
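One way to read this sizing rule as arithmetic is sketched below. The formula is an interpretation, not an official IBM calculation: it assumes the daily ingest lands unreduced, the retained data is reduced by the planning ratio (3:1 by default, as in the earlier card), and the 30% uplift applies to the dedupped data. All names are illustrative.

```python
def primary_pool_capacity_tb(
    daily_ingest_tb: float,
    retained_tb_before_dedup: float,
    reduction_ratio: float = 3.0,  # planning ratio from the earlier card
    uplift: float = 0.30,          # assumed to apply to the dedupped data
) -> float:
    """Rough capacity estimate (TB) for a server-side dedup primary pool."""
    dedupped_tb = retained_tb_before_dedup / reduction_ratio
    # Landing space for today's unreduced ingest + retained data after dedup + uplift
    return daily_ingest_tb + dedupped_tb * (1 + uplift)


# Example: 5 TB daily ingest, 150 TB retained before dedup
estimate = primary_pool_capacity_tb(5, 150)
print(f"{estimate:.1f} TB")
```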
How is the daily ingest of data calculated?
The total amount of data on the clients X the change rate (because of progressive incremental backup).
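As a quick worked example of this multiplication (the 100 TB and 5% figures are made-up inputs, and the function name is illustrative):

```python
def daily_ingest_tb(total_client_data_tb: float, daily_change_rate: float) -> float:
    """Daily ingest (TB) = total data on the clients x daily change rate."""
    return total_client_data_tb * daily_change_rate


# 100 TB on the clients with a 5% daily change rate
ingest = daily_ingest_tb(100, 0.05)
print(f"{ingest:.1f} TB per day")
```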
What kind of disk configuration is required for TSM database and logs?
- separate LUNs for the TSM database and logs, not shared with any TSM storage pool or any other data files
- spread disk I/O across as many disk controllers as possible
Can server-side dedup and client side dedup be configured in 1 TSM server instance?
Yes, TSM is designed to allow dedup storage pools to handle both types of dedupped data. TSM is optimized to not perform server-side dedup on client-side dedupped data. Objects are recognized by both types of dedup, even if they were dedupped “by the other side”.
How to get the most storage savings out of client side dedup?
use compression.
How can client-side dedup put a significant workload on the server?
with many clients simultaneously asking for the processing of duplicate chunks.
If you are using LAN-free backup, what type of deduplication should you consider (server or client)?
Only server-side dedup (client side not supported)
At what level is deduplication enabled in TSM?
storage pool level
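Does TSM identify duplicates across storage pools?
No, duplicates are identified only within a storage pool.
What is the advantage of using 1 large TSM dedup storage pool vs many smaller ones?
Duplicates are not identified across storage pools, so a single large pool can find more duplicates.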
What considerations should be made regarding devclass definition for the dedup storage pools?
- high mountlimit
- maxcapacity of volumes 50GB
- directories should represent separate filesystems on separate logical volumes on as many separate disks as possible.
What has proven to be most efficient in stgpool definition: use of scratch volumes or simple pre-allocated volume only?
use of scratch volumes, use the MAXSCRATCH parameter
What is the preferred method of duplicate identification: scheduled or automatic?
scheduled. Therefore set the IDENTIFYPROCESS parameter (in DEFINE STG) to 0.
How is automatic storage pool reclamation prevented?
set the RECLAIM parameter to 100 (in define stgpool)
Example of policy setting:
tsm>define domain DEDUPDISK
tsm>define policy DEDUPDISK POLICY1
tsm>define mgmtclass DEDUPDISK POLICY1 STANDARD
tsm>assign defmgmtclass DEDUPDISK POLICY1 STANDARD
tsm>define copygroup DEDUPDISK POLICY1 STANDARD type=backup destination=DEDUPPOOL VEREXISTS=nolimit VERDELETED=10 RETEXTRA=30 RETONLY=80
tsm>define copygroup DEDUPDISK POLICY1 STANDARD type=archive destination=DEDUPPOOL RETVER=30
tsm>activate policyset DEDUPDISK POLICY1