TSM V7.1: Understanding TSM Deduplication Flashcards
Is there a TSM license fee for deduplication?
No; deduplication is an optional feature with no additional license fee.
Which functionality is incompatible with dedup? (name 6)
- client-side encryption (SSL encryption of data in flight is OK)
- LAN-free backup with client-side dedup (server-side dedup is OK)
- simultaneous write
- subfile backup
- UNIX HSM (server-side dedup is OK)
- advice: do not use client-side compression with server-side dedup (client-side dedup combined with client-side compression is OK)
What type of dedup should you use when the network bandwidth between client and server is constrained?
Client-side dedup
What is the advantage of client-side dedup in terms of workload?
The dedup workload is distributed over several clients instead of running entirely on the server.
When should server-side dedup be considered instead of client-side dedup? (3X)
- when you need the fastest possible backup times
- when you need the shortest possible window for copying dedup storage pools to offsite copy pools
- when CPU resources on the client side are inadequate
When is a dedup appliance preferable to TSM dedup? (3X)
- if you mostly back up very large files (>2 TB)
- if you choose not to add multiple TSM server instances
- if your daily backup volume exceeds 20 TB (30 TB with client-side dedup)
How can you determine how much storage you saved using dedup? (2X)
- tsm>query stgpool f=d
- a special script via …/support/docview…
What is the correct order of storage pool types for the fastest restore (in general)?
1) deduplicated stgpool
2) non-deduplicated stgpool
3) tape storage pool
2, 1, 3 (usually; the order can differ)
What reduction ratios can typically be achieved with dedup?
2:1 (50%) to 15:1 (93%), depending on the type of data (unique vs. highly repetitive)
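The percentages follow directly from the ratio: a reduction ratio of r:1 stores 1/r of the original data, so the fraction saved is 1 - 1/r. A small Python helper (the function name is illustrative only) makes the conversion explicit:

```python
def savings_percent(ratio: float) -> float:
    """Percentage of storage saved for a reduction ratio of ratio:1.

    A ratio of r:1 means only 1/r of the original data is stored,
    so the saved fraction is 1 - 1/r.
    """
    if ratio < 1:
        raise ValueError("reduction ratio must be >= 1")
    return (1 - 1 / ratio) * 100


for r in (2, 3, 4, 15):
    print(f"{r}:1 -> {savings_percent(r):.1f}% saved")
```

For 2:1 this gives 50.0% and for 15:1 about 93.3%, matching the range on the card.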
What dedup reduction ratio should you use for storage pools for planning purposes?
3:1 (maybe 4:1)
How much additional TSM database space does dedup require when the total backup data that will go to a dedup pool is 50 TB?
0.5 TB (1% of the backup data) of additional database space
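A rough planning sketch in Python, assuming the rules of thumb from these cards (a 3:1 planning ratio for the dedup pool and about 1% of the backup data as extra database space). The function `plan_capacity` is hypothetical and is not a TSM sizing tool:

```python
def plan_capacity(backup_tb: float, ratio: float = 3.0, db_overhead: float = 0.01):
    """Hypothetical planning helper for data going to a dedup storage pool.

    backup_tb   -- total backup data (TB) that will go to the dedup pool
    ratio       -- planning reduction ratio (3:1 rule of thumb)
    db_overhead -- extra TSM database space as a fraction of backup data (1%)
    """
    pool_tb = backup_tb / ratio              # pool capacity needed after dedup
    extra_db_tb = backup_tb * db_overhead    # additional database space
    return pool_tb, extra_db_tb


pool, db = plan_capacity(50)
print(f"pool: {pool:.1f} TB, extra DB: {db:.2f} TB")
```

For 50 TB of backup data this yields roughly 16.7 TB of pool capacity and 0.5 TB of extra database space.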
What types of client data are suitable for deduplication? (4X)
backup, archive, HSM data (server-side only), API
Duplicates of the original data pattern are replaced by a hash value of how many bytes?
20 bytes
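A 20-byte hash value matches the digest size of SHA-1. The toy Python sketch below assumes SHA-1 and, for simplicity, fixed-size chunks; real dedup implementations commonly use variable-size chunking, so this is a simplification rather than TSM's actual method. The helper names are hypothetical:

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Toy extent-level dedup with fixed-size chunks (hypothetical helper).

    Each chunk is identified by its 20-byte SHA-1 digest. Only the first
    occurrence of a chunk is stored; later occurrences are kept as a
    reference to that digest.
    """
    store = {}       # digest -> chunk bytes (unique extents only)
    references = []  # ordered digests that rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha1(chunk).digest()
        store.setdefault(digest, chunk)
        references.append(digest)
    return store, references


def restore(store, references) -> bytes:
    """Rebuild the original data by following the references."""
    return b"".join(store[d] for d in references)


data = b"A" * 4096 * 3 + b"B" * 4096
store, refs = dedup_store(data)
print(len(refs), "chunks,", len(store), "unique,", len(refs[0]), "byte digests")
```

Three identical "A" chunks collapse into one stored extent plus references, and every reference is a 20-byte digest.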
What is an alternative to TSM client-side dedup for LAN-free backups (backups over the SAN)?
VTL
Which device class do dedup storage pools use?
FILE device class only (sequential-access disk)
What types of storage pools in the storage pool hierarchy can be deduplication pools? (3X)
primary, copy and active-data stgpools
Is HSM supported for client-side deduplication?
no
Server-side dedup identification takes place on the server, after the data arrives in the dedup storage pool. When does the actual reduction of data take place? Explain!
The data is first backed up into the dedup storage pool. The identify duplicates processes, which run regularly on the server, then identify the duplicates, but this does not yet free any space. The data is only actually reduced by a subsequent reclamation or move data process: the reduction occurs when the data is moved to another volume within the storage pool. [Remember that this is a sequential-access storage pool.]
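The two-step behaviour can be modelled with a toy Python sketch (all names hypothetical, SHA-1 assumed as the digest): identifying duplicates only marks them, and space shrinks only when a reclamation-style move copies the surviving extents to a new volume. The sketch ignores the bookkeeping a real server needs in order to restore data whose duplicate extents were dropped:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A sequential volume: extents are appended, never rewritten in place."""
    extents: list = field(default_factory=list)  # list of (digest, chunk)

    def used_bytes(self) -> int:
        return sum(len(chunk) for _, chunk in self.extents)


def identify_duplicates(volume: Volume) -> set:
    """Mark duplicate extents by index; nothing is removed yet."""
    seen, duplicates = set(), set()
    for idx, (digest, _) in enumerate(volume.extents):
        if digest in seen:
            duplicates.add(idx)
        else:
            seen.add(digest)
    return duplicates


def reclaim(volume: Volume, duplicates: set) -> Volume:
    """Move only non-duplicate extents to a new volume; space shrinks here."""
    target = Volume()
    for idx, extent in enumerate(volume.extents):
        if idx not in duplicates:
            target.extents.append(extent)
    return target


chunks = [b"x" * 1024, b"y" * 1024, b"x" * 1024, b"x" * 1024]
vol = Volume([(hashlib.sha1(c).digest(), c) for c in chunks])
dups = identify_duplicates(vol)
print("after identify:", vol.used_bytes(), "bytes")      # still 4096
new_vol = reclaim(vol, dups)
print("after reclaim: ", new_vol.used_bytes(), "bytes")  # 2048
```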
How does the algorithm that identifies duplicates differ between server and client?
It does not; it is the same algorithm.
When is the data deduplicated when doing client side dedup?
in-line during backup
Where is the data deduplicated when doing client-side dedup?
on the client itself.
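A toy Python sketch of in-line, client-side dedup, assuming SHA-1 digests and fixed-size chunks; `server_index` is a hypothetical stand-in for the server's knowledge of stored extents. It illustrates why client-side dedup helps when bandwidth is constrained: only chunks the server does not yet hold cross the network.

```python
import hashlib


def client_side_backup(data: bytes, server_index: set, chunk_size: int = 4096) -> int:
    """Hypothetical client-side dedup: hash each chunk on the client, in-line,
    and send the chunk only if the server does not already hold it.

    server_index stands in for the server's set of known extent digests.
    Returns the number of payload bytes sent over the network
    (the digests themselves are ignored for simplicity).
    """
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha1(chunk).digest()
        if digest not in server_index:
            server_index.add(digest)  # server stores the new extent
            sent += len(chunk)        # only unique data is transmitted
    return sent


server = set()
first = client_side_backup(b"A" * 8192 + b"B" * 4096, server)
second = client_side_backup(b"A" * 8192 + b"C" * 4096, server)
print(first, second)
```

The first backup sends 8192 of its 12288 bytes; the second sends only the new "C" chunk.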
Which three requirements must be met for client-side dedup to be more scalable than server-side dedup?
- sufficient CPU resources on the client;
- TSM database on fast disk, and high network bandwidth with low latency;
- the server can run more parallel client sessions than “identify duplicates” sessions.
Which 2 types of TSM clients benefit from a deduplication cache? [client-side dedup]
Backup-archive clients & VMware clients (note: with concurrent sessions, preferably use a separate cache per session).
For which type of client is a deduplication cache not recommended, and why? [client-side dedup]
TSM API clients (the cache may get out of sync with the server)
What is the function of a deduplication cache? [client-side dedup]
To maintain a local list of extents already identified as duplicates, so the client does not have to query the server to find out [although such a query entails very little traffic].
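A minimal Python sketch of such a cache, assuming SHA-1 digests and a set standing in for the server (all names hypothetical): a cache hit skips the server query entirely, and only misses cost a round trip.

```python
import hashlib


class DedupCache:
    """Hypothetical client-side dedup cache: a local set of extent digests
    that the client already knows are stored on the server."""

    def __init__(self):
        self.known = set()       # digests known to exist on the server
        self.server_queries = 0  # round trips actually made to the server

    def needs_send(self, chunk: bytes, server_index: set) -> bool:
        """Return True if the chunk must be sent to the server."""
        digest = hashlib.sha1(chunk).digest()
        if digest in self.known:      # cache hit: no server query needed
            return False
        self.server_queries += 1      # cache miss: ask the server
        self.known.add(digest)        # after this call the server holds it
        if digest in server_index:
            return False              # duplicate: server already has it
        server_index.add(digest)      # new extent: server will store it
        return True


server = set()
cache = DedupCache()
chunks = [b"A" * 4096] * 5 + [b"B" * 4096]
sends = [cache.needs_send(c, server) for c in chunks]
print(sum(sends), "chunks sent,", cache.server_queries, "server queries")
```

Six chunks need only two server queries instead of six. If extents are removed on the server, `known` is not updated; that is the out-of-sync risk mentioned for API clients.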