Transcript
Deduplication for Mainframe: A Tape Augmentation and Replacement Solution
EMC Forum Series 2010
© Copyright 2010 EMC Corporation. All rights reserved.
Mainframe Tape Market
Backup and Recovery under stress
Digital Information Created and Replicated Worldwide
[Chart: exabytes per year, 2008 to 2012, vertical axis 0 to 2,500 exabytes. Callout: 5 times growth in 4 years]
Source: IDC Digital Universe white paper, sponsored by EMC, May 2009
Mainframe VTL Market breakdown
[Pie chart: tape and VTL market by segment]
Open Systems Tape: $1,382M (48%)
Open Systems VTL: $664M (24%)
Mainframe Tape: $496M (18%)
Mainframe VTL: $268M (10%)
Source: IDC, 2008
• Mainframe represents 28% of the overall tape replacement opportunity
• Mainframe tape business is following Open Systems tape decline, being replaced by backup to disk with deduplication
Traditional Mainframe Tape Approaches
Physical Tape (3494 ATL, StorageTek Silo)
• Disparate tape drives, never enough
• Good performance once tape is mounted
• Long mount times, tape rewinds, unloads
• Susceptible to high error rate
• Large footprint
• Media handling issues
Virtual Tape: Host-Based (IBM/Diligent VTF/M, VTape)
• Requires started task
• Mainframe CPU intensive
• Requires mainframe-class disk
• Recovery requires application before tape data is available
Virtual Tape: Disk Cache (Sun/STK VSM, IBM VTS/7700)
• High initial cost
• Complex replication
• Cache management
• Varied performance
[Diagram labels: IBM System z; HDS, IBM, EMC]
Mainframe Physical Tape Challenges
• Performance of batch/backup processing
  – Not meeting batch/backup windows
• Reliability and availability of tape media
  – Libraries, drives, and cartridges fail
  – Tape does not support RAID protection
  – Old tape formats are not supported
• Management complexity
  – Constant tuning of environment
  – Managing physical cartridges
• Disaster recovery operations challenges
  – Sending tapes offsite
  – Recovering from tapes
• Media management risk
  – Lost tapes
Along comes Deduplication
and changes everything!
Deduplication Dramatically Reduces Storage Capacity Requirements
Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies
[Chart: data stored (0 to 30) versus weeks in use (1 to 20), comparing deduplication storage with traditional storage]
Backup Data Reduction/Deduplication: Implementation in Fortune 1000 (F1000)
Storage Networking “Heat Index” Rank: 1 (Wave 13 Study)
[Stacked bar chart: share of F1000 respondents by implementation status for *Wave 8, **Wave 9, **Wave 10, ***Wave 11, Wave 12, and Wave 13. Legend: In use now; In pilot/evaluation; In near-term plan; In long-term plan; Not in plan]
“Deduplication is now in use by 40% of F1000, with use having accelerated rapidly over the last year.”
Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010. F1000 Sample: Wave 8, n=148; Wave 9, n=150; Wave 10, n=151; Wave 11, n=127; Wave 12, n=147; Wave 13, n=183
*Technology was previously categorized as deduplication
**Technology was previously categorized as deduplication/capacity optimized storage/single backup instance store
***Technology was previously categorized as single backup instance store software
F1000: Backup Data Reduction/Deduplication Roll-Up
Storage Networking “Heat Index” Rank: 1
Implementation Time Frame
In use now: 39.9%
In pilot/evaluation: 3.8%
In near-term plan (through Q1 2010): 14.2%
In long-term plan (Q2 2010–Q4 2010): 21.9%
Not in plan: 20.2%
[Bar chart, scale 0% to 60%: EMC and competitors by implementation status]
2009 Spending Levels and 2010 Projections of Users with the Technology in Use or Under Consideration
2009 spending level (share of respondents)
> $10M: 0%
$5M-$10M: 0%
$1M-$4.99M: 7.9%
$500K-$999K: 5.3%
$250K-$499K: 5.3%
$100K-$249K: 14.5%
$75K-$99K: 10.5%
$50K-$74K: 2.6%
< $50K: 6.6%
No Spending: 47.4%
2010 projection: Less Money 16%; About the Same 51%; More Money 33%
Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010. F1000 Sample: left and top right charts: n=183; lower right chart: n=76; lower right chart (row at bottom within lower right chart): n=37.
Deduplication Fundamentals
Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Diagram: a Friday full backup, Monday through Thursday incrementals, and a second Friday full backup are split into segments; the unique segments stored are A B C D E F G H I J K L]
Backup Data             Logical   Estimated Reduction   Physical
Friday full             1 TB      2–4x                  250 GB
Monday incremental      100 GB    7–10x                 10 GB
Tuesday incremental     100 GB    7–10x                 10 GB
Wednesday incremental   100 GB    7–10x                 10 GB
Thursday incremental    100 GB    7–10x                 10 GB
Second Friday full      1 TB      50–60x                18 GB
TOTAL                   2.4 TB    7.8x                  308 GB
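The TOTAL row of the technology overview table can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original deck; it assumes 1 TB is counted as 1,000 GB, and the names `BACKUPS` and `totals` are illustrative only.

```python
# Logical and physical size of each backup in the first week, in GB.
# Figures come from the slide's table; 1 TB is treated as 1,000 GB (an assumption).
BACKUPS = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def totals(backups):
    """Return (logical GB, physical GB, overall reduction ratio)."""
    logical = sum(logical_gb for _, logical_gb, _ in backups)
    physical = sum(physical_gb for _, _, physical_gb in backups)
    return logical, physical, logical / physical


logical_gb, physical_gb, ratio = totals(BACKUPS)
print(f"{logical_gb / 1000:.1f} TB logical, {physical_gb} GB physical, {ratio:.1f}x")
```

Under that unit assumption the sums reproduce the slide's 2.4 TB, 308 GB, and 7.8x. Most of the overall ratio comes from the second full backup, which adds 1 TB of logical data but only 18 GB of new physical data.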
Store More for Longer with Less
Backup Data          Cumulative Logical   Estimated Reduction   Physical
First Full           1 TB                 4x                    250 GB
Week 1 (April 7)     2.4 TB               8x                    308 GB
Week 2 (April 14)    3.8 TB               10x                   366 GB
Week 3 (April 21)    5.2 TB               12x                   424 GB
Month 1 (April 28)   6.6 TB               14x                   482 GB
Month 2 (May 31)     12.2 TB              17x                   714 GB
Month 3 (June 30)    17.8 TB              19x                   946 GB
Month 4 (July 31)    23.4 TB              20x                   1,178 GB
TOTAL                23.4 TB              20x                   1,178 GB
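The reduction column in the cumulative table is the cumulative logical size divided by the physical size, rounded to a whole number. The sketch below recomputes it from the slide's figures, assuming 1 TB = 1,000 GB; `ROWS` and `reduction` are illustrative names, not anything defined in the deck.

```python
# (label, cumulative logical TB, physical GB, reduction shown on the slide)
ROWS = [
    ("First Full", 1.0, 250, 4),
    ("Week 1", 2.4, 308, 8),
    ("Week 2", 3.8, 366, 10),
    ("Week 3", 5.2, 424, 12),
    ("Month 1", 6.6, 482, 14),
    ("Month 2", 12.2, 714, 17),
    ("Month 3", 17.8, 946, 19),
    ("Month 4", 23.4, 1178, 20),
]


def reduction(logical_tb, physical_gb):
    """Reduction ratio, treating 1 TB as 1,000 GB (an assumption)."""
    return logical_tb * 1000 / physical_gb


for label, logical_tb, physical_gb, shown in ROWS:
    computed = reduction(logical_tb, physical_gb)
    print(f"{label:<10} computed {computed:5.2f}x, slide shows {shown}x")
```

In these figures each additional week adds 1.4 TB of logical data but only 58 GB of physical data, which is why the ratio keeps climbing as retention lengthens.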
Disk Library for Mainframe
Disk Library for Mainframe
• True IBM tape emulation
• Transparent to mainframe operations
• Leverages low-cost SATA II technology
• High performance read and write
• Unmatched remote replication capability
[Diagram: IBM mainframe connected to Disk Library for mainframe]
Mainframe Application Use Cases
There are two distinct use cases for mainframe tape
Primary Storage
• Interactive batch jobs
• Minimal data redundancy
• Retention periods of just hours
• Equivalent read/write transactions
Secondary Storage
• Backup and archive applications
• Highly redundant data
• Retention periods of weeks and months
• Higher writes and lower read transactions
Deduplication Storage Expansion Option
Eliminate Redundancy from Backup and Archive Workloads on EMC Disk Library for Mainframe
Mainframe Application Use Cases
There are two distinct use cases for mainframe tape
Primary Storage (DLm960)
• Batch jobs
• Minimal data redundancy
• Retention periods of just hours
• Equivalent read/write transactions
Secondary Storage (Deduplication Storage Expansion Option)
• Backup and archive applications
• Highly redundant data
• Retention periods of weeks and months
• Higher writes and lower read transactions
Deduplication Storage Expansion Option
Disk Library for mainframe and industry’s most popular deduplication system
• Based on proven Data Domain DD880
• Nearly 3.5 PB of logical capacity
• System throughput up to 4.3 TB per hour
  – Hardware compression
  – Deduplication
• Reliability designed for the datacenter
  – Multipath for access to all tapes
  – Data Domain Data Invulnerability Architecture
  – Call home for support
• Easy integration into existing infrastructure
  – Behaves like a tape library to the application
  – Low bandwidth replication between disk library systems
  – No changes to current management process
[Diagram: DLm960 with Deduplication Storage Expansion Option]
Disk Library for Mainframe Components
Virtual tape emulation controller (VTEC)
• Includes the virtual tape engines (VTEs)
• Emulates IBM 3480/3490/3590 tape drives
  – 256 tape drives per VTE
  – Up to 1,536 with six VTEs
• FICON/ESCON connectivity
  – Throughput of 1.2 GB/s with six VTEs (FICON)
• Transparent to mainframe tape management systems
• Virtual cartridge size up to 16 TB
• Disk consumption is based on data, not cartridge size
• Supports hardware compression
Back-end storage
• Leverages 1 TB or 2 TB SATA II drives
• RAID 6 (12+2) configuration
  – Hot spare for every disk tray
• Stores all tape images as files
• Shares all tape volumes between all VTEs
[Diagram: EMC Disk Library for mainframe, showing a VTEC with six VTEs in front of the back-end storage]
How DLm960 with Deduplication Storage Expansion Option Maps to Mainframe Host
• The mainframe host views Disk Library for mainframe as tape drives
• Tape Administrator will map specific range of tape drives to a specific VTE
Device address ranges, one per VTE: 0000-00FF, 0100-01FF, 0200-02FF, 0300-03FF, 0400-04FF, 0500-05FF
[Diagram: IBM mainframe attached to EMC Disk Library DLm960 and the Deduplication Storage Expansion option; workload labels: Batch; Backup and Archive]
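The address ranges on the mapping slide follow a simple pattern: 256 consecutive device addresses per VTE. The sketch below illustrates that pattern only. In practice the tape administrator chooses the mapping, so the contiguous layout, the zero-based VTE numbering, and the function names are assumptions made for illustration.

```python
DRIVES_PER_VTE = 256  # from the slides: each VTE emulates 256 tape drives
VTE_COUNT = 6         # from the slides: up to six VTEs in a DLm960


def device_range(vte: int) -> str:
    """Hex device address range for a zero-based VTE index (assumed layout)."""
    if not 0 <= vte < VTE_COUNT:
        raise ValueError(f"VTE index out of range: {vte}")
    start = vte * DRIVES_PER_VTE
    return f"{start:04X}-{start + DRIVES_PER_VTE - 1:04X}"


def vte_for_device(address: int) -> int:
    """Zero-based VTE index that would own a device address (assumed layout)."""
    if not 0 <= address < DRIVES_PER_VTE * VTE_COUNT:
        raise ValueError(f"device address out of range: {address:#06x}")
    return address // DRIVES_PER_VTE


print([device_range(v) for v in range(VTE_COUNT)])
print(vte_for_device(0x02A0))
```

With six VTEs this layout covers 6 × 256 = 1,536 drive addresses, which matches the drive count quoted for the DLm960.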
DD880: Industry’s Fastest Backup Engine
• Scalability
  – Up to 5.4 TB/hr aggregate backup (1500 MB/s)*
    • 1.28 TB/hr single stream write
  – Up to 3.5 PB logical storage*
• General features
  – 4U, quad-socket, Intel processor
  – 48 GB RAM and two 1 GB NVRAM cards
• Redundant external storage connectivity
  – 2 dual port SAS HBAs
  – 96 TB raw capacity, 71 TB addressable RAID-6 storage
  – Redundant, hot swap, 1 + 1 power
* Based on Data Domain Operating System 4.7 performance and capacity. Version 4.7 is required when using DD880 as the Deduplication Storage Expansion option
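The deck quotes throughput in both MB/s and TB/hr. The conversion below is a small sketch that assumes decimal units (1 TB = 1,000,000 MB); the function names are illustrative and do not come from any EMC tool.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # decimal units, an assumption


def mb_per_s_to_tb_per_hr(mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s to TB/hr."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_TB


def tb_per_hr_to_mb_per_s(tb_per_hr: float) -> float:
    """Convert a sustained rate in TB/hr to MB/s."""
    return tb_per_hr * MB_PER_TB / SECONDS_PER_HOUR


print(f"{mb_per_s_to_tb_per_hr(1500):.1f} TB/hr")    # aggregate rate quoted for the DD880
print(f"{tb_per_hr_to_mb_per_s(1.28):.0f} MB/s")     # single-stream write rate
print(f"{mb_per_s_to_tb_per_hr(1200):.2f} TB/hr")    # 1.2 GB/s quoted for six VTEs
```

With decimal units, 1500 MB/s corresponds to the quoted 5.4 TB/hr, and the 1.2 GB/s FICON figure for six VTEs works out to roughly the 4.3 TB per hour quoted for the expansion option.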
Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy
Data verification
• Generate checksum
• Deduplication, write to disk
• Verify
  – Verify the file system metadata integrity
  – Verify user data integrity
  – Verify stripe integrity
Self-healing file system
• Cleaning
• Expired data
• Defrag
• Verify
Other
• RAID 6
• NVRAM
• Snapshots
[Diagram: storage stack of File System, Global Compression, Local Compression, RAID; data flow from Generate Checksum to Verify Data]
Disk Library for Mainframe Family
                                DLm120            DLm960
Number of VTEs                  1 or 2            1–6
Connectivity                    FICON             FICON
Number of channels to host      2 or 4            2–12
Number of virtual tape drives   Up to 512         Up to 1,536
Maximum capacity (usable)       9.5 TB–47.5 TB    28.5 TB–1.2 PB
Performance                     Up to 400 MB/s    Up to 1.2 GB/s
Number of cabinets              1                 2–13
Replication
Hardware compression
Writing a Tape Volume
• Mainframe application requests a scratch tape be mounted by the system
• In response to a “mount” request, the VTE allocates a new VOLSER and mounts it on the requested drive
• As the mainframe application writes the tape, the VTE creates a new file on the disk using VOLSER as the file name
• When the mainframe application closes the tape, the VTE closes the file (VOLSER), causing it to be retained
[Diagram: IBM mainframe, VTE, and EMC Disk Library for mainframe holding files VOLSER-xxx, VOLSER-yyy, VOLSER-zzz]
Reading a Tape Volume
• Mainframe application requests a tape by VOLSER
• The VTE opens the file named VOLSER and mounts it on the requested drive
• As the mainframe application reads the tape, the VTE reads the file, re-creating the tape exactly as it was written
• When the mainframe application closes the file (VOLSER), the VTE closes the disk file and unmounts the drive
[Diagram: IBM mainframe, VTE, and EMC Disk Library for mainframe holding files VOLSER-xxx, VOLSER-yyy, VOLSER-zzz]
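The write and read flows come down to one idea: a tape volume is a disk file named after its VOLSER, written block by block and read back in the same order. The toy model below sketches that idea only. It is not EMC's implementation; the `ToyVTE` class, the VOLSER naming scheme, and the length-prefixed on-disk layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class ToyVTE:
    """Illustrative stand-in for a virtual tape engine (hypothetical design)."""

    def __init__(self, root: Path):
        self.root = root
        self._next = 1

    def mount_scratch(self) -> str:
        """Allocate a new VOLSER in response to a scratch mount request."""
        volser = f"V{self._next:05d}"  # hypothetical naming scheme
        self._next += 1
        return volser

    def write_volume(self, volser: str, blocks: list) -> None:
        """Create a file named after the VOLSER and store each tape block."""
        with open(self.root / volser, "wb") as f:
            for block in blocks:
                f.write(len(block).to_bytes(4, "big"))  # length prefix keeps block boundaries
                f.write(block)

    def read_volume(self, volser: str) -> list:
        """Open the file named after the VOLSER and replay its blocks in order."""
        blocks = []
        with open(self.root / volser, "rb") as f:
            while header := f.read(4):
                blocks.append(f.read(int.from_bytes(header, "big")))
        return blocks


def round_trip(blocks: list) -> list:
    """Write blocks to a scratch volume, then read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        vte = ToyVTE(Path(tmp))
        volser = vte.mount_scratch()
        vte.write_volume(volser, blocks)
        return vte.read_volume(volser)


print(round_trip([b"HDR1", b"payload", b"EOF1"]))
```

Because the file grows only as blocks are written, disk consumption in this model follows the data actually written rather than a fixed cartridge size, which is the behavior the component slide describes.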
Low Bandwidth Bi-directional Replication
[Diagram: source and target sites connected over WAN/IP in both directions; each site shows an IBM Mainframe, a Celerra NS-960, and a Deduplication Storage Expansion Option]
Low Bandwidth Bi-directional Replication
[Diagram: source and target sites connected over WAN/IP in both directions; each site shows an IBM Mainframe, a DLm960 with six VTEs, a Celerra NS-960, and a Deduplication Storage Expansion Option]
Reduce CPU Cycles
• Reduce/eliminate DFHSM ML1
  – Move directly from ML0 to ML2
    • Reuse expensive disk space
  – Save CPU compression cycles and time
    • Leverage Disk Library for mainframe hardware compression
  – ML2 information is still kept on disk
    • Fast access time
• DFHSM recycle time optimized
  – DFHSM will continue to perform tape recycling
    • Recycling at disk speed
    • No constraints for number of tape drives
  – Hours of savings potential
[Diagram: ML0 on production DASD migrating to ML2 on Disk Library for mainframe; virtual tape reclaimed on disk]
Guaranteed Replication
[Timeline between Production Site and Remote Site]
2:00 AM
• DLm replication starts based on RPO
2:12 AM
• Backup job starts
• Backing up 100 GB dataset (will create a 30 GB tape)
2:30 AM
• Next DLm replication cycle starts based on RPO
• Backup tape is being replicated
• First 25 GB of tape are replicated
2:35 AM
• Backup job ends
• “rewind unload” command is sent to the DLm
• DLm forces replication
• Last 5 GB of the backup tape are replicated
2:40 AM
• Replication is complete
• Return control to the mainframe host to finish the backup job
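The key point of the timeline is that control does not return to the host until the remainder of the tape has been replicated. The sketch below models that ordering with plain counters. It is a simplified illustration, not the DLm's actual replication logic; the class and method names are invented, and the 25 GB / 5 GB split follows the slide.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedTape:
    written_gb: float = 0.0     # data written at the production site
    replicated_gb: float = 0.0  # data confirmed at the remote site

    def write(self, gb: float) -> None:
        self.written_gb += gb

    def replication_cycle(self) -> float:
        """Scheduled, RPO-driven cycle: send whatever has not been sent yet."""
        sent = self.written_gb - self.replicated_gb
        self.replicated_gb = self.written_gb
        return sent

    def rewind_unload(self) -> float:
        """On rewind/unload, force a final cycle before control returns to the host."""
        sent = self.replication_cycle()
        assert self.replicated_gb == self.written_gb  # nothing left behind
        return sent


def slide_timeline():
    tape = ReplicatedTape()
    tape.write(25)                       # 2:12 to 2:30, first 25 GB of the tape
    first = tape.replication_cycle()     # 2:30 scheduled cycle
    tape.write(5)                        # 2:30 to 2:35, remaining 5 GB
    last = tape.rewind_unload()          # 2:35 forced replication
    return first, last, tape.replicated_gb


print(slide_timeline())
```

The backup job therefore finishes only once all 30 GB of the tape exist at both sites, at the cost of the few extra minutes the forced replication takes.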
How Disk Library for Mainframe Is Mapped to the Mainframe Host
• The mainframe host views Disk Library for mainframe as tape drives
  – Each VTE can emulate 256 tape drives
• Tape Administrator will map specific range of tape drives to a specific VTE
• Each tape VOLSER is kept on disk as a file
  – File name is the same as the VOLSER
  – No pre-allocation of space
• Space maintenance
  – DD880 Time To Live (TTL) will be used for scratch tape space reclamation
Device address ranges, one per VTE: 0000-00FF, 0100-01FF, 0200-02FF, 0300-03FF, 0400-04FF, 0500-05FF
[Diagram: IBM mainframe, EMC Disk Library DLm960, Data Domain DD880; workload labels: Batch; Backup]
Highly Scalable Systems
• Separate disk array targets for mainframe workloads
  – DLm960: Scratch tape and balanced read and write
  – Deduplication Storage Expansion Option: Archival and higher write applications
• Start small
  – One VTE and 19.3 TB with the DLm960 (2 TB)
    • Minimum of two cabinets
  – One Deduplication Storage Expansion option in one cabinet
    • Minimum 23 TB
• Scale as required
  – Scale in increments of 9.5 TB in DLm960 (1 TB)
    • Up to 13 cabinets
    • Balance storage between multiple backend storage arrays
  – Scale Deduplication Storage Expansion option to 71 TB in 12 TB increments
    • One cabinet
[Diagram: six VTEs]
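The expansion option's sizing figures (23 TB minimum, 12 TB increments, 71 TB maximum) imply a short ladder of possible capacities. The sketch below enumerates it, assuming that increments are added on top of the 23 TB minimum; the slide does not state this directly, although 23 + 4 × 12 = 71 is consistent with it. The function names are illustrative.

```python
def capacity_steps(minimum_tb: int, increment_tb: int, maximum_tb: int) -> list:
    """List the capacities reachable from the minimum in fixed increments."""
    steps = []
    capacity = minimum_tb
    while capacity <= maximum_tb:
        steps.append(capacity)
        capacity += increment_tb
    return steps


def smallest_fit(required_tb: float, steps: list):
    """Smallest configuration that holds the required capacity, or None."""
    for capacity in steps:
        if capacity >= required_tb:
            return capacity
    return None


DEDUP_STEPS = capacity_steps(23, 12, 71)  # figures from the slide
print(DEDUP_STEPS)
print(smallest_fit(40, DEDUP_STEPS))
```

These are physical (addressable) capacities; the logical capacity a given step supports depends on the reduction ratio the workload actually achieves.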
Tape Assessment
• A comprehensive analysis of the current tape environment; data sources include:
  – HSM MCDS data
  – HSM FSR data
  – HSM list TTOC data
  – HSM control parameters
  – System log data
  – Tape device configuration
  – Tape library management system catalog data (TLMS, CA1-TMS, RMM or ZARA)
  – SMF data: record types 14, 15, 21, 30, 40
• Report of the summary of findings:
  – HSM Capacity Analysis
  – HSM Activity Analysis
  – HSM Tape Use
  – HSM Control Parameters Analysis
  – Tape Mount and Transport Activity
  – Tape Library Analysis
  – Tape Bandwidth Analysis
Datasets/TB by Devtype
Devtype   Datasets   TBs
3490      107389     102.39
3490E     191889     237.86
3480      366        0.02
[Charts: Daily Tape GB Transferred (GB read, vertical axis 0 to 3,500); Tape Mounts per Hour (vertical axis 0 to 900); samples from 12/26/07 to 01/13/08]
Top 20 tape jobs created 96% of tapes
From Tape Assessment service
Why Disk Library for Mainframe?
• Provides deduplication for backup and recovery solutions
  – Increases onsite retention and lowers overall disk and tape storage costs
• Eliminates all issues related to traditional tape handling
  – Eliminates manual intervention, physical movement of tape cartridges, robotic issues, and single points of failure
• Works seamlessly with existing applications
  – Uses existing tape management processes to automate tape vaulting
• Significantly improves performance
  – Reallocates all of the data to disk and uses smart I/O buffering, allowing potentially significant reductions in batch windows
• Extends disaster recovery capabilities to the tape workload
  – Utilizes array-based replication process over IP to seamlessly move tapes offsite
• Easily scales as the workload increases
  – No need for additional subsystems, libraries, network connections, etc.
THANK YOU