EMC Disk Library for Mainframe

Deduplication for Mainframe
A Tape Augmentation and Replacement Solution
EMC Forum Series 2010
© Copyright 2010 EMC Corporation. All rights reserved.

Mainframe Tape Market

Backup and Recovery under stress
Digital Information Created and Replicated Worldwide
• 5 times growth in 4 years
[Chart: exabytes (0 to 2,500) by year, 2008 to 2012]
Source: IDC Digital Universe white paper, sponsored by EMC, May 2009

Mainframe VTL Market breakdown
Segment              Market size   Share
Open Systems Tape    $1,382M       48%
Open Systems VTL     $664M         24%
Mainframe Tape       $496M         18%
Mainframe VTL        $268M         10%
Source: IDC, 2008
• Mainframe represents 28% of the overall tape replacement opportunity
• Mainframe tape business is following Open Systems tape decline, being replaced by backup to disk with deduplication

Traditional Mainframe Tape Approaches
Physical Tape (3494 ATL, StorageTek Silo)
• Disparate tape drives, never enough
• Good performance once tape is mounted
• Long mount times, tape rewinds, unloads
• Susceptible to high error rate
• Large footprint
• Media handling issues
Virtual Tape: Host-Based (IBM/Diligent VTF/M, VTape)
• Requires started task
• Mainframe CPU intensive
• Requires mainframe-class disk
• Recovery requires application before tape data is available
Virtual Tape: Disk Cache (Sun/STK VSM, IBM VTS/7700)
• High initial cost
• Complex replication
• Cache management
• Varied performance
[Other diagram labels: HDS, IBM, EMC; IBM System z]

Mainframe Physical Tape Challenges
• Performance of batch/backup processing
  – Not meeting batch/backup windows
• Reliability and availability of tape media
  – Libraries, drives, and cartridges fail
  – Tape does not support RAID protection
  – Old tape formats are not supported
• Management complexity
  – Constant tuning of environment
  – Managing physical cartridges
• Disaster recovery operations challenges
  – Sending tapes offsite
  – Recovering from tapes
• Media management risk
  – Lost tapes

Along comes Deduplication and changes everything!

Deduplication Dramatically Reduces Storage Capacity Requirements
• Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies
[Chart: data stored (0 to 30) versus weeks in use (1 to 20), comparing deduplication storage with traditional storage]

Backup Data Reduction/Deduplication: Implementation in Fortune 1000 (F1000)
[Chart: Storage Networking Wave 13 Study, Waves 8*, 9**, 10**, 11***, 12, and 13, share of respondents by status: In use now, In pilot/evaluation, In near-term plan, In long-term plan, Not in plan. “Heat Index” Rank: 1]
“Deduplication is now in use by 40% of F1000, with use having accelerated rapidly over the last year.”
Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010.
F1000 Sample: Wave 8, n=148; Wave 9, n=150; Wave 10, n=151; Wave 11, n=127; Wave 12, n=147; Wave 13, n=183
*Technology was previously categorized as deduplication
**Technology was previously categorized as deduplication/capacity optimized storage/single backup instance store
***Technology was previously categorized as single backup instance store software

F1000: Backup Data Reduction/Deduplication Roll-Up
Storage Networking “Heat Index” Rank: 1

Implementation Time Frame
Status                                   Share
In use now                               39.9%
In pilot/evaluation                      3.8%
In near-term plan (through Q1 2010)      14.2%
In long-term plan (Q2 2010–Q4 2010)      21.9%
Not in plan                              20.2%

[Chart: vendor breakdown by implementation status, EMC and unnamed competitors, axis 0% to 60%]

2009 Spending Levels and 2010 Projections of Users with the Technology in Use or Under Consideration
2009 spending    Share
> $10M           0%
$5M-$10M         0%
$1M-$4.99M       7.9%
$500K-$999K      5.3%
$250K-$499K      5.3%
$100K-$249K      14.5%
$75K-$99K        10.5%
$50K-$74K        2.6%
< $50K           6.6%
No Spending      47.4%
2010 projection: Less Money 16%; About the Same 51%; More Money 33%

Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010.
F1000 Sample: left and top right charts: n=183; lower right chart: n=76; lower right chart (row at bottom within lower right chart): n=37.

Deduplication Fundamentals

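The reduction figures quoted in this part of the deck are ratios of logical capacity (what the backup application wrote) to physical capacity (what ends up on disk). As a minimal sketch, the Python below recomputes the TOTAL row of the weekly table on the "Data Deduplication: Technology Overview" slide. The row values are copied from that slide; the names `WEEK_ONE` and `totals` are illustrative, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
# Rows copied from the weekly backup table: (backup, logical GB, physical GB).
# Assumption: decimal units, so the 1 TB full backups are entered as 1000 GB.
WEEK_ONE = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def totals(rows):
    """Return (logical GB, physical GB, reduction ratio) summed over all rows."""
    logical = sum(row[1] for row in rows)
    physical = sum(row[2] for row in rows)
    return logical, physical, logical / physical


logical_gb, physical_gb, ratio = totals(WEEK_ONE)
print(f"{logical_gb / 1000:.1f} TB logical, {physical_gb} GB physical, {ratio:.1f}x")
# prints "2.4 TB logical, 308 GB physical, 7.8x"
```

The same division applied to the cumulative table gives roughly 20x after four months (23.4 TB logical over 1,178 GB physical), which is why the ratio keeps improving as retention grows.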
Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Diagram: data segments in a Friday full backup, Monday to Thursday incrementals, and a second Friday full backup, alongside the set of unique segments A through L]

Backup Data              Logical   Estimated Reduction   Physical
FRIDAY FULL              1 TB      2–4x                  250 GB
Monday Incremental       100 GB    7–10x                 10 GB
Tuesday Incremental      100 GB    7–10x                 10 GB
Wednesday Incremental    100 GB    7–10x                 10 GB
Thursday Incremental     100 GB    7–10x                 10 GB
Second FRIDAY FULL       1 TB      50–60x                18 GB
TOTAL                    2.4 TB    7.8x                  308 GB

Store More for Longer with Less
Backup Data          Cumulative Logical   Estimated Reduction   Physical
First Full           1 TB                 4x                    250 GB
Week 1 (April 7)     2.4 TB               8x                    308 GB
Week 2 (April 14)    3.8 TB               10x                   366 GB
Week 3 (April 21)    5.2 TB               12x                   424 GB
Month 1 (April 28)   6.6 TB               14x                   482 GB
Month 2 (May 31)     12.2 TB              17x                   714 GB
Month 3 (June 30)    17.8 TB              19x                   946 GB
Month 4 (July 31)    23.4 TB              20x                   1,178 GB
TOTAL                23.4 TB              20x                   1,178 GB

Disk Library for Mainframe

Disk Library for Mainframe
[Diagram labels: IBM mainframe; Disk Library for mainframe]
• True IBM tape emulation
• Transparent to mainframe operations
• Leverages low-cost SATA II technology
• High performance read and write
• Unmatched remote replication capability

Mainframe Application Use Cases
There are two distinct use cases for mainframe tape
Primary Storage
• Interactive batch jobs
• Minimal data redundancy
• Retention periods of just hours
• Equivalent read/write transactions
Secondary Storage
• Backup and archive applications
• Highly redundant data
• Retention periods of weeks and months
• Higher writes and lower read transactions

Deduplication Storage Expansion Option
Eliminate Redundancy from Backup and Archive Workloads on EMC Disk Library for Mainframe

Mainframe Application Use Cases
There are two distinct use cases for mainframe tape
Primary Storage
• Batch jobs
• Minimal data redundancy
• Retention periods of just hours
• Equivalent read/write transactions
Secondary Storage
• Backup and archive applications
• Highly redundant data
• Retention periods of weeks and months
• Higher writes and lower read transactions
[Diagram labels: DLm960; Deduplication Storage Expansion Option]

Deduplication Storage Expansion Option
Disk Library for mainframe and industry’s most popular deduplication system
[Diagram labels: DLm960; Deduplication Storage Expansion Option]
• Based on proven Data Domain DD880
• Nearly 3.5 PBs of logical capacity
• System throughput up to 4.3 TB per hour
  – Hardware compression
  – Deduplication
• Reliability designed for the datacenter
  – Multipath for access to all tapes
  – Data Domain Data Invulnerability Architecture
  – Call home for support
• Easy integration into existing infrastructure
  – Behaves like a tape library to the application
  – Low bandwidth replication between disk library systems
  – No changes to current management process

Disk Library for Mainframe Components
Virtual tape emulation controller (VTEC)
• Includes the virtual tape engines (VTEs)
• Emulates IBM 3480/3490/3590 tape drives
  – 256 tape drives per VTE
  – Up to 1,536 with six VTEs
• FICON/ESCON connectivity
  – Throughput of 1.2 GB/s with six VTEs (FICON)
• Transparent to mainframe tape management systems
• Virtual cartridge size up to 16 TB
• Disk consumption is based on data, not cartridge size
• Supports hardware compression
Back-end storage
• Leverages 1 TB or 2 TB SATA II drives
• RAID 6 (12+2) configuration
  – Hot spare for every disk tray
• Stores all tape images as files
• Shares all tape volumes between all VTEs
[Diagram: EMC Disk Library for mainframe, a VTEC with six VTEs in front of back-end storage]

How DLm960 with Deduplication Storage Expansion Option Maps to Mainframe Host
• The mainframe host views Disk Library for mainframe as tape drives
• Tape Administrator will map specific range of tape drives to a specific VTE
[Diagram: IBM mainframe device ranges 0000-00FF, 0100-01FF, 0200-02FF, 0300-03FF, 0400-04FF, 0500-05FF, one per VTE; EMC Disk Library DLm960 labeled Batch; Deduplication Storage Expansion option labeled Backup and Archive]

DD880: Industry’s Fastest Backup Engine
• Scalability
  – Up to 5.4 TB/hr aggregate backup (1500 MB/s)*
    • 1.28 TB/hr single stream write
  – Up to 3.5 PB logical storage*
• General features
  – 4U, quad-socket, Intel processor
  – 48 GB RAM and two 1 GB NVRAM cards
• Redundant external storage connectivity
  – 2 dual port SAS HBAs
  – 96 TB raw capacity, 71 TB addressable RAID-6 storage
  – Redundant, hot swap, 1 + 1 power
* Based on Data Domain Operating System 4.7 performance and capacity.
Version 4.7 is required when using DD880 as the Deduplication Storage Expansion option

Data Integrity: Data Invulnerability Architecture
Trust but verify: "hope" is not a strategy
[Diagram labels]
• Data verification: generate checksum; deduplication, write to disk; verify
• Verify: the file system metadata integrity; user data integrity; stripe integrity
• Stack: data, file system, global compression, local compression, RAID
• Self-healing file system: cleaning, expired data, defrag, verify
• Other: RAID 6, NVRAM, snapshots

Disk Library for Mainframe Family
                                  DLm120            DLm960
Number of VTEs                    1 or 2            1–6
Connectivity                      FICON             FICON
Number of channels to host        2 or 4            2–12
Number of virtual tape drives     Up to 512         Up to 1,536
Maximum capacity (usable)         9.5 TB–47.5 TB    28.5 TB–1.2 PB
Performance                       Up to 400 MB/s    Up to 1.2 GB/s
Number of cabinets                1                 2–13
Replication
Hardware compression

Writing a Tape Volume
[Diagram: IBM mainframe; VTE; EMC Disk Library for mainframe holding VOLSER-xxx, VOLSER-yyy, VOLSER-zzz]
• Mainframe application requests a scratch tape be mounted by the system
• In response to a “mount” request, the VTE allocates a new VOLSER and mounts it on the requested drive
• As the mainframe application writes the tape, the VTE creates a new file on the disk using VOLSER as the file name
• When the mainframe application closes the tape, the VTE closes the file (VOLSER), causing it to be retained

Reading a Tape Volume
[Diagram: IBM mainframe; VTE; VOLSER-xxx, VOLSER-yyy, VOLSER-zzz]
• Mainframe application requests a tape by VOLSER
• The VTE opens the file named VOLSER and mounts it on the requested drive
• As the mainframe application reads the tape, the VTE reads the file, re-creating the tape exactly as it was written
• When the mainframe application closes the file (VOLSER), the VTE closes the disk file and unmounts the drive

Low Bandwidth Bi-directional Replication
[Diagram: source and target sites linked over WAN/IP; each site has an IBM Mainframe, a Celerra NS-960, and a Deduplication Storage Expansion Option]

Low Bandwidth Bi-directional Replication
[Diagram: source and target sites linked over WAN/IP; each site has an IBM Mainframe, a DLm960 with six VTEs, a Celerra NS-960, and a Deduplication Storage Expansion Option]

Reduce CPU Cycles
[Diagram: ML0 on production DASD; ML2 on Disk Library for mainframe; virtual tape reclaimed on disk]
• Reduce/eliminate DFHSM ML1
  – Move directly from ML0 to ML2
    • Reuse expensive disk space
  – Save CPU compression cycles and time
    • Leverage Disk Library for mainframe hardware compression
  – ML2 information is still kept on disk
    • Fast access time
• DFHSM recycle time optimized
  – DFHSM will continue to perform tape recycling
    • Recycling at disk speed
    • No constraints for number of tape drives
  – Hours of savings potential

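The "Writing a Tape Volume" and "Reading a Tape Volume" slides describe each tape volume as a disk file named after its VOLSER, and the mapping slide shows each VTE owning a 256-address range of emulated drives. A toy Python model of both ideas follows. It is only a sketch under stated assumptions: `ToyVolumeStore`, `vte_for_device`, and the `V00001` naming scheme are hypothetical, the contiguous address layout is taken from the example ranges on the slide (the deck says the Tape Administrator chooses the mapping), and nothing here reflects the actual DLm implementation.

```python
import tempfile
from pathlib import Path

DRIVES_PER_VTE = 0x100  # the deck states 256 emulated tape drives per VTE


def vte_for_device(address: int) -> int:
    """Return the VTE index for a device address.

    Assumes the contiguous layout shown on the mapping slide:
    0000-00FF on the first VTE, 0100-01FF on the second, and so on.
    """
    return address // DRIVES_PER_VTE


class ToyVolumeStore:
    """Hypothetical stand-in for back-end storage: one disk file per VOLSER."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._next_serial = 1

    def write_scratch(self, data: bytes) -> str:
        """Allocate a new VOLSER and write the tape image to a file of that name."""
        volser = f"V{self._next_serial:05d}"  # illustrative naming scheme only
        self._next_serial += 1
        (self.root / volser).write_bytes(data)
        return volser

    def read(self, volser: str) -> bytes:
        """Open the file named after the VOLSER and return the tape image."""
        return (self.root / volser).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = ToyVolumeStore(Path(tmp))
    volser = store.write_scratch(b"backup records")
    # Reading by VOLSER returns exactly what was written.
    assert store.read(volser) == b"backup records"
    print(volser, "written; device 0234 belongs to VTE index", vte_for_device(0x0234))
```

Because the file is created only when data is written, disk consumption in this model follows the data rather than a fixed cartridge size, which matches the "no pre-allocation of space" point made in the deck.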
Guaranteed Replication
[Diagram: timeline between Production Site and Remote Site]
2:00 AM  DLm replication starts based on RPO
2:12 AM  Backup job starts, backing up 100 GB dataset (will create a 30 GB tape)
2:30 AM  Next DLm replication cycle starts based on RPO; backup tape is being replicated; first 25 GB of tape are replicated
2:35 AM  Backup job ends; “rewind unload” command is sent to the DLm; DLm forces replication; last 5 GB of the backup tape are replicated
2:40 AM  Replication is complete; return control to the mainframe host to finish the backup job

How Disk Library for Mainframe Is Mapped to the Mainframe Host
• The mainframe host views Disk Library for mainframe as tape drives
  – Each VTE can emulate 256 tape drives
• Tape Administrator will map specific range of tape drives to a specific VTE
• Each tape VOLSER is kept on disk as file
  – File name is the same as the VOLSER
  – No pre-allocation of space
• Space maintenance
  – DD880 Time To Live (TTL) will be used for scratch tape space reclamation
[Diagram: IBM mainframe device ranges 0000-00FF, 0100-01FF, 0200-02FF, 0300-03FF, 0400-04FF, 0500-05FF, one per VTE; EMC Disk Library DLm960 labeled Batch; Data Domain DD880 labeled Backup]

Highly Scalable Systems
• Separate disk array targets for mainframe workloads
  – DLm960: Scratch tape and balanced read and write
  – Deduplication Storage Expansion Option: Archival and higher write applications
• Start small
  – One VTE and 19.3 TB with the DLm960 (2 TB)
    • Minimum of two cabinets
  – One Deduplication Storage Expansion option in one cabinet
    • Minimum 23 TB
• Scale as required
  – Scale in increments of 9.5 TB in DLm960 (1 TB)
    • Up to 13 cabinets
    • Balance storage between multiple backend storage arrays
  – Scale Deduplication Storage Expansion option to 71 TB in 12 TB increments
    • One cabinet

Tape Assessment
• A comprehensive analysis of the current tape environment; data sources include:
  – HSM MCDS data
  – HSM FSR data
  – HSM list TTOC data
  – HSM control parameters
  – System log data
  – Tape device configuration
  – Tape library management system catalog data (TLMS, CA1-TMS, RMM or ZARA)
  – SMF data: record types 14, 15, 21, 30, 40
• Report of the summary of findings:
  – HSM Capacity Analysis
  – HSM Activity Analysis
  – HSM Tape Use
  – HSM Control Parameters Analysis
  – Tape Mount and Transport Activity
  – Tape Library Analysis
  – Tape Bandwidth Analysis
[Sample charts: Daily Tape GB Transferred; Tape Mounts per Hour, 12/26/07 to 01/13/08]

Datasets/TB by Devtype
Devtype   Datasets   TBs
3490      107389     102.39
3490E     191889     237.86
3480      366        0.02

Top 20 tape jobs created 96% of tapes
From Tape Assessment service

Why Disk Library for Mainframe?
• Provides deduplication for backup and recovery solutions
  – onsite retention and lowers overall disk and tape storage costs
• Eliminates all issues related to traditional tape handling
  – Eliminates manual intervention, physical movement of tape cartridges, robotic issues, and single points of failure
• Works seamlessly with existing applications
  – Uses existing tape management processes to automate tape vaulting
• Significantly improves performance
  – Reallocates all of the data to disk and uses smart I/O buffering, allowing potentially significant reductions in batch windows
• Extends disaster recovery capabilities to the tape workload
  – Utilizes array-based replication process over IP to seamlessly move tapes offsite
• Easily scales as the workload increases
  – No need for additional subsystems, libraries, network connections, etc.

THANK YOU