
Sorting Through The Confusion


Sorting Through the Confusion: Replacing Tape with Disk for Backups

Table of Contents

- Introduction
- Data Deduplication Overview
- Backup Requirements
- Backup or Cloud Services
- Disk Staging
- Primary Storage SNAPs
- Backup Application Deduplication in the Media Server
- Backup Application Client Side Deduplication
- Purpose-Built Target Side Deduplication Appliances
- Summary
- About ExaGrid

Introduction

The reason a 50-year-old technology like tape is still around is simple: it's CHEAP. But there is increasing pressure on businesses to fix their backups, as detailed in many sources, including the report "Best Practices for Addressing the Broken State of Backup" by Dave Russell, research vice president at Gartner. He found that "for many organizations, backup has become an increasingly daunting and brittle task fraught with significant challenges."

The pressure of data growth has increased sharply as businesses need to store both onsite and offsite copies of their data. This can mean storing 40 to 100 times the volume of their primary dataset, due to storing weeks of retention onsite and weeks, months and, in some cases, years of retention offsite. Maintaining more copies with longer-term retention is driven by business needs such as SEC audits; regulations such as the Gramm-Leach-Bliley Act (GLBA), the Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes-Oxley (SOX); legal discovery requirements; Service Level Agreements (SLAs); and many other business or legal reasons.

The market is full of affordable alternatives to tape, but at the same time confusion exists about technologies like deduplication. Disk has made steady inroads on tape as the primary target for backup software, and it is just one part of a new equation in which near real-time business continuity and disaster recovery are the desired result. Disk eliminates the daily grind and uncertainty that typically surround backup to tape, giving IT staffs relief from worrying whether backups and restores are completing successfully.

Now that it is economically feasible to move from a tape-based to a disk-based backup approach, many vendors with varying approaches have emerged. This has caused a great amount of confusion for IT managers looking to adopt a disk-based backup system. This white paper is intended to help sort through this confusion.
The white paper presents a general overview of data deduplication and of different backup approaches, including:

- Backup services or cloud backup services
- Disk staging - storing data on disk inserted between the media servers and the tape libraries
- Primary storage SNAPs
- Backup application deduplication in the media server, writing to standard disk
- Backup application deduplication in server agents (client side), writing to standard disk
- Purpose-built target side appliances with deduplication

Each of these six potential solutions often considered to replace tape is reviewed in the context of overall backup requirements, and information about each approach is presented, including its pros and cons.

Data Deduplication Overview

One of the few remaining arguments for tape is that tape libraries will technically never "run out of retention capacity." As soon as a tape cartridge fills up, it can be replaced with another cartridge, and the full cartridges can be stored. When writing to disk, storing the same amount of data that is stored on tape would require a massive amount of disk, resulting in high cost. However, if you could use a fraction of the space required to store the data on disk and bring the cost of disk storage close to the cost of tape, then disk is clearly the better alternative.

From week to week, only about 2% of the bytes change. With tape backup, however, the 98% of the data that is unchanged is backed up repeatedly, saving identical data dozens and even hundreds of times. With disk, deduplication software can intelligently save only the roughly 2% of the data that changes from week to week. The net result of using disk storage and data deduplication together is that you only need 1/20th to 1/50th of the storage you would need on tape.

Figure 1 - Data Deduplication Taxonomy

Since tape costs about 1/20th the price of disk per TB of usable capacity, using data deduplication effectively neutralizes the price gap between tape and disk by using far less disk space than is required to store the same data on tape.

There are many approaches to data deduplication, including:

- Fixed data blocks (64KB to 128KB) - used in backup software applications
- Changed storage blocks - used in primary storage SNAPs
- Byte level - used in target side appliances
- Data blocks with variable content splitting - used in target side appliances
- Zone-byte level - used in target side appliances

All of these methods reduce redundant data in backups. For example, if a full backup of 50TB of data is completed every Friday night and 10 weeks are kept onsite, it would take 500TB of disk space to store the backups. However, most of each full backup is unchanged from week to week. Only the data that has been changed, edited or created that week needs to be stored. On average, only about 2% of the data changes from week to week; in this example, 2% is about 1TB per week. If you were to take out all of the redundant data, over time the storage required can be reduced by as much as 50:1, depending on the deduplication method used.
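To make the arithmetic concrete, here is a minimal back-of-the-envelope model of the 50TB example above, in Python. The constants simply restate the paper's assumed figures (a 50TB weekly full, ~2% weekly change, 10 weeks of onsite retention); the linear-growth formula is an idealization, not any vendor's actual behavior.

```python
# Back-of-the-envelope model of the example above: a 50 TB full backup
# every Friday, ~2% of bytes changing per week, 10 weeks kept onsite.
FULL_TB = 50          # size of one full backup, in TB
CHANGE_RATE = 0.02    # fraction of bytes that change week to week
WEEKS = 10            # weekly fulls retained onsite

# Tape-style retention stores every full backup in its entirety.
raw_tb = FULL_TB * WEEKS

# Idealized deduplication stores one full plus only the weekly changes.
dedup_tb = FULL_TB * (1 + CHANGE_RATE * (WEEKS - 1))

print(f"without deduplication: {raw_tb:.0f} TB")            # 500 TB
print(f"with deduplication:    {dedup_tb:.0f} TB")          # ~59 TB
print(f"effective reduction:   {raw_tb / dedup_tb:.1f}:1")  # ~8.5:1 at 10 weeks
```

Even in this simple model, the effective ratio keeps climbing as retention grows (at 52 weeks it passes 25:1), which is why the retention period figures so prominently among the factors discussed next.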
Factors Impacting Deduplication Results

In general, the higher the deduplication ratio, the better. A higher deduplication ratio uses less disk space over time and needs far less WAN bandwidth to replicate data to the offsite disaster recovery site.

Deduplication Approach

The deduplication approach selected impacts the amount of storage savings that will result:

- 64KB to 128KB fixed block will average about 7:1
- Byte, segment-block and zone methods will average from about 20:1 to 50:1 reduction in data storage

Figure 2 - Deduplication Reduces Storage over Time

Data Mix Affects Results

The deduplication ratio can range from 10:1 to as much as 50:1, depending on the mix of data types being backed up. Databases can get very high deduplication ratios of over 100:1. Unstructured file data will see an average ratio of 7-10:1. Deduplicating compressed or encrypted files does not yield a high ratio or significant space savings.

Retention Period

The longer the retention period, the higher the deduplication ratio will be.

Getting the Best Results

The best deduplication ratios will be achieved in environments that are:

- Using byte, data block or zone-level deduplication
- Backing up little or no compressed or encrypted data
- Retaining data for longer-term periods, on the order of 18 weeks

The worst deduplication ratios will be achieved in environments that are:

- Using 64KB or 128KB fixed block deduplication
- Backing up a large amount of compressed or encrypted data
- Retaining data for shorter-term periods, on the order of 4 weeks or less

The net is that not all deduplication approaches achieve the same results. Deduplication ratios are clearly impacted by data types and retention periods. All of these factors need to be taken into consideration when choosing the proper disk backup approach. The sketch below illustrates why the splitting method matters so much.
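The difference between fixed-block and variable-length (content-defined) splitting is easiest to see in code. The following Python sketch is purely illustrative - a toy cut rule over tiny chunks, not any vendor's algorithm (real products use rolling hashes such as Rabin fingerprints over much larger windows) - but it shows why a single inserted byte defeats fixed blocks while content-defined boundaries realign.

```python
# Toy comparison: fixed-block chunking loses nearly all matches after a
# one-byte insert, while content-defined chunking realigns on the data.
import hashlib

def fixed_chunks(data: bytes, size: int = 8):
    # Cut at fixed offsets, the way 64KB/128KB fixed-block schemes do.
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: bytes, mask: int = 0x03):
    # Cut wherever a cheap hash of the last few bytes matches a pattern,
    # so boundaries depend on content, not on absolute offsets.
    chunks, start = [], 0
    for i in range(3, len(data)):
        if hashlib.sha1(data[i - 3:i]).digest()[0] & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

original = b"the quick brown fox jumps over the lazy dog" * 4
shifted = b"X" + original  # one byte inserted at the front

for name, chunker in (("fixed", fixed_chunks),
                      ("content-defined", content_defined_chunks)):
    a, b = set(chunker(original)), set(chunker(shifted))
    print(f"{name:16s} chunks reused: {len(a & b)} of {len(a)}")
```

In the fixed case nearly every chunk changes, because every boundary has shifted by one byte; in the content-defined case the boundaries move with the data, so everything after the first cut realigns. Scaled up to real block sizes, this boundary behavior is a major reason the variable-length methods listed above achieve higher ratios than fixed blocks.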
Backup Requirements

The chart below shows the top backup requirements of most IT shops, arranged in priority order. Each of the approaches, including staying with tape, is shown in its own column. As you can see, not all approaches can meet all requirements. The key is to list your requirements and match them against each of the solutions to see which solutions best meet your requirements. The following sections show the strengths and limitations of each of the six disk solutions.

[Chart - top backup requirements matched against each approach]

Backup or Cloud Services

There are many backup or cloud services to which backup can be outsourced, and the market is evolving as new players enter the field. These services require replacing the server agents used by the backup application. The service can then remotely manage the backup environment. At the start, one complete backup of the data needs to be sent to the backup service. The logistics of this data transfer can be troublesome, due to the large, sustained bandwidth required. After the initial full backup is transferred, only the changes in the data need to be uploaded to the outsourced service; most of these agents move only changed bytes once the initial full backup is at the service provider (in the cloud).

Figure 3 - Typical Cloud Backup

Before a cloud backup recovery strategy is implemented, two key factors should be considered. First, one should ask what the recovery point objective (RPO) is for the business service being considered. Second, one should ask what the recovery time objective (RTO) is for that service. Be sure to carefully evaluate the claims made in cloud service contracts. The most important of these contractual promises are the availability of the service, the provider's service level agreements (SLAs), and the security of your data. According to a Yankee Group report (http://www.yankeegroup.com/about_us/press_releases/2010-04-21.html), "cloud contracts are rife with disclaimers, misleading uptime guarantees, and questionable privacy policies..."

Strengths

- Frees up IT staff to do other core/critical IT tasks

Weaknesses

- Requires changing all the server agents from your existing backup application to the outsourced service's backup agents. Any change of agents will require weeks or months of tweaking.
- Good for small amounts of data, typically under 1TB. Best fit for small IT shops or a large company's small remote office, but not for multi-TB environments. This limitation is due to the time needed to recover the data over the internet. Under normal operation, only the changed bytes or blocks are sent. However, if a full restore is required, it would take about 31 days to retrieve 1TB of data over a 3Mbps internet connection (see the calculation at the end of this section). It is key to note that the constraint is not the bandwidth between your sites but your bandwidth to the internet.
- If the data is over a few TB, most service providers need to place a hardware appliance (cache) in the IT environment to keep at least one week of backups (including a full backup) onsite, to overcome the recovery bottleneck presented by internet bandwidth. The cost of the cache appliance plus the monthly fees makes a backup or cloud service the most expensive backup choice if you have more than a few TB of data to protect.

Summary

- For consumers, small IT environments (<1TB) and small remote offices with a small or nonexistent IT staff, a small data center (if any) and low bandwidth, a backup service is the best way to go.
- With a larger amount of data (>1TB), these services become too cumbersome and too costly.
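The 31-day figure above is straightforward to verify. A quick sketch of the math, using decimal TB; the efficiency parameter is an added (hypothetical) knob, since real links rarely sustain full line rate:

```python
# Rough restore-time math behind the cloud weakness above: pulling a full
# 1 TB backup back over a 3 Mbit/s internet link.
def restore_days(data_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    bits = data_tb * 1e12 * 8                        # TB -> bits (decimal TB)
    seconds = bits / (link_mbps * 1e6 * efficiency)  # time at sustained rate
    return seconds / 86_400

print(f"{restore_days(1, 3):.0f} days")                   # ~31 days at line rate
print(f"{restore_days(1, 3, efficiency=0.7):.0f} days")   # ~44 days with overhead
```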
Disk Staging

Disk staging places disk between the media servers or storage nodes and the tape library; it is also considered tape augmentation. All backup applications can write directly to a disk volume or NAS share, so disk staging works natively with all backup applications. Disk staging reduces the perceived backup window at the client level, reduces the backup verification window at the server level, and provides high-speed recovery of files from disk rather than tape.

Figure 4 - Disk Staging Concept Overview

Strengths

Placing disk between the media servers/storage nodes and the tape library solves several problems:

- Multiple parallel jobs can be handled without being limited by the number of physical tape drives. This results in faster backups, assuming the media servers can keep up.
- Reliable backups and reliable restores are assured using disk.

Weaknesses

Disk staging becomes expensive very quickly:

- Disk staging does not eliminate the use of tape onsite or offsite. It simply augments tape onsite.
- There is no data deduplication with disk staging, so the amount of disk grows very quickly and becomes extremely expensive with any level of retention. For example, two weeks of nightly backups and weekly full backups require storing four times the size of the primary data on disk. This assumes a rotation of full backups for databases and email nightly, incremental backups of files nightly, and full backups on Friday. Each night, the combination of incremental file backups and full database and email backups will equal about 25% of a full backup, so the Monday through Thursday nightly backups add up to roughly the size of a full backup. Using 40TB of data as an example, the four nightly backups will total 40TB and the Friday full backup will be another 40TB, requiring 80TB of disk storage per week and 160TB after two weeks (see the calculation at the end of this section). This is why about 90% of customers using disk staging keep only one to two weeks of data on disk.

Summary

- Disk staging is good for one to two weeks of onsite retention on disk. It is estimated that about 70% of tape users use disk staging.
- For retention beyond one or two weeks, or for tape replacement onsite, an organization must use data deduplication to store only unique data (not the redundant data), using far less disk and reducing the cost impact.
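The 40TB example works out as follows; a small sketch under the paper's stated rotation assumptions (four weeknight backups averaging 25% of a full, plus a Friday full):

```python
# Storage growth for disk staging without deduplication, using the paper's
# 40 TB example: nightly backups average ~25% of a full, plus a Friday full.
FULL_TB = 40
NIGHTLY_FRACTION = 0.25   # incremental files + full DB/email, per night

def staging_tb(weeks: int) -> float:
    per_week = 4 * FULL_TB * NIGHTLY_FRACTION + FULL_TB  # Mon-Thu + Friday full
    return weeks * per_week

for weeks in (1, 2, 4):
    print(f"{weeks} week(s) retained: {staging_tb(weeks):.0f} TB")
# 1 -> 80 TB, 2 -> 160 TB, 4 -> 320 TB: why staging rarely holds >2 weeks
```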
Primary Storage SNAPs

Primary storage SNAPs (quick logical copies, or snapshots) are useful primarily for short-term retention. They are just the first line of defense in a layered backup scheme that includes long-term backups. SNAPs save changed storage blocks on a periodic basis (e.g., hourly), allowing a roll back to the last period. Primary storage SNAPs are not intended for long-term or historical backup.

Figure 5 - SNAPs Concept

Strengths

- SNAPs allow rolling back to earlier points and are more granular than a nightly backup
- SNAPs can be replicated offsite for disaster recovery of short-term, periodic SNAP points

Weaknesses

- SNAPs write into the same volume as the primary data, so they do not offer protection against a system crash, virus attack, data corruption or other event that destroys the primary data; the SNAPs would be destroyed along with it. This is why an estimated 99% of IT environments keep a backup copy on a separate system onsite (tape or disk).
- SNAPs are not good for long-term retention uses such as legal discovery, regulatory compliance or SEC audits. When years of retention are required, a traditional backup approach is needed, because data must be stored at specific points in time but not at every interval in between - for example, monthly backups for 3 years and then yearly backups for 4 additional years.

Summary

- Primary storage SNAPs and long-term traditional backup can co-exist as part of a multi-layered approach to backup tailored to the specific requirements of the business. Primary storage SNAPs provide fine-granularity backup points onsite and, if replicated, offsite as well.
- It is estimated that 99% of IT environments use a traditional, longer-term backup system, and that about 50% of IT environments deploy some type of primary SNAPs as well.

Backup Application Deduplication in the Media Server

Some backup applications have a data deduplication feature that can be deployed as an agent in the media server. The intent is to eliminate tape by using standard disk in conjunction with the backup application.

Figure 6 - Running Deduplication on the Media Server

Data deduplication is a very compute-intensive process. If deduplication is run in the media server, resource utilization will increase significantly, which can slow backups down dramatically. To avoid this hit to overall backup performance, backup software uses a form of deduplication that results in a lower reduction rate. Using the least possible processor and memory resources for the deduplication process avoids starving the media server tasks of resources, but at the cost of lower deduplication effectiveness. Typically this approach uses 64KB or 128KB fixed blocks and will yield a data reduction ratio of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block deduplication with variable-length content splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times the reduction of the software approach. A minimal sketch of the fixed-block technique appears at the end of this section.

In addition, software deduplication can only process data that comes from its own proprietary agents. It cannot deduplicate data from other sources, including other backup applications, utilities or database dumps. Some vendors bundle the media server software on a storage server that includes a CPU, memory and disk; this changes neither the deduplication ratio nor the single-application nature of the solution.

Strengths

- Relatively simple to manage through the backup application
- Good for environments that have less than 3TB of data to back up, use a single backup application and do not plan to replicate to a second site for disaster recovery

Weaknesses

- Disk usage is high, as the deduplication ratio is only 6-7:1. Over time, the disk space required grows sharply.
- Bandwidth needed to send backups to a second site is high, again because the deduplication ratio is only 6-7:1, versus the roughly 20:1 to 50:1 achieved by target-side appliances.
- Cannot deduplicate data from:
  - Veeam, Quest vRanger
  - Lightspeed, SQL Safe, Redgate
  - Direct SQL dumps, direct Oracle RMAN dumps
  - Bridgehead for Meditech data
  - Direct UNIX TAR files
  - Other traditional backup applications

Summary

- Deduplication in the backup software is good for short-term retention and low amounts of data, in environments that use a single backup application and where offsite disaster recovery data is not required.
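Here is the promised sketch of fixed-block deduplication: hash each 64KB block and store a block only the first time its fingerprint is seen. It is a minimal illustration of the general technique, not the implementation of any particular backup product.

```python
# Minimal sketch of 64 KB fixed-block deduplication (illustrative only;
# real backup software adds persistent indexing, compression, etc.).
import hashlib

BLOCK = 64 * 1024

def dedup_store(stream: bytes, index: dict) -> list:
    """Split a stream into fixed 64 KB blocks; store only unseen blocks."""
    recipe = []                          # fingerprints to rebuild the stream
    for i in range(0, len(stream), BLOCK):
        block = stream[i:i + BLOCK]
        fp = hashlib.sha256(block).digest()
        index.setdefault(fp, block)      # new blocks consume disk; dupes don't
        recipe.append(fp)
    return recipe

index = {}
# Ten distinct 64 KB blocks stand in for one weekly full backup.
week1 = b"".join(i.to_bytes(4, "big") * (BLOCK // 4) for i in range(10))
week2 = week1[:3 * BLOCK] + b"EDIT" + week1[3 * BLOCK + 4:]  # in-place 4-byte edit

dedup_store(week1, index)
dedup_store(week2, index)
print(f"unique blocks stored: {len(index)}")  # 11, not 20
```

Note that the second week's change replaced bytes in place. An edit that inserted or deleted bytes would have shifted every subsequent block boundary, turning nearly the whole stream into "new" blocks - the root cause of the lower 6-7:1 ratios described above, and the problem the content-defined splitting sketch earlier addresses.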
Backup Application Client Side Deduplication

Some backup applications offer a form of data deduplication in the application server agents, or clients. The intent is to eliminate tape by using standard disk along with the backup application. The deduplication occurs at the backup agent/client on each application server.

Figure 7 - Client Side Deduplication

Data deduplication is a very compute-intensive process. Resource utilization will increase significantly if deduplication is run in the application server (client side), slowing backups dramatically. To minimize this impact, client side deduplication software uses a less efficient form of deduplication, typically 64KB or 128KB fixed blocks, achieving a data reduction ratio of about 6-7:1 - again, versus the roughly 20:1 to 50:1 achieved by target-side appliances. Running a compute-intensive deduplication process on your application servers also creates other performance and availability challenges.

Furthermore, databases and email, which make up about 80% of the Monday through Thursday backups, are still sent as full backups. This means that only about 20% of the nightly data is actually deduplicated by client side deduplication during the week (see the calculation at the end of this section). The real impact comes on the Friday night full backup, where 80% of the data is unstructured file data. In addition, the software approach to deduplication can only process data that comes from its own proprietary agents; it cannot deduplicate data from other sources, including other backup applications, utilities or database dumps.

Strengths

- Great fit for deduplicating data from small remote sites and then replicating it back to a corporate datacenter for backup.
- This approach can shorten the backup window, but only for the Friday full backup. During the week, database and email backups are still full backups.

Weaknesses

- Requires new agents on servers, with the added risk and cost of changing agents.
- The deduplication ratio is only 6-7:1, so the disk space required increases quickly, and the bandwidth needed to replicate to a second site is at a minimum three times that of a target-side appliance.
- Cannot deduplicate data from:
  - Veeam, Quest vRanger
  - Lightspeed, SQL Safe, Redgate
  - Direct SQL dumps, direct Oracle RMAN dumps
  - Bridgehead for Meditech data
  - Direct UNIX TAR files
  - Other traditional backup applications

Summary

- Very good for replicating remote site data back to a corporate datacenter
- Very few businesses actually use this approach, due to its risk to application servers and the weaknesses above
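The weeknight arithmetic above can be made concrete. The sketch below assumes a hypothetical 10TB weeknight backup, split per the paper's figures: 80% database/email sent as fulls, 20% file data deduplicating at the 6-7:1 fixed-block ratio.

```python
# Why client side deduplication helps little on weeknights: databases and
# email (~80% of Mon-Thu volume) are still sent as full backups, so only
# the file share deduplicates. The 10 TB nightly size is a hypothetical
# example; the 80/20 split and ~7:1 ratio are the paper's figures.
NIGHTLY_TB = 10.0
DB_EMAIL_SHARE = 0.80      # sent in full regardless of client dedup
FILE_DEDUP_RATIO = 7       # fixed-block dedup on the file portion

stored = (NIGHTLY_TB * DB_EMAIL_SHARE
          + NIGHTLY_TB * (1 - DB_EMAIL_SHARE) / FILE_DEDUP_RATIO)
print(f"stored per weeknight: {stored:.1f} TB of {NIGHTLY_TB} TB "
      f"({NIGHTLY_TB / stored:.2f}:1 effective)")  # ~8.3 TB, ~1.2:1
```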
Purpose-Built Target Side Deduplication Appliances

Target-side deduplication appliances are built specifically to replace the tape library in the backup process onsite and, optionally, offsite. Because they are dedicated appliances, the hardware and the deduplication methods used can be optimized for that single purpose. Future disk space requirements to deal with data growth are drastically reduced, because deduplication ratios from 20:1 to as much as 50:1 can be achieved. Only the data that changes, about 2% of the backup size, is replicated offsite, requiring far less bandwidth. In addition, target-side appliances can process data from a variety of utilities and backup applications.

Figure 8 - Target Side Deduplication Appliance

Strengths

- No change to your backup environment; use all the backup applications, utilities and dumps you are currently using.
- Can take in data from:
  - Traditional backup applications
  - Veeam, Quest vRanger
  - Lightspeed, Redgate, SQL Safe
  - SQL dumps, Oracle RMAN dumps
  - Direct UNIX TAR files
  - Many other backup applications and utilities
- 20:1 to as much as 50:1 deduplication ratios use less disk space and far less bandwidth for replication.
- Special features for:
  - Tracking data to the offsite disaster recovery site
  - Improving disaster recovery RPO (recovery point objective) and RTO (recovery time objective)
  - Purging data as the retention policy calls for aging out data

Weaknesses

- The backup window improves over using a tape library, but not by as much as client side deduplication does for the Friday night full backup.

Summary

When evaluating different approaches to replacing tape with disk, take the time to ask the right questions and understand the strengths and weaknesses of each alternative.

About ExaGrid

ExaGrid is the leader in scalable, cost-effective disk-based backup solutions. A highly scalable system that works with existing backup applications, the ExaGrid system is ideal for companies looking to quickly eliminate the hassles of tape backup while reducing their existing backup windows. ExaGrid's innovative approach minimizes the amount of data to be stored by providing standard data compression for the most recent backups, along with zone-level data deduplication technology for all previous backups. Customers can deploy ExaGrid at primary and secondary sites to supplement or eliminate offsite tapes, with live data repositories or for disaster recovery. With offices and distribution worldwide, ExaGrid has more than 4,000 systems installed and hundreds of published customer success stories and testimonial videos available at www.exagrid.com.

ExaGrid Systems, Inc. | 2000 West Park Drive | Westborough, MA 01581 | 1-800-868-6985 | www.exagrid.com

© 2012 ExaGrid Systems, Inc. All rights reserved. ExaGrid is a registered trademark of ExaGrid Systems, Inc.