Enterprise Strategy Group | Getting to the bigger truth.™
ESG Lab Review
Redefining Protection in the Cloud with Trilio Data: Automated, Scalable and Rapid Recovery For OpenStack
Date: May 2017
Author: Vinny Choinski, Senior Lab Analyst
Abstract
TrilioVault from Trilio Data is a tightly integrated data protection and recovery solution for OpenStack clouds. ESG Lab tested TrilioVault with a focus on ease of management and deployment, performance, and scalability. ESG Lab found TrilioVault to be a solid solution to handle the two-fold challenge of managing a complex, distributed OpenStack environment and providing backup and disaster recovery for OpenStack clouds.
The Challenges
According to ESG Research, cost is the most often-cited challenge for IT managers implementing data protection processes and technologies (see Figure 1). [1] Further complicating IT management in scale-out cloud deployments, poorly engineered backup and recovery environments can negatively impact application and database performance. Cost and application performance are stressed because of the difficulty of designing, implementing, and operating a backup and recovery solution that is policy-based, automated, and comprehensive.
Figure 1. Top Challenges with Current Data Protection Processes and Technologies
Source: Enterprise Strategy Group, 2017
The Solution: TrilioVault
TrilioVault, developed by Trilio Data, is an environment management framework for OpenStack deployments with an initial focus on application protection: backup and restore of the entire environment, not just the data. The TrilioVault solution takes point-in-time snapshots of workloads and enables one-click recovery in case of any failure. From an operational perspective, the solution was built with cloud characteristics in mind: tenant-driven, self-service, forever scalable, and fully automated deployment and management.
[1] Source: ESG Research Report, 2017 Data Protection Modernization, to be published.
This ESG Lab Review was commissioned by Trilio Data and is distributed under license from ESG. © 2017 by The Enterprise Strategy Group, Inc. All Rights Reserved.
OpenStack
OpenStack is a cloud operating system that manages pools of compute, storage, and networking resources in a data center. Administrators use an intuitive dashboard that ensures control while also empowering users to self-provision resources using a web interface. Its low cost, scalability, customizability, and security model make it especially popular with organizations building distributed systems. OpenStack comprises a set of open-source software tools that work together to create private, public, or hybrid cloud environments. Its core components and optional services are shown in Figure 2. [2]
Figure 2. OpenStack Model and Services
Source: https://www.openstack.org/software/
TrilioVault
TrilioVault is a protection solution built from the ground up to operate within the OpenStack framework and to protect and restore the OpenStack distributed framework. Running as a native service, TrilioVault leverages OpenStack services and open source features to enable organizations to recover workloads and improve their recovery time objectives (RTO) and recovery point objectives (RPO). The TrilioVault multi-tenant, self-service, policy-based solution is designed to protect application workloads from data corruption or data loss, providing point-in-time snapshots with configuration and change awareness to seamlessly recover a workload with one click.
As shown in Figure 3, TrilioVault consists of three major components:
• The TrilioVault Virtual Appliance – a virtual machine that performs the management and orchestration of the backup and restore process. TrilioVault can be deployed as one or more VM(s) on a KVM-based hypervisor.
• The Trilio Data Mover – a lightweight, Python-based service that moves data during backup and restore of an OpenStack environment. The Data Mover is installed on each Nova compute node of the OpenStack cloud.
• The TrilioVault API – a RESTful API set installed on OpenStack controller nodes that connects the TrilioVault VM to the OpenStack services using the services’ native APIs.
A plug-in to the OpenStack Horizon dashboard further simplifies management.
[2] A glossary of additional OpenStack terms used in this review can be found at https://docs.openstack.org/ops-guide/common/glossary.html
Figure 3. TrilioVault Architecture
[Diagram: TrilioVault VMs running on KVM; Controllers 1 through 3, each running the Horizon server, the TrilioVault API, and the Nova API; RabbitMQ; Nova compute nodes 1 through n, each hosting VMs and a Trilio Data Mover; Cinder storage; NFS / Swift backup target.]
Source: Trilio Data, 2017
Additionally, TrilioVault uses an NFS system or Swift object storage services to store and recover data and metadata for its instances. TrilioVault leverages the replication capabilities of the NFS system to provide remote replication and disaster recovery. Support for Amazon Web Services S3 environments is on the roadmap. The TrilioVault installation package also includes guest scripts for coordinating application-consistent backups with MySQL and Postgres, and templates that users can modify to coordinate consistent backups with additional applications.
The basic requirements for installing the TrilioVault VM are:
• Virtual Appliance Requirements: 40GB storage, 24GB memory, and 4 CPUs.
• Virtual Appliance: Based on Ubuntu 14.04.
• Supported OpenStack releases: Kilo, Liberty, Mitaka, Newton; OpenStack Distributions Red Hat Enterprise Linux OpenStack Platform 7, 8, 9, Mirantis MOS 7, 8, 9; or Linux Distributions Ubuntu 14.04.4 LTS (Trusty Tahr), Ubuntu 16.04 LTS, CentOS 7, RedHat 7, and SUSE OpenStack Cloud 6 and 7.
During the backup process, the Trilio Virtual Snapshot Technology (VAST) captures the entire OpenStack workload environment including the application; operating system; compute, network, and storage configurations; security groups; and the environment’s data and metadata. After an initial backup, VAST collects incremental point-in-time images of the environment, processes the snapshots along with previously stored information, and offers complete, restorable images of the workload environment. Specifically, TrilioVault captures:
1. Security groups which include firewall rules for each instance; network configurations including subnets and netmasks; and availability zones from the OpenStack Neutron networking service.
2. VM flavors and VM metadata from the OpenStack Nova compute service.
3. Volume configuration and volume types from the OpenStack Cinder block storage service.
4. VM images from the OpenStack Glance image service.
During backup, Trilio also captures all the application data from the OpenStack Cinder data store, using Cinder and Ceph snapshots. TrilioVault then processes the environment metadata and application data snapshots to create the backup images in a copy-on-write format stored on the NFS or Swift data store. The copy-on-write file format enables TrilioVault to do a full restore from any point-in-time image.
TrilioVault offers three restore capabilities:
• One-click seamless recovery for the entire workload to its original form.
• Selective recovery of the VMs onto target networks, availability zones, regions, or clouds that are different from the original.
• Accessing and recovering individual files and folders from backup images.
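The idea that a full restore is possible from any point-in-time image can be illustrated with a minimal Python sketch of a layered, copy-on-write style chain. This is not TrilioVault's actual file format or code: the block map model, the `BackupImage` alias, and the `read_block` and `restore` helpers are all hypothetical, chosen only to show how a read at a given point in time can fall through newer images to older ones.

```python
from typing import Dict, List, Optional

# Hypothetical model: each backup image maps block number -> block contents.
# images[0] is the full backup; each later image holds only changed blocks.
BackupImage = Dict[int, bytes]


def read_block(images: List[BackupImage], point: int, block: int) -> Optional[bytes]:
    """Return the contents of `block` as of backup `point` (0 = the full)."""
    # Walk from the requested point back toward the full backup and
    # return the first (most recent) copy of the block that is found.
    for image in reversed(images[: point + 1]):
        if block in image:
            return image[block]
    return None  # the block did not exist at that point in time


def restore(images: List[BackupImage], point: int) -> Dict[int, bytes]:
    """Materialize the complete volume contents as of backup `point`."""
    known_blocks = set()
    for image in images[: point + 1]:
        known_blocks.update(image)
    return {b: read_block(images, point, b) for b in sorted(known_blocks)}


chain = [
    {0: b"A0", 1: b"B0", 2: b"C0"},  # full backup
    {1: b"B1"},                      # incremental 1: block 1 changed
    {2: b"C2", 3: b"D2"},            # incremental 2: block 2 changed, block 3 added
]

restored_at_1 = restore(chain, 1)
restored_at_2 = restore(chain, 2)
```

In this sketch, restoring at point 1 ignores everything written by incremental 2, which is the property the review attributes to the copy-on-write format.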
ESG Lab Tested
ESG Lab tested the TrilioVault data protection capability on Trilio Data systems located in Hopkinton, MA. Testing focused on ease of management and deployment, performance, and scalability.
Ease of Management and Deployment
First, ESG Lab tested the TrilioVault deployment and configuration process. As shown in Figure 4, the process is straightforward: installation, launch, and configuration of the Vault VM; configuration of the controller nodes, compute nodes, and Horizon plug-in; installation of the CLI; and then installation and/or adaptation of the Ansible scripts for complete backup consistency.
Figure 4. TrilioVault Configuration
Source: Enterprise Strategy Group, 2017
ESG Lab then explored TrilioVault’s comprehensive capabilities for describing the OpenStack workloads to be protected and the snapshot policies as shown in Figure 5.
Figure 5. Workloads and Snapshots
Source: Enterprise Strategy Group, 2017
Finally, ESG Lab tested TrilioVault’s restore features including one-click restore, selective restore, and file-level restore. Selective restore, as shown in Figure 6, is a key part of TrilioVault’s disaster recovery capability, as it enables workloads to be moved to and restored on target environments that are different from the original environment, such as other availability zones, data centers, or clouds. This process leverages existing replication when a remote site is the destination for the recovery.
Figure 6. Selective Restore
Source: Enterprise Strategy Group, 2017
Why This Matters
Downtime, caused by inability to restore or lengthy restore times, inflicts hard costs on businesses, including lost revenue, lost opportunity, and the cost of the formidable efforts required to restore. Cloud deployments are not immune from the cost of downtime, and often the many moving parts of a distributed cloud deployment make recovery more difficult and error-prone than with traditional infrastructure. TrilioVault orchestrates the backup and recovery process, reducing the complexity and risk that accompany manual recovery. ESG Lab found that TrilioVault makes protection of complex OpenStack clouds straightforward, with recovery as simple as one click, minimizing the disruption caused by downtime.
Scale and Performance
Next, ESG Lab characterized TrilioVault’s performance and tested its ability to scale to match the growth characteristics of an OpenStack cloud. As Figure 7 shows, the TrilioVault dashboard extends its user-friendliness to performance reporting, summarizing both the elapsed time and the amount of data backed up for workloads under Trilio’s care.
Figure 7. Performance Information in the TrilioVault GUI
Source: Enterprise Strategy Group, 2017
One of the benefits of TrilioVault’s DevOps-friendly, agentless architecture is the ability to increase performance of OpenStack clusters by adding more Nova compute nodes, using software configuration management tools such as Ansible, Heat, Puppet, or Chef. TrilioVault is automatically deployed along with this new infrastructure to maintain performance, scalability, and flexibility as the application grows. ESG Lab tested a full backup of 500 GB for up to ten source VMs by up to five TrilioVault Data Movers. Since the TrilioVault Data Mover runs on the compute nodes, TrilioVault’s backup performance scales nearly linearly, as shown in Figure 8.
Figure 8. TrilioVault Scaling
[Chart: Trilio Scale - Data Mover (more is better); y-axis: MB/sec, 0 to 90; x-axis: 2, 3, 4, and 5 Data Movers (DMs).]
Source: Enterprise Strategy Group, 2017
What the Numbers Mean
• In keeping with the scale-out characteristic of OpenStack clusters, TrilioVault’s backup performance scales as compute nodes with Trilio Data Movers are added to the cluster.
• A single pair of Data Movers delivered 36 MB/sec. This increased to 77 MB/sec as the number of Data Movers increased and more jobs were run in parallel.
• In a world of 1GbE or 10GbE OpenStack cluster interconnects, Data Mover bandwidth consumption is not a practical limit to backup performance. However, the write bandwidth of the NFS or Swift backup store can be a practical limit to TrilioVault scaling.
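As a rough check on the "nearly linear" characterization, the reported throughput can be compared with ideal linear scaling. The short Python sketch below does that arithmetic. It assumes, based on the range shown in Figure 8, that the 77 MB/sec result was measured with five Data Movers; the review does not state this explicitly, and the `scaling_efficiency` helper is purely illustrative.

```python
def scaling_efficiency(base_movers: int, base_rate: float,
                       movers: int, rate: float) -> float:
    """Measured throughput as a fraction of ideal linear scaling."""
    # Ideal linear scaling keeps per-mover throughput constant.
    ideal_rate = base_rate * movers / base_movers
    return rate / ideal_rate


# Figures from the review: 36 MB/sec with two Data Movers, 77 MB/sec at the top.
# Assumption: the 77 MB/sec result corresponds to five Data Movers.
efficiency = scaling_efficiency(2, 36.0, 5, 77.0)
per_mover_start = 36.0 / 2  # MB/sec per Data Mover with two movers
per_mover_end = 77.0 / 5    # MB/sec per Data Mover with five movers

print(f"Scaling efficiency versus ideal linear: {efficiency:.1%}")
print(f"Per-mover throughput: {per_mover_start:.1f} -> {per_mover_end:.1f} MB/sec")
```

Under that assumption, ideal linear scaling from two to five Data Movers would give 90 MB/sec, so the measured 77 MB/sec is roughly 86% of ideal.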
A key characteristic of the TrilioVault architecture is that once the first backup is stored, additional backups are taken by elegant use of OpenStack’s snapshot capabilities and TrilioVault’s VAST on the backend. Subsequent backups move much less data, shortening backup times and enabling the effective protection of OpenStack clouds that grow into the hundreds of terabytes or more. TrilioVault’s incremental approach stores only the changes since the last incremental—unlike differential backups, which store the changes since the last full backup—saving both time and storage capacity. TrilioVault further reduces the data bloat of an incremental forever approach by managing capacity across the data retention cycle. For example, with a backup cycle of N days, at day N+1 the oldest incremental backup is committed to the full backup image and the oldest cycle’s data is discarded, so the maximum amount of data TrilioVault stores is one full and N incremental backups.
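The retention behavior described above can be sketched as a small Python simulation. This is an illustrative model only, not TrilioVault's implementation: the `BackupStore` class, its block-map representation, and the per-day change sets are hypothetical. It shows the oldest incremental being committed into the full image once more than N incrementals exist, so the store never holds more than one full and N incrementals.

```python
from typing import Dict, List

Blocks = Dict[int, str]  # hypothetical: block number -> block contents


class BackupStore:
    """Toy model of an incremental-forever store with an N-backup retention cycle."""

    def __init__(self, retention: int) -> None:
        self.retention = retention
        self.full: Blocks = {}
        self.incrementals: List[Blocks] = []
        self.has_full = False

    def backup(self, changed: Blocks) -> None:
        if not self.has_full:
            # The first backup is a full copy of every block.
            self.full = dict(changed)
            self.has_full = True
            return
        # Later backups store only the blocks changed since the previous backup.
        self.incrementals.append(dict(changed))
        if len(self.incrementals) > self.retention:
            # Commit the oldest incremental into the full image, then drop it.
            oldest = self.incrementals.pop(0)
            self.full.update(oldest)

    def latest(self) -> Blocks:
        """Volume contents as of the most recent backup."""
        state = dict(self.full)
        for incremental in self.incrementals:
            state.update(incremental)
        return state


store = BackupStore(retention=3)
store.backup({0: "a0", 1: "b0"})          # initial full backup
for day in range(1, 6):
    store.backup({day % 2: f"v{day}"})    # one changed block per day

stored_incrementals = len(store.incrementals)
current_state = store.latest()
```

After five daily incrementals with a retention of three, the two oldest incrementals have been folded into the full image, while the most recent state remains fully recoverable.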
For a sample data set of 40GB, Figure 9 shows the dramatic performance improvement with TrilioVault’s snapshot-based incremental schema. Backup times were reduced from approximately 34 minutes for the first full to approximately six minutes for the average incremental.
Figure 9. TrilioVault Full and Incremental Backup Performance
[Chart: Full Versus Incremental Backup Duration (less is better); y-axis: Backup Duration (sec), 0 to 2,500; x-axis: Full, Incr-1 through Incr-5.]
Source: Enterprise Strategy Group, 2017
What the Numbers Mean
• The TrilioVault perpetual incremental schema reduced the backup duration by 80% for ongoing incremental backups as compared with the initial full backup operation.
• An incremental backup only needs to move the blocks that have changed since the last incremental backup job.
• Incremental backups also have a positive impact on network resources because they reduce the network load of the overall backup operation.
• The shorter time for incremental backups translates directly into a shorter RPO.
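The 80% figure can be sanity-checked against the approximate durations reported with Figure 9 (about 34 minutes for the first full backup and about six minutes for the average incremental). The Python snippet below is only a back-of-the-envelope calculation using those rounded values; the `reduction_percent` helper is an illustrative name, not part of any product.

```python
def reduction_percent(full_minutes: float, incremental_minutes: float) -> float:
    """Percentage reduction in backup duration relative to the full backup."""
    return 100.0 * (full_minutes - incremental_minutes) / full_minutes


# Approximate durations quoted in the review for the 40GB sample data set.
reduction = reduction_percent(34.0, 6.0)
print(f"Backup duration reduced by about {reduction:.0f}%")
```

With the rounded inputs, the reduction works out to a little over 82%, which is consistent with the review's statement of an 80% reduction.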
Why This Matters
Elastic scalability is a key feature driving organizations to implement cloud deployments. A recent ESG survey showed that 35% of users rated rapid elasticity (i.e., the ability to add or remove IT resources as needed) as the most important capability of a private cloud infrastructure. [3] ESG Lab testing found that TrilioVault performance scaled as the underlying OpenStack cloud scaled. Additionally, TrilioVault’s snapshot-based schema enables scaling with minimal data transfer, and a full restore can be conducted from any point-in-time incremental backup.
[3] ESG Research Report, The Cloud Computing Spectrum, from Private to Hybrid, March 2016.
The Bigger Truth
The proliferation of cloud deployments does not diminish the attention that backup and recovery command from IT. In fact, the advent of new deployment models such as OpenStack clouds exacerbates the already-difficult operational challenges of backup because traditional backup tools are not a good fit for this new distributed deployment model. The ability to scale translates directly into the ability to do work, whether it’s to generate revenue or to handle other transactions, such as processing patient records or performing data protection processes.
TrilioVault from Trilio Data is an integrated data protection and copy management solution for OpenStack clouds. TrilioVault’s Virtual Snapshot Technology (VAST) makes point-in-time copies of both metadata and data to enable both recovery from a local failure and disaster recovery from an offsite location. TrilioVault can recover all or part of a failed deployment.
ESG Lab confirmed that TrilioVault is easy to deploy and use, and enables users to perform self-service recovery. ESG Lab testing shows that TrilioVault’s performance scales as the underlying OpenStack cluster grows. Since cloud deployments grow by nature, this feature can ensure that applications continue to function properly and remain protected, with minimal disruption to operations.
In today’s multi-cloud and hybrid-cloud application deployments, it is rare for any cloud to stand alone. The ability to coexist with other clouds, particularly public clouds, is becoming essential. While TrilioVault currently supports only OpenStack deployments, ESG is pleased to learn that Trilio Data is working on future support for Amazon Web Services S3 environments. Numerous Fortune 50 companies across the globe currently use TrilioVault to protect their own clouds or cloud services they provide to end-users.
As OpenStack adoption grows, TrilioVault is well-positioned to lead in OpenStack data protection; as always, execution will be the determining factor. From ESG’s perspective, if you are looking for an easy-to-manage, effective OpenStack data protection solution that scales easily, TrilioVault is well worth a look.
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188. The goal of ESG Lab reports is to educate IT professionals about data center technology products for companies of all types and sizes. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable feature/functions of products, show how they can be used to solve real customer problems and identify any areas needing improvement. ESG Lab's expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments.
www.esg-global.com
P. 508.482.0188