
VMware Virtual SAN


VMware Virtual SAN Technical Walkthrough / Medium Dive
William David van Collenburg, VMware Systems Engineer
© 2014 VMware Inc. All rights reserved.

Agenda
1. SDS and Virtual SAN Overview
2. Use Cases
3. Hardware Requirements
4. Technical Characteristics and Architecture
5. Configuration Walkthrough
6. Virtual Machine Provisioning Operations
7. Resiliency and Failure Scenarios
8. Interoperability
9. Design and Sizing
10. Troubleshooting

VMware Storage Innovations
•  VI 3.x (2005-2007): VMFS, Snapshots, Storage vMotion, NAS & iSCSI support
•  vSphere 4.x (2008-2010): Thin Provisioning, Storage I/O Control, Boot from SAN, VAAI, Linked Mode
•  vSphere 5.x (2011-2013): Storage DRS, Profile-Driven Storage, VASA, vSphere Storage Appliance, vSphere Data Protection, vSphere Replication, vSphere Flash Read Cache
•  Software-Defined Storage (2014+)

Hypervisor-Converged Opportunities
Why can the virtualization platform play a critical role in solving storage problems?
•  Inherent knowledge of the application
•  Global view of the infrastructure
•  Hardware agnostic (server-side flash, SAN & NAS, all-flash, BLOB, DAS)
Hypervisor-converged storage solutions abstract the plumbing to optimize storage for applications.

VMware Software-Defined Storage
Bringing compute's operational model to storage:
•  Policy-driven control plane: common policy-based automation and orchestration.
•  Virtual data services (data protection, mobility, performance): VM-centric data services and third-party services integration.
•  Virtual data plane: abstraction and pooling, infrastructure integration, new storage tiers:
–  Virtual SAN hypervisor-converged storage pool on x86 servers.
–  SAN/NAS pool, exposed as LUNs or VVOLs.
–  Object storage pool on cloud object storage.

VMware Virtual SAN
•  Hypervisor-converged, software-defined storage platform.
•  Aggregates locally attached storage (SSDs and hard disks) from each ESXi host in a cluster into a Virtual SAN shared datastore.
•  Flash-optimized storage solution.
•  VM-centric data operations and policy-driven management principles.
•  Resilient design based on a distributed RAID architecture: no single point of failure.
•  Fully integrated with vSphere.
Deeply Integrated with the VMware Stack
Bringing the benefits of VMware's products to make storage easy:
•  vSphere: vMotion, Storage vMotion, DRS, vSphere HA, Snapshots, Linked Clones
•  Data protection: VDP Advanced, vSphere Replication
•  Cloud ops and automation: vCenter Operations Manager, vCloud Automation Center (IaaS)
•  Disaster recovery: Site Recovery Manager
•  Virtual desktop: VMware Horizon View
All built on Storage Policy-Based Management.

Virtual SAN is NOT a Virtual Storage Appliance (VSA)
–  Virtual SAN is fully integrated with vSphere (ESXi & vCenter); it is embedded into vSphere, not a VSA.
–  Drivers embedded in ESXi 5.5 contain the Virtual SAN smarts.
–  Kernel modules:
•  Provide the shortest path for I/O.
•  Remove the unnecessary management overhead of dealing with an appliance.
•  Do not consume resources unnecessarily.

VMware Virtual SAN
•  Radically simple hypervisor-converged storage software.
•  Hybrid storage solution:
–  Magnetic disks (HDD)
–  Flash-based disks (SSD)
•  Storage scale-out architecture built into the hypervisor.
•  Dynamic capacity and performance scalability.
•  Object-based storage architecture.
•  Interoperable with vSphere and enterprise features: vMotion, DRS, vSphere HA.

Virtual SAN Key Benefits
•  Radically simple: installs in two clicks, managed from the vSphere Client, policy-based management, self-tuning and elastic, deep integration with the VMware stack.
•  High performance: embedded in the vSphere kernel, flash-accelerated, up to 915K IOPS from a 16-node cluster, matches the VDI density of an all-flash array.
•  Lower TCO: eliminates large upfront investments (CAPEX), grow-as-you-go (OPEX), flexible choice of industry-standard hardware, no specialized skills required, best price/performance.

Simplifies and Automates Storage Management
Per-VM storage service levels (capacity, performance, availability SLAs) from a single self-tuning datastore:
•  Policies are set based on application needs.
•  Software automates control of service levels.
•  No more LUNs!

Virtual SAN Puts the App in Charge
Simpler, automated storage management through an application-centric approach.
•  Today: 1. pre-define storage configurations; 2. pre-allocate static bins; 3. expose the pre-allocated bins; 4. select the appropriate bin; 5. consume from the pre-allocated bin.
✖  Overprovisioning (better safe than sorry!), wasted resources, wasted time, frequent data migrations.
•  With VSAN: 1. define a storage policy; 2. apply the policy at VM creation. Resources and data services are automatically provisioned and maintained.
✓  No overprovisioning, fewer resources, less time, easy to change.

VMware Virtual SAN Use Cases
•  Tier 2 / Tier 3 workloads
•  Test / Dev / Staging
•  Private cloud
•  Backup and DR target (Site A / Site B)
•  VDI / virtual desktop
•  Management clusters
•  DMZ / isolated clusters
•  ROBO

VMware Virtual SAN Hardware Requirements
•  Any server on the VMware Compatibility Guide.
•  At least 1 of each per host:
–  SAS/SATA/PCIe/NVMe SSD
–  SAS/NL-SAS/SATA HDD
•  1Gb/10Gb NIC.
•  SAS/SATA controllers (RAID controllers must work in pass-through or RAID0 mode).
•  4GB to 8GB USB, SD card, or SATADOM boot device.
•  Minimum 3 ESXi 5.5 hosts; maximum hosts: "I'll tell you later……"

Flash-Based Devices in Virtual SAN
ALL read and write operations always go directly to the flash tier. Flash-based devices serve two purposes in Virtual SAN:
1.  Non-volatile write buffer (30%)
–  Writes are acknowledged when they enter the prepare stage on the SSD.
–  Reduces write latency.
2.  Read cache (70%)
–  Cache hits reduce read latency.
–  On a cache miss, the data is retrieved from HDD.
Choice of hardware is the #1 performance differentiator between Virtual SAN configurations.
Flash-Based Devices: VMware SSD Performance Classes
Workload definition: queue depth of 16 or less, 4KB transfer length, 100% random write operations, latency less than 5 ms.
–  Class A: 2,500-5,000 writes per second
–  Class B: 5,000-10,000 writes per second
–  Class C: 10,000-20,000 writes per second
–  Class D: 20,000-30,000 writes per second
–  Class E: 30,000+ writes per second
Examples:
–  Fusion-io 1.2 TB PCIe SSD: ~500,000 writes per second
–  Intel 400GB 910 PCIe SSD: ~38,000 writes per second
–  Toshiba 200GB SAS SSD: ~16,000 writes per second
Endurance:
–  10 Drive Writes per Day (DWPD), and
–  Random write endurance up to 3.5 PB on 8KB transfer size per NAND module, or 2.5 PB on 4KB transfer size per NAND module.

Magnetic Disks (HDD)
•  SAS/NL-SAS/SATA HDDs supported:
–  7,200 RPM for capacity
–  10,000 RPM for performance
–  15,000 RPM for additional performance
•  NL-SAS provides a higher HDD controller queue depth at the same drive rotational speed and a similar price point:
–  NL-SAS is recommended if choosing between SATA and NL-SAS.
•  Differentiate performance between clusters with SSD selection and the SSD:HDD ratio; the rule-of-thumb guideline is 10%.

Storage Controllers
•  SAS/SATA storage controllers:
–  Pass-through or RAID0 mode supported.
•  Performance in RAID0 mode is controller dependent:
–  Check with your vendor for SSD performance behind a RAID controller.
•  Storage controller queue depth matters:
–  A higher storage controller queue depth will increase performance.
•  Validate the number of drives supported by each controller.

Storage Controllers – RAID0 Mode
•  Configure all disks in RAID0 mode:
–  Flash-based devices (SSD)
–  Magnetic disks (HDD)
•  Disable the storage controller cache:
–  Allows better performance, as caching is controlled by Virtual SAN.
•  Disk device cache support:
–  Flash-based devices leverage write-through caching.
–  Magnetic disks leverage write-back caching.
•  ESXi may not be able to differentiate flash-based devices from magnetic devices; use ESXCLI to manually flag such devices as SSD (see the sketch below).
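A minimal ESXCLI sketch of tagging a device as SSD, assuming a placeholder device identifier naa.xxx and the VMW_SATP_LOCAL plugin that claims local devices:

    # Add a claim rule that tags the device as SSD (device ID is a placeholder)
    esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL \
        --device naa.xxx --option "enable_ssd"
    # Reclaim the device so the new rule takes effect
    esxcli storage core claiming reclaim --device naa.xxx
    # Verify: the device should now report "Is SSD: true"
    esxcli storage core device list --device naa.xxx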
Network
•  1Gb / 10Gb supported:
–  10Gb shared with NIOC for QoS will support most environments.
–  With 1Gb, dedicated links for Virtual SAN are recommended.
•  Jumbo frames provide a nominal performance increase:
–  Enable them for greenfield deployments.
•  Virtual SAN supports both VSS and VDS:
–  NIOC requires VDS.
–  Nexus 1000v should work but has not been fully tested.
•  Network bandwidth has more impact on host evacuation and rebuild times than on workload performance.

Firewalls
•  Virtual SAN Vendor Provider (VSANVP):
–  Inbound and outbound, TCP 8080
•  Cluster Monitoring, Membership, and Directory Services (CMMDS):
–  Inbound and outbound, UDP 12345 and 23451
•  Reliable Datagram Transport (RDT):
–  Inbound and outbound, TCP 2233
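As a quick sanity check that these services are in use on a host, active connections can be listed per host; a sketch (the grep patterns are just examples):

    # List active connections and filter for the RDT port
    esxcli network ip connection list | grep 2233
    # CMMDS traffic (UDP)
    esxcli network ip connection list | grep -E '12345|23451'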
VMware Compatibility Guide

Two Ways to Build a Virtual SAN Node
Radically simple hypervisor-converged storage:
1  VSAN Ready Node: a preconfigured server ready to use VSAN, with 10 different options across multiple 3rd-party vendors available at GA.
2  Build your own: choose individual components using the Virtual SAN VMware Compatibility Guide:
–  Any server on the vSphere Hardware Compatibility List.
–  Multi-level cell SSD (or better) or PCIe SSD.
–  SAS/NL-SAS HDDs, select SATA HDDs.
–  6Gb enterprise-grade HBA / RAID controller.

VMware Virtual SAN Technical Characteristics and Architecture

Technical Characteristics
•  Virtual SAN is a cluster-level feature, similar to vSphere DRS and vSphere HA.
•  Deployed, configured and managed from vCenter through the vSphere Web Client (ONLY!).
–  Radically simple:
•  Configure a VMkernel interface for Virtual SAN.
•  Enable Virtual SAN by clicking Turn On.

Virtual SAN Implementation Requirements
•  Virtual SAN requires:
–  A minimum of 3 hosts in a cluster configuration, and all 3 hosts MUST contribute storage.
–  vSphere 5.5 U1 or later.
–  Locally attached disks:
•  Magnetic disks (HDD)
•  Flash-based devices (SSD)
–  Network connectivity:
•  1Gb Ethernet
•  10Gb Ethernet (preferred)

Storage Policy-Based Management
•  SPBM is a storage policy framework built into vSphere that enables policy-driven virtual machine provisioning.
•  Virtual SAN leverages this framework in conjunction with the VASA APIs to expose storage characteristics to vCenter:
–  Storage capabilities: the underlying storage surfaces up to vCenter what it is capable of offering.
–  Virtual machine storage requirements: requirements can only be set against available capabilities.
–  VM Storage Policies: a construct that stores a virtual machine's storage provisioning requirements, based on storage capabilities.

Virtual SAN SPBM Object Provisioning Mechanism
Storage Policy Wizard -> SPBM -> Datastore Profile -> VSAN object manager. VSAN objects may be (1) mirrored across hosts and (2) striped across disks/hosts to meet VM storage profile policies.

Virtual SAN Constructs and Artifacts
New Virtual SAN constructs, artifacts and terminology:
•  Disk groups
•  VSAN datastore
•  Objects
•  Components
•  Virtual SAN network

Virtual SAN Disk Groups
•  Virtual SAN uses the concept of disk groups to pool flash devices and magnetic disks together as single management constructs.
•  Disk groups are composed of at least 1 flash device and 1 magnetic disk:
–  Flash devices are used for performance (read cache + write buffer).
–  Magnetic disks are used for storage capacity.
–  Disk groups cannot be created without a flash device.
•  Each host: 5 disk groups max. Each disk group: 1 SSD + 1 to 7 HDDs.

Virtual SAN Datastore
•  Virtual SAN is an object store solution that is presented to vSphere as a file system.
•  The object store mounts the VMFS volumes from all hosts in a cluster and presents them as a single shared datastore (vsanDatastore).
–  Only members of the cluster can access the Virtual SAN datastore.
–  Not all hosts need to contribute storage, but it is recommended.

Virtual SAN Objects
•  Virtual SAN manages data in the form of flexible data containers called objects; virtual machine files are referred to as objects.
–  There are four different types of virtual machine objects:
•  VM Home
•  VM swap
•  VMDK
•  Snapshots
•  Virtual machine objects are split into multiple components based on the performance and availability requirements defined in the VM Storage Policy.

Virtual SAN Components
•  Virtual SAN components are chunks of objects distributed across multiple hosts in a cluster in order to tolerate simultaneous failures and meet performance requirements.
•  Virtual SAN utilizes a distributed RAID architecture to distribute data across the cluster.
•  Components are distributed with the use of two main techniques:
–  Striping (RAID0)
–  Mirroring (RAID1)
•  The number of component replicas and copies created is based on the object policy definition.

Object and Component Layout
•  The VM Home directory object is formatted with VMFS to allow a VM's configuration files (.vmx, .log, etc.) to be stored on it.
•  Availability is defined as a number of copies (RAID1); performance may include a stripe width (RAID0).
•  Low-level storage objects reside on different hosts.

Virtual SAN Network
•  New Virtual SAN traffic VMkernel interface:
–  Dedicated to Virtual SAN intra-cluster communication and data replication.
•  Supports both Standard and Distributed vSwitches:
–  Leverage NIOC for QoS in shared scenarios; example shares on a Distributed Switch: Management 100, Virtual Machines 150, vMotion 250, Virtual SAN 500.
•  NIC teaming is used for availability, not for bandwidth aggregation.
•  Layer 2 multicast must be enabled on the physical switches; it is much easier to manage and implement than Layer 3 multicast.
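A minimal command-line sketch of creating and tagging the Virtual SAN VMkernel interface, assuming a hypothetical standard-vSwitch port group named VSAN and placeholder addressing (the Web Client network template covered later is the walkthrough path):

    # Create a VMkernel interface on an existing port group (names are examples)
    esxcli network ip interface add --interface-name=vmk2 --portgroup-name=VSAN
    esxcli network ip interface ipv4 set --interface-name=vmk2 \
        --type=static --ipv4=172.16.10.11 --netmask=255.255.255.0
    # Tag the interface for Virtual SAN traffic and verify
    esxcli vsan network ipv4 add --interface-name=vmk2
    esxcli vsan network list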
Virtual SAN Network – NIC Teaming
•  NIC teaming and load-balancing algorithms:
–  Route based on originating port ID: active/passive with explicit failover.
–  Route based on IP hash: active/active with an LACP port channel (requires multi-chassis link aggregation capable switches).
–  Route based on physical NIC load: active/active with an LACP port channel.

Virtual SAN Scalable Architecture
•  Scale-up and scale-out architecture: granular and linear scaling of storage, performance and compute:
–  Per magnetic disk: for capacity.
–  Per flash-based device: for performance.
–  Per disk group: for performance and capacity.
–  Per node: for compute capacity.

VMware Virtual SAN Configuration Walkthrough

Configuring VMware Virtual SAN
•  Radically simple configuration procedure:
1. Set up the Virtual SAN network.
2. Enable Virtual SAN on the cluster.
3. Select Manual or Automatic mode.
4. If Manual, create disk groups.

Configure Network
•  Configure the new dedicated Virtual SAN network:
–  Use the vSphere Web Client network template configuration feature.

Enable Virtual SAN
•  One click away!!!
–  With Virtual SAN configured in Automatic mode, all empty local disks are claimed by Virtual SAN for the creation of the distributed vsanDatastore.
–  With Virtual SAN configured in Manual mode, the administrator must manually select disks to add to the distributed vsanDatastore by creating disk groups.

Disk Management
•  Each host in the cluster creates one or multiple disk groups, each containing a combination of HDDs and SSDs (see the sketch below).
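For the manual mode described above, a disk group can also be built per host from the command line; a hedged sketch with placeholder device IDs (assumes disk claiming is not set to automatic):

    # Create a disk group from one SSD and one or more HDDs (IDs are placeholders)
    esxcli vsan storage add --ssd naa.aaa --disks naa.bbb
    # List the disks Virtual SAN has claimed
    esxcli vsan storage list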
Virtual SAN Datastore
•  A single Virtual SAN datastore is created and mounted, using storage from all hosts and disk groups in the cluster.
•  The Virtual SAN datastore is automatically presented to all hosts in the cluster.
•  The Virtual SAN datastore enforces thin-provisioned storage allocation by default.

VM Storage Policies
•  VM Storage Policies are accessible from the vSphere Web Client Home screen.

Virtual SAN Capabilities
•  Virtual SAN currently surfaces five unique storage capabilities to vCenter.

Number of Failures to Tolerate
–  Defines the number of host, disk or network failures a storage object can tolerate. For "n" failures tolerated, "n+1" copies of the object are created, and "2n+1" hosts contributing storage are required.
–  Example policy "Number of failures to tolerate = 1": two replicas of the vmdk on different hosts (each serving ~50% of I/O) plus a witness on a third host, joined as a RAID1 set over the VSAN network.

Number of Disk Stripes Per Object
–  The number of HDDs across which each replica of a storage object is distributed. Higher values may result in better performance.
–  Example policy "Number of failures to tolerate = 1" + "Stripe width = 2": each RAID1 replica is itself a RAID0 set of two stripes placed across hosts, plus a witness.

Virtual SAN Storage Capabilities
•  Force provisioning:
–  If yes, the object is provisioned even if the policy specified in the storage policy is not satisfiable with the resources currently available.
•  Flash read cache reservation (%):
–  Flash capacity reserved as read cache for the storage object, specified as a percentage of the logical size of the object.
•  Object space reservation (%):
–  Percentage of the logical size of the storage object that will be reserved (thick provisioned) upon VM provisioning. The rest of the storage object is thin provisioned.

Virtual SAN I/O Flow – Write Acknowledgement
•  VSAN mirrors write I/Os to all active mirrors; writes are acknowledged when they hit the flash buffer!
•  Destaging to HDD is done independently between hosts.

Virtual SAN I/O Flow – 1MB Increment Striping
•  VSAN is thin provisioned by default; stripes grow in increments of 1MB, with successive 1MB segments alternating across the stripes of a RAID0 set.

Components and Objects Visualization
•  Visualization of the mapping and layout of all objects and components:
–  vSphere Web Client
–  RVC (example below)
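The same layout information is available from RVC; a sketch, assuming an RVC session against vCenter and example inventory paths:

    # In an RVC session (paths are examples)
    > vsan.vm_object_info ~/computers/VSAN-Cluster/vms/vm-01
    # Lists each object (VM Home, vmdk, ...) with its RAID tree,
    # component states, and the hosts/disks where components reside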
Storage Capabilities Recommended Practices
Storage Capability                          Use Case            Value
Number of failures to tolerate (RAID1)      Redundancy          Default 1, Max 3
Number of disk stripes per object (RAID0)   Performance         Default 1, Max 12
Object space reservation                    Thick provisioning  Default 0, Max 100%
Flash read cache reservation                Performance         Default 0, Max 100%
Force provisioning                          Override policy     Disabled

VM Storage Policies Recommendations
•  Number of disk stripes per object: leave at 1, unless the IOPS requirements of the VM are not being met by the flash layer.
•  Flash read cache reservation: leave at 0, unless there is a specific performance requirement to be met by a VM.
•  Proportional capacity (object space reservation): leave at 0, unless thick provisioning of virtual machines is required.
•  Force provisioning: leave disabled, unless the VM needs to be provisioned even if not in compliance.

VMware Virtual SAN Virtual Machine Provisioning Operations

Virtual Machine Provisioning Operations
•  All VM provisioning operations include access to VM Storage Policies.
•  If the Virtual SAN datastore understands the capabilities in the VM Storage Policy, it is displayed as a matching resource.
–  If the VSAN datastore can satisfy the VM Storage Policy, the VM Summary tab displays the VM as compliant.
–  If not, due to failures or the force provisioning capability, the VM is shown as non-compliant.

Virtual Machine Policy Management
•  Modify VM performance, capacity, and availability requirements without downtime.

VMware Virtual SAN Resiliency & Failure Scenarios

Understanding Failure Events
§  Virtual SAN recognizes two different types of hardware device events that define the type of failure scenario:
–  Absent
–  Degraded
§  Absent events trigger the delayed, 60-minute recovery operations:
–  Virtual SAN waits 60 minutes before starting object and component recovery operations.
–  60 minutes is the default setting for all absent events.
–  The value is configurable via host advanced settings (see the sketch below).
§  Degraded events trigger the immediate recovery operations:
–  Immediate recovery of objects and components.
–  Not configurable.
§  The following detected I/O errors are always deemed degraded:
–  Magnetic disk failures
–  Flash-based device failures
–  Storage controller failures
§  The following failures are always deemed absent:
–  Network failures
–  Network interface card (NIC) failures
–  Host failures
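The delay is exposed as the VSAN.ClomRepairDelay host advanced setting; a hedged sketch of inspecting and lowering it (apply on every host in the cluster, and verify the option name against your build):

    # Check the current repair delay (minutes)
    esxcli system settings advanced list -o /VSAN/ClomRepairDelay
    # Lower it to 30 minutes, then restart the clomd service
    esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 30
    /etc/init.d/clomd restart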
Failure Handling Philosophy
§  Traditional SANs:
–  Physical drives that fail need to be replaced to get back to full redundancy.
–  Hot-spare disks are set aside to take the role of failed disks immediately.
–  In both cases: 1:1 replacement of the disk.
§  Virtual SAN:
–  The entire cluster is a "hot spare"; we always want to get back to full redundancy.
–  When a disk fails, many small components (stripes or mirrors of objects) fail.
–  New copies of these components can be spread around the cluster for balancing.
–  Replacement of the physical disk just adds back resources.

Managing Failure Scenarios
§  Through policies, VMs on Virtual SAN can tolerate multiple failures:
–  Disk failure: degraded event, rebuild starts immediately.
–  SSD failure: degraded event, rebuild starts immediately.
–  Controller failure: degraded event, rebuild starts immediately.
–  Network failure: absent event, rebuild starts after 60 minutes.
–  Server failure: absent event, rebuild starts after 60 minutes.
§  VMs continue to run.
§  Parallel rebuilds minimize the performance pain.

Virtual SAN Access Rules
§  Component access rules (the logic is implemented per object, e.g. for a power-on operation):
•  At least 1 mirror copy must be intact.
•  All stripes must be intact.
•  Greater than 50% of components must be available, including witnesses.

Magnetic Disk Failure – Instant Mirror Copy
•  Degraded: all impacted components on the failed disk are instantly recreated onto other disks, disk groups, or hosts.

Flash-Based Device Failure – Instant Mirror Copy
•  Degraded: all impacted components in the failed device's disk group are instantly recreated onto other disks, disk groups, or hosts.
•  Greater impact on the cluster's overall storage capacity.

Host Failure – 60-Minute Delay
•  Absent: Virtual SAN waits the default setting of 60 minutes before starting the copy of objects and components onto other disks, disk groups, or hosts.
•  Greater impact on the cluster's overall compute and storage capacity.

Network Failure – 60-Minute Delay
•  Absent: Virtual SAN waits the default setting of 60 minutes before starting the copy of objects and components onto other disks, disk groups, or hosts.
•  NIC failures and physical network failures can lead to network partitions:
–  Multiple hosts could be impacted in the cluster.

Virtual SAN: 1 Host Isolated – HA Restart
•  vSphere HA restarts the VM on one of the remaining hosts.

Virtual SAN: 2 Hosts Isolated – HA Restart
•  vSphere HA restarts the VM on the side that owns > 50% of the components (e.g. esxi-02 / esxi-03).

Virtual SAN Partition – HA Restart
•  vSphere HA restarts the VM in the partition that owns > 50% of the components.

Maintenance Mode – Planned Downtime
§  3 maintenance mode options:
–  Ensure accessibility
–  Full data migration
–  No data migration
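When scripting planned downtime, the Virtual SAN evacuation mode can be passed to esxcli; a hedged sketch (the --vsanmode option and its values are an assumption to verify against your build):

    # Enter maintenance mode, keeping VSAN objects accessible (no full evacuation)
    esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
    # Leave maintenance mode
    esxcli system maintenanceMode set --enable false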
VMware Virtual SAN Interoperability: Technologies and Products

Technology Interoperability
•  Virtual SAN is fully integrated with many of VMware's storage and vSphere availability enterprise features:
–  Supported: virtual machine snapshots, vSphere HA, vSphere DRS, vMotion.
–  Not applicable: Storage I/O Control (SIOC), Storage DRS, Distributed Power Management (DPM).
–  Future: 62TB VMDKs, vCOps.

Horizon View
•  Virtual SAN and Horizon View:
–  Handle peak performance events such as boot, login, and read/write storms.
–  Seamless granular scaling without huge upfront investments.
–  Support high VDI density.
–  Support high-end virtual desktop GPU requirements.
•  Virtual SAN is compatible with Horizon View 5.3 (SPBM manually implemented):
–  Policies are maintained across operations such as refresh/recompose; there is no need to re-associate them.
–  Full clone policies: FTT = 1 for persistent desktops, FTT = 0 for non-persistent; provisioning 100% reserved.
–  Linked clone policies: OS disk FTT = 1 for dedicated pools, FTT = 0 for floating pools; replica disk FTT = 1 with 10% read cache reservation; thin provisioning.

vSphere Replication and Site Recovery Manager
•  Virtual SAN is compatible with:
–  vSphere Replication 5.5 (vSphere Web Client), with SPBM configured as part of replication.
–  vCenter Site Recovery Manager 5.5 (vSphere C# client), with the SRM configuration based on VR replication between the production and recovery sites.
•  vSphere Replication & vCenter Site Recovery Manager provide:
–  Asynchronous replication with a 15-minute RPO.
–  VM-centric protection.
–  Automated DR operation and orchestration.
–  Automated failover: execution of user-defined plans.
–  Automated failback: reverse of the original recovery plan.
–  Planned migration: ensures zero data loss.
–  Point-in-time recovery: multiple recovery points.
–  Non-disruptive testing: automated tests on an isolated network.

vSphere Data Protection
•  Virtual SAN and vSphere Data Protection:
–  Radically simple to deploy and manage.
–  Integrated user interface in the vSphere Web Client.
–  Highly available storage solution.
–  Increased operational efficiency.
•  vSphere Data Protection Advanced 5.5:
–  Source and target de-duplication capabilities.
–  Bidirectional replication: secure, easy, reliable, network-efficient.
–  Application-consistent backup and recovery capabilities.
–  RTO of up to 24 hours; RPO of minutes to hours.
–  Incorporated technologies:
•  vStorage APIs for Data Protection
•  Changed Block Tracking (CBT)
•  Avamar variable-length segment algorithm

vCloud Automation Center
•  vCloud Automation Center provides Virtual SAN with:
–  Centralized provisioning, governance and infrastructure management capabilities.
–  Simple, self-service consumption capabilities.
–  Entitlement compliance monitoring and enforcement.
–  The ability to leverage existing business processes and tools.
–  Delegated control of resources.
•  Custom use of VM Storage Policies:
–  Virtual SAN default policy.
–  Blueprints (VM templates).
–  Via vCenter Orchestrator, with a custom workflow.
–  Via vCloud Automation Center Designer, by modifying the provisioning workflow.

OpenStack
•  Virtual SAN interoperates with the OpenStack framework (Horizon dashboard, Keystone identity service, Neutron networking via NSX, Nova compute, Cinder volume service, Glance image store, Swift object store):
–  vSphere driver and vSphere datastore driver.
–  Leverages flash-optimized storage in OpenStack.
–  vSphere Web Plug-in for the OpenStack UI.
–  Cloud-ready app on a hypervisor-converged solution.
–  Resiliency for legacy and cloud-ready applications.

VMware Virtual SAN Design & Sizing Guidelines Exercise

Virtual SAN Datastore
§  The distributed datastore capacity is determined by aggregating the disk groups found across all hosts that are members of the vSphere cluster, and by the size of the magnetic disks.
§  Only the usable capacity of the magnetic disks counts towards the total capacity of the Virtual SAN datastore.
§  The capacity of the flash-based devices is dedicated to Virtual SAN's caching layer.

Objects
§  An object is an individual storage block device that is compatible with SCSI semantics.
§  Each object that resides on the Virtual SAN datastore is comprised of multiple components.
§  Objects are assigned storage performance and availability service requirements through VM Storage Policies.

Object Types   Definitions
VM Home        Location where all virtual machine configuration files reside (.vmx, log files, etc.)
Swap           Storage object created only when the virtual machine is powered on
VMDK           Virtual machine disk file
Snapshots      Storage objects created for virtual machine snapshots

Components
§  Objects are comprised of components that are distributed across hosts in the vSphere cluster.
§  Virtual SAN 5.5 currently supports a maximum of 3,000 components per host (see the RVC sketch below).
§  Objects greater than 255 GB in capacity are automatically divided into multiple components.
§  Each component consumes 2 MB of disk capacity for metadata.

Witness
§  Witness components are part of every storage object.
§  They contain only object metadata.
§  They serve as tiebreakers when availability decisions are made in the Virtual SAN cluster, in order to avoid split-brain behavior.
§  Each Virtual SAN witness component also consumes 2 MB of capacity.

Virtual SAN Datastore Sizing Considerations
§  It is important to understand the impact of the availability and performance storage capabilities on the consumption of storage capacity:
•  Number of failures to tolerate
•  Number of disk stripes per object
•  Flash read cache reservation
•  Object space reservation
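Component counts against the per-host limit can be watched from RVC; a sketch, assuming an RVC session and an example cluster path:

    # In an RVC session (cluster path is an example)
    > vsan.check_limits ~/computers/VSAN-Cluster
    # Reports per-host component counts and disk utilization against their limits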
Disk Groups
§  A disk group is a single flash-based device (SAS/SATA/PCIe SSD) plus one or more magnetic disks (SAS/SATA HDD).
§  Disk groups make up the distributed flash tier and the storage capacity of the Virtual SAN datastore.
§  Disk groups are formatted with a modified on-disk file system (VMFS-L) and are then mounted into the object store file system as a single datastore.
§  VMFS-L on-disk formatting consumes 750 MB of capacity per disk.

Artifacts                           Minimums                Maximums
Disk groups                         1 per host              5 per host
Flash devices (SAS/SATA/PCIe SSD)   1 per disk group        1 per disk group
Magnetic disk devices               1 HDD per disk group    7 HDDs per disk group
Disk formatting overhead            750 MB per HDD          750 MB per HDD

Number of Failures to Tolerate
§  Has the largest impact on the consumption of storage capacity in Virtual SAN.
§  Based on the availability requirements of a virtual machine, the setting defined in a VM Storage Policy can lead to the consumption of up to four times the capacity of the virtual machine or its individual disks.
§  Example: FTT = 1 means 2 full copies of the data + 1 witness.

Number of Disk Stripes Per Object
§  If the number of disk stripes per object is increased beyond the default value of 1, each stripe counts as a separate component.
§  This has an impact on the total number of components supported per host.

Disk Group Design
§  One flash device per disk group.
§  With multiple flash-based devices, multiple disk groups will be created to leverage the additional flash.
§  The higher the ratio of flash capacity to magnetic disk capacity, the greater the size of the cache layer.
§  Disk groups define, and reduce, the storage failure domains.

Flash Capacity Sizing
§  The general recommendation for sizing Virtual SAN's flash capacity is to have 10% of the anticipated consumed storage capacity, before the number of failures to tolerate is considered.

Measurement Requirements                 Values
Projected VM space usage                 20 GB
Projected number of VMs                  1,000
Total projected space consumption        20 GB x 1000 = 20,000 GB = 20 TB
Target flash capacity percentage         10%
Total flash capacity required            20 TB x 0.10 = 2 TB

§  The total flash capacity percentage should be based on the use case and its capacity and performance requirements:
–  10% is a general recommendation; it could be too much, or it may not be enough.
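The same rule of thumb as a quick shell calculation, using the values from the table above:

    # 10% of anticipated consumed capacity, before FTT (GB)
    vm_gb=20; vms=1000; pct=10
    echo $(( vm_gb * vms * pct / 100 ))   # 2000 GB = 2 TB of flash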
Sizing Exercise Formulas
Constraints:
•  VSAN component and VMFS metadata overhead (VSANmetaDataOverhead): ~1 GB per disk.
Variables:
•  Number of hosts per cluster (Hst) = 8
•  Number of disk groups per host (DskGrp) = 5
•  Number of disks per disk group (DskPerDskGrp) = 7
•  Size of disks (SzHDD) = 4,000 GB
•  Number of failures to tolerate (ftt) = 1
•  Number of virtual machines (VMs) = 800
•  Number of disks per virtual machine (NumOfVMDK) = 1
•  Memory per virtual machine (vmSwp) = 10 GB

Cluster Raw Capacity
•  Formula: Hst x DskGrp x DskPerDskGrp x SzHDD
•  Example: 8 x 5 x 7 x 4,000 GB = 1,120,000 GB = 1,120 TB

VMFS Metadata
•  Formula: VMFSMetadata x DskGrp x DskPerDskGrp
•  Example: 750 MB x 5 x 7 = 26,250 MB = 26.2 GB VMFS metadata

Objects
•  Formula: VMs x [VMnamespace + vmSwap + NumOfVMDK]
•  Example: 800 x [1 + 1 + 1] = 2,400 objects
•  Note: snapshots, clones and stripe widths greater than 1 would add more objects.

Components
•  Formula: Objects x [ftt x 2 + 1]
•  Example: 2,400 x (1 x 2 + 1) = 7,200 components = 900 components per host on average (the maximum is 3,000 per host).

Component Metadata
•  Formula: NumComponents x compMetadata
•  Example: 7,200 components x 2 MB = 14.4 GB component metadata

VSAN Metadata
•  Formula: compMetadata + VMFSMetadata
•  Example: 14.4 GB + 26.2 GB = 40.6 GB VSAN metadata

Swap Utilization and Available Capacity
•  Formula: VMs x vmSwp x 2 (swap objects are mirrored with ftt = 1)
•  Example: swap space = 800 x 10 GB x 2 = 16,000 GB
•  Available capacity = raw capacity – swap capacity = 1,120,000 GB – 16,000 GB = 1,104,000 GB = 1,104 TB

Usable Capacity
•  Formula: (DiskCapacity – VSANMetadata) / (ftt + 1)
•  Example: (1,104,000 GB – 41 GB) / 2 ≈ 551,980 GB usable capacity
•  Best practice is to allocate no more than 80% to virtual disks.
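The whole exercise as a minimal POSIX shell sketch, using the variables above (integer GB arithmetic; VSAN metadata rounded to 41 GB as in the example):

    # Sizing exercise as shell arithmetic (GB)
    hst=8; dskgrp=5; dskper=7; szhdd=4000
    ftt=1; vms=800; vmdk=1; vmswp=10
    raw=$(( hst * dskgrp * dskper * szhdd ))        # 1,120,000 GB raw
    objects=$(( vms * (1 + 1 + vmdk) ))             # namespace + swap + vmdk
    components=$(( objects * (ftt * 2 + 1) ))       # 7,200 components
    swap=$(( vms * vmswp * (ftt + 1) ))             # 16,000 GB of mirrored swap
    meta=41                                         # VSAN metadata, rounded up
    usable=$(( (raw - swap - meta) / (ftt + 1) ))   # ~551,979 GB usable
    echo "objects=$objects components=$components usable=${usable}GB"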
Memory and CPU
§  Memory requirements for Virtual SAN are defined by the number of disk groups and disks managed by the hypervisor.
§  As long as vSphere hosts have more than 32 GB of RAM, they will be able to support the maximum disk group and disk configuration supported by Virtual SAN.
§  Virtual SAN is designed to introduce no more than 10% CPU overhead per host. Consider this in Virtual SAN implementations with high consolidation ratios and CPU-intensive application requirements.

Network
§  Virtual SAN network activity can potentially saturate and overwhelm an entire 1GbE network, particularly during rebuild and synchronization operations.
§  Separate the different traffic types (management, vMotion, virtual machine, Virtual SAN) onto different VLANs, and use shares as a quality-of-service mechanism to sustain the expected level of performance during possible contention scenarios.
§  Virtual SAN requires IP multicast to be enabled on the Layer 2 physical network segment used for Virtual SAN communication.

VMware Virtual SAN Monitoring & Troubleshooting

Network Status Reports
•  "Misconfiguration detected":
–  Verify the physical network.
–  Enable multicast:
•  Disable IGMP snooping, or
•  Configure IGMP snooping for selective traffic.
–  Validate the virtual switch configuration:
•  VLAN
•  Virtual SAN traffic service enabled
•  NIC team failover policy

Failover Policy
•  NIC teaming failover and load balancing:
–  Policy with route based on originating port ID.
–  Active / standby uplinks.

Command Line Tools
•  VMKPING, to validate network accessibility, e.g.:
    vmkping 10.4.90.27
•  ESXCLI:
    esxcli vsan network list

Disk Claiming Operation
•  The automatic disk claiming operation fails to claim disks:
–  "Is local: true" disks are claimed automatically.
–  "Is local: false" disks are shared and thus not claimed automatically, but they can be manually marked as local.

Ruby vSphere Console
•  RVC VSAN disk information:
    vsan.disks_info
–  Shows size, disk type, manufacturer, model, and local/non-local status.

Disk Group Creation Fails
•  Disk group creation fails:
–  A VSAN license needs to be added to the cluster:
•  Home > Licenses > Clusters tab > select the cluster object > Assign License Key.
–  vSphere Web Client refresh time-out:
•  Log out and back in.
•  Unable to delete a disk group:
–  If the VSAN disk claiming operation is set to automatic, change it to manual.
–  vsan.host_wipe_vsan_disks --force wipes the disks used by VSAN.

Observing Performance
§  Monitor performance with the Ruby vSphere Console & VSAN Observer:
–  In-depth monitoring of VSAN's physical disk layer performance, cache hit rates, latencies, etc.

VSAN Observer
•  Start the VSAN Observer from RVC to collect performance stats.
•  Monitor flash devices: read cache hit rate, and evictions from flash to magnetic disks.
•  Monitor disk group aggregates and disk layers.

Virtual SAN Logs
•  Virtual SAN related logs are maintained individually per host.

Ruby vSphere Console
•  Disk capacity: used and reserved capacity.
•  Monitoring VSAN component limits.
•  Virtual SAN what-if failure analysis:
–  Simulate the impact of a host failure on the cluster.
§  VSAN Observer recommendations:
–  Deploy a VCVA appliance to use for the Observer.
–  Run the Observer session on the newly deployed or remote VCVA appliance.
–  Increase the data-gathering time beyond the default (2 hours) if necessary.

Oh yeah! Scalability….. 915K IOPS and 2.2 petabytes in one vsanDatastore.

THANK YOU!!!

Graphics by Duncan Epping & Rawlinson