Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
HP Part Number: B7660-90025 Published: September 2008
Legal Notices © Copyright 2008 Hewlett-Packard Development Company, L.P. Publication Date: 2008 Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Intel® and Itanium® are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Oracle® is a registered trademark of Oracle Corporation. UNIX® is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.
Table of Contents
Printing History ...........................................................................................................................19 Preface.......................................................................................................................................21
Guide to Disaster Tolerant Solutions Documentation........................................................22 1 Designing a Metropolitan Cluster...............................................................................................25
Designing a Disaster Tolerant Architecture for use with Metrocluster Products..............25 Single Data Center.........................................................................................................26 Two Data Centers and Third Location with Arbitrator(s)............................................26 Arbitrator Node Configuration Rules......................................................................29 Disk Array Data Replication Configuration Rules..................................................29 Calculating a Cluster Quorum.................................................................................30 Example Failover Scenarios with One Arbitrator....................................................30 Example Failover Scenarios with Two Arbitrators..................................................31 Worksheets..........................................................................................................................33 Disaster Tolerant Checklist............................................................................................33 Cluster Configuration Worksheet.................................................................................33 Package Configuration Worksheet................................................................................34 Next Steps...........................................................................................................................36 2 Designing a Continental Cluster.................................................................................................37
Understanding Continental Cluster Concepts...................................................................37 Mutual Recovery Configuration...................................................................................38 Application Recovery in a Continental Cluster............................................................39 Monitoring over a Network..........................................................................................40 Cluster Events................................................................................................................41 Interpreting the Significance of Cluster Events............................................................42 How Notifications Work...............................................................................................42 Alerts.............................................................................................................................43 Alarms...........................................................................................................................43 Creating Notifications for Failure Events.....................................................................44 Creating Notifications for Events that Indicate a Return of Service.............................44 Maintenance Mode for Recovery Groups.....................................................................44 Moving a Recovery Group into Maintenance Mode...............................................45 Moving a Recovery Group out of the Maintenance Mode......................................46 Performing Cluster Recovery........................................................................................46 Performing Recovery Group Rehearsal in Continentalclusters....................................47 Notes on Packages in a Continental Cluster.................................................................49 Startup and Switching 
Characteristics.....................................................49
Network Attributes..................................................................................................50 How Serviceguard commands work in a Continentalclusters.....................................50 Designing a Disaster Tolerant Architecture for use with Continentalclusters...................51 Mutual Recovery...........................................................................................................52 Serviceguard Clusters...................................................................................................52 Data Replication............................................................................................................52 Physical Data Replication using Special Environment files....................................54 Multiple Recovery Pairs in a Continental Cluster...................................................55 Highly Available Wide Area Networking.....................................................................56 Data Center Processes ..................................................................................................57 Continentalclusters Worksheets....................................................................................57 Data Center Worksheet ...........................................................................................57 Recovery Group Worksheet ....................................................................................58 Cluster Event Worksheet .........................................................................................58 Preparing the Clusters........................................................................................................59 Setting up and Testing Data Replication.......................................................................59 Configuring a Cluster without Recovery Packages......................................................60 Configuring a Cluster with Recovery 
Packages............................................................62 Configuring Recovery Groups with Rehearsal Packages.............................................64 Building the Continentalclusters Configuration................................................................65 Preparing Security Files................................................................................................65 Network Security Configuration Requirements......................................................67 Creating the Monitor Package.......................................................................................67 Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters........................................................................................................69 Configuring Shared Disk for the Maintenance Feature...........................................69 Configuring a Monitor Package for the Maintenance Feature................................70 Editing the Continentalclusters Configuration File......................................................71 Editing Section 1—Cluster Information...................................................................71 Editing Section 2 – Recovery Groups.......................................................................74 Editing Section 3—Monitoring Definitions.............................................................79 Selecting Notification Intervals................................................................................84 Checking and Applying the Continentalclusters Configuration..................................87 Starting the Continentalclusters Monitor Package........................................................88 Validating the Configuration........................................................................................89 Documenting the Recovery Procedure.........................................................................90 Reviewing the Recovery 
Procedure..............................................................91 Testing the Continental Cluster..........................................................................................91 Testing Individual Packages..........................................................................................92 Testing Continentalclusters Operations........................................................................92 Switching to the Recovery Packages in Case of Disaster...................................................93 Receiving Notification...................................................................................................94 Verifying that Recovery is Needed................................................................................94 Using the Recovery Command to Switch All Packages................................................95
To Start the Failover Process.........................................................................................95 How the cmrecovercl Command Works..................................................................98 Forcing a Package to Start...................................................................................................98 Restoring Disaster Tolerance..............................................................................................99 Restore Clusters to their Original Roles........................................................................99 Primary Packages Remaining on the Surviving Cluster.............................................100 Primary Packages Remaining on the Surviving Cluster using cmswitchconcl......101 Newly Created Cluster Will Run Primary Packages...................................................104 Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups.........................................................................................................................105 Performing a Rehearsal Operation in your Environment................................................106 Maintaining a Continental Cluster...................................................................................107 Adding a Node to a Cluster or Removing a Node from a Cluster .............................108 Adding a Package to the Continental Cluster.............................................................108 Removing a Rehearsal Package from a Recovery Group............................................109 Modifying a Recovery Group with a new Rehearsal Package....................................109 Removing a Package from the Continental Cluster....................................................109 Changing Monitoring Definitions...............................................................................110 Checking the Status of Clusters, Nodes, and 
Packages...............................................110 Reviewing Messages and Log Files.............................................................................113 Deleting a Continental Cluster Configuration............................................................114 Renaming a Continental Cluster.................................................................................115 Checking Java File Versions.........................................................................................115 Next Steps....................................................................................................................115 Support for Oracle RAC Instances in a Continentalclusters Environment......................115 Configuring the Environment for Continentalclusters to Support Oracle RAC........117 Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration..............................................................................................................126 Initial Startup of Oracle RAC Instance in a Continentalclusters Environment..........126 Failover of Oracle RAC Instances to the Recovery Site...............................................127 Failback of Oracle RAC Instances After a Failover.....................................................130 Rehearsing Oracle RAC Databases in Continentalclusters.........................................131 3 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP....133
Files for Integrating XP Disk Arrays with Serviceguard Clusters....................................133 Overview of Continuous Access XP Concepts.................................................134 PVOLs and SVOLs.......................................................................................134 Device Groups and Fence Levels................................................................135 Fence Level of NEVER.............................................................................................135 Fence Level of DATA...............................................................................136 Fence Level of ASYNC............................................................................................136 Continuous Access Link Timeout..........................................................................138 Consistency Group.................................................................138
Limitations of Asynchronous Mode......................................................................138 Other Considerations on Asynchronous Mode.....................................................139 Continuous Access Journal Overview.........................................................................139 Journal Volume.......................................................................................................140 Pull-Based Replication...........................................................................................141 Mitigation of Network Problems...........................................................................141 Fence Level.............................................................................................................142 Journal Group.........................................................................................................142 Journal Cache, Journal Volumes, and Inflow Control...........................................142 Continuous Access Journal Pair State....................................................................143 Limitations of XP12000 Continuous Access Journal..............................................143 One-to-One Volume Copy Operations...................................................................143 One-to-One Journal Group Operations.................................................................144 Journal Group Requirement...................................................................................144 Configuring XP12000 Continuous Access Journal.................................................144 Registering Journal Volumes..................................................................................144 Data Replication Connections................................................................................145 Metrocluster package vs. 
Journal Group...............................................................145 Creating the Cluster..........................................................................................................145 Preparing the Cluster for Data Replication......................................................................146 Creating the RAID Manager Configuration................................................................146 Pair Creation of Journal Groups.............................................................................150 Creating Continuous Access Journal Pair..............................................................150 Sample Raid Manager Configuration File.............................................................150 Notes on the Raid Manager Configuration............................................................153 Configuring Automatic Raid Manager Startup.....................................................153 Defining Storage Units................................................................................................154 Creating and Exporting LVM Volume Groups using Continuous Access XP.......154 Creating VxVM Disk Groups using Continuous Access XP.................................155 Validating VxVM Disk Groups using Metrocluster/Continuous Access Data Replication..............................................................................................................156 Configuring Packages for Disaster Recovery...................................................................157 Completing and Running a Metrocluster Solution with Continuous Access XP............160 Maintaining a Cluster that uses Metrocluster with Continuous Access XP...............160 Viewing the Progress of Copy Operations.............................................................161 Viewing Side File Size............................................................................................161 Viewing the Continuous Access Journal 
Status.....................................................161 Viewing the Pair and Journal Group Information - Raid Manager using the “pairdisplay” Command..................................................................................161 Viewing the Journal Volumes Information - Raid Manager using the “raidvchkscan” Command................................................................................162 Normal Maintenance..............................................................163 Resynchronizing.....................................................164 Using the pairresync Command............................................................164
Failback..................................................................................................................165 Timing Considerations...........................................................................................165 Data maintenance with the failure of a Metrocluster Continuous Access XP Failover...................................................................................................................166 Swap Takeover Failure (Asynchronous/Journal mode).........................................166 Takeover Timeout (for Continuous Access Journal mode)....................................166 PVOL-PAIR with SVOL-PSUS(SSWS) State (for Continuous Access Journal Mode).....................................................................................................................167 XP Continuous Access Device Group Monitor...........................................................167 XP/Continuous Access Device Group Monitor Operation Overview...................167 Configuring the Monitor........................................................................................168 Configure the Monitor’s Variables in the Package Environment File....................168 Configure XP/Continuous Access Device Group Monitor as a Service of the Package...................................................................................................................170 Configuring the XP/Continuous Access Device Group Monitor as a Service in the Site Controller Package....................................................................................170 Troubleshooting the XP/Continuous Access Device Group Monitor....................171 Completing and Running a Continental Cluster Solution with Continuous Access XP..171 Setting up a Primary Package on the Primary Cluster................................................171 Setting up a Recovery Package on the Recovery Cluster............................................174 Setting up the 
Continental Cluster Configuration......................................................178 Switching to the Recovery Cluster in Case of Disaster...............................................179 Failback Scenarios........................................................................................................179 Scenario 1...............................................................................................................179 Scenario 2...............................................................................................................179 Failback in Scenarios 1 and 2.................................................................................180 Failback when the Primary has SMPL Status........................................................181 Maintaining the Continuous Access XP Data Replication Environment....................182 Resynchronizing.....................................................................................................182 Using the pairresync Command.......................................................................183 Some Further Points ..............................................................................................183 4 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA...185
Files for Integrating the EVA with Serviceguard Clusters...............................................185 Overview of EVA and Continuous Access EVA Concepts...............................................186 Metrocluster with EVA and Data Replication.............................................................187 DR Groups..............................................................187 DR Group Properties........................................................188 Log Disk.................................................................188 Copy Sets.....................................................................189 Managed Sets...............................................................189 Failover........................................................................189 Continuous Access EVA Management Software.........................................................190
Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA ................190 Setting up the Storage Hardware................................................................................190 Cluster Configuration..................................................................................................192 Management Server/SMI-S and DR Groups Configuration.......................................192 Defining Management Server and SMI-S Information ..............................................193 Creating the Management Server List...................................................................193 Creating the Management Server Mapping File....................................................195 Setting a Default Management Server...................................................................195 Displaying the List of Management Servers..........................................................195 Adding or Updating Management Server Information.........................................196 Deleting a Management Server..............................................................................197 Defining EVA Storage Cells and DR Groups..............................................................197 Creating the Storage Map File................................................................................199 Copying the Storage Map File................................................................................199 Displaying Information about Storage Devices.....................................................199 Verifying the EVA Configuration................................................................................200 Configuring Volume Groups.......................................................................................200 Identifying Special Device File Name for Vdisk in DR Group using Secure Path V3.0D or 
V3.0E.......................................................................................................200 Identifying Special Device Files using Secure Path v3.0F......................................202 Identifying Special Device Files for PVLinks Configuration.................................203 Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F................................................................................................................205 Configuring Volume Groups using PVLinks.........................................................206 Importing Volume Groups on Nodes at the Same Site..........................................207 Importing Volume Groups on Nodes at the Remote Site......................................208 Building a Metrocluster Solution with Continuous Access EVA.....................................209 Configuring Packages for Automatic Disaster Recovery............................................209 Maintaining a Cluster that Uses Metrocluster Continuous Access EVA....................213 Continuous Access EVA Link Suspend and Resume Modes................................213 Normal Maintenance..............................................................................................214 Failback..................................................................................................................214 Cluster Re-Configuration.......................................................................................214 Completing and Running a Continental Cluster Solution with Continuous Access EVA...................................................................................................................................215 Setting up a Primary Package on the Primary Cluster................................................215 Setting up a Recovery Package on the Recovery Cluster............................................218 Setting up the Continental Cluster 
Configuration......................................................221 Switching to the Recovery Cluster in Case of Disaster...............................................222 Failover to Recovery Site.............................................................223 Failover Scenarios........................................................................................223 Scenario 1...............................................223 Failback to the Primary Site...................................................................223 Scenario 2...............................................224
Failback to the Primary Site...................................................................................224 Scenario 3...............................................................................................................224 Failback in Scenario 3.............................................................................................224 Reconfiguring Recovery Group Site Identities in Continentalclusters after a Recovery.................................................................................................................224 5 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF...................227
Files for Integrating Serviceguard with EMC SRDF.........................................................227 Overview of EMC and SRDF Concepts............................................................................229 Preparing the Cluster for Data Replication......................................................................229 Installing the Necessary Software...............................................................................230 Building the Symmetrix CLI Database........................................................................230 Determining Symmetrix Device Names on Each Node..............................................231 Building a Metrocluster Solution with EMC SRDF..........................................................236 Setting up 1 by 1 Configurations.................................................................................236 Creating Symmetrix Device Groups......................................................................237 Configuring Gatekeeper Devices...........................................................................238 Verifying the EMC Symmetrix Configuration.......................................................239 Creating and Exporting Volume Groups...............................................................239 Importing Volume Groups on Other Nodes..........................................................240 Configuring PV Links............................................................................................240 Grouping the Symmetrix Devices at Each Data Center..............................................240 Setting up M by N Configurations..............................................................................242 Creating Symmetrix Device Groups...........................................................................242 Configuring Gatekeeper Devices................................................................................244 Creating the Consistency 
Groups................................................................................244 Creating Volume Groups.............................................................................................245 Creating VxVM Disk Groups using Metrocluster with EMC SRDF...........................247 Validating VxVM Disk Groups using Metrocluster with EMC SRDF........................248 Additional Examples of M by N Configurations........................................................249 Configuring Serviceguard Packages for Automatic Disaster Recovery......................250 Maintaining a Cluster that uses Metrocluster with EMC SRDF.................................254 Managing Business Continuity Volumes....................................................................255 Protecting against Rolling Disasters......................................................................255 Using the BCV in Resynchronization.....................................................................255 R1/R2 Swapping..........................................................................................257 R1/R2 Swapping using Metrocluster SRDF...........................................257 R1/R2 Swapping using Manual Procedures..........................................257 Some Further Points..........................................................................................258 Metrocluster with SRDF/Asynchronous Data Replication...............................................261 Overview of SRDF/Asynchronous Concepts..............................................................261 Requirements for using SRDF/Asynchronous in a Metrocluster Environment.........262 Hardware Requirements........................................................................................263
Software Requirements..........................................................................................263 Preparing the Cluster for SRDF/Asynchronous Data Replication..............................263 Metrocluster SRDF Topology using SRDF/Asynchronous....................................263 Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous........................264 Building a Device Group for SRDF/Asynchronous....................................................264 Package Configuration using SRDF/Synchronous or SRDF/Asynchronous..............266 First-time installation of Metrocluster with EMC SRDF using SRDF/Synchronous................................................................................................266 Pre-existing Installations of Metrocluster SRDF using SRDF/Synchronous..........267 Migration of Existing Applications from SRDF/Synchronous to SRDF/Asynchronous..............................................................................................267 Package Failover using SRDF/Asynchronous.............................................................267 Protecting against a Rolling Disaster..........................................................................267 Limitations and Restrictions........................................................................................267 Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication....268 Overview of SRDF/Asynchronous MSC Concepts.....................................................268 Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi-Session Consistency (MSC) Data Replication..........................................................................270 Building a Composite Group for SRDF/Asynchronous MSC................................271 Configuring a Package using SRDF/Asynchronous MSC...........................................273 Initial installation of Metrocluster with EMC SRDF using SRDF/Synchronous....273 Metrocluster 
with EMC SRDF is already installed................................................273 Setting up the RDF Daemon........................................................................................273 Starting and Stopping the Daemon........................................................................274 Building a Continental Cluster Solution with EMC SRDF...............................................274 Setting up a Primary Package on the Primary Cluster................................................274 Setting up a Recovery Package on the Recovery Cluster............................................277 Setting up the Continental Cluster Configuration......................................................279 Switching to the Recovery Cluster in Case of Disaster...............................................280 Failback Scenarios........................................................................................................281 Scenario 1...............................................................................................................281 Scenario 2...............................................................................................................283 Maintaining the EMC SRDF Data Replication Environment......................................285 Normal Startup......................................................................................................285 Normal Maintenance..............................................................................................285 6 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture...........................287
Overview of Three Data Center Concepts........................................................287 Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP..........................................................................................................289 Overview of HP XP StorageWorks Three Data Center Architecture..........................291 XP 3DC Multi-Target Bi-Link Configuration.........................................291 Three Data Center Multi-Hop Bi-Link Configuration...........................................293
HP StorageWorks Mirror Unit Descriptors............................................................294 Configuring an XP Three Data Center Solution..........................................................296 Creating the Serviceguard Clusters.......................................................................297 Creating the Continental Cluster...........................................................................297 HP StorageWorks RAID Manager Configuration.......................................................297 Creating the RAID Manager Configuration..........................................................297 Multi-Target Raid Manager Configuration............................................................301 Sample Raid Manager Configuration on a DC1 NodeA (multi-target bi-link)...............................................................................................................301 Sample Raid Manager Configuration on a DC2 NodeB (multi-target bi-link)...............................................................................................................302 Sample Raid Manager Configuration on a DC3 NodeC (multi-target bi-link)...............................................................................................................302 Multi-Hop Raid Manager Configuration...............................................................303 Sample Raid Manager Configuration on a DC1 NodeA (multi-hop-bi-link)....303 Sample Raid Manager Configuration on a DC2 NodeB (multi-hop-bi-link)....304 Sample Raid Manager Configuration on a DC3 NodeC (multi-hop-bi-link)....304 Alternative to HORCM_DEV.................................................................................305 11. Restart the Raid Manager instance so that the new information in the configuration file is read........................................................................................306 12. 
Repeat steps 2 through 11 on each host that runs this particular application package...................................................................................................306 Creating Device Group Pairs.................................306 Identification of HP-UX device files.................................307 LVM Volume Groups Configuration.....................................308 VxVM Configuration.............................................309 Package Configuration in a Three Data Center Environment....................................310 Timing Considerations.....................................................314 Bandwidth for Continuous Access and Application Recovery Time...............................315 Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover...316 Swap Takeover Failure (for Continuous Access Sync Pair)........................................316 Takeover Timeout (for third data center)....................................................317 Continuous Access-Journal Device Group PVOL-PAIR with SVOL-PSUS(SSWS) State.............................................................................................................317 Failback Scenarios.............................................................................318 Failback from Data Center 3 (DC3).............................................318 MULTI-HOP-BI-LINK (DC1 > DC2 > DC3) Data Recovery from DC3 to DC1.....318 MULTI-TARGET-BI-LINK (DC2 > DC1 > DC3) Data Recovery from DC3 to DC1.........................................................................................................321 Additional Reading...........................................................................................................322 7 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture..................323
Overview of Site Aware Disaster Tolerant Architecture..................................................323
Components of SADTA...............................................................................................324 Site...............................................................................................................................324 Oracle Clusterware Sub-cluster...................................................................................325 Cluster File System Sub-cluster...................................................................................326 Complex Workload Packages......................................................................................326 Site Controller Package................................................................................................327 Site Safety Latch...........................................................................................................330 Overview of SADTA Configuration.................................................................................331 SADTA and Oracle Database 10gR2 RAC........................................................................332 Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture......................................................................................................................334 Summary of Required Procedures..............................................................................335 Checklist for Configuring SADTA.........................................................................335 Sample Configuration.................................................................................................337 Configuring SADTA....................................................................................................340 Setting up Replication.................................................................................................341 Configuring Metrocluster............................................................................................341 
Creating a Serviceguard Cluster with Sites Configured........................................341 Configuring the Cluster File System Multi Node Package (SMNP)......................343 Installing and Configuring Oracle Cluster Ready Service (CRS)................................343 Configuring the Network.......................................................................................344 Configuring the Storage Device for Installing Oracle CRS....................................345 Setting Up CRS OCR and VOTING Directories ....................................................345 Installing and Configuring Oracle CRS.................................................................346 Configuring SGeRAC Toolkit Packages for the site CRS Sub-cluster....................347 Installing and Configuring Oracle Real Application Clusters (RAC).........................348 Creating the RAC Database ........................................................................................348 Setting up CFS File Systems for RAC Database Data Files ...................................349 Setting up CFS File Systems for RAC Database Flash Recovery...........................350 Creating the RAC Database using the Oracle Database Configuration Assistant.................................................................................................................352 Configuring and Testing RAC MNP Stack at the Local Disk Site ........................352 Halting the RAC Database on the Local Disk Site.................................................352 Creating Identical RAC Database at the Remote Site .................................................352 Configuring the Replica RAC Database.................................................................353 Configuring the RAC MNP Stack at the Target Disk Site .....................................354 Halting the RAC Database on the Target Disk Site................................................354 Configuring the Site Controller 
Package.....................................................355 Configuring the Site Safety Latch Dependencies........................................356 Starting the Disaster Tolerant RAC Database in the Metrocluster..............................358 Configuring Client Access for Oracle Database 10gR2 RAC......................................359 Configuring SGeRAC Cluster Interconnect Subnet Monitoring.................................360 Configuring and Administration Restrictions............................................361 Understanding Site Failover in a Site Aware Disaster Tolerant Architecture..................361
Node Failure................................................................................................................362 Site Failure...................................................................................................................362 Site Failover ................................................................................................................362 Site Controller Package Failure...................................................................................364 Network Partitions Across Sites..................................................................................364 Disk Array and SAN Failure.......................................................................................365 Replication Link Failure..............................................................................................365 Oracle Database 10gR2 RAC Failure ..........................................................................365 Oracle Database 10gR2 RAC Instance Failure.............................................................366 Oracle Database 10gR2 RAC Oracle Clusterware Daemon Failure............................366 Administering the Site Aware Disaster Tolerant Metrocluster Environment..................366 Maintaining a Node.....................................................................................................367 Online Addition and Deletion of Nodes.....................................................................367 Adding Nodes Online on a Primary Site where the RAC Database is Running....368 Adding Nodes Online on a Remote Site where the RAC Database is Down........368 Deleting Nodes Online on the Primary Site where the RAC Database Package Stack is Running.....................................................................................................369 Deleting Nodes Online on the Site where the RAC Database Package Stack is 
Down......................................................................................................................369 Maintaining the Site.....................................................................................................370 Maintaining the Metrocluster Environment File.........................................................370 Maintaining Site Controller Package...........................................................................370 Starting a Disaster Tolerant Oracle Database 10gR2 RAC..........................................371 Shutting Down a Disaster Tolerant Oracle Database 10gR2 RAC..............................372 Halting and Restarting the RAC Database MNP Packages........................................372 Maintaining Oracle Database 10gR2 RAC MNP packages on a Site..........................373 Maintaining Oracle Database 10gR2 RAC..................................................................373 Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site........374 Limitations of a Site Aware Disaster Tolerant Architecture.............................................374 Troubleshooting................................................................................................................375 Logs and Files..............................................................................................................375 Cleaning the Site to Restart the Site Controller Package.............................................376 Identifying and Cleaning RAC MNP Stack Packages that are Halted........................377 Understanding Site Controller Package Logs.............................................................377
A Environment File Variables for Serviceguard Integration with Continuous Access XP.......................383 B Environment File Variables for Metrocluster Continuous Access EVA.............................................401 C Environment File Variables for Metrocluster with EMC SRDF........................................................405 D Configuration File Parameters for Continentalclusters..................................................................411 E Continentalclusters Command and Daemon Reference................................................................417 F Metrocluster Command Reference for Preview Utility...................................................................423
Overview of Data Replication Storage Failover Preview.................................................423 Command Reference.........................................................................................................424 Sample Output of the cmdrprev Command...................................................................425 G Data Replication Rehearsal in a Sample Environment................................................................427
Setup Environment...........................................................................................................427 Device Group Configuration Changes........................................................................427 Rehearsal Package Configuration................................................................................427 Primary Package Metrocluster Environment File.......................................................428 Continentalclusters Configuration..............................................................................428 Rehearsing Failure for a Single Instance Application......................................................428 H Site Aware Disaster Tolerant Architecture Configuration Work Sheet............................................431
Metrocluster Site Configuration.......................................................................................431 Replication Configuration................................................................................................432 CRS Sub-cluster Configuration – using CFS.....................................................................433 RAC Database Configuration...........................................................................................434 Site Controller Package Configuration.............................................................................436 Glossary...................................................................................................................................437 Index........................................................................................................................................445
List of Figures
1-1 Two Data Centers and Third Location with Arbitrators.............................................27
1-2 Failover Scenario with a Single Arbitrator..................................................................30
1-3 Failover Scenario with Two Arbitrators......................................................................32
1-4 Disaster Tolerant Checklist.........................................................................................33
1-5 Cluster Configuration Worksheet...............................................................................34
1-6 Package Configuration Worksheet..............................................................................35
1-7 Package Control Script Worksheet..............................................................................35
2-1 Sample Continentalclusters Configuration.................................................................38
2-2 Sample Mutual Recovery Configuration....................................................................39
2-3 Continental Cluster After Recovery............................................................................40
2-4 Multiple Recovery Pair Configuration in a Continental Cluster................................56
2-5 Sample Local Cluster Configuration..........................................................................62
2-6 Sample Cluster Configuration with Recovery Packages............................................64
2-7 Sample Continentalclusters Recovery Groups...........................................................75
2-8 Sample Bi-directional Recovery Groups.....................................................................76
2-9 Continentalclusters Configuration Files.....................................................................88
2-10 Recovery Checklist....................................................................................................91
2-11 Oracle RAC Instances in a Continentalclusters Environment.................................116
2-12 Sample Oracle RAC Instances in a Continentalclusters Environment After Failover..117
2-13 Continentalclusters Configuration Files in a Recovery Pair with RAC Support.....125
3-1 XP Series Primary and Secondary Volume Definitions............................................135
3-2 XP Series Disk Array Side File..................................................................................137
3-3 Journal Based Replication.........................................................................................140
3-4 Disaster Tolerant Cluster..........................................................................................149
3-5 Q-Marker and Q-CNT...............................................................................................163
4-1 Configuration of Virtual Disks and DR groups........................................................191
4-2 EVA Configuration Checklist....................................................................................200
4-3 EVA Command View for the WWN Identifier.........................................................201
4-4 EVA Command View DR Group Properties.............................................................209
5-1 EMC R1 and R2 Definitions......................................................................................229
5-2 Sample syminq Output from a Node on the R1 Side...............................................231
5-3 Sample syminq Output from a Node on the R2 Side...............................................232
5-4 Parsing the Symmetrix Serial Number......................................................................232
5-5 Sample symrdf list Output from R1 Side..................................................................234
5-6 Sample symrdf list Output from R2 Side..................................................................235
5-7 Mapping HP-UX Device File Names to Symmetrix Units........................................237
5-8 2 X 2 Node and Data Center Configuration with Consistency Groups...................241
5-9 Devices and Symmetrix Units in M by N Configurations........................................242
5-10 2 by 1 Configuration...............................................................................................249
5-11 Bidirectional 2 by 2 Configuration..........................................................................250
5-12 SRDF/Asynchronous Basic Functionality...............................................................262
5-13 Metrocluster Topology using SRDF/Asynchronous...............................................264
5-14 Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication...270
6-1 Three Data Center Solution Overview......................................................................288
6-2 Three Data Center Architecture................................................................................291
6-3 XP Three Data Center Multi-Target Bi-Link Configuration Data Replication..........292
6-4 3DC Multi-Hop Bi-Link Configuration Data Replication.........................................293
6-5 Mirror Unit Descriptors............................................................................................295
6-6 Mirror Unit Descriptor Usage...................................................................................296
6-7 Multi-Target Bi-Link (1:2).........................................................................................301
6-8 Multi-Hop Bi-Link (1:1:1).........................................................................................303
7-1 Complex Workload with Package Dependencies Configured..................................327
7-2 Package View............................................................................................................332
7-3 Sample Configuration...............................................................................................338
List of Tables
1 Editions and Releases..................................................................................................19
2 Disaster Tolerant Solutions Document Road Map......................................................22
1-1 Supported System and Data Center Combinations....................................................28
1-2 Node Failure Scenarios with One Arbitrator..............................................................31
1-3 Node Failure Scenarios with Two Arbitrators............................................................32
2-1 Monitored States and Possible Causes........................................................................41
2-2 Impact of Maintenance Mode.....................................................................................45
2-3 Serviceguard and Continentalclusters Commands....................................................51
2-4 Data Replication and Continentalclusters..................................................................53
2-5 Continentalclusters Data Replication Package Structure...........................................60
2-6 Status of Continentalclusters Packages Before Recovery............................................89
2-7 Status of Continentalclusters Packages After Recovery.............................................97
2-8 Supported Continentalclusters and RAC Configuration..........................................118
3-1 Metrocluster/Continuous Access Template Files......................................................133
4-1 Metrocluster Continuous Access EVA Template Files..............................................185
4-2 Individual Management Server Information............................................................196
5-1 Metrocluster with EMC SRDF Template Files..........................................................228
5-2 Mapping for a 4 Node Cluster connected to 2 Symmetrix Arrays...........................235
5-3 RETRY and RETRYTIME Values...............................................................................276
7-1 Packages Monitored by the Site Controller Package.................................................328
7-2 CRS Sub-clusters configuration in the Metrocluster.................................................338
7-3 Sample database configuration.................................................................................339
7-4 Error Messages and their Resolution........................................................................378
A-1 AUTO_FENCEDATA_SPLIT....................................................................................395
A-2 AUTO_NONCURDATA...........................................................................................396
A-3 AUTO_PSUEPSUS....................................................................................................396
A-4 AUTO_PSUSSSSWS..................................................................................................397
A-5 AUTO_SVOLPFUS....................................................................................................397
A-6 AUTO_SVOLPSUE....................................................................................................397
A-7 AUTO_SVOLPSUS....................................................................................................398
F-1 Command Exit Value and its Description.................................................................424
H-1 Site Configuration.....................................................................................................431
H-2 Replication Configuration........................................................................................432
H-3 Configuring a CRS Sub-cluster using CFS................................................................433
H-4 RAC Database Configuration...................................................................................434
H-5 Site Controller Package Configuration.....................................................................436
Printing History

Table 1 Editions and Releases

Printing Date     Part Number    Edition     Operating System Releases (see Note below)
December 2006     B7660-90019    Edition 1   HP-UX 11i v1 and 11i v2
September 2007    B7660-90021    Edition 2   HP-UX 11i v1, 11i v2 and 11i v3
December 2007     B7660-90023    Edition 3   HP-UX 11i v1, 11i v2 and 11i v3
September 2008    B7660-90025    Edition 4   HP-UX 11i v1, 11i v2 and 11i v3
The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number changes when extensive technical changes are incorporated. New editions of this manual will incorporate all material updated since the previous edition.

NOTE: This document describes a group of separate software products that are released independently of one another. Not all products described in this document are necessarily supported on all the same operating system releases. Consult your product’s Release Notes for information about supported platforms.

HP Printing Division:
ESS Software Division
Hewlett-Packard Co.
19111 Pruneridge Ave.
Cupertino, CA 95014
Preface The following two guides describe disaster tolerant clusters solutions using Serviceguard, Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, Metrocluster EMC SRDF, and Continentalclusters: • •
Understanding and Designing Serviceguard Disaster Tolerant Architectures Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
The Understanding and Designing Serviceguard Disaster Tolerant Architectures guide provides an overview of Hewlett-Packard disaster tolerant high availability cluster technologies and how to configure an extended distance cluster using Serviceguard. It is assumed you are already familiar with Serviceguard high availability concepts and configurations. The contents are as follows:
• Chapter 1, Disaster Tolerance and Recovery in a Serviceguard Cluster, is an overview of disaster tolerant cluster configurations.
• Chapter 2, Building an Extended Distance Cluster Using Serviceguard, shows the creation of disaster tolerant cluster solutions using extended distance Serviceguard cluster configurations.

The Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters guide provides detailed, task-oriented documentation on how to configure, manage, and set up disaster tolerant clusters using Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, Metrocluster EMC SRDF, Continentalclusters, and the Three Data Center Architecture. The contents are as follows:
• Chapter 1, Designing a Metropolitan Cluster, shows the creation of disaster tolerant cluster solutions using the metropolitan cluster products.
• Chapter 2, Designing a Continental Cluster, shows the creation of disaster tolerant solutions using the Continentalclusters product.
• Chapter 3, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP, shows how to integrate physical data replication via Continuous Access XP with metropolitan and continental clusters.
• Chapter 4, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA, shows how to integrate physical data replication via Continuous Access EVA with metropolitan and continental clusters.
• Chapter 5, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF, shows how to integrate physical data replication via EMC Symmetrix disk arrays. It also shows the configuration of a special continental cluster that uses more than two disk arrays.
• Chapter 6, Designing a Disaster Tolerant Solution Using the Three Data Center Architecture, shows how to integrate synchronous replication (for data consistency) and Continuous Access journaling (for long-distance replication) using Serviceguard, Metrocluster Continuous Access XP, Continentalclusters, and the HP StorageWorks XP 3DC Data Replication Architecture.
• Chapter 7, Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture, describes how to configure a site aware disaster tolerant architecture in a Metrocluster with Oracle Database 10gR2 RAC.
A set of appendixes and a glossary provide additional reference information. Table 2 outlines the types of disaster tolerant solutions and their related documentation.
Guide to Disaster Tolerant Solutions Documentation

Use the following table as a guide for locating specific Disaster Tolerant Solutions documentation:

Table 2 Disaster Tolerant Solutions Document Road Map

To Set Up: Extended Distance Cluster for Serviceguard/Serviceguard Extension for RAC
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
  • Chapter 2: Building an Extended Distance Cluster Using Serviceguard

To Set Up: Metrocluster with Continuous Access XP
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 1: Designing a Metropolitan Cluster
  • Chapter 3: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

To Set Up: Metrocluster with Continuous Access EVA
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 1: Designing a Metropolitan Cluster
  • Chapter 4: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA

To Set Up: Metrocluster with EMC SRDF
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 1: Designing a Metropolitan Cluster
  • Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
Table 2 Disaster Tolerant Solutions Document Road Map (continued)

To Set Up: Continental Cluster
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 2: Designing a Continental Cluster

To Set Up: Continental Cluster using Continuous Access XP data replication
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 2: Designing a Continental Cluster
  • Chapter 3: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

To Set Up: Continental Cluster using Continuous Access EVA data replication
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 2: Designing a Continental Cluster
  • Chapter 4: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA

To Set Up: Continental Cluster using EMC SRDF data replication
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 2: Designing a Continental Cluster
  • Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

To Set Up: Continental Cluster using other data replication
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 2: Designing a Continental Cluster

To Set Up: Three Data Center Architecture
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
  • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
  • Chapter 1: Designing a Metropolitan Cluster
  • Chapter 2: Designing a Continental Cluster
  • Chapter 6: Designing a Disaster Tolerant Solution Using the Three Data Center Architecture
Online versions of these documents and other HA documentation are available at http://www.docs.hp.com -> High Availability.
Related Publications

The following documents contain additional useful information:
• Clusters for High Availability: A Primer of HP Solutions, Second Edition. Hewlett-Packard Professional Books: Prentice Hall PTR, 2001 (ISBN 0-13-089355-2)
• Managing Serviceguard, Fourteenth Edition
• Understanding and Designing Serviceguard Disaster Tolerant Architectures (B7660-90020)
• Using Serviceguard Extension for RAC (T1859-90038)
• Using High Availability Monitors (B5736-90025)
• Using the Event Monitoring Service (B7612-90015)

When using VxVM storage with Serviceguard, refer to the following:
• VERITAS Volume Manager Administrator's Guide. This contains a glossary of VERITAS terminology.
• VERITAS Volume Manager Storage Administrator Administrator's Guide
• VERITAS Volume Manager Reference Guide
• VERITAS Volume Manager Migration Guide
• VERITAS Volume Manager for HP-UX Release Notes

Use the following URL to access HP's High Availability web page:
• http://www.hp.com/go/ha

Use the following URL for access to a wide variety of HP-UX documentation:
• http://docs.hp.com/hpux
Problem Reporting If you have problems with HP software or hardware products, please contact your HP support representative.
1 Designing a Metropolitan Cluster

This chapter describes the configuration and management of a basic metropolitan cluster through the following topics:
• Designing a Disaster Tolerant Architecture for use with Metrocluster Products
• Single Data Center
• Two Data Centers and Third Location with Arbitrator(s)
• Disaster Tolerant Checklist
• Cluster Configuration Worksheet
• Package Configuration Worksheet
• Next Steps
In addition, this chapter outlines the general characteristics of the metropolitan cluster solutions that are provided with the following products:
• Metrocluster with Continuous Access XP
• Metrocluster with Continuous Access EVA
• Metrocluster with EMC SRDF
A separate chapter details the configuration process for each storage solution. For additional information, refer to the Release Notes for your metropolitan cluster product and the documentation for your storage solution.
Designing a Disaster Tolerant Architecture for use with Metrocluster Products

Metrocluster is designed for use in a metropolitan cluster environment within the 100 km distance limit. All nodes must be members of a single Serviceguard cluster. Two configurations are supported:
• A single data center without arbitrators (not disaster tolerant).
• Two data centers and a third-location architecture with one or two arbitrator systems or a Quorum Server system. See Figure 1-1 (page 27).
Specifically for disaster tolerance, Serviceguard clusters or data centers can also be configured on different subnets. Such configurations improve scalability, because operators can configure more nodes with more IP addresses. The following guidelines must be followed to configure a Serviceguard cluster across network subnets:
• All the nodes in the cluster must belong to the same domain.
• The latency of the heartbeat network that is configured across subnets must be less than 200 milliseconds.
• A minimum of two heartbeat subnets must be configured for all cluster nodes.
Designing a Disaster Tolerant Architecture for use with Metrocluster Products
25
• Each heartbeat subnet on a node must be routed over a different physical route than the other heartbeat subnet to the other node.
• Redundant physical networks must be cabled separately between sites to maintain high availability.
• Each subnet that is used by a package must be configured with a standby interface in the local bridged network.
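To make these guidelines concrete, the fragment below sketches how two heartbeat subnets per node might appear in a Serviceguard cluster ASCII configuration file. The node names, interface names, and addresses are invented for illustration; only the parameter keywords (CLUSTER_NAME, NODE_NAME, NETWORK_INTERFACE, HEARTBEAT_IP) follow the Serviceguard configuration-file format, and this is not a complete configuration:

```
CLUSTER_NAME        metroclusterA
NODE_NAME           node1               # Data Center A
  NETWORK_INTERFACE lan1
  HEARTBEAT_IP      192.168.1.1         # heartbeat subnet 1
  NETWORK_INTERFACE lan2
  HEARTBEAT_IP      192.168.2.1         # heartbeat subnet 2, separate physical route
NODE_NAME           node3               # Data Center B, on different subnets
  NETWORK_INTERFACE lan1
  HEARTBEAT_IP      192.168.3.1
  NETWORK_INTERFACE lan2
  HEARTBEAT_IP      192.168.4.1
```

Each node declares two heartbeat subnets, satisfying the minimum of two heartbeat networks, and the cross-subnet addressing reflects a cluster whose data centers are on different subnets.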
For more information on configuring cross-subnet clusters, see the Managing Serviceguard manual available at http://www.docs.hp.com.

The disaster tolerant architecture requirements are as follows:
• Each data center must be self-contained, so that the loss of one data center does not cause the entire cluster to fail. It is important that all single points of failure (SPOF) be eliminated so that surviving systems continue to run in the event that one or more systems fail.
• The networks between the data centers must be redundant and routed in such a way that the loss of any one data center does not cause the network between surviving data centers to fail.
• Exclusive volume group activation must be used for all volume groups (VGs) associated with packages that use the disk arrays in the Metrocluster environment. The design of the Metrocluster Continuous Access script assumes that only one system in the cluster will have a VG activated at any time.
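As a brief illustration of the exclusive activation requirement, the commands below use the HP-UX LVM exclusive activation mode; the volume group name is invented for this example:

```
# Activate the volume group in exclusive mode (only one cluster node at a time)
vgchange -a e /dev/vgpkgA

# Deactivate it when the package halts, so another node can activate it
vgchange -a n /dev/vgpkgA
```

In practice the package control script issues these activations; they are shown here only to make the "one system at a time" assumption concrete.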
Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature, the Site Controller package, to provide disaster tolerance for workload databases, and is currently implemented for Oracle Database 10gR2 RAC. For more information, see "Overview of Site Aware Disaster Tolerant Architecture" (page 323).
Single Data Center

A single data center architecture is supported, but it is not a true disaster tolerant architecture. If the entire data center fails, there is no automated failover. This architecture is valid only for protecting data through data replication, and for protecting against multiple node failures.
Two Data Centers and Third Location with Arbitrator(s)

This is the recommended and supported disaster tolerant architecture for use with a metropolitan cluster. It consists of two main data centers with an equal number of nodes, and a third location with one or more arbitrator nodes or a Quorum Server node, as shown in Figure 1-1.
Figure 1-1 Two Data Centers and Third Location with Arbitrators

[Figure: Data Center A (nodes 1 and 2 running packages A and B, with a local XP disk array) and Data Center B (nodes 1a and 2a running packages C and D, with a remote XP disk array) are connected by redundant Continuous Access links and highly available networks. Each array holds PVOLs for the packages local to it and SVOLs replicated from the other array; nodes attach to the arrays via PV links. Arbitrator 1 and arbitrator 2, or a Quorum Server, reside at a third location. The data centers and the third location are on separate power circuits (circuits 1 through 6).]
A disk array can be the main disk array for one set of packages and the remote disk array for another. In Figure 1-1, the XP disk array in Data Center A is the main or primary disk array for packages A and B, and the remote or secondary disk array for packages C and D in Data Center B. For packages A and B, data is written to PVOLs on the array in Data Center A and replicated to SVOLs on the array in Data Center B. Likewise, the XP disk array in Data Center B is the primary or main disk array for packages C and D, and the secondary or remote array for packages A and B. For packages C and D, data is written to PVOLs on the disk array in Data Center B and replicated to SVOLs in Data Center A.

Arbitrators provide functionality like that of the cluster lock disk, and act as tie-breakers for a cluster quorum in case all of the nodes in one data center go down at the same time. Cluster lock devices are not supported because cluster locks cannot be maintained across the replication link, such as Continuous Access or SRDF. Arbitrators are fully functioning systems that are members of the cluster, and are not usually physically connected to the disk arrays. A Quorum Server is an alternative
form of cluster arbitration that uses a server program to determine cluster membership rather than a cluster lock disk or a Serviceguard arbitration node.

Table 1-1 lists the allowable number of nodes at each main data center and the third location, up to a 16-node maximum cluster size.

Table 1-1 Supported System and Data Center Combinations

Data Center A   Data Center B   Data Center C          Serviceguard Version
1               1               1 Arbitrator Node      A.11.13 or later
1               1               Quorum Server System   A.11.13 or later
2               1               2 Arbitrator Nodes     A.11.13 or later
1               2               2 Arbitrator Nodes     A.11.13 or later
2               2               1 Arbitrator Node      A.11.13 or later
2               2               2* Arbitrator Nodes    A.11.13 or later
2               2               Quorum Server System   A.11.13 or later
3               3               1 Arbitrator Node      A.11.13 or later
3               3               2* Arbitrator Nodes    A.11.13 or later
3               3               Quorum Server System   A.11.13 or later
4               4               1 Arbitrator Node      A.11.13 or later
4               4               2* Arbitrator Nodes    A.11.13 or later
4               4               Quorum Server System   A.11.13 or later
5               5               1 Arbitrator Node      A.11.13 or later
5               5               2* Arbitrator Nodes    A.11.13 or later
5               5               Quorum Server System   A.11.13 or later
6               6               1 Arbitrator Node      A.11.13 or later
6               6               2* Arbitrator Nodes    A.11.13 or later
6               6               Quorum Server System   A.11.13 or later
7               7               1 Arbitrator Node      A.11.13 or later
7               7               2* Arbitrator Nodes    A.11.13 or later
7               7               Quorum Server System   A.11.13 or later
8               8               Quorum Server System   A.11.13 or later
* Configurations with two arbitrators are preferred because they provide a greater degree of availability, especially in cases when a node is down due to a failure or planned maintenance. It is highly recommended that two arbitrators be configured in Data Center C to allow for planned downtime in Data Centers A and B.

The following arbitration methods are recommended for Metrocluster solutions, in order of preference:
• Two arbitrator nodes, where supported
• One arbitrator node, where supported
• Quorum Server with APA
• Quorum Server
For more information on Quorum Server, refer to the Serviceguard Quorum Server Release Notes for HP-UX.

NOTE: In the metropolitan environment, the same number of systems must be present in each of the two data centers (Data Center A and Data Center B) whose systems are connected to the XP disk arrays. There must be either one or two arbitrators or a Quorum Server in a third location.

Arbitrator Node Configuration Rules

Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages, as well as better protection against multiple points of failure. Using two arbitrators:
• Provides local failover capability to applications running on the arbitrator.
• Protects against multiple points of failure (MPOF).
• Provides for planned downtime on a single system anywhere in the cluster.

If you use a single arbitrator system, special procedures must be followed during planned downtime to remain protected. Systems must be taken down in pairs, one from each of the data centers, so that the Serviceguard quorum is maintained after a node failure. If the arbitrator itself must be taken down, disaster recovery capability is at risk if one of the other systems fails.

Arbitrator systems can be used to perform important and useful work such as:
• Hosting mission-critical applications not protected by disaster recovery software
• Running monitoring and management tools such as IT/Operations or Network Node Manager
• Running backup applications such as Omniback
• Acting as application servers
Disk Array Data Replication Configuration Rules

Each disk array must be configured with redundant links for data replication. To prevent a single point of failure (SPOF), there must be at least two physical boards in each disk array for the data replication links. Each board usually has multiple ports.
However, a redundant data replication link must be connected to a port on a different physical board from the board that has the primary data replication link. For Continuous Access XP, when using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations, to allow failback.

Calculating a Cluster Quorum

When a cluster initially forms, all systems must be available to form the cluster (100% quorum requirement). A quorum is dynamic and is recomputed after each system failure. For instance, if you start out with an 8-node cluster and two systems fail, that leaves 6 out of 8 surviving nodes, or a 75% quorum, and the cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, the quorum is 67%. Each time a cluster re-forms, there must be more than 50% quorum. With Serviceguard, a cluster lock disk or Quorum Server is used as the tie-breaker when the quorum is exactly 50%. However, in a Metrocluster configuration a Quorum Server is supported and a cluster lock disk is not. Therefore, a quorum of exactly 50% requires access to a Quorum Server; otherwise all nodes halt.

Example Failover Scenarios with One Arbitrator

Taking a node off-line for planned maintenance is treated the same as a node failure in these scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned maintenance.

Figure 1-2 Failover Scenario with a Single Arbitrator
[Figure: Data Center A (nodes 1 and 2 running packages A and B) and Data Center B (nodes 3 and 4 running packages C and D) are connected by Continuous Access links, with arbitrator 1 at a third location.]

The scenarios in Table 1-2, based on Figure 1-2, illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.
Table 1-2 Node Failure Scenarios with One Arbitrator

Failure                                         Quorum         Result
arbitrator 1                                    4 of 5 (80%)   no change
node 1                                          4 of 5 (80%)   pkg A switches
node 1, then node 2                             3 of 4 (75%)   pkg A and B switch
node 1, 2, then arbitrator 1                    2 of 3 (67%)   pkg A and B switch
nodes 1, 2, arbitrator 1, then node 3           1 of 2 (50%)   cluster halts*
arbitrator 1, then node 1                       3 of 4 (75%)   pkg A switches
data center A (nodes 1 and 2)                   3 of 5 (60%)   pkg A and B switch to data center B
data center A, then arbitrator 1                2 of 3 (67%)   pkg A and B switch, then no change
data center A and arbitrator 1                  2 of 5 (40%)   cluster halts*
data center A, then arbitrator 1, then node 3   1 of 2 (50%)   cluster halts*
arbitrator 1, then data center A                2 of 4 (50%)   cluster halts*
node 3, then data center A                      2 of 4 (50%)   cluster halts*
data center B                                   3 of 5 (60%)   pkg C and D switch to data center A
third location                                  4 of 5 (80%)   no change
* Cluster can be manually started with the remaining node.

With a single arbitrator node, the cluster is at risk each time a node fails or comes down for planned maintenance.

Example Failover Scenarios with Two Arbitrators

Having two arbitrator nodes adds extra protection during node failures and allows you to do planned maintenance on arbitrator nodes without losing the cluster should a disaster occur.
Figure 1-3 Failover Scenario with Two Arbitrators

[Figure: the same two data centers as in Figure 1-2 (nodes 1 through 4 running packages A through D, Continuous Access links between the arrays), with arbitrator 1 and arbitrator 2 at the third location.]
The scenarios in Table 1-3 illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators. Note that three of the four scenarios that caused a cluster halt with a single arbitrator do not cause a cluster halt with two arbitrators.

Table 1-3 Node Failure Scenarios with Two Arbitrators

Failure                                         Quorum         Result
data center A (nodes 1 and 2)                   4 of 6 (67%)   pkg A and B switch to data center B
data center A, then arbitrator 1                3 of 4 (75%)   pkg A and B switch, then no change
data center A and arbitrator 1                  3 of 6 (50%)   cluster halts*
data center A, then arbitrator 1, then node 3   2 of 3 (67%)   pkg A, B, and C switch
arbitrator 1, then data center A                3 of 5 (60%)   pkg A and B switch to data center B
node 3, then data center A                      3 of 5 (60%)   pkg A and B switch to data center B
data center B                                   4 of 6 (67%)   pkg C and D switch to data center A
third location                                  4 of 6 (67%)   no change

* Cluster can be manually started with the remaining node.
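The quorum arithmetic behind Tables 1-2 and 1-3 can be sketched as a small shell function. This is an illustration of the "more than 50% of the current cluster size" rule as stated in the text, not HP tooling; the scenario values mirror rows from the tables above:

```shell
#!/bin/sh
# Sketch of the dynamic quorum rule: a re-forming cluster needs MORE than
# 50% of its current size; at exactly 50% a Quorum Server must arbitrate
# (cluster lock disks are not supported in Metrocluster); below 50% all
# nodes halt.
quorum_result() {
    size=$1        # cluster size before this failure
    surviving=$2   # nodes still running
    if [ $((surviving * 2)) -gt "$size" ]; then
        echo "cluster re-forms ($surviving of $size)"
    elif [ $((surviving * 2)) -eq "$size" ]; then
        echo "tie: Quorum Server required, otherwise all nodes halt"
    else
        echo "cluster halts ($surviving of $size)"
    fi
}

quorum_result 8 6   # 75% quorum: cluster re-forms, size resets to 6
quorum_result 6 4   # 67% quorum: cluster re-forms again
quorum_result 4 2   # exactly 50%: only a Quorum Server can break the tie
quorum_result 5 2   # 40% quorum (data center A and arbitrator 1 lost)
```

Note how the function reproduces the table results: the 2-of-4 and 1-of-2 rows are exactly the 50% ties that halt the cluster in the arbitrator configurations, where no Quorum Server is present.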
Worksheets

Disaster Tolerant Checklist

Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a configuration with two main data centers and a third location.

Figure 1-4 Disaster Tolerant Checklist
• Data centers A and B have the same number of nodes, to maintain quorum in case an entire data center fails.
• Arbitrator nodes or Quorum Server nodes are located in a separate location from either of the primary data centers (A or B).
• The elements in each data center, including nodes, disks, network components, and climate control, are on separate power circuits.
• Each node is configured with PV links.
• Each disk array is configured with redundant replication links, each of which is connected to a different LCP or RCP card or controller.
• At least two networks are configured to function as the cluster heartbeat.
• All redundant cabling for network, heartbeat, and replication links is routed using different physical paths.
Cluster Configuration Worksheet

Use this cluster configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user's guide. If you have already completed a Serviceguard cluster configuration worksheet, you only need to complete the first part of this worksheet.
Figure 1-5 Cluster Configuration Worksheet

Name and Nodes
  Cluster Name: ____________________
  Data Center A Name and Location: ____________________
  Node Names: ____________________
  Data Center B Name and Location: ____________________
  Node Names: ____________________
  Arbitrator/Quorum Server Third Location Name and Location: ____________________
  Arbitrator Node/Quorum Server Names: ____________________
  Maximum Configured Packages: ____________________

Subnets
  Heartbeat IP Addresses: ____________________
  Non-Heartbeat IP Addresses: ____________________

Timing Parameters
  Heartbeat Interval: ____________________
  Node Time-out: ____________________
  Network Polling Interval: ____________________
  AutoStart Delay: ____________________
Package Configuration Worksheet

Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user's guide. If you have already completed a Serviceguard package configuration worksheet, you only need to complete the first part of this worksheet.
Figure 1-6 Package Configuration Worksheet

Package Configuration File Data
  Package Name: ____________________
  Primary Node: ______________          Data Center: ______________
  First Failover Node: ______________   Data Center: ______________
  Second Failover Node: ______________  Data Center: ______________
  Third Failover Node: ______________   Data Center: ______________
  Fourth Failover Node: ______________  Data Center: ______________
  Package Run Script: ______________    Timeout: ______________
  Package Halt Script: ______________   Timeout: ______________
  Maximum Configured Packages: ______________

XP Series Volume Configuration
  Device Group    Device Name    Port #    Target ID    LUN

EMC SRDF Series Volume Configuration
  Device Group    Device Name    Port #

Figure 1-7 Package Control Script Worksheet

Package Control Script Data
  VG[0]: ________       LV[0]: ________  FS[0]: ________  FS_MOUNT_OPT[0]: ________
  VG[1]: ________       LV[1]: ________  FS[1]: ________  FS_MOUNT_OPT[1]: ________
  VG[2]: ________       LV[2]: ________  FS[2]: ________  FS_MOUNT_OPT[2]: ________
  VXVM_DG[0]: ________  LV[0]: ________  FS[0]: ________  FS_MOUNT_OPT[0]: ________
  VXVM_DG[1]: ________  LV[1]: ________  FS[1]: ________  FS_MOUNT_OPT[1]: ________
  VXVM_DG[2]: ________  LV[2]: ________  FS[2]: ________  FS_MOUNT_OPT[2]: ________
  IP[0]: ________       SUBNET[0]: ________
  IP[1]: ________       SUBNET[1]: ________
  X.25 Resource Name: ________
  Service Name: ________  Run Command: ________  Retries: ________
    Service Fail Fast Enabled?: ________  Service Halt Timeout: ________  RetryTime: ________
  Service Name: ________  Run Command: ________  Retries: ________
    Service Fail Fast Enabled?: ________  Service Halt Timeout: ________  RetryTime: ________
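To show how a completed worksheet maps onto a legacy package control script, the fragment below fills in the worksheet variables with invented values. The variable names are those shown in Figure 1-7; the volume group, mount point, IP address, and service command are hypothetical examples only:

```
# Hypothetical control-script values for one package; adjust to your environment.
VG[0]="vgpkgA"                              # exclusive-activation volume group
LV[0]="/dev/vgpkgA/lvol1"
FS[0]="/pkgA/data"
FS_MOUNT_OPT[0]="-o rw"
IP[0]="192.10.25.12"                        # package relocatable IP address
SUBNET[0]="192.10.25.0"
SERVICE_NAME[0]="pkgA_monitor"
SERVICE_CMD[0]="/etc/cmcluster/pkgA/monitor.sh"
SERVICE_RESTART[0]="-r 2"                   # retries before the service fails
```

Each worksheet row corresponds to one indexed entry; additional volume groups, file systems, IP addresses, and services extend the arrays with index 1, 2, and so on.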
Next Steps

To implement the metropolitan cluster design, use the procedures in the following sections:
• "Completing and Running a Continental Cluster Solution with Continuous Access XP" (page 171)
• "Building a Metrocluster Solution with Continuous Access EVA" (page 209)
• "Building a Continental Cluster Solution with EMC SRDF" (page 274)
2 Designing a Continental Cluster

Unlike metropolitan and campus clusters, which have a single-cluster architecture, a continental cluster uses multiple Serviceguard clusters to provide application recovery over a local or wide area network (LAN or WAN). Using the Continentalclusters product, two independently functioning clusters are set up in such a way that, in the event of a disaster, one cluster can take over the critical operations formerly carried out by the other cluster. Disaster tolerance is obtained by eliminating the cluster itself as a single point of failure. This chapter describes the configuration and management of a basic continental cluster through the following topics:
• Understanding Continental Cluster Concepts
• Designing a Disaster Tolerant Architecture for use with Continentalclusters
• Preparing the Clusters
• Building the Continentalclusters Configuration
• Testing the Continental Cluster
• Switching to the Recovery Packages in Case of Disaster
• Restoring Disaster Tolerance
• Performing a Rehearsal Operation in your Environment
• Maintaining a Continental Cluster
• Support for Oracle RAC Instances in a Continentalclusters Environment
Refer to Appendix D and Appendix E for additional information on the Continentalclusters command set and on configuration file parameters. For details of cascading failover using HP StorageWorks or EMC Symmetrix disk arrays, contact your HP representative.

NOTE: This chapter briefly addresses data replication, highly available WANs, and site security and communication. Chapters 3, 4, and 5 give details on physical data replication using the HP StorageWorks Disk Array XP series with Continuous Access XP, the HP StorageWorks Disk Array EVA series with Continuous Access EVA, and the EMC Symmetrix with the SRDF facility.
Understanding Continental Cluster Concepts

The Continentalclusters product provides the ability to monitor a high availability cluster and fail over mission critical applications to another cluster if the monitored cluster should become unavailable. In the following example, the Los Angeles cluster runs the mission critical application and replicates data to the New York cluster, which has another copy of the mission critical application ready to run in case of failover. In addition, Continentalclusters supports mutual recovery, which allows for different
critical applications to be run on each cluster, with each cluster configured to recover the mission critical applications of the other.

Because clusters may be separated over wide geographical distances, and because they have independent function, the operation of clusters in a Continentalclusters configuration is somewhat different from that of typical Serviceguard clusters. A typical Continentalclusters recovery pair environment is shown in Figure 2-1.

Figure 2-1 Sample Continentalclusters Configuration

[Figure: the New York cluster (NYnode1 and NYnode2, running the monitor) and the Los Angeles cluster (LAnode1 and LAnode2, running salespkg and custpkg) are connected by a highly available network; the disk arrays are linked by WAN data replication links through an ESCON/WAN converter at each end.]
Two packages are running on the cluster in Los Angeles, and their data is replicated to the cluster in New York. Physical data replication is carried out using ESCON (Enterprise Storage Connect) links between the disk array hardware in New York and Los Angeles via an ESCON/WAN converter at each end. The New York cluster is running a monitor that checks the status of the Los Angeles cluster. In this example, the Los Angeles cluster runs just like any Serviceguard cluster, with applications configured in packages that may fail from node to node as necessary. The New York cluster is configured with a recovery version of the packages that are running on the Los Angeles cluster. These packages do not run under normal circumstances, but are set to start up when they are needed. In addition, either cluster may run other packages that are not involved in Continentalclusters operation.
Mutual Recovery Configuration

Bi-directional failover is supported in what is called a "mutual recovery configuration." This allows recovery groups to be defined for primary packages running in both
component clusters of a recovery pair in the Continentalclusters configuration. Figure 2-2 shows a mutual recovery configuration.

Figure 2-2 Sample Mutual Recovery Configuration

[Figure: the same recovery pair as in Figure 2-1, but each cluster runs a monitor; salespkg runs on the New York cluster and custpkg on the Los Angeles cluster, with WAN data replication links between the disk arrays via ESCON/WAN converters.]
In the above figure, the salespkg is running on the New York cluster and can be recovered by the Los Angeles cluster. Similarly, the custpkg running on the Los Angeles cluster can be recovered by the New York cluster. As stated previously, physical data replication is carried out using ESCON (Enterprise Storage Connect) links between the disk array hardware in New York and Los Angeles via an ESCON/WAN converter at each end. Each cluster is running a monitor that checks the status of the alternate cluster. As depicted in the above example, each cluster runs just like any Serviceguard cluster, with applications configured in packages that may fail from node to node as necessary. Each cluster is configured with a recovery version of the packages that are running on the alternate cluster. These packages do not run under normal circumstances, but are set to start up when they are needed. In addition, either cluster may run other packages that are not involved in Continentalclusters operation.
Application Recovery in a Continental Cluster

If a given cluster in a recovery pair of a continental cluster should become unavailable, Continentalclusters allows an administrator to issue a single command, cmrecovercl (described later), to transfer mission critical applications from that cluster to another cluster, making sure that the packages do not run on both clusters at the same time.
Transfer is not automatic, although it is automated through a recovery command, which a root user must issue. The result after issuing the recovery command is shown in Figure 2-3.

[Figure 2-3 Continental Cluster After Recovery: the New York cluster (NYnode1 and NYnode2) now runs both salespkg and custpkg_bak from its disk array; it remains connected to the Los Angeles cluster (LAnode1 and LAnode2) by the highly available network and the WAN data replication links between the disk arrays.]
The movement of an application from one cluster to another cluster does not replace local failover activity; packages are normally configured to fail over from node to node as they would on any high availability cluster. Cluster recovery, the failover of packages to a different cluster, occurs only after the following events:
• Continentalclusters detects the problem.
• Continentalclusters sends a notification of the problem.
• The administrator verifies that the monitored cluster has failed.
• The administrator issues the cluster recovery command.
Monitoring over a Network A monitor package running on one cluster tracks the health of another cluster in the recovery pair and sends notification to configured destinations if the state of the monitored cluster changes. (If a cluster contains any packages to be recovered it must be monitored.) The monitor software polls the monitored cluster at a specific MONITOR_INTERVAL defined in an ASCII configuration file, which also indicates when and where to send messages if there is a state change.
The physical separation between clusters will require communication by way of a Local or Wide Area Network (LAN or WAN). Since the polling takes place across the network, interruptions of network service cannot always be differentiated from cluster failure states. This means that if the network is unreliable, the monitoring facility will often detect and report an unreachable state for the monitored cluster that is actually an interruption of the network service. Because the monitoring is indeterminate in some instances, information from independent sources must be gathered to determine the need for proceeding with the recovery process. For these reasons, cluster recovery is not automatic, but must be initiated by a root user. Once initiated, however, the cluster recovery is automated to reduce the chance of human error that might occur if manual steps were needed. In Continentalclusters, a system of cluster events and notifications is provided so that events can be easily tracked, and users will know when to seek additional information before initiating recovery.
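The poll-and-report behavior described above can be sketched as follows. This is an illustrative sketch only, not the actual monitor implementation; the function name is invented, and the state values follow the terminology used in this chapter.

```python
# Illustrative sketch of a cluster monitor's polling behavior (not the
# actual cmclsentryd implementation). The monitor polls the remote cluster
# at a fixed interval and reports an event whenever the observed state
# changes. Note that "Unreachable" may reflect either a cluster failure or
# a network outage; the monitor cannot tell which.

def detect_events(polled_states, initial_state="Up"):
    """Return (old_state, new_state) events for each observed state change."""
    events = []
    current = initial_state
    for state in polled_states:
        if state != current:
            events.append((current, state))  # e.g. ("Up", "Unreachable")
            current = state
    return events

# A network outage and recovery as seen from the monitor's point of view:
events = detect_events(["Up", "Up", "Unreachable", "Unreachable", "Up"])
print(events)  # [('Up', 'Unreachable'), ('Unreachable', 'Up')]
```

Because the same "Up -> Unreachable" event is produced by either failure mode, recovery is left to a root user rather than triggered automatically.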
Cluster Events

A cluster event is a change of state in a monitored cluster. The four cluster states reported by the monitor are Unreachable, Down, Up, and Error. Table 2-1 summarizes possible causes for the cluster events with regard to both the monitored cluster and the network. However, in many cases the causes of cluster events are indeterminate without additional information that is not available to the software.

Table 2-1 Monitored States and Possible Causes

Up -> Unreachable
  Cluster-related causes: Cluster went down; no nodes are responding to network inquiries.
  Network-related causes: Network failure.

Down -> Unreachable
  Cluster-related causes: Cluster was down and nodes are no longer responding.
  Network-related causes: Network failure.

Error -> Unreachable
  Cluster-related causes: Error resolved but cluster down and nodes not responding.
  Network-related causes: Network failure.

Up -> Down
  Cluster-related causes: Cluster has been halted, but at least one node is still responding to network inquiries.
  Network-related causes: No network problems.

Error -> Down
  Cluster-related causes: Error resolved, cluster is down.
  Network-related causes: Network problem was fixed, cluster is down.

Unreachable -> Down
  Cluster-related causes: Cluster nodes were rebooted but the cluster was not started.
  Network-related causes: Network came up but the cluster was not running.

Up -> Error
  Cluster-related causes: Serviceguard version or security file mismatch, software error.
  Network-related causes: Network is misconfigured, or DNS server crashed or set up incorrectly.

Down -> Error
  Cluster-related causes: Serviceguard version or security file mismatch, software error.
  Network-related causes: Network is misconfigured, or DNS server crashed or set up incorrectly.

Unreachable -> Error
  Cluster-related causes: Serviceguard version or security file mismatch, software error.
  Network-related causes: Network problem was fixed, but the error condition still exists.

Down -> Up
  Cluster-related causes: Cluster started.
  Network-related causes: No network problems.

Unreachable -> Up
  Cluster-related causes: Cluster nodes were rebooted and the cluster started.
  Network-related causes: Network came up and the cluster was already running.

Error -> Up
  Cluster-related causes: Error resolved, cluster is up.
  Network-related causes: Network problem was fixed, cluster is up.
NOTE: There is only one condition under which cmclsentryd will determine that the cluster has Error status: all nodes are unreachable except those which have Serviceguard Error status. (If any nodes are Down or Up, then the cluster status will take one of those values, rather than Error.)
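The rule in the NOTE above can be sketched in code. This is an assumption-laden sketch, not the cmclsentryd implementation: the text states only the Error condition explicitly, so the precedence of Up over Down shown here is an assumption for illustration.

```python
# Sketch of the cluster-status aggregation rule described in the NOTE
# above. Only the Error rule is stated explicitly in the manual: the
# cluster is in Error status only when every node is unreachable except
# those with Serviceguard Error status. The Up-over-Down precedence used
# here is an assumption, not documented behavior.

def cluster_status(node_states):
    """Derive an overall cluster status from individual node states."""
    if "Up" in node_states:
        return "Up"
    if "Down" in node_states:
        return "Down"
    # Error only when all nodes are Unreachable except those in Error:
    if "Error" in node_states and set(node_states) <= {"Error", "Unreachable"}:
        return "Error"
    return "Unreachable"

print(cluster_status(["Error", "Unreachable"]))  # Error
print(cluster_status(["Error", "Down"]))         # Down
```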
Interpreting the Significance of Cluster Events

Because some cluster events (for example, Up -> Unreachable) can be caused by changes in either a cluster state or a network state, additional independent information is required to achieve the primary objective of determining whether you need to recover a cluster’s applications. Sources of independent information include:
• Contact with the network provider
• Contact with the administrator of the monitored cluster
• Contact with the local cluster administrator
• Contact with company executives
When problematic cluster events persist, obtain as much information as possible, including authorization to recover, if your business practices require this, and then issue the Continentalclusters recovery command, cmrecovercl.
How Notifications Work

A central part of the operation of Continentalclusters is the transmission of notifications following the detection of a cluster event. Notifications occur at specifically coded times, and at two different levels:
• Alert — when a cluster event should be considered noteworthy.
• Alarm — when an event shows evidence of a cluster failure.
Notifications are typically sent as:
• Email messages
• SNMP traps
• Text log files
• OPC messages to OpenView IT/Operations

In addition, notifications are sent to the eventlog file located in the /var/opt/resmon/log/cc directory on the system where monitoring is taking place.

NOTE: An email message can be sent to an address supplied by a pager service that will forward the message to a specified pager system. (Contact your pager service provider for more information.)
Alerts

Alerts are intended as informational. Some typical uses of alerts include:
• Notification that a cluster has been halted for a significant amount of time.
• Notification that a cluster has come up after being down or unreachable.
• Notification that a cluster came down for any reason.
• Notification that a cluster has been in an unreachable state for a short period of time. An alert is sent in this case as a warning that an alarm might be issued later if the cluster’s state remains unreachable for a longer time.
The expected process in dealing with alerts is to continue watching for additional notifications and to contact individuals at the site of the monitored cluster to see whether problems exist.
Alarms

Alarms are intended to indicate that a cluster failure might have taken place. The most common example of an alarm is the following:
• Notification that a specified cluster has been in an unreachable state for a significant amount of time.

The expected process in dealing with cluster events that persist at the alarm level is to obtain as much information as possible, including authorization to recover if your business practices require this, and then issue the Continentalclusters recovery command, cmrecovercl.
Creating Notifications for Failure Events

For events that indicate potential cluster failure, show the escalating level of concern about cluster health by defining alerts followed by one or more alarms. The following is a typical sequence:
• cluster alert at 5 minutes
• cluster alert at 10 minutes
• cluster alarm at 15 minutes

This could be accomplished by entering two CLUSTER_ALERT lines and one CLUSTER_ALARM line in the configuration file. A detailed example is provided in the comments in the ASCII configuration file template, shown in “Editing Section 3—Monitoring Definitions” (page 79).
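As a rough illustration only, the escalation above might be expressed with lines along these lines in the monitoring section of the configuration file. The exact keyword syntax and destination format are defined by the template referenced above, and the email addresses here are placeholders.

```
# Hypothetical sketch only -- consult the ASCII configuration file
# template for the exact CLUSTER_ALERT/CLUSTER_ALARM syntax.
CLUSTER_ALERT  5   EMAIL admin@example.com    # noteworthy after 5 minutes
CLUSTER_ALERT  10  EMAIL admin@example.com    # escalate at 10 minutes
CLUSTER_ALARM  15  EMAIL oncall@example.com   # evidence of failure at 15 minutes
```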
Creating Notifications for Events that Indicate a Return of Service

For those events that indicate that the cluster is back online or that communication with the monitor has been restored, use cluster alerts to show the de-escalation of concern. In this case, use a CLUSTER_ALERT line in the configuration file with a time of zero (0), so that notifications are sent as soon as the return to service is detected.
Maintenance Mode for Recovery Groups

Placing a recovery group in maintenance mode exempts it from recovery: the recovery package cannot be started in the recovery cluster. By default, no recovery group in the Continentalclusters configuration is in the maintenance mode. To move a recovery group in Continentalclusters into the maintenance mode, you must disable it. To move a recovery group out of the maintenance mode, you must enable it. You can complete rehearsal operations on a recovery group only when the recovery group is in the maintenance mode. For more information on rehearsal operations, see “Performing a Rehearsal Operation in your Environment” (page 106).

Use the cmrecovercl -d -g command to move a recovery group into the maintenance mode. To move the recovery group out of the maintenance mode, use the cmrecovercl -e -g command. Maintenance mode for recovery groups is an optional feature. You must explicitly configure Continentalclusters to use this feature.

Consider the following guidelines when you move a recovery group into the maintenance mode:
• Configure a shared disk with a file system in all primary clusters and in the recovery cluster. This shared disk is local to the cluster and need not be replicated.
• Configure the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file with an absolute path to a file system. Create this path in all nodes and reserve it for Continentalclusters. This file system is used to hold the current maintenance mode setting for recovery groups.
• Configure the monitor package control script to mount the file system in the shared disk for the path specified with the CONTINENTAL_CLUSTER_STATE_DIR parameter.
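The guidelines above might translate into fragments along these lines. The parameter name comes from the text; the path, volume group, and control script variable syntax are hypothetical examples, so check them against your Serviceguard release.

```
# In the Continentalclusters configuration file (the path is an example):
CONTINENTAL_CLUSTER_STATE_DIR   /cc/statedir

# In the monitor package control script, mount the local shared disk's
# file system at that same path (device names are hypothetical):
# LV[0]=/dev/vg_ccstate/lvol1
# FS[0]=/cc/statedir
# FS_TYPE[0]="vxfs"
```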
When a recovery group is in the maintenance mode, startup of a recovery package with the cmrecovercl, cmrunpkg, or cmmodpkg commands is prevented by Continentalclusters for that recovery group. When a recovery group is in the maintenance mode, there is no impact on the availability of the primary packages. The primary package continues to be up and is highly available within the primary cluster (that is, local failover is allowed). Clients can continue to connect to the primary package and access its production data on the primary cluster.

There is no dependency on data replication to move a recovery group into maintenance mode. Array-based replication can be suspended or can be in progress. Similarly, logical replication can either be suspended (receiver package is down) or resumed (receiver package is up). Table 2-2 describes the impact on recovery when a recovery group is in the maintenance mode.

Table 2-2 Impact of Maintenance Mode

Default mode:
• Recovery package startup using the cmrecovercl, cmrunpkg, cmmodpkg, or cmforceconcl commands is allowed. Cross-checking is done between primary and recovery packages to ensure both packages are not up at the same time.
• The primary package is allowed to start only if the recovery package is down. Similarly, a recovery package is allowed to start only if the primary package is down.

Maintenance mode:
• Recovery package startup using the cmrecovercl, cmrunpkg, cmmodpkg, or cmforceconcl commands is not allowed.
• No impact to primary packages. The primary package continues to run irrespective of the mode.
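The cross-checks summarized in Table 2-2 can be sketched as a small decision function. This is an illustrative model of the rules described in this section, not product code; the function name and boolean inputs are invented for the example.

```python
# Sketch of the startup cross-checks described above (illustrative only).
# A recovery package may start only when its recovery group is not in
# maintenance mode and the primary package is down; a primary package
# runs irrespective of the mode; a rehearsal package may start only
# inside maintenance mode.

def may_start(package_role, maintenance_mode, primary_up, recovery_up):
    """Return True if Continentalclusters would permit the startup."""
    if package_role == "recovery":
        return (not maintenance_mode) and (not primary_up)
    if package_role == "primary":
        return not recovery_up  # primaries are unaffected by the mode
    if package_role == "rehearsal":
        return maintenance_mode
    raise ValueError(package_role)

print(may_start("recovery", maintenance_mode=True, primary_up=False, recovery_up=False))  # False
print(may_start("rehearsal", maintenance_mode=True, primary_up=True, recovery_up=False))  # True
```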
Moving a Recovery Group into Maintenance Mode

Run the following command to disable a recovery group and move it into the maintenance mode:

cmrecovercl -d [-f] -g RecoveryGroup

Where RecoveryGroup is the name of the recovery group to be disabled.
Run this command only on recovery cluster nodes. This command succeeds only when Continentalclusters is configured for maintenance mode. The command checks for the following conditions to successfully disable a recovery group:
• The recovery package is down and package switching is disabled.
• The primary cluster and the primary package are up. If the cluster is down or unreachable, use the force (-f) option to forcefully disable the recovery group.

  WARNING! When you use the force option, ensure that the primary package and the cluster are not down due to a primary site failure.

• The monitor package is up and running in the recovery cluster.
Moving a Recovery Group out of the Maintenance Mode

Run the following command to enable a recovery group and move it out of the maintenance mode:

cmrecovercl -e -g RecoveryGroup

Where RecoveryGroup is the name of the recovery group to be enabled and moved out of the maintenance mode. You can run this command only on recovery cluster nodes. The command succeeds only when Continentalclusters is configured for maintenance mode. The following conditions must be met for the recovery group to be enabled and moved out of the maintenance mode:
• For recovery groups configured with a rehearsal package, the rehearsal package is halted and package switching is disabled.
• The monitor package is up and running in the recovery cluster.
Performing Cluster Recovery

When a CLUSTER_ALARM is issued, there may be a need for a cluster recovery using the recovery command, cmrecovercl, which is enabled for use by the root user. Cluster recovery is carried out at the site of the recovery cluster by using the cmrecovercl command. The cmrecovercl command will only recover recovery groups that are enabled for recovery and are not in the maintenance mode.

# cmrecovercl

Issuing this command halts any configured data replication activity from the failed cluster to the recovery cluster, and starts all configured recovery packages on the recovery cluster that are pre-configured in recovery groups. A recovery group is the basic unit of recovery used in a continental cluster configuration. This command will fail if a cluster alarm has not been issued.
If the option -g RecoveryGroup is specified with the recovery command, then the recovery process of halting data replication activity and starting the recovery package is done only for the specified recovery group. After the cmrecovercl command is issued, there is a delay of at least 90 seconds (per recovery group) while the command ensures that the package is not active on another cluster.

Cluster recovery is done as a last resort, after all other approaches to restore the unavailable cluster have been exhausted. It is important to remember that cluster recovery sets in motion a process that cannot be easily reversed. Unlike the failover of a package from one node to another, failing a package over from one cluster to another normally involves a significant quantity of data that is then accessed from a new set of disks. Returning control to the original cluster involves resynchronizing this data and resetting the roles of the clusters, a process that is easier for some data replication techniques than others.

NOTE: After a recovery, it is not possible to reverse directions and return a package to its original cluster without first reconfiguring the data replication hardware and/or software and synchronizing data. Therefore, be very cautious when deciding to use the cmrecovercl command. For this reason, HP recommends that stringent procedures and processes be in place to aid in making the decision to complete a recovery process.
Performing Recovery Group Rehearsal in Continentalclusters

During a recovery in Continentalclusters, a configuration inconsistency at the recovery cluster can result in an unsuccessful recovery attempt. Rehearsing the recovery procedure provides a method to proactively identify and fix these configuration inconsistencies so that there are no issues during an actual recovery. Continentalclusters provides the environment and a set of required tools to complete a Disaster Recovery (DR) Rehearsal.

Continentalclusters allows recovery groups to be configured with a special rehearsal package for DR rehearsal. You must configure the rehearsal package in the recovery cluster and specify it as part of the recovery group definition. The rehearsal package is identical to the recovery package and can be effectively used in place of the recovery package to verify the environment. This rehearsal package bundles the same application and uses the same storage devices as the recovery package. During a DR rehearsal process, Continentalclusters starts the rehearsal package and validates the recovery cluster environment. However, to stop clients from using the recovery package application instance, the client access network IP address must be different for the rehearsal package. For more information on configuring a rehearsal package, see “Configuring Recovery Groups with Rehearsal Packages” (page 64).
Also, you must configure Continentalclusters to enable the maintenance mode feature for recovery groups. For more information on the maintenance mode, see “Maintenance Mode for Recovery Groups” (page 44). Disaster Recovery Rehearsal for a recovery group is done in the following phases:

• Rehearsal Preparation Phase

  In this phase, you must prepare your environment for rehearsal. To prepare a recovery group for rehearsal, complete the following steps:
  1. Enter the cmrecovercl -d command to move the recovery group into the maintenance mode by disabling it.
  2. Suspend replication from the primary cluster.
  3. Prepare a business copy (BC/BCV) of the storage on the recovery cluster.
  4. Make the storage system read-write on the recovery cluster.
  5. Import the volume manager entities, such as LVM or SLVM volume groups or the VxVM or CVM disk groups, in the recovery cluster.

• Rehearsal Run Phase

  Start the DR rehearsal for the recovery group using the Continentalclusters command cmrecovercl -r. This command starts the rehearsal package for the recovery group and validates the Continentalclusters configuration and environment for the recovery group. Once rehearsal is started, use the regular Serviceguard commands to manage the rehearsal package. You can run any test load on the rehearsal package to validate the recovery of the application.

• Rehearsal Restoration Phase

  Once the rehearsal process is complete, you must restore recovery for the recovery group. You must halt and disable the rehearsal package in the cluster and synchronize the recovery cluster storage with the primary site storage with the latest application data. Also, you must resume replication from the primary cluster. Finally, you must move the recovery group out of the maintenance mode by enabling it using the cmrecovercl -e command.

  WARNING! Ensure that the storage system of the recovery group is synchronized with the latest data and the replication environment is restored before the recovery group is moved out of the maintenance mode. Failure to do so can result in the recovery package using production data that was invalidated by the rehearsal run during a subsequent recovery.
For information on running a rehearsal process in your environment, see Appendix G.
Notes on Packages in a Continental Cluster

Packages behave somewhat differently in a continental cluster than in a normal Serviceguard environment. There are specific differences in:
• Startup and Switching Characteristics
• Network Attributes

From Serviceguard A.11.17 and above, you can configure the following package types in a recovery group:
• Failover packages
• Oracle RAC multi-node packages

In the case of a multi-node package, a recovery process recovers all instances of the package in the recovery cluster.

NOTE: System multi-node packages cannot be configured in Continentalclusters recovery groups. Multi-node packages are supported only for Oracle with CFS or CVM environments.

Startup and Switching Characteristics

Normally, an application (package) can run on only one node at a time in a cluster. However, in a continental cluster, there are two clusters in which an application—the primary package or the recovery package—could operate on the same data. The primary and the recovery package must not be allowed to run at the same time. To prevent this, it is important to ensure that packages are not allowed to start automatically and are not started at inappropriate times.

To keep packages from starting up automatically when a cluster starts, set the AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) parameter for all primary and recovery packages to NO. Then use the cmmodpkg command with the -e option to start up only the primary packages and enable switching. The cmrecovercl command, when run, will start up the recovery packages and enable switching during the cluster recovery operation.

CAUTION: After initial testing is complete, the cmrunpkg and cmmodpkg commands or the equivalent options in SAM should never be used to start a recovery package unless cluster recovery has already taken place.

To prevent packages from being started at the wrong time and in the wrong place, use the following strategies:
• Set the AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) parameter for all primary and recovery packages to NO.
• Ensure that recovery package names are well known, and that personnel understand they should never be started with a cmrunpkg or cmmodpkg command unless the cmrecovercl command has been invoked first.
• If a cluster has no packages to run before recovery, then do not allow packages to be run on that cluster with Serviceguard Manager.
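The policy above might look like the following fragment. The package and node names are the examples from Figure 2-2, and the exact parameter spelling depends on your Serviceguard release, so treat this as a sketch rather than literal configuration.

```
# In each primary and recovery package configuration file:
AUTO_RUN    NO    # PKG_SWITCHING_ENABLED NO prior to Serviceguard A.11.12

# After the cluster is up, start only the primary package by hand and
# enable local switching for it:
cmrunpkg -n NYnode1 salespkg
cmmodpkg -e salespkg
```

Recovery packages are left down and disabled; only cmrecovercl starts them during a cluster recovery.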
Network Attributes

Another important difference between the packages configured in a continental cluster and the packages configured in a standard Serviceguard cluster is that the same or different subnets can be used for the primary cluster and recovery cluster configurations. In addition, the same or different relocatable IP addresses can be used for the primary package and its corresponding recovery package. The client application must be designed properly to connect to the appropriate IP address following a recovery operation. For recovery groups with a rehearsal package configured, ensure that the rehearsal package IP address is different from the recovery package IP address.
How Serviceguard Commands Work in Continentalclusters

Continentalclusters packages are manipulated manually by the user via Serviceguard commands, and automatically by cmcld, in the same way as any other packages. In a continental cluster, the recovery package is not allowed to run at the same time as the primary, data sender, or data receiver packages. To enforce this, several Serviceguard commands behave in a slightly different manner when used in a continental cluster. Table 2-3 describes the Serviceguard commands whose behavior is different in a continental cluster environment.

Specifically, when one of the commands listed in Table 2-3 attempts to start or enable switching of a package, it first checks the status of the other packages in the recovery group. Based on the status, the operation is either allowed or disallowed. The checking depends on a stable cluster environment and properly functioning network communication. When the network communication between clusters cannot be established, or the cluster or package status cannot be determined, it must be verified manually that the operation to be performed on the target package will not conflict with other packages configured in the same recovery group.

Table 2-3 Serviceguard and Continentalclusters Commands

cmrunpkg
  In Serviceguard: runs a package.
  In Continentalclusters: will not start a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not start a recovery package if the recovery group is in maintenance mode. Will not start a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled. Will not start a rehearsal package when the recovery group is not in maintenance mode.

cmmodpkg -e
  In Serviceguard: enables the switching attribute for a highly available package.
  In Continentalclusters: will not enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not enable switching for a recovery package if the recovery group is in maintenance mode. Will not enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled. Will not enable switching for a rehearsal package when the recovery group is not in maintenance mode.

cmhaltnode -f
  In Serviceguard: halts a node in a highly available cluster.
  In Continentalclusters: will not re-enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not re-enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled.

cmhaltcl -f
  In Serviceguard: halts daemons on all currently running systems.
  In Continentalclusters: will not re-enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not re-enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled.
Designing a Disaster Tolerant Architecture for use with Continentalclusters

A recovery pair in a continental cluster consists of two Serviceguard clusters. One functions as a primary cluster and the other functions as a recovery cluster for a specific application. Prior to Continentalclusters version A.05.00, only one recovery pair could be configured in a continental cluster. Starting with Continentalclusters version A.05.00, a configuration of multiple recovery pairs is allowed. In the multiple recovery pair configuration, more than one primary cluster (where the primary packages are running) can be configured to share the same recovery cluster (where the recovery package is running).
The key elements providing disaster tolerance in a continental cluster recovery pair are:
• Mutual Recovery
• Serviceguard clusters
• Data replication
• Highly available WAN networking
• Data center processes and procedures coordinated between the two cluster sites
There is a significant amount of latitude in selecting these elements for a configuration. It is recommended that the choices be recorded on worksheets that can be reviewed and updated periodically.
Mutual Recovery

For mutual recovery, any cluster in a continental cluster recovery pair may contain both primary and recovery packages for any recovery group. Recovery groups may be defined, for example, such that cluster A and cluster B contain recovery packages. In this case, cmrecovercl could be run on cluster B to recover packages from cluster A, or on cluster A to recover packages from cluster B.
Serviceguard Clusters

Each Serviceguard cluster in a continental cluster provides high availability for an application at the local level at that particular site. For optimal performance and to assure adequate capacity on the recovery cluster, it is best to have similar hardware on both clusters. For example, if one cluster contains two systems with 1 GB of memory each, it is not a good idea to have a low-end system with 128 MB of memory in the other cluster. Each cluster may have as many nodes as are permitted in an ordinary Serviceguard cluster, and each may be running packages that are not configured to fail over between clusters.

NOTE: When cluster A takes over for cluster B, it must run cluster B’s packages as well as any packages that it was already running on its own, unless those packages are stopped intentionally.
Data Replication

Data replication between the Serviceguard clusters in a Continentalclusters recovery pair extends the scope of high availability to the level of the continental cluster. Select a technology for data replication between the two clusters. There are many possible choices, including:
• Logical replication of databases
• Logical replication of file systems
• Physical replication of data volumes via software
• Physical replication of disk units via hardware

Table 2-4 briefly discusses how a data replication method affects a continental cluster environment. A detailed description of data replication can be found in Chapter 1, in the section titled “Disaster Tolerance and Recovery in a Serviceguard Cluster.” Specific guidelines for configuring the HP StorageWorks Disk Array XP Series, HP StorageWorks Disk Array EVA Series, and the EMC Symmetrix Disk Array for physical data replication in a continental cluster are provided in Chapters 3, 4, and 5. In order to use these data replication solutions in a Continentalclusters environment, it is necessary to purchase the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF product separately. White papers describing specific implementations are also available at www.docs.hp.com -> High Availability.

If a data replication technology is chosen that is not mentioned above, and if the integration is performed independently, then it is necessary to use the guidelines described in the section “Using the Recovery Command to Switch All Packages” (page 95). In that case, note the following:
• The Continentalclusters product is responsible only for the following: Continentalclusters configuration and management commands, the monitoring of remote cluster status, and the notification of remote cluster events.
• The Continentalclusters product provides a single recovery command to start all recovery packages that are configured in the Continentalclusters configuration file. These recovery packages are typical Serviceguard packages. The Continentalclusters recovery command does not do any checking on the status of the devices and data that are used by the application prior to starting the recovery package. The user is responsible for checking the state of the devices and the data before executing the Continentalclusters recovery command.
Table 2-4 Data Replication and Continentalclusters

Logical Database Replication
  How it works: Transactions from the primary application are applied from logs to a copy of the application running on the recovery site. (This is an example only; there are other methods.)
  Continentalclusters implication: Requirements on CPU and I/O may limit or prevent the recovery cluster from running additional applications.

Logical Filesystem Replication
  How it works: Writes to the filesystem on the primary cluster are duplicated periodically on the recovery cluster.
  Continentalclusters implication: CPU issues are the same as for logical database replication. The software may have to be managed as a separate Serviceguard package.

Physical Replication of Data Volumes via Software
  How it works: Disk mirroring via LVM software. Mirroring is done on disk links (SCSI or FibreChannel).
  Continentalclusters implication: Requirements on CPU are less than for logical replication, but there is still some CPU use. Distance limits may make this type of replication inappropriate for Continentalclusters.

Physical Replication of Disk Units via Hardware
  How it works: Replication of the LUNs across disk arrays through dedicated hardware links such as EMC SRDF, Continuous Access XP, or Continuous Access EVA.
  Continentalclusters implication: Limited CPU requirements, but the requirement of synchronous data replication slows replication and may impair application performance. Increased network speed and bandwidth can remedy this.

Logical data replication may require the use of packages to handle software processes that copy data from one cluster to another, or that apply transactions from logs that are copied from one cluster to another. Some methods of logical data replication may use a logical replication data sender package, others may use a logical replication data receiver package, and some may use both. Logical replication data sender and receiver packages are configured as part of the data recovery group, as shown in the section “Preparing the Clusters” (page 59).

Physical Data Replication Using Special Environment Files

For physical data replication, Continentalclusters uses pre-integrated solutions based on Continuous Access XP, Continuous Access EVA, and EMC SRDF. In order to use these data replication solutions in a Continentalclusters environment, it is necessary to purchase the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF product separately. Physical data replication generally does not require the use of separate sender or receiver packages, but it does require specialized logic in the package control scripts to handle the transfer of control from the storage units of one cluster to the storage units at the other cluster.

Packages that use physical data replication with the HP StorageWorks Disk Array XP Series with Continuous Access XP should have a specific environment file created using the template /opt/cmcluster/toolkit/SGCA/xpca.env. Packages that use physical data replication with the HP StorageWorks Disk Array EVA with Continuous Access EVA should have one created using /opt/cmcluster/toolkit/SGCA/caeva.env, and packages that use physical data replication with EMC Symmetrix and the SRDF facility should have one created using /opt/cmcluster/toolkit/SGSRDF/srdf.env.
Logical data replication may require the use of packages to handle software processes that copy data from one cluster to another or that apply transactions from logs that are copied from one cluster to another. Some methods of logical data replication may use a logical replication data sender package, others may use a logical replication data receiver package, and some may use both. Logical replication data sender and receiver packages are configured as part of the data recovery group, as shown in the section "Preparing the Clusters" (page 59).

Physical Data Replication using Special Environment Files

For physical data replication, Continentalclusters uses pre-integrated solutions based on Continuous Access XP, Continuous Access EVA, and EMC SRDF. To use these data replication solutions in a Continentalclusters environment, it is necessary to purchase the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF product separately. Physical data replication generally does not require the use of separate sender or receiver packages, but it does require specialized logic in the package control scripts to handle the transfer of control from the storage units of one cluster to the storage units of the other cluster.

Packages that use physical data replication with the HP StorageWorks Disk Array XP Series with Continuous Access XP should have a specific environment file created from the template /opt/cmcluster/toolkit/SGCA/xpca.env. Packages that use physical data replication with the HP StorageWorks Disk Array EVA Series with Continuous Access EVA should have one created from /opt/cmcluster/toolkit/SGCA/caeva.env, and packages that use physical data replication with EMC Symmetrix and the SRDF facility should have one created from /opt/cmcluster/toolkit/SGSRDF/srdf.env. These templates can be purchased separately with the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF products.
Designing a Continental Cluster
Details on configuring the special Continentalclusters control scripts are in Chapters 3, 4 and 5. Some additional notes are provided below.

Multiple Recovery Pairs in a Continental Cluster

One or more recovery pairs can be configured in a continental cluster. In a Continentalclusters configuration that contains more than one recovery pair, more than one primary cluster is configured to share a common recovery cluster. As with a one-recovery-pair configuration, mutual recovery can also be configured in a multiple recovery pair scenario, as shown in Figure 2-4. The common recovery cluster can choose any one of the primary clusters as its recovery cluster. Data replication needs to be set up to allow for copying data from each primary cluster to the common recovery cluster, and each recovery pair should have its own data replication link. Different storage areas need to be configured on the common recovery cluster to receive data replicated from each primary cluster. The common recovery cluster should have enough capacity to serve the recovery purpose for all of the primary clusters configured to partner with it in a recovery pair.
Figure 2-4 Multiple Recovery Pair Configuration in a Continental Cluster

[Figure: three Serviceguard clusters connected by a WAN over highly available networking, each with its own disk array: the New York cluster (NYnode1, NYnode2, with a monitor package), the Los Angeles cluster (LAnode1, LAnode2, running sales_bak, cust_bak, and HRpkg), and the Atlanta cluster (Atlanta node1, Atlanta node2, with a monitor package). The primary packages salespkg and custpkg run on the primary clusters. Data replication links run between LA & Atlanta and between LA & NY.]
Highly Available Wide Area Networking

Disaster tolerant networking for Continentalclusters is directly tied to the data replication method. In addition to the reliability of the redundant lines connecting the remote nodes, it is important to consider what bandwidth is needed to support the data replication method that has been chosen. A continental cluster that handles a high number of write transactions per minute will require not only a highly available network, but also one with a large amount of bandwidth. Details on highly available networking can be found in Chapter 1, in the section titled "Disaster Tolerant Architecture Guidelines." White papers describing specific implementations are also available at: www.docs.hp.com -> High Availability -> Continentalcluster or Metrocluster -> White Papers
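As a rough illustration of the bandwidth consideration (the numbers below are invented for the example, not taken from this manual), the sustained replication bandwidth for a given write rate can be estimated from writes per minute and average write size:

```shell
# Illustrative link sizing only.  wpm = writes per minute, kb = KB per write.
# required Mbit/s = wpm * kb * 8 / (60 * 1000)
required_mbps() {
    awk -v wpm="$1" -v kb="$2" \
        'BEGIN { printf "%.1f\n", wpm * kb * 8 / 60 / 1000 }'
}

# Example: 60000 writes/min of 8 KB each -> required_mbps 60000 8
# gives 64.0 Mbit/s of sustained payload, before protocol overhead.
```

Real sizing must also account for replication protocol overhead, bursts, and resynchronization traffic, so treat this as a lower bound.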
Data Center Processes

Continentalclusters provides the cmrecovercl command, which fails over all applications on the primary cluster in a recovery pair that are protected by Continentalclusters. However, application failover also requires well-defined processes for the two sites of a recovery pair. These processes and procedures should be written down and made available at both sites. Some considerations for site management are as follows:

• Who notifies whom for the various events: configuration changes, alerts, alarms?
• What communication methods should be used? Email? Phone? Beeper? Multiple methods?
• Who has the authority to perform what sort of configuration modifications?
• Can the administrator at one site log in to the nodes on the remote site? If so, what permissions would be set?
• How often is a practice failover done? Is there a documented test plan?
• What is the process for tracking changes made to the primary cluster?
Continentalclusters Worksheets

Planning is an essential effort in creating a robust continental cluster environment. It is recommended to record the details of your configuration on planning worksheets. These worksheets can be filled in partially before configuration begins, and then completed as you build the continental cluster. All the participating Serviceguard clusters in one continental cluster should have a copy of these worksheets to help coordinate initial configuration and subsequent changes. Complete the worksheets in the following sections for each recovery pair of clusters that will be monitored by the Continentalclusters monitor.

Data Center Worksheet

The following worksheet will help you describe your specific data center configuration. Fill out the worksheet and keep it for future reference.

==============================================================
Continental Cluster Name: _____________________________________
Continental Cluster State Dir: ________________________________
==============================================================
Primary Data Center Information:
Primary Cluster Name: ________________________________________
Data Center Name and Location: _______________________________
Main Contact: ________________________________________________
Phone Number: ________________________________________________
Beeper: ______________________________________________________
Email Address: _______________________________________________
Node Names: __________________________________________________
Monitor Package Name: __ccmonpkg______________________________
Monitor Interval: ____________________________________________
Continental Cluster State Shared Disk: ________________________
==============================================================
Recovery Data Center Information:
Recovery Cluster Name: _______________________________________
Data Center Name and Location: _______________________________
Main Contact: ________________________________________________
Phone Number: ________________________________________________
Beeper: ______________________________________________________
Email Address: _______________________________________________
Node Names: __________________________________________________
Monitor Package Name: __ccmonpkg_____________________________
Monitor Interval: ___________________________________________
Continental Cluster State Shared Disk: ________________________
Recovery Group Worksheet

The following worksheet will help you organize and record your specific recovery groups. Fill out the worksheet and keep it for future reference.

===============================================================
Continental Cluster Name: _____________________________________
==============================================================
Recovery Group Data:
Recovery Group Name: _________________________________________
Primary Cluster/Package Name: ________________________________
Data Sender Cluster/Package Name: ____________________________
Recovery Cluster/Package Name: _______________________________
Rehearsal Cluster/Package Name: ______________________________
Data Receiver Cluster/Package Name: __________________________

Recovery Group Data:
Recovery Group Name: _________________________________________
Primary Cluster/Package Name: ________________________________
Data Sender Cluster/Package Name: ____________________________
Recovery Cluster/Package Name: _______________________________
Rehearsal Cluster/Package Name: ______________________________
Data Receiver Cluster/Package Name: __________________________

Recovery Group Data:
Recovery Group Name: _________________________________________
Primary Cluster/Package Name: ________________________________
Data Sender Cluster/Package Name: ____________________________
Recovery Cluster/Package Name: _______________________________
Rehearsal Cluster/Package Name: ______________________________
Data Receiver Cluster/Package Name: __________________________
Cluster Event Worksheet

The following worksheet will help you organize and record the cluster events you wish to track. Fill out a worksheet for each primary or recovery cluster that you wish to monitor. You must monitor each cluster containing a primary package which needs to be recovered.

===============================================================
Continental Cluster Name: _____________________________________
===============================================================
Cluster Event Information:
Cluster Name: ________________________________________________
Monitoring Cluster: __________________________________________
UNREACHABLE:
Alert Interval: ______________________________________________
Alarm Interval: ______________________________________________
Notification: ________________________________________________
Notification: ________________________________________________
Notification: ________________________________________________
DOWN:
Alert Interval: ______________________________________________
Notification: ________________________________________________
Notification: ________________________________________________
UP:
Alert Interval: ______________________________________________
Notification: ________________________________________________
Notification: ________________________________________________
ERROR:
Alert Interval: ______________________________________________
Notification: ________________________________________________
Notification: ________________________________________________
Preparing the Clusters

The steps for configuring the clusters, needed by Continentalclusters, are as follows:

• Set up and test data replication between the sites.
• Configure each cluster for Serviceguard operation.
Setting up and Testing Data Replication

Depending on which data replication method you choose, it can take a week or more to set up and test. If more than one recovery pair is configured in a continental cluster, a separate data replication link must be set up for each recovery pair. In the sample configuration, physical data replication is done through a hardware link between disk arrays. Because this method is hardware based, there is hardware setup and configuration that can take several days. Some logical replication methods, such as transaction processing monitors (TPMs), need application changes that are more easily done during the original application development. Make sure that the data replication to the recovery site is functional. This includes setting up the physical data replication links across the WAN and making sure that the data is replicated between the shared disk arrays.
NOTE: If using physical data replication on the HP StorageWorks Disk Array XP Series with Continuous Access XP, the HP StorageWorks Disk Array EVA Series with Continuous Access EVA, or the EMC Symmetrix using EMC SRDF, use the special environment file templates that are installed along with the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF software. Refer to Chapters 3, 4 and 5 for detailed instructions on configuring these special environment files.

If the data replication software is separate from the application itself, then a separate Serviceguard package should be created for it. Some kinds of logical data replication require that a data receiver package be running on the recovery cluster at all times. If data sender and data receiver packages are used by your choice of data replication method, configure and apply them as described in the next sections before applying the continental cluster configuration. Table 2-5 shows the types of packages that are needed for each type of data replication.

Table 2-5 Continentalclusters Data Replication Package Structure

Replication Type: XP Series Continuous Access
  Primary Cluster:  Primary Application Package: Yes | Data Replication Sender Package: No
  Recovery Cluster: Recovery Application Package: Yes | Data Replication Receiver Package: No

Replication Type: EVA Series Continuous Access
  Primary Cluster:  Primary Application Package: Yes | Data Replication Sender Package: No
  Recovery Cluster: Recovery Application Package: Yes | Data Replication Receiver Package: No

Replication Type: Symmetrix/EMC SRDF
  Primary Cluster:  Primary Application Package: Yes | Data Replication Sender Package: No
  Recovery Cluster: Recovery Application Package: Yes | Data Replication Receiver Package: No

Replication Type: Oracle Standby Database
  Primary Cluster:  Primary Application Package: No | Data Replication Sender Package: Yes
  Recovery Cluster: Recovery Application Package: Yes | Data Replication Receiver Package: Yes
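Table 2-5 boils down to a simple lookup: the physical replication types need no separate sender/receiver packages, while Oracle standby database needs both. A throwaway helper (the function name and the shorthand keys are invented for illustration) makes the mapping explicit:

```shell
# Hypothetical helper encoding Table 2-5: does a replication type need
# separate data sender/receiver packages?  Keys are invented shorthand.
needs_sender_receiver() {
    case $1 in
        xp_ca|eva_ca|srdf) echo no  ;;   # physical: control-script logic only
        oracle_standby)    echo yes ;;   # logical: sender and receiver packages
        *)                 echo unknown ;;
    esac
}
```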
Configuring a Cluster without Recovery Packages

Use the following steps and the instructions described in chapters 4 through 7 of the Managing Serviceguard user's guide as guidelines for creating a new cluster or preparing an existing cluster to run in a Continentalclusters environment:

1. If creating a new cluster, install the required versions of HP-UX and Serviceguard. If using an existing cluster, upgrade to the versions of HP-UX and Serviceguard that are required for Continentalclusters. See the Continentalclusters Release Notes for specifics on your version requirements. Coordinate with the recovery site to make sure the same versions and patches are installed at both sites.
2. Set up all cabling, being sure to provide redundant disk storage links and network connections.
3. Configure the disks and filesystems.
4. Set up data replication (logical or physical).
5. Configure the cluster according to the instructions in chapter 5 of the Managing Serviceguard user's guide. Use the cmapplyconf command to apply the cluster configuration. Then test the cluster.
6. Configure and test each primary package according to the instructions in chapters 6 and 7 of the Managing Serviceguard user's guide. Use the cmapplyconf command to apply the package configuration. Be sure that AUTO_RUN is set to NO in the package ASCII configuration file for any package that is in a recovery group, and therefore might at some time be a candidate for recovery. This ensures that the package will not be automatically started if the primary site tries to come up again following a primary site disaster. If changing the setting of the AUTO_RUN parameter to NO in the ASCII configuration file for an existing package, re-apply the configuration using the cmapplyconf command.

   NOTE: When package switching is disabled, a package does not automatically start at cluster startup time. Therefore, setting AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) to NO means that primary packages in recovery groups must be started manually on the primary cluster. They also must be manually enabled for local switching, using the cmmodpkg -e command.

7. Test local failover of the packages. In our sample case, this would mean enabling package switching for salespkg (cmmodpkg -e salespkg) and then testing that salespkg fails over from LAnode1 to LAnode2.
8. If using logical data replication, configure and test the data sender package if one is needed.
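A recovery-group package accidentally left with AUTO_RUN YES would restart on its own after a site failure, so it is worth auditing the setting before applying configurations. The check below is a hedged sketch, not an HP tool; it assumes the legacy ASCII package format in which AUTO_RUN appears as the first token on its line:

```shell
# Returns 0 only if the package ASCII file sets AUTO_RUN to NO.
auto_run_is_no() {
    awk '$1 == "AUTO_RUN" { val = $2 }
         END { exit (val == "NO" ? 0 : 1) }' "$1"
}

# Example audit over a set of package files:
#   for f in /etc/cmcluster/*/*.config; do
#       auto_run_is_no "$f" || echo "WARNING: AUTO_RUN not NO in $f"
#   done
```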
NOTE: If you are configuring Oracle RAC instances in Serviceguard packages in a CFS or CVM environment, do not specify the CVM_DISK_GROUPS and CVM_ACTIVATION_CMD fields in the package control scripts, as CVM disk group manipulation is addressed by the disk group multi-node package.

The primary cluster is shown in Figure 2-5.
Figure 2-5 Sample Local Cluster Configuration

[Figure: the Los Angeles cluster on a highly available network, connected to a WAN. LAnode1 runs salespkg and LAnode2 runs custpkg.]
Configuring a Cluster with Recovery Packages

Use the following steps and the instructions in chapters 4 through 7 of the Managing Serviceguard user's guide as guidelines for creating a new recovery cluster or preparing an existing cluster to run in a Continentalclusters environment:

1. Configure all hardware. Make sure the cluster hardware is able to handle the task of running any or all packages it supports in the Continentalclusters configuration:
   a. If this is a new cluster, make sure the hardware is similar to that of the other cluster. The recovery cluster must be built using servers of sufficient size and resources so that they can take over packages on recovery and also be able to run their own packages, if required.
   b. If this is an existing cluster, determine whether it is necessary to add disks for data replication, and ensure that there is enough capacity from system resources to run all packages if applications fail over to this cluster. If not, either add nodes to the existing cluster, or move less critical packages to another cluster.
2. For new clusters, install the minimum required versions of HP-UX and Serviceguard. For existing clusters, perform a rolling upgrade to the minimum required versions of HP-UX and Serviceguard if necessary. Coordinate with the other site to make sure the same versions and patches are installed at both sites. This may include coordinating between HP support personnel if the sites have separate support contracts.
3. Configure logical volumes, using the same names on both clusters. If your cluster uses a physical data replication method and data replication between the disk arrays at the different data centers has already taken place, vgimport and vgchange can be used to help configure the logical volumes on the recovery cluster.
4. Use cmgetconf to capture the other cluster's configuration. Then use cmquerycl on this cluster to generate a new ASCII file for the recovery configuration. Modify the node names, volume group names, resource names, and subnets as appropriate so that the two clusters will be consistent. See chapter 5 in the Managing Serviceguard user's guide for details on cluster configuration.
5. Set up the recovery package(s):
   a. Copy the package files from the other cluster in the recovery pair for all mission critical applications to be monitored by Continentalclusters. In the sample configuration this means copying the ASCII files salespkg.config and custpkg.config, and the control scripts salespkg.cntl and custpkg.cntl. (If preferred, rename the package configuration files using a naming convention that identifies a package as a Continentalclusters monitored package. For example, name the sample package salespkg_bak.config to indicate that it is the backup or recovery package.)
   b. Edit the package configuration files, replacing node names, subnets, and other elements as needed. For all recovery packages, be sure that AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) is set to NO in the configuration file. This ensures that the recovery packages will not start automatically when the recovery cluster forms, but only when the cmrecovercl command is issued. The following elements should be the same in the package configuration for both the primary and recovery packages:
      • Package services
      • Failfast settings
   c. Modify the package control script (salespkg_bak.cntl), checking for anything that may be different between clusters:
      • Volume groups (VGs) may be different.
      • IP addresses may be different.
      • Site-specific customer-defined routines (for example, routines that send messages to a local administrator) may be different.
      • Control script files must be executable.

   NOTE: If you are using physical data replication on the HP StorageWorks Disk Array XP Series with Continuous Access XP, the HP StorageWorks Disk Array EVA Series with Continuous Access EVA, or the EMC Symmetrix using EMC SRDF, use the special environment file templates that are provided by the separately purchased Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF products.

6. Apply the configuration using cmapplyconf and test the cluster.

   IMPORTANT: You must halt the primary package and the data sender packages before you attempt to run or test any recovery packages.

7. Test local failover of the packages. In our sample case, this would mean enabling package switching for salespkg_bak (cmmodpkg -e salespkg_bak) and then testing that salespkg_bak fails over from NYnode1 to NYnode2.
8. If you are using logical data replication, configure, apply, and test the data receiver package if one is needed.
9. Create a package control script:
   # cmmakepkg -s pkgname.cntl
   Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
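The _bak naming convention in step 5a is mechanical enough to script. The helper below is an illustrative sketch (the function name is invented): it copies a package file such as salespkg.config to salespkg_bak.config beside it, after which the copy still needs the manual edits described in steps 5b and 5c.

```shell
# Copy a package file to its recovery-cluster name, inserting "_bak"
# before the extension: salespkg.config -> salespkg_bak.config.
copy_as_bak() {
    src=$1
    base=${src%.*}     # path without extension
    ext=${src##*.}     # extension (config or cntl)
    cp "$src" "${base}_bak.${ext}"
}

# Example:
#   copy_as_bak /etc/cmcluster/salespkg/salespkg.config
#   copy_as_bak /etc/cmcluster/salespkg/salespkg.cntl
```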
The New York cluster is shown in Figure 2-6.

Figure 2-6 Sample Cluster Configuration with Recovery Packages

[Figure: the New York cluster on a highly available network, connected to a WAN and a disk array. NYnode1 runs salespkg_bak and NYnode2 runs custpkg_bak.]
Configuring Recovery Groups with Rehearsal Packages

The rehearsal package is a regular Serviceguard package configured on the recovery cluster of the recovery group. You must configure the rehearsal package with the same volume group and file system mount points as those of the recovery package. The application setup and configuration used for the recovery package are also used for the rehearsal package. As with all other Continentalclusters packages, the AUTO_RUN parameter for the rehearsal package must be set to NO.
NOTE: When using physical replication, do not configure the Metrocluster environment files for the rehearsal package. The rehearsal package must have an IP address that is different from that of the recovery package. If the rehearsal package has the same IP address as the recovery package, clients may connect to the rehearsal instance mistaking it for the production instance.
Building the Continentalclusters Configuration

If necessary, use the swinstall command to install the Continentalclusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps:

• Prepare the security files.
• Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster in the recovery pair by creating a monitor package on that cluster.
• Edit the Continentalclusters configuration file on a node of your choice in any cluster.
• Check and apply the Continentalclusters configuration.
• Start each Continentalclusters monitor package on its cluster.
• Validate the configuration.
• Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.
• Test recovery procedures.
Preparing Security Files

Running a Continentalclusters command requires root access to cluster information on all the nodes of the participating Serviceguard clusters in the configuration. Before doing the Continentalclusters configuration, edit the /etc/cmcluster/cmclnodelist file on each node of all the participating clusters to include entries that will allow access by all nodes in the Continentalclusters. Here is a sample entry in the /etc/cmcluster/cmclnodelist file for a continental cluster configured with two two-node Serviceguard clusters:

lanode1.myco.com root
lanode2.myco.com root
nynode1.myco.com root
nynode2.myco.com root
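Because identical cmclnodelist entries must exist on every node, generating the file from a single node list avoids drift between copies. A minimal sketch (the function name is invented; the node names are from the sample above):

```shell
# Emit cmclnodelist entries ("<fqdn> root") for each node given.
gen_cmclnodelist() {
    for n in "$@"; do
        printf '%s root\n' "$n"
    done
}

# Example:
#   gen_cmclnodelist lanode1.myco.com lanode2.myco.com \
#       nynode1.myco.com nynode2.myco.com > /etc/cmcluster/cmclnodelist
```

The generated file would then be distributed to every node in both clusters.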
Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must
contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands are running. Define the order of security checking by creating entries of the following types:

order deny,allow
    If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked.

deny from
    Lists all the nodes that are denied access. Permissible entries are:

    all
        All hosts are denied access.
    domain
        Hosts whose names match, or end in, this string are denied access, for example, hp.com.
    hostname
        The named host (for example, kitcat.myco.com) is denied access.
    IP address
        Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet restriction, is denied.
    network/netmask
        This pair of addresses allows more precise restriction of hosts, for example, 10.163.121.23/255.255.0.0.
    network/nnnCIDR
        This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. "CIDR" stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).

allow from
    Lists all the nodes that are allowed access. Permissible entries are:

    all
        All hosts are allowed access.
    domain
        Hosts whose names match, or end in, this string are allowed access, for example, hp.com.
    hostname
        The named host (for example, kitcat.myco.com) is allowed access.
    IP address
        Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion, is allowed.
    network/netmask
        This pair of addresses allows more precise inclusion of hosts, for example, 10.163.121.23/255.255.0.0.
    network/nnnCIDR
        This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. "CIDR" stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).
The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

order allow,deny
allow from lanode1.myco.com
allow from lanode2.myco.com
allow from nynode1.myco.com
allow from nynode2.myco.com
allow from 10.177.242.12

If the file is installed on all nodes in the continental cluster, these entries will allow Continentalclusters commands and monitors running on lanode1, lanode2, nynode1, and nynode2 to obtain information about the clusters in the configuration.

Network Security Configuration Requirements

In a Continentalclusters configuration, if the clusters are behind firewalls at their respective sites, you must set appropriate firewall rules to enable inter-cluster communication. The monitoring daemon of Continentalclusters communicates with the Serviceguard Cluster Object Manager on remote clusters. You can determine the ports used by the Cluster Object Manager from the hacl-probe entry in the /etc/services file. In the firewall of all participating clusters, you must set the rule such that TCP and UDP protocol traffic on the hacl-probe ports is allowed from and to the IP addresses of all nodes in the Continentalclusters configuration. For more information on firewalls and ports, see the HP Serviceguard A.11.18 Release Notes available at http://www.docs.hp.com -> High Availability.
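Like cmclnodelist, the cmomhosts file follows the same pattern on every node, so it can also be generated from one host list. A hedged sketch (the function name is invented; the hosts come from the sample above):

```shell
# Emit a cmomhosts policy that checks the allow list first and admits
# only the listed hosts or addresses.
gen_cmomhosts() {
    echo "order allow,deny"
    for h in "$@"; do
        echo "allow from $h"
    done
}

# Example:
#   gen_cmomhosts lanode1.myco.com lanode2.myco.com nynode1.myco.com \
#       nynode2.myco.com 10.177.242.12 > /etc/opt/cmom/cmomhosts
```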
Creating the Monitor Package

The Continentalclusters monitoring software is configured as a Serviceguard package so that it remains highly available. If more than one primary cluster is configured to share the same common recovery cluster, as in a multiple recovery pair scenario, the monitor package running on the common recovery cluster performs the following:

• monitors all of the primary clusters
• sends notifications for all of the monitored cluster events

The following steps should be carried out on the recovery cluster and can be repeated on the primary cluster if you want the primary cluster to monitor the recovery cluster:

1. On the node where the configuration is located, create a directory for the monitor package:
   # mkdir /etc/cmcluster/ccmonpkg
2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:
   # cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg
   • ccmonpkg.config is the ASCII package configuration file template for the Continentalclusters monitoring application.
   • ccmonpkg.cntl is the control script file for the Continentalclusters monitoring application.

   NOTE: Editing the ccmonpkg.cntl file is not recommended. However, if preferred, change the default SERVICE_RESTART value "-r 3" to a value that fits your environment.

3. Edit the package configuration file (suggested name /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:
   a. Add the names of all nodes in the cluster on which the monitor may run.
   b. AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) should be set to YES so that the monitor package will fail over between local nodes. (Note that for all primary and recovery packages, AUTO_RUN is always set to NO.)
4. Continentalclusters provides an optional feature that places recovery groups in maintenance mode. To enable this feature, configure the monitor package with a file system on a shared disk. For more information on configuring this maintenance mode feature, see "Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters" (page 69).
5. Use the cmcheckconf command to validate the package:
   # cmcheckconf -P ccmonpkg.config
6. Copy the package configuration file ccmonpkg.config and control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.
7. Use the cmapplyconf command to add the package to the Serviceguard configuration:
   # cmapplyconf -P ccmonpkg.config
The following sample package configuration file (comments have been left out) shows a typical package configuration for a Continentalclusters monitor package:

PACKAGE_NAME                 ccmonpkg
PACKAGE_TYPE                 FAILOVER
FAILOVER_POLICY              CONFIGURED_NODE
FAILBACK_POLICY              MANUAL
NODE_NAME                    LAnode1
NODE_NAME                    LAnode2
AUTO_RUN                     YES
LOCAL_LAN_FAILOVER_ALLOWED   YES
NODE_FAIL_FAST_ENABLED       NO
RUN_SCRIPT                   /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
RUN_SCRIPT_TIMEOUT           NO_TIMEOUT
HALT_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
HALT_SCRIPT_TIMEOUT          NO_TIMEOUT
SERVICE_NAME                 ccmonpkg.srv
SERVICE_FAIL_FAST_ENABLED    NO
SERVICE_HALT_TIMEOUT         300
CAUTION: Do not run a monitor package until the steps for “Checking and Applying the Continentalclusters Configuration” (page 87) are completed.
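The staging portion of the procedure above (steps 1, 2, and 6) can be collected into one script. The following is a sketch, not part of the product: the demo/ paths are stand-ins for the real /opt/cmconcl/scripts and /etc/cmcluster locations so the flow can be traced on any POSIX system, and the Serviceguard cmcheckconf/cmapplyconf steps are left as comments because those commands exist only on Serviceguard nodes.

```shell
# Hypothetical demo paths; on HP-UX use SRC=/opt/cmconcl/scripts
# and PKGDIR=/etc/cmcluster/ccmonpkg.
SRC=demo/opt/cmconcl/scripts
PKGDIR=demo/etc/cmcluster/ccmonpkg

# Stand-in template files so this sketch is self-contained; on a
# real node these are shipped with Continentalclusters.
mkdir -p "$SRC"
printf 'PACKAGE_NAME ccmonpkg\n' > "$SRC/ccmonpkg.config"
printf '# monitor package control script\n' > "$SRC/ccmonpkg.cntl"

# Step 1: create the monitor package directory.
mkdir -p "$PKGDIR"
# Step 2: copy the template files into it.
cp "$SRC"/ccmonpkg.* "$PKGDIR"/
# Step 6: the control script must be executable on every node.
chmod +x "$PKGDIR/ccmonpkg.cntl"

ls "$PKGDIR"
# Steps 5 and 7 on a Serviceguard node:
#   cmcheckconf -P "$PKGDIR/ccmonpkg.config"
#   cmapplyconf -P "$PKGDIR/ccmonpkg.config"
```

The same script can be rerun on each node after the files are distributed, since mkdir -p and cp are idempotent.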
Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters
To configure the recovery group maintenance feature, you need to configure a file system on a shared disk in all the clusters configured in the Continentalclusters. The shared disk must have a minimum of 250 MB of disk space. Specify the file system path using the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. Create this directory and reserve it for Continentalclusters on all nodes in the Continentalclusters. Configure the monitor package in the recovery clusters to mount the file system from the shared disk.
Configuring Shared Disk for the Maintenance Feature
Identify a shared disk connected to all nodes at the recovery cluster where the monitor package (ccmonpkg) will run. Create a volume group with one volume on the shared disk and complete the following procedure:
1. Create the physical volume:
   pvcreate -f /dev/c0t10d0
2. Create the volume group directory under the device special file namespace:
   mkdir /dev/ccvg
3. Create the group special file using an available major number:
   mknod /dev/ccvg/group c 64 0x060000
4. Create the volume group:
   vgcreate /dev/ccvg /dev/c0t10d0
5. Activate the volume group:
   vgchange -a y ccvg
6. Create the logical volume (size in MB):
   lvcreate -L 250 ccvg
Run the following command to create a file system on the volume:
   mkfs -F vxfs /dev/ccvg/lvol1
Complete the following procedure to export the volume group configuration and import the volume group on all the nodes at the recovery cluster:
1. On the node where you created the volume, deactivate the volume group and export the volume group configuration in preview mode to a map file:
   vgchange -a n ccvg
   vgexport -m /tmp/ccvg.map -p ccvg
2. Copy the map file to all the nodes:
   rcp /tmp/ccvg.map node1:/tmp
3. On each node, create the volume group directory and the group special file:
   mkdir /dev/ccvg
   mknod /dev/ccvg/group c 64 0x060000
4. On each node, import the volume group from the map file:
   vgimport -m /tmp/ccvg.map -v
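Steps 2 through 4 above must be repeated for every other node in the recovery cluster. The following dry-run loop is a sketch of that repetition, not part of the product: the node list is hypothetical, the commands are only collected and printed (remove the collection step to execute them with rcp/remsh on HP-UX), and the vgimport line adds the volume group name ccvg as an assumed operand since the physical volume path seen by each node may differ.

```shell
# Hypothetical node list for the recovery cluster.
NODES="node1 node2"

: > ccvg_dist.cmds   # collected dry-run command list
for n in $NODES; do
  {
    echo rcp /tmp/ccvg.map "$n":/tmp
    echo remsh "$n" mkdir /dev/ccvg
    echo remsh "$n" mknod /dev/ccvg/group c 64 0x060000
    # ccvg operand is an assumption; verify the disk's device file
    # on each node before importing for real.
    echo remsh "$n" vgimport -m /tmp/ccvg.map -v ccvg
  } >> ccvg_dist.cmds
done
cat ccvg_dist.cmds
```

Reviewing the generated command list before execution is useful because mknod major/minor numbers and device paths are easy to get wrong on a per-node basis.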
Configuring a Monitor Package for the Maintenance Feature
Configure the Continentalclusters monitor package using the template scripts available in the /opt/cmconcl/scripts/ directory:
1. Create the /etc/cmcluster/ccmonpkg directory on all nodes in the recovery cluster.
2. On any node in the recovery cluster, copy the package configuration and control file templates from the /opt/cmconcl/scripts directory to the /etc/cmcluster directory:
   cp /opt/cmconcl/scripts/ccmonpkg.*
3. In the ccmonpkg.cntl monitor package control file, specify the volume group for the VG parameter in the VOLUME GROUPS section:
   VG[0]="ccvg"
4. In the ccmonpkg.cntl monitor package control file, specify a file system path and the logical volume name in the FILE SYSTEMS section. The file system path must be the value configured for the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. This path must be created and reserved on all nodes in the Continentalclusters.
   LV[0]=/dev/ccvg/lvol1; FS[0]=/opt/cmconcl/statedir; FS_MOUNT_OPT[0]="-o rw";
   FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""; FS_TYPE[0]="vxfs"
5. Distribute the monitor package control file to all nodes in the recovery cluster.
6. Apply the monitor package configuration.
Editing the Continentalclusters Configuration File
First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (If you prefer, choose a different name.) Example:
# cd /etc/cmcluster
# cmqueryconcl -C cmconcl.config
This file has three editable sections:
• Cluster information
• Recovery groups
• Monitoring definitions
Customize each section according to your needs. The following are some guidelines for editing each section.
Editing Section 1—Cluster Information
Enter cluster-level information in this section of the file as follows:
1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. You can choose any name, but it cannot easily be changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster” (page 115).
   Continentalclusters provides an optional maintenance feature for recovery groups. This feature is enabled by configuring an absolute path to a file system for the CONTINENTAL_CLUSTER_STATE_DIR parameter. If this feature is not required, this parameter can be omitted.
2. Enter the name of the first cluster after the first CLUSTER_NAME keyword, followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.
3. Enter the domain name of the cluster’s nodes following the CLUSTER_DOMAIN keyword.
4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword.
   The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system’s performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If you observe this during testing, use a larger value.
   It is suggested that you use the name “ccmonpkg” for all Continentalclusters monitors. Create this package on each cluster containing a recovery package. If you do not want to monitor a cluster that does not contain a recovery package, delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters.
   NOTE: Monitoring of a cluster that contains no recovery packages is optional. For example, set up monitoring of such a cluster to be able to check the status of the data replication technology being used.
5. Repeat steps 2 through 4 for the other participating cluster or clusters.
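Putting steps 1 through 5 together, a filled-in Section 1 might look like the following. The cluster, domain, and node names are the sample values that also appear in the commented template; the state directory path is illustrative and is only needed when the maintenance mode feature is used.

```
CONTINENTAL_CLUSTER_NAME       ccluster1
CONTINENTAL_CLUSTER_STATE_DIR  /opt/cmconcl/statedir

CLUSTER_NAME           westcoast
CLUSTER_DOMAIN         westnet.myco.com
NODE_NAME              system1
NODE_NAME              system2
MONITOR_PACKAGE_NAME   ccmonpkg
MONITOR_INTERVAL       60 SECONDS

CLUSTER_NAME           eastcoast
CLUSTER_DOMAIN         eastnet.myco.com
NODE_NAME              system3
NODE_NAME              system4
MONITOR_PACKAGE_NAME   ccmonpkg
MONITOR_INTERVAL       60 SECONDS
```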
NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times.
A printout of Section 1 of the Continentalclusters ASCII configuration file follows.

################################################################
#### CONTINENTAL CLUSTER CONFIGURATION FILE
####
#### This file contains Continentalclusters configuration
#### data. The file is divided into three sections, as
#### follows:
####    1. Cluster Information
####    2. Recovery Groups
####    3. Events, Alerts, Alarms, and Notifications
####
#### For complete details about how to set the parameters in
#### this file, consult the cmqueryconcl(1m) manpage or your
#### manual.
################################################################
#### Section 1. Cluster Information
####
#### This section contains the name of the continental
#### cluster and the name of the state directory, followed by
#### the names of member clusters and all their nodes. The
#### continental cluster name can be any string you choose,
#### up to 40 characters in length.
####
#### The Continentalclusters state directory must be a string
#### containing the directory location. The state directory
#### must always be an absolute path. The state directory
#### should be created on a shared disk in the recovery
#### cluster. This parameter is optional if the maintenance
#### mode feature for recovery groups is not required. It is
#### mandatory if the maintenance mode feature for recovery
#### groups is required.
####
#### Each member cluster name must be the same as it appears
#### in the MC/ServiceGuard cluster configuration ASCII file
#### for that cluster. In addition to the cluster name,
#### include a domain name for the nodes in the cluster.
#### Node names must be the same as those that appear in the
#### cluster configuration ASCII file. A minimum of two
#### member clusters needs to be specified. You may configure
#### one cluster to serve as recovery cluster for one or more
#### other clusters.
####
#### In the space below, enter the continental cluster name,
#### then enter a cluster name for each member cluster,
#### followed by the names of all the nodes in that cluster.
#### Following the node names, enter the name of a monitor
#### package that will run the continental cluster monitoring
#### software on that cluster. It is strongly recommended
#### that you use the same name for the monitoring package on
#### all clusters; "ccmonpkg" is suggested.
####
#### Monitoring of the recovery cluster by the primary
#### cluster is optional. If you do not wish to monitor the
#### recovery cluster, you must delete or comment out the
#### MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines that
#### follow the name of the primary cluster.
####
#### After the monitor package name, enter a monitor
#### interval, specifying a number of minutes and/or seconds.
#### The default is 60 seconds, the minimum is 30 seconds,
#### and the maximum is 5 minutes.
####
#### Example:
####
#### CLUSTER_NAME westcoast
#### CLUSTER_DOMAIN westnet.myco.com
#### NODE_NAME system1
#### NODE_NAME system2
#### MONITOR_PACKAGE_NAME ccmonpkg
#### MONITOR_INTERVAL 1 MINUTE 30 SECONDS
####
#### CLUSTER_NAME eastcoast
#### CLUSTER_DOMAIN eastnet.myco.com
#### NODE_NAME system3
#### NODE_NAME system4
#### MONITOR_PACKAGE_NAME ccmonpkg
#### MONITOR_INTERVAL 1 MINUTE 30 SECONDS
################################################################
#### CONTINENTAL_CLUSTER_NAME ccluster1
#### CONTINENTAL_CLUSTER_STATE_DIR
#### CLUSTER_NAME
#### CLUSTER_DOMAIN
#### NODE_NAME
#### NODE_NAME
#### MONITOR_PACKAGE_NAME ccmonpkg
#### MONITOR_INTERVAL 60 SECONDS
#### CLUSTER_NAME
#### CLUSTER_DOMAIN
#### NODE_NAME
#### NODE_NAME
#### MONITOR_PACKAGE_NAME ccmonpkg
#### MONITOR_INTERVAL 60 SECONDS
Editing Section 2—Recovery Groups
In this section of the file, define recovery groups, which are sets of Serviceguard packages that are ready to recover applications in case of cluster failure. Create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster. Examples of recovery groups are shown graphically in Figure 2-7 and Figure 2-8.
Figure 2-7 Sample Continentalclusters Recovery Groups

[Figure: the Los Angeles cluster (LAnode1, LAnode2) and the New York cluster (NYnode1, NYnode2) are connected by a WAN. The LA nodes hold salespkg.config/salespkg.cntl and custpkg.config/custpkg.cntl; the NY nodes hold the corresponding salespkg_bak and custpkg_bak files. The figure defines:

Recovery Group for Sales Application:
RECOVERY_GROUP_NAME  Sales
PRIMARY_PACKAGE      LAcluster/salespkg
RECOVERY_PACKAGE     NYcluster/salespkg_bak

Recovery Group for Customer Application:
RECOVERY_GROUP_NAME  Customer
PRIMARY_PACKAGE      LAcluster/custpkg
RECOVERY_PACKAGE     NYcluster/custpkg_bak]
Figure 2-8 Sample Bi-directional Recovery Groups

[Figure: the same two clusters, with recovery in both directions. The Sales application runs on the Los Angeles cluster and recovers to New York; the Customer application runs on the New York cluster and recovers to Los Angeles:

Recovery Group for Sales Application:
RECOVERY_GROUP_NAME  Sales
PRIMARY_PACKAGE      LAcluster/salespkg
RECOVERY_PACKAGE     NYcluster/salespkg_bak

Recovery Group for Customer Application:
RECOVERY_GROUP_NAME  Customer
PRIMARY_PACKAGE      NYcluster/custpkg
RECOVERY_PACKAGE     LAcluster/custpkg_bak]
Enter data in Section 2 as follows:
1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.
2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:
   PRIMARY_PACKAGE LAcluster/custpkg
3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.
4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:
   RECOVERY_PACKAGE NYcluster/custpkg_bak
5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data receiver package.
6. Optionally, enter a rehearsal package definition consisting of the cluster name, a slash (/), and the rehearsal package name after the REHEARSAL_PACKAGE keyword. This is only required for performing a rehearsal operation at the recovery cluster.
7. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.
A printout of Section 2 of the Continentalclusters ASCII configuration file follows.

###############################################################
#### Section 2. Recovery Groups
####
#### This section defines recovery groups--sets of
#### ServiceGuard packages that are ready to recover
#### applications in case of cluster failure. Recovery groups
#### allow one cluster in the continental cluster
#### configuration to back up another member cluster's
#### packages. You create a separate recovery group for each
#### ServiceGuard package that will be started on the
#### recovery cluster when the cmrecovercl(1m) command is
#### issued.
####
#### A recovery group consists of a primary package running
#### on one cluster and a recovery package that is ready to
#### run on a different cluster. In some cases, a data
#### receiver package runs on the same cluster as the
#### recovery package, and in some cases, a data sender
#### package runs on the same cluster as the primary package.
#### For rehearsal operations, a rehearsal package forms a
#### part of the recovery group. The rehearsal package is
#### always configured in the recovery cluster.
####
#### During normal operation, the primary package is running
#### an application program on the primary cluster, and the
#### recovery package, which is configured to run the same
#### application, is idle on the recovery cluster. If the
#### primary package performs disk I/O, the data that is
#### written to disk is replicated and made available for
#### possible use on the recovery cluster. For some data
#### replication techniques, this involves the use of a data
#### receiver package running on the recovery cluster.
####
#### In the event of a major failure on the primary cluster,
#### the user issues the cmrecovercl(1m) command to halt any
#### data receiver packages and start up all the recovery
#### packages that exist on the recovery cluster.
####
#### During rehearsal operation, before starting the
#### rehearsal packages, care should be taken that the
#### replication between the primary and the recovery sites
#### is suspended. For some data replication techniques which
#### involve the use of a data receiver package, rehearsal
#### operations must be commenced only after shutting down
#### the data receiver package at the recovery cluster.
#### Rehearsal packages are started using the
#### cmrecovercl -r command.
####
#### Enter the name of each package recovery group together
#### with the fully qualified names of the primary and
#### recovery packages. If appropriate, enter the fully
#### qualified name of a data receiver package. Note that the
#### data receiver package must be on the same cluster as the
#### recovery package.
####
#### The primary package name includes the primary cluster
#### name followed by a slash ("/") followed by the package
#### name on the primary cluster. The recovery package name
#### includes the recovery cluster name, followed by a slash
#### ("/") followed by the package name on the recovery
#### cluster. The data receiver package name includes the
#### recovery cluster name, followed by a slash ("/")
#### followed by the name of the data receiver package on the
#### recovery cluster. The rehearsal package name includes
#### the recovery cluster name, followed by a slash ("/").
####
#### Up to 29 recovery groups can be entered.
####
#### Example:
#### RECOVERY_GROUP_NAME nfsgroup
#### PRIMARY_PACKAGE westcoast/nfspkg
#### DATA_SENDER_PACKAGE westcoast/nfssenderpkg
#### RECOVERY_PACKAGE eastcoast/nfsbackuppkg
#### DATA_RECEIVER_PACKAGE eastcoast/nfsreplicapkg
#### REHEARSAL_PACKAGE eastcoast/nfsrehearsalpkg
####
#### RECOVERY_GROUP_NAME hpgroup
#### PRIMARY_PACKAGE westcoast/hppkg
#### DATA_SENDER_PACKAGE westcoast/hpsenderpkg
#### RECOVERY_PACKAGE eastcoast/hpbackuppkg
#### DATA_RECEIVER_PACKAGE eastcoast/nfsreplicapkg
#### REHEARSAL_PACKAGE eastcoast/hprehearsalpkg
###############################################################
Editing Section 3—Monitoring Definitions
Finally, enter monitoring definitions that define cluster events and set the times at which alert and alarm notifications are sent out. Define notifications for all cluster events—Unreachable, Down, Up, and Error. Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.
1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash (“/”) and the name of the status—Unreachable, Down, Up, or Error. Example:
   CLUSTER_EVENT LAcluster/UNREACHABLE
2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. Create as many alerts as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.
3. If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. Create as many alarms as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types).
4. If using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on that cluster. It is not necessary to define alarms.
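Taken together, steps 1 through 3 produce an entry like the following. The cluster names are the LAcluster/NYcluster pair from Figure 2-7, and the delays and console messages are illustrative.

```
CLUSTER_EVENT       LAcluster/UNREACHABLE
MONITORING_CLUSTER  NYcluster
CLUSTER_ALERT       5 MINUTES
NOTIFICATION CONSOLE
"Cluster ALERT: LAcluster not responding."
CLUSTER_ALARM       15 MINUTES
NOTIFICATION CONSOLE
"Cluster ALARM: Issue cmrecovercl command to take over LAcluster."
```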
A printout of Section 3 of the Continentalclusters ASCII configuration file follows.

################################################################
#### Section 3. Monitoring Definitions
####
#### This section of the file contains monitoring
#### definitions. Well planned monitoring definitions will
#### help in making the decision whether or not to issue the
#### cmrecovercl(1m) command. Each monitoring definition
#### specifies a cluster event along with the messages that
#### should be sent to system administrators or other IT
#### staff.
####
#### All messages are appended to the default log
#### /var/opt/resmon/log/cc/eventlog as well as to the
#### destination you specify below.
####
#### A cluster event takes place when a monitor that is
#### located on one cluster detects a significant change in
#### the condition of another cluster. The monitored cluster
#### conditions are:
####   UNREACHABLE - the cluster is unreachable. This will
####   occur when the communication link to the cluster has
####   gone down, as in a WAN failure, or when all the nodes
####   in the cluster have failed.
####   DOWN - the cluster is down but nodes are responding.
####   This will occur when the cluster is halted, but some
####   or all of the member nodes are booted and
####   communicating with the monitoring cluster.
####   UP - the cluster is up.
####   ERROR - there is a mismatch of cluster versions or a
####   security error.
####
#### A change from one of these conditions to another one is
#### a cluster event. You can define alert or alarm states
#### based on the length of time since the cluster event was
#### observed. Some events are noteworthy at the time they
#### occur, and some are noteworthy when they persist over
#### time. Setting the elapsed time to zero results in a
#### message being sent as soon as the event takes place.
#### Setting the elapsed time to 5 minutes results in a
#### message being sent when the condition has persisted for
#### 5 minutes.
####
#### An alert is intended as informational only. Alerts may
#### be sent for any type of cluster condition. For an alert,
#### a notification is sent to a system administrator or
#### other destination. Alerts are not intended to indicate
#### the need for recovery. The cmrecovercl(1m) command is
#### disabled.
####
#### An alarm is an indication that a condition exists that
#### may require recovery. For an alarm, a notification is
#### sent, and in addition, the cmrecovercl(1m) command is
#### enabled for immediate execution, allowing the
#### administrator to carry out cluster recovery. An alarm
#### can only be defined for an UNREACHABLE or DOWN condition
#### in the monitored cluster.
####
#### A notification defines a message that is appended to the
#### log file /var/opt/resmon/log/cc/eventlog and sent to
#### other specified destinations, including email addresses,
#### SNMP traps, the system console, or the syslog file. The
#### message string in a notification can be no more than 170
#### characters. Enter notifications in one of the following
#### forms:
####
#### NOTIFICATION CONSOLE <message>
####   Message written to the console.
#### NOTIFICATION EMAIL <address> <message>
####   Message emailed to a fully qualified email address.
#### NOTIFICATION OPC <severity> <message>
####   The <message> is sent to OpenView IT/Operations. The
####   value of <severity> may be 8 (normal), 16 (warning),
####   64 (minor), 128 (major), 32 (critical).
#### NOTIFICATION SNMP <level> <message>
####   The <message> is sent as an SNMP trap. The value of
####   <level> may be 1 (normal), 2 (warning), 3 (minor),
####   4 (major), 5 (critical).
#### NOTIFICATION SYSLOG
####   A notice of the event is appended to the syslog file.
#### NOTIFICATION TCP <node>:<port> <message>
####   Message is sent to a TCP port on the specified node.
#### NOTIFICATION TEXTLOG <file> <message>
####   A notice of the event is written to a user-specified
####   log file. <file> must be a full path for the
####   user-specified file. The user-specified file must be
####   under the /var/opt/resmon/log directory.
#### NOTIFICATION UDP <node>:<port> <message>
####   Message is sent to a UDP port on the specified node.
####
#### For the cluster event, enter a cluster name followed by
#### a slash ("/") and a cluster condition (UP, DOWN,
#### UNREACHABLE, ERROR) that may be detected by a monitor
#### program. Each cluster event must be paired with a
#### monitoring cluster. Include the name of the cluster on
#### which the monitoring will take place. Events can be
#### monitored from either the primary cluster or the
#### recovery cluster.
####
#### Alerts, alarms, and notifications have the following
#### syntax:
####
#### CLUSTER_ALERT <minutes> MINUTES <seconds> SECONDS
####   Delay before the software issues an alert notification
####   about the cluster event.
#### CLUSTER_ALARM <minutes> MINUTES <seconds> SECONDS
####   Delay before the software issues an alarm notification
####   about the cluster event and enables the
####   cmrecovercl(1m) command for immediate execution.
#### NOTIFICATION <message>
####   A string value which is sent from the monitoring
####   cluster for a given event to a specified destination.
####   The <message>, which can be no more than 170
####   characters, is also appended to the
####   /var/opt/resmon/log/cc/eventlog file on the monitoring
####   node in the cluster where the event was detected.
####
#### Example:
####
#### CLUSTER_EVENT westcoast/UNREACHABLE
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 5 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 5 min. Call secondary site."
#### NOTIFICATION EMAIL [email protected]
#### "Call primary admin. (555) 555-6666."
####
#### CLUSTER_ALERT 10 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 10 min. Call secondary site."
#### NOTIFICATION EMAIL [email protected]
#### "Call primary admin. (555) 555-6666."
#### NOTIFICATION CONSOLE
#### "Cluster ALERT: westcoast not responding."
####
#### CLUSTER_ALARM 15 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 15 min. Takeover advised."
#### NOTIFICATION EMAIL [email protected]
#### "westcoast still not responding. Use cmrecovercl command."
#### NOTIFICATION CONSOLE
#### "Cluster ALARM: Issue cmrecovercl command to take over westcoast."
####
#### CLUSTER_EVENT westcoast/UP
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Cluster westcoast is up."
####
#### CLUSTER_EVENT westcoast/DOWN
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Cluster westcoast is down."
####
#### CLUSTER_EVENT westcoast/ERROR
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Error in monitoring cluster westcoast."
################################################################
#### CLUSTER_EVENT /UNREACHABLE
#### MONITORING_CLUSTER
#### CLUSTER_ALERT
The TEXTLOG notification file should be placed under the /var/opt/resmon/log directory. If any other directory is specified, an error is reported by the cmapplyconcl and cmcheckconcl commands. If you specify any other location for logging, the following error message appears: The target after textlog “ ” is not valid. Please specify a file under /var/opt/resmon/log directory
If you upgraded Continentalclusters but are still using the old configuration file, the textlog location is still specified as /var/adm/cmconcl. As a result, the following error message appears:
The file path “s” specified for textlog is invalid. The destination file must be under /var/opt/resmon/log directory. Please change the path and restart the ccmon package.
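The directory requirement above can be checked with a quick shell test before running cmcheckconcl. This is only a sketch; the CCTextlog file name is an example, not a required name.

```shell
# Sketch: verify that a proposed TEXTLOG destination sits under the
# required directory before applying the configuration.
# The file name CCTextlog below is only an example.
TEXTLOG=/var/opt/resmon/log/CCTextlog

case "$TEXTLOG" in
  /var/opt/resmon/log/*) echo "textlog location OK" ;;
  *) echo "invalid: textlog must be under /var/opt/resmon/log" ;;
esac
```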
Building the Continentalclusters Configuration
83
IMPORTANT: For TEXTLOG notification, the destination log file must be in the /var/opt/resmon/log directory. If the destination file is not available in this directory, Continentalclusters will not work properly.

Selecting Notification Intervals

The monitor interval determines the amount of time between distinct attempts by the monitor to obtain the status of a cluster. The intervals associated with notifications must be chosen to work in combination with the monitor interval to give a realistic picture of cluster events. Some combinations are not useful; for example, notification intervals that are smaller than the monitor interval do not make sense and should be avoided. In the following example, the cluster event will always result in two alerts followed by an alarm. No change of state could possibly be detected at the one-minute, two-minute, and three-minute intervals, because the monitor does not check for changes until the monitor interval (5 minutes) has been reached.

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 5 MINUTES
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 1 MINUTE
NOTIFICATION CONSOLE "1 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 2 MINUTES
NOTIFICATION CONSOLE "2 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 3 MINUTES
NOTIFICATION CONSOLE "ALARM: LACluster Unreachable after 3 Minutes: Recovery Enabled"
The following sequence could provide meaningful notifications, since a change of state is possible between notification intervals:

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 1 MINUTE
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 3 MINUTES
NOTIFICATION CONSOLE "3 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 10 MINUTES
NOTIFICATION CONSOLE "ALARM: LACluster Unreachable after 10 Minutes: Recovery Enabled"
NOTE: The notification intervals should be multiples of the monitor interval.
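That multiples-of-the-monitor-interval rule can be sanity-checked with a short script before editing the configuration file. This is a sketch; the interval values mirror the examples in the text.

```shell
# Sketch: check that each proposed notification interval (in minutes)
# is a multiple of the monitor interval. Values mirror the examples above.
MONITOR_INTERVAL=5
NOTIFICATION_INTERVALS="5 10 15"

for i in $NOTIFICATION_INTERVALS; do
  if [ $((i % MONITOR_INTERVAL)) -ne 0 ]; then
    echo "WARNING: ${i}-minute interval is not a multiple of ${MONITOR_INTERVAL} minutes"
  fi
done
```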
The following is a sample Continentalclusters configuration file with two recovery pairs. Both cluster1 and cluster2 are configured to have cluster3 as their
recovery cluster for package pkg1 and pkg2, and cluster3 is configured to have cluster1 as its recovery cluster for pkg3.

# Section 1: Cluster Information
CONTINENTAL_CLUSTER_NAME sampleCluster
CONTINENTAL_CLUSTER_STATE_DIR /opt/cmconcl/statedir

CLUSTER_NAME cluster1
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node11
NODE_NAME node12
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 seconds

CLUSTER_NAME cluster2
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node21
NODE_NAME node22

CLUSTER_NAME cluster3
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node31
NODE_NAME node32
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 seconds

# Section 2: Recovery Group Information
RECOVERY_GROUP_NAME ccRG1
PRIMARY_PACKAGE cluster1/pkg1
RECOVERY_PACKAGE cluster3/pkg1'
REHEARSAL_PACKAGE cluster3/pkg4'

RECOVERY_GROUP_NAME ccRG2
PRIMARY_PACKAGE cluster2/pkg2
RECOVERY_PACKAGE cluster3/pkg2'

RECOVERY_GROUP_NAME ccRG3
PRIMARY_PACKAGE cluster3/pkg3
RECOVERY_PACKAGE cluster1/pkg3'
# Section 3: Monitoring Definitions
CLUSTER_EVENT cluster1/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 DOWN alarm"
CLUSTER_EVENT cluster2/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 DOWN alarm"
CLUSTER_EVENT cluster3/DOWN
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/logging
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 DOWN alarm"
CLUSTER_EVENT cluster1/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 UP alert"
CLUSTER_EVENT cluster2/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 UP alert"
CLUSTER_EVENT cluster3/UP
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 UP alert"
Checking and Applying the Continentalclusters Configuration

After editing the configuration file on any of the participating clusters in the Continentalcluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.
1. Verify the content of the file:
   # cmcheckconcl -v -C cmconcl.config
   This command verifies that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.
2. Distribute the Continentalclusters configuration information to all nodes in the continental cluster:
   # cmapplyconcl -v -C cmconcl.config
   Configuration data is copied to all nodes in all the participating clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in all clusters.
3. Make a backup copy of the ASCII configuration file and save it on the other cluster after it is applied.
NOTE: If any problems occur during the execution of cmapplyconcl, repeat the command as often as necessary. Issuing the command will delete the existing Continentalclusters configuration and apply the new one. When configuration is finished, your systems should have sets of files similar to those shown in Figure 2-9.
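The check, apply, and backup steps can be tied together in a small wrapper. This is a sketch only; cmcheckconcl and cmapplyconcl exist only on Continentalclusters nodes, so DRY_RUN=1 (the default here) just prints each command instead of executing it.

```shell
# Sketch: wrapper for the check/apply/backup sequence above.
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 on a real node.
CONFIG=cmconcl.config
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@" || exit 1        # stop on the first failing step
  fi
}

run cmcheckconcl -v -C "$CONFIG"   # verify the configuration
run cmapplyconcl -v -C "$CONFIG"   # distribute it to all nodes
run cp "$CONFIG" "$CONFIG.bak"     # backup copy to store on the other cluster
```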
Figure 2-9 Continentalclusters Configuration Files

The figure shows the New York cluster (nodes NYnode1 and NYnode2) connected over the WAN to the Los Angeles cluster (nodes LAnode1 and LAnode2). Each node holds:
• New York nodes: recovery package files (salespkg_bak.config, salespkg_bak.cntl, custpkg_bak.config, custpkg_bak.cntl), the Continentalclusters configuration file (cmconcl.config), the monitor package files (ccmonpkg.config, ccmonpkg.cntl), and the managed object files (/etc/cmconcl/instances/*).
• Los Angeles nodes: primary package files (salespkg.config, salespkg.cntl, custpkg.config, custpkg.cntl), the Continentalclusters configuration file (cmconcl.config), the monitor package files (ccmonpkg.config, ccmonpkg.cntl), and the managed object files (/etc/cmconcl/instances/*).
Starting the Continentalclusters Monitor Package

Starting the monitoring package enables all Continentalclusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly. If using physical data replication, make sure that it is operational. On each monitoring cluster, start the monitor package:
# cmmodpkg -e ccmonpkg
After the monitor package is started, a log file /var/adm/cmconcl/sentryd.log will be created on the node where the package is running to record the
Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.
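Periodic archiving of that log can be scripted, for example from cron. This is a sketch under assumptions: the 10 MB threshold and timestamp suffix are arbitrary choices, and LOG is overridable so the logic can be exercised on any machine.

```shell
# Sketch: archive the monitor log named in the text once it grows large.
# The 10 MB threshold and timestamp suffix are arbitrary (assumptions).
LOG=${LOG:-/var/adm/cmconcl/sentryd.log}
MAX_KB=10240

if [ -f "$LOG" ]; then
  kb=$(du -k "$LOG" | awk '{print $1}')
  if [ "$kb" -gt "$MAX_KB" ]; then
    mv "$LOG" "$LOG.$(date +%Y%m%d%H%M%S)"   # archive the old log
    : > "$LOG"                               # start a fresh, empty log
  fi
fi
```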
Validating the Configuration

The following table shows the status of Continentalclusters packages in a recovery pair when each cluster is running normally and no recovery has taken place.

Table 2-6 Status of Continentalclusters Packages Before Recovery

                                 Primary Cluster                          Recovery Cluster
Data Replication Method          Primary   Data      Monitor     Recovery  Data      Monitor
                                 Package   Sender    (optional)  Package   Receiver  (required)
Physical—Symmetrix               Running   Not used  Running     Halted    Not used  Running
Physical—XP Series               Running   Not used  Running     Halted    Not used  Running
Physical—EVA Series              Running   Not used  Running     Halted    Not used  Running
Logical—Oracle Standby Database  Running   Not used  Running     Halted    Running   Running
Use the following steps to ensure the components are functioning correctly:
1. Make sure all daemons are running:
   # ps -ef | grep cmcl
   Two important Continentalclusters daemons are cmclsentryd and cmclrmond.
2. Check the cluster configuration on each cluster using the cmviewcl -v command.
   a. Ensure that each primary package is running correctly.
   b. Ensure that the data sender packages (if any are used for logical data replication) are running correctly.
   c. Ensure that the data receiver packages (if any are used for logical data replication) are running correctly.
   d. Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.
3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.
4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.
5. Use the following command to verify the correct operation of the Continentalclusters daemon:
   # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log
6. Make sure the Continentalclusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.
7. Change each cluster's state to test that the monitor running on the monitoring cluster will detect the change in status and send notification.
8. View the status of the Continentalclusters primary and recovery clusters, including configured event data:
   # cmviewconcl -v
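The daemon check in step 1 can be scripted so it can run from a monitoring job. This is a sketch; the daemon names come from the text, and on a non-HP-UX machine both will simply be reported missing.

```shell
# Sketch: check that both Continentalclusters daemons are present in the
# process list. Daemon names are from the text above.
missing=""
for d in cmclsentryd cmclrmond; do
  ps -ef | grep -v grep | grep -q "$d" || missing="$missing $d"
done

if [ -z "$missing" ]; then
  echo "Continentalclusters daemons running"
else
  echo "missing daemons:$missing"
fi
```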
CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great. Chapters 3, 4 and 5 contain additional suggestions on testing the data replication and package configuration.
Documenting the Recovery Procedure

Once everything is configured and the Continentalclusters monitor is running, it is necessary to define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 2-10 is an example of how to document the recovery procedure.
Figure 2-10 Recovery Checklist

Identify the level of alert that the monitoring site received:
• Cluster Alert
• Cluster Alarm
Contact the monitored site by phone or beeper to rule out the following:
• WAN networking failure; primary cluster and packages are still fine.
• Cluster and/or package have come back up, but the UP notification has not yet been received by the recovery site.
Get authorization from the monitored site using one of the following:
• Authorized person contacted: Director 1 / Admin 1
• Authorization received: human-to-human voice authorization / voice mail
Notify the monitored site of successful recovery using one of the following:
• Authorized person contacted: Director 1 / Admin 1
• Confirmation received: human-to-human voice confirmation / voice mail
Reviewing the Recovery Procedure

Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.
Testing the Continental Cluster

This section presents some test procedures and scenarios. Some scenarios presume certain configurations that may not apply to all environments. Additionally, these tests
do not eliminate the need to perform standard Serviceguard testing for each cluster individually.

CAUTION: Data and system corruption can occur as a result of testing. System and data backups should always be done prior to testing.
Testing Individual Packages

Use procedures like the following to test individual packages:
1. Use the cmhaltpkg command to shut down the package on the primary cluster that corresponds to the package to be tested on the recovery cluster.
2. Do not switch any users to the recovery cluster. The application must be inaccessible to users during this test.
3. Start up the package to be tested on the recovery cluster using the cmrunpkg command.
4. Access the application manually using a mechanism that tests network connectivity.
5. Perform read-only actions to verify that the application is running appropriately.
6. Shut down the application on the recovery cluster using the cmhaltpkg command.
7. If using physical data replication, do not resync from the recovery cluster to the primary cluster. Instead, manually issue a command that will overwrite any changes on the recovery disk array that may inadvertently have been made.
8. Start the package up on the primary cluster and allow connection to the application.
Testing Continentalclusters Operations

Use the following procedures to exercise typical Continentalclusters behaviors:
1. Halt both clusters in a recovery pair, then restart both clusters. The monitor packages on both clusters should start automatically. The Continentalclusters packages (primary, data sender, data receiver, and recovery) should not start automatically. Any other packages may or may not start automatically, subject to their configuration.
   NOTE: If an UP status is configured for a cluster, then an appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package on the other cluster. Due to delays in email or SNMP, the notifications may arrive later than expected. In addition to alerts/alarms sent using the mechanisms defined in the Continentalclusters configuration file, they are also recorded in the file /var/opt/resmon/log/cc/eventlog on the system reporting the event.
2. While the monitor package is running on a monitoring cluster, halt the monitored cluster (cmhaltcl -f). An appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package. Run cmrecovercl. The command should fail. Additional notifications should be received at the configured time intervals. After the alarm notification is received, run cmrecovercl. Any data receiver packages on the monitoring cluster should halt and the recovery package(s) should start with package switching enabled. Halt the recovery packages.
3. Rerun test 2 under a variety of conditions (and multiple conditions) such as the following:
   • Rebooting and powering off systems one at a time
   • Rebooting and powering off all systems at the same time
   — Running the monitor package on each node in each cluster
   — Disconnecting the WAN connection between the clusters
   • If physical data replication is used, disconnecting the physical replication links between the disk arrays:
   — Powering off the disk array at the primary site
   — Powering off the disk array at the recovery site
   • Testing cmrecovercl -f as well as cmrecovercl
   Depending on the condition, the primary packages should be running to test real-life failures and recovery procedures.
4. After each scenario in tests 2-4, restore both clusters to their production state, restart the primary package(s) (as well as any data sender and data receiver packages), and note any issues, time delays, etc.
5. Halt the monitor package on one cluster. Halt the other cluster. No notifications are generated that the other cluster has failed. What mechanism is available to the organization to monitor the monitor?
6. Halt the packages on one cluster, but do not halt the cluster. No notifications are generated that the packages on that cluster have failed. What mechanism is available to the organization to monitor package status?
   NOTE: Continentalclusters monitors cluster status, but not package status.
7. View the status of the continental cluster:
   # cmviewconcl
Switching to the Recovery Packages in Case of Disaster

Once the clusters are configured and tested, packages will be able to fail over to an alternate node in another data center and still have access to the data they need to function. The primary steps for failing over a package are:
1. Receive notification that a monitored cluster is unavailable.
2. Verify that it is necessary and safe to start the recovery packages.
3. Use the recovery command to stop data replication and start recovery packages.
4. View the status of the continental cluster:
   # cmviewconcl
It is important to have a well-defined recovery process and to ensure that all members at both sites are educated in how to use it.
Receiving Notification

Once the monitor is started, as described in “Starting the Continentalclusters Monitor Package” (page 88), the monitor will send notifications as configured. The following types of notifications are generated as configured in cmclconf.ascii:
• CLUSTER_ALERT is a change in the status of a cluster. Recovery via the cmrecovercl command is not enabled by default. This should be treated as information that the cluster either may be developing a problem or may be recovering from a problem.
• CLUSTER_ALARM is a change in the status of a cluster that indicates that the cluster has been unavailable for an unacceptable amount of time. Recovery via the cmrecovercl command is enabled.
The issuing of notifications takes place at the timing intervals specified for each cluster event. However, it sometimes may appear that an alert or alarm takes longer than configured. Keep in mind that if several changes of cluster state (for example, Down to Error to Unreachable to Down) take place in a smaller time than the configured interval for an alert or alarm, the timer is reset to 0 after each change of state; thus, the time to the alert or alarm will be the configured interval plus the time used by all the earlier state changes. NOTE: The cmrecovercl command is fully enabled only after a CLUSTER_ALARM is issued; however, the command may be used with the -f option when a CLUSTER_ALERT has been issued.
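The arithmetic behind that delay can be shown with a toy calculation. All timings below are invented for illustration.

```shell
# Toy arithmetic for the timer-reset rule described above: the alert fires
# at the configured interval measured from the last state change, not from
# the first. All timings are invented for illustration.
ALERT_INTERVAL=5      # configured CLUSTER_ALERT delay, minutes
LAST_STATE_CHANGE=4   # minute at which the final state change occurred

fires_at=$((LAST_STATE_CHANGE + ALERT_INTERVAL))
echo "alert fires at minute $fires_at, not minute $ALERT_INTERVAL"
```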
Verifying that Recovery is Needed

It is important to follow the established protocol for coordinating with the remote site to determine whether moving the package is required. This includes initiating person-to-person communication between sites. For example, it may be possible that the WAN network failed, causing the cluster alarm. Some network failures, such as those that prevent clients from using the application, may require recovery. Other network failures, such as those that only prevent the two clusters from communicating, may not require recovery. Following an established protocol for communicating with the remote site would verify this. See Figure 2-10 (page 91) for an example of a recovery checklist.
Using the Recovery Command to Switch All Packages

If a data replication technology other than Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, or Metrocluster with EMC SRDF is chosen, use the following steps prior to executing the Continentalclusters recovery command, cmrecovercl. Once notification has been received, the sites in the recovery pair have coordinated (for a sample worksheet, see “Documenting the Recovery Procedure” (page 90)), and it has been determined that moving the package is necessary:
• Check to make sure the data used by the application is in a usable state. Usable state means the data is consistent and recoverable, even though it may not be current.
• Check to make sure the secondary devices are in read-write mode. If you are using database or software data replication, make sure the data copy at the recovery site is in read-write mode as well.
• If LVM and physical data replication are used, the ID of the primary cluster is also replicated and written on the secondary devices in the recovery site. The ID of the primary cluster must be cleared and the ID of the recovery cluster must be written on the secondary devices before they can be used.
  If LVM exclusive mode is used, issue the following commands from a node in the recovery cluster on all the volume groups that are used by the recovery packages:
  # vgchange -c n
  # vgchange -c y
  If LVM shared mode (SLVM) is used, from a node in the recovery cluster, issue the following commands:
  # vgchange -c n -S n
  # vgchange -c y -S y
• If VxVM and physical data replication are used, the host name of a node in the primary cluster is the host name of the last owner of the disk group. It is also replicated and written on the secondary devices in the recovery site. The host name of the last owner of the disk group must be cleared out before the secondary devices can be used. Issue the following command from a node in the recovery cluster on all the disk groups that are used by the recovery packages:
  # vxdg deport
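These storage-preparation commands must be applied to every volume group and disk group used by the recovery packages; a loop avoids missing one. This is a sketch: the VG/DG names are placeholders, and DRY_RUN=1 (the default here) only prints the vgchange/vxdg commands, which exist only on HP-UX nodes.

```shell
# Sketch: apply the storage-preparation commands above to every volume
# group (LVM exclusive mode) and disk group (VxVM) used by the recovery
# packages. Names are placeholders; DRY_RUN=1 (default) prints commands.
VOLUME_GROUPS="/dev/vgora"      # placeholder LVM volume groups
DISK_GROUPS="dgora"             # placeholder VxVM disk groups
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

for vg in $VOLUME_GROUPS; do
  run vgchange -c n "$vg"       # clear the replicated primary-cluster ID
  run vgchange -c y "$vg"       # mark the VG for the recovery cluster
done
for dg in $DISK_GROUPS; do
  run vxdg deport "$dg"         # clear the last-owner host name
done
```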
To Start the Failover Process

Use the following command to start the failover process:
# cmrecovercl
If a notification defined in a CLUSTER_ALARM statement in the configuration file has not been received, but a CLUSTER_ALERT has been received and the remote site has confirmed the need to fail over, override the disabled cmrecovercl command by using the -f forcing option:
# cmrecovercl -f
Use this command only after positive confirmation from the remote site. The cmrecovercl command will skip recovery for recovery groups in maintenance mode.
In a multiple recovery pair configuration where more than one primary cluster is sharing the same recovery cluster, running cmrecovercl without any option will attempt to recover packages for all of the recovery groups of the configured primary clusters. Recovery can also be done in this multiple recovery pair case on a per-cluster basis by using the -c option:
# cmrecovercl -c
If the monitored cluster comes back up following an alert or alarm, but it is certain that the primary packages cannot start (say, because of damage to the disks on the primary site), then use a special procedure to initiate recovery:
1. Use the cmhaltcl command to halt the primary cluster.
2. Wait for the monitor to send an alert.
3. Use cmrecovercl -f to perform recovery.
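The three-step special procedure can be sketched as a dry-run script. Step 1 runs on the primary cluster and step 3 on the recovery cluster; with DRY_RUN=1 (the default here) the cm* commands are only printed, since they exist only on Serviceguard nodes.

```shell
# Dry-run sketch of the forced-recovery procedure above. Step 1 runs on
# the primary cluster, step 3 on the recovery cluster.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

run cmhaltcl                                                        # step 1
echo "wait for the monitor on the recovery cluster to send an alert" # step 2
run cmrecovercl -f                                                  # step 3
```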
After the cmrecovercl command is issued, Continentalclusters displays a warning message such as the following and prompts for verification that recovery should proceed (the names “LAcluster” and “NYcluster” are examples):

WARNING: This command will take over for the primary cluster "LAcluster" by starting the recovery package on the recovery cluster "NYcluster". You must follow your site disaster recovery procedure to ensure that the primary packages on "LAcluster" are not running and that recovery on "NYcluster" is necessary. Continuing with this command while the applications are running on the primary cluster may result in data corruption. Are you sure that the primary packages are not running and will not come back, and are you certain that you want to start the recovery packages? [Y/N]

Reply “Y” to proceed only if you are certain that recovery should take place. After replying “Y”, a group of messages appears as the processing of each recovery group occurs (the message about the data receiver package appears only when using logical data replication with data sender and receiver packages):

Processing the recovery group nfsgroup on recovery cluster eastcoast.
Disabling switching for data receiver package nfsreceiverpkg on recovery cluster eastcoast.
Halting data receiver package nfsreceiverpkg on recovery cluster eastcoast.
Starting
recovery package nfsbackuppkg on recovery cluster eastcoast.
Enabling package nfsbackuppkg in cluster eastcoast.
---------------- exit status = 0 ----------------

The cmrecovercl command starts up all the recovery packages that are configured in the recovery groups. The cmrecovercl -c command will skip recovery for recovery groups in maintenance mode. In addition to starting the recovery packages all at once, another option is to recover an individual recovery group by using the following command:
# cmrecovercl -g Recovery_Group_Name
Running cmrecovercl with the -g option starts up only the recovery package configured in the specified recovery group. The cmrecovercl -g command fails to recover if the specified recovery group is in maintenance mode.

NOTE: After the cmrecovercl command is issued, there is a delay of at least 90 seconds per recovery group as the command makes sure that the package is not active on another cluster.

Use the cmviewcl command on the local cluster to confirm that the recovery packages are running correctly. Following recovery, halt the package that was monitoring the remote cluster if preferred. If this is not done, notification will continue to be received if there is a change in the remote cluster’s state.

The following table shows the status of Continentalclusters packages after recovery has taken place, and applications are now running on the local cluster.

Table 2-7 Status of Continentalclusters Packages After Recovery

                                 Primary Cluster                                   Recovery Cluster
Data Replication Method          Primary  Data      Monitor            Recovery  Data      Monitor
                                 Package  Sender    (optional)         Package   Receiver  (required)
Physical—Symmetrix               Halted   Not used  Halted or Running  Running   Not used  Halted or Running
Physical—XP Series               Halted   Not used  Halted or Running  Running   Not used  Halted or Running
Physical—EVA Series              Halted   Not used  Halted or Running  Running   Not used  Halted or Running
Logical—Oracle Standby Database  Halted   Not used  Halted or Running  Running   Halted    Halted or Running
How the cmrecovercl Command Works

The cmrecovercl command uses the configuration file to loop through each defined recovery group of a target remote cluster to be recovered. For each recovery group that is not in maintenance mode, the command communicates with the monitor package (ccmonpkg) and verifies that the remote cluster is unreachable or down; then, if there is a data replication package, it is halted, and the recovery package is enabled on the recovery cluster. The recovery package can then start up on the local cluster on the appropriate node, as determined by the FAILOVER_POLICY configured for the package. The process continues for the next recovery group, even if there are problems with one recovery group. The command will skip recovery for any recovery group in maintenance mode. After processing one recovery group, if the command discovers that the remote cluster is back up, the command exits, since the alarm or alert state no longer exists. This process keeps the primary and recovery packages from running on the remote cluster and local cluster at the same time, which would result in data corruption.

NOTE: If the remote cluster comes back up following a cluster event but the primary packages cannot run, halt the primary cluster with the cmhaltcl command, then issue cmrecovercl with the -f option.
Forcing a Package to Start

The cmforceconcl command is used to force a Continentalclusters package to start even if the status of a remote package in the recovery group is unknown. This command is used as a prefix to a cmrunpkg or cmmodpkg command. Under normal circumstances, Continentalclusters will not allow a package to start in the recovery cluster unless it can determine that the package is not running in the primary cluster. In some cases, communication between the two clusters may be lost, and it may be necessary to start the package on the recovery cluster anyway. To do this, use the cmforceconcl command along with a cmrunpkg or cmmodpkg command, as in the following example:
# cmforceconcl cmrunpkg -n node3 Pkg1

CAUTION: When using this command, ensure that the other cluster is not running the package. Failure to do this may result in the package running in both clusters, which will cause data corruption.
Restoring Disaster Tolerance

After a failover to a cluster occurs, restoring disaster tolerance has many challenges, the most significant of which are:
• Restoring the failed cluster. Depending on the nature of the disaster, it may be necessary either to create a new cluster or to restore the cluster. Before starting up the new or the failed cluster, make sure the AUTO_RUN flag for all of the Continentalclusters application packages is disabled. This prevents the packages from starting unexpectedly with the cluster.
• Resynchronizing the data. To resynchronize the data, either restore the data to the cluster and continue with the same data replication procedure, or set up data replication to function in the other direction.
The following sections briefly outline some scenarios for restoring disaster tolerance.
Restore Clusters to their Original Roles

If the disaster did not destroy the cluster, there is the option to return both clusters in a recovery pair to their original roles. To do this:
1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.
2. On each cluster, stop the Continentalclusters monitor package if it is still running:
   # cmhaltpkg ccmonpkg
3. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.
4. For each recovery group where the repaired cluster will run the primary package:
   a. Synchronize the data from the disks on the surviving cluster to the disks on the repaired cluster. This may be time-consuming.
   b. Halt the recovered application on the surviving cluster if necessary, and start it on the repaired cluster.
   c. To keep application down time to a minimum, start the primary package on the cluster before resynchronizing the data of the next recovery group.
5. Restart the monitor using the following command on each cluster:
   # cmrunpkg ccmonpkg
   Alternatively, if the monitoring package configuration has been modified, use the following sequence on each cluster to apply the new configuration and start the monitor:
   # cmapplyconf -P ccmonpkg.config
   # cmmodpkg -e ccmonpkg
6. View the status of the Continentalcluster:
   # cmviewconcl
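The command-level steps of this sequence can be sketched as a dry-run script. The data resynchronization and application moves (steps 3-4) are site-specific, so they appear only as a reminder; DRY_RUN=1 (the default here) prints the cm* commands instead of running them.

```shell
# Dry-run sketch of the restore-to-original-roles sequence above.
# cm* commands exist only on Serviceguard nodes; DRY_RUN=1 prints them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

run cmhaltpkg ccmonpkg    # step 2: stop the monitor on each cluster
echo "resynchronize data and move each primary package back (steps 3-4)"
run cmrunpkg ccmonpkg     # step 5: restart the monitor on each cluster
run cmviewconcl           # step 6: confirm continental cluster status
```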
Primary Packages Remaining on the Surviving Cluster

Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This minimizes the downtime involved with moving the applications back to the restored cluster. It also assumes that the surviving cluster has sufficient resources to handle running all critical applications indefinitely.

NOTE: In a multiple recovery pairs scenario, where more than one primary cluster is configured to share the same recovery cluster, the following procedure to switch the roles of the failed cluster and the surviving cluster should not be used.

Use the following steps:
1. Halt the monitor packages. Issue the following command on each cluster:
   # cmhaltpkg ccmonpkg
2. Edit the Continentalclusters ASCII configuration file. It is necessary to change the definitions of monitoring clusters and switch the names of primary and recovery packages in the definitions of recovery groups. It may also be necessary to re-create data sender and data receiver packages.
3. Check and apply the Continentalclusters configuration:
   # cmcheckconcl -v -C cmconcl.config
   # cmapplyconcl -v -C cmconcl.config
4. Restart the monitor packages on each cluster:
   # cmmodpkg -e ccmonpkg
5. View the status of the Continentalcluster:
   # cmviewconcl
Before applying the edited configuration, the data storage associated with each cluster needs to be prepared to match the new role. In addition, the data replication direction 100
Designing a Continental Cluster
needs to be changed to mirror data from the new primary cluster to the new recovery cluster.
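Conceptually, the configuration edit in step 2 swaps the PRIMARY_PACKAGE and RECOVERY_PACKAGE lines of each affected recovery group. The following awk sketch illustrates only that textual transformation on a made-up fragment (the file path, cluster names, and package names are invented; the real edit also covers the monitoring-cluster definitions and any data sender/receiver packages):

```shell
# Illustration only: swap primary/recovery roles in a recovery-group
# fragment. This mimics the edit, not any Continentalclusters command.
cat > /tmp/rg.config <<'EOF'
RECOVERY_GROUP_NAME RG1
PRIMARY_PACKAGE ClusterA/pkgX
RECOVERY_PACKAGE ClusterB/pkgX'
EOF
awk '$1 == "PRIMARY_PACKAGE"  { print "RECOVERY_PACKAGE", $2; next }
     $1 == "RECOVERY_PACKAGE" { print "PRIMARY_PACKAGE", $2; next }
     { print }' /tmp/rg.config > /tmp/rg.switched
cat /tmp/rg.switched
```

After the swap, RG1's primary package runs on ClusterB and its recovery package on ClusterA.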
Primary Packages Remaining on the Surviving Cluster Using cmswitchconcl

Continentalclusters provides the cmswitchconcl command to facilitate steps 2 and 3 described in the section "Primary Packages Remaining on the Surviving Cluster". The cmswitchconcl command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which the specified cluster is defined as the primary cluster.

Do not use the cmswitchconcl command in a multiple recovery pair configuration where more than one primary cluster shares the same recovery cluster; the command will fail.

When switching roles for a recovery group configured with a rehearsal package, remove the rehearsal package from the old recovery cluster before the configuration is applied. The newly generated recovery group configuration will not have any rehearsal package configured.

WARNING! When you configure the maintenance mode for a recovery group, you must move all recovery groups whose roles have been switched out of the maintenance mode before applying the new configuration.

NOTE: Before running the cmswitchconcl command, the data storage associated with each cluster needs to be prepared to match its new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster. The cmswitchconcl command cannot be used for recovery groups that have both data sender and data receiver packages specified.

To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedure:
1. Halt the monitor package on each cluster:
   # cmhaltpkg ccmonpkg
2. Run the following command:
   # cmswitchconcl \
     -C CurrentContinentalclustersConfigFileName \
     -c OldPrimaryClusterName \
     [-a] [-F NewContinentalclustersConfigFileName]
   This command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which OldPrimaryClusterName is defined as the primary cluster.
   The default values for the monitoring package name (ccmonpkg) and interval (60 seconds), and the notification scheme (SYSLOG) with notification delay (0 seconds), are added for cluster OldPrimaryClusterName, which will serve as the recovery-only cluster. If you want to edit the default values, edit NewContinentalclustersConfigFileName if -F is specified, or CurrentContinentalclustersConfigFileName if -F is not specified. If the new configuration file needs editing, do not use the -a option. If the -a option is specified, the new configuration is applied automatically.
3. If the -a option was specified with cmswitchconcl in step 2, skip this step. Otherwise, manually apply the new Continentalclusters configuration:
   # cmapplyconcl -v -C NewContinentalclustersConfigFileName (if -F was specified in step 2)
   # cmapplyconcl -v -C CurrentContinentalclustersConfigFileName (if -F was not specified in step 2)
4. Restart the monitor packages on each cluster:
   # cmmodpkg -e ccmonpkg
5. View the status of the Continentalcluster:
   # cmviewconcl

NOTE: The cluster shared storage configuration file /etc/cmconcl/ccrac/ccrac.config is not updated by cmswitchconcl. The CCRAC_CLUSTER and CCRAC_INSTANCE_PKGS variables in the cluster shared storage configuration file must be manually updated on all nodes in the clusters to reflect the new primary cluster and package names.

The cmswitchconcl command can also switch the package roles of a single recovery group. If only a subset of the primary packages will remain running on the surviving (recovery) cluster, use the -g option of the cmswitchconcl command. This option reconfigures the roles of the packages of a recovery group and helps retain recovery protection after a failover. Usage of the -g option (recovery group based role switch reconfiguration) is the same as that of the -c option (cluster based role switch reconfiguration). Note that the -c and -g options of the cmswitchconcl command are mutually exclusive.
# cmswitchconcl \
  -C CurrentContinentalclustersConfigFileName \
  -g RecoveryGroupName \
  [-a] [-F NewContinentalclustersConfigFileName]
The following is a sample of the input and output files for running:
# cmswitchconcl -C sample.input -c ClusterA -F Sample.out

sample.input
============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster
CLUSTER_NAME ClusterA
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node1
NODE_NAME node2
MONITOR_PACKAGE_NAME ccmonpkg
CLUSTER_NAME ClusterB
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node3
NODE_NAME node4
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS

### Section 2. Recovery Groups
RECOVERY_GROUP_NAME RG1
PRIMARY_PACKAGE ClusterA/pkgX
RECOVERY_PACKAGE ClusterB/pkgX'
RECOVERY_GROUP_NAME RG2
PRIMARY_PACKAGE ClusterA/pkgY
RECOVERY_PACKAGE ClusterB/pkgY'
DATA_RECEIVER_PACKAGE ClusterB/pkgR1
RECOVERY_GROUP_NAME RG3
PRIMARY_PACKAGE ClusterB/pkgZ
RECOVERY_PACKAGE ClusterA/pkgZ'
RECOVERY_GROUP_NAME RG4
PRIMARY_PACKAGE ClusterB/pkgW
RECOVERY_PACKAGE ClusterA/pkgW'
DATA_RECEIVER_PACKAGE ClusterA/pkgR2

### Section 3. Monitoring Definitions
CLUSTER_EVENT ClusterA/DOWN
MONITORING_CLUSTER ClusterB
CLUSTER_ALERT 60 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/data/events.log
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM 90 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/data/events.log
NOTIFICATION SYSLOG "CC alarm: DOWN"

Sample.out
==========
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster
CLUSTER_NAME ClusterA
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node1
NODE_NAME node2
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS
CLUSTER_NAME ClusterB
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node3
NODE_NAME node4

### Section 2. Recovery Groups
RECOVERY_GROUP_NAME RG1
PRIMARY_PACKAGE ClusterB/pkgX'
RECOVERY_PACKAGE ClusterA/pkgX
RECOVERY_GROUP_NAME RG2
PRIMARY_PACKAGE ClusterB/pkgY'
RECOVERY_PACKAGE ClusterA/pkgY
DATA_RECEIVER_PACKAGE ClusterA/pkgR1
RECOVERY_GROUP_NAME RG3
PRIMARY_PACKAGE ClusterB/pkgZ
RECOVERY_PACKAGE ClusterA/pkgZ'
RECOVERY_GROUP_NAME RG4
PRIMARY_PACKAGE ClusterB/pkgW
RECOVERY_PACKAGE ClusterA/pkgW'
DATA_RECEIVER_PACKAGE ClusterA/pkgR2

### Section 3. Monitoring Definitions
CLUSTER_EVENT ClusterB/DOWN
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG "CC alarm: DOWN"
CLUSTER_EVENT ClusterB/UNREACHABLE
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: UNREACHABLE"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG "CC alarm: UNREACHABLE"
CLUSTER_EVENT ClusterB/ERROR
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: ERROR"
CLUSTER_EVENT ClusterB/UP
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: UP"
Newly Created Cluster Will Run Primary Packages

After creating a new cluster to replace the damaged cluster, restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages:
1. Configure the new cluster as a Serviceguard cluster. Use the cmviewcl command on the surviving cluster and compare the results to the new cluster configuration. Correct any inconsistencies on the new cluster.
2. Halt the monitor package on the surviving recovery cluster:
   # cmhaltpkg ccmonpkg
3. Edit the continental cluster configuration file to replace the data from the old failed cluster with data from the new cluster. Check and apply the Continentalclusters configuration:
   # cmcheckconcl -v -C cmconcl.config
   # cmapplyconcl -v -C cmconcl.config
4. Do the following for each recovery group where the new cluster will run the primary package:
   a. Synchronize the data from the disks on the surviving recovery cluster to the disks on the new cluster. This may be time-consuming.
   b. Halt the application on the surviving recovery cluster if necessary, and start it on the new cluster.
   c. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.
5. If the new cluster acts as a recovery cluster for any recovery group, create a monitor package for the new cluster. Apply the configuration of the new monitor package:
   # cmapplyconf -P ccmonpkg.config
6. Restart the monitor package on the surviving cluster:
   # cmrunpkg ccmonpkg
7. View the status of the Continentalcluster:
   # cmviewconcl
Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups

After replacing the failed cluster, if the downtime involved in moving the applications back is a concern, then do the following:
• Change the surviving cluster to the role of primary cluster for all recovery groups.
• Configure the new cluster as a recovery cluster for all those groups.
Configure the new cluster as a standard Serviceguard cluster, and follow the usual procedure to configure the continental cluster with the new cluster used as a recovery cluster for all recovery groups.

NOTE: In a multiple recovery pairs scenario (where more than one primary cluster is configured to share the same recovery cluster), do not reconfigure the recovery cluster because of the failure of one of the primary clusters.
Performing a Rehearsal Operation in your Environment

Use the cmrecovercl -r -g command to start the disaster recovery rehearsal process in your environment. This command checks for the following prerequisites before starting the rehearsal process:
• The recovery group is in the maintenance mode.
• The data receiver package, if configured in the recovery group, is halted and disabled in the recovery cluster.

The rehearsal package runs regardless of the state of the primary cluster. When the rehearsal is in progress, any attempt to start the recovery package is prevented because the recovery group is in the maintenance mode. This prevents the recovery and the rehearsal packages from running at the same time on the recovery cluster.

The following is an example of running the cmrecovercl -r command to rehearse the recovery group oracle_rac1 on a cluster called secondary_cluster:

atlanta:/opt/cmconcl/admin/instances> cmrecovercl -r -g oracle_rac1
Warning: For this recovery group ensure that the replication environment has been prepared for rehearsal. Before proceeding further, verify that a business copy has been prepared at the recovery cluster. This command does not verify that a business copy has been prepared.
Do you want to proceed with rehearsing the recovery group? [y/n]? y
cmrecovercl: Attempting to rehearse Recovery Group oracle_rac1 on cluster secondary_cluster.
Note: The configuration file /etc/cmconcl/ccrac/ccrac.config for cluster shared storage exists. If the primary package in the target group is configured within this file, the replication environment preparation will be verified before starting the rehearsal package. If you choose "n" make sure that the required storage for the rehearsal package has been properly prepared and that the replication environment has been prepared.
Is this what you intended to do? [y/n]? y
Enabling rehearsal package racp-cfs-rehearsal on recovery cluster secondary_cluster
Running package racp-cfs-rehearsal on node atlanta
Successfully started package racp-cfs-rehearsal on node atlanta
Running package racp-cfs-rehearsal on node miami
Successfully started package racp-cfs-rehearsal on node miami
Successfully started package racp-cfs-rehearsal.
cmrecovercl -r
Completed rehearsal process for each recovery group. Rehearsal packages have been started. Use cmviewcl or check package log file to verify that the rehearsal packages are successfully started.
Warning: Once the rehearsal is complete and the rehearsal package is halted ensure that replication environment is restored for recovery and move recovery group out of Maintenance Mode.

During rehearsal, if a primary site failure occurs, Continentalclusters detects it and you need to complete a recovery process. You need to restore the environment for recovery and complete the recovery processes. If the recovery group data cannot be synchronized with the latest data from the primary cluster, you can use the business copy (BC/BCV) prepared during the preparation phase. However, this results in the loss of the delta data from the time the rehearsal was started.

For more information on performing disaster recovery (DR) rehearsal for different types of applications and replication in a Continentalclusters environment, see Appendix G. This appendix describes how to set up and run DR rehearsal using the example of a single-instance Oracle application with Continentalclusters and Continuous Access XP integration. For additional examples of setting up and running DR rehearsal in different environments, see the Disaster Recovery Rehearsal in Continentalclusters whitepaper available at: http://docs.hp.com.
Maintaining a Continental Cluster

The following common maintenance tasks are described in this section:
• Adding a Node to a Cluster or Removing a Node from a Cluster
• Adding a Package to a Continental Cluster
• Removing a Rehearsal Package from a Recovery Group
• Modifying a Recovery Group with a new Rehearsal Package
• Removing a Package from the Continental Cluster
• Changing Monitoring Definitions
• Checking the Status of Clusters, Nodes and Packages
• Reviewing Log Files
• Renaming a Continental Cluster
• Deleting a Continental Cluster configuration
• Checking Java Versions
CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from also running on the other cluster if the package is started using this command. The potential for data corruption is significant.
Adding a Node to a Cluster or Removing a Node from a Cluster

To add a node to or remove a node from the continental cluster, use the following procedure:
1. Halt any monitor packages that are running on both clusters:
   # cmhaltpkg ccmonpkg
2. Add or remove the node in a cluster by editing the Serviceguard cluster configuration file and applying the configuration:
   # cmapplyconf -C cluster.config
3. Edit the Continentalclusters ASCII configuration file to add or remove the node in the cluster.
4. For added nodes, ensure that the /etc/cmcluster/cmclnodelist and /etc/opt/cmom/cmomhosts files are set up correctly on the new node. Refer to "Preparing Security Files" (page 65). Ensure that the cmclnodelist and cmomhosts files on all nodes (including the new node) contain an entry allowing write access by the host on which you are running the configuration commands.
5. Check and apply the configuration using the cmcheckconcl and cmapplyconcl commands.
6. Restart the monitor packages on both clusters.
7. View the status of the continental cluster:
   # cmviewconcl
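For example, adding a node in step 3 amounts to appending a NODE_NAME entry to the affected cluster's definition in the Continentalclusters ASCII file; removing a node is the reverse. The fragment below is a hypothetical sketch (cluster, domain, and node names are invented; node5 is the newly added node), following the configuration format shown earlier in this chapter:

```
CLUSTER_NAME ClusterA
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node1
NODE_NAME node2
NODE_NAME node5
MONITOR_PACKAGE_NAME ccmonpkg
```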
Adding a Package to the Continental Cluster

To add a new package for possible recovery to the Continentalclusters configuration, first configure a new primary package and recovery package, then add a new recovery group to the Continentalclusters configuration file. In addition, ensure that data replication is provided for the new package, either through hardware or software. Adding a new package does not require bringing down either cluster. However, to implement the new configuration, the following steps are required:
1. Configure the new primary and recovery packages by editing the new package configuration files and control scripts.
2. Use the Serviceguard cmapplyconf command to add the primary package to one cluster, and the recovery package to the other cluster.
3. Provide the appropriate data replication for the new package.
4. Create the new recovery group in the Continentalclusters configuration file.
5. Ensure that the cmclnodelist and cmomhosts files on all nodes contain an entry allowing write access by the host on which you are running the configuration commands.
6. Halt the monitor packages on both clusters.
7. Use the cmapplyconcl command to apply the new Continentalclusters configuration.
8. Restart the monitor packages on both clusters.
9. View the status of the continental cluster:
   # cmviewconcl
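The recovery group created in step 4 is a short stanza in the Continentalclusters configuration file. A hypothetical example follows (group, cluster, and package names are invented; a DATA_RECEIVER_PACKAGE line would be added only for logical data replication):

```
RECOVERY_GROUP_NAME salesgroup
PRIMARY_PACKAGE nycluster/salespkg
RECOVERY_PACKAGE lacluster/salespkg_bak
```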
Removing a Rehearsal Package from a Recovery Group

To remove a rehearsal package from a recovery group, you must move the recovery group out of the maintenance mode and then delete the rehearsal package from the recovery cluster. Also, update the Continentalclusters configuration file by removing the REHEARSAL_PACKAGE parameter from the recovery group definition. Distribute the Continentalclusters configuration by reapplying the configuration file.
Modifying a Recovery Group with a new Rehearsal Package

To change the rehearsal package configured for a recovery group, first move the recovery group out of the maintenance mode. Then delete the old rehearsal package from the recovery cluster and configure the new rehearsal package in the recovery cluster. Update the Continentalclusters configuration file by specifying the new rehearsal package name for the REHEARSAL_PACKAGE parameter in the recovery group definition. Distribute the Continentalclusters configuration by reapplying the configuration file.
Removing a Package from the Continental Cluster

To remove a package from the Continentalclusters configuration, you must first remove the recovery group from the Continentalclusters configuration file. Removing the package does not require you to bring down either cluster. However, to implement the new configuration, the following steps are required:
1. Edit the continental clusters configuration file, deleting the recovery group.
2. Halt the monitor packages that are running on the clusters.
3. Use the cmapplyconcl command to apply the new Continentalclusters configuration.
4. Restart the monitor packages on both clusters.
5. Use the Serviceguard cmdeleteconf command to remove each package in the recovery group.
6. View the status of the continental cluster:
   # cmviewconcl
Changing Monitoring Definitions

You can change the monitoring definitions in the configuration without bringing down either cluster. This includes adding, removing, or changing cluster events; changing the timings; and adding, removing, or changing the notification messages. Use the following steps to change the monitoring definitions:
1. Edit the continental clusters configuration file to incorporate the new or changed monitoring definitions.
2. Halt the monitor packages on both clusters.
3. Use the cmapplyconcl command to apply the new configuration.
4. Restart the monitor packages on both clusters.
5. View the status of the continental cluster:
   # cmviewconcl
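For example, lengthening an alarm timing in step 1 is a matter of editing the CLUSTER_ALARM value in the relevant event definition. A hypothetical stanza follows (cluster names, timings, and message text are invented), using the monitoring-definition format shown earlier in this chapter:

```
CLUSTER_EVENT ClusterA/DOWN
MONITORING_CLUSTER ClusterB
CLUSTER_ALERT 60 SECONDS
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM 120 SECONDS
NOTIFICATION SYSLOG "CC alarm: DOWN"
```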
Checking the Status of Clusters, Nodes, and Packages

To check the status of the continental clusters and associated packages, use the cmviewconcl command, which lists the status of the clusters, the associated package status, and the configured events status. This command also displays the mode of the recovery group, if configured. The following is an example of cmviewconcl output in a situation where there is a single recovery group for which the primary cluster is cjc838 and the recovery cluster is cjc1234.

# cmviewconcl
WARNING: Primary cluster cjc838 is in an alarm state
(cmrecovercl is enabled on recovery cluster cjc1234)

CONTINENTAL CLUSTER cjccc1

RECOVERY CLUSTER cjc1234

PRIMARY CLUSTER  STATUS  EVENT LEVEL  POLLING INTERVAL
cjc838           down    ALARM        20

PACKAGE RECOVERY GROUP  MAINTENANCE MODE
prg1                    NO

  PACKAGE            ROLE       STATUS
  cjc838/primary     primary    down
  cjc1234/recovery   recovery   up
  cjc1234/rehearsal  rehearsal  down

The following is an example of cmviewconcl output from a primary cluster that is down.

persian (root 2131): cmviewconcl -v
WARNING: Primary cluster cjc838 is in an alarm state
(cmrecovercl is enabled on recovery cluster cjc1234)
Primary cluster cjc838 is not configured to monitor recovery cluster cjc1234

CONTINENTAL CLUSTER cjccc1

RECOVERY CLUSTER cjc1234

PRIMARY CLUSTER  STATUS  EVENT LEVEL  POLLING INTERVAL
cjc838           down    ALARM        20

CONFIGURED EVENT  STATUS       DURATION  LAST NOTIFICATION SENT
alert             unreachable  15 sec    --
alarm             unreachable  30 sec    Fri May 12 12:13:06 PDT 2000
alarm             down         0 sec     --
alert             error        0 sec     --
alert             up           20 sec    --
alert             up           40 sec    --

PACKAGE RECOVERY GROUP  MAINTENANCE MODE
prg1                    NO

  PACKAGE            ROLE       STATUS
  cjc838/primary     primary    down
  cjc1234/recovery   recovery   up
  cjc1234/rehearsal  rehearsal  down

The following is the output of a cmviewconcl command that displays data for a mutual recovery configuration in which each cluster has both the primary and the recovery roles—the primary role for one recovery group and the recovery role for the other recovery group:

CONTINENTAL CLUSTER ccluster1

RECOVERY CLUSTER PTST_sanfran

PRIMARY CLUSTER  STATUS       EVENT LEVEL  POLLING INTERVAL
PTST_dts1        Unmonitored  unmonitored  1 min

CONFIGURED EVENT  STATUS       DURATION  LAST NOTIFICATION SENT
alert             unreachable  1 min     --
alert             unreachable  2 min     --
alarm             unreachable  3 min     --
alert             down         1 min     --
alert             down         2 min     --
alarm             down         3 min     --
alert             error        0 sec     --
alert             up           1 min     --

RECOVERY CLUSTER PTST_dts1

PRIMARY CLUSTER  STATUS       EVENT LEVEL  POLLING INTERVAL
PTST_sanfran     Unmonitored  unmonitored  1 min

CONFIGURED EVENT  STATUS       DURATION  LAST NOTIFICATION SENT
alert             unreachable  1 min     --
alert             unreachable  2 min     --
alarm             unreachable  3 min     --
alert             down         1 min     --
alert             down         2 min     --
alarm             down         3 min     --
alert             error        0 sec     --
alert             up           1 min     --

PACKAGE RECOVERY GROUP hpgroup10
  PACKAGE                ROLE      STATUS
  PTST_sanfran/PACKAGE1  primary   down
  PTST_dts1/PACKAGE1     recovery  down

PACKAGE RECOVERY GROUP hpgroup20
  PACKAGE                    ROLE      STATUS
  PTST_dts1/PACKAGE1x_ld     primary   down
  PTST_sanfran/PACKAGE1x_ld  recovery  down
For a more comprehensive status of component clusters, nodes, and packages, use the cmviewcl command on both clusters. On each cluster, make note of which nodes the primary packages are running on, as well as the data sender and data receiver packages, if they are being used for logical data replication. Verify that the monitor is running on each cluster on which it is configured. The following is an example of cmviewcl output for a cluster (nycluster) running a monitor package. Note that the recovery package salespkg_bak is not running, and is shown as an unowned package. This is the expected display while the other cluster is running salespkg.

CLUSTER    STATUS
nycluster  up

  NODE     STATUS  STATE
  nynode1  up      running

  Network_Parameters:
  INTERFACE  STATUS  PATH  NAME
  PRIMARY    up      12.1  lan0
  PRIMARY    up      56.1  lan2

  NODE     STATUS  STATE
  nynode2  up      running

  Network_Parameters:
  INTERFACE  STATUS  PATH  NAME
  PRIMARY    up      4.1   lan0
  PRIMARY    up      56.1  lan1

  PACKAGE   STATUS  STATE    PKG_SWITCH  NODE
  ccmonpkg  up      running  enabled     nynode2

    Script_Parameters:
    ITEM     STATUS  MAX_RESTARTS  RESTARTS  NAME
    Service  up      20            0         ccmonpkg.srv

    Node_Switching_Parameters:
    NODE_TYPE  STATUS  SWITCHING  NAME
    Primary    up      enabled    nynode2  (current)
    Alternate  up      enabled    nynode1

UNOWNED Packages:

  PACKAGE       STATUS  STATE    PKG_SWITCH  NODE
  salespkg_bak  down    unowned

    Policy_Parameters:
    POLICY_NAME  CONFIGURED_VALUE
    Failover     unknown
    Failback     unknown

    Script_Parameters:
    ITEM    STATUS   NODE_NAME  NAME
    Subnet  unknown  nynode1    195.14.171.0
    Subnet  unknown  nynode2    195.14.171.0

    Node_Switching_Parameters:
    NODE_TYPE  STATUS  SWITCHING  NAME
    Primary    down               nynode1
    Alternate  down               nynode2
Use the ps command to check for the status of the Continentalclusters monitor daemons cmclrmond and cmclsentryd, which should be running on the cluster node where the monitor package is running.
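A minimal sketch of that check, assuming a POSIX shell (the daemon names come from the text; on a real cluster, run this on the node where the monitor package is running):

```shell
# Report whether each Continentalclusters monitor daemon appears in
# the process table; prints "NOT running" for any daemon that is absent.
for d in cmclrmond cmclsentryd; do
  if ps -e | grep -w "$d" > /dev/null 2>&1; then
    echo "$d: running"
  else
    echo "$d: NOT running"
  fi
done
```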
Reviewing Messages and Log Files

The Continentalclusters commands—cmquerycl, cmcheckconcl, cmapplyconcl, and cmrecovercl—all display messages on the standard output, which is the first place to look for error messages. All notification messages associated with cluster events are reported in /var/opt/resmon/log/cc/eventlog on the cluster where monitoring is taking place. An example of output from this file follows:
>-----Event Monitoring Service Event Notification ------------< Notification Time: Wed Nov 10 21:00:39 1999 system1 sent Event Monitor notification information: /cluster/concl/ccluster1/clusters/LAclust/status/unreachable is = 15 User Comments: Cluster "LAclust" has status "unreachable" for 15 sec >-----End Event Monitoring Service Event Notification ----------<
In addition, if you have defined a TEXTLOG destination, notification messages are sent to the file that was specified. (See "Editing Section 3—Monitoring Definitions" (page 79) for more information.)

Also review the monitor startup and shutdown log file /etc/cmcluster/ccmonpkg/ccmonpkg.cntl.log on any node where a Continentalclusters monitor has been running. Information about the primary or recovery packages may be found in their respective startup and shutdown log files.

Messages from the Continentalclusters daemon are reported in the log file /var/adm/cmconcl/sentryd.log, and Object Manager messages appear in /var/opt/cmom/cmomd.log. These messages may be helpful in troubleshooting. Use the cmreadlog command to view the entries in these files. Examples:
# /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log slog.txt
# /opt/cmom/tools/bin/cmreadlog -f /var/opt/cmom/cmomd.log omlog.txt

The following is sample output from the cmreadlog command for the sentryd.log file:
Oct 20 18:28:22:[[main,5,main]]:FATAL:dr.sentryd:No continental cluster found on this node.
Oct 22 13:38:45:[[Thread-309,5,main]]:ERROR:dr.sentryd:Error connecting to axe28
Oct 22 13:38:45:[[Thread-309,5,main]]:ERROR:dr.sentryd:Connection refused
Oct 22 13:38:45:[[Thread-309,5,main]]:INFO:dr.sentryd:Connection failed to axe28
Oct 22 13:38:45:[[Thread-311,5,main]]:ERROR:dr.sentryd:Cannot find cluster KC-cluster at location axe29
Oct 22 13:38:45:[[Thread-311,5,main]]:ERROR:dr.sentryd:null result from query

General information about Serviceguard operation is found in /var/adm/syslog/syslog.log.
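Once converted with cmreadlog, the sentryd entries are plain text, so the serious conditions can be pulled out with grep. A small sketch using lines in the format of the sample above (the path /tmp/slog.txt and the log lines are illustrative stand-ins for a real converted log):

```shell
# Extract ERROR and FATAL entries from a converted sentryd log (sketch).
cat > /tmp/slog.txt <<'EOF'
Oct 20 18:28:22:[[main,5,main]]:FATAL:dr.sentryd:No continental cluster found on this node.
Oct 22 13:38:45:[[Thread-309,5,main]]:ERROR:dr.sentryd:Error connecting to axe28
Oct 22 13:38:45:[[Thread-309,5,main]]:INFO:dr.sentryd:Connection failed to axe28
EOF
grep -E ':(ERROR|FATAL):' /tmp/slog.txt
```

The INFO entry is filtered out; only the ERROR and FATAL lines are printed.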
Deleting a Continental Cluster Configuration

The cmdeleteconcl command deletes the configuration on all nodes in the continental cluster configuration. To delete a continental cluster and the Continentalclusters configuration:
# cmdeleteconcl

NOTE: If you are modifying the configuration, simply re-issue the cmapplyconcl command; there is no need to delete the previous configuration.

When you delete a Continentalcluster configured with the recovery group maintenance feature, the shared disk is not removed. Before applying a fresh Continentalclusters configuration using an old shared disk, you must re-initialize the file system on the shared disk using the mkfs command.
Renaming a Continental Cluster

To rename an existing continental cluster, perform the following steps:
1. Remove the continental clusters configuration:
   # cmdeleteconcl
2. Edit the CONTINENTAL_CLUSTER_NAME field in the configuration ASCII file, and run the cmapplyconcl command to configure the continental cluster with the new name.
Checking Java File Versions

Some components of Continentalclusters are executed from Java .jar files. To obtain version information about these files, use the what.sh script provided in the /opt/cmconcl/jar directory. Example:
# /opt/cmconcl/jar/what.sh configcl.jar
Next Steps

To implement the continental cluster design using physical data replication, use the procedures in the following chapters:
• Chapter 3: "Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP" (page 133)
• Chapter 4: "Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA" (page 185)
• Chapter 5: "Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF" (page 227)
Support for Oracle RAC Instances in a Continentalclusters Environment

Support for Oracle RAC instances means that the RAC instances running on the primary cluster will be restarted by Continentalclusters on the recovery cluster to continue serving clients' database requests upon a primary cluster failure. Figure 2-11 is a sample of Oracle RAC instances running in the Continentalclusters environment.
Figure 2-11 Oracle RAC Instances in a Continentalclusters Environment
[Diagram: a Los Angeles primary Serviceguard cluster running Oracle RAC (LAnode1 with RAC Instance1 and LAnode2 with RAC Instance2, attached to an XP disk array) connected over a WAN and a highly available network to a New York secondary Serviceguard cluster (NYnode1 and NYnode2 with an XP disk array), with Continuous Access XP, Continuous Access EVA, or EMC SRDF data replication between the arrays.]
As shown in the above example, Oracle RAC instances are configured to run in Serviceguard packages. The instance packages run on the primary cluster and are recovered on the recovery cluster upon a primary cluster failure. Figure 2-12 shows a recovery using an Oracle RAC configuration after failover. Oracle RAC instances are supported in the Continentalclusters environment only for physical replication using HP StorageWorks Continuous Access XP or EMC Symmetrix Remote Data Facility (SRDF), with HP SLVM or Serviceguard Storage Management Suite using CFS for volume management. Continentalclusters support for Oracle instances using HP StorageWorks Continuous Access EVA is supported only with SLVM software. Continentalclusters Oracle RAC support is available for a cluster environment configured with only Serviceguard (for example, an environment running Oracle 9i), or a cluster environment configured with Serviceguard plus Oracle Clusterware (for example, an environment running Oracle 10g). Starting with Continentalclusters version A.05.01, recovery of an Oracle RAC instance in a cluster environment running Serviceguard and Oracle Clusterware is supported. A special configuration is required for an environment running both Oracle Clusterware and Serviceguard/Serviceguard Extension for RAC (SGeRAC) for Continentalclusters RAC instance recovery protection. For more information, refer to the following section, "Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration" (page 126).
Figure 2-12 Sample Oracle RAC Instances in a Continentalclusters Environment After Failover
[Diagram: after failover, the New York secondary Serviceguard cluster runs Oracle RAC (NYnode1 with RAC Instance1 and NYnode2 with RAC Instance2, attached to an XP disk array), while the failed Los Angeles primary cluster (LAnode1, LAnode2, and its disk array) is marked down; Continuous Access XP, Continuous Access EVA, or EMC SRDF data replication links the arrays over the WAN.]
Configuring the Environment for Continentalclusters to Support Oracle RAC

To enable Continentalclusters support for Oracle RAC, you must configure Continuous Access XP, Continuous Access EVA, or EMC SRDF; Oracle RAC; and Continentalclusters. To support this feature, Continentalclusters must be configured in an environment with physical replication set up using HP StorageWorks Continuous Access XP, HP StorageWorks Continuous Access EVA, or EMC Symmetrix Remote Data Facility (SRDF), using SLVM, Cluster Volume Manager (CVM), or Cluster File System (CFS) for volume management. For the specific Oracle RAC configurations that are supported, refer to Table 2-8. For complete installation and configuration information for the Oracle and HP StorageWorks products, refer to the Oracle RAC and HP StorageWorks manuals. Table 2-8 describes the configuration information for Continentalclusters RAC support.
Support for Oracle RAC Instances in a Continentalclusters Environment
Table 2-8 Supported Continentalclusters and RAC Configurations

HP StorageWorks XP series with Continuous Access
    Volume managers/cluster file system: HP SLVM; CVM (Serviceguard Storage Management Suite); CFS (Serviceguard Storage Management Suite)
    Required Metrocluster product: Metrocluster with Continuous Access XP

HP StorageWorks EVA series with Continuous Access
    Volume managers/cluster file system: HP SLVM; CVM (Serviceguard Storage Management Suite); CFS (Serviceguard Storage Management Suite)
    Required Metrocluster product: Metrocluster with Continuous Access EVA

EMC Symmetrix series with SRDF
    Volume managers/cluster file system: HP SLVM; CVM (Serviceguard Storage Management Suite); CFS (Serviceguard Storage Management Suite)
    Required Metrocluster product: Metrocluster with EMC SRDF

All of the configurations above apply to Oracle RAC with or without Oracle Clusterware.
Use the following procedures to enable Continentalclusters recovery support for Oracle RAC instances:
1. Configure Continuous Access XP, Continuous Access EVA, or EMC SRDF for data replication between the disk arrays associated with the primary and recovery clusters. For more details, see Chapter 3: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP”, Chapter 4: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA”, or Chapter 5: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF”.
2. Configure the database storage using one of the following products:
   • Shared Logical Volume Manager (SLVM)
   • Cluster Volume Manager (CVM)
   • Cluster File System (CFS)
   Configure the SLVM volume groups or CVM disk groups that store the Oracle database on the disk arrays of both the primary and recovery clusters, and ensure that the volume group or disk group names on both clusters are identical. You must also set up data replication between the disk arrays associated with the primary and recovery clusters. Only the volume groups or disk groups configured to store the database must be configured for replication across the primary and recovery clusters. In an environment running Oracle Clusterware, you must configure the storage used by Oracle Clusterware to reside on disks that are not replicated.
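As an illustration only, creating an identically named SLVM volume group on a node of each cluster might look like the following. The volume group name vgrac, the device files, and the minor number are hypothetical placeholders; substitute values appropriate to your systems:

    # mkdir /dev/vgrac
    # mknod /dev/vgrac/group c 64 0x080000
    # pvcreate /dev/rdsk/c5t0d1
    # vgcreate vgrac /dev/dsk/c5t0d1

Repeat the same steps, using the same volume group name, on the recovery cluster, and configure the underlying LUNs for replication between the disk arrays.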
If you use CVM or CFS for your storage infrastructure, complete the following steps on both the primary and recovery clusters.
a. Make sure that the primary and recovery clusters are running.
b. Configure and start the CFS or CVM multi-node package using the command cfscluster config -s. When CVM starts, it automatically selects the master node. This master node is the node from which you must issue the disk group configuration commands. To determine the master node, run the following command from any node in the cluster:
# vxdctl -c mode
c. Create disk groups and mount points. For more information on creating disk groups and mount points, refer to the Using Serviceguard Extension for RAC user's guide.

NOTE: When you use CVM disk groups, Continentalclusters does not support configuring the CVM disk groups in the RAC instance package files using the CVM_ACTIVATION_CMD and CVM_DISK_GROUP variables. Instead, configure the instance packages with a dependency on the required CVM disk group multi-node package.

d. Run the following CFS commands to add and configure the disk group and file system mount point multi-node packages (MNPs) in the clusters. These multi-node packages manage the disk group and mount point activities in the cluster.
• cfsdgadm add <disk group name> all=SW
  For example: cfsdgadm add racdg1 all=SW
• cfsmntadm add <disk group name> <volume name> <mount point> all=SW
  For example: cfsmntadm add racdg1 vol4 /cfs/mnt1 all=SW
e. Set the AUTO_RUN flag to NO with the following commands:
• cfsdgadm set_autorun <disk group name> NO
• cfsmntadm set_autorun <mount point name> NO
f. Activate the disk group MNP using the following command:
cfsdgadm activate <disk group name>
g. Start the mount point MNP using the following command:
cfsmount <mount point name>
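Putting the CFS configuration steps together, and assuming the hypothetical names racdg1, vol4, and /cfs/mnt1 used in the examples above, the full sequence on the CVM master node might look like:

    # cfscluster config -s
    # vxdctl -c mode
    # cfsdgadm add racdg1 all=SW
    # cfsmntadm add racdg1 vol4 /cfs/mnt1 all=SW
    # cfsdgadm set_autorun racdg1 NO
    # cfsmntadm set_autorun /cfs/mnt1 NO
    # cfsdgadm activate racdg1
    # cfsmount /cfs/mnt1

This is a sketch only; verify each command against the Serviceguard Storage Management Suite documentation for your release.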
NOTE: After you configure the disk group and mount point multi-node packages, you must deactivate the packages on the recovery cluster. During a recovery process, the cmrecovercl command automatically activates these multi-node packages.

h. Set the access rights for the volumes and disk groups to persistent using the following command:
vxedit -g <disk group name> set user=<user> group=<group> mode=<mode> <volume name>
This step is required because when you import disks or volume groups at the recovery site, the access rights for the imported disks or volume groups are reset to root by default, and as a result the database instances do not start. Setting the access rights to persistent eliminates this behavior.
3. Configure Oracle RAC. Configure all the database files to reside on the SLVM volume groups, CVM disk groups, or CFS file systems that you have configured in your environment. Ensure that the configuration of the Oracle RAC instances to be recovered in the Continentalclusters environment is identical on the primary and recovery clusters. For more information on configuring Oracle RAC, refer to the Oracle RAC installation and configuration user's guide. If you have Oracle Clusterware and Serviceguard running in your environment, you must complete certain additional configuration procedures. For more information on these procedures, see “Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration” (page 126).
4. Configure Continentalclusters. For more information on configuring Continentalclusters, see “Building the Continentalclusters Configuration” (page 65).
5. Configure the Oracle RAC instances in Serviceguard packages. Continentalclusters supports recovery only for applications running in Serviceguard packages. In a multiple recovery pair scenario, where more than one primary cluster shares the same recovery cluster, the primary RAC instance package name must be unique on each primary cluster. Configure the Oracle RAC instance packages on both the primary and recovery clusters based on the number of RAC instances configured to run on each cluster. To ensure Continentalclusters recovery protection, configure the same number of Oracle RAC instances on both the primary and recovery clusters. Set the AUTO_RUN parameter in the package configuration file to NO. For details on how to configure an Oracle RAC instance in a Serviceguard package, refer to the Using Serviceguard Extension for RAC user's guide. In the Continentalclusters environment, you can configure each RAC instance in a failover package, or you can configure all RAC instances in a single multi-node package.
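For illustration, the relevant entries of one instance package configuration file in a CVM/CFS environment might look like the following. The package name racpkg1, the dependency name, and the disk group MNP name SG-CFS-DG-1 are hypothetical; the dependency entries apply only when the instance package depends on a CVM disk group multi-node package:

    PACKAGE_NAME            racpkg1
    AUTO_RUN                NO
    DEPENDENCY_NAME         racdg1_dep
    DEPENDENCY_CONDITION    SG-CFS-DG-1 = UP
    DEPENDENCY_LOCATION     SAME_NODE

Refer to the Using Serviceguard Extension for RAC user's guide for the complete set of package parameters.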
6. Set up the environment file. Instead of one environment file for each Continentalclusters application package, there is only one environment file for each set of Oracle RAC instance packages accessing the same database. This file can be located anywhere except the directory where the Oracle RAC instance package configuration and control files reside. Only one environment file can reside in a given directory. The setup of the file is the same as described in “Physical Data Replication using Special Environment files” (page 54) of this chapter, with the exception of the PKGDIR variable: the value of PKGDIR must be the directory where this environment file resides. For specific information on how to set up the environment file, see Chapter 3, “Configuring Packages for Disaster Recovery”; Chapter 4, “Configuring Packages for Automatic Disaster Recovery”; or Chapter 5, “Configuring Serviceguard Packages for Automatic Disaster Recovery”. Be sure to place this environment file in the same path on all nodes of both the primary and recovery clusters in a recovery pair. You must name the environment file using your package name as the prefix, for example <package name>_xpca.env. You must uncomment all the AUTO variables in the environment file. Based on the disk arrays in your environment, refer to the corresponding chapters of this manual for more information on configuring the environment file for your storage.
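For example, for a hypothetical set of instance packages using XP Continuous Access replication and the directory /etc/cmcluster/ccrac/db1, the environment file might be placed, on every node of both clusters, at:

    /etc/cmcluster/ccrac/db1/db1EnvFile_xpca.env

with PKGDIR inside the file set to the directory that holds it:

    PKGDIR=/etc/cmcluster/ccrac/db1

and all of the AUTO variables in the file uncommented. The directory and file names here are illustrative only.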
7. Set up the Continentalclusters Oracle RAC specification file. The existence of the file /etc/cmconcl/ccrac/ccrac.config serves as the enabler for Continentalclusters Oracle RAC support. A template of this file is available in the /opt/cmconcl/scripts directory. Edit the template to suit your environment, then copy it to /etc/cmconcl/ccrac/ccrac.config on all nodes in the participating clusters. Use the following steps to set up the file:
a. Log in as root on one node of the primary cluster.
b. Change to your own directory:
# cd <your directory>
c. Copy the file:
# cp /opt/cmconcl/scripts/ccrac.config \
ccrac.config.mycopy
d. Edit the file ccrac.config.mycopy to fit your environment. The following parameters need to be edited:

CCRAC_ENV - the fully qualified Metrocluster environment file name. The file must follow the naming convention required by the Metrocluster software: it must end with _<data replication scheme>.env, where <data replication scheme> identifies the data replication scheme being used. Refer to the Metrocluster documents for the environment file naming convention. This parameter is mandatory.

CCRAC_SLVM_VGS - the SLVM volume groups configured for the device group specified by the DEVICE_GROUP variable in the above environment file. These are the volume groups used by the associated RAC instance packages. It is important to list all of the volume groups configured for the specified DEVICE_GROUP; if only some of them are listed, the device group will not be prepared properly and the storage will be left in an inconsistent state. This parameter is mandatory when SLVM volume groups are used, and must not be declared when only CVM disk groups are used.

CCRAC_CVM_DGS - the CVM disk groups configured for the device group specified by the DEVICE_GROUP variable in the above environment file. These are the disk groups used by the associated RAC instance packages. It is important to list all of the disk groups configured for the specified DEVICE_GROUP; if only some of them are listed, the device group will not be prepared properly and the storage will be left in an inconsistent state. This parameter is mandatory when CVM disk groups or CFS are used, and must not be declared when SLVM volume groups are used.

CCRAC_INSTANCE_PKGS - the names of the configured RAC instance packages accessing, in parallel, the database stored in the specified volume groups. This parameter is mandatory.

CCRAC_CLUSTER - the name of the Serviceguard cluster configured as the primary cluster of the corresponding RAC instance package set. This parameter is mandatory.

CCRAC_ENV_LOG - the log file specification for the storage preparation output.
This parameter is optional. If it is not specified, ${CCRAC_ENV}.log is used.

Sample setup:

CCRAC_ENV[0]=/etc/cmconcl/ccrac/db1/db1EnvFile_xpca.env
CCRAC_SLVM_VGS[0]=ccracvg1 ccracvg2
CCRAC_INSTANCE_PKGS[0]=ccracPkg1 ccracPkg2
CCRAC_CLUSTER[0]=PriCluster1
CCRAC_ENV_LOG[0]=/tmp/db1_prep.log

(Separate multiple values for CCRAC_SLVM_VGS and CCRAC_INSTANCE_PKGS with spaces.) If multiple sets of Oracle instances accessing different databases are configured in your environment and need Continentalclusters recovery support, repeat this set of parameters with an incremented index. For example:

CCRAC_ENV[0]=/etc/cmconcl/ccrac/db1/db1EnvFile_xpca.env
CCRAC_SLVM_VGS[0]=ccracvg1 ccracvg2
CCRAC_INSTANCE_PKGS[0]=ccracPkg1 ccracPkg2
CCRAC_CLUSTER[0]=PriCluster1
CCRAC_ENV_LOG[0]=/tmp/db1_prep.log

CCRAC_ENV[1]=/etc/cmconcl/ccrac/db2/db2EnvFile_srdf.env
CCRAC_CVM_DGS[1]=racdg01 racdg02
CCRAC_INSTANCE_PKGS[1]=ccracPkg3 ccracPkg4
CCRAC_CLUSTER[1]=PriCluster2
CCRAC_ENV_LOG[1]=/tmp/db2_prep.log

CCRAC_ENV[2]=/etc/cmconcl/ccrac/db3/db3EnvFile_xpca.env
CCRAC_SLVM_VGS[2]=ccracvg5 ccracvg6
CCRAC_INSTANCE_PKGS[2]=ccracPkg5 ccracPkg6
CCRAC_CLUSTER[2]=PriCluster2
e. Copy the edited file to its final location:
# cp ccrac.config.mycopy \
/etc/cmconcl/ccrac/ccrac.config
f. Copy the file /etc/cmconcl/ccrac/ccrac.config to all the other nodes of the cluster.
g. Log in as root on one node of the recovery cluster and repeat steps b through f. If the recovery cluster is configured to recover the Oracle RAC instances of more than one primary cluster, the ccrac.config file on the recovery cluster must contain the information for all of the primary clusters.
8. Configure a Continentalclusters recovery group for each Oracle RAC instance package. If you are using an individual package for each RAC instance, define one recovery group for each Oracle RAC instance. The PRIMARY_PACKAGE specified for an Oracle RAC instance recovery group is the name of the instance package configured on the primary cluster. The RECOVERY_PACKAGE specified for the recovery group is the corresponding instance package name configured on the recovery cluster. For example:
RECOVERY_GROUP_NAME instanceRG1
PRIMARY_PACKAGE ClusterA/instancepkg1
RECOVERY_PACKAGE ClusterB/instancepkg1'

RECOVERY_GROUP_NAME instanceRG2
PRIMARY_PACKAGE ClusterA/instancepkg2
RECOVERY_PACKAGE ClusterB/instancepkg2'

Packages instancepkg1 and instancepkg2 are configured to run on the primary cluster ClusterA. Packages instancepkg1' and instancepkg2' are configured to be restarted, or recovered, on the recovery cluster ClusterB upon a primary cluster failure. If you are using one multi-node package for all of the RAC instances, define only one recovery group for the RAC MNP package. For example:

RECOVERY_GROUP_NAME manufacturing_recovery
PRIMARY_PACKAGE ClusterA/man_rac_mnp
RECOVERY_PACKAGE ClusterB/man_rac_mnp

When recovering a recovery group that uses a multi-node package, Continentalclusters starts an instance on each cluster node configured in the MNP. After editing the Continentalclusters configuration file to add the recovery group specifications for the Oracle RAC instance packages, you must apply the new configuration by running the cmapplyconcl command. When you finish configuring a recovery pair with RAC support, your systems contain sets of files similar to those shown in Figure 2-13.
NOTE: If you are configuring Oracle RAC instances in Serviceguard packages in a CFS or CVM environment, do not specify the CVM_DISK_GROUPS and CVM_ACTIVATION_CMD fields in the package control scripts, because CVM disk group manipulation is handled by the disk group multi-node package.

Figure 2-13 Continentalclusters Configuration Files in a Recovery Pair with RAC Support

[Figure: each node (LAnode1, LAnode2) of the Los Angeles primary cluster holds the primary package files (RACinstance1.config and RACinstance1.cntl, RACinstance2.config and RACinstance2.cntl), the Continentalclusters configuration file cmconcl.config, the monitor package files ccmonpkg.config and ccmonpkg.cntl, the Continentalclusters RAC specification file /etc/cmconcl/ccrac/ccrac.config, the storage environment file /etc/cmcluster/ccrac/db1/db1EnvFile_xpca.env, and the managed object files /etc/cmconcl/instances/*. Each node (NYnode1, NYnode2) of the New York recovery cluster holds the same set of cluster-wide files plus the recovery package files (RACinstance1_bak.config and RACinstance1_bak.cntl, RACinstance2_bak.config and RACinstance2_bak.cntl). The two clusters are connected over a WAN.]
Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration

The following configurations are required for Continentalclusters RAC instance recovery support in a cluster environment running Serviceguard/Serviceguard Extension for RAC and CRS (Oracle Cluster Software):
1. The Oracle RAC environment running with Serviceguard/Serviceguard Extension for RAC and Oracle Cluster Software should follow all the recommendations listed in the Serviceguard and SGeRAC manuals for running with CRS.
2. CRS must not automatically activate the volume groups configured for the database at startup time. The file /var/opt/oracle/oravg.conf must not exist on any node of the primary or recovery cluster.
3. The CRS storage (OCR and voting disk) must be configured on a separate volume group from those holding the databases accessed by the RAC instances.
4. The RAC instance attribute AUTO_START listed in the CRS service profile must be set to 2 on both the primary and recovery clusters, so that the instance is not automatically started when a node rejoins the cluster. Log in as the Oracle administrator and use the following steps to change the attribute value:
a. Generate the resource profile:
crs_stat -p instance_name > $CRS_HOME/crs/public/instance_name.cap
b. Edit the resource profile and set the AUTO_START value to 2.
c. Register the value:
crs_register -u instance_name
d. Verify the value:
crs_stat -p instance_name
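With a hypothetical CRS instance resource named ora.racdb.racdb1.inst, the AUTO_START sequence in step 4 might look like the following (the resource name is an assumption for illustration):

    $ crs_stat -p ora.racdb.racdb1.inst > \
      $CRS_HOME/crs/public/ora.racdb.racdb1.inst.cap
    (edit ora.racdb.racdb1.inst.cap and set the AUTO_START line to AUTO_START=2)
    $ crs_register -u ora.racdb.racdb1.inst
    $ crs_stat -p ora.racdb.racdb1.inst

The final crs_stat output should show AUTO_START=2. Repeat for each instance resource on both the primary and recovery clusters.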
Initial Startup of Oracle RAC Instances in a Continentalclusters Environment

To ensure that the disk array is ready for shared-mode access by the Oracle RAC instances, it is recommended that you run the Continentalclusters tool /opt/cmconcl/bin/ccrac_mgmt.ksh to initially start the configured instance packages. This tool ensures that the configured disk array is in writable mode and ready for shared access before the RAC instance packages start. If this tool is not used, manual checking is needed to make sure the storage is writable and ready for shared access before starting the RAC instance packages.
NOTE: It is recommended that ccrac_mgmt.ksh be used for the initial startup of the RAC instance packages, and for failing back the RAC instance packages. Do not use this tool at the recovery site for recovering RAC instance packages; use cmrecovercl in that case.

After the initial startup, use the Serviceguard commands cmhaltpkg, cmrunpkg, and cmmodpkg as needed to halt and restart the packages on the primary cluster. Use the following steps on any node of the primary cluster to perform the initial startup of the Oracle RAC instance packages:
1. If the cluster is running with Serviceguard and Oracle CRS, make sure that the CRS daemons and the required Oracle services, such as the listener, GSD, ONS, and VIP, are up and running on all the nodes on which the RAC database instances are configured to run.
2. Make sure that /etc/cmconcl/ccrac/ccrac.config exists and has been edited to contain the appropriate information.
3. To start all the RAC instance packages configured to run as primary packages on the local cluster:
# /opt/cmconcl/bin/ccrac_mgmt.ksh start
To start a specific set of RAC instance packages:
# /opt/cmconcl/bin/ccrac_mgmt.ksh -i <index> start
<index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.
4. To stop all the RAC instance packages configured to run as primary packages on the local cluster:
# /opt/cmconcl/bin/ccrac_mgmt.ksh stop
To stop a specific set of RAC instance packages:
# /opt/cmconcl/bin/ccrac_mgmt.ksh -i <index> stop
<index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.
Failover of Oracle RAC Instances to the Recovery Site

Upon a disaster that disables the primary cluster, start the Continentalclusters recovery process by running the following command:
# cmrecovercl
For a cluster environment running with Serviceguard and Oracle Clusterware, before initiating the recovery process confirm that the Clusterware daemons and the required Oracle services, such as the listener, GSD, ONS, and VIP, are started on all the nodes on which the database instances are configured to run.
If you have configured CFS or CVM in your environment, ensure the following:
• The SG-CFS-PKG (system multi-node package) is up and running. The SG-CFS-PKG package is not part of the Continentalclusters configuration.
• The cmrecovercl command is run from the CVM master node. Use the following command to display the CVM master node:
# vxdctl -c mode
Starting with Continentalclusters A.07.00, recovery groups of applications using CFS or CVM can be recovered by running the cmrecovercl command from any node of the recovery cluster.
NOTE: Make sure that the primary site is unavailable and that none of the Oracle RAC instance packages are running on the primary cluster before initiating the recovery process.

The Continentalclusters command cmrecovercl prepares the configured storage for shared access by the Oracle RAC instances only when the file /etc/cmconcl/ccrac/ccrac.config exists. If this file does not exist, the configured storage is not prepared for shared access before the Oracle RAC instance packages are recovered. As a result, if the Continentalclusters recovery group configuration includes Oracle RAC instance packages, these packages will not be able to start or operate successfully.

The recovery process starts the configured Oracle RAC instance packages as well as the other application packages configured in the Continentalclusters environment. If Continentalclusters Oracle RAC support is enabled (the /etc/cmconcl/ccrac/ccrac.config file exists), the following messages are displayed when the cmrecovercl command is invoked, and confirmations are needed for the process to proceed:

WARNING: This command will take over for the primary cluster
LACluster by starting the recovery package on the recovery
cluster NYCluster. You must follow your site disaster recovery
procedure to ensure that the primary packages on LACluster are
not running and that recovery on NYCluster is necessary.
Continuing with this command while the applications are running
on the primary cluster may result in data corruption.

Are you sure that the primary packages are not running and will
not come back, and are you certain that you want to start the
recovery packages [y/n]? y

cmrecovercl: Attempting to recover Recovery Groups from cluster
LACluster.

NOTE: The configuration file /etc/cmconcl/ccrac/ccrac.config
for cluster shared storage recovery exists. Data storage specified
in the file for this cluster will be prepared for this recovery
process. If you choose "n" - not to prepare the storage for this
recovery process, make sure that the required storage for this
recovery process has been properly prepared.

Is this what you intend to do [y/n]? y

The Oracle RAC instance packages can also be started in sequence:
# cmrecovercl -g <recovery group name>
Use the -g option to start the first instance package, then wait until the disk arrays are synchronized before starting the second instance package. If the -g option is used with the cmrecovercl command, the following messages are displayed instead:

WARNING: This command will take over for the primary cluster
primary_cluster by starting the recovery package on the recovery
cluster secondary_cluster. You must follow your site disaster
recovery procedure to ensure that the primary packages on
primary_cluster are not running and that recovery on
secondary_cluster is necessary. Continuing with this command
while the applications are running on the primary cluster may
result in data corruption.

Are you sure that the primary packages are not running and will
not come back, and are you certain that you want to start the
recovery packages [y/n]? y

cmrecovercl: Attempting to recover RecoveryGroup subsrecovery1
on cluster secondary_cluster

NOTE: The configuration file /etc/cmconcl/ccrac/ccrac.config
for cluster shared storage recovery exists. If the primary
package in the target group is configured within this file, the
corresponding data storage will be prepared before starting the
recovery package. If you choose "n" - not to prepare the storage
for this recovery process, make sure that the required storage
for the recovery package has been properly prepared.

Is this what you intend to do [y/n]?
y

Enabling recovery package racp-cfs on recovery cluster
secondary_cluster
Running package racp-cfs
Running package racp-cfs on node atlanta
Successfully started package racp-cfs on node atlanta
Running package racp-cfs on node miami
Successfully started package racp-cfs on node miami
Successfully started package racp-cfs.
cmrecovercl: Completed recovery process for each recovery group.
Recovery packages have been started. Use cmviewcl or check
package log file to verify that the recovery packages are
successfully started.

These message prompts can be disabled by running cmrecovercl with the -y option. If you have configured the Oracle RAC instance packages such that there is one instance for every package, each instance, or recovery group, can be recovered individually. If you have configured all instances in a single multi-node package (MNP), recovering the recovery group of this package starts all of the instances.

NOTE: At recovery time, Continentalclusters is responsible for recovering the configured Oracle RAC instance packages. Data integrity and currency at the recovery site depend on your data replication configuration in the Oracle environment.
Failback of Oracle RAC Instances After a Failover

After a failover, the configured disk array at the former recovery cluster becomes the primary storage for the database, and the Oracle RAC instances run on the recovery cluster after a successful recovery. To fail back the Oracle RAC instances to the primary cluster, follow the procedure below. Before failing back the Oracle RAC instances, make sure that the data in the original primary site disk array is in an appropriate state. Follow the disk array specific procedures for data resynchronization between the two clusters, and the Oracle RAC failback procedures, before restarting the instances.

NOTE: Make sure the AUTO_RUN flag for all the configured Continentalclusters packages is disabled before restarting the cluster.

1. Fix the problems that caused the primary site failure.
2. Stop the Oracle RAC instance packages running on the recovery cluster. On any node of the recovery cluster:
# /opt/cmconcl/bin/ccrac_mgmt.ksh stop
If you have configured CVM or CFS in your environment, you must also complete the following procedure:
a. Unmount the CFS mount points using the following command:
cfsumount <mount point name>
b. Deactivate the disk groups using the following command:
cfsdgadm deactivate <disk group name>
c. Deport the disk groups using the following command:
vxdg deport <disk group name>
The recovery cluster is now ready to fail back the packages and applications to the primary cluster.
3. Synchronize the data between the two participating clusters. Make sure that the data integrity and data currency are at the expected level at the primary site.
4. Verify that the primary cluster is up and running:
# cmviewcl
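With the hypothetical disk group racdg1 and mount point /cfs/mnt1 used in the earlier examples, the CVM/CFS teardown performed in step 2 on the recovery cluster might look like:

    # cfsumount /cfs/mnt1
    # cfsdgadm deactivate racdg1
    # vxdg deport racdg1

The names here are placeholders; substitute the disk groups and mount points configured for your database.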
5. If the cluster is running with Serviceguard and Oracle CRS, make sure that CRS and the required services, such as the listener, GSD, ONS, and VIP, are up and running on all of the instance nodes. By default, these Oracle services are initiated when CRS is started.

NOTE: In a CFS/CVM environment, ensure that the SG-CFS-PKG (system multi-node) package is running.

6. Start the Oracle RAC instance packages on the primary cluster. If you have configured CFS or CVM in your environment, issue the following command from the master node:
# /opt/cmconcl/bin/ccrac_mgmt.ksh start
Otherwise, you can run the command on any node in the primary cluster. This command fails back all of the RAC instance packages configured to adopt this cluster as their primary cluster. To fail back only a specific set of Oracle RAC instance packages:
# /opt/cmconcl/bin/ccrac_mgmt.ksh -i <index> start
<index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.
Rehearsing Oracle RAC Databases in Continentalclusters

Special precautions are required for running a disaster recovery (DR) rehearsal for Oracle RAC databases. For information on configuring and running a rehearsal for RAC databases, see the Disaster Recovery Rehearsal in Continentalclusters white paper available at http://www.docs.hp.com.
3 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

The HP StorageWorks Disk Array XP series allows you to configure data replication solutions that provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access XP software and the additional files that integrate the XP with Serviceguard clusters. It then shows how to configure both metropolitan and continental cluster solutions using Continuous Access XP. The topics discussed in this chapter are:
• Files for Integrating XP Disk Arrays with Serviceguard Clusters
• Overview of Continuous Access XP Concepts
• Creating the Cluster
• Preparing the Cluster for Data Replication
• Configuring Packages for Disaster Recovery
• Completing and Running a Metrocluster Solution with Continuous Access XP
Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature called the Site Controller package to provide disaster tolerance for workload databases. This solution is currently implemented for the Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323).
Files for Integrating XP Disk Arrays with Serviceguard Clusters

Metrocluster is a set of executable programs and an environment file that work in a Serviceguard cluster to automate failover to alternate nodes in case of a disaster in a metropolitan cluster. The Metrocluster/Continuous Access product contains the following files.

Table 3-1 Metrocluster/Continuous Access Template Files

/opt/cmcluster/toolkit/SGCA/xpca.env
    The Metrocluster/Continuous Access environment file. This file must be customized for the specific Disk Array XP series and HP host system configuration. Copies of this file must be customized for each separate Serviceguard package.
/usr/sbin/DRCheckDiskStatus
    The executable module that checks for a specific environment file in the package directory. It should not be edited.
/usr/sbin/DRCheckXPCADevGrp
    The program that checks the status of the XP/Continuous Access device group that is used by the package.
/usr/sbin/DRMonitorXPCADevGrp
    The program that monitors the status of the XP/Continuous Access package device group, sends notification, and performs pre-defined actions on the device group.
Metrocluster/Continuous Access must be installed on all nodes that will run a Serviceguard package whose data reside on an HP StorageWorks Disk Array XP Series, and where the data is replicated to a second XP using the Continuous Access XP facility. In the event of node failure, the integration of Metrocluster/Continuous Access with the package allows the application to fail over in the following ways:
• among local host systems attached to the same XP Series array
• between one system that is attached locally to its XP and another “remote” host that is attached locally to the other XP

Configuration of Metrocluster/Continuous Access must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster/Continuous Access, the Raid Manager XP host-based software for control and status of the XP Series arrays must also be installed and configured on each HP 9000 or Integrity host system that might run the application package.
Overview of Continuous Access XP Concepts

The HP StorageWorks Disk Array XP Series may be configured for data replication from one XP Series unit to another. This type of physical data replication is a part of the Metrocluster/Continuous Access and Continentalclusters solutions. This section describes the hardware and software concepts necessary for understanding how to use Continuous Access software for physical data replication in disaster tolerant solutions.
PVOLs and SVOLs

Continuous Access allows you to define primary and secondary volumes that are redundant copies of one another, as shown in Figure 3-1.
Figure 3-1 XP Series Primary and Secondary Volume Definitions

[Figure: Two XP Series arrays, one in Data Center A and one in Data Center B, connected by redundant Continuous Access links with DWDM. Each array holds PVOLs, SVOLs, and optional Business Copy (BC) volumes, and there may be multiple PVOL/SVOL device pairs. In a continental cluster, Continuous Access links may be bidirectional. Packages with primary nodes in Data Center A see that XP as the primary side and the XP in Data Center B as the secondary side; packages with primary nodes in Data Center B see the reverse.]
Data replication proceeds from PVOL to SVOL. When failover is necessary, the SVOL can be changed into a PVOL for access by a package on the failover node.
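The role change described above can be driven with Raid Manager XP commands. The following is a hedged sketch only — the device group name pkgA and instance number 0 are illustrative, and in a Metrocluster configuration this takeover is normally performed automatically by the Metrocluster/Continuous Access package control logic, not by hand:

```
# export HORCMINST=0
# pairdisplay -g pkgA -fc      # show pair status for the package device group
# horctakeover -g pkgA -t 300  # on the recovery site: promote the SVOL side
```

Consult the Raid Manager XP reference for the exact option set supported by your Raid Manager version before using these commands.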
Device Groups and Fence Levels

A device group is the set of XP devices that are used by a given package. The device group is the basis on which PVOLs and SVOLs are created. The fence level of the device group is set when you define it, and all devices defined in a given device group must be configured with the same fence level. A fence level of DATA or NEVER results in synchronous data replication; a fence level of ASYNC enables asynchronous data replication.

Fence Level of NEVER

Fence level = NEVER should be used only when the availability of the application is more important than data currency on the remote XP disk array. If all Continuous Access links fail, the application continues to modify the data on the PVOL side, but the new data is not replicated to the SVOL side. The SVOL contains a copy of the data only up to the point of the Continuous Access link failure. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data. If Fence level = NEVER is used, the data may be inconsistent in the case of a rolling disaster—additional failures taking place before the system has completely recovered from a previous failure. See the example of a rolling disaster in the following section, “Fence Level of DATA”.
Fence Level of DATA

Fence level = DATA is recommended to ensure a current and consistent copy of the data on all sides: in the case of a Continuous Access link failure, it ensures there is no possibility of inconsistent data at the SVOL side. Since only dedicated Continuous Access links are supported, the probability of intermittent link failure and inconsistent data at the remote (SVOL) side is extremely low. However, the following sequence of events will cause inconsistent and therefore unusable data:
• Fence level = DATA is not enabled.
• The Continuous Access links fail.
• The application continues to modify data.
• The link is restored.
• Resynchronization from PVOL to SVOL starts, but does not finish.
• The PVOL side fails.
Although the risk of this sequence of events taking place is extremely low, if your business cannot afford even a minimal level of risk, enable Fence level = DATA to ensure that the data at the SVOL side is always consistent. The disadvantage of enabling Fence level = DATA is that when the Continuous Access link fails, or if the entire remote (SVOL) data center fails, all I/Os to those devices are refused until the Continuous Access link is restored, or until manual intervention is used to split the PVOL side from the SVOL side.

NOTE: Using manual intervention allows the application to write data to the PVOL side without replicating it to the SVOL side. The data may be inconsistent in the case of a rolling disaster; see the sequence above. Applications may fail or may continuously retry I/Os (depending on the application) if Fence level = DATA is enabled and the Continuous Access link fails.
NOTE: If data currency is required on all sides, Fence level = DATA should be used and manual intervention should NOT be taken when the Continuous Access links fail.

Fence Level of ASYNC

Fence level = ASYNC is recommended to improve the performance of data replication between the primary and the remote site. The XP disk array supports asynchronous mode with guaranteed ordering. When the host does a write I/O to the XP disk array, the array sends a reply to the host as soon as the data is written to cache. A copy of the data with a sequence number is saved in an internal buffer, known as the side file, for later transmission to the remote XP disk array. When synchronous replication is used, the primary system cannot complete a transaction until a message is received acknowledging that the data has been written to the remote site. With asynchronous replication, the transaction is completed once the data is written to the side file on the primary system, which allows I/O activity to continue even if the Continuous Access link is temporarily unavailable.

The side file is 30% to 70% of cache (default 50%) and is assigned through the XP system’s Service Processor (SVP). The high water mark (HWM) is 30% of the cache, as shown in Figure 3-2. If the quantity of data in the side file exceeds 30%, write I/O to the side file is delayed. The delay can be from 0.5 seconds to a maximum of 4 seconds, in 500 ms increments, with every 5% increase over the HWM. If the side file continues to grow, it will eventually hit the side file threshold of 30% to 70% of cache. When this limit has been reached, the XP on the primary site cannot write to the XP on the secondary site until there is enough room in the side file. The primary XP waits until there is enough room in the side file before continuing to write. Furthermore, the primary XP keeps trying until it reaches its side file timeout value, which is configured through the SVP. If the side file timeout has been reached, the primary XP disk array begins tracking data in its bitmap; this data is copied over to the secondary volume during resynchronization. Figure 3-2 depicts the side file operation.

Figure 3-2 XP Series Disk Array Side File
[Figure: The side file occupies a side file area within cache. Below the high water mark (30% of cache), writes proceed normally; above it, write responses are delayed; once the side file threshold is reached, writing waits until the quantity of data falls under the threshold, unless the timeout has been reached.]
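The delay policy described above can be modeled with a small sketch. This is only an illustration of the documented 500-ms-per-5% rule, not array behavior itself; it assumes the delay is applied per started 5% step over the HWM:

```shell
# Model of the documented side-file write-delay policy:
# no delay at or below the 30% high water mark, then 500 ms per
# started 5% step over the HWM, capped at 4000 ms.
sidefile_delay_ms() {
  pct=$1          # side file usage as a percentage of cache
  hwm=30          # high water mark (30% of cache)
  if [ "$pct" -le "$hwm" ]; then
    echo 0
    return
  fi
  over=$((pct - hwm))
  steps=$(( (over + 4) / 5 ))   # round up to the next 5% step
  delay=$((steps * 500))
  if [ "$delay" -gt 4000 ]; then
    delay=4000
  fi
  echo "$delay"
}

sidefile_delay_ms 30   # prints 0
sidefile_delay_ms 40   # prints 1000
sidefile_delay_ms 95   # prints 4000
```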
NOTE: The side file must be configured using the XP Service Processor (SVP). Refer to the XP Series documentation for details.

If all the Continuous Access links fail, the remaining data in the side file that has not been copied to the SVOL is tracked in the bitmap. The application continues to modify the data on the PVOL, and these changes are also tracked in the bitmap. The SVOL contains a copy of the data only up to the point of the Continuous Access link failure. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data.

Continuous Access Link Timeout

In asynchronous mode, when there is a Continuous Access link failure, both the PVOL and SVOL sides change to a PSUE state. When the SVOL side detects missing data blocks from the PVOL side, it waits for those data blocks from the PVOL side until it reaches the configured Continuous Access link timeout value (set in the SVP). Once this timeout value has been reached, the SVOL side changes to a PSUE state. The default Continuous Access link timeout value is 5 minutes (300 seconds).

Consistency Group

An important property of asynchronous mode volumes is the Consistency Group (CT group). A CT group is a grouping of LUNs that must be treated the same from the perspective of data consistency (I/O ordering). A CT group is equal to a device group in the Raid Manager configuration file. A consistency group ID (CTGID) is assigned automatically during pair creation.

NOTE: Different XP models have different maximum numbers of Consistency Groups. For details, check the XP user’s guide.

Limitations of Asynchronous Mode

The following are restrictions for an asynchronous CT group in a Raid Manager configuration file:
• Asynchronous device groups cannot be defined to extend across multiple XP Series disk arrays.
• When making paired volumes, the Raid Manager registers a CTGID to the XP Series disk array automatically at paircreate time, and the device group in the configuration file is mapped to a CTGID. Attempts to create a CTGID beyond the maximum number are terminated with a return value of EX_ENOCTG.
• Metrocluster/Continuous Access supports only one consistency group per package. Consequently, the number of packages in a metropolitan cluster that can be configured to use a consistency group is limited by either the maximum number of consistency groups supported by the XP model in the configuration, or the maximum number of packages in the cluster, whichever is smaller.
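The fence level and, for asynchronous mode, the consistency group are fixed at pair-creation time. The following is a hedged sketch assuming a device group named pkgA already defined in the Raid Manager configuration files on both sides (the group name and CT group ID are illustrative only):

```
# synchronous replication with fence level DATA
# paircreate -g pkgA -vl -f data

# asynchronous replication; the trailing CT group ID (here 2)
# maps the device group to a consistency group on the array
# paircreate -g pkgA -vl -f async 2
```

Verify the exact paircreate options against the Raid Manager XP reference for your Raid Manager version.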
Other Considerations on Asynchronous Mode

The following are some additional considerations when using asynchronous mode:
• When adding a new volume to an existing device group, the new volume state is SMPL. The XP disk array controller (DKC) is smart enough to do the paircreate only on the new volume. If the device group has mixed volume states, such as PAIR and SMPL, pairvolchk returns EX_ENQVOL and horctakeover will fail.
• If you change the LDEV number associated with a given target/LUN, you must restart all the Raid Manager instances even though the Raid Manager configuration file is not modified.
• Any firmware update, cache expansion, or board change requires a restart of all Raid Manager instances.
• pairsplit for asynchronous mode may take a long time, depending on how long the synchronization takes. There is a potential for the Continuous Access link to fail while pairsplit is in progress. If this happens, pairsplit will fail with a return code of EX_EWSUSE.
• In most cases, Metrocluster/Continuous Access in asynchronous mode behaves the same as when the fence level is set to NEVER in synchronous mode.
Continuous Access Journal Overview

Continuous Access XP Journal provides asynchronous data replication between two HP XP12000 or HP XP10000 storage disk arrays. As depicted in Figure 3-3, Continuous Access Journal uses two main features, “disk-based journaling” and “pull-style replication”. These two features reduce XP12000 internal cache memory consumption while maintaining performance and operational resilience.
Figure 3-3 Journal Based Replication
Continuous Access Journal performs remote copy operations for data volume pairs. Each Continuous Access Journal pair consists of a primary data volume (PVOL) and a secondary data volume (SVOL) located in different storage arrays. The Continuous Access Journal PVOL contains the original data, and the SVOL contains the duplicate data. During normal data replication operations, the PVOL remains available to all hosts at all times for read and write I/O operations, while the storage array rejects all host-requested write I/Os for the SVOL. The SVOL write enable option allows write access to a secondary data volume while the pair is split, and uses the SVOL and PVOL track maps to resynchronize the pair.

Journal Volume

When Continuous Access Journal is used, updates to the PVOL can be stored in other volumes, called journal volumes. The update data stored in journal volumes is called journal data. Figure 3-3 depicts Continuous Access Journal data replication for disk-based journaling, in which the data volumes at the primary data center are being replicated to a secondary storage array at the remote data center. When collecting the data to be replicated, the primary XP12000 array writes the designated records to a special set of journal volumes. The remote storage array then reads the records from the journal volumes, pulling them across the communication link as described in the next section, “Pull-Based Replication”. By writing the records to journal disks instead of keeping them in cache, Continuous Access Journal overcomes the limitations of earlier asynchronous replication methods. Writes to the journal are cached, but are quickly de-staged to disk to minimize cache usage. The journal volumes are architected and optimized for keeping large amounts of host-write data in sequence.
In addition to the records being replicated, the journal contains metadata for each record to ensure the integrity and consistency of the replication process. Each transmitted record set includes both time stamp and sequence number information, which enables the replication process to verify that all the records are received at the remote site and to arrange them in the correct write order for storage. These processes build on the proven algorithms of XP Continuous Access Asynchronous data replication. The journaling and replication processes also support consistency across multiple volumes.

Pull-Based Replication

In addition to disk-based journaling, Continuous Access Journal uses pull-style replication. The primary storage system does not dedicate resources to pushing data across the replication link. Rather, a replication process on the remote system pulls the data from the primary system's journal volume, across the Continuous Access link, and writes it to the journal volume at the receiving site. The replication process then applies the journaled writes to the remote data volumes, using metadata and consistency algorithms to ensure data integrity. In the default configuration, Continuous Access Journal considers replication complete when the data is received in mirrored system cache at the remote system, written to the journal disk, and applied to the remote data volumes. Since the process that controls asynchronous replication is located on the remote system, this approach shifts most of the replication workload to the remote site, reducing resource consumption on the primary storage system. In effect, Continuous Access Journal restores primary site storage to its intended role as a transaction processing resource, not a replication engine. The pull-style replication engine also contributes to resource optimization: it controls the replication process from the secondary system and frees up valuable production resources on the primary system.
Mitigation of Network Problems

In Continuous Access Asynchronous replication, typical issues include temporary communication problems, such as Continuous Access link failure or insufficient bandwidth for peak-load requirements. These conditions can cause cache-based “push” replication methods to fail. When this happens, traditional replication solutions suspend the replication process and go into bitmap mode, noting changed tracks in a bitmap for future resynchronization. Recovery typically involves a destructive process such as rewriting all the changed tracks, with possible loss of data consistency for ordered writes. In contrast, Continuous Access Journal logs every change to the journal disk at the primary site, including the metadata needed to apply the changes consistently. Should the replication link between sites fail, Continuous Access Journal keeps logging changes in the local journal so that they can be transmitted later, without interruption to the protection process or the application. The journal data is simply transferred after the network link failure or bandwidth limitation is corrected, with no loss of consistency. The recovery time may be extended somewhat during temporary link failures or congestion, but the asynchronous replication process does not fail, and the catch-up process is simple and automatic. Data consistency is preserved.

With Continuous Access Journal, the remote storage system pulls data from the primary journal volumes over the data replication network as fast as the bandwidth allows, while adjusting to available network conditions. If available bandwidth does not support optimal replication, such as during peak-load spikes in transaction volume, the primary journal volumes buffer the data on disk until more bandwidth becomes available.

Fence Level

Continuous Access Journal has the characteristics of asynchronous data replication. In the XP12000, the fence level of a Continuous Access Journal pair is defined as “async”, the same as the Continuous Access Asynchronous fence level.

Journal Group

The journal group is a component of Continuous Access Journal operations that consists of two or more data and journal volumes. The data update sequence from the host is managed per journal group. This ensures that data update sequence consistency between the paired journal groups is maintained. Journal groups are managed according to the journal group number; the paired journal numbers of journal groups can be different. One journal group can have more than one data volume and journal volume belonging to it.

Journal Cache, Journal Volumes, and Inflow Control

When a primary array performs an update (host-requested write I/O) on a PVOL, the primary array creates the journal data (metadata and new write data) to be transferred to the secondary array. The journal data is stored in the journal cache or the journal volumes, depending on the amount of data in cache. If available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volumes. A secondary array receives the journal data transferred from the primary array according to the read-journal command.
The received journal data is stored in the journal cache or the journal volumes, depending on the “Use of Cache” parameter and the amount of data in cache. If “Use of Cache” is set to “Use”, journal data is stored in the journal cache. If it is set to “No Use”, journal data bypasses the cache and moves directly to the journal volumes. In addition, if available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volumes.

For Continuous Access Journal processing, Continuous Access Journal allows the usage rate of the journal volume to be specified. The journal volume stores journal data, generated by host write I/Os to the PVOL, for asynchronous transfer to the secondary array. However, if the hosts transfer excessive amounts of data, the journal volume may become full. Consequently, if the journal volume remains full for a specified period of time, the journal group is suspended due to a failure. To specify how long the journal volume can remain full, use the Data Overflow Watch option. The XP12000 array uses the following parameters to control the inflow of data into the journal group and the state change of the journal group:
• Inflow Control: Indicates whether to restrict inflow of update I/Os (slow down the host response) to the journal volume when the journal volume is full. ‘Yes’ indicates inflow will be restricted; ‘No’ indicates inflow will not be restricted.
• Data Overflow Watch: Indicates the time (in seconds) to apply Inflow Control before suspending the journal group. If the amount of data in the journal volume in the primary array reaches capacity, the disk array I/Os are delayed. If the journal volume remains full for the period of time specified by the Data Overflow Watch parameter, the primary array suspends the affected journal groups due to a failure.

Continuous Access Journal Pair State

If the amount of data in the journal cache in the secondary subsystem reaches the specified journal cache capacity, the secondary subsystem stores the received journal data into the restore journal volume, and then issues the next read-journal command to the primary subsystem. This suppresses the increase in the cache usage rate. To accommodate this, Continuous Access Journal retains the PAIR state when the Continuous Access links fail, as long as the journal volumes have enough space, whereas Continuous Access Asynchronous switches to the PSUE state. In addition, this allows host write-data to be kept continuously as journal data in the journal volumes while the updated data is not being replicated to the remote array. Once the links are recovered, data replication between the primary and secondary arrays resumes automatically, and the journal data accumulated in the primary journal volumes is replicated to the secondary site.
NOTE: If the journal volumes get full, the pair state switches to PFUS and the data written to the data volume is tracked in the bitmap.

Limitations of XP12000 Continuous Access Journal

The following two sections describe the “One-to-One Volume Copy Operations” and “One-to-One Journal Group Operations” limitations of the XP12000 Continuous Access Journal.

One-to-One Volume Copy Operations

Continuous Access Journal requires a one-to-one relationship between the logical volumes of the volume pairs. A volume can be assigned to only one journal group pair at a time.
NOTE: Continuous Access Journal does not support operations in which one primary data volume is copied to more than one secondary data volume, or in which more than one primary data volume is copied to one secondary data volume.

One-to-One Journal Group Operations

The supported Continuous Access Journal configuration for a journal group pair is a one-to-one relationship: one journal group in one XP12000 can pair with only one journal group in another XP12000.

Journal Group Requirement

Each data volume pair must be assigned to one and only one journal group.

Configuring XP12000 Continuous Access Journal

One journal group can contain multiple journal volumes. Each of the journal volumes can have a different volume size and a different RAID configuration. Journal data is stored sequentially and separately into each journal volume in the same journal group, and each of the journal volumes is used equally. Journal volumes in the same journal group can be of different capacities, and a journal volume in the primary subsystem and the corresponding restore journal volume can be of different capacities.

Registering Journal Volumes

Unlike a Continuous Access Asynchronous device group, which contains only data volumes, a journal group includes data volumes as well as journal volumes. Journal volumes must be registered in a journal group before a data volume pair is created for the first time in the journal group. Journal volumes are assigned to a specific journal group, and each journal group has its own ID. The journal volumes assigned to a specific journal group can be used to create one journal group pair: one journal group (JID) on the primary array and one journal group (JID) on the secondary array are used to create a journal group pair. Be sure to register journal volumes to journal groups on both the primary and secondary arrays.
The number and capacity of the journal volumes for a specific journal group on a primary or secondary array depend on the business need and IT infrastructure. To register journal volumes in a journal group, use “HP StorageWorks Command View XP”. For more information on this feature, refer to the HP-UX 11i Version 2 Release Notes.
NOTE: The “HP StorageWorks Command View XP” utility is the only way to register journal volumes. (No command line interface is available for registering journal volumes.)

Journal volumes can be registered in a journal group or deleted from a journal group. Journal volumes cannot be registered or deleted while data copying is being performed (that is, while one or more data volume pairs exist). Journal volumes can be deleted from a journal group on the following occasions:
• When the journal group does not contain data volumes (that is, before a data volume pair is created for the first time in the journal group, or after all data volume pairs are deleted).
• When all data volume pairs in the journal group are suspended.
If a path is defined from a host to a volume, do not register the volume as a journal volume, and do not define paths from hosts to journal volumes. This means that hosts cannot read from or write to journal volumes.

Data Replication Connections

The remote copy connections are the physical paths used by the primary array to communicate with the secondary array. The primary XP12000 array and secondary XP12000 array are connected using a Fibre Channel interface (note: ESCON is not supported with the XP12000). Ensure the connection is established in a bidirectional manner.

Metrocluster Package vs. Journal Group

Metrocluster Continuous Access XP supports only one journal group pair per package. Thus, in a metropolitan cluster, the number of packages that can be configured to use a journal group is limited by either the maximum number of journal groups supported by the XP12000 in the configuration, or the maximum number of packages in the cluster, whichever is smaller.
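When a journal-based pair is created with Raid Manager, the journal group IDs on both arrays are supplied at pair-creation time. The following is a hedged sketch only — the device group pkgB and journal IDs are illustrative, and the journal options should be verified against the Raid Manager XP documentation for your version:

```
# create a journal-based pair: -jp gives the journal group ID on the
# primary (PVOL) side, -js the journal group ID on the secondary side
# paircreate -g pkgB -vl -f async -jp 0 -js 0
```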
Creating the Cluster Create the cluster or clusters according to the process described in the Managing Serviceguard user’s guide. In the case of a metropolitan cluster, create a single Serviceguard cluster with components on multiple sites. In the case of a continental cluster, create two distinct Serviceguard clusters on different sites.
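As a hedged illustration of the cluster-creation steps (node names and the configuration file path are examples only; the authoritative procedure is in the Managing Serviceguard user's guide):

```
# generate a cluster configuration template from the candidate nodes
# cmquerycl -v -C /etc/cmcluster/cmclconfig.ascii -n node1 -n node2 -n arb1

# after editing the template, verify and apply it, then start the cluster
# cmcheckconf -C /etc/cmcluster/cmclconfig.ascii
# cmapplyconf -C /etc/cmcluster/cmclconfig.ascii
# cmruncl -v
```

For a continental cluster, this sequence is repeated independently on each site's cluster.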
NOTE: Do not configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not a supported configuration.
Preparing the Cluster for Data Replication

This section assumes that you have already created one or more Serviceguard clusters for use in a disaster tolerant configuration. The following three sets of procedures prepare Serviceguard clusters for use with Continuous Access XP data replication in a metropolitan or continental cluster:
• Creating the RAID Manager Configuration
• Defining Storage Units
• Configuring Packages
Creating the RAID Manager Configuration

Use these steps to create the configuration:

1. Ensure that the XP Series disk arrays are correctly cabled to each host system that will run packages whose data reside on the arrays. Each XP Series disk array must be configured with redundant Continuous Access links, each of which is connected to a different LCP or RCP card. To prevent a single point of failure (SPOF), there must be at least two physical boards in each XP for the Continuous Access links. Each board usually has multiple ports; however, a redundant Continuous Access link must be connected to a port on a different physical board from the board that has the primary Continuous Access link. When using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations in which you want to allow failback.
2. Install the Raid Manager XP software on each host system that has data residing on the XP disk arrays.
3. Edit the /etc/services file, adding an entry for the Raid Manager instance to be used with the cluster. The format of the entry is:
   horcm<instance>   <port>/udp
   For example:
   horcm0   11000/udp   #Raid Manager instance 0
4. Use the ioscan command to determine which devices on the XP disk array have been configured as command devices. The device-specific information in the rightmost column of the ioscan output has the suffix -CM for these devices; for example, OPEN-3-CM. If there are no configured command devices on the disk array, you must create two before proceeding. Each command device must have alternate links (PVLinks). The first command device is the primary command device. The second command device is a redundant command device and is used only upon failure of the primary command device. The command devices must be mapped to the various host interfaces by using the SVP (disk array console) or a remote console.
5. Copy the default Raid Manager configuration file to an instance-specific name:
   # cp /etc/horcm.conf /etc/horcm0.conf
6. Create a minimum Raid Manager configuration file by editing the following fields in the file created in the previous step:
   • HORCM_MON—enter the hostname of the system on which you are editing and the TCP/IP port number specified for this Raid Manager instance in the /etc/services file.
   • HORCM_CMD—enter the primary and alternate link device file names for both the primary and redundant command devices (for a total of four raw device file names).
   CAUTION: Make sure that the redundant command device is NOT on the same physical device as the primary command device. Also, make sure that it is on a different bus inside the XP Series disk array.
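A minimal /etc/horcm0.conf might look like the following sketch. All values — hostname, device files, group and device names, and the remote host — are examples only; see the Raid Manager XP user's guide for the full file format:

```
HORCM_MON
# host     service   poll(10ms)  timeout(10ms)
node1      horcm0    1000        3000

HORCM_CMD
# primary and alternate links to the primary and redundant command devices
/dev/rdsk/c4t0d7 /dev/rdsk/c6t0d7 /dev/rdsk/c4t1d7 /dev/rdsk/c6t1d7

HORCM_DEV
# group   dev_name   port    TargetID  LU#
pkgA      disk1      CL1-A   0         0

HORCM_INST
# group   remote_host   service
pkgA      node3         horcm0
```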
7. If the Raid Manager protection facility is enabled, set the HORCMPERM environment variable to the pathname of the HORCM permission file, then export the variable.
   # export HORCMPERM=/etc/horcmperm0.conf
   If the Raid Manager protection facility is not used or is disabled, set the HORCMPERM environment variable to MGRNOINST and export it.
   # export HORCMPERM=MGRNOINST
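The two cases in step 7 can be captured in a small conditional. PROTECTION_ENABLED is a hypothetical toggle for this sketch, not a Raid Manager variable, and instance 0 is an example:

```shell
# Choose the HORCMPERM value based on whether the Raid Manager protection
# facility is in use. PROTECTION_ENABLED is a hypothetical toggle for this
# sketch only.
PROTECTION_ENABLED=${PROTECTION_ENABLED:-no}

if [ "$PROTECTION_ENABLED" = "yes" ]; then
    export HORCMPERM=/etc/horcmperm0.conf   # permission file for instance 0
else
    export HORCMPERM=MGRNOINST              # protection facility not used
fi
```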
8. Start the Raid Manager instance by using horcmstart.sh.
   # horcmstart.sh 0
9. Export the environment variable that specifies the Raid Manager instance to be used by the Raid Manager commands. For example, with the POSIX shell, type:
   # export HORCMINST=0
   Now use Raid Manager commands to get further information from the disk arrays. To verify the software revision of the Raid Manager and the firmware revision of the XP disk array:
   # raidqry -l
   NOTE: Check the Metrocluster Continuous Access XP Release Notes for the minimum required levels of the XP array, Raid Manager software, and firmware for your version.

Identifying HP-UX device files
Before you create volume groups, you must determine the Device Special Files (DSFs) of the corresponding LUNs used in the XP array. To determine the legacy DSFs corresponding to the LUNs in the XP array:
# ls /dev/rdsk/* | raidscan -find -fx
The following output is displayed:
DEVICE_FILE       UID  S/F  PORT   TARG  LUN  SERIAL  LDEV  PRODUCT_ID
/dev/rdsk/c5t0d0  0    F    CL3-E  0     0    10053   321   OPEN-3
This output displays the mapping between the legacy DSFs and the CU:LDEVs. In this output, the value for LDEV specifies the CU:LDEV without the ":" mark. To determine the agile DSFs (supported from HP-UX 11i v3 onwards) and the CU:LDEV mapping information, run the following command:
# ls /dev/rdisk/* | raidscan -find -fx
The following output is displayed:
DEVICE_FILE         UID  S/F  PORT   TARG  LUN  SERIAL  LDEV  PRODUCT_ID
/dev/rdisk/disk232  0    F    CL4-E  0     0    10053   321   OPEN-3
NOTE: There must also be alternate links for each device, and these alternate links must be on different busses inside the XP disk array. For example, these alternate links may be CL2-E and CL2-F. Unless the devices have been previously paired, either on this or another host, the devices will show up as SMPL (simplex). Paired devices will show up as PVOL (primary volume) or SVOL (secondary volume).
XP arrays (XP 10000/XP 12000 and beyond) allow externally attached storage devices to be configured as the P-VOL, the S-VOL, or both of a Continuous Access pair. From a Continuous Access perspective, there is no difference between a pair created from internal devices and a pair created on external devices. Refer to the HP StorageWorks XP documentation for the configuration requirements of external storage devices attached to XP arrays and for the supported external storage devices.
10. Determine which devices will be used by the application package. Define a device group that contains all of these devices. It is recommended that you use a name that is easily associated with the package. For example, a device group name of "db-payroll" is easily associated with the database for the payroll application; a device group name of "group1" would be more difficult to relate to an application.
    Edit the Raid Manager configuration file (horcm0.conf in the above example) to include the devices and device group used by the application package. Only one device group may be specified for all of the devices that belong to a single application package. These devices are specified in the HORCM_DEV field. Also complete the HORCM_INST field, supplying the names of only those hosts that are attached to the XP disk array that is remote from the disk array directly attached to this host. For example, with the continental cluster shown in Figure 3-4 (nodes 1 and 2 in the primary cluster and nodes 3 and 4 in the recovery cluster), you would specify only nodes 3 and 4 in the HORCM_INST field of the file you are creating on node 1 in the primary cluster. Node 1 would have previously been specified in the HORCM_MON field.

Figure 3-4 Disaster Tolerant Cluster
[Figure: two data centers connected by a highly available network. Nodes 1 and 2, running packages A and B, attach through PVLinks to the local XP disk array; nodes 3 and 4 attach through PVLinks to the remote XP disk array. The arrays hold the PVOL and SVOL copies (A/A' and B/B') of the replicated data for packages A and B, and are joined by redundant Continuous Access links.]
11. Restart the Raid Manager instance so that the new information in the configuration file is read.
    # horcmshutdown.sh
    # horcmstart.sh
12. Repeat these steps on each host that will run this particular application package. If a host may run more than one application package, you must incorporate device group and host information for each of these packages. Note that the Raid Manager configuration file must be different for each host, especially the HORCM_MON and HORCM_INST fields.
13. If not previously done, create the paired volumes.
    # paircreate -g <device group> -f <fence level> -vl -c 15
    This command must be issued before creating volume groups. For creating a pair of journal groups, refer to the next section, "Pair Creation of Journal Groups".

CAUTION: Paired devices must be of compatible sizes and types.
When using the paircreate command to create PVOL/SVOL Continuous Access pairs, specify the -c 15 switch to ensure the fastest data copy from PVOL to SVOL.

Pair Creation of Journal Groups
Continuous Access Journal has the same characteristics as Continuous Access Asynchronous, and Raid Manager controls Continuous Access Journal in a similar way. The Raid Manager configuration of the device group pair for Continuous Access Journal is exactly the same as the configuration of a Continuous Access Asynchronous device group pair. In the /etc/horcm0.conf file, do not specify any journal volumes or journal group number. Only the data volumes (the device group and its devices) need to be in the configuration file.

Creating a Continuous Access Journal Pair
To create a journal group pair, use the paircreate command.
# paircreate -g <device group> -f async -vl -c 15 -jp <journal group ID for PVOL> \
  -js <journal group ID for SVOL>
As with Continuous Access Asynchronous, the fence level "async" must be assigned, together with two additional options, -jp and -js:
-jp <journal group ID>: specifies the journal group ID for the PVOL
-js <journal group ID>: specifies the journal group ID for the SVOL
The -jp and -js options are required if the device group is configured to use Continuous Access Journal. The journal group IDs used with the -jp and -js options do not need to be the same.

Sample Raid Manager Configuration File
The following is an example of a Raid Manager configuration file for one node (ftsys1).
## horcm0.conf.ftsys1 - This is an example Raid Manager configuration file
#  for node ftsys1. Note that this configuration file is for Raid Manager
#  instance 0, which can be determined by the "0" in the filename
#  "horcm0.conf".
#  Whenever this configuration file is changed, you must stop and restart
#  the instance of Raid Manager before the changes will be recognized.
#  This can be done using the following commands:
#    horcmshutdown.sh
#    horcmstart.sh
#  After restarting the Raid Manager instance, you should confirm that
#  there are no configuration errors reported by running the pairdisplay
#  command with the "-c" option.
#
#  NOTE: The Raid Manager command device (HORCM_CMD) cannot be used for
#  data storage (it is reserved for private Raid Manager usage).
#
#/************************ HORCM_MON *************************************/
#
# The HORCM_MON parameter is used for monitoring and control of device
# groups by the Raid Manager. It is used to define the IP address, port
# number, and paired volume error monitoring interval for the local host.
#
# ip_address: defines a network address used by the local host. This can
# be a host name or an IP address.
#
# service: specifies the port name assigned to the Raid Manager
# communication path, which must also be defined in /etc/services. If a
# port number, rather than a port name, is specified, the port number
# will be used.
#
# poll_interval: specifies the interval used for monitoring the paired
# volumes. By increasing this interval, the Raid Manager daemon load is
# reduced. If this interval is set to -1, the paired volumes are not
# monitored.
#
# timeout: specifies the time-out period for communication with the Raid
# Manager server.
HORCM_MON
#ip_address     service     poll_interval(10ms)     timeout(10ms)
ftsys1          horcm0      1000                    3000
#/************************* HORCM_CMD *************************************/
#
# The HORCM_CMD parameter is used to define the special files (raw device
# file names) of the Raid Manager command devices used for the monitoring
# and control of Raid Manager device groups.
# Define the special device files corresponding to two or more command
# devices in order to use the Raid Manager alternate command device
# feature. An alternate command device must be configured; otherwise a
# failure of a single command device could prevent access to the device
# group.
# Each command device must have alternate links (PVLinks). The first
# command device is the primary command device. The second command device
# is a redundant command device and is used only upon failure of the
# primary command device. The command devices must be mapped to the
# various host interfaces by using the SVP (disk array console) or a
# remote console.
HORCM_CMD
#Primary          Primary Alt-Link  Secondary         Secondary Alt-Link
#dev_name         dev_name          dev_name          dev_name
/dev/rdsk/c4t1d0  /dev/rdsk/c5t1d0  /dev/rdsk/c4t0d1  /dev/rdsk/c5t0d1
#/************************* HORCM_DEV *************************************/
#
# The HORCM_DEV parameter is used to define the addresses of the physical
# volumes corresponding to the paired logical volume names. Each group
# name is a unique name used by the hosts which will access the volumes.
#
# The group and paired logical volume names defined here must be the same
# for all other (remote) hosts that will access this device group. The
# hardware SCSI bus, SCSI-ID, and LUNs for the device groups do not need
# to be the same on remote hosts.
#
# dev_group: defines the device group name for paired logical volumes.
# The device group name is used by all Raid Manager commands for
# accessing these paired logical volumes.
#
# dev_name: defines the names of the paired logical volumes in the
# device group.
#
# port#: defines the XP256 port number used to access the physical
# volumes in the XP256 connected to the "dev_name". Consult your XP256
# documentation for valid port numbers to specify here.
#
# TargetID: defines the SCSI target ID of the physical volume on the
# port specified in "port#".
#
# LUN#: defines the SCSI logical unit number (LUN) of the physical
# volume specified in "targetID".
HORCM_DEV
#dev_group   dev_name      port#   TargetID   LUN#
pkgA         pkgA_index    CL1-E   0          1
pkgA         pkgA_tables   CL1-E   0          2
pkgA         pkgA_logs     CL1-E   0          3
pkgB         pkgB_d1       CL1-E   0          4
pkgC         pkgC_d1       CL1-E   0          5
pkgD         pkgD_d1       CL1-E   0          2
#/************************* HORCM_INST ************************************/
#
# This parameter is used to define the network address (IP address or
# host name) of the remote hosts which can provide the remote Raid
# Manager access for each of the device group secondary volumes.
# The remote Raid Manager instances are required to get status or provide
# control of the remote devices in the device group. All remote hosts
# must be defined here, so that the failure of one remote host will not
# prevent obtaining status.
#
# dev_group: the same device group names as defined in dev_group of
# HORCM_DEV.
#
# ip_address: defines the network address of the remote hosts with Raid
# Manager access to the device group. This can be either an IP address
# or a host name.
#
# service: specifies the port name assigned to the Raid Manager
# instance, which must be registered in /etc/services. If this is a port
# number rather than a port name, then the port number will be used.
HORCM_INST
#dev_group   ip_address   service
pkgA         ftsys1a      horcm0
pkgA         ftsys2a      horcm0
pkgB         ftsys1a      horcm0
pkgB         ftsys2a      horcm0
pkgC         ftsys1a      horcm0
pkgC         ftsys2a      horcm0
pkgD         ftsys1a      horcm0
pkgD         ftsys2a      horcm0
Notes on the Raid Manager Configuration
A single XP device group must be defined for each package on each host that is connected to the XP series disk array. Device groups are defined in the Raid Manager configuration file under the heading HORCM_DEV. The disk target IDs and LUNs for all Physical Volumes (PVs) defined in Volume Groups (VGs) that belong to the package must be defined in one XP device group on each host system that may ever run one or more Continentalclusters packages.
The device group name (dev_group) is user-defined and must be the same on each host in the continental cluster that accesses the XP disk array. The device group name must be unique within the cluster; it should be a name that is easily associated with the application name or Serviceguard package name. The TargetID and LU# fields for each device name may be different on different hosts in the clusters, to allow for different hardware I/O paths on different hosts.
See the sample convenience scripts in the Samples directory included with this toolkit for examples.

Configuring Automatic Raid Manager Startup
After editing the Raid Manager configuration files and installing them on the nodes that are attached to the XP Series disk arrays, you should configure automatic Raid Manager startup on the same nodes. This is done by editing the rc script /etc/rc.config.d/raidmgr. Set the START_RAIDMGR parameter to 1, and define RAIDMGR_INSTANCE as the number of the Raid Manager instance being used. By default, this is zero (0). An example of the edited startup file is shown below:
#*************************** RAIDMANAGER *************************
#
# Metrocluster with Continuous Access Toolkit script for configuring the
# startup parameters for an HP StorageWorks Disk Array XP Raid Manager
# instance. The Raid Manager instance must be running before any
# Metrocluster package can start up successfully.
#
# @(#) $Revision: 1.8 $
#
# START_RAIDMGR:     If set to 1, this host will attempt to start up an
#                    instance of the Disk Array XP Raid Manager, which
#                    must be running before a Metrocluster package can be
#                    successfully started. If set to 0, this host will
#                    not attempt to start the Raid Manager.
#
# RAIDMGR_INSTANCE:  This is the instance number of the Raid Manager
#                    instance to be started by this script. The instance
#                    number specified here must be the same as the
#                    instance number specified in the Metrocluster
#                    package control script. Consult your Raid Manager
#                    documentation for more information on Raid Manager
#                    instances.
#
# See the Metrocluster and Raid Manager documentation for more
# information on configuring this script.
#
START_RAIDMGR=1
RAIDMGR_INSTANCE=0
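The edit to the startup file can be done with a short sed script. This sketch works on whatever file RAIDMGR_RC points to, so it can be rehearsed on a copy before touching the real /etc/rc.config.d/raidmgr:

```shell
# Enable automatic Raid Manager startup by rewriting the two parameters
# in the rc configuration file (sketch; point RAIDMGR_RC at a copy to
# rehearse the change).
RAIDMGR_RC="${RAIDMGR_RC:-/etc/rc.config.d/raidmgr}"

enable_raidmgr() {
    inst="$1"
    sed -e "s/^START_RAIDMGR=.*/START_RAIDMGR=1/" \
        -e "s/^RAIDMGR_INSTANCE=.*/RAIDMGR_INSTANCE=${inst}/" \
        "$RAIDMGR_RC" > "${RAIDMGR_RC}.new" &&
    mv "${RAIDMGR_RC}.new" "$RAIDMGR_RC"
}
```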
Defining Storage Units
Both LVM and VERITAS VxVM storage can be used in disaster tolerant clusters. The following sections show how to set up each type of volume group.

Creating and Exporting LVM Volume Groups using Continuous Access XP
Use the following procedure to create and export volume groups:
1. Define the appropriate Volume Groups on each host system that might run the application package.
   # mkdir /dev/vgxx
   # mknod /dev/vgxx/group c 64 0xnn0000
   where the name /dev/vgxx and the number nn are unique within the entire cluster.
2. Create the Volume Group on the source volumes.
   # pvcreate -f /dev/rdsk/cxtydz
   # vgcreate /dev/vgname /dev/dsk/cxtydz
3. Create the logical volume(s) for the volume group.
4. Deactivate and export the Volume Groups on the primary system without removing the special device files.
   # vgchange -a n <vgname>
   # vgexport -s -p -m <mapfile> <vgname>
   Make sure that you copy the mapfiles to all of the host systems.
5. On the primary cluster, import the VGs on all of the other systems that might run the Serviceguard package, and back up the LVM configuration.
   # vgimport -s -m <mapfile> <vgname>
   # vgchange -a y <vgname>
   # vgcfgbackup <vgname>
   # vgchange -a n <vgname>
6. On the recovery cluster, import the VGs on all of the systems that might run the Serviceguard recovery package, and back up the LVM configuration.
   # pairsplit -g <device group> -rw
   # vgimport -s -m <mapfile> <vgname>
   # vgchange -a y <vgname>
   # vgcfgbackup <vgname>
   # vgchange -a n <vgname>
   # pairresync -g <device group> -c 15
   You can skip the pairsplit and pairresync commands; however, without them the volume group cannot be activated to perform the vgcfgbackup. In that case, perform the vgcfgbackup when the volume group is activated during the first recovery package activation.
   When using the pairresync command to resynchronize PVOL/SVOL Continuous Access pairs, specify the -c 15 switch to ensure the fastest resynchronization, which reduces the vulnerability to a rolling disaster.

From HP-UX 11i v3 onwards, HP recommends that you use the agile DSF naming model for mass storage devices. For more information on the agile view, mass storage on HP-UX, DSF migration, and LVM Online Disk Replacement, see the following documents, available at http://www.docs.hp.com:
• LVM Migration from Legacy to Agile Naming Model
• HP-UX 11i v3 Mass Storage Device Naming
• Serviceguard Migration
• LVM Online Disk Replacement
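The export/import sequence in steps 4 through 6 can be rehearsed with a dry-run wrapper that prints each command instead of executing it. The volume group, mapfile, and device group names below are placeholders:

```shell
# Dry-run sketch of the volume group distribution sequence (steps 4-6).
# Set DRY_RUN=0 only on a system that actually has the VG and the XP
# device pairs; the pairsplit/pairresync steps apply on the recovery
# cluster only.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi
}

distribute_vg() {
    vg="$1"; map="$2"; dg="$3"
    run vgchange -a n "/dev/$vg"
    run vgexport -s -p -m "$map" "/dev/$vg"   # keep special files, write mapfile
    run pairsplit -g "$dg" -rw                # recovery cluster only
    run vgimport -s -m "$map" "/dev/$vg"
    run vgchange -a y "/dev/$vg"
    run vgcfgbackup "/dev/$vg"
    run vgchange -a n "/dev/$vg"
    run pairresync -g "$dg" -c 15             # recovery cluster only
}
```

For example, `distribute_vg vgpayroll /tmp/vgpayroll.map db-payroll` prints the eight commands that would be executed.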
Creating VxVM Disk Groups using Continuous Access XP
If using VERITAS storage, use the following procedure to create disk groups. It is assumed that a VERITAS root disk group (rootdg) has already been created on the system where you are configuring the storage. The following section shows how to set up VERITAS disk groups. On one node, do the following:
1. Create the device pair to be used by the package.
   # paircreate -g devgrpA -f never -vl -c 15
2. Check to make sure the devices are in the PAIR state.
   # pairdisplay -g devgrpA
3. Initialize disks to be used with VxVM by running the vxdisksetup command, only on the primary system.
   # /etc/vx/bin/vxdisksetup -i c5t0d0
4. Create the disk group to be used, with the vxdg command, only on the primary system.
   # vxdg init logdata c5t0d0
5. Verify the configuration.
   # vxdg list
6. Use the vxassist command to create the logical volume.
   # vxassist -g logdata make logfile 2048m
7. Verify the configuration.
   # vxprint -g logdata
8. Make the filesystem.
   # newfs -F vxfs /dev/vx/rdsk/logdata/logfile
9. Create a directory on which to mount the volume.
   # mkdir /logs
10. Mount the volume.
    # mount /dev/vx/dsk/logdata/logfile /logs
11. Check that the file system exists, then unmount it.
    # umount /logs
IMPORTANT: VxVM 4.1 does not support the agile DSF naming convention with HP-UX 11i v3.

Validating VxVM Disk Groups using Metrocluster/Continuous Access Data Replication
The following section describes how to validate the VERITAS disk groups on one node:
1. Deport the disk group.
   # vxdg deport logdata
2. Enable other cluster nodes to have access to the disk group.
   # vxdctl enable
3. Suspend the Continuous Access link and give the SVOL read/write permission.
   # pairsplit -g devgrpA -rw
4. Import the disk group.
   # vxdg -tfC import logdata
5. Start the logical volumes in the disk group.
   # vxvol -g logdata startall
6. Create a directory on which to mount the volume.
   # mkdir /logs
7. Mount the volume.
   # mount /dev/vx/dsk/logdata/logfile /logs
8. Check to make sure the file system is present, then unmount the file system.
   # umount /logs
9. Resynchronize the Continuous Access pair device.
   # pairresync -g devicegroupname -c 15
Configuring Packages for Disaster Recovery
When you have completed the following steps, packages will be able to fail over to an alternate node in another data center and still have access to the data they need in order to operate. This procedure must be repeated on all the cluster nodes for each Serviceguard package so that the application can fail over to any of the nodes in the cluster.
Customizations include editing an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user's guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, configure and test one package at a time.
1. Create a directory /etc/cmcluster/pkgname for each package.
   # mkdir /etc/cmcluster/pkgname
2. Create a package configuration file.
   # cd /etc/cmcluster/pkgname
   # cmmakepkg -p pkgname.config
   Customize the package configuration file as appropriate for your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
3. In the .config file, list the node names in the order in which you want the package to fail over. For performance reasons, it is recommended that the package fail over locally first, then to the remote data center.
   Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a value large enough to take into consideration the extra startup time required to obtain status from the XP Series disk array. If you are using a fence level of ASYNC, the RUN_SCRIPT_TIMEOUT should be greater than the value of HORCTIMEOUT in the package environment file (see step 7h below).
NOTE: If you are using the EMS disk monitor as a package resource, you must not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is no access from the host to the package disks.
This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices, due to the time needed to get device status from the XP Series disk array. In clusters with multiple packages that use devices on the XP Series disk array, package startup time will increase when more than one package is starting at the same time.
4. Create a package control script.
   # cmmakepkg -s pkgname.cntl
   Customize the control script as appropriate for your application, using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
5. Add customer-defined run and halt commands in the appropriate places, according to the needs of the application. See Managing Serviceguard for more information on these functions.
6. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it pkgname_xpca.env.
   # cp /opt/cmcluster/toolkit/SGCA/xpca.env \
     /etc/cmcluster/pkgname/pkgname_xpca.env
   NOTE: If you do not use a package name as the filename for the package control script, you must follow the convention for the environment file name: the file name of the package control script without the file extension, an underscore, and the type of data replication technology used (xpca). The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen.
   Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_xpca.env.
   Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_xpca.env.
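The naming convention in the note above can be expressed as a small helper, for scripts that need to derive a package's environment file name from its control script name:

```shell
# Derive the Metrocluster environment file name from the control script
# name: strip the directory and extension, then append "_xpca.env".
env_file_for() {
    script="$1"
    base=$(basename "$script")
    echo "${base%.*}_xpca.env"
}
```

For example, `env_file_for pkg.cntl` prints pkg_xpca.env and `env_file_for control_script.sh` prints control_script_xpca.env, matching Examples 1 and 2 above.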
7. Edit the environment file pkgname_xpca.env as follows:
   a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script.
   b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables.
   c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names.
   d. Uncomment the DEVICE_GROUP variable and set it to this package's Raid Manager device group name, as specified in the Raid Manager configuration file.
   e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if the Raid Manager protection facility is not used or is disabled. If the Raid Manager protection facility is enabled, set it to the name of the HORCM permission file.
   f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access.
   g. Uncomment the FENCE variable and set it to ASYNC, NEVER, or DATA, according to your business requirements or special Metrocluster requirements. This variable is compared with the actual fence level returned by the array.
   h. If you are using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds.
   i. Uncomment the CLUSTER_TYPE variable and set it to METRO if you are using Metrocluster, or CONTINENTAL if you are using Continentalclusters.
8. After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script (be sure to include the -n option to perform syntax checking only):
   # sh -n pkgname.cntl
   If any messages are returned, correct the syntax errors.
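The sh -n check in step 8 can be applied to several control scripts at once; a sketch:

```shell
# Syntax-check a list of package control scripts with sh -n (parse only,
# do not execute). Returns nonzero if any script has a syntax error.
check_scripts() {
    rc=0
    for f in "$@"; do
        if sh -n "$f" 2>/dev/null; then
            echo "OK: $f"
        else
            echo "SYNTAX ERROR: $f"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_scripts /etc/cmcluster/*/*.cntl` reports each script as OK or in error.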
9. Check the configuration using the cmcheckconf -P pkgname.config command, then apply the Serviceguard configuration using the cmapplyconf -P pkgname.config command or SAM.
10. Distribute the Metrocluster/Continuous Access configuration, environment, and control script files to the other nodes in the cluster by using ftp or rcp:
    # rcp -p /etc/cmcluster/pkgname/* \
      other_node:/etc/cmcluster/pkgname
    See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.
11. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
    pkgname.cntl        Serviceguard package control script
    pkgname_xpca.env    Metrocluster/Continuous Access environment file
    pkgname.config      Serviceguard package ASCII configuration file
    pkgname.sh          Package monitor shell script, if applicable
    other files         Any other scripts you use to manage Serviceguard packages
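The verification in step 11 can be scripted. This sketch checks for the three required files on the local node (the optional pkgname.sh monitor script is not treated as an error); it could be wrapped with remsh to check remote nodes:

```shell
# Verify that the required Metrocluster package files are present in a
# package directory; prints each missing file and returns nonzero if any
# are absent.
verify_pkg_files() {
    dir="$1"; pkg="$2"; missing=0
    for f in "$pkg.cntl" "${pkg}_xpca.env" "$pkg.config"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}
```

For example, `verify_pkg_files /etc/cmcluster/pkgname pkgname` prints nothing when the directory is complete.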
The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster/Continuous Access.
Completing and Running a Metrocluster Solution with Continuous Access XP No additional steps are required after cluster and package configuration to complete the setup of the metropolitan cluster. In normal operation, the metropolitan cluster with Continuous Access XP starts like any other cluster, and runs and halts packages in the same way as a standard cluster. However, startup time for packages may be considerably slower because of the need to check disk status on both disk arrays.
Maintaining a Cluster that uses Metrocluster with Continuous Access XP
While the cluster is running, performing manual "changes of state" on devices of the XP Series disk array can cause the package to halt, because the package encounters unexpected conditions; it can also prevent the package from starting up after a failover. In general, it is recommended that no manual "changes of state" be performed while the package and the cluster are running.
NOTE: Manual changes can be made when they are required to bring the device group into a "protected" state. For example, if a package starts up with data replication suspended, a user can perform a pairresync command to re-establish data replication while the package is still running.

Viewing the Progress of Copy Operations
While a copy is in progress between XP systems (that is, the volumes are in a COPY state), the progress of the copy can be viewed by monitoring the % column in the output of the pairdisplay command:
# pairdisplay -g pkgB -fc -CLI
Group  PairVol     L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  %   P-LDEV#  M
pkgB   pkgD-disk0  L    CL1-C  0    3   35422  463    P-VOL  COPY    NEVER  79  460      -
pkgB   pkgD-disk0  R    CL1-F  0    3   35663  3      S-VOL  COPY    NEVER  0   -        -
This display shows that 79% of the current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state.

Viewing Side File Size
If you are using asynchronous data replication, you can see the current size of the side file when the volumes are in a PAIR state by using the pairdisplay command. The following output, obtained during normal cluster operation, shows the percentage of the side file that is full:
# pairdisplay -g pkgB -fc -CLI
Group  PairVol     L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  %   P-LDEV#  M
pkgB   pkgD-disk0  L    CL1-C  0    3   35422  463    P-VOL  PAIR    ASYNC  35  3        -
pkgB   pkgD-disk0  R    CL1-F  0    3   35663  3      S-VOL  PAIR    ASYNC  0   463      -
This output shows that 35% of the side file is full. When volumes are in a COPY state, the % column shows the progress of the copying between the XP frames until it reaches 100%, at which point the display reverts to showing the side file usage in the PAIR state.

Viewing the Continuous Access Journal Status
The following two sections describe using the pairdisplay and raidvchkscan commands for viewing the Continuous Access Journal status.

Viewing the Pair and Journal Group Information - Raid Manager using the "pairdisplay" Command
The command option "-fe" has been added to the Raid Manager pairdisplay command. This option is used to display the journal group ID (and other data) of a device group pair. The journal group ID shows '-' if the device pair is not in Continuous Access Journal mode; otherwise, it shows a number.
An example of the pairdisplay command with "-fe" is shown below. The pairdisplay -fe output is primarily used for the following: the Continuous Access Journal device group consistency group (CTG), the journal group ID (JID), and the Continuous Access link status (AP).
# pairdisplay -g oradb -fe
Group  Seq#   LDEV#  P/S    Status  Fence  %   P-LDEV#  M  CTG  JID  AP  EM  E-Seq#  E-LDEV#
oradb  30053  64     P-VOL  PAIR    NEVER  75  C8       -  -    1    2   -   -       -
oradb  30054  C8     S-VOL  PAIR    NEVER  -   64       -  -    1    0   -   -       -
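Because the -CLI listings are column-oriented, fields such as % can be pulled out with awk. A sketch using the COPY-state example shown earlier, where P/S is the 9th field and % the 12th:

```shell
# Extract the copy-progress percentage for the P-VOL from
# 'pairdisplay -fc -CLI' output. The sample mirrors the COPY-state
# example shown earlier in this section.
sample='Group PairVol L/R Port# TID LU Seq# LDEV# P/S Status Fence % P-LDEV# M
pkgB pkgD-disk0 L CL1-C 0 3 35422 463 P-VOL COPY NEVER 79 460 -
pkgB pkgD-disk0 R CL1-F 0 3 35663 3 S-VOL COPY NEVER 0 - -'

echo "$sample" | awk '$9 == "P-VOL" { print $12 }'
```

In a live cluster, the pipeline would be fed from `pairdisplay -g <group> -fc -CLI` instead of the sample variable.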
Viewing the Journal Volumes Information - Raid Manager using the "raidvchkscan" Command
The raidvchkscan command supports the -v jnl [unit#] option (together with the -s <Seq#> and -f[x] options) in order to find the journal volume lists and display information for the journal volumes. An example of the raidvchkscan command is as follows:
# raidvchkscan -v jnl 0
JID  MU  CTG  JNLS  AP  U(%)  Q-Marker   Q-CNT  D-SZ(BLK)  Seq#   Num  LDEV#
001  0   1    PJNN  4   21    43216fde   30     512345     62500  2    265
002  1   2    PJNF  4   95    3459fd43   52000  512345     62500  3    270
003  0   3    PJSN  4   0     1234f432   78     512345     62500  1    275
004  0   4    PJSF  4   45    345678ef   66     512345     62500  1    276
005  0   5    PJSE  0   0     -          -      512345     62500  1    277
006  -   -    SMPL  -   -     -          -      512345     62500  1    278
007  0   6    SMPL  4   5     -          -      512345     62500  1    278
Figure 3-5 illustrates the Q-Marker and Q-CNT. The following terms define the meaning of the fields in the output:
• JID: Displays the journal group ID.
• MU: Displays the mirror descriptions on the journal group.
• CTG: Displays the consistency group ID.
• JNLS: Displays the following status in the journal group:
  — SMPL: the journal volume is not in pair mode or is in a deleting state.
  — P(S)JNN: "P(S)vol Journal Normal"
  — P(S)JSN: "P(S)vol Journal Suspend Normal"
  — P(S)JNF: "P(S)vol Journal Normal Full"
  — P(S)JSF: "P(S)vol Journal Suspend Full"
  — P(S)JSE: "P(S)vol Journal Suspend Error", including link failure
• AP: Shows the number of active paths on the initiator port in Continuous Access links.
• Q-Marker: Displays the sequence number in the journal group. In the case of the P-JNL, Q-Marker shows the latest sequence number on the P-JNL volume. In the case of the S-JNL, Q-Marker shows the latest sequence number placed on the cache.
• Q-CNT: Displays the number of remaining Q-Markers in the journal group.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
Figure 3-5 Q-Marker and Q-CNT

[Figure: the P-VOL writes entries to the P-JNL, whose latest Q-Marker is #9; entries are transferred asynchronously to the S-JNL, whose latest Q-Marker is #2, and are applied to the S-VOL. Q-CNT is the count of Q-Markers remaining in each journal.]
• U(%): Displays the usage rate of the journal data.
• D-SZ: Displays the capacity for the journal data on the journal group.
• Seq#: Displays the serial number of the XP12000.
• Num: Displays the number of LDEVs (journal volumes) configured for the journal group.
• LDEV#: Displays the first LDEV number of the journal volumes.
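Given these field definitions, raidvchkscan output can be scanned for journal groups that need attention. This is a hedged sketch: the column positions follow the sample output above, and the "normal" states it accepts are only the P(S)JNN values defined in this section.

```shell
# Print JID and JNLS for journal groups whose state is neither normal
# (PJNN/SJNN) nor simplex (SMPL). Full, suspended, or errored journals
# (e.g., PJNF, PJSF, PJSE) are flagged for review.
flag_journals() {
  awk 'NR > 1 && $4 != "" && $4 !~ /^[PS]JNN$/ && $4 != "SMPL" { print $1, $4 }'
}
```

Piping the sample raidvchkscan output through flag_journals would report groups 002, 003, 004, and 005.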
Normal Maintenance

There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of Metrocluster/Continuous Access:
1. Stop the package with the appropriate Serviceguard command:
# cmhaltpkg pkgname
2. Split the links for the package:
# pairsplit -g -rw
3. Distribute the Metrocluster with Continuous Access XP configuration changes:
# cmapplyconf -P pkgname.config
4. Start the package with the appropriate Serviceguard command:
# cmmodpkg -e pkgname
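The four steps above can be collected into a small script. This is a hedged sketch only: the package and device group names are placeholders, and DRY_RUN=1 prints the Serviceguard and Raid Manager commands instead of executing them (on a real cluster node those binaries must be on PATH).

```shell
# Run a command, or just print it when DRY_RUN=1 (preview mode).
run() { [ "${DRY_RUN:-0}" = 1 ] && echo "$*" || "$@"; }

# Normal-maintenance sequence for a Metrocluster/Continuous Access package.
metro_maintenance() {
  pkg=$1; devgrp=$2
  run cmhaltpkg "$pkg"                                  # 1. stop the package
  run pairsplit -g "$devgrp" -rw                        # 2. split the CA links
  run cmapplyconf -P "/etc/cmcluster/$pkg/$pkg.config"  # 3. distribute changes
  run cmmodpkg -e "$pkg"                                # 4. restart the package
}
```

For example, `DRY_RUN=1 metro_maintenance pkgA oradb` prints the four commands without touching the cluster.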
Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. See "Example Failover Scenarios with Two Arbitrators" (page 31).
Resynchronizing

After certain failures, data is no longer remotely protected. In order to restore disaster tolerant data protection after repairing or recovering from the failure, you must manually run the pairresync command. This command must complete successfully for disaster-tolerant data protection to be restored.

Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:
• Failure of all Continuous Access links without restart of the application
• Failure of all Continuous Access links with Fence Level "DATA" with restart of the application on a primary host
• Failure of the entire secondary data center for a given application package
• Failure of the secondary XP Series disk array for a given application package while the application is running on a primary host

Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure:
• Failure of the entire primary data center for a given application package
• Failure of all of the primary hosts for a given application package
• Failure of the primary XP Series disk array for a given application package
• Failure of all Continuous Access links with restart of the application on a secondary host

Pairs must be manually recreated if both the primary and secondary XP Series disk arrays are in the SMPL (simplex) state.

Make sure you periodically review the files syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings, and recommended actions. It is recommended to review these files after system, data center, or application failures.
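The log review recommended above can be partly automated. A hedged sketch follows; the pattern list is illustrative, not an exhaustive set of Metrocluster message tags.

```shell
# Scan a package (or syslog) log file for lines worth reviewing after a
# failure; print matching lines with their line numbers.
scan_pkg_log() {
  grep -nE 'WARNING|ERROR|FATAL' "$1" || echo "no warnings or errors found in $1"
}
```

Usage: `scan_pkg_log /etc/cmcluster/pkgname/pkgname.log` after a system, data center, or application failure.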
Full resynchronization must be manually initiated after repairing the following failures:
• Failure of the secondary XP Series disk array for a given application package followed by application startup on a primary host
• Failure of all Continuous Access links with Fence Level NEVER or ASYNC with restart of the application on a primary host

Using the pairresync Command

The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the Continuous Access link is fixed, use the pairresync command in one of the following two ways, depending on which site you are on:
• pairresync -swapp: run from the primary site.
• pairresync -swaps: run from the failover site.
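A minimal, hedged sketch of choosing between these two options by the site you are running on; the device group name is a placeholder, and the command is only printed here rather than executed.

```shell
# Print the pairresync invocation appropriate for the current site.
resync_command() {
  site=$1; devgrp=$2
  case "$site" in
    primary)  echo "pairresync -g $devgrp -swapp" ;;
    failover) echo "pairresync -g $devgrp -swaps" ;;
    *)        echo "unknown site: $site" >&2; return 1 ;;
  esac
}
```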
These options take advantage of the fact that the recovery site maintains a bit-map of the modified data sectors on the recovery array. Either version of the command swaps the personalities of the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now the PVOL) is then copied back to the SVOL (now on the primary site). During this time the package continues running on the failover site.

NOTE: The preceding steps are automated provided the default value of 1 is being used for the auto variable AUTO_PSUEPSUS. Once the Continuous Access link failure has been fixed, the user only needs to halt the package on the recovery cluster and restart it on the primary cluster. However, if you want to reduce the amount of application downtime, you should manually invoke pairresync before failback.

Failback

After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.

Timing Considerations

In a journal group, many journal volumes can be configured to hold a significant amount of journal data (host-write data), so the package startup time may increase significantly when a Metrocluster Continuous Access package fails over. Delays in package startup occur in these situations:
1. When recovering from broken pair affinity. On failover, the SVOL pulls all the journal data from the PVOL site. The time needed to complete the data transfer to the SVOL depends on the amount of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links.
2. When host I/O is faster than Continuous Access data replication. The outstanding data not yet replicated to the SVOL accumulates in the journal volumes. Upon package failover to the SVOL site, the SVOL pulls all the journal data from the PVOL site. The completion of the data transfer to the SVOL depends on the bandwidth of the Continuous Access links and the amount of outstanding data in the PVOL journal volumes.
Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover

The following sections, "Swap Takeover Failure (Asynchronous/Journal mode)" and "Takeover Timeout (for Continuous Access Journal mode)", describe data maintenance upon failure of a Metrocluster Continuous Access XP failover.

Swap Takeover Failure (Asynchronous/Journal mode)

When a device group pair state is SVOL-PAIR at the local site and PVOL-PAIR at the remote site, Metrocluster Continuous Access performs a swap takeover. The swap takeover can fail if there is an internal (unseen) error (for example, a cache or shared memory failure) in the device group pair. In this case, if AUTO_NONCURDATA is set to 0, the package is not started and the SVOL state is changed to SVOL-PSUE(SSWS) by the takeover command. The PVOL site either remains in PVOL-PAIR or is changed to PVOL-PSUE. SVOL-PSUE(SSWS) means that the SVOL is read/write enabled and the data is usable, but not as current as the PVOL. In this case, either use the FORCEFLAG to start the package on the SVOL site, or fix the problem and resume data replication with the following procedure:
1. Split the device group pair completely (pairsplit -g -S).
2. Re-create a pair from the original PVOL as source (use the paircreate command).
3. Start the package on either the PVOL site or the SVOL site.

Takeover Timeout (for Continuous Access Journal mode)

A takeover timeout can occur when a package fails over to the secondary site (SVOL) and Metrocluster Continuous Access issues a takeover (either swap or SVOL takeover) command on the SVOL. If the journal group pair is flushing the journal data from the PVOL to the SVOL and a takeover timeout occurs, the package does not start and the following situations occur:
1. The device group pair state remains in PVOL-PAIR/SVOL-PAIR.
2. The journal data continues transferring to the SVOL.
In this case, it is required to wait for the completion of the journal data flushing and for the state at each site to become one of the following:
• Primary site: PVOL-PAIR or PVOL-PSUS(E)
• Secondary site: SVOL-PSUS(SSWS) or SVOL-PSUE(SSWS)

At this point, either (1) use the FORCEFLAG to start the package on the SVOL site, or (2) fix the problem (if any of the Continuous Access links failed) and resume data replication with the following procedure:
1. Split the device group pair completely (pairsplit -g -S).
2. Re-create a pair from the original PVOL as source (use the paircreate command).
3. Start the package on the PVOL site (or the SVOL site).
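The split-and-recreate recovery procedure above can be sketched as follows. This is a hedged sketch: the device group name is a placeholder, DRY_RUN=1 prints the Raid Manager commands instead of executing them, and paircreate's full option set (fence level, journal options) is deliberately omitted.

```shell
# Run a command, or just print it when DRY_RUN=1 (preview mode).
run() { [ "${DRY_RUN:-0}" = 1 ] && echo "$*" || "$@"; }

# Recreate a device group pair from the original PVOL side after a
# failed or timed-out takeover; the package can then be started.
recreate_pair() {
  devgrp=$1
  run pairsplit -g "$devgrp" -S     # 1. split the pair completely
  run paircreate -g "$devgrp" -vl   # 2. recreate with the local side as PVOL
}
```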
PVOL-PAIR with SVOL-PSUS(SSWS) State (for Continuous Access Journal Mode)

PVOL-PAIR with SVOL-PSUS(SSWS) is an intermediate state. The following is one scenario that leads to this state:
• At T1, the device pair is in PVOL-PAIR/SVOL-PAIR and the AP value is 0 at the SVOL site.
• At T2, a failover occurs; the package fails over from the PVOL site to the SVOL site. Metrocluster Continuous Access issues SVOL-Takeover and the state becomes SVOL-PSUS(SSWS) and PVOL-PAIR.
• At T3, all Continuous Access links have recovered. The state stays in SVOL-PSUS(SSWS) and PVOL-PAIR. The duration the PVOL remains in the PAIR state is relatively short.

PVOL-PAIR/SVOL-PSUS(SSWS) is an invalid state for XP Asynchronous (both Continuous Access Asynchronous and Continuous Access Journal). In this state, issuing a pairresync or takeover command will fail. It is necessary to wait for the PVOL to become PSUE.
XP Continuous Access Device Group Monitor

In the Metrocluster/Continuous Access environment, where the device group state is not actively monitored, it may not be apparent when the application data has not been remotely protected for an extended period of time. Under these circumstances, the XP/Continuous Access device group monitor provides the capability to monitor the status of the XP/Continuous Access device group used in a package. The XP/Continuous Access device group monitor, based on a pre-configured environment variable, also provides the ability to perform automatic resynchronization of the XP/Continuous Access device group upon link recovery.

NOTE: If the monitor is configured to automatically resynchronize the data from PVOL to SVOL upon link recovery, a Business Copy (BC) volume of the SVOL should be configured as another mirror. In the case of a rolling disaster where the data in the SVOL becomes corrupt due to an incomplete resynchronization, the data in the BC volume can be restored to the SVOL. This results in non-current, but usable, data in the BC volumes.

The monitor, as a package service, periodically checks the status of the XP/Continuous Access device group that is configured for the package, and sends notification to the user via email, syslog, and console if there is a change in the status of the package's device group.

XP/Continuous Access Device Group Monitor Operation Overview

The XP/Continuous Access device group monitor runs as a package service. The user can configure the monitor's settings through the package's environment file. Once the package has started the XP/Continuous Access device group monitor, the monitor periodically checks the status of the XP/Continuous Access device group. If there is a change in the status, or the monitor is configured to notify after an interval of no status change, the monitor sends a notification that states the reason for the notification, a timestamp, and the status of the XP/Continuous Access device group.

Configuring the Monitor

Use the following steps to configure a monitor for a package's device group:
• Configure the monitor's variables in the package environment file.
• Configure the monitor as a service of the package.
Configure the Monitor's Variables in the Package Environment File

Edit the following variables in the monitor's section of the environment file _xpca.env:

NOTE: See Appendix A for an explanation of these variables.

• Uncomment the MON_POLL_INTERVAL variable and set it to the desired value in minutes. If this variable is not set, it defaults to 10 minutes.
• Uncomment the MON_NOTIFICATION_FREQUENCY variable and set it to the desired value. This value controls the frequency of notification messages when the state of the device group remains the same after the first check of the device group's state. If the value is zero, the monitor only sends notification when the state of the device group has changed. If the variable is not set, the default is 0.
• If you want to receive notification messages over email, uncomment the MON_NOTIFICATION_EMAIL variable and set it to a fully qualified email address. Multiple email addresses can be configured using a comma as the separator between the addresses.
• If you want notification messages to be logged in the syslog file, uncomment the MON_NOTIFICATION_SYSLOG variable and set it to 1.
• If you want notification messages to be logged on the system's console, uncomment the MON_NOTIFICATION_CONSOLE variable and set it to 1.
• If you want automatic resynchronization upon link recovery, uncomment the AUTO_RESYNC variable and set it to 0, 1, or 2. If AUTO_RESYNC is set to 0 (the default), the monitor does not try to do the resynchronization from PVOL to SVOL; this setting only sends notifications. If AUTO_RESYNC is set to 1, the monitor splits the remote BC, if one is configured, from the mirror group before trying to do the resynchronization from PVOL to SVOL.
If AUTO_RESYNC is set to 2, the monitor only does the resynchronization from PVOL to SVOL when it finds the MON_RESYNC file in the package directory on the node where the package is running. The monitor does not manage the remote BC before or after the resynchronization. This setting is used if the user wants to manage the BC themselves. To enable the Continuous Access resynchronization for AUTO_RESYNC=2, it is necessary to create a file using the HP-UX command touch. For example:

# touch /etc/cmcluster/packageA/MON_RESYNC

(where /etc/cmcluster/packageA is the package directory)

After the monitor detects the MON_RESYNC file, the file is automatically removed.

The following is an example of the XP/Continuous Access device group monitor definition section in the environment file (_xpca.env) where the monitor will perform the following:
• poll every 15 minutes.
• send a notification on every third polling, if the state of the device group remains the same.
• send the notifications to [email protected] and [email protected].
• log notifications to the system log file, syslog.
• display notifications on the system console.
• perform automatic resynchronization with BC management when detecting the device group local state change to PVOL-PSUE or PVOL-PDUB.

MON_POLL_INTERVAL=15
MON_NOTIFICATION_FREQUENCY=3
MON_NOTIFICATION_EMAIL=[email protected],[email protected]
MON_NOTIFICATION_SYSLOG=1
MON_NOTIFICATION_CONSOLE=1
AUTO_RESYNC=1
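Settings like those above can be sanity-checked before the package is applied. A hedged helper follows; it assumes the monitor section is plain VAR=value lines as in the example, and the checks simply mirror the documented value ranges.

```shell
# Source a monitor environment fragment and verify the documented
# constraints: AUTO_RESYNC must be 0, 1, or 2; the poll interval (in
# minutes) must be at least 1. Defaults match the documentation.
check_mon_env() {
  . "$1"
  case "${AUTO_RESYNC:-0}" in
    0|1|2) ;;
    *) echo "AUTO_RESYNC must be 0, 1, or 2"; return 1 ;;
  esac
  [ "${MON_POLL_INTERVAL:-10}" -ge 1 ] || { echo "bad MON_POLL_INTERVAL"; return 1; }
  echo "monitor settings OK"
}
```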
Configure XP/Continuous Access Device Group Monitor as a Service of the Package

Add the monitor as a service in the package's configuration file and control script file as follows:
• In the package's configuration file, add the following lines:

SERVICE_NAME pkgXdevgrpmon.srv
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 5

NOTE: The SERVICE_HALT_TIMEOUT value of 5 is a recommended value. If the service halt timeout is set lower than 5 seconds, the monitor may not have enough time to clean itself up properly.

• In the package's control script file, add the following lines in the SERVICE NAMES AND COMMANDS section:

SERVICE_NAME[0]="pkgXdevgrpmon.srv"
SERVICE_CMD[0]="/usr/sbin/DRMonitorXPCADevGrp "
SERVICE_RESTART[0]="-r 10"
CAUTION: If the Continuous Access links are still down while the monitor is trying to do the resynchronization, and another failure occurs that causes a remote failover to the secondary site, the SVOL's BC volumes will remain split from their mirror group. This only occurs if the monitor is configured to perform automatic resynchronization using AUTO_RESYNC=1.

Configuring the XP/Continuous Access Device Group Monitor as a Service in the Site Controller Package

The Device Group Monitor must be configured as a service in the Site Controller package for Site Aware Disaster Tolerant Architecture configurations. The Metrocluster environment file is located in the Site Controller package directory, and the same file path must be passed to the Device Group Monitor service. The Site Controller package can be halted in detached mode for maintenance, and it can also fail in the cluster. In these conditions, the Device Group Monitoring service is not available for the workloads; the service resumes once the Site Controller package is restarted. For more information on the Site Controller package and the detached mode halt, see "Site Controller Package" (page 327) and "Maintaining Site Controller Package" (page 370).
Troubleshooting the XP/Continuous Access Device Group Monitor

The following is a guideline to help identify the cause of potential problems with the XP/Continuous Access device group monitor.

• Problems with email notifications: The XP/Continuous Access device group monitor uses SMTP to send email notifications. All email notification problems are logged in the package log file.
A warning message in the package log file indicating that the monitor is unable to determine the SMTP port is caused by the SMTP port not being defined in the /etc/services file. In that case the monitor assumes that the SMTP port is 25. If a different port number is defined, the monitor must be restarted in order for it to connect to the correct port.
An error message in the package control log file stating that the SMTP server cannot be found is caused by not having a mail server, such as sendmail, configured on the local node. A mail server must be configured and running on the local node for email notification. Once the mail server is running on the local node, the monitor will start sending email notifications.
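The port lookup described above can be reproduced by hand when diagnosing this warning. The sketch below is hedged: it mirrors the documented behavior (read smtp/tcp from /etc/services, fall back to 25), not the monitor's actual implementation.

```shell
# Look up the smtp/tcp port in a services file, defaulting to 25 when
# no entry exists (the monitor's documented assumption).
smtp_port() {
  awk '$1 == "smtp" && $2 ~ /\/tcp$/ { split($2, a, "/"); print a[1]; found = 1; exit }
       END { if (!found) print 25 }' "${1:-/etc/services}"
}
```

Running `smtp_port` with no argument inspects the node's own /etc/services, which is exactly the file the warning points at.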
• Problems with Unknown Continuous Access Device Status: The XP/Continuous Access device group monitor relies on the Raid Manager instance to get the Continuous Access device group state. When the local Raid Manager instance fails, the monitor cannot determine the status of the Continuous Access device group. The monitor sends a notification to all configured destinations, via email, stating that the state has changed to UNKNOWN. Since the monitor does not try to restart the Raid Manager instance, the user is required to restart the Raid Manager instance before the monitor can again determine the status of the Continuous Access device group. Make sure to start the Raid Manager instance with the same instance number that is defined in the package's environment file.
Completing and Running a Continental Cluster Solution with Continuous Access XP

The following section describes how to configure a continental cluster solution using Continuous Access XP, which requires the Metrocluster Continuous Access product.
Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.
NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not supported.
1. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user's guide.
2. Install Continentalclusters on all the cluster nodes in the primary cluster (skip this step if the software has been preinstalled).

NOTE: Serviceguard should already be installed on all the cluster nodes.
Run swinstall(1m) to install the Continentalclusters and Metrocluster Continuous Access (Continuous Access) products from an SD depot.
3. When swinstall(1m) has completed, create a directory as follows for the new package in the primary cluster:
# mkdir /etc/cmcluster/
Create a Serviceguard package configuration file in the primary cluster:
# cd /etc/cmcluster/
# cmmakepkg -p .ascii
Customize it as appropriate for your application. Be sure to include the pathname of the control script (/etc/cmcluster// .cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
Set the AUTO_RUN flag to NO to ensure the package does not start when the cluster starts. Only after the primary packages start, use cmmodpkg to enable package switching on all primary packages. Enabling package switching in the package configuration would automatically start the primary package when the cluster starts. However, if a primary cluster disaster has resulted in the recovery package starting and running on the recovery cluster, the primary package should not be started until after first stopping the recovery package.
4. Create a package control script:
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate for your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Set LV_UMOUNT_COUNT to 1 or greater.
5. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user's guide for more information on these functions.
6. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it pkgname_xpca.env:
# cp /opt/cmcluster/toolkit/SGCA/xpca.env \
/etc/cmcluster/pkgname/pkgname_xpca.env
7. Edit the environment file _xpca.env as follows:
a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script.
b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names.
d. Uncomment the DEVICE_GROUP variable and set it to this package's Raid Manager device group name, as specified in the Raid Manager configuration file.
e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if the Raid Manager protection facility is not used or is disabled. If the Raid Manager protection facility is enabled, set it to the name of the HORCM permission file.
f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access.
g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is compared with the actual fence level returned by the array.
h. If using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds.
i. Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL.
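Taken together, these edits produce an environment file along the following lines. This is an illustrative sketch only: the device group, instance, and fence values are placeholders chosen for the example, not recommendations from this guide, and the AUTO_ and HORCTIMEOUT lines are left at their defaults.

```shell
# Illustrative _xpca.env fragment for a Continentalclusters primary package.
PATH=$PATH:/usr/bin                  # a. Raid Manager binaries location
PKGDIR=/etc/cmcluster/package_name   # c. unique per-package status directory
DEVICE_GROUP=oradb                   # d. Raid Manager device group (placeholder)
HORCMPERM=MGRNOINST                  # e. protection facility not used
HORCMINST=0                          # f. Raid Manager instance (placeholder)
FENCE=ASYNC                          # g. fence level per business requirements
CLUSTER_TYPE=CONTINENTAL             # i. continental cluster configuration
```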
8. Distribute the Metrocluster/Continuous Access configuration, environment, and control script files to the other nodes in the cluster by using ftp or rcp:
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.
9. Apply the Serviceguard configuration using the cmapplyconf command or SAM.
10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
pkgname.cntl - Metrocluster/Continuous Access package control script
pkgname_xpca.env - Metrocluster/Continuous Access environment file
pkgname.ascii - Serviceguard package ASCII configuration file
pkgname.sh - Package monitor shell script, if applicable
other files - Any other scripts you use to manage Serviceguard packages
The Serviceguard cluster is now ready to automatically switch packages to nodes in remote data centers using Metrocluster/Continuous Access.
11. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance is to be started at boot time. The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means the Raid Manager instance must be started before starting Serviceguard.
12. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover.
13. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.
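The file check in step 10 can be scripted per node. This is a hedged sketch: it checks only the three files the table marks as required, treating pkgname.sh and other scripts as optional.

```shell
# Verify that a package directory holds the required Metrocluster/
# Continuous Access files: control script, environment file, and
# package ASCII configuration file.
verify_pkg_files() {
  dir=$1; pkg=$2; rc=0
  for f in "$pkg.cntl" "${pkg}_xpca.env" "$pkg.ascii"; do
    [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; rc=1; }
  done
  [ "$rc" -eq 0 ] && echo "all required files present"
  return $rc
}
```

Usage: `verify_pkg_files /etc/cmcluster/pkgname pkgname` on each cluster node.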
Setting up a Recovery Package on the Recovery Cluster

Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.
NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not supported.
1. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user's guide.
2. Install Continentalclusters on all the cluster nodes in the recovery cluster (skip this step if the software has been preinstalled).

NOTE: Serviceguard should already be installed on all the cluster nodes.
Run swinstall(1m) to install the Continentalclusters and Metrocluster Continuous Access products from an SD depot. The toolkit integration scripts, environment file, and contributed scripts reside in the /opt/cmcluster/toolkit/SGCA and /usr/sbin directories.
3. When swinstall(1m) has completed, create a directory as follows for the new package in the recovery cluster:
# mkdir /etc/cmcluster/
Create a Serviceguard package configuration file in the recovery cluster:
# cd /etc/cmcluster/
# cmmakepkg -p .ascii
Customize it as appropriate for your application. Make sure to include the pathname of the control script (/etc/cmcluster// .cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
Set the AUTO_RUN flag to NO to ensure the package does not start when the cluster starts. Do not use cmmodpkg to enable package switching on any recovery package; enabling package switching will automatically start the recovery package. Package switching on a recovery package is automatically set by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.
4. Create a package control script:
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate for your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
NOTE: Some of the control script variables, such as VG and LV, on the recovery cluster must be the same as on the primary cluster. Some of the control script variables, such as FS, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART, are probably the same as on the primary cluster. Some of the control script variables, such as IP and SUBNET, on the recovery cluster are probably different from those on the primary cluster. Make sure that you review all the variables accordingly.
6.
Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. Copy the environment file template /opt/cmcluster/toolkit/ SGCA/xpca.env to the package directory, naming it pkgname_xpca.env. # cp /opt/cmcluster/toolkit/SGCA/xpca.env \ /etc/cmcluster/pkgname/pkgname_xpca.env
7. Edit the environment file pkgname_xpca.env as follows:
a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script.
b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names.
d. Uncomment the DEVICE_GROUP variable and set it to this package's Raid Manager device group name, as specified in the Raid Manager configuration file.
e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if the Raid Manager protection facility is not used or is disabled. If the Raid Manager protection facility is enabled, set it to the name of the HORCM permission file.
f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access.
g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is compared with the actual fence level returned by the array.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
h. If you are using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds.
i. Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL.
8. Distribute the Continentalclusters/Continuous Access configuration, environment, and control script files to the other nodes in the cluster by using ftp or rcp.
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.
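The environment-file edits in step 7 lend themselves to scripting. The sketch below is illustrative only: it creates a simplified stand-in for the xpca.env template (the real template has many more lines) and uncomments and sets a few of the variables with sed. The values shown (oradg, NEVER) are example assumptions.

```shell
#!/bin/sh
# Illustrative only: build a tiny stand-in for the xpca.env template,
# then uncomment and set a few Metrocluster variables with sed.
TMP=${TMPDIR:-/tmp}/pkgname_xpca.env

# Stand-in template (the real file has many more lines and comments).
cat > "$TMP" <<'EOF'
# PKGDIR=""
# DEVICE_GROUP=""
# FENCE=""
# CLUSTER_TYPE=""
EOF

# Uncomment each variable and fill in example values (assumptions).
sed -e 's|^# PKGDIR=.*|PKGDIR=/etc/cmcluster/pkgname|' \
    -e 's|^# DEVICE_GROUP=.*|DEVICE_GROUP=oradg|' \
    -e 's|^# FENCE=.*|FENCE=NEVER|' \
    -e 's|^# CLUSTER_TYPE=.*|CLUSTER_TYPE="CONTINENTAL"|' \
    "$TMP" > "$TMP.new" && mv "$TMP.new" "$TMP"

cat "$TMP"
```

Note that PKGDIR is set without surrounding quotes, as step 7c requires.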
9. Apply the Serviceguard configuration using the cmapplyconf command or SAM.
10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
pkgname.cntl        Metrocluster/Continuous Access package control script
pkgname_xpca.env    Metrocluster/Continuous Access environment file
pkgname.ascii       Serviceguard package ASCII configuration file
pkgname.sh          Package monitor shell script, if applicable
other files         Any other scripts you use to manage Serviceguard packages
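The step-8 distribution of the files listed above can be scripted. The sketch below is a dry run only: it prints the rcp commands it would issue rather than executing them, and the node and package names are example assumptions.

```shell
#!/bin/sh
# Dry-run sketch: build and print the rcp commands that would copy the
# package directory to every other node in the recovery cluster.
# Node names and the package name are example assumptions.
PKG=pkgname
NODES="nodeB nodeC"

CMDS=""
for node in $NODES; do
    cmd="rcp -p /etc/cmcluster/$PKG/* $node:/etc/cmcluster/$PKG"
    CMDS="$CMDS$cmd
"
    echo "$cmd"
done
```

Removing the echo (and running the commands directly) turns the dry run into the actual copy; the Samples/ftpit script mentioned above is the ftp-based alternative.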
11. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance be started at boot time.
NOTE: The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means that the Raid Manager instance must be started before Serviceguard is started.
12. Make sure the packages on the primary cluster are not running. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the recovery cluster for cluster and package startup and package failover.
13. Any running package on the recovery cluster that has a counterpart on the primary cluster should be halted at this time.
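Because the Raid Manager instance must be up before a package starts, a pre-check or monitor might test for the instance's daemon in the process list. This is only a sketch: the assumption that a running instance N appears as a horcmd_N process should be verified on your system.

```shell
#!/bin/sh
# Return 0 if the horcmd daemon for the given Raid Manager instance
# appears in the process list, nonzero otherwise. The process name
# horcmd_<instance> is an assumption to verify on your system.
horcm_running() {
    inst=$1
    ps -e 2>/dev/null | grep "horcmd_${inst}" | grep -v grep > /dev/null
}

if horcm_running 0; then
    echo "Raid Manager instance 0 is running"
else
    echo "Raid Manager instance 0 is NOT running"
fi
```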
Setting up the Continental Cluster Configuration
The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: “Designing a Continental Cluster”.
1. Generate the Continentalclusters configuration.
# cmqueryconcl -C cmconcl.config
2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups, and the monitoring definitions. The recovery groups define the primary and recovery packages. When data replication is done using Continuous Access XP, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog, or tcp), and the notification type (alert or alarm) based on the cluster status (unknown, down, up, or error). Descriptions for these can be found in the configuration file generated in the previous step.
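For illustration, a recovery group definition pairs a primary package with its recovery package using cluster/package notation. The sketch below is an assumption based on memory of the template; confirm the exact keyword set against the file generated by cmqueryconcl, and treat all names as examples:

```
RECOVERY_GROUP_NAME    salesdb
PRIMARY_PACKAGE        primarycluster/salesdb_pkg
RECOVERY_PACKAGE       recoverycluster/salesdb_bk
```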
3. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.
4. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages; the desired result is to have the monitor package start automatically when the cluster is formed.
5. Apply the monitor package to both cluster configurations.
# cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config
6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/cmclconfig, nor is there an equivalent file for Continentalclusters.
# cmapplyconcl -C cmconcl.config
7. Start the monitor package on both clusters.
NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster's status.
8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.
9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.
10. View the status of the continental cluster primary and recovery clusters, including configured event data.
# cmviewconcl -v
The continental cluster is ready for testing. (See “Testing the Continental Cluster” (page 91).)
Switching to the Recovery Cluster in Case of Disaster
It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:
• During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.
• During an alarm, cmrecovercl will start the recovery packages without the -f option.
• When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This applies not only when no alert or alarm was issued, but also when there was an alert or alarm but the primary cluster recovered and its current status is Up.
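These rules can be captured in a small helper that an operator runbook script might use. The sketch below is a dry run: it only prints which command form applies for a given condition, and does not invoke cmrecovercl.

```shell
#!/bin/sh
# Dry-run sketch: map a Continentalclusters condition to the recovery
# command form permitted by the rules above. Prints, never executes.
recover_cmd() {
    case $1 in
        alarm) echo "cmrecovercl" ;;        # alarm: no force flag needed
        alert) echo "cmrecovercl -f" ;;     # alert: -f (force) required
        *)     echo "none (recovery not permitted)" ;;
    esac
}

recover_cmd alarm
recover_cmd alert
recover_cmd quiet
```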
Failback Scenarios
The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking connecting the two sites. The following discussion addresses some of those failures and suggests recovery approaches applicable to environments using data replication provided by HP StorageWorks XP series disk arrays and Continuous Access. In Chapter 2: “Designing a Continental Cluster”, there is a discussion of failback mechanisms and methodologies in “Restoring Disaster Tolerance” (page 99).
Scenario 1
The primary site has lost power, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the XP disk array or the operating systems of the systems at the primary site.
Scenario 2
The primary site XP disk array experienced a catastrophic hardware failure and all data was lost on the array.
Failback in Scenarios 1 and 2
After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. Each Continentalclusters package control script that invokes Metrocluster Continuous Access XP will evaluate the status of the XP paired volumes. Since neither the systems nor the XP disk array at the primary site are accessible, the control script will initially report the paired volumes with a local status of SVOL_PAIR or SVOL_PSUE (in ASYNC mode) and a remote status of EX_ENORMT, PSUE, or PSUS, indicating that there is an error accessing the primary site. The control script is programmed to handle this condition and will enable the volume groups, mount the logical volumes, assign floating IP addresses, and start any processes as coded into the script.
NOTE: In ASYNC mode, the package will halt unless a force flag is present or unless the auto variable AUTO_SVOLPSUE is set to 1.
The fence level of the paired volume (NEVER, ASYNC, or DATA) will not impact the starting of the packages at the recovery site. The Metrocluster CAXP pre-integrated solution will perform the following command on the paired volume.
# horctakeover -g device_group -S
Subsequently, the paired volume will have a status of SVOL_SSWS.
To view the local status of the paired volumes:
# pairvolchk -g device_group -s
To view the remote status of the paired volumes:
# pairvolchk -g device_group -c
(While the remote XP disk array and primary cluster systems are down, the command will time out with an error code of 242.)
After power is restored to the primary site, or when a newly configured array is brought online, the XP paired volumes may have either a status of PVOL_PSUE on the primary site or SVOL_SSWS on the secondary site. The following procedure applies to this situation:
1. While the package is still running, from the recovery host:
# pairresync -g device_group -c 15 -swaps
This starts the resynchronization, which can take a long time if the entire primary disk array was lost, or a short time if the primary array was intact at the time of failover.
2. When resynchronization is complete, halt the Continentalclusters recovery packages at the recovery site.
# cmhaltpkg pkgname
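The remote-status check can be wrapped so that the documented timeout code 242 is reported distinctly. In the sketch below a stub stands in for the real pairvolchk call so the interpretation logic is visible; replace the stub with the actual command in your environment.

```shell
#!/bin/sh
# Interpret the exit status of a remote pairvolchk. Exit code 242 is
# the documented timeout when the remote array and hosts are down;
# other codes are passed through. The stub stands in for the real
# command (pairvolchk -g <device group> -c).
pairvolchk_stub() {
    return 242
}

check_remote() {
    pairvolchk_stub
    rc=$?
    if [ "$rc" -eq 242 ]; then
        echo "remote site unreachable (pairvolchk timeout, rc=242)"
    else
        echo "pairvolchk returned rc=$rc"
    fi
}

check_remote
```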
This will halt any applications, remove any floating IP addresses, unmount file systems, and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SVOL_PAIR at the recovery site and PVOL_PAIR at the primary site.
3. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.
4. Manually start the Continentalclusters primary packages at the primary site.
# cmrunpkg pkgname
5. Ensure that the monitor packages at the primary and recovery sites are running.
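The command sequence in steps 1 through 4 can be summarized as a dry run. The device group (oradg) and package names below are example assumptions; steps 3 and 5 are manual checks rather than commands.

```shell
#!/bin/sh
# Dry-run sketch of the scenario 1/2 failback: print, in order, the
# commands an operator would run. Names are example assumptions.
DG=oradg
PKG=salesdb_pkg

SEQ="pairresync -g $DG -c 15 -swaps
cmhaltpkg ${PKG}_bk
cmrunpkg $PKG"
echo "$SEQ"
```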
Failback when the Primary has SMPL Status
Use the following procedure when the primary site paired volumes have a status set to SMPL, possibly through manual intervention:
1. Halt the Continentalclusters recovery packages at the recovery site.
# cmhaltpkg pkgname
This will halt any applications, remove any floating IP addresses, unmount file systems, and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SMPL at the recovery site and PSUE at the primary site.
2. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.
3. Since the paired volumes have a status of SMPL at both the primary and recovery sites, the XP views the two halves as unmirrored. From a system at the primary site, manually create the paired volume.
# paircreate -g device_group -f fence_level -vr -c 15
See the XP Raid Manager user's guide for more paircreate command options. Since the most current data will be at the remote or recovery site, this will synchronize the data from the remote or recovery site (the -vr option directs the command to synchronize from the remote site). Wait for the synchronization process to complete before proceeding to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step.
4. Manually start the Continentalclusters primary packages at the primary site.
# cmrunpkg pkgname
The control script is programmed to handle this case. The control script recognizes that the paired volume is synchronized and will proceed with the programmed package startup.
5. Ensure that monitor packages are running at both sites.
Maintaining the Continuous Access XP Data Replication Environment
Resynchronizing
After certain failures, data are no longer remotely protected. In order to restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the command pairresync. This command must successfully complete for disaster-tolerant data protection to be restored. Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:
• failure of ALL Continuous Access links without restart of the application
• failure of ALL Continuous Access links with Fence Level DATA with restart of the application on a primary host
• failure of the entire recovery Data Center for a given application package
• failure of the recovery XP disk array for a given application package while the application is running on a primary host
Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated by moving the application package back to its primary host after repairing the failure.
• failure of the entire primary Data Center for a given application package
• failure of all of the primary hosts for a given application package
• failure of the primary XP disk array for a given application package
• failure of all Continuous Access links with application restart on a secondary host
NOTE: The preceding steps are automated provided the default value of 1 is being used for the auto variable AUTO_PSUEPSUS. Once the Continuous Access link failure has been fixed, the user only needs to halt the package at the failover site and restart it on the primary site. However, if you want to reduce the amount of application downtime, you should manually invoke pairresync before failback.
Full resynchronization must be manually initiated (as described in the next section) after repairing the following failures:
• failure of the recovery XP disk array for a given application package followed by application startup on a primary host
• failure of all Continuous Access links with Fence Level NEVER or ASYNC with restart of the application on a primary host
Pairs must be manually recreated if both the primary and recovery XP disk arrays are in the SMPL (simplex) state.
Make sure you periodically review the following files for messages, warnings, and recommended actions. It is recommended to review these files after system, data center, and/or application failures:
• /var/adm/syslog/syslog.log
• /etc/cmcluster//.log
• /etc/cmcluster/.log
Using the pairresync Command
The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the Continuous Access link is fixed, depending on which site you are on, use the pairresync command in one of the following two ways:
• pairresync -swapp, from the primary site
• pairresync -swaps, from the failover site
These options take advantage of the fact that the recovery site maintains a bitmap of the modified data sectors on the recovery array. Either version of the command will swap the personalities of the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now the PVOL) is then copied back to the SVOL, now at the primary site. During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.
Some Further Points
• This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices, due to the time needed to get device status from the XP disk array or to synchronize.
NOTE: Long delays in package startup time will occur when recovering from broken pair affinity.
• The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time due to getting status from the XP disk array. (See the previous point for more information on the extra startup time.)
• Online cluster configuration changes may require a Raid Manager configuration file to be changed. Whenever the configuration file is changed, the Raid Manager instance must be stopped and restarted. The Raid Manager instance must be running before any Continentalclusters package movement occurs.
• A given file system must not reside on more than one XP frame for either the PVOL or the SVOL. A given LVM Logical Volume (LV) must not reside on more than one XP frame for either the PVOL or the SVOL.
• The application is responsible for data integrity, and must use the O_SYNC flag when ordering of I/Os is important. Most relational database products are examples of applications that ensure data integrity by using the O_SYNC flag.
• Each host must be connected to only the XP disk array that contains either the PVOL or the SVOL. A given host must not be connected to both the PVOL and the SVOL of a Continuous Access pair.
4 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
The HP StorageWorks Enterprise Virtual Array (EVA) allows you to configure data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access EVA software and the additional files that integrate the EVA with Serviceguard clusters. It then shows how to configure metropolitan cluster solutions using Continuous Access EVA. The topics discussed in this chapter are:
• Files for Integrating the EVA with Serviceguard Clusters
• Overview of EVA and Continuous Access EVA Concepts
• Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA
• Building a Metrocluster Solution with Continuous Access EVA
• Completing and Running a Continental Cluster Solution with Continuous Access EVA
Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature, the Site Controller package, to provide disaster tolerance for workload databases, and is currently implemented for Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323).
Files for Integrating the EVA with Serviceguard Clusters
Metrocluster consists of a script, program files, and an environment file that work in a Serviceguard metropolitan cluster to automate failover to alternate nodes in the case of a disaster. The Metrocluster Continuous Access EVA product contains the following files.
Table 4-1 Metrocluster Continuous Access EVA Template Files
/usr/sbin/DRCheckDiskStatus
    The script that checks for a specific environment file in the package directory and executes the specific storage DR check program. This file should not be edited.
/usr/sbin/DRCheckCAEVADevGrp
    The program that manages the Continuous Access EVA DR group that is used by the package.
/usr/sbin/smispasswd
    The utility that is used to define the information about the Management Server and SMI-S that are used in the solution.
/usr/sbin/evadiscovery
    The utility that is used to define the information about the EVA storage and DR groups that are used in the solution.
/opt/cmcluster/toolkit/SGCAEVA/smiseva.conf
    The Metrocluster Continuous Access EVA Management Server and SMI-S configuration template. This file must be edited for the specific Management Server and SMI-S information before use.
/opt/cmcluster/toolkit/SGCAEVA/mceva.conf
    The Continuous Access EVA configuration template. This file must be edited for the specific EVA storage cells and DR Group information to be used in a Metrocluster environment before use.
/opt/cmcluster/toolkit/SGCAEVA/caeva.env
    The Metrocluster Continuous Access EVA environment file. This file must be customized for specific EVA DR groups and Serviceguard packages. Copies of this file must be customized for each separate Serviceguard package.
/opt/cmcluster/toolkit/SGCAEVA/Samples
    A directory containing sample convenience shell scripts that must be edited before use. These shell scripts may help to automate some configuration tasks. These scripts are contributed, and not supported.
Metrocluster Continuous Access EVA software has to be installed on all nodes that will run a Serviceguard package whose data is on an HP StorageWorks EVA and whose data is replicated to a second EVA using the Continuous Access EVA facility. In the event of a node failure, the integration of Metrocluster Continuous Access EVA with the package will allow the application to fail over in the following ways:
• Among local host systems attached to the same EVA.
• Between one system that is attached locally to its EVA and another “remote” host that is attached locally to the other EVA.
Configuration of Metrocluster Continuous Access EVA must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster Continuous Access EVA, Command View EVA and SMI-S EVA must also be installed and configured on the Management Server.
Overview of EVA and Continuous Access EVA Concepts
Continuous Access EVA provides remote data replication from primary EVA systems to remote EVA systems. Continuous Access EVA uses the remote-copy function of the Hierarchical Storage Virtualization (HSV) controller running the controller software (VCS or XCS) to achieve host-independent data replication. This section describes some basic Continuous Access EVA terminology, concepts, and features. The topics discussed are:
• Data Replication
• Copy Sets
• DR Groups
• Log Disk
• Managed Sets
• Failover
Metrocluster with EVA and Data Replication
The HSV controller pairs at the primary location are connected to their partner HSV controller pairs at the alternate location. To configure storage for data replication, a source Vdisk is specified in the primary storage system. The destination Vdisk is then created by the controller software at the remote storage system. As data is written to the source Vdisk, it is mirrored to the destination Vdisk. Applications continue to run while data replication goes on in the background over a separate interconnect. When a storage system contains both source Vdisks and destination Vdisks, it is said to be bidirectional. A given storage system can have a bidirectional data replication relationship with only one other storage system, and an individual Vdisk can have a unidirectional replicating relationship with only one other Vdisk. The remote copy feature is intended not only for disaster recovery, but also to replicate data from one storage system or physical site to another storage system or site. It also provides a method for performing a backup at either the source or destination site.
DR Groups
A data replication (DR) group is a software construct comprising one or more Vdisks in an HSV storage system so that they:
• Replicate to the same specified destination storage array
• Fail over together
• Preserve write order within the data replication collection groups
• Share a log disk
All virtual disks used for replication must belong to a DR group, and a DR group must contain at least one Vdisk. A DR group can be thought of as a collection of copy sets. The replicating direction of a DR group is always from a source to a destination. By default, the storage system on which the source Vdisk is created is called the home storage system. The home designation denotes the preferred storage system for the source, and this designation can be changed to another storage system.
A DR group contains pointers to another DR group for replication. A DR group replicating from a home storage system to a destination system is in the original state. When replication occurs from a storage system that was created as the destination to the home storage system (for example, after a failover, which is discussed later), it is in a reversed state.
DR Group Properties
Properties are defined for every DR group that is created. The DR group properties are described below:
• Name: A unique name given to each DR group. HP recommends that the names of replicating DR groups at the source and destination be the same.
• DR Mode:
— Source: A DR group established as an active source that replicates to a passive destination.
— Destination: A DR group established as a passive destination that receives replication data from an active source.
• Failsafe mode: When this mode is enabled, all source Vdisks become both unreadable and unwritable if the destination Vdisk is unreachable. This condition is known as failsafe-locked and may require immediate intervention. When failsafe mode is disabled and the destination Vdisk is unreachable, normal logging occurs.
• Connected system: A pointer to the storage system where the DR group is replicated.
• Write mode:
— Asynchronous mode: A write operation provides an I/O completion acknowledgement to the host after data is delivered to cache at the source controller, but before data delivery to cache on the destination controller.
— Synchronous mode: An I/O completion acknowledgement is sent to the host after data is written to both the source and destination caches.
• Suspension:
— Suspend: When this command is enabled and failsafe mode is not enabled, I/O replication is halted between the source and destination Vdisks. Source Vdisks continue to run I/O locally, and the I/O is also copied to a log Vdisk.
— Resume: When this command is enabled, replication resumes between the source and destination Vdisks. Merging of the log Vdisk or a full copy is also performed.
Log Disk
The DR group has storage allocated on demand called a log. The virtual log collects host write commands and data if access to the destination storage system is severed. When a connection is later re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging. Sometimes it is more practical to copy the source Vdisk directly to the destination Vdisk. This operation, which copies all 1-MB blocks of the source Vdisk, is called a “full copy”. There is no manual method for forcing a full copy; it is an automatic process that occurs when a log is full. If synchronous replication is configured, a log can be in one of the following states:
• Normal: No source Vdisk is logging or merging.
• Logging: At least one source Vdisk is logging (capturing host write commands), but none are merging.
• Merging: At least one source Vdisk is merging and logging.
When a DR group is in a logging state, the log will grow in proportion to the amount of write I/O being sent to the source Vdisks. As the log grows, more space must be allocated out of the available capacity of the disk group where it is a member. The capacity available to the log does not include the spare capacity or any capacity being used by Vdisks, snapshots, or Snapclones. This means the log disk will never overwrite any other data. Similarly, when a DR group is logging, the available capacity for creating Vdisks, snapshots, and Snapclones does not include the capacity already used by the log disk. Therefore, a log disk will never be overwritten by any other data. When creating disk groups and distributing Vdisks within them, sufficient capacity must remain for log disks to expand to their maximum level. The log is declared full, and reaches its maximum level, whenever the first of the following conditions is reached:
• The size of the log data file exceeds twice the capacity of the DR group.
• No free space remains in the physical disk group.
• The log reaches 2 TB of Vraid1 (4 TB total).
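The three "log full" conditions can be expressed as a simple predicate. The sketch below is illustrative only (sizes in MB, with the 2 TB Vraid1 ceiling written as 2*1024*1024 MB); it is not how the controller firmware evaluates the conditions.

```shell
#!/bin/sh
# Illustrative predicate for the three "log full" conditions above.
# Arguments: log size (MB), DR group capacity (MB), free space (MB).
log_full() {
    log_mb=$1; drgroup_mb=$2; free_mb=$3
    [ "$log_mb" -gt $((2 * drgroup_mb)) ] && return 0   # > 2x DR group capacity
    [ "$free_mb" -le 0 ] && return 0                    # no free space in disk group
    [ "$log_mb" -ge $((2 * 1024 * 1024)) ] && return 0  # 2 TB Vraid1 ceiling
    return 1
}

if log_full 500 1000 100; then echo "full"; else echo "not full"; fi
```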
Copy Sets
Vdisks are user-defined storage allotments of virtual or logical data storage. A pairing relationship can be created to automatically replicate a logical disk to another logical disk. The generic term for this is a copy set. A relationship refers to the arrangement created when two storage systems are partnered for the purpose of replicating data between them. A Vdisk does not have to be part of a copy set. Vdisks at any site can be set up for local storage and used for activities such as testing and backup. Clones and snapclones are examples of Vdisks used in this manner. When a Vdisk is not part of a copy set, it is not disaster tolerant, but it can use various Vraid types for failure tolerance.
Managed Sets
A managed set is a collection of DR groups selected for the purpose of managing them. For example, a managed set can be created to manage all DR groups of a particular application that reside in separate storage arrays.
Failover
The recovery process whereby one DR group, managed set, fabric, or controller switches over to its backup is called a failover. The process can be planned or unplanned. A planned failover allows an orderly shutdown of the system before the redundant system takes over. An unplanned failover occurs when a failure or outage occurs that may not
Overview of EVA and Continuous Access EVA Concepts
189
allow an orderly transition of roles.Listed below are several types of Continuous Access EVA failovers: •
• • •
DR group failover: An operation to reverse the replication direction of a DR group. A DR group can have a relationship with only one other DR group, and a storage system can have a relationship with only one other storage system. Managed set failover: An operation to reverse the replication direction of all DR groups in the managed set. Fabric or path failover: The act of transferring I/O operations from one fabric or path to another. Controller failover: When a controller assumes the workload of its partner (within the same storage system).
Continuous Access EVA Management Software

Metrocluster Continuous Access EVA requires the following two software components to be installed on the Management Server:
• HP StorageWorks Command View EVA (CV EVA). This component allows you to configure and manage the storage and DR groups through a web browser interface.
• Storage Management Interface Specification (SMI-S) EVA. This component provides the SMI-S interface for the management of EVA arrays. Metrocluster Continuous Access EVA software uses the WBEM API to communicate with SMI-S to automatically manage the DR groups used by the application packages.
Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA

When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over.
Setting up the Storage Hardware

1. Before configuring Metrocluster Continuous Access EVA, the EVA must be correctly cabled, with redundant paths to each node in the cluster that will run packages accessing data on the array.
2. Install and configure the hardware components of the EVA, including HSV controllers, disk arrays, SAN switches, and the Management Server.
3. Install and configure CV EVA and SMI-S EVA on the Management Server. For the installation and configuration process, refer to the HP StorageWorks Command View EVA Installation Guide.
4. Start the CV EVA user interface (CV EVA-UI). You can configure virtual disks and DR groups using the CV EVA web user interface shown in Figure 4-1.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
Figure 4-1 Configuration of Virtual Disks and DR groups
For more detailed information on setting up Command View EVA for configuring, managing, and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA User Guide.

After a DR group is created, only the source volume (primary volume) is visible and accessible in Read/Write mode. By default, the destination volume (secondary volume) is neither visible nor accessible to its local hosts. The destination volume access mode must be changed to Read-only before the DR group can be used, and the destination volumes must be presented to their local hosts.

NOTE: In the Metrocluster Continuous Access EVA environment, the destination volume access mode must be set to read-only.

The destination Vdisk can be set to read-only mode by using the SSSU command for HP-UX. The SSSU command must be executed against the storage cell that holds the source Vdisk of the DR group. For users who are not familiar with the SSSU command, a sample input file is provided below and in the following location: /opt/cmcluster/toolkit/SGCAEVA/Samples/sssu_sample_input.

select manager 15.13.244.182 user=administrator pass=administrator
select system DC-1
set DR_GROUP “\Data Replication\DRG_DB1” accessmode=readonly
show DR_GROUP “\Data Replication\DRG_DB1”

NOTE: For more detailed information on the sssu commands used in the sample input file, refer to the sssu ReadMe file found at /opt/cmcluster/toolkit/SGCAEVA/Samples/Readme.sssu_sample_input.

Follow the steps below when copying and editing the sample file:
1. Copy the sample file /opt/cmcluster/toolkit/SGCAEVA/Samples/sssu_sample_input to the /etc/dtsconf/ directory:
# cp /opt/cmcluster/toolkit/SGCAEVA/Samples/sssu_sample_input /etc/dtsconf/sssu_input
2. Customize the file sssu_input.
3. After you customize the sssu_input file, run the SSSU command as follows to set the destination Vdisk to read-only mode:
# /sbin/sssu “FILE <input file name>”
4. After changing the access mode of the destination Vdisk, run the ioscan command and the insf command on the remote clustered nodes to create the special device file name for the destination Vdisk on the remote EVA.
Cluster Configuration

For detailed information on Serviceguard cluster configuration, refer to the Managing Serviceguard user’s guide. The following information pertains to cluster configuration in an EVA Continuous Access environment.

First create a Serviceguard cluster without specifying cluster-aware volume groups in the cluster configuration ASCII file. This is necessary because the LUNs in the EVA storage units are not read/write on all cluster nodes at configuration time. Only the LUNs configured as source volumes are read/write, on one cluster site; the remote site sees those LUNs in read-only mode, so the cmapplyconf command cannot succeed if volume groups are specified in the file. Volume groups are created and made cluster aware in separate steps, shown in “Configuring Volume Groups” (page 200) of this chapter.

NOTE: If your ASCII file contains volume group definitions derived from the LUNs visible on the source node, comment them out before running the cmapplyconf command.
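The volume group definitions mentioned in the note can be commented out mechanically before running cmapplyconf. A minimal sketch; the file name and VOLUME_GROUP entries below are illustrative, not taken from a real cluster:

```shell
# Sample cluster ASCII file with illustrative entries.
cat > /tmp/cluster.ascii <<'EOF'
CLUSTER_NAME            metroclusterA
VOLUME_GROUP            /dev/vg01
VOLUME_GROUP            /dev/vg02
EOF

# Comment out every VOLUME_GROUP line so cmapplyconf ignores them.
sed 's/^VOLUME_GROUP/# VOLUME_GROUP/' /tmp/cluster.ascii
```

The edited output can then be saved back over the ASCII file before cmapplyconf is run.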
Management Server/SMI-S and DR Groups Configuration

The Metrocluster Continuous Access EVA product provides two utilities that supply the Metrocluster Continuous Access EVA software with information about the SMI-S EVA service running on the Management Servers and about the DR groups that will be used in the Metrocluster Continuous Access EVA environment. This section discusses the smispasswd and evadiscovery tools, including a description of each tool, its operations, and the input file templates. The first utility, smispasswd, is a Command Line Interface (CLI) that provides functions for defining the Management Server list and the SMI-S username and password pair. The second utility, evadiscovery, is also a CLI that provides functions for defining EVA storage cells and DR group information.
When the Metrocluster Continuous Access EVA program requests a storage state, it sends a request message to a local Management Server. To prepare the message, several data items must be available so that the Metrocluster Continuous Access EVA program knows which Management Server to communicate with. These data items include the Management Server's hostname/IP address and the SMI-S username/password. This information must be configured first, before configuring and bringing up any Metrocluster package. The Metrocluster software communicates with the SMI-S service running on the Management Server, which in turn communicates with the EVA controller.

When querying EVA storage states through SMI-S, the code first needs to find the internal device IDs by querying and searching a list of device information. This process takes time and is unnecessary on every query, since the IDs are static in the EVA system. To improve query performance, the software caches these IDs on the clustered nodes. To cache the object IDs on the clustered nodes, run the evadiscovery tool after the EVA and Continuous Access EVA are configured and the storage is accessible from the hosts. The tool queries the active Management Server for the needed information and saves it in a mapping file. The mapping file must then be distributed to all the clustered nodes.
Defining Management Server and SMI-S Information

To define Management Server and SMI-S information, use the smispasswd tool.

Creating the Management Server List

On a host that resides in the same data center as the active Management Server, create the Management Server list using an input file, as follows:
1. Create a configuration input file. (A template of this file can be found in /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf.)
2. Copy the template file /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf to the /etc/dtsconf/ directory:
# cp /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf /etc/dtsconf/smiseva.conf
3. For each Management Server in your configuration (both local and remote sites), enter the Management Server’s hostname or IP address, the administrator login name, the type of connection (secure or non-secure), and the SMI-S namespace.
An example of the smiseva.conf file is as follows:

##############################################################
## smiseva.conf CONFIGURATION FILE (template) for use with
## the smispasswd utility in the Metrocluster Continuous
## Access EVA Environment.
## Note: This file MUST be edited before it can be used.
## For complete details about Management Server/SMI-S
## configuration for use with Metrocluster Continuous
## Access EVA, consult “Designing Disaster Tolerant High
## Availability Clusters”.
##############################################################
## This file provides input to the smispasswd utility,
## which you use to set up secure access paths between
## cluster nodes and SMI-S services.
## Edit this file to include the appropriate information
## about the SMI-S services that will be used in your
## Metrocluster Continuous Access EVA environment.
## After entering all the desired information, run the
## smispasswd command to generate the security
## configuration that allows cluster nodes to communicate
## with the SMI-S services.
## Below is an example configuration. The data is
## commented out.
##   Hostname/IP_Address  User_login_name  Secure_Connection  Namespace
##   15.13.244.182        administrator    y                  root/EVA
##   15.13.244.183        administrator    y                  root/EVA
##   15.13.244.192        admin12309       y                  root/EVA
##   SANMA04              admin            y                  root/EVA
##############################################################
## The example shows a list of 4 Management Server/SMI-S
## entries in the Metrocluster Continuous Access EVA
## environment. Each line represents a different SMI-S’s
## data; fields on each line should be separated either by
## space(s) or tab(s). The order of fields is significant.
## The first field must be a hostname or IP address; the
## second field must be a user login name on the host; the
## third field must be ‘y’ or ‘n’ to use an SSL connection;
## the last field must be the namespace of the SMI-S
## service. For details of each field, refer to the
## smispasswd man page, ‘man smispasswd’.
##############################################################
## Note: Lines beginning with the pound sign (#) are
## comments. You cannot use the ‘#’ character in your
## data entries. Enter your SMI-S services data under the
## dashed lines:
## Hostname/IP_Address  User_login_name  Secure_Connection  Namespace
## -----------------------------------------------------------
15.13.172.11 administrator n root/EVA
15.13.172.12 administrator n root/EVA
Fill in the Management Server information for each Management Server in your cluster configuration.
NOTE: Ensure that you place the active Management Server information first on the list.
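The four-field entry format described in the template can be sanity-checked before running smispasswd. A minimal sketch with illustrative entries; this check is not part of the product:

```shell
# Write a small smiseva.conf-style file with two illustrative entries.
cat > /tmp/smiseva.conf <<'EOF'
# Hostname/IP_Address  User_login_name  Secure_Connection  Namespace
15.13.172.11 administrator n root/EVA
15.13.172.12 administrator n root/EVA
EOF

# Each non-comment line must have exactly 4 fields, the third being y or n.
awk '!/^#/ && NF > 0 {
    if (NF != 4 || $3 !~ /^[yn]$/) { print "line " NR ": bad entry"; bad = 1 }
    else print $1 ": ok"
}
END { exit bad }' /tmp/smiseva.conf
```

A non-zero exit status flags a malformed entry before smispasswd ever reads the file.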
Creating the Management Server Mapping File

Use the smispasswd command to create or modify the Management Server information stored in the mapping file. For each Management Server listed in the file, a password prompt is displayed. A username and password are required by the security protocol for EVA and are created by your system administrator when the Management Server is configured. Enter the password associated with the SMI-S username, then re-enter it (as prompted) to verify that it is correct.

NOTE: For more information on configuring the username and password for SMI-S on the Management Server, refer to the HP StorageWorks Command View EVA Installation Guide.

Example:
# smispasswd -f /etc/dtsconf/smiseva.conf
Enter password of 15.13.172.11: **********
Re-enter password of 15.13.172.11: **********
Enter password of 15.13.172.12: **********
Re-enter password of 15.13.172.12: **********
All the Management Server information has been successfully generated.
When all the passwords have been entered, the configuration is written to the map file /etc/dtsconf/caeva.map.

Setting a Default Management Server

Use the smispasswd command to set the active Management Server that is to be used by the EVA discovery tool, which is discussed later in this section. Example:
# smispasswd -d 15.13.172.12
The Management Server 15.13.172.12 has been set as the default active SMI-S.

Displaying the List of Management Servers

Use the smispasswd command to display the current list of storage Management Servers that are accessible to the cluster software. Example:
# smispasswd -l
MC/CAEVA Server list:
HOST          USERNAME       USE_SSL  NAMESPACE
-------------------------------------------------------
15.13.172.11  administrator  N        root/EVA
15.13.172.12  administrator  N        root/EVA
Adding or Updating Management Server Information

To add or update individual Management Server information, use the following command options, shown in Table 4-2:

# smispasswd -h <hostname> -n <namespace> -u <username> -s <y|n>

Table 4-2 Individual Management Server Information Command Options

Option  Description
-h      This is either a DNS-resolvable hostname or the IP address of the Management Server.
-n      This is the namespace configured for the SMI-S CIMOM¹. The default namespace is root/EVA.
-u      This is the user name used to connect to SMI-S. The user name and password are the same as those used with the sssu tool.
-s      This option specifies the type of connection to be established between the Metrocluster software and the SMI-S CIMOM.
        “y”  Allows a secure connection to the Management Server using the HTTPS protocol (HTTP using Secure Socket Layer encryption).
        “n”  A secure connection is not required.

1 CIMOM - Common Information Model Object Manager, a key component that routes information between providers and clients.
When you issue the command with these options, the “Enter password:” prompt asks you to input the password associated with the username, and the “Re-enter password:” prompt asks you to enter the same password again for verification. The command then adds a new record if it does not find the specified host in the mapping file; otherwise it updates the existing record.
Examples:
% smispasswd -h 15.13.244.202 -u administrator -n root/EVA -s y
Enter password: **********
Re-enter password: **********
A new information has been successfully created
%
% smispasswd -h 15.13.244.203 -u administrator -n root/EVA -s n
Enter password: **********
Re-enter password: **********
A new information has been successfully created
%
% smispasswd -h 15.13.244.202 -s n
Enter password: **********
Re-enter password: **********
The information has been successfully updated
%

Deleting a Management Server

To delete a Management Server from the group used by the cluster, use the smispasswd command with the -r option. Example:
# smispasswd -r 15.13.172.12
The Management Server 15.13.172.12 has been successfully removed from the file
Defining EVA Storage Cells and DR Groups

On the same node on which the Management Server list was created, define the EVA storage cells and DR group information to be used in the Metrocluster Continuous Access EVA environment, using the evadiscovery tool and the following steps:
1. Create a configuration input file. This file will contain the names of storage pairs and DR groups. (A template of this file can be found in /opt/cmcluster/toolkit/SGCAEVA/mceva.conf.)
2. Copy the template file /opt/cmcluster/toolkit/SGCAEVA/mceva.conf to the /etc/dtsconf directory:
# cp /opt/cmcluster/toolkit/SGCAEVA/mceva.conf /etc/dtsconf/mceva.conf
3. For each pair of storage units, enter the World Wide Name (WWN) of the first and second storage units. The WWN can be found on the front panel of the EVA controller or from the Command View EVA user interface.
4. For each pair of storage units, enter the names of all DR groups that are managed by that storage pair.
5. Save the file.
The following is an example of the mceva.conf file. Fill in the file as in the following example:

##############################################################
## mceva.conf CONFIGURATION FILE (template) for use with
## the evadiscovery utility in the Metrocluster Continuous
## Access EVA Environment.
## Version: A.01.00
## Note: This file MUST be edited before it can be used.
## For complete details about EVA configuration for use
## with Metrocluster Continuous Access EVA, consult the
## manual “Designing Disaster Tolerant High Availability
## Clusters”.
##############################################################
## This file provides input to the evadiscovery utility,
## which you use to generate the /etc/dtsconf/caeva.map
## file. During Metrocluster Continuous Access EVA
## configuration, this file is copied to all cluster nodes.
## Edit the file to include the appropriate data about the
## EVA storage systems and DR groups that will be used in
## your Metrocluster Continuous Access EVA environment.
## After entering all the desired information, run the
## evadiscovery command to generate the mapping data and
## save it in a map file.
## Note: Before running evadiscovery, you need to use the
## smispasswd command to create a SMI-S services
## configuration.
## Enter the data for storage device pairs and DR groups
## after the storage-pair and DR-group-list tags. The
## storage-pair tag represents the starting definition of
## a storage pair and its DR groups. Under a storage-pair
## tag, you must provide two storage Node World Wide Names
## (WWNs), both of which contain the DR groups defined
## under the DR-group-list tag. You can define as many DR
## groups as you need, but each DR group must belong to
## only one of the storage pairs. A storage pair can have
## a maximum of 64 DR groups.
## Note that you can find storage Node World Wide Names on
## the front panel of your EVA controllers or from the
## ‘Initialized Storage Properties’ page of Command View
## EVA through your web browser.
## Below is an example of a configuration with two storage
## pairs (4 storage units). The first storage pair contains
## 2 DR groups and the second pair contains 1 DR group.
##
##   “5000-1FE1-5000-4280”   Enter first storage WWN in double quotes.
##   “5000-1FE1-5000-4180”   Enter second storage WWN in double quotes.
##
##   “DR Group - Package1”   Enter a DR group name in double quotes.
##   “DR Group - OracleDB1”  Enter a DR group name in double quotes.
##
##   “5000-1FE1-5000-4081”   Enter first storage WWN in double quotes.
##   “5000-1FE1-5000-4084”   Enter second storage WWN in double quotes.
##
##   “DR Group - Package2”   Enter a DR group name in double quotes.
##
## Note: Since ‘#’ marks the start of a comment, you cannot
## include the ‘#’ character in any storage name or DR
## group name.
## Note: All the storage and DR Group names should be
## enclosed in double quotes (“”), otherwise the
## evadiscovery command will not detect them.
## Enter your MC EVA storage pairs and DR Groups under the
## dashed lines:
## -----------------------------------------------------------
“5000-1FE1-5000-00DF”
“5000-1FE1-5000-00DE”
“DR Group 1”
“DR Group 2”
“DR Group 3”
“DR Group 4”
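The quoting rules stated in the template can be checked mechanically before running evadiscovery. A minimal sketch over illustrative names; this check is not part of the evadiscovery tool:

```shell
# Example storage WWNs and DR group names, one per line.
cat > /tmp/mceva_names <<'EOF'
"5000-1FE1-5000-00DF"
"5000-1FE1-5000-00DE"
"DR Group 1"
"DR Group 2"
EOF

# Every name must be enclosed in double quotes and must not contain '#'.
awk '{
    if ($0 !~ /^".*"$/ || index($0, "#")) { print "line " NR ": invalid"; bad = 1 }
    else ok++
}
END { print ok " names pass the quoting rules"; exit bad }' /tmp/mceva_names
```

Running such a check first avoids evadiscovery silently skipping unquoted entries.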
Creating the Storage Map File

After completing the EVA storage cells and DR groups configuration file, use the evadiscovery utility to create or modify the storage map file stored on the configuration node:

# evadiscovery -f /etc/dtsconf/mceva.conf
% Verifying the storage systems and DR Groups ………
Generating the mapping data …………
Adding the mapping data to the file /etc/dtsconf/caeva.map ………
The mapping data is successfully generated.

The command generates the mapping data and stores it in /etc/dtsconf/caeva.map. The mapping file /etc/dtsconf/caeva.map contains information about the Management Servers as well as about the EVA storage cells and DR groups.

Copying the Storage Map File

After running the smispasswd and evadiscovery commands to generate the /etc/dtsconf/caeva.map file, copy this file to all cluster nodes so that it can be used by Metrocluster Continuous Access EVA to communicate with the EVA units. Be sure to use the same full pathname.

Displaying Information about Storage Devices

Use the evadiscovery command to display information about the storage systems and DR groups in your configuration. Example:
# evadiscovery -l
% MC EVA Storage Systems and DR Groups map list:
Storage WWN: 5000-1FE1-5000-4280
  DR Group Name: DR Group - PkgA
  DR Group Name: DR Group - PkgB
Storage WWN: 5000-1FE5-5000-4288
  DR Group Name: DR Group - PkgA
  DR Group Name: DR Group - PkgB
NOTE: Before running the evadiscovery command, the Management Server configuration must be completed using smispasswd. Otherwise, the evadiscovery command will fail.

NOTE: Run the discovery tool after all storage DR groups are configured, or whenever there is a change to the storage device. For example, if the user removes and recreates a DR group that is used by an application package, the DR group's internal IDs are regenerated by the EVA system. If the name of any storage system or DR group is changed, update the external configuration file, run the evadiscovery utility, and redistribute the map file /etc/dtsconf/caeva.map to all Metrocluster clustered nodes.
Verifying the EVA Configuration

Use the following checklist to verify the configuration.

Figure 4-2 EVA Configuration Checklist
• Redundant Management Servers configured and accessible to all nodes.
• Source and destination volumes created for use with all packages.
• Management Servers security configuration is complete (smispasswd command).
• EVA mapping is complete (evadiscovery command).
• /etc/dtsconf/caeva.map file is copied to all cluster nodes.
Configuring Volume Groups

This section describes the required steps to create a volume group for use in a Metrocluster Continuous Access EVA environment.

Identifying the Special Device File Name for a Vdisk in a DR Group using Secure Path V3.0D or V3.0E

For each Vdisk in a DR group, use CV EVA to retrieve its own unique World Wide Name (WWN) identifier. To identify the special device file name for the matching WWN identifier on a single clustered node, use:

# spmgr display
Below is a sample output after running the spmgr command:

TGT/LUN  Device   WWLUN_ID                                 H/W_Path   #_Paths
0/ 3     c12t0d3  6000-1FE1-0016-6C30-0009-2030-2549-000A  255/0.0.3  4
  Controller  Path_Instance  HBA  Preferred?  Path_Status
  ZG20302549                      no
              c4t0d4         td1  no          Active
              c10t0d4        td3  no          Available
  Controller  Path_Instance  HBA  Preferred?  Path_Status
  ZG20400420                      no
              c6t0d4         td1  no          Standby
              c8t0d4         td3  no          Standby

TGT/LUN  Device   WWLUN_ID                                 H/W_Path   #_Paths
0/ 4     c12t0d4  6000-1FE1-0016-6C30-0009-2030-2549-000E  255/0.0.4  4
  Controller  Path_Instance  HBA  Preferred?  Path_Status
  ZG20302549                      no
              c4t0d3         td1  no          Active
              c10t0d3        td3  no          Available
  Controller  Path_Instance  HBA  Preferred?  Path_Status
  ZG20400420                      no
              c6t6d3         td1  no          Standby
              c8t6d3         td3  no          Standby
From the output, look for the special device file name that corresponds to the WWN identifier of the Vdisk in the DR group. Use this special device file when creating the volume group, as described in the section “Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F”. The Command View EVA display for the WWN identifier of the Vdisk is shown in Figure 4-3.

Figure 4-3 EVA Command View for the WWN Identifier
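Matching the Vdisk WWN to its device file can also be scripted against saved spmgr output. A minimal sketch over canned sample lines (the summary lines are condensed to single-space fields; the WWN and device names echo the sample above, and the parsing is illustrative, not a Secure Path feature):

```shell
# Canned TGT/LUN summary lines: TGT/LUN device WWLUN_ID H/W_Path #_Paths
cat > /tmp/spmgr_luns.txt <<'EOF'
0/3 c12t0d3 6000-1FE1-0016-6C30-0009-2030-2549-000A 255/0.0.3 4
0/4 c12t0d4 6000-1FE1-0016-6C30-0009-2030-2549-000E 255/0.0.4 4
EOF

# Look up the device file for a given Vdisk WWN.
wwn=6000-1FE1-0016-6C30-0009-2030-2549-000A
awk -v w="$wwn" '$3 == w { print "/dev/dsk/" $2 }' /tmp/spmgr_luns.txt
```

The printed device file is the one to feed to the volume group creation steps later in this section.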
For more detailed information on setting up Command View EVA for configuring, managing, and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA Getting Started Guide.
Identifying Special Device Files using Secure Path v3.0F

As described in the previous section, for each Vdisk in a DR group, use CV EVA to retrieve its own unique World Wide Name (WWN) identifier. When Secure Path v3.0F is used for path failover capabilities, all the paths to the Vdisk are visible. To identify the special device file names for the matching WWN identifier, use:

# autopath display

Below is a sample output after running the autopath command:
=================================================================
HPswsp Version : A.3.0F.00F.00F
=================================================================
Array WWN : 5000-1FE1-5000-2EE0
=================================================================
Lun WWN : 6005-08B4-0010-0E01-0001-B000-0287-0000
Load Balancing Policy : No Load Balancing
=================================================================
Device Path                 Status
=================================================================
/dev/dsk/c3t0d1             Active
/dev/dsk/c9t0d1             Active
/dev/dsk/c15t0d1            Active
/dev/dsk/c21t0d1            Active
/dev/dsk/c4t0d1             Active
/dev/dsk/c10t0d1            Active
/dev/dsk/c16t0d1            Active
/dev/dsk/c22t0d1            Active
=================================================================
Lun WWN : 6005-08B4-0010-0E01-0001-B000-028E-0000
Load Balancing Policy : No Load Balancing
=================================================================
Device Path                 Status
=================================================================
/dev/dsk/c3t0d2             Active
/dev/dsk/c9t0d2             Active
/dev/dsk/c15t0d2            Active
/dev/dsk/c21t0d2            Active
/dev/dsk/c4t0d2             Active
/dev/dsk/c10t0d2            Active
/dev/dsk/c16t0d2            Active
/dev/dsk/c22t0d2            Active
From the output, identify the device files that correspond to the WWN of the Vdisk in the DR group. In the sample listing above, there are eight device files that correspond to different paths to the same Vdisk. Use any one of the device file names when creating a volume group, as described in the section “Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F”. The CV EVA display can be used to identify the WWN for a Vdisk.
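With Secure Path v3.0F, the device paths for a LUN can also be pulled out of saved autopath output. A minimal sketch over a canned fragment of the sample above; the parsing is illustrative, not an autopath feature:

```shell
# Canned fragment of autopath-style output for one LUN.
cat > /tmp/autopath.out <<'EOF'
Lun WWN : 6005-08B4-0010-0E01-0001-B000-0287-0000
/dev/dsk/c3t0d1 Active
/dev/dsk/c9t0d1 Active
/dev/dsk/c15t0d1 Active
/dev/dsk/c21t0d1 Active
/dev/dsk/c4t0d1 Active
/dev/dsk/c10t0d1 Active
/dev/dsk/c16t0d1 Active
/dev/dsk/c22t0d1 Active
EOF

# Count the active device paths; any one of them can be used for the volume group.
grep -c '^/dev/dsk/.* Active$' /tmp/autopath.out
```

The count confirms that all eight paths to the Vdisk are visible, as the text above expects.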
Identifying Special Device Files for a PVLinks Configuration

The LVM PVLinks feature can be used to handle path failovers to a storage device. The following describes how to identify device files for a Vdisk when setting up a volume group using PVLinks. Use the RSM HV Mapper tool or the HP StorageWorks EVAInfo tool to display the special device files that correspond to the WWN of the Vdisk in the DR group.

For the RSM HV Mapper tool:

# RSM_HV_Mapper.pl
Collecting Host Volume info might take time. Please wait.
Collecting Host Volume info from mc-node1.cup.hp.com.
Collecting Host Volume info from mc-node2.cup.hp.com.
Collecting Host Volume info from mc-node3.cup.hp.com.
Collecting Host Volume info from mc-node4.cup.hp.com.
Collecting Host Volume data done. See HostVolTable.txt for results.

The HostVolTable.txt output file provides a mapping of the device files to Vdisks for all the hosts that are RSM enabled. In addition, the tool displays the WWID of the Vdisk and the storage system to which the Vdisk belongs. In the following sample listing there are eight device files that correspond to different paths to the same Vdisk. Use all the device files identified when creating a volume group, as described in the section “Configuring Volume Groups using PVLinks”.

=======================
mc-node1.cup.hp.com
=======================
Virtual Disk Name..: \\XL-1\Vdisk001-DRGSynDCN
Disk...............: /dev/dsk/c16t0d1
Disk...............: /dev/dsk/c17t0d1
Disk...............: /dev/dsk/c18t0d1
Disk...............: /dev/dsk/c20t0d1
Disk...............: /dev/dsk/c12t0d1
Disk...............: /dev/dsk/c13t0d1
Disk...............: /dev/dsk/c14t0d1
Disk...............: /dev/dsk/c15t0d1
World Wide Lun ID..: 6005-08b4-0010-203d-0000-6000-0017-0000

Virtual Disk Name..: \\XL-1\Vdisk002-DRGSynDCS
Disk...............: /dev/dsk/c16t0d5
Disk...............: /dev/dsk/c17t0d5
Disk...............: /dev/dsk/c18t0d5
Disk...............: /dev/dsk/c20t0d5
Disk...............: /dev/dsk/c12t0d5
Disk...............: /dev/dsk/c13t0d5
Disk...............: /dev/dsk/c14t0d5
Disk...............: /dev/dsk/c15t0d5
World Wide Lun ID..: 6005-08b4-0010-299b-0000-a000-002f-0000

For more information on configuring and installing RSM and the RSM_HV_Mapper tool, contact your HP representative.

For the EVAInfo tool:

# evainfo -w wwn

This command displays device file information for the Vdisk with the specified Vdisk WWN. Use HP StorageWorks Command View EVA to determine the WWN of the Vdisks. The tool also displays the connected port, the controller on the EVA, and whether the device path is active and optimized for I/O for that LUN. The following sample displays eight device files that correspond to different paths to the same Vdisk. Use all the device files identified when creating a volume group, as described in the section “Configuring Volume Groups using PVLinks”.

Devicefile         Array                LUN WWN                                  Capacity  Controller/Port/Mode
/dev/rdsk/c12t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-A/FP-1/NonOptimized
/dev/rdsk/c13t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-A/FP-3/NonOptimized
/dev/rdsk/c14t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-A/FP-2/NonOptimized
/dev/rdsk/c15t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-A/FP-4/NonOptimized
/dev/rdsk/c16t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-B/FP-1/Optimized
/dev/rdsk/c17t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-B/FP-3/Optimized
/dev/rdsk/c18t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-B/FP-2/Optimized
/dev/rdsk/c19t1d6  5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000  25600MB   Ctl-B/FP-4/Optimized
Following is a sample output of the command on HP-UX 11i v3:

# evainfo -P -w wwn
Devicefile         Array                LUN WWN                                  Capacity  Controller/Port/Mode
/dev/rdisk/disk10  5000-1FE1-5007-DBA0  6005-08B4-0010-786B-0000-A000-02DB-0000  2048MB    Ctl-A/FP-3/NonOptimized
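The optimized paths can also be identified from saved evainfo output. A minimal sketch over canned sample lines (field layout follows the sample above, condensed to single-space fields; the parsing is illustrative, not an EVAInfo feature):

```shell
# Canned evainfo-style lines: devicefile array_wwn lun_wwn capacity mode
cat > /tmp/evainfo.out <<'EOF'
/dev/rdsk/c12t1d6 5000-1FE1-5007-DBD0 6005-08B4-0010-78F1-0000-E000-0034-0000 25600MB Ctl-A/FP-1/NonOptimized
/dev/rdsk/c16t1d6 5000-1FE1-5007-DBD0 6005-08B4-0010-78F1-0000-E000-0034-0000 25600MB Ctl-B/FP-1/Optimized
/dev/rdsk/c17t1d6 5000-1FE1-5007-DBD0 6005-08B4-0010-78F1-0000-E000-0034-0000 25600MB Ctl-B/FP-3/Optimized
EOF

# List device files whose path mode ends in /Optimized (excludes NonOptimized).
awk '$5 ~ /\/Optimized$/ { print $1 }' /tmp/evainfo.out
```

For PVLinks all paths are listed in the volume group, but sorting optimized paths first is a common way to order the PV link entries.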
For more information on using the EVAInfo tool, see the HP StorageWorks EVAInfo Release Notes.
Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F
Use the following procedure to create volume groups for source volumes and export them for access by other nodes.
NOTE: Create volume groups only for source storage on a locally connected EVA unit. To create volume groups for source volumes on an EVA unit located at the remote site, log onto a node located at that site before configuring the volume groups.
The sample script mk1VGs in the /opt/cmcluster/toolkit/SGCAEVA/Samples directory can be modified to automate these steps.
1. Define the appropriate Volume Groups on each node that might run the application package. Use the following commands:
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
where the name /dev/vgname and the minor number nn are unique within the cluster.
2. Create the Volume Group on the source volume. Use the following command:
# pvcreate -f /dev/rdsk/cxtydz
3. Create the logical volume(s) for the volume group.
4. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
5. Start the cluster and clusterize the Volume Groups.
# cmruncl (if the cluster is not already up and running)
# vgchange -c y /dev/vgname
6. Test activating the Volume Groups with the exclusive option.
# vgchange -a e /dev/vgname
7. Create a back-up configuration file. Because the disks/LUNs already carry a cluster ID, the backup will contain it.
# vgcfgbackup /dev/vgname
Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA
8. Use the vgexport command with the -p option to export the Volume Groups on the primary system without removing the HP-UX device files.
# vgexport -s -p -m mapfile /dev/vgname
Make sure that you copy the map files to all of the nodes. The sample script Samples/ftpit shows a semi-automated way (using ftp) to copy the files; only the password needs to be entered interactively.
9. De-activate the volume group.
# vgchange -a n /dev/vgname
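Steps 1 through 9 can be gathered into one helper in the spirit of the sample mk1VGs script. Every argument below (volume group name, minor number, device, logical volume size) is a placeholder, not a value from this manual; the LVM and Serviceguard commands exist only on HP-UX, so the sketch defaults to DRY_RUN=1 and simply prints what it would run:

```shell
#!/bin/sh
# Sketch of the Secure Path source-volume procedure above. DRY_RUN=1 (default)
# prints the HP-UX commands instead of executing them.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

make_source_vg() {
    vg=$1; minor=$2; dev=$3           # e.g. vg01 0x010000 c12t0d5 (examples)
    run mkdir /dev/$vg
    run mknod /dev/$vg/group c 64 $minor
    run pvcreate -f /dev/rdsk/$dev
    run vgcreate /dev/$vg /dev/dsk/$dev
    run lvcreate -L 1024 /dev/$vg     # create logical volume(s); size is an example
    run vgchange -a n /dev/$vg        # de-activate before clusterizing
    run vgchange -c y /dev/$vg        # clusterize (cluster must be running)
    run vgchange -a e /dev/$vg        # test exclusive activation
    run vgcfgbackup /dev/$vg
    run vgexport -s -p -m $vg.map /dev/$vg
    run vgchange -a n /dev/$vg
}
```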
Configuring Volume Groups using PVLinks
Use the following steps to create volume groups for source volumes using PVLinks and export them for access by other nodes.
NOTE: Create volume groups only for source storage on a locally connected EVA unit. To create volume groups for source volumes on an EVA unit located at the remote site, log onto a node located at that site before configuring the volume groups.
1. Define the appropriate Volume Groups on each node that might run the application package with the following commands:
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
where the name /dev/vgname and the minor number nn are unique within the cluster.
2. Create the Volume Group on the source volume, using PVLinks for path failover. Use all the special device file names associated with the vdisk, as identified in the section “Identifying Special Device Files for PVLinks Configuration”. The following commands are an example of how a VG using PVLinks is created for the vdisk identified by WWN 6005-08b4-0010-203d-0000-6000-0017-0000:
# pvcreate -f /dev/dsk/c16t0d1
# vgcreate /dev/vgname /dev/dsk/c16t0d1
# vgextend /dev/vgname /dev/dsk/c17t0d1
# vgextend /dev/vgname /dev/dsk/c18t0d1
# vgextend /dev/vgname /dev/dsk/c20t0d1
# vgextend /dev/vgname /dev/dsk/c12t0d1
# vgextend /dev/vgname /dev/dsk/c13t0d1
# vgextend /dev/vgname /dev/dsk/c14t0d1
# vgextend /dev/vgname /dev/dsk/c15t0d1
3. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
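The step-2 command sequence generalizes to any number of alternate paths: create the VG on the first device file, then vgextend each remaining one as a PVLink. A sketch with placeholder names; DRY_RUN=1 (the default) prints the HP-UX commands instead of executing them:

```shell
#!/bin/sh
# Sketch of step 2 above: the first path creates the VG, the rest become PVLinks.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

make_pvlinks_vg() {
    vg=$1; shift                      # remaining args: all cXtYdZ paths to the vdisk
    run pvcreate -f /dev/dsk/$1
    run vgcreate /dev/$vg /dev/dsk/$1
    shift
    for dev in "$@"; do
        run vgextend /dev/$vg /dev/dsk/$dev
    done
}
# Example (device names from the sample above):
# make_pvlinks_vg vgname c16t0d1 c17t0d1 c18t0d1 c20t0d1 c12t0d1 c13t0d1 c14t0d1 c15t0d1
```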
4. Start the cluster and configure the Volume Groups.
# cmruncl (if the cluster is not already up and running)
# vgchange -c y /dev/vgname
5. Test the Volume Groups activation with the exclusive option.
# vgchange -a e /dev/vgname
6. Create a back-up configuration file. Because the disks/LUNs already carry a cluster ID, the backup will contain it.
# vgcfgbackup /dev/vgname
7. Use the vgexport command with the -p option to export the Volume Groups on the primary system without removing the HP-UX device files.
# vgexport -s -p -m mapfile /dev/vgname
Make sure to copy the map files to all of the nodes. The sample script Samples/ftpit shows a semi-automated way (using ftp) to copy the files; only the password needs to be entered interactively.
8. De-activate the volume group.
# vgchange -a n /dev/vgname
Importing Volume Groups on Nodes at the Same Site
Use the following procedure to import volume groups on cluster nodes located at the same site as the EVA on which you are doing the Logical Volume Manager configuration. The sample script mk2imports can be modified to automate these steps.
NOTE: Before running vgimport, create the volume group directory under /dev and create the group file, as shown in step 1.
1. Define the Volume Groups on all nodes at the same site that will run the Serviceguard package.
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
2. Import the Volume Groups on all nodes at the same site that will run the Serviceguard packages.
# vgimport -vs -m mapfile /dev/vgname
3. Activate the Volume Groups and back up the configuration.
# vgchange -a e /dev/vgname
# vgcfgbackup /dev/vgname
See the sample script Samples/mk2imports.
4. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
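Steps 1 through 4 are what the sample mk2imports script automates; a per-volume-group helper looks roughly like this (all names are placeholders, and DRY_RUN=1, the default, prints the HP-UX commands instead of executing them):

```shell
#!/bin/sh
# Sketch of the import procedure above; run once per volume group on each node
# at the site. Compare the sample script Samples/mk2imports.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

import_vg() {
    vg=$1; minor=$2; mapfile=$3       # e.g. vg01 0x010000 vg01.map (examples)
    run mkdir /dev/$vg                # group file must exist before vgimport
    run mknod /dev/$vg/group c 64 $minor
    run vgimport -vs -m $mapfile /dev/$vg
    run vgchange -a e /dev/$vg        # activate exclusively, then back up
    run vgcfgbackup /dev/$vg
    run vgchange -a n /dev/$vg
}
```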
NOTE: Exclusive activation must be used for all volume groups associated with packages that use EVA. The design of Metrocluster Continuous Access EVA assumes that only one node in the cluster will have a Volume Group activated at a time.
Importing Volume Groups on Nodes at the Remote Site
Use the following procedure to import volume groups on all cluster nodes located at the site of the remote EVA. The sample script mk2imports can be modified to automate these steps.
1. Define the Volume Groups on all nodes at the same site that will run the Serviceguard package.
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
2. Import the Volume Groups on all nodes at the same site that will run the Serviceguard packages.
# vgimport -vs -m mapfile /dev/vgname
3. Verify the Volume Group configuration. From Command View EVA (see Figure 4-4), fail over the DR group so that it becomes the source on the REMOTE site instead of the destination:
a. Select the destination site storage system from Command View EVA.
b. Select the desired Disaster Recovery group and click “Fail Over”.
4. Activate the Volume Groups and back up the configuration.
# vgchange -a e /dev/vgname
# vgcfgbackup /dev/vgname
See the sample script Samples/mk2imports.
5. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
6. From Command View EVA, fail back the SOURCE to its original site.
Figure 4-4 EVA Command View DR Group Properties
Building a Metrocluster Solution with Continuous Access EVA
Configuring Packages for Automatic Disaster Recovery
After completing the following steps, packages will be able to fail over automatically to an alternate node in another data center and still have access to the data they need in order to operate. This procedure must be repeated on all the cluster nodes for each Serviceguard package so the application can fail over to any of the nodes in the cluster. Customizations include editing an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user’s guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, configure and test one package at a time.
1. Create a directory /etc/cmcluster/pkgname for each package:
# mkdir /etc/cmcluster/pkgname
2. Create a package configuration file.
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.config
Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
3. In the .config file, list the node names in the order in which you want the package to fail over. For performance reasons, it is recommended that the package fail over locally first, and then to the remote data center. Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a value large enough to allow for the extra startup time required to obtain status from the EVA.
NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT; otherwise, package shutdown will hang if the host loses access to the package disks. This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices take longer to start up than those with fewer devices, because of the time needed to get device status from the EVA. When multiple packages that use devices on the EVA start at the same time, each package’s startup time increases.
4. Create a package control script.
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set FS_UMOUNT_COUNT to 1.
5. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. Refer to the Managing Serviceguard user’s guide for more detailed information on these functions.
6. Copy the environment file template /opt/cmcluster/toolkit/SGCAEVA/caeva.env to the package directory, naming it pkgname_caeva.env:
# cp /opt/cmcluster/toolkit/SGCAEVA/caeva.env \
/etc/cmcluster/pkgdir/pkgname_caeva.env
NOTE: If not using a package name as the filename for the package control script, it is necessary to follow the environment file naming convention: the file name of the package control script without its extension, an underscore, and the type of data replication technology used (caeva). The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen.
Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env.
Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env.
7. Edit the environment file pkgname_caeva.env as follows:
a. Set the CLUSTER_TYPE variable to METRO if this is a Metrocluster.
b. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.
c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred or Data_Currency_Preferred.
d. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.
e. Set the DR_GROUP_NAME variable to the name of the DR Group used by this package. This DR Group name is defined when the DR Group is created.
f. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller, or from the Command View EVA UI.
g. Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Multiple names are separated by commas. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order they are specified.
h. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Multiple names are separated by commas.
i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from the Command View EVA UI.
j. Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2. Multiple names are separated by commas. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order they are specified.
k. Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Multiple names are separated by commas.
l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM on the Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.
8. After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script (be sure to include the -n option to perform syntax checking only):
# sh -n pkgname.cntl
If any messages are returned, correct the syntax errors before proceeding.
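Taken together, the variables described in step 7 produce an environment file along these lines. Every value below (DR group, WWNs, host and SMI-S server names) is illustrative only; substitute your own configuration:

```shell
# pkgname_caeva.env -- illustrative Metrocluster settings; all values are
# examples, not taken from this manual.
CLUSTER_TYPE=METRO
PKGDIR=/etc/cmcluster/pkgname
DT_APPLICATION_STARTUP_POLICY=Data_Currency_Preferred
WAIT_TIME=10                       # minutes to wait for the data merge
DR_GROUP_NAME=pkg_dr_group
DC1_STORAGE_WORLD_WIDE_NAME=5000-1FE1-5007-DBD0
DC1_SMIS_LIST=smis1a,smis1b        # tried in order if a connection fails
DC1_HOST_LIST=node1a,node1b
DC2_STORAGE_WORLD_WIDE_NAME=5000-1FE1-5007-DBA0
DC2_SMIS_LIST=smis2a
DC2_HOST_LIST=node2a,node2b
QUERY_TIME_OUT=300                 # seconds; 300 is the default, minimum 20
```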
9. Distribute the Metrocluster Continuous Access EVA configuration, environment and control script files to the other nodes in the cluster by using ftp or rcp:
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
See the example script Samples/ftpit for a semi-automated way to copy the files using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require a .rhosts file for root; root access via .rhosts may create a security issue.
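The rcp step above, repeated for every other cluster node, can be sketched as a short loop (node and package names are placeholders; DRY_RUN=1, the default, prints the commands, since rcp trust is site-specific):

```shell
#!/bin/sh
# Sketch of step 9: copy one package directory to every other node with rcp.
# Compare Samples/ftpit, which does the same with ftp.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

distribute_pkg() {
    pkg=$1; shift                     # remaining args: the other cluster nodes
    for node in "$@"; do
        run rcp -p /etc/cmcluster/$pkg/* $node:/etc/cmcluster/$pkg
    done
}
```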
10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
pkgname.cntl        Serviceguard package control script
pkgname_caeva.env   Metrocluster Continuous Access EVA environment file
pkgname.config      Serviceguard package ASCII configuration file
pkgname.sh          Package monitor shell script, if applicable
other files         Any other scripts used to manage Serviceguard packages
11. Check the configuration using the cmcheckconf -P pkgname.config command, then apply the Serviceguard configuration using the cmapplyconf -P pkgname.config command or SAM. The Serviceguard cluster is now ready to automatically switch packages to nodes in remote data centers using Metrocluster Continuous Access EVA.
Maintaining a Cluster that Uses Metrocluster Continuous Access EVA
While a package is running, a manual storage failover on Continuous Access EVA performed outside of the Metrocluster Continuous Access EVA software can cause the package to halt due to an unexpected condition of the Continuous Access EVA volumes. It is recommended that no manual storage failover be performed while the package is running. A manual change of the Continuous Access EVA link state from suspend to resume is allowed to re-establish data replication while the package is running.
Continuous Access EVA Link Suspend and Resume Modes
Upon Continuous Access link recovery, Continuous Access EVA automatically normalizes (the Continuous Access EVA term for “synchronizes”) the source Vdisk and destination Vdisk data. If the log disk is not full when a Continuous Access connection is re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging. Since write ordering is maintained, the data on the destination Vdisk is consistent while merging is in progress. If the log disk is full when a Continuous Access connection is re-established, a full copy from the source Vdisk to the destination Vdisk is done. Since a full copy is done at the block level, the data on the destination Vdisk is not consistent until the copy completes.
If all Continuous Access links fail and failsafe mode is disabled, the application package continues to run and writes new I/O to the source Vdisk. The virtual log in the EVA controller collects host write commands and data; the DR group's log state changes from normal to logging. When a DR group is in the logging state, the log grows in proportion to the amount of write I/O being sent to the source Vdisks. If the links are down for a long time, the log disk may fill up, and a full copy will happen automatically upon link recovery.
If the primary site fails while a full copy is in progress, the data on the destination Vdisk is not consistent and is not usable. To prevent this, after all Continuous Access links fail, it is recommended to manually put the Continuous Access link into suspend mode by using the Command View EVA UI. When the Continuous Access link is in suspend
state, Continuous Access EVA will not try to normalize the source and destination Vdisks upon link recovery until you manually change the link state to resume mode.
Normal Maintenance
There might be situations when the package has to be taken down for maintenance without having the package move to another node. The following procedure is recommended for normal maintenance of Metrocluster Continuous Access EVA:
1. Stop the package with the appropriate Serviceguard command.
# cmhaltpkg pkgname
2. Distribute the Metrocluster Continuous Access EVA configuration changes.
# cmapplyconf -P pkgname.config
3. Start the package with the appropriate Serviceguard command.
# cmmodpkg -e pkgname
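The three maintenance steps can be chained in a small helper; the package name is a placeholder and, because the cm* commands exist only with Serviceguard on HP-UX, DRY_RUN=1 (the default here) just prints them:

```shell
#!/bin/sh
# Sketch of the normal-maintenance sequence above.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

pkg_maintenance() {
    pkg=$1
    run cmhaltpkg $pkg                # 1. stop the package in place
    # ... perform the maintenance work here ...
    run cmapplyconf -P $pkg.config    # 2. distribute configuration changes
    run cmmodpkg -e $pkg              # 3. re-enable the package
}
```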
Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators” (page 31).
Failback
After resynchronization is complete, halt the package on the failover site and restart it on the primary site. Metrocluster will then fail over the storage, which triggers Continuous Access EVA to swap the personalities of the source and destination Vdisks, returning source status to the primary site.
Cluster Re-Configuration
There might be situations when the cluster has to be re-configured for maintenance purposes. The following procedure is recommended for re-configuration of the Metrocluster Continuous Access EVA:
1.
Before running the cmapplyconf -C command, it is necessary to remove the cluster awareness from the Metrocluster volume groups. This is done by halting
all Metrocluster packages and then running the following command on the source side:
# vgchange -c n
2. Halt the entire cluster and apply your changes with the Serviceguard command.
# cmapplyconf -C
3. Re-start the cluster and mark the cluster ID on all Metrocluster volume groups. Run the following on the source side:
# vgchange -c y
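For one volume group, the re-configuration sequence can be sketched as below. The volume group and cluster configuration file names are placeholders, this sketch uses vgchange -c n/-c y to clear and re-mark the cluster ID, and DRY_RUN=1 (the default) prints the HP-UX commands:

```shell
#!/bin/sh
# Sketch of the cluster re-configuration steps above for a single Metrocluster
# volume group.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

recluster_vg() {
    vg=$1; clconf=$2                 # e.g. vg01 cluster.ascii (examples)
    run vgchange -c n /dev/$vg       # 1. remove cluster awareness (packages halted)
    run cmapplyconf -C $clconf       # 2. apply changes with the cluster halted
    run vgchange -c y /dev/$vg       # 3. re-mark the cluster ID (cluster restarted)
}
```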
Completing and Running a Continental Cluster Solution with Continuous Access EVA
The following section describes how to configure a continental cluster solution using Continuous Access EVA, which requires the HP Metrocluster with Continuous Access EVA product.
NOTE: Make sure you have completed the preparation for Metrocluster Continuous Access EVA, as described in the section “Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA”, at both the primary and recovery sites.
Setting up a Primary Package on the Primary Cluster
Use the procedures in this section to configure a primary package on the primary cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.
1. Install Continentalclusters on all the cluster nodes in the primary cluster (skip this step if the software has been pre-installed). Run swinstall(1m) to install HP Continentalclusters from an SD depot.
2. When swinstall(1m) has completed, create a directory for the new package in the primary cluster:
# mkdir /etc/cmcluster/pkgname
Create a Serviceguard package configuration file in the primary cluster.
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.ascii
Customize the Serviceguard package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
Set the AUTO_RUN flag to NO. This ensures that the package will not start when the cluster starts. Only after the primary packages start, use cmmodpkg to enable package switching on all primary packages. Enabling package switching in the package configuration would automatically start the primary package when the cluster starts. However, if a primary cluster disaster has resulted in the recovery package starting and running on the recovery cluster, the primary package must not be started until the recovery package has first been stopped.
3. Create a package control script.
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
4. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions.
5. Copy the environment file template /opt/cmcluster/toolkit/SGCA/caeva.env to the package directory, naming it pkgname_caeva.env:
# cp /opt/cmcluster/toolkit/SGCA/caeva.env \
/etc/cmcluster/pkgname/pkgname_caeva.env
NOTE: If a package name is not used as the filename for the package control script, it is required to follow the environment file naming convention: the file name of the package control script without its extension, an underscore, and the type of data replication technology used (caeva). The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen.
Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env.
Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env.
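The naming convention in the note above can be stated as a tiny helper: drop the control script's extension, then append an underscore, the replication type (caeva), and the env extension.

```shell
#!/bin/sh
# Helper expressing the environment-file naming convention from the note above.
caeva_env_name() {
    base=${1%.*}                      # control script name without its extension
    echo "${base}_caeva.env"
}
```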
6. Edit the environment file pkgname_caeva.env as follows:
a. Set the CLUSTER_TYPE variable to CONTINENTAL.
b. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator
may create the FORCEFLAG file in this directory. See Appendix B for a description of these variables.
c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred or Data_Currency_Preferred.
d. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.
e. Set the DR_GROUP_NAME variable to the name of the DR Group used by this package. This DR Group name is defined when the DR Group is created.
f. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller, or from the Command View EVA UI.
g. Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Multiple names are separated by commas. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order they are specified.
h. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Multiple names are separated by commas.
i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from the Command View EVA UI.
j. Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2. Multiple names are separated by commas. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order they are specified.
k. Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Multiple names are separated by commas.
l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM on the Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.
7. Distribute the Metrocluster Continuous Access EVA configuration, environment and control script files to the other nodes in the cluster by using ftp or rcp.
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
8. Apply the Serviceguard configuration using the cmapplyconf command or SAM.
9. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
pkgname.cntl        Serviceguard package control script
pkgname_caeva.env   Metrocluster Continuous Access EVA environment file
pkgname.ascii       Serviceguard package ASCII configuration file
pkgname.sh          Package monitor shell script, if applicable
other files         Any other scripts used to manage Serviceguard packages
The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster Continuous Access EVA. 10. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover. 11. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.
Setting up a Recovery Package on the Recovery Cluster
Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. Use the following steps to set up the recovery package:
1. Install Continentalclusters on all the cluster nodes in the recovery cluster (skip this step if the software has been pre-installed).
NOTE: Serviceguard should already be installed on all the cluster nodes.
Run swinstall(1m) to install Continentalclusters from an SD depot.
2. When swinstall(1m) has completed, create a directory for the new package in the recovery cluster.
# mkdir /etc/cmcluster/pkgname
Create a Serviceguard package configuration file in the recovery cluster.
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.ascii
Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Set the AUTO_RUN flag to NO. This ensures that the package will not start when the cluster starts. Do not use cmmodpkg to enable package switching on any recovery package; enabling package switching will automatically start the recovery package. Package switching on a recovery package is set automatically by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.
3. Create a package control script.
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
NOTE: Some of the control script variables, such as VG and LV, on the recovery cluster must be the same as on the primary cluster. Others, such as FS, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART, are probably the same as on the primary cluster. Variables such as IP and SUBNET on the recovery cluster are probably different from those on the primary cluster. Make sure that you review all the variables accordingly.
4. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions.
5. Copy the environment file template /opt/cmcluster/toolkit/SGCA/caeva.env to the package directory, naming it pkgname_caeva.env:
# cp /opt/cmcluster/toolkit/SGCA/caeva.env \
/etc/cmcluster/pkgname/pkgname_caeva.env
6. Edit the environment file pkgname_caeva.env as follows:
a. Set the CLUSTER_TYPE variable to CONTINENTAL.
b. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.
c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred or Data_Currency_Preferred.
d. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from the source to the destination volume before starting up the package on the destination volume. If the wait time expires while merging is still in progress, the package fails to start with an error that prevents restarting on any node in the cluster.
e. Set the DR_GROUP_NAME variable to the name of the DR group used by this package. This DR group name is defined when the DR group is created.
f. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.
g. Set the DC1_SMIS_LIST variable to the list of management servers that reside in Data Center 1. Define multiple names using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order in which they are specified.
h. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Define multiple names using a comma as a separator between the names.
i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.
j. Set the DC2_SMIS_LIST variable to the list of management servers that reside in Data Center 2. Define multiple names using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order in which they are specified.
k. Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Define multiple names using a comma as a separator between the names.
l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM on the management server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.
7. Distribute the Metrocluster Continuous Access EVA configuration, environment, and control script files to the other nodes in the cluster by using ftp or rcp:
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
See the example script Samples/ftpit for how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes.
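Putting steps a through l together, a filled-in environment file might look like the following sketch. All of the values shown (package name, DR group, WWNs, host and SMI-S server names) are hypothetical; only the variable names come from the steps above:

```shell
# Sketch of a Metrocluster Continuous Access EVA environment file.
# All values are illustrative placeholders.
CLUSTER_TYPE="CONTINENTAL"
PKGDIR="/etc/cmcluster/salespkg"                    # must be unique per package
DT_APPLICATION_STARTUP_POLICY="Data_Currency_Preferred"
WAIT_TIME=30                                        # minutes to wait for the data merge
DR_GROUP_NAME="sales_drg"                           # defined when the DR group is created
DC1_STORAGE_WORLD_WIDE_NAME="5000-1FE1-0015-1A20"   # EVA WWN in Data Center 1
DC1_SMIS_LIST="smis1a.dc1.example.com,smis1b.dc1.example.com"
DC1_HOST_LIST="node1,node2"
DC2_STORAGE_WORLD_WIDE_NAME="5000-1FE1-0015-2B30"   # EVA WWN in Data Center 2
DC2_SMIS_LIST="smis2a.dc2.example.com"
DC2_HOST_LIST="node3,node4"
QUERY_TIME_OUT=300                                  # seconds; recommended minimum is 20
```
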
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.
8. Apply the Serviceguard configuration using the cmapplyconf command or SAM.
9. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
bkpkgname.cntl — Serviceguard package control script
bkpkgname_caeva.env — Metrocluster Continuous Access EVA environment file
bkpkgname.ascii — Serviceguard package ASCII configuration file
bkpkgname.sh — Package monitor shell script, if applicable
other files — Any other scripts you use to manage Serviceguard packages
10. Make sure the packages on the primary cluster are not running. Using the standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the recovery cluster for cluster and package startup and package failover.
11. Halt any running package on the recovery cluster that has a counterpart on the primary cluster.
Setting up the Continental Cluster Configuration
The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: “Designing a Continental Cluster”.
1. Generate the Continentalclusters configuration using the following command:
# cmqueryconcl -C cmconcl.config
2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups, and the monitoring definitions. The recovery groups define the primary and recovery packages; when data replication is done using Continuous Access EVA, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog, or TCP), and the notification type (alert or alarm) based on the cluster status (unknown, down, up, or error). Descriptions of these can be found in the configuration file generated in the previous step.
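For orientation, a recovery group in cmconcl.config pairs a primary package with its recovery package; with Continuous Access EVA replication, no data sender or receiver packages are defined. The fragment below is an illustrative sketch only — the group and package names are hypothetical, and the authoritative keyword spellings and layout are those of the template generated by cmqueryconcl in step 1:

```
RECOVERY_GROUP_NAME   sales_rg                       # hypothetical group name
PRIMARY_PACKAGE       primarycluster/salespkg        # package on the primary cluster
RECOVERY_PACKAGE      recoverycluster/salespkg_bk    # counterpart on the recovery cluster
# No data sender or data receiver package entries are needed when
# replication is done by Continuous Access EVA.
```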
3. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.
4. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES; this is in contrast to the flag setting for the application packages, because the monitor package should start automatically when the cluster is formed.
5. Apply the monitor package to both cluster configurations:
# cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config
6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/cmclconfig, nor is there an equivalent file for Continentalclusters. Example:
# cmapplyconcl -C cmconcl.config
7. Start the monitor package on both clusters.
NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status.
8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.
9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.
10. View the status of the continental cluster primary and recovery clusters, including configured event data:
# cmviewconcl -v
The continental cluster is now ready for testing. See “Testing the Continental Cluster” (page 91).
Switching to the Recovery Cluster in Case of Disaster
It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm, because network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:
• During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.
• During an alarm, cmrecovercl will start the recovery packages without the -f option.
• When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This applies not only when no alert or alarm was issued, but also when there was an alert or alarm but the primary cluster recovered and its current status is up.
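These three rules can be summarized as a small decision function. This is an illustrative sketch of the documented behavior, not the actual cmrecovercl implementation; the function name and arguments are hypothetical:

```shell
# can_recover CONDITION FORCE
#   CONDITION: state of the primary cluster as seen by the monitor
#              (alert | alarm | none)
#   FORCE:     "force" if cmrecovercl -f was used, empty otherwise
# Prints "yes" if recovery packages may be started, "no" otherwise.
can_recover() {
    case "$1" in
        alarm) echo yes ;;                      # alarm: no -f needed
        alert) if [ "$2" = "force" ]; then      # alert: only with -f
                   echo yes
               else
                   echo no
               fi ;;
        *)     echo no ;;                       # neither alert nor alarm
    esac
}

can_recover alarm ""     # prints "yes"
can_recover alert ""     # prints "no"
can_recover alert force  # prints "yes"
```
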
Failover to Recovery Site
After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The recovery package control script evaluates the status of the DR group used by the package and fails the DR group over to the EVA at the recovery site. After a successful failover, the DR group in the recovery site's EVA is the source and is accessible in read/write mode.
NOTE: If the Continuous Access links between the two EVAs are down, the recovery package will start up only if one of the following conditions is true:
• The package failover policy variable DT_APPLICATION_STARTUP_POLICY in the package’s environment file is set to Availability_Preferred.
• The package failover policy variable DT_APPLICATION_STARTUP_POLICY in the package's environment file is set to Data_Currency_Preferred, and a FORCEFLAG file exists in the package directory.
After the recovery package is up and running, the EVA at the recovery site will have more current data than the one at the primary site.
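The startup decision described in the NOTE above can be sketched as a shell function. This is an illustration of the documented rules, not the actual recovery package control script; the function and argument names are hypothetical:

```shell
# decide_startup POLICY LINKS PKGDIR
#   POLICY: value of DT_APPLICATION_STARTUP_POLICY
#   LINKS:  state of the Continuous Access links (up | down)
#   PKGDIR: package directory, checked for a FORCEFLAG file
# Prints "start" if the recovery package may start, "halt" otherwise.
decide_startup() {
    if [ "$2" = "up" ]; then
        echo start        # links up: normal failover, no restriction
        return 0
    fi
    case "$1" in
        Availability_Preferred)
            echo start ;;                       # start even without current data
        Data_Currency_Preferred)
            if [ -f "$3/FORCEFLAG" ]; then      # operator explicitly forced it
                echo start
            else
                echo halt
            fi ;;
        *)  echo halt ;;
    esac
}

decide_startup Availability_Preferred down /etc/cmcluster/salespkg  # prints "start"
```
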
Failover Scenarios
The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking connecting the two sites. The following scenarios address some of those failures and suggest recovery approaches applicable to environments using data replication provided by HP StorageWorks EVA series disk arrays and Continuous Access.
Scenario 1
The primary site has lost power for a prolonged time, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the EVA disk array or the operating systems of the systems at the primary site.
Failback to the Primary Site
In this scenario, the EVA at the primary site is down due to the loss of power; therefore, the storage configuration information and the application data from before the power failure remain intact in the EVA. When the primary site’s power is restored, the EVA is up and running, and the Continuous Access links are up, Continuous Access EVA software automatically resynchronizes the data from the recovery site's EVA back to the primary site’s EVA. If the resynchronization is a full copy operation, the data in the primary site's EVA is not consistent and is not usable until the full copy (resynchronization) completes. It is recommended to wait until the resynchronization is complete before failing the packages back to the primary site. The state of the DR group in the primary site’s EVA can be checked either via Command View (CV) EVA or the SSSU command. If the state of each Vdisk in the DR group is shown as “Normal”, the resynchronization is complete, and the user can move the packages back to the primary site.
Scenario 2
The primary site HP StorageWorks EVA disk array experienced a catastrophic hardware failure and all data was lost on the array.
Failback to the Primary Site
In this scenario the disk array is repaired or a new EVA array is commissioned at the primary site. Before the application can fail back to the primary site, the EVA at the recovery site (now the source storage) needs to establish a replication relationship with the new EVA at the primary site (now the destination storage). Refer to the procedure named “Return Operations to Replaced New Storage Hardware” in the “Continuous Access EVA Operation Guide” to rebuild the DR groups configured in the EVA. Once the DR groups are rebuilt and the destination storage is synchronized with the source storage, the packages can be failed back to the primary site.
Scenario 3
The primary site has lost power, which impacts only the systems in the primary cluster. The primary cluster is down, but the EVA disk array and the Continuous Access links to the recovery site are up and running.
Failback in Scenario 3
In this scenario the EVA disk arrays at both sites are up and running and the Continuous Access links are functional. When the recovery packages are up and running at the recovery site, Continuous Access EVA automatically switches the replication direction; the new data written on the recovery site's EVA is replicated to the primary site's EVA.
After the primary cluster is back online, the packages can be failed back to the primary site.
Reconfiguring Recovery Group Site Identities in Continentalclusters after a Recovery
Consider a disaster scenario in which the primary site goes out of operation, but there is no loss of data on the disk array or the servers. After recovery is complete, the recovered application can continue to run at the recovery site without having to fail back when the primary cluster becomes available at a later point in time.
This avoids further downtime for the recovered application. It is also desirable, however, for the applications to have the same level of recovery capability at their new site as they had at their original primary site. As described in the scenario above, Continentalclusters can be reconfigured to provide monitoring and recovery for an application now running on its recovery cluster. This is done by switching the identities of the sites in the application's context: the old (or original) primary site becomes the recovery site, and the old (or original) recovery site becomes the primary site. This type of reconfiguration of Continentalclusters is possible only in a two-cluster, two-site configuration. Continentalclusters solutions using HP StorageWorks EVA disk arrays need no disk-array replication tasks during the reconfiguration. Once the primary site EVA disk array comes back online, HP StorageWorks EVA Continuous Access automatically resynchronizes the data, making the recovery site the “source” and the old primary site the “destination”. Use the cmswitchconcl command (only in a two-cluster configuration) to swap the site identities for all or a selected application's recovery groups, so that the applications can be monitored and recovered from their former primary cluster.
5 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
The EMC Symmetrix Remote Data Facility (EMC SRDF) disk arrays allow configuration of physical data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the EMC SRDF software and the additional files that integrate the EMC arrays with Serviceguard clusters. It then shows how to configure both metropolitan and continental cluster solutions using EMC SRDF. The topics discussed in this chapter are:
• Files for Integrating Serviceguard with EMC SRDF
• Overview of EMC and SRDF Concepts
• Preparing the Cluster for Data Replication
• Building a Metrocluster Solution with EMC SRDF
• Metrocluster with SRDF/Asynchronous Data Replication
• Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication
• Building a Continental Cluster Solution with EMC SRDF
Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature, the Site Controller package, to provide disaster tolerance for workload databases, and is currently implemented for the Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323).
Files for Integrating Serviceguard with EMC SRDF
Metrocluster is a set of executable programs, scripts, and an environment file that work in a Serviceguard cluster to automate failover to alternate nodes in the case of a disaster in a metropolitan cluster. The Metrocluster with EMC SRDF product contains the following files:
Table 5-1 Metrocluster with EMC SRDF Template Files
/opt/cmcluster/toolkit/SGSRDF/srdf.env — The Metrocluster with EMC SRDF environment file. This file must be customized for the specific EMC Symmetrix and HP 9000 or HP Integrity server host system configuration. Copies of this file must be customized for each separate Serviceguard package.
/opt/cmcluster/toolkit/SGSRDF/samples — A directory containing sample convenience shell scripts that must be edited before use. These shell scripts may help to automate some configuration tasks. These scripts are contributed, and not supported.
/usr/sbin/DRCheckDiskStatus — The script that checks for a specific environment file in the package directory; it should not be edited.
/usr/sbin/DRCheckSRDFDevGrp — The program that manages the SRDF device group that is used by the package.
Metrocluster with EMC SRDF has to be installed on all nodes that will run a Serviceguard package that accesses data on an EMC Symmetrix where the data is replicated to a second Symmetrix using the SRDF facility. In the event of node failure, the integration of Metrocluster with EMC SRDF with the package allows the application to fail over in the following ways:
• Among local host systems that are attached to the same EMC Symmetrix.
• Between one system that is attached locally to its EMC Symmetrix and another “remote” host that is attached locally to the other EMC Symmetrix.
Metrocluster with Symmetrix SRDF is specifically for configuring one or more Serviceguard packages whose data reside on EMC Symmetrix ICDAs (Integrated Cache Disk Arrays) and are replicated with SRDF (Symmetrix Remote Data Facility). Metrocluster with Symmetrix SRDF can be used in a metropolitan cluster configuration. The distance between the two data centers is limited either by the physical connection requirements of the Symmetrix arrays or by the Serviceguard heartbeat round-trip time (less than 200 ms), whichever is less. Symmetrix configurations can be either 1 by 1 (one Symmetrix at each data center) or M by N (one or two Symmetrix frames at each data center). Configuration of Metrocluster with EMC SRDF must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster with EMC SRDF, the Symmetrix host-based software for control and status of the EMC Symmetrix disk arrays must also be installed and configured on each HP 9000 and HP Integrity server host system that can execute the application package.
Overview of EMC and SRDF Concepts
EMC Symmetrix Remote Data Facility (SRDF) is a Symmetrix-based business continuance and disaster recovery solution. SRDF is a configuration of Symmetrix systems whose purpose is to maintain multiple, real-time copies of logical volume data in more than one location. The Symmetrix systems can be in the same room, in different buildings within the same campus, or hundreds of kilometers apart. By maintaining real-time copies of data in different physical locations, SRDF enables the following operations with minimal impact on normal business operations:
• Disaster recovery
• Recovery from planned outages
• Remote backup
• Data center migration
• Data replication and mobility
Figure 5-1 EMC R1 and R2 Definitions
(The figure shows two Symmetrix arrays, one in Data Center A and one in Data Center B, connected by an SRDF link. The SRDF link may be bidirectional for different disk devices, and there may be multiple R1/R2 devices. Each array holds R1 and R2 devices with optional BCVs. Packages with primary nodes in Data Center A see that Symmetrix as the R1 side and the Symmetrix in Data Center B as the R2 side; packages with primary nodes in Data Center B see the reverse.)
Preparing the Cluster for Data Replication
When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over. Use the convenience scripts in /opt/cmcluster/toolkits/SGSRDF/Samples to automate some of the tasks in the following sections:
• mk3symgrps.nodename — to create EMC Symmetrix device groups
• mk4gatekpr.nodename — to create gatekeeper devices
• mk2imports — to import volume groups
• ftpit — to copy the configuration to other nodes in the cluster
• pre.cmquery — to split SRDF links before applying the package configuration
• post.cmapply — to restore SRDF links after applying the package configuration
These scripts should be copied from /opt/cmcluster/toolkits/SGSRDF to another directory, such as /etc/cmcluster/SRDF.
Installing the Necessary Software
Before any configuration can begin, make sure the following software is installed on all nodes:
• Symmetrix EMC Solutions Enabler software, which allows management of the Symmetrix disks from the node.
• Symmetrix PowerPath software, if you are building an M by N configuration using PowerPath. However, if you are building an M by N configuration using RDF Enginuity Consistency Assist (RDF-ECA), you need to install only Symmetrix EMC Solutions Enabler; no other software is required.
• Metrocluster with Symmetrix SRDF, installed according to the instructions in the Metrocluster with EMC SRDF Release Notes.
NOTE: For Metrocluster/SRDF version A.05.01 and earlier, only M by N configurations using PowerPath are supported. As a result, the PowerPath software is a prerequisite for using an M by N configuration with Metrocluster.
Building the Symmetrix CLI Database
The Symmetrix CLI (Command Line Interface) should be installed on all nodes running packages that use data on the EMC Symmetrix disk arrays. Create the EMC Solutions Enabler database on each system using the following steps (refer to the Symmetrix EMC Solutions Enabler manual). Issue the following command on each node after the hardware is installed:
# symcfg discover
This builds the CLI database on the node. Display what is in the EMC Solutions Enabler database:
# symdg list
# symld -g symdevgrpname list
# symgate list
If the EMC Solutions Enabler database is not configured, the following error message is displayed:
The Symmetrix configuration could not be loaded for a locally attached Symmetrix
NOTE: Do not set the SYMCLI_SID and SYMCLI_DG environment variables before running the symcfg command. These environment variables limit the amount of information gathered when the EMC Solutions Enabler database is created, so the resulting database would not be complete. Also, the SYMCLI_OFFLINE variable should not be set, since this environment variable disables the command line interface.
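A minimal sketch of the precaution described in the NOTE. The guard around symcfg is an addition for illustration, so the snippet is harmless on systems where EMC Solutions Enabler is not installed:

```shell
# Make sure the limiting SYMCLI variables are not set in this shell
# before building the EMC Solutions Enabler database.
unset SYMCLI_SID SYMCLI_DG SYMCLI_OFFLINE

# symcfg is part of EMC Solutions Enabler; guard the call in case it
# is not installed on this host.
if command -v symcfg >/dev/null 2>&1; then
    symcfg discover
else
    echo "symcfg not found; install EMC Solutions Enabler first"
fi
```
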
Determining Symmetrix Device Names on Each Node
To correctly specify the device file names when creating Symmetrix device groups, be sure to map the HP-UX device files to the R1 and R2 Symmetrix devices. Use the following steps to gather the necessary information:
1. Obtain a list of data for the available Symmetrix devices by using the following command on each node, without any options:
# syminq
Sample output from both the R1 and R2 sides is shown in Figure 5-2 and Figure 5-3.
Figure 5-2 Sample syminq Output from a Node on the R1 Side
Device Name       Type  Vendor  Product ID  Rev   Ser Num   Cap(KB)
/dev/rdsk/c0t0d0  R1    EMC     Symmetrix   5264  95004160  4418880
/dev/rdsk/c0t0d1  R1    EMC     Symmetrix   5264  95005160  4418880
/dev/rdsk/c0t0d2  R2    EMC     Symmetrix   5264  95006160  4418880
/dev/rdsk/c0t0d3  R2    EMC     Symmetrix   5264  95007160  4418880
/dev/rdsk/c0t1d0  BCV   EMC     Symmetrix   5264  95024160  4418880
/dev/rdsk/c0t1d1  BCV   EMC     Symmetrix   5264  95025160  4418880
/dev/rdsk/c0t1d0  GK    EMC     Symmetrix   5264  95040160  2880
/dev/rdsk/c0t1d1  GK    EMC     Symmetrix   5264  95041160  2880
/dev/rdsk/c0t1d2  GK    EMC     Symmetrix   5264  95042160  2880
/dev/rdsk/c1t2d0  R1    EMC     Symmetrix   5264  95004320  4418880
/dev/rdsk/c1t2d1  R1    EMC     Symmetrix   5264  95005320  4418880
/dev/rdsk/c1t2d2  R2    EMC     Symmetrix   5264  95006320  4418880
Figure 5-3 Sample syminq Output from a Node on the R2 Side
Device Name       Type  Vendor  Product ID  Rev   Ser Num   Cap(KB)
/dev/rdsk/c4t0d0  R2    EMC     Symmetrix   5264  50014321  4418880
/dev/rdsk/c4t0d1  R2    EMC     Symmetrix   5264  50015321  4418880
/dev/rdsk/c4t0d2  R1    EMC     Symmetrix   5264  50016321  4418880
/dev/rdsk/c4t0d3  R1    EMC     Symmetrix   5264  50017321  4418880
/dev/rdsk/c4t1d0  BCV   EMC     Symmetrix   5264  50034321  4418880
/dev/rdsk/c4t1d1  BCV   EMC     Symmetrix   5264  50035321  4418880
/dev/rdsk/c4t1d0  GK    EMC     Symmetrix   5264  50040321  2880
/dev/rdsk/c3t1d1  GK    EMC     Symmetrix   5264  50041321  2880
/dev/rdsk/c3t1d0  BCV   EMC     Symmetrix   5264  50030161  4418880
/dev/rdsk/c3t1d1  BCV   EMC     Symmetrix   5264  50031161  4418880
/dev/rdsk/c3t3d0  R2    EMC     Symmetrix   5264  50004161  4418880
/dev/rdsk/c3t3d1  R2    EMC     Symmetrix   5264  50005161  4418880
/dev/rdsk/c3t3d2  R1    EMC     Symmetrix   5264  50006161  4418880
2. The following information is needed from these listings for each Symmetrix logical device:
• HP-UX device file name (for example, /dev/rdsk/c3t3d2).
• Device type (R1, R2, BCV, GK, or blank).
• Symmetrix serial number (for example, 50006161), useful in matching the HP-UX device names to the actual devices in the Symmetrix configuration downloaded by EMC support staff. This number is further explained in Figure 5-4.
Figure 5-4 Parsing the Symmetrix Serial Number
   50   006   161
   |     |     └── host adapter and port numbers
   |     └──────── unique device number
   └────────────── Symmetrix ID
— The Symmetrix ID is the same as the last two digits of the serial number of the Symmetrix frame, in this example 50.
— The next three hexadecimal digits are the unique Symmetrix device number that is seen in the output of the status command:
# symrdf -g symdevgrpname query
This is used by the Metrocluster with Symmetrix SRDF control script and saved in the file /etc/cmcluster/package_name/symrdf.out. The contents of this file may be useful for debugging purposes.
— The next three digits indicate the Symmetrix host adapter (SA or FA) and port numbers; this is useful for seeing multiple host links to the same Symmetrix device. For example, PV links will show up as two HP-UX device file names with the same device number, but with different host adapter and port numbers.
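The parsing shown in Figure 5-4 can be reproduced with standard shell tools. A small worked example, using the serial number 50006161 from the syminq listing above:

```shell
# Split a Symmetrix serial number (from syminq) into its three fields,
# as described in Figure 5-4.
ser=50006161
frame=$(echo "$ser" | cut -c1-2)    # Symmetrix ID (last two digits of the frame serial)
dev=$(echo "$ser" | cut -c3-5)      # unique Symmetrix device number (hexadecimal)
saport=$(echo "$ser" | cut -c6-8)   # host adapter (SA/FA) and port numbers
echo "frame=$frame dev=$dev sa/port=$saport"   # prints "frame=50 dev=006 sa/port=161"
```

Two PV links to the same device would show the same first five characters but different adapter/port fields, which is exactly the comparison the text describes.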
3. Use the symrdf command on each Symmetrix disk array (that is, from both the R1 and the R2 side) to pair the logical device names for the R1 and R2 sides of each SRDF link:
# symrdf list
Sample output is shown in Figure 5-5 and Figure 5-6.
NOTE: The format of the output varies depending on the symrdf version.
Sample symrdf list Output from R1 Side
Symmetrix ID: 000187400684
Local Device View
Sym   RDF          STATUS     MODES  R1 Inv  R2 Inv          RDF Pair
Dev   RDev  Typ:G  SA RA LNK  MDA    Tracks  Tracks  Dev RDev  STATE
0196  0012  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
0197  0013  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
0198  0014  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
0199  0015  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
019A  0016  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
019B  0017  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
019C  0018  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
019D  0019  R1:5   RW RW RW   S..    0       0       RW  WD   Synchronized
Figure 5-5 Sample symrdf list Output from R1 Side
(This figure shows the same kind of listing in an alternate symrdf output format: local devices 000–009 paired with remote devices 000–009, of types R1:1 and R2:2, link state RW, synchronous mode with domino and adaptive copy disabled, zero invalid R1 and R2 tracks, and an RDF pair state of Synchronized.)
Sample symrdf list Output from R2 Side
Local Device View
Sym   RDF          STATUS     MODES  R1 Inv  R2 Inv          RDF Pair
Dev   RDev  Typ:G  SA RA LNK  MDA    Tracks  Tracks  Dev RDev  STATE
0012  0196  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0013  0197  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0014  0198  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0015  0199  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0016  019A  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0017  019B  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0018  019C  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
0019  019D  R2:13  WD WD RW   S..    0       0       WD  RW   Synchronized
Figure 5-6 Sample symrdf list Output from R2 Side
(This figure shows the R2-side listing in the alternate symrdf output format: local devices 000–008 of types R2:1 and R1:2, link state RW, synchronous mode with domino and adaptive copy disabled, zero invalid tracks, and an RDF pair state of Synchronized.)
4. Match the logical device numbers in the symrdf listings with the HP-UX device file names in the output from the syminq command. This shows which devices are seen from each node, so you can verify that each node sees all necessary devices. Use the Symmetrix ID to determine which Symmetrix array is connected to the node. Then use the Symmetrix device number to determine which devices are the same logical device as seen by each node connected to the same Symmetrix unit. Record the HP-UX device file names in your table. Table 5-2 shows a partial mapping for a 4-node cluster connected to two Symmetrix arrays (95 and 50). There may be many R1 and R2 devices and many gatekeepers for each package, so this table will be much larger for most clusters. Also, with M by N configurations, the number of devices increases according to the number of Symmetrix frames.
Table 5-2 Mapping for a 4 Node Cluster connected to 2 Symmetrix Arrays (partial)
(Each row of the table records a Symmetrix ID, device number, and device type together with the /dev/rdsk device file name under which each of the four nodes sees that device. The rows cover device 005 (R1) on array 95, device 014 (R2) on array 50, device 00A (R2) on array 95, device 012 (R1) on array 50, gatekeeper devices 040 on array 95 and 041 on array 50, and BCV device 028 on array 95. The per-node device file names include c0t4d0, c6t0d0, c4t0d0, c0t2d2, c0t4d2, c3t0d2, c4t3d2, c0t15d0, c3t15d1, and c5t15d1, with n/a entries on nodes that do not access a given device.)
NOTE: The Symmetrix device number may be the same or different in each of the Symmetrix units for the same logical device. In other words, the device number for the logical device on the R1 side of the SRDF link may be different from the device number for the logical device on the R2 side of the SRDF link. The Symmetrix logical device numbers in these examples were configured to be the same number so that the cluster is easier to manage. If you are reconfiguring an existing cluster, the Dev and RDev devices will probably not be the same number. When determining the configuration of the Symmetrix devices for a new installation, it is recommended to use the same Symmetrix device number for both the R1 and R2 devices. It is also recommended that the same target and LUN number be configured for all nodes that have access to the same Symmetrix logical device.
Building a Metrocluster Solution with EMC SRDF
Setting up 1 by 1 Configurations
The most common Symmetrix configuration used with Metrocluster with EMC SRDF is a 1 by 1 configuration, in which there is a single Symmetrix frame at each data center. This section describes how to set up this configuration using EMC Solutions Enabler and HP-UX commands. It is assumed that the Symmetrix CLI database is already set up
on each node, as described in the previous section “Preparing the Cluster for Data Replication.” A basic 1 by 1 configuration is shown in Figure 5-7, which is a graphical view of the data in Table 5-2.
Figure 5-7 Mapping HP-UX Device File Names to Symmetrix Units
(The figure shows Nodes 1 and 2 in Data Center A attached to the Symmetrix with ID 95, and Nodes 3 and 4 in Data Center B attached to the Symmetrix with ID 50, with the two arrays joined by the SRDF link. Each array contains R1, R2, GK, and BCV devices, and each node addresses them through its own /dev/rdsk device file names.)
Creating Symmetrix Device Groups
A single Symmetrix device group must be defined for each package on each node that is connected to the Symmetrix. The following procedure must be done on each node that may potentially run the package:
NOTE: The sample scripts mk3symgrps.nodename can be modified to automate these steps.
1. Use the symdg command, or modify the mk3symgrps.nodename script, to define an R1 and an R2 device group for each package. On nodes attached to the R1 side, issue:
# symdg create -type RDF1 devgroupname
On nodes attached to the R2 side, issue:
# symdg create -type RDF2 devgroupname
The group name must be the same on each node on the R1 and R2 sides. The device group name used here is later placed in the DEVICE_GROUP variable defined in the pkg.env file.
2. Use the symld command to add all LUNs that comprise the volume groups for that package on that host. The HP-UX device file names for all volume groups that belong to the package must be defined in one Symmetrix device group; all devices belonging to volume groups that are owned by an application package must be added to a single Symmetrix device group.
# symld -g devgroupname add dev devnumber1
# symld -g devgroupname add dev devnumber2
At this point, it is helpful to refer to Table 5-2 (page 235). Although the HP-UX device file names specified on each node may be different, the device group must be the same on each node.
When creating the Symmetrix device groups, specify only one HP-UX path to a particular Symmetrix device; do not specify alternate paths (PVLinks). The EMC Solutions Enabler uses the HP-UX path only to determine which Symmetrix device you are referring to. The Symmetrix device may be added to the device group only once.
NOTE: Symmetrix logical device names must be the default names of the form DEVnnn (for example, DEV001). Do not use this option for creating your own device names.
The script must be customized for each system, including:
• Particular HP-UX device file names.
• Symmetrix device group name (an arbitrary but unique name may be chosen for each group that defines all of the volume groups (VGs) that belong to a particular Serviceguard package).
• Keyword RDF1 or RDF2.
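The two steps above can be sketched as a small helper in the style of the mk3symgrps.nodename sample scripts. The group name (dgpkg1) and device numbers below are hypothetical, and the function only prints the SYMCLI commands it would run, so the list can be reviewed against syminq output before execution.

```shell
#!/bin/sh
# Dry-run sketch of device group creation (hypothetical group/devices).
# Prints the SYMCLI commands instead of running them.
gen_symcli() {
  side=$1 devgroup=$2 devices=$3
  case $side in
    R1) rdftype=RDF1 ;;
    R2) rdftype=RDF2 ;;
    *)  echo "usage: gen_symcli R1|R2 group 'dev ...'" >&2; return 1 ;;
  esac
  echo "symdg create -type $rdftype $devgroup"
  for dev in $devices; do
    # Add each LUN once, through a single HP-UX path (no PVLink alternates).
    echo "symld -g $devgroup add dev $dev"
  done
}

# Example: primary (R1) side, one device group per package.
gen_symcli R1 dgpkg1 "00C 00D"
```

Once the printed commands have been checked, they can be run directly, or the echoes removed so the script executes them.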
Configuring Gatekeeper Devices

Gatekeeper devices must be unique per Serviceguard package to prevent contention in the Symmetrix when commands are issued, such as when two or more packages start up at the same time. Gatekeeper devices are unique to a Symmetrix unit; they are not replicated across the SRDF link. Gatekeeper devices are marked GK in the syminq output, and are usually 2880 KB in size.

NOTE: The sample scripts mk4gatekpr.nodename can be modified to automate these steps.

1. Define at least two gatekeepers per package per node (assuming PV links are used). They will only be available for use by that node. Configure each gatekeeper device on a different physical link.
# symgate -sid sidnumber1 define dev devnumber1
# symgate -sid sidnumber2 define dev devnumber2
2. Associate the gatekeeper devices with the Symmetrix device group for that package.
# symgate -sid sidnumber1 -g devgroupname \
  associate dev devnumber1
# symgate -sid sidnumber2 -g devgroupname \
  associate dev devnumber2
3. Define a pool of four or more additional gatekeeper devices that are not associated with any particular node. The EMC Solutions Enabler switches to an alternate gatekeeper device if the path to the primary gatekeeper device fails.
Verifying the EMC Symmetrix Configuration

When finished with all these steps, use the symrdf list command to get a listing of all devices and their states. Back up the EMC Solutions Enabler database on each node, so that these configuration steps do not have to be repeated if a failure corrupts the database. The EMC Solutions Enabler database is a binary file located in the directory /var/symapi/db.

Creating and Exporting Volume Groups

Use the following procedure to create volume groups and export them for access by other nodes. The sample script mk1VGs in the /opt/cmcluster/toolkit/SGSRDF/Samples directory can be modified to automate these steps.
1. Define the appropriate volume groups (VGs) on each node that runs the application package.
# mkdir /dev/vgxx
# mknod /dev/vgxx/group c 64 0xnn0000
where the name /dev/vgxx and the number nn are unique within the cluster.
2. Create volume groups only on the primary system. Use the vgcreate and vgextend commands, specifying the appropriate HP-UX device file names.
# vgcreate vgname /dev/dsk/cxtydz
# vgextend vgname /dev/dsk/cxtydz
3. Use the vgchange command to deactivate the volume group, and use the vgexport command with the -p option to export the VGs on the primary system without removing the HP-UX device files:
# vgchange -a n vgname
# vgexport -v -s -p -m mapfilename vgname
Copy the map files to all of the nodes. The sample script Samples/ftpit shows a semi-automated way (using ftp) to copy the files; enter the password interactively.

Importing Volume Groups on Other Nodes

Use the following procedure to import volume groups. The sample script mk2imports can be modified to automate these steps:
1. Import the VGs on all of the other systems that might run the Serviceguard package, and back up the LVM configuration. Make sure that you split the logical SRDF links before importing the VGs, especially if you are importing the VGs on the R2 side.
# symrdf -g devgrpname split -v
# vgimport -v -s -m mapfilename vgname
2. Back up the configuration.
# vgchange -a y vgname
# vgcfgbackup vgname
# vgchange -a n vgname
# symrdf -g devgrpname establish -v
See the sample script Samples/mk2imports.
NOTE: Exclusive activation must be used for all volume groups associated with packages that use the EMC Symmetrix. The design of Metrocluster with EMC SRDF assumes that only one system in the cluster will have a VG activated at a time.

Configuring PV Links

The examples in the previous sections describe the use of the vgimport and vgexport commands with the -s option. In addition, the mk1VGs script uses -s in the vgexport command, and the mk2imports script uses -s in the vgimport command. If you are using PV links, you can remove this option from both commands. The -s option to the vgexport command saves the volume group ID (VGID) in the map file, but it does not preserve the order of PV links. To specify the exact order of PV links, do not use the -s option with vgexport, and in the vgimport command enter the individual links in the desired order, as in the following example:
# vgimport -v -m mapfilename vgname linkname1 linkname2
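The link-ordering rule above can be captured in a tiny helper that assembles the vgimport command with the paths in the desired order. The volume group name, map file, and device paths below are hypothetical examples; the function only builds the command string so the ordering can be verified before the command is run.

```shell
#!/bin/sh
# Sketch: build a vgimport command that preserves PV link order
# (used when the -s option is omitted from vgexport/vgimport).
# All names below are hypothetical placeholders.
build_vgimport() {
  mapfile=$1; vgname=$2; shift 2
  # Primary path first, alternate link(s) after it, in the desired order.
  echo "vgimport -v -m $mapfile $vgname $*"
}

build_vgimport /tmp/vgpkg1.map vgpkg1 /dev/dsk/c4t0d0 /dev/dsk/c6t0d0
```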
Grouping the Symmetrix Devices at Each Data Center

The use of R1/R2 devices in M by N configurations of multiple Symmetrix frames is enabled by means of consistency groups. A consistency group is a set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database. Because Metrocluster with EMC SRDF works at the device group level, the consistency group is implemented and managed as a single device group even though it spans multiple Symmetrix frames. Consistency groups are created using either EMC PowerPath or the RDF Enginuity Consistency Assist (RDF-ECA) feature of Solutions Enabler.
In a consistency group, the Symmetrix tracks the I/Os that are written to the devices. If an I/O cannot be written to a remote Symmetrix because a remote device or an RDF link has failed, the data flow to the other Symmetrix is halted in less than one second. Once mirroring is resumed, any updates to the data are propagated with normal SRDF operation. Figure 5-8 shows that when there is a break in the links between two of the Symmetrix frames, the use of consistency groups (depicted as dashed oval lines) ensures that the other two links are also suspended.

Figure 5-8 2 X 2 Node and Data Center Configuration with Consistency Groups
(The figure shows packages A through D running on nodes 1 through 4 across Data Centers A and B, with arbitrator nodes 5 and 6 at a third location. When the two marked links go down, the other links are suspended by EMC PowerPath.)
Setting up M by N Configurations

Metropolitan clusters using EMC SRDF can be built in configurations that use more than two EMC Symmetrix disk arrays. In such configurations, M arrays located in Data Center A may be connected to N arrays located in Data Center B. This section describes how to set up an M by N configuration using EMC Solutions Enabler and HP-UX commands. It is assumed that either Symmetrix PowerPath software is installed or the RDF-ECA feature of Solutions Enabler is enabled on all nodes, and that the Symmetrix CLI database on each node has already been set up, as described in the section "Preparing the Cluster for Data Replication" (page 229).

CAUTION: M by N configurations cannot be used with R1/R2 swapping.
Figure 5-9 depicts a 2 by 2 configuration. Data in this figure are used in the example commands given in the following sections. This example shows R1 devices at one data center and R2 devices with Business Continuity Volumes (BCVs) at the other. However, a bidirectional configuration is also possible, with R1 devices on both sites.

Figure 5-9 Devices and Symmetrix Units in M by N Configurations

(The figure shows node1 and node2 attached to Symmetrix A and B, connected over SRDF/Fibre Channel links to Symmetrix C and D, which are attached to node3 and node4:)
• SYMMETRIX A (Sym ID 638): gatekeeper /dev/rdsk/c5t0d0 (010); R1 devices /dev/rdsk/c6t0d0 (00C) and /dev/rdsk/c6t0d1 (00D)
• SYMMETRIX B (Sym ID 130): gatekeeper /dev/rdsk/c5t0d1 (009); R1 devices /dev/rdsk/c5t0d2 (010) and /dev/rdsk/c5t0d3 (011)
• SYMMETRIX C (Sym ID 021): gatekeeper /dev/rdsk/c7t0d0 (002); R2 devices /dev/rdsk/c8t0d0 (018) and /dev/rdsk/c8t0d1 (019); BCV devices /dev/rdsk/c8t0d2 (01A) and /dev/rdsk/c8t0d3 (01B)
• SYMMETRIX D (Sym ID 363): gatekeeper /dev/rdsk/c6t0d0 (00B); R2 devices /dev/rdsk/c9t0d0 (050) and /dev/rdsk/c9t0d1 (051); BCV devices /dev/rdsk/c9t0d2 (052) and /dev/rdsk/c9t0d3 (053)
Creating Symmetrix Device Groups

For each node on the R1 side (node1 and node2), create the device groups as follows. Note that it is necessary to create two device groups, because device groups do not span frames. The following examples are based on the configuration shown in Figure 5-9.
1. Create device groups using the following commands on each node on the R1 side.
# symdg -type RDF1 create dgoraA
# symdg -type RDF1 create dgoraB
2. For each node on the R2 side (node3 and node4), create the device groups as follows. Note that it is necessary to create two device groups, because device groups do not span frames. Do the following on each node on the R2 side.
# symdg -type RDF2 create dgoraA
# symdg -type RDF2 create dgoraB
3.
For each node on the R1 side (node1 and node2), assign the R1 devices to the device groups. # symld -sid 638 -g dgoraA add dev 00C # symld -sid 638 -g dgoraA add dev 00D # symld -sid 130 -g dgoraB add dev 010 # symld -sid 130 -g dgoraB add dev 011
4.
For each node on the R2 side (node3 and node4), assign the R2 devices to the device groups. # symld -sid 021 -g dgoraA add dev 018 # symld -sid 021 -g dgoraA add dev 019 # symld -sid 363 -g dgoraB add dev 050 # symld -sid 363 -g dgoraB add dev 051
5.
On each node on the R2 side (node3 and node4), associate the local BCV devices with the R2 device group.
# symbcv -g dgoraA add dev 01A
# symbcv -g dgoraA add dev 01B
# symbcv -g dgoraB add dev 052
# symbcv -g dgoraB add dev 053
6.
To manage the BCV devices from the R1 side, it is necessary to associate the BCV devices with the device groups that are configured on the R1 side. Use the following commands on hosts directly connected to the R1 Symmetrix. # symbcv -g dgoraA associate dev 01A -rdf # symbcv -g dgoraA associate dev 01B -rdf # symbcv -g dgoraB associate dev 052 -rdf # symbcv -g dgoraB associate dev 053 -rdf
7.
Establish the BCV devices using the following commands from the R2 side. # symmir -g dgoraA -full est
# symmir -g dgoraB -full est
8. Alternatively, establish the BCV devices with the following commands from the R1 side.
# symmir -g dgoraA -full est -rdf
# symmir -g dgoraB -full est -rdf
Configuring Gatekeeper Devices It is necessary to have a gatekeeper device for each device group in the consistency group that will be built in a later step. Use the following commands on all nodes on the R1 side to define gatekeepers and associate them with device groups. # symgate -sid 638 define dev 010 # symgate -sid 130 define dev 009 # symgate -sid 638 -g dgoraA associate dev 010 # symgate -sid 130 -g dgoraB associate dev 009 Use the following commands on all nodes on the R2 side to define gatekeepers and associate them with device groups. # symgate -sid 021 define dev 002 # symgate -sid 363 define dev 00B # symgate -sid 021 -g dgoraA associate dev 002 # symgate -sid 363 -g dgoraB associate dev 00B
Creating the Consistency Groups To configure consistency groups for using Metrocluster with EMC SRDF, first create device groups and gatekeeper groups as described in previous sections. The following examples are based on the configuration shown in Figure 5-9. Use the following steps for each package: 1.
On each node in the cluster, create an empty consistency group using the symcg command. To create a consistency group using PowerPath on the R1 side, use # symcg create cgoradb -ppath -type rdf1 Replace rdf1 with rdf2 in the command to create the consistency group on the R2 side. To create a consistency group using RDF-ECA on the R1 side, use # symcg create cgoradb -rdf_consistency -type rdf1 Replace rdf1 with rdf2 in the command to create the consistency group on the R2 side.
Use the same name on all nodes. To use RDF-ECA, ensure that the RDF process daemon is running on at least one of the locally attached hosts. For redundancy, it is recommended that you run multiple instances of the RDF daemon on different hosts. For more information on configuring and using RDF-ECA, refer to the EMC documentation website.
2. Add each device that is going to be used in the consistency group. Use the appropriate SID numbers and device names for the data center that the node is a part of. For example, on node1 and node2 in Data Center A:
# symcg -cg cgoradb -sid 638 add dev 00C
# symcg -cg cgoradb -sid 638 add dev 00D
# symcg -cg cgoradb -sid 130 add dev 010
# symcg -cg cgoradb -sid 130 add dev 011
And on node3 and node4 in Data Center B:
# symcg -cg cgoradb -sid 021 add dev 018
# symcg -cg cgoradb -sid 021 add dev 019
# symcg -cg cgoradb -sid 363 add dev 050
# symcg -cg cgoradb -sid 363 add dev 051
3.
Enable the consistency group.
# symcg -cg cgoradb enable
NOTE: This important step must be carried out on every node.
4. Establish the BCV devices in the secondary Symmetrix as a mirror of the standard devices. From either node3 or node4:
# symmir -cg cgoradb -full est
Alternatively, from either node1 or node2:
# symmir -cg cgoradb -full est -rdf
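The consistency-group commands for the Figure 5-9 example can be summarized in a dry-run helper, using the RDF-ECA form of symcg. The SID and device numbers come from the figure and the group name cgoradb from the text above; the function only prints the commands, so the per-side variants (rdf1 on node1/node2, rdf2 on node3/node4) can be reviewed before running them.

```shell
#!/bin/sh
# Dry-run sketch of consistency-group setup for the 2 by 2 example.
gen_cg() {
  side=$1                               # rdf1 or rdf2
  if [ "$side" = rdf1 ]; then
    pairs="638:00C 638:00D 130:010 130:011"
  else
    pairs="021:018 021:019 363:050 363:051"
  fi
  echo "symcg create cgoradb -rdf_consistency -type $side"
  for p in $pairs; do
    # Each entry is sid:dev for one device in the group.
    echo "symcg -cg cgoradb -sid ${p%:*} add dev ${p#*:}"
  done
  echo "symcg -cg cgoradb enable"       # must be carried out on every node
}

gen_cg rdf1
```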
Creating Volume Groups

The following procedure assumes that the volume groups and device groups shown in Figure 5-9 are being created for a cluster. Use the following steps on node1:
1. Create the physical volumes.
# pvcreate -f /dev/rdsk/c6t0d0
# pvcreate -f /dev/rdsk/c6t0d1
# pvcreate -f /dev/rdsk/c5t0d2
# pvcreate -f /dev/rdsk/c5t0d3
2. Create the directories and special files for the volume groups.
# mkdir /dev/vgoraA
# mkdir /dev/vgoraB
# mknod /dev/vgoraA/group c 64 0x010000
# mknod /dev/vgoraB/group c 64 0x020000
3.
Create the volume groups. Be careful not to span Symmetrix frames. # vgcreate /dev/vgoraA /dev/rdsk/c6t0d0 # vgextend /dev/vgoraA /dev/rdsk/c6t0d1 # vgcreate /dev/vgoraB /dev/rdsk/c5t0d2 # vgextend /dev/vgoraB /dev/rdsk/c5t0d3
4.
Create the logical volumes. (XXXX indicates size in MB) # lvcreate -L XXXX /dev/vgoraA # lvcreate -L XXXX /dev/vgoraB
5.
Install a VxFS file system on the logical volumes. # newfs -F vxfs /dev/vgoraA/rlvol1 # newfs -F vxfs /dev/vgoraB/rlvol1
6.
Create map files to permit exporting the volume groups to other systems. # vgchange -a n vgoraA # vgchange -a n vgoraB # vgexport -v -s -p -m /tmp/vgoraA.map vgoraA # vgexport -v -s -p -m /tmp/vgoraB.map vgoraB
7.
Copy the map files to the other nodes in the cluster. # rcp /tmp/vgoraA.map node2:/tmp/vgoraA.map # rcp /tmp/vgoraB.map node2:/tmp/vgoraB.map
8.
Split the SRDF logical links. # symrdf -g dgoraA split -v # symrdf -g dgoraB split -v
On node2, node3, and node4, perform the following steps.
1. Create the volume group directories and special files.
# mkdir /dev/vgoraA
# mkdir /dev/vgoraB
2. Import the volume groups to each system:
# vgimport -v -s -m /tmp/vgoraA.map vgoraA
# vgimport -v -s -m /tmp/vgoraB.map vgoraB
3.
After importing volume groups to all the other nodes, establish the SRDF links.
# symrdf -g dgoraA establish -v
# symrdf -g dgoraB establish -v
NOTE: While creating a volume group, you can choose either the legacy or agile Device Special File (DSF) naming convention. To determine the mapping between these DSFs, use the ioscan -m dsf command.
Creating VxVM Disk Groups using Metrocluster with EMC SRDF

If using VERITAS storage, use the following procedure to create disk groups. It is assumed that the VERITAS root disk group (rootdg) has already been created on the system where the storage is being configured. The following section shows how to set up VERITAS disk groups. On one node, do the following:
1. Check to make sure the devices are in a synchronized state.
# symrdf -g dgoraA query
# symrdf -g dgoraB query
2.
Initialize each disk to be used with VxVM by running the vxdisksetup command.
# /etc/vx/bin/vxdisksetup -i c5t0d0
3.
Create the disk group to be used by using the vxdg command on the primary system. # vxdg init logdata c5t0d2 c5t0d3 c5t0d0 c5t0d1
4.
Verify the configuration. # vxdg list
5.
Create the logical volume. # vxassist -g logdata make logfile 2048m
6.
Verify the configuration. # vxprint -g logdata
7.
Make the filesystem. # newfs -F vxfs /dev/vx/rdsk/logdata/logfile
8.
Create a directory to mount the volume group. # mkdir /logs
9.
Mount the volume group: # mount /dev/vx/dsk/logdata/logfile /logs
10. Check that the file system exists, then unmount the file system.
# umount /logs
Validating VxVM Disk Groups using Metrocluster with EMC SRDF

The following section shows how to validate VERITAS disk groups. On one node, do the following:
1. Deport the disk group.
# vxdg deport logdata
2.
Enable other cluster nodes to have access to the disk group. # vxdctl enable
3.
Split the SRDF link to enable R2 Read/Write permission. # symrdf -g dgoraA split # symrdf -g dgoraB split
4.
Import the disk group. # vxdg -tfC import logdata
5.
Start the logical volume in the disk group. # vxvol -g logdata startall
6.
Create a directory to mount the volume. # mkdir /logs
7.
Mount the volume. # mount /dev/vx/dsk/logdata/logfile /logs
8.
Check to make sure the file system is present, then unmount the file system. # umount /logs
9.
Establish the SRDF links.
# symrdf -g dgoraA establish
# symrdf -g dgoraB establish
IMPORTANT: VxVM 4.1 does not support the agile DSF naming convention with HP-UX 11i v3.
NOTE: In a Metrocluster/SRDF environment, VxVM commands should not be run against write-disabled disks, because VxVM may put these disks into an offline state. Subsequent activation of a VxVM disk group might then fail when the disks are write-enabled again, and requires a vxdisk scandisks to be executed prior to disk group activation.
Additional Examples of M by N Configurations

Figure 5-10 shows a 2 by 1 configuration with BCVs, with R1 volumes at Data Center A and R2 volumes and BCVs at Data Center B for pkg A and pkg B.

Figure 5-10 2 by 1 Configuration
(The figure shows node1 and node2 running pkg A and pkg B in Data Center A with R1 volumes, node3 and node4 in Data Center B with R2 volumes and BCVs, SRDF links between the data centers, and arbitrator nodes node5 and node6 at a third location.)
Figure 5-11 shows a bidirectional 2 by 2 configuration with additional packages on node3 and node4, and R1 and R2 volumes at both data centers. In this configuration, R1 volumes and pkg A and pkg B are at Data Center A, and R2 volumes are at Data Center B. R1 volumes for pkg C and pkg D are at Data Center B, and R2 volumes are at Data Center A.
Figure 5-11 Bidirectional 2 by 2 Configuration

(The figure shows pkg A and pkg B with their R1 volumes at Data Center A and R2 volumes at Data Center B, pkg C and pkg D with their R1 volumes at Data Center B and R2 volumes at Data Center A, and arbitrator nodes node5 and node6 at a third location.)
Configuring Serviceguard Packages for Automatic Disaster Recovery

Before implementing these procedures it is necessary to do the following:
• Configure your cluster hardware according to disaster tolerant architecture guidelines. See the Understanding and Designing Serviceguard Disaster Tolerant Architectures user's guide.
• Configure the Serviceguard cluster according to the procedures outlined in the Managing Serviceguard user's guide.
• Create the EMC Solutions Enabler database, and build Symmetrix device groups, consistency groups, and gatekeepers for each package.
• Export exclusive volume groups for each package as described in "Preparing the Cluster for Data Replication" (page 229). This must be done on each node that will potentially run the package.
• Install the Metrocluster with EMC SRDF product on all nodes according to the instructions in the Metrocluster with EMC SRDF Release Notes.
When these steps have been completed, packages will be able to fail over automatically to an alternate node in another data center and still have access to the data they need to function.
This procedure must be repeated on all the cluster nodes for each Serviceguard application package so the application can fail over to any of the nodes in the cluster. Customizations include setting environment variables and supplying customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user's guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, it is recommended to configure and test one package at a time.
1. Create a directory /etc/cmcluster/pkgname for each package.
# mkdir /etc/cmcluster/pkgname
2.
Create a package configuration file. # cd /etc/cmcluster/pkgname # cmmakepkg -p pkgname.ascii Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/ pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
3.
In the .ascii file, list the node names in the order in which the package is to fail over. It is recommended, for performance reasons, that the package fail over locally first, then to the remote data center.
NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT; otherwise, package shutdown will hang if the host loses access to the package disks.
This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices take longer to start up than those with fewer devices, due to the time needed to get device status from the EMC Symmetrix disk array. Clusters with multiple packages that use devices on the EMC Symmetrix disk array will see increased package startup time when more than one package is starting at the same time. The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a value large enough to take into consideration the extra startup time due to getting status from the Symmetrix.
4.
Create a package control script.
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
5. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user's guide for more information on these functions.
6. In the package_name.ascii file, list the node names in the order in which you want the package to fail over. It is recommended, for performance reasons, that the package fail over locally first, then to the remote data center. For the MAX_CONFIGURED_PACKAGES parameter, the minimum value is 0 and the maximum default value is 150 (depending on the number of packages that will run on the cluster).
7. Copy the environment file template /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory, naming it pkgname_srdf.env:
# cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \
/etc/cmcluster/pkgname/pkgname_srdf.env
NOTE: If you do not use the package name as the file name of the package control script, the environment file name must follow this convention: the file name of the control script without its extension, followed by an underscore and the type of data replication technology (srdf), with the .env extension. The following examples demonstrate how the environment file name should be chosen:
Example 1: If the file name of the control script is pkg.cntl, the environment file name is pkg_srdf.env.
Example 2: If the file name of the control script is control_script.sh, the environment file name is control_script_srdf.env.
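The environment-file naming convention in the NOTE above can be expressed as a one-line shell rule: strip the control script's extension and append _srdf.env. This helper is only an illustration of the convention, not part of the product's scripts.

```shell
#!/bin/sh
# Derive the Metrocluster environment file name from a control script
# file name: drop the last extension, append "_srdf", add ".env".
env_file_name() {
  base=$(basename "$1")
  echo "${base%.*}_srdf.env"
}

env_file_name pkg.cntl            # -> pkg_srdf.env
env_file_name control_script.sh   # -> control_script_srdf.env
```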
8.
Edit the environment file as follows:
a. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable. If the software is installed in the default location, /usr/symcli/bin, there is no need to set the PATH environment variable in this file.
b. Uncomment the AUTO* environment variables. It is recommended to retain the default values of these variables unless there is a specific business requirement to change them. See Appendix B for an explanation of these variables.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed, removing any quotes around the file names. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/package_name.
d. Uncomment the RDF_MODE variable and set the RDF mode for the RDF pairs in the device group to synchronous (sync) or asynchronous (async).
e. Uncomment the DEVICE_GROUP variable for the local EMC Symmetrix disk array and set it to the Symmetrix device group name shown by the symdg list command. If you are using an M by N configuration, configure the DEVICE_GROUP variable with the name of the consistency group.
f. Uncomment the RETRY and RETRYTIME variables. These variables decide how often, and how many times, to retry the Symmetrix status commands. The defaults should be used for the first package. For other packages, RETRYTIME should be altered to avoid contention when more than one package is starting on a node. RETRY * RETRYTIME should be approximately five minutes to keep package startup time under 5 minutes. For example:

Package    RETRYTIME    RETRY
pkgA       5 seconds    60 attempts
pkgB       7 seconds    43 attempts
pkgC       9 seconds    33 attempts

g. Uncomment the CLUSTERTYPE variable and set it to METRO. (The value CONTINENTAL is only for use with the Continentalclusters product, described in Chapter 5.)
h. If using an M by N configuration, be sure that the variable CONSISTENCYGROUPS is set to 1 in the environment file:
CONSISTENCYGROUPS=1
9. Distribute the Metrocluster with EMC SRDF configuration, environment, and control script files to the other nodes in the cluster by using ftp or rcp:
# rcp -p /etc/cmcluster/pkgname/* \
other_node:/etc/cmcluster/pkgname
See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.
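The recommendation that RETRY * RETRYTIME be approximately five minutes can be checked with simple shell arithmetic. The helper below is illustrative only: it rounds up so the total window is never below 300 seconds, which reproduces the pkgA and pkgB values in the table above (the pkgC row was rounded down by hand and differs by one).

```shell
#!/bin/sh
# Sketch: pick a RETRY count so RETRY * RETRYTIME >= 300 seconds
# (about five minutes), rounding up with integer arithmetic.
calc_retry() {
  retrytime=$1                      # seconds between attempts
  echo $(( (300 + retrytime - 1) / retrytime ))
}

calc_retry 5    # prints 60 (pkgA)
calc_retry 7    # prints 43 (pkgB)
```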
10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:
pkgname.cntl        Serviceguard package control script
pkgname_srdf.env    Metrocluster with EMC SRDF environment file
pkgname.ascii       Serviceguard package ASCII configuration file
pkgname.sh          Package monitor shell script, if applicable
other files         Any other scripts you use to manage Serviceguard packages
The Serviceguard cluster is now ready to automatically switch packages to nodes in remote data centers using Metrocluster with EMC SRDF.
11. Check the configuration using the cmcheckconf -P package_name.ascii command, then apply the Serviceguard configuration using the cmapplyconf -P package_name.ascii command or SAM.
12. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the Symmetrix device group names. Redirect the output of this script to a file for debugging purposes.
Maintaining a Cluster that uses Metrocluster with EMC SRDF

While the cluster is running, all EMC Symmetrix devices that belong to the same Serviceguard package and are defined in a single SRDF group must be in the same state at the same time. Manual changes to these states can cause the package to halt due to unexpected conditions. In general, no manual change of states should be performed while the package and the cluster are running.
There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of Metrocluster with EMC SRDF:
1. Stop the package with the appropriate Serviceguard command.
# cmhaltpkg pkgname
2. Split the logical SRDF links for the package.
# Samples/pre.cmquery
3. Distribute the Metrocluster with EMC SRDF configuration changes.
# cmapplyconf -P pkgconfig
4. Restore the logical SRDF links for the package.
# Samples/post.cmapply
5. Start the package with the appropriate Serviceguard command.
# cmmodpkg -e pkgname
No checking of the status of the SA/FA ports is done. It is assumed that at least one PVLink is functional; otherwise, the volume group activation will fail.
Planned maintenance is treated the same as a failure by the cluster. If a node is taken down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. For examples of failover scenarios, see "Example Failover Scenarios with Two Arbitrators" (page 31).
Managing Business Continuity Volumes

The use of Business Continuity Volumes (BCVs) is recommended with all implementations of Metrocluster with EMC SRDF, and it is required with M by N configurations, which employ consistency groups. These BCV devices provide a good copy of the data when it is necessary to recover from a rolling disaster, that is, a second failure that occurs while attempting to recover from the first failure.

Protecting against Rolling Disasters

The following is an example of a rolling disaster with Metrocluster with EMC SRDF. At time T0, all the SRDF links go down. The application continues to run on the R1 side. At time T1, the SRDF links are restored, and at T2 a manual resynchronization is started to resync new data from the R1 to the R2 side. At time T3, while resynchronization is in progress, the R1 site fails, and the application starts up on the R2 side. Because the resynchronization had not completed when the R1 side failed, the data on the R2 side is corrupt.

Using the BCV in Resynchronization

In the case described above, use the business continuity volumes to protect against a rolling disaster: first split off a consistent copy of the data at the recovery site, then perform the resynchronization. After the resynchronization is complete, re-establish the BCV mirroring. To protect data consistency on R2 in a rolling disaster, use the following procedure:
1. Before starting the resynchronization from the R1 to the R2 side, disable the package switch capability to prevent the package from automatically failing over to R2 if a new disaster occurs while the resync is still in progress. To disable package switching on the R2 nodes:
# cmmodpkg -d pkgname -n node_name
2.
Split the BCV in the secondary Symmetrix from the mirror group to save a good copy of the data from nodes on R2 side. # symmir -g dgname split Alternatively, from node on R1 side. # symmir -g dgname split -rdf
3.
Begin to resynchronize the data from R1 to R2 devices. # symrdf -g dgname est
Building a Metrocluster Solution with EMC SRDF
255
4.
After the resynchronization is completed, enable the package switching on the node on R2 side. # cmmodpkg -e pkgname -n node_name
5.
Re-establish the BCV to R2 devices on R2 as a mirror. # symmir -g dgname -full est Alternatively, from node on R1 side. # symmir -g dgname -full est -rdf
In a Metrocluster with EMC SRDF environment, follow the resynchronization process described above to prevent the package from automatically failing over and starting on the R2 side if a disaster takes place while the resync is in progress. This ensures that the package does not automatically start and operate on inconsistent data in the event of a rolling disaster. As demonstrated above, the re-sync is a manual process, initiated by an operator after the links are repaired. When the re-sync is complete, the pair state of the devices should be Synchronized for SRDF/Synchronous or Consistent for SRDF/Asynchronous. Check the state and ensure that the re-sync is complete before enabling package switching.

If Metrocluster with EMC SRDF is used in Continentalclusters, it is not necessary to disable package switching on the recovery-site nodes, since each site has its own cluster. However, while the re-sync is in progress, make sure the recovery site does not start the recovery operation in the event of a disaster occurring on the primary site. Use the following procedure to protect data consistency on R2 in a Continentalclusters environment:
1. Split the BCV in the secondary Symmetrix from the mirror group to save a good copy of the data, from a node on the R2 side:
# symmir -g dgname split
Alternatively, from a node on the R1 side:
# symmir -g dgname split -rdf
2. Begin to resynchronize the data from the R1 to the R2 devices:
# symrdf -g dgname est
3. Re-establish the BCV as a mirror of the R2 devices, from a node on the R2 side:
# symmir -g dgname -full est
Alternatively, from a node on the R1 side:
# symmir -g dgname -full est -rdf
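The pair-state check described above (Synchronized for SRDF/Synchronous, Consistent for SRDF/Asynchronous) can be sketched as a small shell helper. The function name is hypothetical; in practice the state comes from the RDF Pair STATE column of `symrdf -g dgname query` output.

```shell
# Hypothetical helper: decide whether the re-sync has completed and
# package switching may safely be re-enabled (cmmodpkg -e).
# $1 = pair state from the "RDF Pair STATE" column of a symrdf query
# $2 = replication mode: sync or async
resync_complete() {
  case "$2" in
    sync)  [ "$1" = "Synchronized" ] ;;
    async) [ "$1" = "Consistent" ] ;;
    *)     return 1 ;;
  esac
}

resync_complete "Synchronized" sync && echo "safe to re-enable package switching"
resync_complete "SyncInProg" sync || echo "re-sync still in progress; keep switching disabled"
```

An operator script built this way would refuse to run `cmmodpkg -e` until every device pair in the group reports the completed state.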
R1/R2 Swapping

This section describes how R1/R2 swapping can be done via the Metrocluster SRDF package or by manual procedures. Each method swaps the SRDF personality of every device in a specified device group: each source R1 device becomes a target R2 device, and each target R2 device becomes a source R1 device.

R1/R2 Swapping using Metrocluster SRDF

The Metrocluster SRDF package can be configured to perform R1/R2 swapping automatically upon package failover. To enable R1/R2 swapping in the package, set the environment variable AUTOSWAPR2 in the _srdf.env file to 1 or 2. Since the swap is done automatically at package startup, the Metrocluster SRDF software performs the swap only if the Symmetrix frames and the SRDF links between them are working properly, that is, the SRDF state of the device group is Synchronized. If the failover and swap operations succeed, the devices have their personalities switched, and data replication continues from the new R1 devices to the new R2 devices.

If the failover operation fails before Metrocluster performs the R1/R2 swap, the package is not started automatically. If the failover operation succeeds but the R1/R2 swap fails, the package either starts automatically or fails, depending on the value of the environment variable AUTOSWAPR2, which can be set to either 1 or 2 depending on whether the package should start automatically on R2 in case of an R1/R2 swap failure:
• If AUTOSWAPR2 is set to 1, the package fails to start if R1/R2 swapping fails. In this scenario it is necessary to start the package manually by performing the swap operation; if preferred, this can be done at a later time.
• If AUTOSWAPR2 is set to 2, the package starts automatically regardless of an R1/R2 swap failure. In this scenario the data is not protected remotely.
NOTE: When failing over a package with R1/R2 swapping, the package startup time will be longer than without the swapping.

R1/R2 Swapping using Manual Procedures

It is also possible to do R1/R2 swapping manually. There are two scenarios in which manual swapping is supported by Metrocluster with EMC SRDF.

Scenario 1: The package failover is due to a host failure or to planned maintenance downtime. The SRDF links and the Symmetrix frames are still up and running. Because the package startup time is longer if the swapping is done automatically, you can choose not to have the package perform the swap and instead execute the swap manually after the package is up and running on the R2 side. On the host that connects to the R2 side, use the following steps to swap the device personalities and change the direction of data replication:
1. Swap the personalities of the devices and mark the old R1 devices to be refreshed from the old R2 devices:
# symrdf -g dgname swap -refresh R1
2. After the swap is complete, the devices are in the Suspended state. Next, establish the device group for data replication from the new R1 devices to the new R2 devices:
# symrdf -g dgname establish

Scenario 2: Two failures happen before the package fails over to the secondary data center. First the SRDF link fails; the package continues to run and write data on the R1 devices. Sometime later the host fails, and the package fails over to the secondary data center. In this case, even if the AUTOSWAPR2 variable is set to 1 or 2, the package does not perform the R1/R2 swap; the swap can happen only after the host in the primary data center and the SRDF links are repaired. To minimize application downtime, instead of failing the application back to the primary data center, leave the application running in the secondary data center. Then manually swap the device personalities and change the direction of data replication:
1. Swap the personalities of the devices and mark the old R1 devices to be refreshed from the old R2 devices:
# symrdf -g dgname swap -refresh R1
2. After the swap is complete, the devices are in the Suspended state. Next, establish the device group for data replication from the new R1 devices to the new R2 devices:
# symrdf -g dgname establish

CAUTION:
R1/R2 Swapping cannot be used in an M by N Configuration.
Some Further Points

The following are some EMC Symmetrix-specific requirements:
• R1 and R2 devices have been correctly defined and assigned to the appropriate nodes in the internal configuration that is downloaded by EMC support staff.
• R1 devices are locally protected (RAID 1 or RAID S); R2 devices are locally protected (RAID 1, RAID S, or BCV).
NOTE: It is highly recommended that the R2 device be locally protected with RAID 1 or RAID S. If the R2 device is protected with BCV, and it fails and there is a failover, the package cannot operate on the BCV device. The R2 device has to be fixed and the data restored from the BCV device to the new R2 device before the package can start.

• Only synchronous and asynchronous modes are supported; adaptive copy must be disabled.
• Domino Mode enabled is required for an M by N configuration to ensure the following:
— data currency on all Symmetrix frames
— there is no possibility of inconsistent data at the R2 side in case of SRDF link failure
If Domino Mode is not enabled and all SRDF links fail, new data is not replicated to the R2 side while the application continues to modify the data on the R1 side. This results in the R2 side containing a copy of the data only up to the point of the Continuous Access link failure. If an additional failure occurs, such as a system failure before the SRDF link is fixed, the application will fail over to the R2 side with only non-current data.
If Domino Mode is not enabled, in the case of a rolling disaster, the data may be inconsistent. Additional failures may take place before the system has completely recovered from a previous failure. Inconsistent, and therefore unusable, data will result from the following sequence of circumstances:
— Domino Mode is not enabled
— the SRDF links fail
— the application continues to modify the data
— the link is restored
— resynchronization from R1 to R2 starts, but does not finish
— the R1 side fails
Although the risk of this occurrence is extremely low, if the business cannot afford even a minor amount of risk, it is required to enable Domino Mode to ensure that the data at the R2 side is always consistent. The disadvantage of enabling Domino Mode is that when the SRDF link fails, all I/Os to those devices are refused until the SRDF link is restored or manual intervention is undertaken to disable Domino Mode. Applications may fail or may continuously retry the I/Os (depending on the application) if Domino Mode is enabled and the SRDF link fails.
NOTE: Domino Mode is not supported in asynchronous mode.

• SRDF firmware has been configured and hardware has been installed on both Symmetrix units.
• R1 and R2 devices must be correctly defined and assigned to the appropriate host systems in the internal configuration that is downloaded by EMC.
• While the cluster is running, all Symmetrix devices that belong to the same Serviceguard package and are defined in a single SRDF device group must be in the same state at the same time. Manual changes of these states can cause the package to halt due to unexpected conditions. In general, it is recommended that no manual change of states be performed while the package and the cluster are running.
• A single Symmetrix device group must be defined for each package on each host that is connected to the Symmetrix. The disk special device file names for all Volume Groups that belong to the package must be defined in one Symmetrix device group on both the R1 side and the R2 side. The Symmetrix device group name must be the same on each host on both the R1 side and the R2 side. This group name is placed in the variable DEVICE_GROUP defined in the pkg.env file. Although the name of the device group must be the same on each node, the special device file names specified may be different on each node.
• Symmetrix Logical Device names MUST be default names of the form “DEVnnn” (for example, DEV001). Do not use the option for creating your own device names. See the EMC Solutions Enabler manual, and the sample convenience scripts in the Samples directory included with this toolkit.
• To minimize contention, each device group used in the package should be assigned two unique gatekeeper devices on the Symmetrix for each host where the package will run. These gatekeeper devices must be associated with the Symmetrix device group for that package. A gatekeeper device is typically a 2 MB logical device on the Symmetrix. For example, if a package is configured to fail over across four nodes in the cluster, there should be eight gatekeeper devices (two for each node) assigned to the Symmetrix device group belonging to this package. It is required that there be a pool of four additional gatekeeper devices that are NOT associated with any device group. These gatekeepers are available for other, non-cluster uses, for example, the Symmetrix Manager GUI and other EMC Solutions Enabler or SymAPI requests. After data configuration, each physical device in the Symmetrix has enough space remaining on it for gatekeeper purposes.
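The sizing guideline above (two gatekeepers per node per device group, plus a pool of four unassociated gatekeepers) can be sketched as follows; the helper name is hypothetical.

```shell
# Sizing sketch for gatekeeper devices, per the guideline above:
# two gatekeepers per node for each device group the package uses.
# $1 = number of nodes the package can fail over across
gatekeepers_for_group() {
  nodes=$1
  echo $(( nodes * 2 ))
}

# A package configured to fail over across four nodes:
echo "device-group gatekeepers: $(gatekeepers_for_group 4)"
echo "unassociated pool (for GUI/SymAPI use): 4"
```

For the four-node example in the text this yields the eight associated gatekeepers, with the four-device unassociated pool reserved for non-cluster SymAPI requests.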
• This toolkit does not support the HP OmniBack Integration with Symmetrix. The OmniBack Integration with Symmetrix may create certain states that will cause this package to halt if a failover occurs while the backup is in progress.
• No checking of the status of the SA/FA ports is done. It is assumed that at least one PVLink is functional; otherwise, the VG activation will fail.
• This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices take longer to start up than those with fewer devices due to the time needed to get device status from the Symmetrix. Clusters with multiple packages that use devices on the Symmetrix will see increased package startup time when more than one package starts at the same time.
• The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a value large enough to account for the extra startup time due to getting status from the Symmetrix. See the previous bullet for more information on the extra startup time.
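For example, the relevant line in the package ASCII configuration file might look like the following. This is a sketch; the explicit value shown is an arbitrary example, and the right number depends on how long Symmetrix status queries take in your configuration.

```shell
# Package ASCII file fragment: allow for the extra time spent getting
# device status from the Symmetrix during package startup.
RUN_SCRIPT_TIMEOUT      NO_TIMEOUT

# Alternatively, a generous explicit timeout (seconds), for example:
# RUN_SCRIPT_TIMEOUT    600
```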
Metrocluster with SRDF/Asynchronous Data Replication

The following sections present concepts, functionality, and requirements for configuring Metrocluster using SRDF/Asynchronous data replication. SRDF/Asynchronous delivers an asynchronous data replication solution featuring a consistent and restartable copy of the production data at the remote side. Metrocluster with EMC SRDF supports SRDF/Asynchronous to further enhance and protect critical business information. The topics discussed in this section are as follows:
• Overview of SRDF/Asynchronous Concepts
• Requirements for using SRDF/Asynchronous in a Metrocluster Environment
• Preparing the Cluster for SRDF/Asynchronous Data Replication
• Building a Device Group for SRDF/Asynchronous
• Limitations and Restrictions
Overview of SRDF/Asynchronous Concepts

SRDF/Asynchronous provides a long-distance replication solution with minimal impact on performance. This protection level is intended for customers who require minimal host application impact but need to maintain a restartable copy of data at the R2 site. Data is transferred from the R1 site to the R2 site in predefined timed cycles called delta sets, which eliminates the redundancy of the same track changes being transferred over the link. In the event of a disaster at the R1 site, or if SRDF links are lost during data transfer, a partial delta set of data is discarded. However, a dependent-write consistent point-in-time copy of data is retained on the target side. Figure 5-12 depicts the SRDF/Asynchronous data sets.
• At the R1 site, the capture cycle is collecting all new writes and tagging them as belonging to cycle N. There is also a transmit cycle (N-1), which is not receiving any new data but is transferring the data it collected when it was the active cycle to the remote side. The capture cycle switches roles from capture to transmit during the cycle switch process, and a new capture cycle is created.
• At the R2 site, there is a receive cycle (N-1), which is receiving data from the transmit cycle at R1. The apply cycle (N-2) at the remote site is marking all the tracks from a previous cycle as write-pending to the secondary devices (R2). The data is considered committed to the R2 side devices at cycle switch time.
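The cycle relationships above can be illustrated with a tiny sketch. This is purely illustrative; delta-set cycle management is internal to the Symmetrix.

```shell
# For a given delta-set cycle number n, print which cycle each role
# holds on the R1 and R2 sides, per the description above.
cycle_roles() {
  n=$1
  echo "R1 capture:  N   = $n"
  echo "R1 transmit: N-1 = $(( n - 1 ))"
  echo "R2 receive:  N-1 = $(( n - 1 ))"
  echo "R2 apply:    N-2 = $(( n - 2 ))"
}

cycle_roles 7
```

At each cycle switch every role advances by one, so the R2 devices always hold a complete, dependent-write consistent image that is two cycles behind the current capture cycle.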
Figure 5-12 SRDF/Asynchronous Basic Functionality

[Figure: on the R1 side, host I/O enters the capture cycle (N) of the active SRDF/A device-pair session, while the transmit cycle (N-1) transfers writes over the SRDF links; on the R2 side, the receive cycle (N-1) receives the writes and the apply cycle (N-2) applies them to the R2 device.]
Requirements for using SRDF/Asynchronous in a Metrocluster Environment

The following describes the hardware and software requirements for setting up SRDF/Asynchronous in a Metrocluster environment:
Hardware Requirements
• EMC supports SRDF/Asynchronous on the Symmetrix DMX series only. The model numbers and the supported Enginuity levels are available in the EMC Symmetrix SRDF Product Guide.
• SRDF/Asynchronous supports all SRDF topologies, including ESCON, point-to-point, and switched fabrics. Refer to the EMC Network Storage Topology Guide for details and to plan the connectivity based on your distance requirements.
Software Requirements
• EMC Solutions Enabler (requires minimum version 6.0).
• Enginuity - refer to the Disaster Tolerant Clusters Products Compatibility and Feature Matrix for specific version information.
• An SRDF/Asynchronous license is required to access this functionality.

For the most recent version and compatibility information, refer to the Disaster Tolerant Clusters Products Compatibility and Feature Matrix (Metrocluster/EMC SRDF – MC/SRDF).
Preparing the Cluster for SRDF/Asynchronous Data Replication

The following sections, “Metrocluster with SRDF/Asynchronous Data Replication” and “Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous”, describe architectures and configurations for preparing SRDF/Asynchronous data replication.

Metrocluster SRDF Topology using SRDF/Asynchronous

Figure 5-13 shows the recommended and supported disaster tolerant architecture in a metropolitan cluster using SRDF/Asynchronous data replication. The architecture consists of two main data centers and a third location with arbitrator nodes or quorum server nodes.
Figure 5-13 Metrocluster Topology using SRDF/Asynchronous

[Figure: two main data centers, an R1 site and an R2 site, each with a DMX array and two cluster nodes (Node A and Node B at R1; Node C and Node D at R2) attached through Fibre Channel switches (FCS); the sites are linked by DWDM and an IP network, with a quorum server (QS) on the Ethernet network at a third location.]
Data replication can utilize any extended SAN devices that support SRDF Links, for example DWDM, Fiber Channel over Internet Protocol, etc. However, since the network for a Serviceguard cluster heartbeat requires a “Dark Fiber” link, it is recommended to utilize the DWDM links for SRDF/Asynchronous data replication. This will increase data replication bandwidth and reliability in the Metrocluster environment.
Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous

The following sections, “Building a Device Group for SRDF/Asynchronous” and “Package Configuration using SRDF/Synchronous or SRDF/Asynchronous”, describe the steps for building a device group and package configuration in an SRDF/Asynchronous environment.
Building a Device Group for SRDF/Asynchronous

To perform an operation on a device group for SRDF/Asynchronous data replication, the device group must be configured with all the devices that are SRDF/Asynchronous capable within the RDF group. Use the following steps to create a device group:
1. List the SRDF/Asynchronous-capable devices on the source Symmetrix unit, and be sure the SRDF/Asynchronous-capable devices are mapped to an RDF group for use. In addition, all RDF devices that belong to the RDF (RA) group must be configured in one device group for SRDF/Asynchronous operation. For example, use the following command to display the devices from RDF (RA) group number 5:
# symrdf -sid symid list -rdfa -rdfg 5
Symmetrix ID: 000187400684

                          Local Device View
-------------------------------------------------------------------------
                  STATUS     MODES  R1 Inv  R2 Inv  RDF  S T A T E S
Sym   RDF
Dev   RDev  Typ:G SA RA LNK  MDA    Tracks  Tracks  Dev  RDev  Pair
----  ----  ----- ---------  -----  ------  ------  ---  ----  ------------
0196  0012  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
0197  0013  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
0198  0014  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
0199  0015  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
019A  0016  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
019B  0017  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
019C  0018  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
019D  0019  R1:5  RW RW RW   S..         0       0  RW   WD    Synchronized
2. Create the device group; for example, the group name AsynDG.
On the R1 side:
# symdg create AsynDG -type RDF1
On the R2 side:
# symdg create AsynDG -type RDF2
3. Add all devices from the RDF (RA) group configuration to the device group for SRDF/Asynchronous operation. For example, if the RDF group displayed in the symrdf list output is group number 5, all devices in this RDF group must be managed together within one device group for SRDF/Asynchronous operation:
# symld -g AsynDG addall -rdfg 5
4. Repeat steps 1-3 on each host that needs to run Serviceguard packages.
5. Query the device group to display the R1-to-R2 setup and the state of the SRDF/Asynchronous device pairs:
# symld -g AsynDG query -rdfa
Sample output from the command:

               Source (R1) View          Target (R2) View       MODES
               ST                    LI  ST
Standard       A     R1 Inv  R2 Inv  N   A     R1 Inv  R2 Inv         RDF Pair
Logical   Dev  T     Tracks  Tracks  K   Dev T Tracks  Tracks   MDAC  STATE
--------- ---- --    ------  ------  --  ---- -- -----  ------  ----  ------------
DEV001    0196 RW         0       0  RW  0012 WD     0       0  S...  Synchronized
DEV002    0197 RW         0       0  RW  0013 WD     0       0  S...  Synchronized
DEV003    0198 RW         0       0  RW  0014 WD     0       0  S...  Synchronized
DEV004    0199 RW         0       0  RW  0015 WD     0       0  S...  Synchronized
DEV005    019A RW         0       0  RW  0016 WD     0       0  S...  Synchronized
DEV006    019B RW         0       0  RW  0017 WD     0       0  S...  Synchronized
DEV007    019C RW         0       0  RW  0018 WD     0       0  S...  Synchronized
DEV008    019D RW         0       0  RW  0019 WD     0       0  S...  Synchronized
6. Set the device group to asynchronous mode:
# symrdf -g AsynDG set mode async
7. Enable consistency protection to ensure data consistency on the R2 side for the SRDF/Asynchronous devices in the device group:
# symrdf -g AsynDG enable
8. If the SRDF pairs are not in a Consistent state at this point, initiate an establish command to synchronize the data on the R2 side from the R1 side. The device state will be SyncInProg until the Consistent status is reached:
# symrdf -g AsynDG establish
Sample output after the RDF pairs have been established:

               Source (R1) View          Target (R2) View       MODES
               ST                    LI  ST
Standard       A     R1 Inv  R2 Inv  N   A     R1 Inv  R2 Inv         RDF Pair
Logical   Dev  T     Tracks  Tracks  K   Dev T Tracks  Tracks   MDAC  STATE
--------- ---- --    ------  ------  --  ---- -- -----  ------  ----  ------------
DEV001    0196 RW         0       0  RW  0012 WD     0       0  A..X  Consistent
DEV002    0197 RW         0       0  RW  0013 WD     0       0  A..X  Consistent
DEV003    0198 RW         0       0  RW  0014 WD     0       0  A..X  Consistent
DEV004    0199 RW         0       0  RW  0015 WD     0       0  A..X  Consistent
DEV005    019A RW         0       0  RW  0016 WD     0       0  A..X  Consistent
DEV006    019B RW         0       0  RW  0017 WD     0       0  A..X  Consistent
DEV007    019C RW         0       0  RW  0018 WD     0       0  A..X  Consistent
DEV008    019D RW         0       0  RW  0019 WD     0       0  A..X  Consistent
Package Configuration using SRDF/Synchronous or SRDF/Asynchronous

The following describes configuring a package using SRDF/Synchronous or SRDF/Asynchronous for first-time installations or pre-existing installations.

First-time installation of Metrocluster with EMC SRDF using SRDF/Synchronous

If this is a first-time installation of Metrocluster with EMC SRDF, do the following steps:
1. Copy the template file that is shipped with the Metrocluster with EMC SRDF product from /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory.
2. Customize the template file based on the requirements in your environment.
Pre-existing Installations of Metrocluster SRDF using SRDF/Synchronous

If there is a pre-existing installation of Metrocluster with EMC SRDF and the Serviceguard applications use SRDF/Synchronous data replication, make the following changes:
• Set the new variable RDF_MODE to sync:
RDF_MODE=sync
If RDF_MODE is not present, synchronous mode is assumed.
• Unset the SYMCLI_MODE environment variable if it has been set previously.
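A minimal srdf.env fragment for such a package might then contain the following. This is a sketch using the variable names described in this chapter; the device group name is an example value.

```shell
# srdf.env fragment (example values)
DEVICE_GROUP="pkg1dg"   # Symmetrix device group; name must match on every node
RDF_MODE="sync"         # sync is assumed if RDF_MODE is not present
```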
Migration of Existing Applications from SRDF/Synchronous to SRDF/Asynchronous

EMC does not support migration of existing applications from SRDF/Synchronous to SRDF/Asynchronous data replication. Applications that need to use SRDF/Asynchronous mode for data replication should configure a new device group in SRDF/Asynchronous mode. The need to alter the data replication mode between synchronous and asynchronous for an application is not expected in a typical disaster tolerant environment. Contact EMC and HP for requirements specific to your environment.
Package Failover using SRDF/Asynchronous

The EMC Solutions Enabler provides a checkpoint control operation to confirm that the data written in the current SRDF/Asynchronous cycle has been successfully committed to the R2 side. When a package fails over to the secondary site, Metrocluster with EMC SRDF ensures the most current data when the SRDF link is still up, by invoking the checkpoint action on the storage prior to failover. Because the checkpoint operation prolongs the failover, a package takes longer to start on the R2 side. The time taken to complete the checkpoint operation depends on the configured cycle time, which determines the amount of data outstanding on the R1 site.
Protecting against a Rolling Disaster

It is recommended to use the procedure described earlier in “Protecting against Rolling Disasters” to protect against a rolling disaster situation.
Limitations and Restrictions
• Consistency Group with SRDF/Asynchronous mode is not supported.
• Domino mode is not supported in SRDF/Asynchronous mode.
• Metrocluster with EMC SRDF does not support cascading configurations using SRDF/Asynchronous.
Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication

The following sections present the concepts, functionality, and requirements for configuring Metrocluster using SRDF/Asynchronous (SRDF/A) Multi-Session Consistency (MSC) data replication. The topics discussed in this section are as follows:
• Overview of SRDF/Asynchronous MSC Concepts
• Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi-Session Consistency (MSC) Data Replication
• Building a Composite Group for SRDF/Asynchronous MSC
• Package Configuration using SRDF/Synchronous or SRDF/Asynchronous
Overview of SRDF/Asynchronous MSC Concepts

When a database is spread across multiple Symmetrix arrays and SRDF/A is used for long-distance replication, separate software must be used to manage the coordination of the delta set boundaries between the participating Symmetrix arrays or RDF groups, and to stop replication if any of the volumes in a Symmetrix array or RDF group cannot replicate for any reason. The software must ensure that all delta set boundaries on every participating Symmetrix array in the configuration are coordinated to give a dependent-write consistent point-in-time image of the data. RDF Multi-Session Consistency (RDF-MSC) is the technology that provides consistency across either multiple RDF groups or multiple Symmetrix arrays. RDF-MSC is supported by an SRDF process daemon that performs cycle switching and cache recovery operations across all SRDF/A sessions in the group. This ensures that a dependent-write consistent R2 copy of the data exists at the remote site at all times. From a single Symmetrix array perspective, the I/O is processed exactly the same way in SRDF/A multi-session mode as in single-session mode. Following is the sequence of tasks that are completed while processing an I/O:
1. The host writes to cycle N (capture cycle) on the R1 side.
2. SRDF/A transfers cycle N-1 (transmit cycle) from R1 to R2.
3. The receive cycle N-1 on the R2 side receives data from the transmit cycle, and the apply cycle N-2 restores data to the R2 devices.

During this process, the status and location of the active and inactive cycles are communicated between the R1 and R2 Symmetrix systems. For example, when R1 finishes sending cycle N-1, it sends a special indication to R2 to let it know that it has completed the inactive cycle transfer. Similarly, when R2 finishes restoring cycle N-2, it sends a special indication to let R1 know that its active cycle is empty. At this point in the process (that is, when the transmit cycle on the R1 side is empty and the apply cycle on the R2 side is empty), SRDF/A is ready for a cycle switch.
In single-session mode, once all the conditions are satisfied, an SRDF/A cycle switch is performed. However, when using SRDF/A MSC, the array or RDF group instead indicates switch readiness to the host, which polls for this condition. The cycle switch command is issued by the host only when all Symmetrix systems or RDF groups indicate their switch readiness; hence, the cycle switch is coordinated across multiple Symmetrix systems or RDF groups. SRDF/A enters multi-session mode after receiving a command from the host. As part of the command to enter multi-session mode, and with each subsequent switch command issued, the host provides a tag to each capture cycle that is retained throughout that cycle's life. This cycle tag is a value that is common across all participating SRDF/A sessions and eliminates the need to synchronize the cycle numbers across them. The cycle tag is the mechanism by which consistency is assured. Multi-session SRDF/A performs a coordinated cycle switch during a very short window of time when no host writes are being completed. This time period is referred to as an SRDF/A window. When the host discovers that all systems are ready for a cycle switch, it issues a single command to each Symmetrix system that performs a cycle switch to open the SRDF/A window. When the window is open, any I/Os that start are disconnected, and as a result no dependent I/Os will ever be issued by any host to any devices in the multi-session group. The SRDF/A window remains open on each Symmetrix system until the last Symmetrix system in the multi-session group acknowledges to the host that the switch-and-open command has been processed and a close command has been received.
As a result, dependent-write consistency across the SRDF/A multi-session group is created. Once the SRDF/A window has been opened and the cycle has successfully switched on all Symmetrix systems, the SRDF/A window can be closed by the SRDF/A MSC software, allowing all disconnected writes to complete and normal processing to resume. As part of this switch-and-open operation, SRDF/A MSC assigns a cycle tag value to the active cycle. This cycle tag value is separate from the cycle number assigned internally by SRDF/A. The cycle tag is carried by the SRDF/A process to the remote side and is used by the host at the recovery site to ensure that only data from the same host cycle is applied to the R2 devices in each Symmetrix system in the event of a disaster. Once all Symmetrix systems have completed a cycle switch, the host issues a command to close the window (turn off the bit in the state table), and all disconnected write I/Os complete. During this window, read I/Os complete normally to any devices or PAV aliases that have not received a write. The SRDF/A window is an attribute of the SRDF/A group and is checked at the start of each I/O, at no additional overhead, because the host adapter is already obtaining the cycle number from global memory as part of SRDF/A's existing minimal overhead. The RDF daemon is responsible for coordinating the cycle switch between the different SRDF/A sessions in the consistency group so that data is consistent. SRDF/A MSC supports enabling the RDF daemon on a single host or on multiple hosts; it is recommended that you enable the daemon on multiple hosts.

Figure 5-14 Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication
[Figure 5-14 shows a host at each site and four Symmetrix arrays connected by SRDF links: Symmetrix A and B (primary) and Symmetrix C and D (secondary). On each primary array, host I/O enters the SRDF/A delta set capture cycle N while transmit cycle N-1 transfers writes across the SRDF links; on each secondary array, receive cycle N-1 receives the writes and apply cycle N-2 applies them to the R2 devices.]
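The coordinated switch described above (poll every array for readiness, then open the window and switch all arrays under one common tag) can be illustrated with a toy simulation. This is purely illustrative: the array names, the "ready" marker files, and the tag value are invented for the sketch and do not correspond to any SYMCLI interface.

```shell
#!/bin/sh
# Toy simulation of the MSC coordinated cycle switch (illustrative only).
ARRAYS="symA symB symC symD"      # hypothetical members of the MSC group
TAG=7                             # common cycle tag supplied by the host
WORK=$(mktemp -d)

# Each array independently signals switch readiness (modeled as a file).
for a in $ARRAYS; do touch "$WORK/$a.ready"; done

# The host polls: only when every array is ready may the switch proceed.
all_ready=1
for a in $ARRAYS; do [ -f "$WORK/$a.ready" ] || all_ready=0; done

msg=""
if [ "$all_ready" -eq 1 ]; then
    # One switch command per array; the shared tag ties the cycles together.
    for a in $ARRAYS; do echo "$TAG" > "$WORK/$a.cycle"; done
    msg="cycle switch complete, tag $TAG"
    echo "$msg"
fi
rm -rf "$WORK"
```

The point of the sketch is the all-or-nothing gate: no individual array switches until every member of the group has reported ready, which is exactly what makes the common cycle tag meaningful.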
Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi-Session Consistency (MSC) Data Replication

The following sections describe the steps for building a composite group and package configuration in an SRDF/Asynchronous MSC environment. The following topics are discussed in this chapter:
• Building a Composite Group for SRDF/Asynchronous MSC
• Configuring a Package using SRDF/Asynchronous MSC
• Setting up the RDF Daemon
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
Building a Composite Group for SRDF/Asynchronous MSC

To perform an operation on a composite group for SRDF/Asynchronous MSC data replication, the composite group must be configured with devices that are SRDF/Asynchronous-capable within the RDF group. Use the following steps to create a composite group.
1. List the SRDF/Asynchronous-capable devices on the source Symmetrix unit and verify that they are mapped to the RDF group to be used. For example, use the following command to display the devices from RDF (RA) group number 6:
# symrdf -sid symid list -rdfa -rdfg 6

Symmetrix ID: 000187400684

                        Local Device View
---------------------------------------------------------------------
            STATUS     MODES       RDF  S T A T E S
Sym  RDF                      R1 Inv   R2 Inv
Dev  RDev Typ:G  SA RA LNK  MDA  Tracks   Tracks   Dev  RDev  Pair
---- ---- -----  ---------  ---  -------  -------  ---  ----  ------------
0196 0012 R1:5   RW RW RW   S..  0        0        RW   WD    Synchronized
0197 0013 R1:5   RW RW RW   S..  0        0        RW   WD    Synchronized
0198 0014 R1:5   RW RW RW   S..  0        0        RW   WD    Synchronized
2. Create a composite group for MSC; for example, the group name MSCcg. On the R1 site, run the following command:
# symcg create MSCcg -type RDF1 -rdf_consistency
On the R2 site, run the following command:
# symcg create MSCcg -type RDF2 -rdf_consistency
3. Add all devices from the configured RDF (RA) groups to the composite group for SRDF/Asynchronous MSC operation. For example, RDF groups 6 and 7 are added to the composite group MSCcg:
# symcg -cg MSCcg -rdfg 6 addall pd
# symcg -cg MSCcg -rdfg 7 addall pd
4. Repeat steps 1-3 on each host that needs to run Serviceguard packages.
5. Query the composite group MSCcg to display the R1-to-R2 setup and the state of the SRDF/Asynchronous device pairs:
# symrdf -cg MSCcg query -rdfa
Following is a sample output of the command:

          Source (R1) View              Target (R2) View
Logical   Sym        R1 Inv  R2 Inv       Sym        R1 Inv  R2 Inv        RDF Pair
Device    Dev        Tracks  Tracks  LNK  Dev        Tracks  Tracks  MDAC  STATE
--------  ---- ----  ------  ------  ---  ---- ----  ------  ------  ----  ------------
DEV001    0196 RW    0       0       RW   0012 WD    0       0       S...  Synchronized
DEV002    0197 RW    0       0       RW   0013 WD    0       0       S...  Synchronized
DEV003    0198 RW    0       0       RW   0014 WD    0       0       S...  Synchronized

DEV001    01B6 RW    0       0       RW   0326 WD    0       0       S...  Synchronized
DEV002    01B7 RW    0       0       RW   0327 WD    0       0       S...  Synchronized
DEV003    01B8 RW    0       0       RW   0328 WD    0       0       S...  Synchronized
DEV004    01B9 RW    0       0       RW   0329 WD    0       0       S...  Synchronized
DEV005    01BA RW    0       0       RW   032A WD    0       0       S...  Synchronized
DEV006    01BB RW    0       0       RW   032B WD    0       0       S...  Synchronized
DEV007    01BC RW    0       0       RW   032C WD    0       0       S...  Synchronized
DEV008    01BD RW    0       0       RW   032D WD    0       0       S...  Synchronized

6. Set the composite group to asynchronous mode:
# symrdf -cg MSCcg set mode async
7. Enable consistency protection to ensure data consistency on the R2 side for the SRDF/Asynchronous devices in the composite group:
# symrdf -cg MSCcg enable
8. If the SRDF pairs are not in a consistent state at this point, initiate an establish command to synchronize the data on the R2 side from the R1 side. The device state will be SyncInProg until the consistent status is reached.
# symrdf -cg MSCcg establish
# symrdf -cg MSCcg query -rdfa

RDFA MSC Info {
    MSC Session Status : Active
    Consistency State  : CONSISTENT
}

          Source (R1) View              Target (R2) View
Logical   Sym        R1 Inv  R2 Inv       Sym        R1 Inv  R2 Inv        RDF Pair
Device    Dev        Tracks  Tracks  LNK  Dev        Tracks  Tracks  MDAC  STATE
--------  ---- ----  ------  ------  ---  ---- ----  ------  ------  ----  ----------
DEV009    0196 RW    0       0       RW   005A WD    0       0       A...  Consistent
DEV010    0197 RW    0       0       RW   005B WD    0       0       A...  Consistent
DEV011    0198 RW    0       0       RW   005C WD    0       0       A...  Consistent

DEV001    01B6 RW    0       0       RW   0326 WD    0       0       A..X  Consistent
DEV002    01B7 RW    0       0       RW   0327 WD    0       0       A..X  Consistent
DEV003    01B8 RW    0       0       RW   0328 WD    0       0       A..X  Consistent
DEV004    01B9 RW    0       0       RW   0329 WD    0       0       A..X  Consistent
DEV005    01BA RW    0       0       RW   032A WD    0       0       A..X  Consistent
DEV006    01BB RW    0       0       RW   032B WD    0       0       A..X  Consistent
DEV007    01BC RW    0       0       RW   032C WD    0       0       A..X  Consistent
DEV008    01BD RW    0       0       RW   032D WD    0       0       A..X  Consistent
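For reference, the commands from steps 2 through 8 on the R1 site can be collected into one sequence. This is a sketch, not an official script: the group name MSCcg and RDF group numbers 6 and 7 are taken from the examples above, and the RUN=echo default makes it a dry run that only prints each command (set RUN="" on a host where SYMCLI is installed and the devices exist).

```shell
#!/bin/sh
# Dry-run sketch of the composite-group workflow from steps 2-8 (R1 site).
CG=MSCcg
RUN="${RUN:-echo}"    # echo = dry run; set RUN="" to execute for real

$RUN symcg create "$CG" -type RDF1 -rdf_consistency   # step 2 (RDF2 on R2 site)
$RUN symcg -cg "$CG" -rdfg 6 addall pd                # step 3: RDF group 6
$RUN symcg -cg "$CG" -rdfg 7 addall pd                #         RDF group 7
$RUN symrdf -cg "$CG" set mode async                  # step 6
$RUN symrdf -cg "$CG" enable                          # step 7: consistency on
$RUN symrdf -cg "$CG" establish                       # step 8: sync R2 from R1
$RUN symrdf -cg "$CG" query -rdfa                     # confirm CONSISTENT state
```

The listing and verification steps (1, 4 and 5) are deliberately left manual, since their output must be inspected before proceeding.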
Configuring a Package using SRDF/Asynchronous MSC

The following sections describe configuring a package using SRDF/Asynchronous MSC for first-time installations and for pre-existing installations.

Initial installation of Metrocluster with EMC SRDF using SRDF/Asynchronous MSC

If you are installing Metrocluster with EMC SRDF using SRDF/Asynchronous MSC for the first time, complete the following procedure:
1. Copy the template file /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory.
2. Change the value of the RDF_MODE variable to asynchronous:
RDF_MODE=async
3. Change the value of the CONSISTENCYGROUPS variable to 1:
CONSISTENCYGROUPS = 1
Metrocluster with EMC SRDF is already installed

If Metrocluster SRDF is already installed and the Serviceguard applications use SRDF/Asynchronous data replication, make the following changes in the srdf.env file:
1. Change the value of the CONSISTENCYGROUPS variable to 1:
CONSISTENCYGROUPS = 1
2. Clear the value set for the SYMCLI_MODE environment variable, if it was set previously.
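The srdf.env edits in this section and the previous one can be applied with sed rather than by hand. A minimal sketch, assuming the file uses the plain NAME=value / NAME = value lines shown above; ENVFILE here is a throwaway demo copy standing in for the package's pkgname_srdf.env file:

```shell
#!/bin/sh
# Demo srdf.env fragment; the real file lives in the package directory.
ENVFILE=$(mktemp)
printf 'RDF_MODE=sync\nCONSISTENCYGROUPS = 0\n' > "$ENVFILE"

# Set RDF_MODE to async and CONSISTENCYGROUPS to 1 in place.
sed -e 's/^RDF_MODE=.*/RDF_MODE=async/' \
    -e 's/^CONSISTENCYGROUPS.*/CONSISTENCYGROUPS = 1/' \
    "$ENVFILE" > "$ENVFILE.tmp" && mv "$ENVFILE.tmp" "$ENVFILE"

cat "$ENVFILE"
```

Because the script only rewrites lines that already exist, it is safe to run repeatedly against an already-edited file.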
Setting up the RDF Daemon

The cycle switch process required for SRDF/A MSC is provided by the Solutions Enabler software, which runs an RDF daemon that implements the MSC functionality. The use of this RDF daemon is enabled or disabled on each host with the SYMAPI_USE_RDFD option in the SYMAPI options file. The default value of the SYMAPI_USE_RDFD option is DISABLE. To enable the RDF daemon, make the following changes:
# cd /var/symapi/config
# vi options
SYMAPI_USE_RDFD = ENABLE
Setting this option to ENABLE activates the RDF daemon for SRDF/A MSC. It is recommended that you enable the daemon on multiple hosts.
Starting and Stopping the Daemon

There are multiple ways to start the RDF daemon. You must start the daemon on all nodes in the cluster. If you have enabled the RDF daemon, the Solutions Enabler software starts the daemon automatically. Alternatively, you can start the daemon manually using the stordaemon command:
# stordaemon start storrdfd [-wait Seconds]
By default, the stordaemon command waits 30 seconds to verify that the daemon is running. To override this default value, use the -wait option. In addition, you can set the daemon to start automatically every time the local host is started:
# stordaemon install storrdfd -autostart
To stop the RDF daemon, run the following command:
# stordaemon stop storrdfd [-wait Seconds]
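The options-file edit from the previous section can also be made idempotent in a script, which is convenient when it must be repeated on every host. A sketch; OPTIONS points at a demo copy here, whereas on a real host the file is /var/symapi/config/options:

```shell
#!/bin/sh
# Sketch: enable SYMAPI_USE_RDFD idempotently. OPTIONS is a demo copy here;
# on a real host it would be /var/symapi/config/options.
OPTIONS=$(mktemp)
echo 'SYMAPI_USE_RDFD = DISABLE' > "$OPTIONS"   # demo starting state

if grep -q '^SYMAPI_USE_RDFD' "$OPTIONS"; then
    # Rewrite an existing setting in place.
    sed 's/^SYMAPI_USE_RDFD.*/SYMAPI_USE_RDFD = ENABLE/' "$OPTIONS" \
        > "$OPTIONS.tmp" && mv "$OPTIONS.tmp" "$OPTIONS"
else
    # Append the option if it is not present yet.
    echo 'SYMAPI_USE_RDFD = ENABLE' >> "$OPTIONS"
fi
grep '^SYMAPI_USE_RDFD' "$OPTIONS"
```

The grep/sed-or-append split keeps the file from accumulating duplicate SYMAPI_USE_RDFD lines across repeated runs.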
Building a Continental Cluster Solution with EMC SRDF

The following sections describe how to configure a continental cluster solution using EMC SRDF, which requires the Metrocluster with EMC SRDF product.
Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster. Consult the Managing Serviceguard user's guide for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.
1. If this was not done previously, split the EMC SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery (edited to match the SRDF groups configured) for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
2. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user's guide.
3. Install Continentalclusters on all the cluster nodes in the primary cluster. (Skip this step if the software has been preinstalled.)
NOTE: Serviceguard should already be installed on all the cluster nodes.
Run swinstall(1m) to install the Continentalclusters and Metrocluster with EMC SRDF products from an SD depot.
4. When swinstall(1m) has completed, create a directory for the new package in the primary cluster:
# mkdir /etc/cmcluster/pkgname
5. Copy the environment file template /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory, naming it pkgname_srdf.env:
# cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \
/etc/cmcluster/pkgname/pkgname_srdf.env
6. Create a Serviceguard application package configuration file:
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.conf
Customize it as appropriate for your application. Be sure to include the node names and the path name of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Also change AUTO_RUN (PKG_SWITCHING_ENABLED in Serviceguard A.11.09) to NO. This ensures that the application packages will not start automatically. (The ccmonpkg package will be set to YES.) Define the services as required.
7. Create a package control script:
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate for your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
8. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user's guide for more information on these functions.
9. Edit the environment file pkgname_srdf.env as follows:
a. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable. The default location is /usr/symcli/bin.
b. Uncomment the AUTO* environment variables. It is recommended to retain the default values of these variables unless there is a specific business requirement to change them. See Appendix B for an explanation of these variables.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/pkgname.
d. Uncomment the DEVICE_GROUP variable and set it to the Symmetrix device group names given in the symdg list command. The DEVICE_GROUP variable may also contain the consistency group name if using an M by N configuration.
e. Uncomment the RETRY and RETRYTIME variables. The defaults should be used for the first package. The values should be slightly different for other packages.
RETRYTIME should increase by two seconds for each package. The product of RETRY * RETRYTIME should be approximately five minutes. These variables are used to decide how often and how many times to retry the Symmetrix status commands. For example, if there are three packages with data on a particular Symmetrix pair (connected by SRDF), then the values for RETRY and RETRYTIME might be as follows:

Table 5-3 RETRY and RETRYTIME Values

Package   RETRY   RETRYTIME
pkgA      60      5
pkgB      43      7
pkgC      33      9
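The sizing rule above (RETRY * RETRYTIME of about 300 seconds, with RETRYTIME growing by two seconds per package) can be checked with a short calculation; the RETRYTIME values 5, 7 and 9 are the ones from Table 5-3:

```shell
#!/bin/sh
# Derive RETRY for each RETRYTIME so that RETRY * RETRYTIME ~= 300 seconds.
for rt in 5 7 9; do
    awk -v rt="$rt" 'BEGIN { printf "RETRYTIME=%d  RETRY=%.0f\n", rt, 300 / rt }'
done
```

This reproduces the 60/43/33 values in Table 5-3; 43 and 33 come from rounding 300/7 and 300/9 to the nearest integer.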
f. Uncomment the CLUSTER_TYPE variable and set it to "continental".
g. Uncomment the RDF_MODE variable and set it to "async" or "sync" as appropriate for your application.
10. Edit the remaining control script variables (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application as it runs on the primary cluster. See the Managing Serviceguard manual for more information on these variables.
11. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Serviceguard manual for more information on these functions.
12. Distribute the EMC SRDF package configuration, environment, and control script files to the other nodes in the primary cluster by using ftp or rcp:
# rcp -p /etc/cmcluster/pkgname/pkgname.cntl \
other_node:/etc/cmcluster/pkgname/pkgname.cntl
When using ftp, be sure to make the files executable on any destination systems.
13. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/pkgname:
• pkgname.cntl (EMC SRDF package control script)
• pkgname.conf (Serviceguard package ASCII configuration file)
• pkgname.sh (package monitor shell script, if applicable)
• pkgname_srdf.env (Metrocluster EMC SRDF environment file)
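The distribution step can be looped over nodes and files when there are several of each. A hedged sketch: the node names are placeholders, pkgname stands in for the real package name, and RUN=echo makes this a dry run that prints each rcp command instead of executing it.

```shell
#!/bin/sh
# Sketch of the rcp distribution step as a loop. NODES and PKG are
# illustrative; RUN=echo prints the commands, RUN="" would execute them.
NODES="${NODES:-nodeB nodeC}"
PKG="${PKG:-pkgname}"
SRC="/etc/cmcluster/$PKG"
RUN="${RUN:-echo}"

for node in $NODES; do
    for f in "$PKG.cntl" "$PKG.conf" "${PKG}_srdf.env"; do
        $RUN rcp -p "$SRC/$f" "$node:$SRC/$f"
    done
done
```

When ftp is used instead, remember the note above about restoring execute permission on the destination systems, since ftp does not preserve file modes.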
14. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
15. Apply the Serviceguard configuration using the cmapplyconf command or SAM.
16. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and failover.
17. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply (after the recovery cluster is completed in the next section) for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
The primary cluster is now ready for Continentalclusters operation.
Setting up a Recovery Package on the Recovery Cluster

The installation of EMC SRDF, Serviceguard, and Continentalclusters software is exactly the same as in the previous section. The procedures below install and configure a recovery package on the recovery cluster. Consult the Managing Serviceguard user's guide for instructions on setting up a Serviceguard cluster (that is, LAN, VG, LV, and so on).
1. Split the EMC SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be edited to refer to the SRDF groups configured and customized with the Symmetrix device group names.
2. Generate a cluster ASCII file:
# cmquerycl -n node1 -n node2 -C CClusterNY.ascii
Edit the file CClusterNY.ascii. Be sure to select a primary cluster lock disk that is not a lock disk on the recovery cluster. Edits include spreading HEARTBEAT_IP on all user LANs, and setting MAX_PACKAGES.
3. Check the configuration:
# cmcheckconf -C CClusterNY.ascii
4. Create the cluster binary:
# cmapplyconf -C CClusterNY.ascii
5. Test the cluster:
# cmruncl -v
# cmviewcl -v
Does the cluster come up? If so, stop the cluster:
# cmhaltcl -f
6. Copy the package files from the primary cluster to a bkpkgXXX directory, and rename them to .cntl and _srdf.env.
Edit the recovery package control file from the primary cluster for the secondary cluster. Change the subnet, relocatable IP addresses, and node names. Be sure to set AUTO_RUN to NO in the package ASCII file.
7. Edit the recovery package environment file _srdf.env as follows:
a. Add the path for the EMC Solutions Enabler software binaries.
b. Make sure that all AUTO* variables are uncommented.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files.
d. Uncomment the DEVICE_GROUP variable and set it to the Symmetrix device group names given in the symdg list command. The DEVICE_GROUP variable may also contain the consistency group name if using an M by N configuration.
e. Uncomment the RETRY and RETRYTIME variables.
f. Make sure the CLUSTER_TYPE variable is set to "continental".
g. Uncomment the RDF_MODE variable and set it to "async" or "sync" as appropriate for your application.
8. Edit the remaining application package control script variables (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application. See the Managing Serviceguard user's guide for more information on these variables. Change the subnet and IP addresses in the files copied from the primary cluster.
9. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/pkgname:
• pkgname.cntl (continental cluster package control script)
• pkgname.conf (Serviceguard package ASCII configuration file)
• pkgname.sh (package monitor shell script, if applicable)
• pkgname_srdf.env (Metrocluster SRDF environment file)
10. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
11. Apply the Serviceguard configuration for the recovery cluster using the cmapplyconf command or SAM.
12. Test the cluster and packages:
# cmruncl
# cmmodpkg -e bkpkgCCA
# cmviewcl -v
Note that cmmodpkg is used to manually start the application packages. Do all application packages start? If so, issue the following command:
# cmhaltcl -f
NOTE: Application packages cannot run on R1 and R2 at the same time. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted to prevent data corruption.
13. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
The recovery cluster is now ready for continental cluster operation.
Setting up the Continental Cluster Configuration

The procedures below configure Continentalclusters and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: "Designing a Continental Cluster".
1. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
2. Generate the Continentalclusters configuration using the following command:
# cmqueryconcl -C cmconcl.config
3. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. Note that when data replication is done using EMC SRDF, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or TCP) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions of these can be found in the configuration file generated in the previous step.
4. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.
5. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages; the monitor package should start automatically when the cluster is formed.
6. Apply the monitor package to both cluster configurations:
# cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config
7. Restore the logical SRDF links for the package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the appropriate Symmetrix device group names. Example:
# Samples/post.cmapply
8. Generate the cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/cmclconfig, nor is there an equivalent file for Continentalclusters. Example:
# cmapplyconcl -C cmconcl.config
9. Start the monitor package on both clusters. The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster's status.
10. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.
11. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.
12. View the status of the continental cluster primary and recovery clusters, including configured event data:
# cmviewconcl -v
The continental cluster is now ready for testing. See Chapter 2: "Designing a Continental Cluster", section "Testing the Continental Cluster" (page 91).
Switching to the Recovery Cluster in Case of Disaster

It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:
• During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.
• During an alarm, cmrecovercl will start the recovery packages without the -f option.
• When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This applies not only when no alert or alarm was issued, but also to the situation where there was an alert or alarm, but the primary cluster recovered and its current status is Up.
Verify that the SRDF links are up:
# symrdf list
Failback Scenarios

There is no failback counterpart to the "pushbutton" failover from the primary cluster to the recovery cluster. Failback depends on the original nature of the failover, the state of the primary and secondary Symmetrix SRDF volumes (R1 and R2), and the condition of the primary cluster. Chapter 2: "Designing a Continental Cluster" discusses failback mechanisms and methodologies in the section "Restoring Disaster Tolerance" (page 99).

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking connecting the two sites. The following discussion addresses some of those failures and suggests recovery approaches applicable to environments using data replication provided by Symmetrix disk arrays and the Symmetrix Remote Data Facility (SRDF).

Scenario 1

The primary site has lost power, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the Symmetrix or the operating systems of the systems at the primary site. After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the status of the R1 and R2 paired group volumes. The command symrdf list displays the status of the device group:

          Source (R1) View              Target (R2) View
Logical   Sym        R1 Inv  R2 Inv       Sym        R1 Inv  R2 Inv       RDF Pair
Device    Dev        Tracks  Tracks  LNK  Dev        Tracks  Tracks  MDA  STATE
--------  ---- ----  ------  ------  ---  ---- ----  ------  ------  ---  -----------
DEV001    009F WD    0       0       NR   00A5 RW    0       0       S..  Failed Over
DEV002    00A0 WD    0       0       NR   00A6 RW    0       0       S..  Failed Over
After power is restored to the primary site, the Symmetrix device groups may be in the Failed Over status. The procedure to move the application packages back to the primary site differs depending on the status of the device groups. The following procedure applies when the device groups have a status of "Failed Over":
1. Halt the Continentalclusters recovery packages at the recovery site:
# cmhaltpkg
This halts any applications, removes any floating IP addresses, unmounts file systems and deactivates volume groups as programmed into the package control files. The status of the device groups will remain "Synchronized" at the recovery site and "Failed Over" at the primary site.
2. Halt the cluster, which also halts the monitor package ccmonpkg.
3. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.
4. Manually start the Continentalclusters primary packages at the primary site:
# cmrunpkg
or
# cmmodpkg -e
The control script is programmed to handle this case. The control script issues an SRDF failback command to move the device group back to the R1 side and to resynchronize the R1 from the R2 side. Until the resynchronization is complete, the SRDF "read-through" feature ensures that any reads on the R1 side return current data, by reading through the SRDF link from the R2 side.
NOTE: If the system administrator does not want synchronization performed from the remote (recovery) site, the device groups should be split and recreated manually.
5. Ensure that the monitor packages at the primary and recovery sites are running.
6. Verify that the device group is synchronized:
# symrdf list
7. If the package does not come up and the device group status is "Failed Over", bring the package back manually:
# symrdf -g pkgCCB_r1 failback

Execute an RDF 'Failback' operation for device group 'pkgCCB_r1' (y/[n]) ? y
An RDF 'Failback' operation execution is in progress for device group 'pkgCCB_r1'. Please wait...
Write Disable device(s) on RA at target (R2)..............Done.
Suspend RDF link(s).......................................Done.
Merge device track tables between source and target.......Started.
Device: 001 ............................................. Merged.
Merge device track tables between source and target.......Done.
Resume RDF link(s)........................................Done.
Read/Write Enable device(s) on SA at source (R1)..........Done.
The RDF 'Failback' operation successfully executed for device group 'pkgCCB_r1'.
8. During the resync, the status goes from Failed Over to Invalid to SyncInProg. Example:
ftsys1a# symrdf list

Symmetrix ID: 000183500021

                        Local Device View
------------------------------------------------------------------------
          STATUS       M O D E S              RDF  S T A T E S
Sym  RDF                                 R1 Inv  R2 Inv
Dev  RDev Typ:G  SA RA LNK  Mode Dom ACp  Tracks  Tracks  Dev RDev  Pair
---  ---- -----  ---------  ------------  ------  ------  --- ----  ------------
000  000  R2:2   RW WD RW   SYN  DIS OFF  0       0       WD  RW    Synchronized
001  001  R2:2   RW WD RW   SYN  DIS OFF  12      0       WD  WD    Invalid

ftsys1a# symrdf list

Symmetrix ID: 000183500021

                        Local Device View
------------------------------------------------------------------------
000  000  R2:2   RW WD RW   SYN  DIS OFF  0       0       WD  RW    Synchronized
001  001  R2:2   RW WD RW   SYN  DIS OFF  2       0       WD  RW    SyncInProg

9. Halt the recovery cluster and restart it:
# cmhaltcl -f (if the cluster is not already down)
# cmruncl
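Rather than re-running symrdf list by hand while the resync in step 8 progresses, the polling can be scripted. A sketch: wait_for_sync is a hypothetical helper, and the status command is passed in as arguments so anything that prints the pair states will do (on a real host, for example, symrdf list):

```shell
#!/bin/sh
# Poll a status command until no device pair reports a transitional state
# (SyncInProg or Invalid), then announce that the resync is finished.
wait_for_sync() {
    interval="$1"; shift          # seconds between polls; the rest is the command
    while "$@" | grep -Eq 'SyncInProg|Invalid'; do
        sleep "$interval"
    done
    echo "all device pairs Synchronized"
}
```

On a host with SYMCLI installed, this might be invoked as `wait_for_sync 30 symrdf list` (an illustrative call, not a documented interface).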
10. Verify the data for consistency and currency.

Scenario 2

The primary site Symmetrix experienced a catastrophic hardware failure and all data was lost on the array. After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the status of the Symmetrix SRDF paired volumes. Since the systems at the primary site are accessible, but the Symmetrix is not, the control file will evaluate the paired volumes with a local status of "Failed Over". The control script is programmed to handle this condition and will enable the volume groups, mount the logical volumes, assign floating IP addresses and start any processes as coded into the script. After the primary site Symmetrix is repaired and configured, use the following procedure to move the application package back to the primary site.
1.
Manually create the Symmetrix device groups and gatekeeper configurations device groups. Re-run the scripts mk3symgrps* and mk4gatekpr* which do the following: # date >ftsys1.group.list # symdg create -type RDF1 pkgCCA_r1 # symld -g pkgCCA_r1 add pd /dev/rdsk/c7t0d0 # symgate define pd /dev/rdsk/c7t15d0 # symgate define pd /dev/rdsk/c7t15d1 # symgate -g pkgCCA_r1 associate pd /dev/rdsk/c7t15d0
2.
Halt the Continentalclusters recovery packages at the recovery site. # cmhaltpkg This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will be SPLIT at both the recovery and primary sites.
3. 4.
5.
Halt the Cluster, which also halts the monitor package ccmonpkg. Start the cluster at the primary site. Assuming they have been properly configured the Continentalclusters primary packages should not start. The monitor package should start automatically. Since the paired volumes have a status of SPLIT at both the primary and recovery sites, the EMC views the two halves as unmirrored. Issue the following command: # symrdf -g pkgCCB_r1 failback Since the most current data will be at the remote or recovery site, this command to synchronize from the remote site). Wait for the synchronization process to complete before progressing to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step.
6. Manually start the Continentalclusters primary packages at the primary site using:
# cmrunpkg
The control script is programmed to handle this case: it recognizes that the paired volume is synchronized and proceeds with the programmed package startup.
7. Verify that the device group is synchronized:
# symrdf list
8. Ensure that the monitor packages at the primary and recovery sites are running.
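The wait-for-synchronization logic in steps 5 and 6 above can be sketched as a small shell function. This is an illustrative sketch only: the symrdf command is mocked here so the sketch is self-contained, and on a real node the genuine EMC Solutions Enabler symrdf would be used, whose query output format may differ from what this sketch greps for.

```shell
#!/bin/sh
# Sketch: poll an SRDF device group until it reports "Synchronized"
# before the package is started. "symrdf" is a mock for illustration.
symrdf() {
    # mock: always report the group as Synchronized
    echo "Synchronized"
}

wait_for_sync() {
    group="$1"; retries="$2"
    while [ "$retries" -gt 0 ]; do
        # count "Synchronized" lines in the (mocked) query output
        state=$(symrdf -g "$group" query | grep -c "Synchronized")
        if [ "$state" -gt 0 ]; then
            echo "group $group synchronized"
            return 0
        fi
        retries=$((retries - 1))
        sleep 1   # a real script would poll less aggressively
    done
    echo "group $group not synchronized" >&2
    return 1
}

wait_for_sync pkgCCB_r1 3
```

Only after this function returns success would cmrunpkg be issued, which is exactly the ordering the procedure requires.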
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
Maintaining the EMC SRDF Data Replication Environment

Normal Startup

The following is the normal Continentalclusters startup procedure. On the primary cluster:
1. Start the primary cluster:
# cmruncl -v
The primary cluster comes up with the application packages down and ccmonpkg up.
2. Manually start application packages on the primary cluster:
# cmmodpkg -e
3. Confirm primary cluster status:
# cmviewcl -v and # cmviewconcl -v
4. Verify the SRDF links:
# symrdf list
On the recovery cluster, do the following:
1. Start the recovery cluster:
# cmruncl -v
The recovery cluster comes up with the application packages (bkpkgX) down and ccmonpkg up.
2. Do not manually start application packages on the recovery cluster; this will cause data corruption.
3. Confirm recovery cluster status:
# cmviewcl -v and # cmviewconcl -v
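The rule in step 2 above (never manually start application packages on the recovery cluster) can be sketched as a defensive wrapper. This is an illustrative sketch only: cmmodpkg is mocked, the enable_app_pkg helper and the role argument are assumptions for illustration, and Serviceguard itself has no such wrapper.

```shell
#!/bin/sh
# Sketch: refuse to enable an application package when the local
# cluster's role is "recovery", enforcing the data-corruption warning.
cmmodpkg() { echo "enabled: $2"; }   # mock of the real command

enable_app_pkg() {
    role="$1"; pkg="$2"               # role would come from site config
    if [ "$role" = "recovery" ]; then
        echo "refusing to enable $pkg on recovery cluster" >&2
        return 1
    fi
    cmmodpkg -e "$pkg"
}

enable_app_pkg primary pkgA           # allowed on the primary cluster
enable_app_pkg recovery bkpkgA || true  # refused; avoids data corruption
```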
Normal Maintenance

There might be situations where a package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the Continentalclusters with EMC SRDF data replication:
1. Shut down the package with the appropriate command. Example:
# cmhaltpkg
2. Distribute the package configuration changes. Example:
# cmapplyconf -P (Primary cluster)
# cmapplyconf -P (Recovery cluster)
3. Start up the package with the appropriate Serviceguard command. Example:
# cmmodpkg -e (Primary cluster)
CAUTION: Never enable package switching on both the primary package and the recovery package.
4. Halt the monitor package:
# cmhaltpkg ccmonpkg
5. Apply the new continental cluster configuration:
# cmapplyconcl -C
6. Restart the monitor package:
# cmrunpkg ccmonpkg
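The six maintenance steps above can be strung together as a script skeleton. All Serviceguard and Continentalclusters commands are mocked below so the sequence is self-contained and runnable; the package and file names (pkgA, pkgA.conf, cmconcl.config) are placeholders, not names from this manual.

```shell
#!/bin/sh
# Sketch of the normal-maintenance sequence; every command is a mock
# standing in for the real Serviceguard/Continentalclusters binary.
cmhaltpkg()    { echo "halted $1"; }
cmapplyconf()  { echo "applied package config"; }
cmmodpkg()     { echo "enabled $2"; }
cmapplyconcl() { echo "applied continental config"; }
cmrunpkg()     { echo "started $1"; }

cmhaltpkg pkgA                  # 1. halt the package for maintenance
cmapplyconf -P pkgA.conf        # 2. distribute config (run on both clusters)
cmmodpkg -e pkgA                # 3. re-enable on the primary cluster only
cmhaltpkg ccmonpkg              # 4. halt the monitor package
cmapplyconcl -C cmconcl.config  # 5. apply the continental configuration
cmrunpkg ccmonpkg               # 6. restart the monitor package
```

The ordering matters: the monitor package must be down while the continental cluster configuration is re-applied, and brought back up afterward.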
6 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture

This chapter describes the Three Data Center architecture through the following topics:
• Overview of Three Data Center Concepts
• Overview of HP XP StorageWorks Three Data Center Architecture
• Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP
• Configuring an XP Three Data Center Solution
• HP StorageWorks RAID Manager Configuration
• Package Configuration in a Three Data Center Environment
• Failback Scenarios
NOTE: For additional information, refer to the Release Notes for your metropolitan and continental cluster products and the documentation for your storage solution.
Overview of Three Data Center Concepts

A Three Data Center solution integrates Serviceguard, Metrocluster Continuous Access XP, Continentalclusters, and the HP StorageWorks XP 3DC Data Replication Architecture. This configuration provides high availability in a disaster tolerant solution by combining the data consistency of synchronous replication with the long-distance capability of Continuous Access Journal replication to protect against both local and wide-area disasters.

A Three Data Center configuration consists of two Serviceguard clusters. The first cluster, a Metrocluster, has two data centers: the Primary data center (DC1) and the Secondary data center (DC2). The second cluster, typically located a long distance from the Metrocluster sites, is the Third Data Center (DC3). These two clusters are configured in a Continentalclusters environment, with the Metrocluster configured as the primary cluster and the third data center (DC3) cluster configured as the recovery cluster, as shown in Figure 6-1.

A Three Data Center configuration uses the HP StorageWorks 3DC Data Replication Architecture to replicate data over three data centers. Three Data Center provides complete data currency and protects against both local and wide-area disasters. It concurrently supports short-distance Continuous Access Synchronous replication within the Metrocluster and long-distance Continuous Access Journal replication between the Metrocluster and the recovery cluster.

Figure 6-1 depicts a Three Data Center solution in which DC1 and DC2 are physically configured as a Metrocluster and DC3 is an independent Serviceguard cluster. The entire environment is configured as a Continentalclusters solution. Within the Metrocluster, packages can fail over and fail back automatically, but the recovery cluster
DC3 only supports a semi-automatic package failover. Failing back a package from DC3 to DC1 or DC2 is done manually. See “Failback Scenarios” (page 318) for more information on the failback process.

Figure 6-1 Three Data Center Solution Overview

[Figure 6-1 shows the Metrocluster region (DC1 and DC2 in a single Serviceguard cluster, each with an XP12000 array, linked by DWDM carrying the Serviceguard heartbeat and the CA-Sync link) and the continental region (DC3 with an XP12000 array and journal volumes, connected to the Metrocluster by a CA-Jnl link through FC/IP converters).]
The Three Data Center solution provides the following benefits:
• Maintains high performance. Using synchronous replication over a short distance in a Metrocluster environment provides the highest level of data currency and application availability without significant impact to application performance.
• Allows swift recovery. The Metrocluster implementation allows for fast automated failover after a local disaster has occurred.
• Allows recovery even when a disaster exceeds regional boundaries or extends in duration. A wide-area disaster could disable both data centers DC1 and DC2, but with semi-automatic functionality, operations can be shifted to DC3 and continue unaffected by the disaster.
• Allows for additional staff at the remote data center outside the disaster area. A wide-area disaster affects people located within the disaster area, both professionally and personally. By moving operations out of the main data centers to a remotely located recovery data center, operational responsibilities shift to people not directly affected by the disaster.
Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP

A Three Data Center configuration uses a disaster tolerant architecture made up of two data centers located near each other in a Metrocluster and a third data center located remotely. These form separate Serviceguard clusters, which are combined in a Continentalclusters configuration. This solution is designed to work only with HP StorageWorks XP Disk Arrays.

The Primary Data Center (DC1) contains one or more HP-UX servers connected to one XP Disk Array located in DC1. The Secondary Data Center (DC2) contains an equal number of HP-UX servers connected to a second XP Disk Array. Continuous Access Synchronous data replication must be established to replicate data between DC1 and DC2. The distance between DC1 and DC2 is limited by Serviceguard heartbeat latency requirements or Continuous Access Sync distance requirements, whichever is smaller. Because DC1 and DC2 form a Metrocluster, a third site is required to house the arbitrator or quorum nodes. The arbitrator nodes are needed to meet quorum resolution requirements during cluster reformation when all heartbeat networks between DC1 and DC2 fail. DC1, DC2, and the arbitrator site must be on the same subnet, per Serviceguard networking requirements. In a Continentalclusters environment, the Metrocluster is the primary cluster for packages configured in a three data center solution.

The Third Data Center (DC3), normally located a long distance from the Metrocluster sites, contains one or more HP-UX servers connected to a third XP Disk Array. These HP-UX servers form a separate Serviceguard cluster and require a quorum server or cluster lock disk. Continuous Access Journal data replication must be established between one of the XP Disk Arrays located in the Metrocluster and the XP Disk Array located in DC3.
In a Continentalclusters environment, DC3 is the recovery cluster for packages configured in a three data center solution. It is recommended to maintain a consistent copy of the volume at the remote site using HP StorageWorks Business Copy XP (BC-XP). This is particularly useful in case of a rolling disaster, which is a disaster that occurs before the cluster is able to recover from a non-disastrous failure. An example is a data replication link that fails; then, as it is being restored and data is being resynchronized, a disaster causes the primary data center to fail, resulting in an incomplete resynchronization and inconsistent data at the remote data center. In the case of a rolling disaster, Metrocluster Continuous Access XP and XP Continuous Access software detect that the data is inconsistent and do not allow the application package to start. A good copy of the data must be restored before restarting the application.
The following are additional disaster tolerant architecture requirements for a Three Data Center solution:
• In the disaster tolerant cluster architecture, it is expected that each Metrocluster data center is self-contained, such that the loss of one data center does not cause the entire cluster to fail. It is important that all single points of failure (SPOF) be eliminated so that surviving systems continue to run in the event that one or more systems fail.
• It is also expected that the IP network and SAN equipment between and within the data centers are redundant and routed in such a way that the loss of any one component does not cause the IP network or SAN to fail.
• Exclusive activation must be used for all LVM volume groups or VxVM disk groups associated with packages that use the XP Disk Array.
The following are restrictions in a Three Data Center solution:
• Shared LVM, CVM, and CFS are not supported. The design of the Three Data Center solution assumes that only one system in the cluster will have a volume group activated at any time.
• Multi-instance applications are not supported.
• Device Group Monitor support is not available. However, packages configured for two data centers can still use the Device Group Monitor feature.
• Continentalclusters bi-directional recovery is not supported.
Figure 6-2 shows a typical configuration of a Disaster Tolerant Three Data Center architecture.
Figure 6-2 Three Data Center Architecture

[Figure 6-2 shows DC1 and DC2, each with nodes and storage connected through FC switches and redundant, differently routed DWDM links that carry the cluster heartbeat network and storage connections; an arbitrator or quorum server node at a third location; and DC3, reached over the WAN, with FC/IP converters providing data replication over IP to the DC3 storage.]
Overview of HP XP StorageWorks Three Data Center Architecture

The HP XP StorageWorks Three Data Center architecture enables data to be replicated over three data centers concurrently using a combination of Continuous Access Synchronous and Continuous Access Journal data replication. In an XP 3DC design there are two configurations available: Multi-Target and Multi-Hop. The XP 3DC configuration can switch between the Multi-Target and Multi-Hop configurations at any time during normal operation. These configurations may be implemented with either two or three Continuous Access links between the data centers. In the case of two Continuous Access links, one link is a Continuous Access Sync link and the other is a Continuous Access Jnl data replication link. As both supported configurations use two Continuous Access links, they are also referred to as Multi-Hop-Bi-Link and Multi-Target-Bi-Link. Whether the configuration is multi-hop or multi-target is determined by two factors: where data enters the system (that is, where the application is running) and in what direction the data flows between the XP arrays.

XP 3DC Multi-Target Bi-Link Configuration

In an XP 3DC Multi-Target Bi-Link configuration the data enters the system on a specific XP array and is replicated in multiple directions. One direction is the synchronous
replication to the XP array in DC2, and the other is the journaling replication to the XP array in DC3. As shown in Figure 6-3, the data is replicated from DC1 to DC2 using Continuous Access Synchronous. The data is also replicated to DC3 using Continuous Access Journaling. Both Continuous Access-Sync and Continuous Access-JNL replication pairs can remain in active (PAIR) status at all times.

Figure 6-3 XP Three Data Center Multi-Target Bi-Link Configuration Data Replication
Figure 6-4 3DC Multi-Hop Bi-Link Configuration Data Replication

Three Data Center Multi-Hop Bi-Link Configuration

In an XP 3DC Multi-Hop Bi-Link configuration the data enters the system on one XP array, is replicated synchronously to the next XP array, and from there is replicated to the last XP array. Typically, the starting point of the operation indicates the data center or host that runs the application under normal conditions, with the secondary data center being the cluster failover site and the recovery data center being the remote recovery site. As shown in Figure 6-4, data is replicated from DC1 to DC2 using Continuous Access Synchronous. The data is then automatically recorded on DC2 and replicated to DC3 using Continuous Access Journaling. Both Continuous Access-Sync and Continuous Access-JNL replication pairs can remain in active (PAIR) status at all times. No point-in-time operations or scripting is necessary to keep data on DC3 up-to-date and available.
Determining whether to set up a Multi-Target or Multi-Hop solution depends on your environment and business requirements. Normally, the system that runs the application the majority of the time determines the configuration. However, during a failover on the synchronous replication pair, a switch from Multi-Hop to Multi-Target, or from Multi-Target to Multi-Hop, occurs. For example, if a package initially running on DC1 in a Multi-Hop scenario fails over to DC2, the data source becomes DC2. In this case, the data replication is altered to be DC2 -> DC1 and DC2 -> DC3, which changes the configuration from Multi-Hop to Multi-Target data replication. There are no recommendations on whether to use Multi-Hop rather than Multi-Target data replication; both configurations have their own advantages and disadvantages. For additional documentation, refer to the HP StorageWorks XP 3DC Data Replication manuals available at www.docs.hp.com.

HP StorageWorks Mirror Unit Descriptors

Using the XP 3 Data Center Architecture, a volume can be configured to be replicated to up to seven other volumes at a time: three copies can be used for BC replication, three copies for Journal replication, and one copy for Sync/Async/Journal replication. A mirror unit descriptor (MU#) is a special index number available with all volumes that provides an individual designator for each copy of the volume. The mirror unit descriptor is provided in the Raid Manager configuration files to indicate the nature of the copy. Of the seven mirror unit descriptors, three are for local replication copies using Business Copy XP, and are represented in the HP StorageWorks RAID Manager XP (RM) configuration file by the values 0, 1, and 2. The RM configuration file assumes an MU# of 0 when no MU# is specified in the configuration file. However, to avoid confusion, the 0 should be explicitly defined in the configuration for Business Copy XP.
The fourth MU# can be used for either Continuous Access XP Sync, Continuous Access XP Async, or Journal. When it is used for Sync/Async, the MU# can be either 0 or left blank; however, it is always left blank in a Three Data Center environment. The remaining three MU#'s are for Continuous Access-XP Journal replication pairs only. These MU#'s are represented by the values h1, h2, and h3 in the RM configuration file. The XP12000 and XP10000 support only one Continuous Access-XP Journal pair per volume at any point in time, and this one pair can use any of the four Continuous Access-XP Journal MU#'s. With XP arrays you can use Continuous Access-XP Journal in combination with Continuous Access-XP Sync to create two independent copies of the same source device on two different target devices. When creating this configuration, the Continuous Access-XP Sync replication must use MU# 0 and the Continuous Access-XP Journal replication must use one of the remaining three MU#'s for remote replication.
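The MU# allocation just described can be summarized in a small lookup helper. This is an illustrative sketch only: the mu_for function and its argument convention (copy type plus copy index) are assumptions for illustration, not part of RAID Manager.

```shell
#!/bin/sh
# Sketch: map a copy type and copy index to the MU# value that would
# appear in the RM configuration file, per the allocation above.
mu_for() {
    case "$1:$2" in
        bc:1)   echo "0"  ;;   # Business Copy XP copies use MU# 0-2
        bc:2)   echo "1"  ;;
        bc:3)   echo "2"  ;;
        sync:*) echo ""   ;;   # CA Sync/Async: blank in a 3DC environment
        jnl:1)  echo "h1" ;;   # CA Journal copies use h1-h3
        jnl:2)  echo "h2" ;;
        jnl:3)  echo "h3" ;;
        *) echo "unknown copy type/index" >&2; return 1 ;;
    esac
}

mu_for jnl 2
```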
Figure 6-5 shows all the available mirror unit descriptors for each data device, as well as the value to use in the HP StorageWorks RAID Manager XP (RM) configuration file to identify the specific replication instance for the device and any environment variables (for example, HORCC_MRCF) necessary to address the copy using RAID Manager XP.

Figure 6-5 Mirror Unit Descriptors

[Figure 6-5 shows a P-VOL and its copies: CA-MU#0, usable for either CA Sync/Async or CA-Jnl, replicating to an S-VOL (RM MU# 0 or omitted); CA-Jnl MU#1 through MU#3 replicating through P-jnl/S-jnl journal volumes to S-VOLs (RM MU# h1, h2, and h3); and BC MU#0 through MU#2 replicating to local S-VOLs (RM MU# 0 or omitted, 1, and 2, each addressed with HORCC_MRCF=1).]
Figure 6-6 depicts typical Three Data Center pair configurations with MU# usage in the Multi-Target and Multi-Hop topologies.
NOTE: The MU# h2 device group pair must be defined in the XP Three Data Center configuration, since it is used as a bridge for the remote site pair state query. The MU# h2 device group pair is referred to as a phantom device group because no physical Continuous Access link has been established for it.

Figure 6-6 Mirror Unit Descriptor Usage
Configuring an XP Three Data Center Solution

After the hardware setup is completed for all three data centers, including the data replication links between data centers according to the Multi-Hop-Bi-Link or Multi-Target-Bi-Link configuration, the next step is software installation and configuration. The cluster software used in a Three Data Center solution includes Serviceguard, Metrocluster Continuous Access XP, Continentalclusters, and HP StorageWorks RAID Manager.
The following steps describe the process for configuring an XP Three Data Center Solution:
1. Creating the Serviceguard Clusters
2. Creating the Continental Cluster
3. Creating the RAID Manager Configuration
4. Creating Device Group Pairs
5. LVM Volume Groups Configuration
6. VxVM Configuration
7. Package Configuration in a Three Data Center Environment
Creating the Serviceguard Clusters

Install Serviceguard on all nodes participating in the 3DC solution, including arbitrator nodes if that arbitration method is used. As previously described, a 3DC solution includes two Serviceguard clusters: the resources in two data centers (the Primary and Secondary sites) are managed by one cluster, and the third data center is managed by another cluster. Install Metrocluster Continuous Access XP on all nodes participating in the 3DC configuration. Create the clusters according to the process described in the Managing Serviceguard user's guide. For the Primary and Secondary data centers, create a single Serviceguard cluster with components on two sites plus arbitrator nodes. In a 3DC solution, all the packages in this cluster can fail over and fail back automatically; this cluster acts as the primary cluster in the Continentalclusters environment for the configured packages. Create another Serviceguard cluster with components in the third data center as described in the Managing Serviceguard user's guide. This cluster acts as the recovery cluster in the Continentalclusters environment.

Creating the Continental Cluster

Install Continentalclusters software on all nodes participating in the 3DC solution. To configure the continental cluster, follow the process described in Chapter 2: “Designing a Continental Cluster”. Apply the continental cluster configuration. Package recovery groups can be added once all the package configurations are added to the primary and recovery clusters.
HP StorageWorks RAID Manager Configuration

XP RAID Manager host-based software is used to create and manage the device group pairs in a three data center configuration. The following section describes the RAID Manager configuration process.

Creating the RAID Manager Configuration

Use the following steps to create the RAID Manager configuration:
1. Ensure that the XP Series disk arrays are correctly cabled to each host system that will run packages whose data reside on the arrays. Each XP Series disk array must be configured with redundant Continuous Access links, each of which is connected to a different LCP or RCP card. When using bi-directional configurations, where data center A is a backup for data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations in which you want to allow failback.
2. Edit the /etc/services file, adding an entry for the Raid Manager instance to be used with the cluster. The format of the entry is:
horcm<instance> <port>/udp
For more detail, see the file /opt/cmcluster/toolkit/SGCA/Samples/services.example
3. Use the ioscan command to determine which devices on the XP disk array have been configured as command devices. There must be two command devices: a primary one and a secondary one.
4. Copy the default Raid Manager configuration file to an instance-specific name:
# cp /etc/horcm.conf /etc/horcm0.conf
5. Create a minimum Raid Manager configuration file by editing the following fields in the file created in the previous step:
• HORCM_MON: enter the hostname of the system on which you are editing and the TCP/IP port number specified for this Raid Manager instance in the /etc/services file.
• HORCM_CMD: enter the primary and alternate link device file names for both the primary and redundant command devices (for a total of four raw device file names).
6. If the Raid Manager protection facility is enabled, set the HORCMPERM environment variable to the pathname of the HORCM permission file, then export the variable:
# export HORCMPERM=/etc/horcmperm0.conf
If the Raid Manager protection facility is not used or is disabled, export the HORCMPERM environment variable as follows:
# export HORCMPERM=MGRNOINST
7. Start the Raid Manager instance by using horcmstart.sh:
# horcmstart.sh 0
8. Export the environment variable that specifies the Raid Manager instance to be used by the Raid Manager commands, such as with the POSIX shell:
# export HORCMINST=
For example:
# export HORCMINST=0
Next, use Raid Manager commands to get further information from the disk arrays. Verify the software revision of the Raid Manager and the firmware revision of the XP disk array:
# raidqry -l

NOTE: Check the minimum requirement level for XP, Raid Manager software, and firmware for your version in the Metrocluster Continuous Access XP Release Notes.

To view a list of the available devices on the disk arrays, use the raidscan command. The raidscan command must be invoked separately for each host interface connection to the disk array. For example, if there are two Fibre Channel host adapters:
# raidscan -p CL1-A
# raidscan -p CL1-B

NOTE: There must also be alternate links for each device, and these must be on different buses inside the XP disk array. For example, these alternate links may be CL2-E and CL2-F.

Unless the devices have been previously paired, either on this or another host, the devices show up as SMPL (simplex). Paired devices show up as PVOL (primary volume) or SVOL (secondary volume). To identify the HP-UX device files corresponding to each device represented by CU:LDEV, run the following command:
# ls /dev/rdsk/* | raidscan -find -fx

NOTE: Only OPEN-V LUNs are supported in a three data center configuration. The ioscan output must be checked to verify which LUNs are OPEN-V LUNs.

XP arrays (XP 10000/XP 12000 and beyond) support externally attached storage devices configured as the P-VOL, the S-VOL, or both, of a Continuous Access pair. From a Continuous Access perspective, there is no difference between a pair created from internal devices and one created from external devices.
Refer to the HP StorageWorks XP documentation at www.docs.hp.com for information on the configuration requirements of external storage devices attached to XP arrays and the supported external storage devices.
9. Determine which devices will be used by the application package, and define a device group that contains all of these devices. For normal Three Data Center operations, a package requires three different device groups. In both the Multi-Hop-Bi-Link and Multi-Target-Bi-Link configurations, two device groups represent real Continuous Access-Sync and Continuous Access-Journal pairs. The third is a “phantom” device group that can be used as a bridge to communicate with the far site. In Raid Manager the three device groups are independent, and each is managed without knowledge of the other groups. The XP Disk Array implements device-sharing rules and fails RAID Manager operations whenever a rule is broken. Because three different device groups must be configured for a package, it is recommended to follow a naming convention: the Continuous Access Sync device group could be named dg, the Continuous Access Jnl device group dg_1, and the phantom device group dg_p. For example, if an Oracle single-instance package is configured in a 3DC environment with a Multi-Hop-Bi-Link data replication configuration:
• dgOracle would be the name of the Continuous Access Sync device group between DC1 and DC2
• dgOracle_1 would be the name of the Continuous Access Jnl device group between DC2 and DC3
• dgOracle_p would be the name of the phantom device group between DC1 and DC3
Edit the Raid Manager configuration file (horcm0.conf in the above example) to include the devices and device group used by the application package. Only one device group may be specified for all of the devices that belong to a single application package. These devices are specified in the HORCM_DEV field. Also complete the HORCM_INST field, supplying the names of only those hosts that are attached to the XP disk array that is remote from the disk array directly attached to this host.
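The three-group naming convention just described can be captured in a small helper. This is an illustrative sketch only; the dg_names function is an assumption for illustration, not part of any HP toolkit.

```shell
#!/bin/sh
# Sketch: derive the CA-Sync, CA-Journal, and phantom device group
# names from a package's base device group name, per the convention
# dg, dg_1, dg_p described in the text.
dg_names() {
    base="$1"
    echo "sync=${base} jnl=${base}_1 phantom=${base}_p"
}

dg_names dgOracle
# prints: sync=dgOracle jnl=dgOracle_1 phantom=dgOracle_p
```

Generating the three names from one base string keeps the horcm.conf files on all nodes consistent with the convention.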
The following sample RAID Manager configuration files are given for each data replication configuration. In the sample configuration files, the device group names have been simplified for clarity.
Multi-Target Raid Manager Configuration

For a Multi-Target topology, DC1 is configured as the primary site of an application and is the source of the data replicated to DC2 and DC3, as shown in Figure 6-7.

Figure 6-7 Multi-Target Bi-Link (1:2)
Sample Raid Manager Configuration on a DC1 NodeA (multi-target bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeA          horcm0     1000          3000

HORCM_CMD
#dev_name
/dev/rdsk/c6t12d0 /dev/rdsk/c9t12d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg            dg_d0       CL3-E    6           5
dg_1          dg_1_d0     CL3-E    6           5      h1

HORCM_INST
#dev_group    ip_address       service
# communicate with DC2 nodes
dg            NodeB.dc2.net    horcm0
# communicate with DC3 nodes
dg_1          NodeC.dc3.net    horcm0
Sample Raid Manager Configuration on a DC2 NodeB (multi-target bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeB          horcm0     1000          3000

HORCM_CMD
#dev_name
/dev/rdsk/c21t8d0 /dev/rdsk/c24t8d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg            dg_d0       CL1-A    13          1
# phantom device group
dg_p          dg_p_d0     CL1-A    13          1      h2

HORCM_INST
#dev_group    ip_address       service
# communicate with DC1 nodes
dg            NodeA.dc1.net    horcm0
# communicate with DC3 nodes
dg_p          NodeC.dc3.net    horcm0
Sample Raid Manager Configuration on a DC3 NodeC (multi-target bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeC          horcm0     1000          200

HORCM_CMD
#dev_name
/dev/rdsk/c6t2d0 /dev/rdsk/c8t2d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg_1          dg_1_d0     CL2-A    0           5      h1
# phantom device group
dg_p          dg_p_d0     CL2-A    0           5      h2

HORCM_INST
#dev_group    ip_address       service
# communicate with DC2 nodes
dg_p          NodeB.dc2.net    horcm0
# communicate with DC1 nodes
dg_1          NodeA.dc1.net    horcm0
Multi-Hop Raid Manager Configuration

Figure 6-8 depicts a Multi-Hop topology where DC1 is configured as the primary site and is the source of the data replicated to DC2 and, through DC2, to DC3.

Figure 6-8 Multi-Hop Bi-Link (1:1:1)
Sample Raid Manager Configuration on a DC1 NodeA (multi-hop-bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeA          horcm0     1000          3000

HORCM_CMD
#dev_name
/dev/rdsk/c6t12d0 /dev/rdsk/c9t12d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg            dg_d0       CL3-E    6           5
# phantom device group
dg_p          dg_p_d0     CL3-E    6           5      h2

HORCM_INST
#dev_group    ip_address       service
# communicate with DC2 nodes
dg            NodeB.dc2.net    horcm0
# communicate with DC3 nodes
dg_p          NodeC.dc3.net    horcm0
Sample Raid Manager Configuration on a DC2 NodeB (multi-hop-bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeB          horcm0     1000          3000

HORCM_CMD
#dev_name
/dev/rdsk/c21t8d0 /dev/rdsk/c24t8d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg            dg_d0       CL1-A    13          1
dg_1          dg_1_d0     CL1-A    13          1      h1

HORCM_INST
#dev_group    ip_address       service
# communicate with DC1 nodes
dg            NodeA.dc1.net    horcm0
# communicate with DC3 nodes
dg_1          NodeC.dc3.net    horcm0

Sample Raid Manager Configuration on a DC3 NodeC (multi-hop-bi-link)

HORCM_MON
#ip_address    service    poll(10ms)    timeout(10ms)
NodeC          horcm0     1000          200

HORCM_CMD
#dev_name
/dev/rdsk/c6t2d0 /dev/rdsk/c8t2d0

HORCM_DEV
#dev_group    dev_name    port#    TargetID    LU#    MU#
dg_1          dg_1_d0     CL2-A    0           5      h1
# phantom device group
dg_p          dg_p_d0     CL2-A    0           5      h2

HORCM_INST
#dev_group    ip_address       service
# communicate with DC2 nodes
dg_1          NodeB.dc2.net    horcm0
# communicate with DC1 nodes
dg_p          NodeA.dc1.net    horcm0
Alternative to HORCM_DEV

An alternative to the HORCM_DEV section, with port, Target ID, and LUN ID, is to use HORCM_LDEV with the XP storage serial number and CU:LDEV values.

HORCM_LDEV
#dev_group    dev_name    Serial#    CU:LDEV(LDEV#)    MU#
dg            dg_0        60095      01:04             0
# The following alternatives are equivalents of the above entry
# dg          dg_0        60095      260               0
# dg          dg_0        60095      0x104             0
The HORCM_LDEV parameters describe a device by its stable Serial# and LDEV#, instead of the 'port#, Target-ID, LUN' used by HORCM_DEV. The Serial# parameter identifies the XP storage array, and the LDEV number within the array can be specified in any of the following three formats:
•  Specifying the CU:LDEV in hexadecimal, as used by the SVP or Web console. Example for LDEV# 260: 01:04
•  Specifying the LDEV in hexadecimal, as used by the inqraid -fx command. Example for LDEV# 260: 0x104
•  Specifying the LDEV in decimal, as used by the inqraid -fx command. Example for LDEV# 260: 260
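The three formats above are the same LDEV number in different spellings, which can be verified with ordinary shell arithmetic. This is only a quick sanity-check sketch, not part of the Raid Manager tooling:

```shell
# CU:LDEV "01:04" -> drop the colon -> hex 0x0104 -> decimal 260.
cu_ldev="01:04"
hex="0x$(printf '%s' "$cu_ldev" | tr -d ':')"   # "0x0104"
printf 'decimal LDEV#: %d\n' "$hex"             # prints: decimal LDEV#: 260
printf 'inqraid -fx form: 0x%x\n' "$hex"        # prints: inqraid -fx form: 0x104
```

All three entries in the HORCM_LDEV sample above therefore resolve to the same device, which is why Raid Manager treats them as equivalent.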
If the SAN is configured so that all nodes are connected to a particular XP array, the Target IDs and LUN IDs in the HORCM_DEV section will vary depending on each node's hardware paths to the array. Consider using HORCM_LDEV for better usability and readability: the HORCM_LDEV section is identical on all nodes connected to a particular XP array, because the Serial# of the XP array and the CU:LDEVs are unique across all nodes.
11. Restart the Raid Manager instance so that the new information in the configuration file is read:
    # horcmshutdown.sh
    # horcmstart.sh
12. Repeat steps 2 through 11 on each host that runs this particular application package. If a host runs more than one application package, you must incorporate the device group and host information for each of these packages.

NOTE: The Raid Manager configuration file must be different for each host, especially the HORCM_MON and HORCM_INST fields.

Creating Device Group Pairs

An application configured for an XP Three Data Center solution contains two device groups: a Continuous Access-Sync and a Continuous Access-Journal device group. Both device group pair relations must be established before the rest of the configuration and normal package operations.
For a Multi-Hop data replication topology, first create the Continuous Access-Sync pair. Then, create the Continuous Access-Journal pair after the Continuous Access-Sync pair creation completes. To create the pairs:
•  Create Sync pairs of the DC1-DC2 device group from any DC1 node:
   — paircreate -g dg -vl -f data -c 15
   or
   — paircreate -g dg -vl -f never -c 15
•  Create Journal pairs of the DC2-DC3 device group from any DC2 node, once the DC1-DC2 device group pairs are in the "PAIR" state:
   — paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2

For a Multi-Target data replication topology, the Continuous Access-Sync and Continuous Access-Journal pairs can be created one after another, or at the same time. To create the pairs:
•  Create Sync pairs of the DC1-DC2 device group from any DC1 node:
   — paircreate -g dg -vl -f data -c 15
   or
   — paircreate -g dg -vl -f never -c 15
•  Create Journal pairs of the DC1-DC3 device group from any DC1 node (you can verify whether pairs are in the "PAIR" state with the pairdisplay command):
   — paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2

NOTE: Paired devices must be of compatible sizes and types. Only OPEN-V LUNs are supported for three data center configurations.

NOTE: There is no need to issue the paircreate command on phantom device groups.
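The Multi-Hop ordering (create the Sync pair, wait for the "PAIR" state, then create the Journal pair) can be scripted. The sketch below only illustrates the control flow: pair_state is a hypothetical stub standing in for a real pairvolchk or pairdisplay status query, and the paircreate commands are left as comments so that the script remains runnable outside a cluster:

```shell
# Sketch of the Multi-Hop creation order, using the sample device
# group names dg (Sync, DC1-DC2) and dg_1 (Journal, DC2-DC3).

pair_state() {   # pair_state <dev_group> -- placeholder status query
    # In a real cluster this would wrap a Raid Manager query such as
    # pairdisplay or pairvolchk on the owning node. It is stubbed to
    # "PAIR" here so the sketch runs anywhere.
    echo "PAIR"
}

echo "step 1: create Sync pair (on a DC1 node)"
# paircreate -g dg -vl -f never -c 15

until [ "$(pair_state dg)" = "PAIR" ]; do
    sleep 30     # do not create the Journal pair before the Sync pair is up
done

echo "step 2: create Journal pair (on a DC2 node)"
# paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2
```

For a Multi-Target topology the wait loop is unnecessary, since the manual allows both pairs to be created at the same time.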
Identification of HP-UX device files

Before you create volume groups, you must determine the Device Special Files (DSFs) of the corresponding LUNs in the XP array. To determine the legacy DSFs corresponding to the LUNs in the XP array, run:

# ls /dev/rdsk/* | raidscan -find -fx

The following output is displayed:

DEVICE_FILE       UID  S/F  PORT   TARG  LUN  SERIAL  LDEV  PRODUCT_ID
/dev/rdsk/c5t0d0  0    F    CL3-E  0     0    10053   321   OPEN-3

This output displays the mapping between the legacy DSFs and the CU:LDEVs. The value in the LDEV column is the CU:LDEV without the colon (:) mark.
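Because the LDEV column is simply the CU:LDEV with the colon removed, the mapping can be re-derived mechanically. The following sketch parses a captured copy of the raidscan output; the here-document stands in for the live ls /dev/rdsk/* | raidscan -find -fx pipeline:

```shell
# Re-insert the colon into the LDEV column: "321" -> "03:21".
awk 'NR > 1 {
    ldev = sprintf("%4s", $8)      # right-justify to 4 hex digits...
    gsub(/ /, "0", ldev)           # ...then zero-pad: "321" -> "0321"
    printf "%s -> CU:LDEV %s:%s\n", $1, substr(ldev, 1, 2), substr(ldev, 3, 2)
}' <<'EOF'
DEVICE_FILE       UID S/F PORT  TARG LUN SERIAL LDEV PRODUCT_ID
/dev/rdsk/c5t0d0    0 F   CL3-E    0   0  10053  321 OPEN-3
EOF
```

This prints /dev/rdsk/c5t0d0 -> CU:LDEV 03:21, which is the form expected by the HORCM_LDEV section described earlier.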
To determine the agile DSFs (supported from HP-UX 11i v3 onwards) and the CU:LDEV mapping information, run the following command:

# ls /dev/rdisk/* | raidscan -find -fx

The following output is displayed:

DEVICE_FILE         UID  S/F  PORT   TARG  LUN  SERIAL  LDEV  PRODUCT_ID
/dev/rdisk/disk232  0    F    CL4-E  0     0    10053   321   OPEN-3
NOTE: There must also be alternate links for each device, and these alternate links must be on different buses inside the XP disk array. For example, the alternate links may be CL2-E and CL2-F.

LVM Volume Groups Configuration

LVM volume groups using the application device group must be created (or imported) on the cluster nodes in all three data centers. Create or import the volume groups in the same way as in a regular two-site Metrocluster setup: create and export all volume groups on one of the DC1 nodes, then import all the volume groups on the rest of the cluster nodes in the three data centers. Use the following procedure to create and export volume groups:
1.  Define the appropriate volume groups on all cluster nodes that run the application package:
    # mkdir /dev/vgxx
    # mknod /dev/vgxx/group c 64 0xnn0000
    where the VG name and minor number nn are unique for each volume group defined on the node.
2.  Create the volume group on only one node in the primary data center (DC1). Use the following commands:
    # pvcreate -f /dev/rdsk/cxtydz
    # vgcreate /dev/vgname /dev/dsk/cxtydz
3.  Create the logical volume(s) for the volume group on the node, and create any file systems required.
4.  Export the volume groups on the node without removing the special device files:
    # vgchange -a n
    # vgexport -s -p -m
    Make sure to copy the mapfiles to the nodes in all three data centers.
5.  Import the volume groups on all of the other nodes in DC1, DC2, and DC3, and back up the LVM configuration:
    # mkdir /dev/vgxx
    # mknod /dev/vgxx/group c 64 0xnn0000
    # vgimport -s -m
    # vgchange -a y
    # vgcfgbackup
    # vgchange -a n

VxVM Configuration

Use the following procedure to create disk groups for VERITAS storage. The VxVM root disk group (rootdg) may need to be created depending on the VxVM version. If rootdg is required, make sure it has already been created on the system while configuring the storage. On one node in the primary data center (DC1), do the following:
1.  Initialize the disks to be used with VxVM by running the vxdisksetup command on one node only:
    # /opt/VRTS/bin/vxdisksetup -i c5t0d0
2.  Create the disk group with the vxdg command on one node only:
    # vxdg init logdata c5t0d0
3.  Verify the configuration:
    # vxprint -g logdata
4.  Use the vxassist command to create logical volumes:
    # vxassist -g logdata make logfile 2048m
5.  Verify the configuration:
    # vxprint -g logdata
6.  Make the file system:
    # newfs -F vxfs /dev/vx/rdsk/logdata/logfile
7.  Create a directory on which to mount the volume:
    # mkdir /logs
8.  Mount the volume:
    # mount /dev/vx/dsk/logdata/logfile /logs
9.  Check that the file system exists, then unmount it:
    # umount /logs
10. Deport the disk group on the primary node:
    # vxdg deport logdata
Package Configuration in a Three Data Center Environment

This procedure must be repeated on all the participating nodes for each Serviceguard package. Because there are two Serviceguard clusters, packages must be configured individually in each cluster. Customizations include editing a package configuration file and an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Refer to the Managing Serviceguard user's guide for more detailed instructions on how to start, halt, and move packages and their services between nodes within a cluster.
1.  Create a directory /etc/cmcluster/ for each package:
    # mkdir /etc/cmcluster/
2.  Create a package configuration file:
    # cd /etc/cmcluster/
    # cmmakepkg -p .config
    Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
3.  In the .config file, list the node names in the order in which you want the package to fail over. For performance reasons, it is recommended to have the package fail over locally first, and then to the remote data center. Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT, or to a value large enough to take into consideration the extra startup time required to obtain status from the XP Series disk array.
4.  Create a package control script:
    # cmmakepkg -s .cntl
    Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user's guide.
5.  Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it